Tuesday, April 29, 2008

xml processing using sed - post1


$ cat studentinfo.xml
<?xml version="1.0" encoding="UTF-8"?>
<StudentInfo Version="1">
<Student>
<StudentId>SID469</StudentId>
<ClassId>21</ClassId>
<Location>AA</Location>
</Student>
<Student>
<StudentId>CSI150</StudentId>
<ClassId>71</ClassId>
<Location>AX</Location>
</Student>
<Student>
<StudentId>PIA687</StudentId>
<ClassId>1</ClassId>
<Location>AP</Location>
</Student>
...
...
</StudentInfo>

Purpose:
--------------
Set the value of every ClassId element to "2".

Solution:
--------------
$ sed -e 's|<ClassId>[0-9]*</ClassId>|<ClassId>2</ClassId>|' studentinfo.xml > studentinfo.xml.bak

Now:
-----------
$ cat studentinfo.xml.bak
<?xml version="1.0" encoding="UTF-8"?>
<StudentInfo Version="1">
<Student>
<StudentId>SID469</StudentId>
<ClassId>2</ClassId>
<Location>AA</Location>
</Student>
<Student>
<StudentId>CSI150</StudentId>
<ClassId>2</ClassId>
<Location>AX</Location>
</Student>
<Student>
<StudentId>PIA687</StudentId>
<ClassId>2</ClassId>
<Location>AP</Location>
</Student>
...
...
</StudentInfo>
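
The same kind of substitution can take the new value from a shell variable. This is only a minimal sketch: the sample data is a small hypothetical stand-in for studentinfo.xml, and the variable name new_id is my own assumption. It only works when the opening tag, the value and the closing tag are on one line, since sed is not an XML parser.

```shell
#!/bin/sh
# Hypothetical sample data, shaped like studentinfo.xml.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<Student>
<StudentId>SID469</StudentId>
<ClassId>21</ClassId>
</Student>
<Student>
<StudentId>CSI150</StudentId>
<ClassId>71</ClassId>
</Student>
EOF

new_id=2   # assumed variable name, not from the original post

# Double quotes let the shell expand ${new_id}; "|" as the sed delimiter
# avoids escaping the "/" in the closing tag.
result=$(sed "s|<ClassId>[0-9]*</ClassId>|<ClassId>${new_id}</ClassId>|" "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Only the ClassId lines are rewritten; StudentId and everything else pass through unchanged.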

Wednesday, April 16, 2008

Find Access,Modify,Change Time,Inode of a file - stat command

The inode number of a file can be read directly from the ls output:

$ ls -i

e.g.

$ ls -i ch.sh
6127772 ch.sh

All file statistics, such as Access Time, Modify Time, Change Time and inode number, are reported by the stat command:

$ stat ch.sh
File: `ch.sh'
Size: 182 Blocks: 8 IO Block: 4096 regular file
Device: 803h/2051d Inode: 6127772 Links: 1
Access: (0644/-rw-r--r--) Uid: (27357/ jsaikia) Gid: ( 600/ staff)
Access: 2008-04-16 11:26:37.800868721 +0530
Modify: 2008-04-15 12:36:19.342458624 +0530
Change: 2008-04-15 12:36:19.344458539 +0530


Difference between Modify Time and Change Time:
---------------------------------------------------

You might be wondering what the difference is between the Change Time and the Modify Time of a file, as shown by stat.

Change Time: updated when the file's metadata (inode) changes, e.g. when you change its permissions or ownership, or rename it.

Modify Time: updated when the file's contents change. A content change also updates Change Time; Access Time is updated when the contents are read.
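
A quick way to see the difference is to compare the timestamps before and after a chmod and a write. This is a minimal sketch, assuming GNU stat (the -c option with %Y for Modify Time and %Z for Change Time, both in seconds since the epoch); the temporary file and the variable names are made up for the demo.

```shell
#!/bin/sh
f=$(mktemp)

# Push Modify Time into the past so later changes are easy to spot.
touch -t 200804151236.19 "$f"
m_before=$(stat -c %Y "$f")

# A permission change touches only the inode: Change Time moves,
# Modify Time stays where it was.
chmod 644 "$f"
m_after_chmod=$(stat -c %Y "$f")
c_after_chmod=$(stat -c %Z "$f")

# A content change moves Modify Time (and Change Time) to now.
echo "new content" >> "$f"
m_after_write=$(stat -c %Y "$f")

echo "mtime before chmod: $m_before"
echo "mtime after chmod : $m_after_chmod"
echo "ctime after chmod : $c_after_chmod"
echo "mtime after write : $m_after_write"
rm -f "$f"
```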

Saturday, April 12, 2008

Min,max and average using awk

The file report.txt contains the working hours of employee Cl for a particular week.

$ cat report.txt
Office1:Cl:8:Mon
Office1:Cl:1:Tue
Office2:Cl:5:Wed
Office1:Cl:3:Thu
Office3:Cl:6:fri

Required Output:
------------------
minimum hours:1(Tue)
maximum hours:8(Mon)
average:4.6

Here is the code:
----------------
$ awk 'BEGIN {FS=":"}
min=="" {
min=max=$3 ; minday=maxday=$4
}
{
if ($3 > max) {max = $3; maxday = $4};
if ($3 < min) {min = $3; minday = $4};
total += $3
count += 1
}
END {
print "minimum hours:" min"("minday")";
print "maximum hours:" max"("maxday")";
print "average:" total/count;
}
' report.txt

Print last field - grep

$ cat rank.txt
Bash:P:Tue:7
NW:F:Mon:4
DB:P:Tue:8
SE:P:Mon:8

Print the last field. i.e.

Required Output:
---------------
7
4
8
8

Ways:
------
The normal awk way of doing:
$ awk 'BEGIN {FS=":"} {print $NF}' rank.txt

Using sed:
$ sed -n 's/.*://;p' rank.txt

Using grep:
$ grep -o '[^:]*$' rank.txt
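
The shell can also do this on its own with parameter expansion. A minimal sketch, assuming a POSIX shell; the temporary file just reproduces rank.txt. ${line##*:} removes the longest prefix that ends in ":", which leaves the last field.

```shell
#!/bin/sh
rank=$(mktemp)
cat > "$rank" <<'EOF'
Bash:P:Tue:7
NW:F:Mon:4
DB:P:Tue:8
SE:P:Mon:8
EOF

# Read the file line by line and strip everything up to the last ":".
result=$(
  while IFS= read -r line; do
    printf '%s\n' "${line##*:}"
  done < "$rank"
)

printf '%s\n' "$result"
rm -f "$rank"
```

For big files awk, sed or grep will be faster; the loop is handy when the value is already in a shell variable.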

Combine every 3 lines as one - Awk

$ cat details.txt
AA XX
ID23145
Singapore,1982
BB YZY
SD1243
Delhi(India),1980
DD AS
ASD324
Dubai,1981

Purpose:
-------
Combine every 3 lines as one i.e.

Required Output
----------------
AA XX,ID23145,Singapore,1982
BB YZY,SD1243,Delhi(India),1980
DD AS,ASD324,Dubai,1981


If the line number is divisible by 3, print a newline (\n) after the line; otherwise print a comma (,), i.e.

$ awk '{printf("%s%s", $0, (NR%3 ? "," : "\n"))}' details.txt
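
The same join can also be done with paste, which reads one line per "-" argument from standard input. A minimal sketch; the temporary file just reproduces details.txt.

```shell
#!/bin/sh
details=$(mktemp)
cat > "$details" <<'EOF'
AA XX
ID23145
Singapore,1982
BB YZY
SD1243
Delhi(India),1980
DD AS
ASD324
Dubai,1981
EOF

# Three "-" operands: paste takes three consecutive lines from stdin
# and joins them with the delimiter given by -d.
result=$(paste -d, - - - < "$details")

printf '%s\n' "$result"
rm -f "$details"
```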

Join multiple lines using AWK

Some lines in output.txt are broken.

$ cat output.txt
a:b:c:1:2:3:2.3:henry
s:d:f:2:1:4:
54:user5
d:q:w:5:6:
3:5.2:alex
y:m:n:3:4:1:5.6:eiam

Output Required:
----------------
a:b:c:1:2:3:2.3:henry
s:d:f:2:1:4:54:user5
d:q:w:5:6:3:5.2:alex
y:m:n:3:4:1:5.6:eiam


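$ awk '
BEGIN {
FS=":";
maxFLD=8;
}
{
# Keep appending the next line while the record is short or ends with ":"
while (NF < maxFLD || $0 ~ /:$/) {
# Stop at end of input (or on a read error) to avoid looping forever
if ((getline record) <= 0) break;
$0 = $0 record
}
print $0
}
' output.txt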


** While the current record ($0) has fewer than 8 fields (NF) or is incomplete (ends with ":"), append the next line to the current record.
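
Another way to write the same idea, without getline, is to collect lines in a buffer and print it once it is complete. This is only a sketch under the same assumption as above (a complete record has 8 fields and does not end with ":"); the names buf, parts and want are my own.

```shell
#!/bin/sh
out=$(mktemp)
cat > "$out" <<'EOF'
a:b:c:1:2:3:2.3:henry
s:d:f:2:1:4:
54:user5
d:q:w:5:6:
3:5.2:alex
y:m:n:3:4:1:5.6:eiam
EOF

result=$(awk -v want=8 '
{
    buf = buf $0                      # append the current line to the buffer
    n = split(buf, parts, ":")        # count the fields collected so far
    if (n >= want && buf !~ /:$/) {   # complete record: print it and start over
        print buf
        buf = ""
    }
}
END { if (buf != "") print buf }      # flush a trailing incomplete record
' "$out")

printf '%s\n' "$result"
rm -f "$out"
```

Since awk reads every line through the main loop, there is no special handling needed at the end of the file.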

Print a column using sed

$ cat emails.out
user1@hotmail.com
user2@gmail.com
user3@yahoo.com
user4@rediffmail.com
user5@aolmail.com

Required Output:
-----------------

hotmail
gmail
yahoo
rediffmail
aolmail

Solutions:
Using two field separators ("@" and ".") with awk:
$ awk 'BEGIN{FS="[@.]"} {print $2}' emails.out

sed solution:
$ sed 's_\(.*\)@\(.*\)\.\(.*\)_\2_' emails.out


Using cut takes two steps in this case:
$ cut -d@ -f2 emails.out | cut -d. -f1

Similarly, if we do not use two field separators in the above awk code, awk also needs two steps:
$ awk -F@ '{print $2}' emails.out | awk -F. '{print $1}'


So, to print the UserName (1st field), SHELL (last field) and homedir (6th field) from the /etc/passwd file, the sed solution would be:

$ sed 's_\(.*\):\(.*\):\(.*\):\(.*\):\(.*\):\(.*\):\(.*\)_\1,\7,\6_' /etc/passwd

And the awk solution is:

$ awk 'BEGIN {FS=":"; OFS=","; print "UserName","SHELL","homedir"} {print $1,$NF,$6}' /etc/passwd

Round float values using sprintf - AWK

$ cat usagealarm.txt
Element1|23.4567|11:32PM|OK
Element2|45.2134|10:33PM|OK
Element3|21.5217|11:52PM|OK
Element4|29.4367|11:18PM|OK
Element5|27.4577|12:72PM|OK

We need to round the values in the 2nd field to 2 digits after the decimal point.

i.e.

required output:

Element1|23.46|11:32PM|OK
Element2|45.21|10:33PM|OK
Element3|21.52|11:52PM|OK
Element4|29.44|11:18PM|OK
Element5|27.46|12:72PM|OK

$ awk 'BEGIN{OFS=FS="|"} { $2=sprintf("%.2f",$2)}1' usagealarm.txt

or

$ awk 'BEGIN{OFS=FS="|"} { $2=sprintf("%.2f",$2)} {print}' usagealarm.txt
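
The two forms above are equivalent because the trailing 1 is a pattern that is always true, and a pattern with no action gets awk's default action, which is print. A small sketch to show it; the input line is the first record of usagealarm.txt, and the variable names are only for the demo.

```shell
#!/bin/sh
line='Element1|23.4567|11:32PM|OK'

# Rewrite field 2, then let the always-true pattern "1" print the record.
rounded=$(printf '%s\n' "$line" | awk 'BEGIN{OFS=FS="|"} {$2=sprintf("%.2f",$2)} 1')

# On its own, "1" prints every line; "0" is never true and prints nothing.
all=$(printf 'a\nb\n' | awk '1')
none=$(printf 'a\nb\n' | awk '0')

printf '%s\n' "$rounded"
```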

Friday, April 11, 2008

Awk and Eval

Suppose you have a function named
f_factory ()
which needs 2 parameters "inputfile" and "flag"

And you have a configuration file named ctrl.cfg of the following format

$ cat ctrl.cfg
#Sl No|Input File|Flag|Vol
1|/opt/ctrl.txt|tr_2|2
2|/usr/local/ms/ts.txt|tr_3|3
3|/opt/dec.txt|tr_4|2


Assignment: You have to execute(call) the function with all the "inputfile" "flag" entries from ctrl.cfg.

i.e.

f_factory "/opt/ctrl.txt" tr_2
f_factory "/usr/local/ms/ts.txt" tr_3
f_factory "/opt/dec.txt" tr_4

Solution:
---------
$ for line in `awk -F "|" '!/#/ && NF!=0 {print}' ctrl.cfg`
> do
> IFILE=`echo $line | awk -F "|" '{print $2}'`
> FLAG=`echo $line | awk -F "|" '{print $3}'`
> f_factory $IFILE $FLAG
> done

Alternatively, the two assignments

IFILE=`echo $line | awk -F "|" '{print $2}'`
FLAG=`echo $line | awk -F "|" '{print $3}'`

can be replaced with a single line using eval:


$ for line in `awk -F "|" '!/#/ && NF!=0 {print}' ctrl.cfg`
> do
> eval $(echo "$line" | awk -F'|' '{print "IFILE="$2";FLAG="$3}')
> f_factory "${IFILE}" "${FLAG}"
> done
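
If you would rather avoid both eval and the extra awk calls, the shell's read can split each line on "|" itself. This is a minimal sketch, not the method of the post: f_factory here is a hypothetical stub that only prints its arguments, and the temporary file stands in for ctrl.cfg. Unlike the for loop above, it also copes with spaces inside a field.

```shell
#!/bin/sh
# Hypothetical stub; the real f_factory is whatever your script defines.
f_factory() { printf 'f_factory called with %s %s\n' "$1" "$2"; }

cfg=$(mktemp)
cat > "$cfg" <<'EOF'
#Sl No|Input File|Flag|Vol
1|/opt/ctrl.txt|tr_2|2
2|/usr/local/ms/ts.txt|tr_3|3
3|/opt/dec.txt|tr_4|2
EOF

out=$(
  # IFS='|' makes read split each line into the four fields.
  while IFS='|' read -r slno ifile flag vol; do
    # Skip blank lines and comment lines.
    case $slno in ''|'#'*) continue ;; esac
    f_factory "$ifile" "$flag"
  done < "$cfg"
)

printf '%s\n' "$out"
rm -f "$cfg"
```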

Tuesday, April 8, 2008

Creating multiple subdirectory levels at one time - Linux

Suppose you have to create the following directories and sub-directories:
dir1
dir1/dir2
dir1/dir2/dir3

Solution:
$ mkdir -p dir1/dir2/dir3

Done!!
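
One more thing worth knowing: with -p, mkdir does not complain if the directories already exist, so it is safe to run repeatedly in scripts. A small sketch in a throwaway directory (the variable names are only for the demo):

```shell
#!/bin/sh
base=$(mktemp -d)

# First run creates all three levels at once.
mkdir -p "$base/dir1/dir2/dir3"
created=no
[ -d "$base/dir1/dir2/dir3" ] && created=yes

# Second run with -p succeeds even though everything exists already.
mkdir -p "$base/dir1/dir2/dir3"
rc_p=$?

# Plain mkdir on an existing directory fails.
mkdir "$base/dir1" 2>/dev/null
rc_plain=$?

echo "created=$created rc_p=$rc_p rc_plain=$rc_plain"
rm -rf "$base"
```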

Thursday, April 3, 2008

Linux flip command - alternative of dos2unix,unix2dos

I was struggling to install unix2dos and dos2unix (the Debian tofrodos package) on my Debian box when I came across the "flip" command, which does the same thing.

Usage:
----------
flip - do newline conversions between **IX and MS-DOS

-u Convert to **IX format (CR LF => LF, lone CR or LF unchanged, trailing control Z removed,
embedded control Z unchanged).

-m Convert to MS-DOS format (lone LF => CR LF, lone CR unchanged).


An example: myfile.txt is a DOS file with ^Ms at the end of each line.
$ file myfile.txt
myfile.txt: ASCII text, with CRLF, LF line terminators

$ flip -u myfile.txt
$ file myfile.txt
myfile.txt: ASCII text

Again converting to dos format:

$ flip -m myfile.txt
$ file myfile.txt
myfile.txt: ASCII text, with CRLF line terminators
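
If flip is not available either, tr and awk can do a rough version of the same conversions. This is only a sketch and not a full replacement: tr -d '\r' deletes every CR, including lone ones that flip -u would leave alone, and it does nothing about a trailing control Z. The sample file is made up for the demo.

```shell
#!/bin/sh
f=$(mktemp)
printf 'line1\r\nline2\r\n' > "$f"      # two lines with CRLF endings, 14 bytes

orig_bytes=$(wc -c < "$f")

# DOS -> Unix: delete the carriage returns.
unix_bytes=$(tr -d '\r' < "$f" | wc -c)

# Unix -> DOS: print each line followed by CR LF.
dos_bytes=$(tr -d '\r' < "$f" | awk '{ printf "%s\r\n", $0 }' | wc -c)

echo "original=$orig_bytes unix=$unix_bytes dos=$dos_bytes"
rm -f "$f"
```

Each line loses one byte going to Unix format and gains it back going to DOS format.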

© Jadu Saikia www.UNIXCL.com