Tuesday, August 26, 2008

Sort date in ddmmyyyy format - awk and bash script


The input file has a date in ddmmyyyy format as its first field.

$ cat myf.dat
12082008;pull done;ret=34;Y
08072008;push hanged;s=3;N
15082008;pull done;ret=34;Y
01062008;psuh done;ret=23;Y
18082007;old entry;old;N

Required output: the file sorted by the date in the first field (ddmmyyyy format), so that the final output is:

18082007;old entry;old;N
01062008;psuh done;ret=23;Y
08072008;push hanged;s=3;N
12082008;pull done;ret=34;Y
15082008;pull done;ret=34;Y

The solution is divided into 3 steps:

1) Add a temporary field at the beginning of each line, separated by a comma. This field is the date from the first field rewritten in yyyymmdd format.

$ awk '{
tempfield=sprintf("%s%s%s",substr($1,5),substr($1,3,2),substr($1,1,2))
print tempfield","$0
}' FS=";" myf.dat

20080812,12082008;pull done;ret=34;Y
20080708,08072008;push hanged;s=3;N
20080815,15082008;pull done;ret=34;Y
20080601,01062008;psuh done;ret=23;Y
20070818,18082007;old entry;old;N
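To see what the three substr() calls do in isolation, here is a minimal sketch applied to a single date string. The variable names d and ymd are only for illustration and are not part of the original script.

```shell
# Rearrange one ddmmyyyy value into yyyymmdd.
# substr($1,5)   -> characters 5 to the end  (yyyy)
# substr($1,3,2) -> 2 characters from pos 3  (mm)
# substr($1,1,2) -> 2 characters from pos 1  (dd)
d=12082008
ymd=$(echo "$d" | awk '{ print substr($1,5) substr($1,3,2) substr($1,1,2) }')
echo "$ymd"   # prints 20080812
```

Because the rearranged value puts the year first, comparing two such values as numbers is the same as comparing the dates.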

2) Now do a numeric sort.

$ awk '{
tempfield=sprintf("%s%s%s",substr($1,5),substr($1,3,2),substr($1,1,2))
print tempfield","$0
}' FS=";" myf.dat | sort -n

20070818,18082007;old entry;old;N
20080601,01062008;psuh done;ret=23;Y
20080708,08072008;push hanged;s=3;N
20080812,12082008;pull done;ret=34;Y
20080815,15082008;pull done;ret=34;Y
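The reason for the temporary field can be checked with a small sketch: a numeric sort applied directly to the ddmmyyyy values compares the day digits first, so it does not give date order. The temporary file (variable tmp) and the variable naive are assumptions made for this illustration; the rows are the same as in myf.dat.

```shell
# Recreate the sample input in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
12082008;pull done;ret=34;Y
08072008;push hanged;s=3;N
15082008;pull done;ret=34;Y
01062008;psuh done;ret=23;Y
18082007;old entry;old;N
EOF

# Numeric sort on the raw ddmmyyyy value: 18082007 is the largest number,
# so the oldest entry (from 2007) wrongly ends up on the last line.
naive=$(sort -n "$tmp")
echo "$naive"
rm -f "$tmp"
```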

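3) Remove the temporary field from the beginning. Using -f2- (everything from the second comma-separated field onward) instead of -f2 keeps the rest of the line intact even if it contains a comma.

$ awk '{
tempfield=sprintf("%s%s%s",substr($1,5),substr($1,3,2),substr($1,1,2))
print tempfield","$0
}' FS=";" myf.dat | sort -n | cut -d"," -f2-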

18082007;old entry;old;N
01062008;psuh done;ret=23;Y
08072008;push hanged;s=3;N
12082008;pull done;ret=34;Y
15082008;pull done;ret=34;Y

The above is the required output.
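As an alternative sketch (not the approach used above), sort by itself can produce the same order if its keys point at character positions inside the first field: characters 5-8 (year), then 3-4 (month), then 1-2 (day). This assumes every date is exactly eight digits; the temporary file (tmp) and the variable sorted are names chosen for this illustration.

```shell
# Recreate the sample input in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
12082008;pull done;ret=34;Y
08072008;push hanged;s=3;N
15082008;pull done;ret=34;Y
01062008;psuh done;ret=23;Y
18082007;old entry;old;N
EOF

# -t';'        : fields are separated by ';'
# -k1.5,1.8n   : first key  = characters 5-8 of field 1 (yyyy), numeric
# -k1.3,1.4n   : second key = characters 3-4 of field 1 (mm), numeric
# -k1.1,1.2n   : third key  = characters 1-2 of field 1 (dd), numeric
sorted=$(sort -t';' -k1.5,1.8n -k1.3,1.4n -k1.1,1.2n "$tmp")
echo "$sorted"
rm -f "$tmp"
```

This avoids the temporary field and the cut step, at the cost of a less readable sort command.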


© Jadu Saikia www.UNIXCL.com