Wednesday, November 17, 2010

Sort file based on last field - UNIX


From SORT(1) man page:

-n, --numeric-sort
compare according to string numerical value

-k, --key=POS1[,POS2]
start a key at POS1, end it at POS2 (origin 1)

-t, --field-separator=SEP
use SEP instead of non-blank to blank transition

1) File with same number of fields in each line


$ cat file.txt
IN,90,453
US,12,120
NZ,89,200
WI,20,500
TS,12,124


$ sort -t, -n -k3 file.txt
US,12,120
TS,12,124
NZ,89,200
IN,90,453
WI,20,500

2) File with uneven number of fields in each line

$ cat file.txt
IN,90,453
US,12,1,120
NZ,89,200
WI,500
TS,12,124

Solution : The idea is join the last field(column) of each line at the beginning of each of the lines (with a different delimiter, in this case I am using pipe "|")

$ awk 'BEGIN {FS=","; OFS="|"}{print $NF,$0}' file.txt
453|IN,90,453
120|US,12,1,120
200|NZ,89,200
500|WI,500
124|TS,12,124

Then sort on 1st field with pipe ("|") delimited.

$ awk 'BEGIN {FS=","; OFS="|"}{print $NF,$0}' file.txt | sort -n -t"|" -k1
120|US,12,1,120
124|TS,12,124
200|NZ,89,200
453|IN,90,453
500|WI,500

Then discard the first field i.e. print the 2nd field with pipe ("|") delimited.

$ awk 'BEGIN {FS=","; OFS="|"}{print $NF,$0}' file.txt | sort -n -t"|" -k1 | awk -F "|" '{print $NF}'
US,12,1,120
TS,12,124
NZ,89,200
IN,90,453
WI,500

Which is same as:

$ awk 'BEGIN {FS=","; OFS="|"}{print $NF,$0}' file.txt | sort -n -t"|" -k1 | awk -F "|" '{print $2}'

or

$ awk 'BEGIN {FS=","; OFS="|"}{print $NF,$0}' file.txt | sort -n -t"|" -k1 | cut -d"|" -f2


A question to unstableme readers:
- Is there a better way to directly mention the last field number (as counting field number incase line has huge number of fields is never going to be easy) ? Thanks in advance, much appreciated.

Related posts:
- UNIX sort strings by string length using awk in Bash
- Changing temporary directory for UNIX sort command
- Sort date in ddmmyyyy format using awk

3 comments:

djohngo said...

Gawk's asorti function comes in handy here.

gawk -F "," {arr[$NF]=$0}END{j = asorti(arr, sorted); for(i=1; i<=j; i++){print arr[sorted[i]]}} test.txt

This fails if there is more than one line with identical sort fields.

Jadu Saikia said...

@djohngo, thanks for the awk asorti solution. Its useful.

Jadu Saikia said...

Tried the same with python http://pythonstarter.blogspot.com/2010/11/python-sort-file-based-on-last-field.html

© Jadu Saikia www.UNIXCL.com