## Tuesday, March 18, 2008

### Performing "join" using AWK

\$ cat file1.txt
Alex:ID23:How:23:2004
Aina:ID12:Thomas:14:2003
Ciam:ID13:Dev:23:2000
Alot:ID34:Ya:24:2004
Brian:ID64:Low:25:1999

\$ cat file2.txt
ID13:12,300
ID12:34,300
ID64:50,000

#Join both the files based on common 1st filed of file2.txt and 2nd field of file1.txt
#i.e. required Output:

Ciam:ID13:Dev:23:2000:12,300
Aina:ID12:Thomas:14:2003:34,300
Brian:ID64:Low:25:1999:50,000

Solution1:
\$ awk '
BEGIN {FS=OFS=":"}
NR==FNR{arr[\$1]=\$2;next}
\$2 in arr{print \$0,arr[\$2]}
' file2.txt file1.txt

Aina:ID12:Thomas:14:2003:34,300
Ciam:ID13:Dev:23:2000:12,300
Brian:ID64:Low:25:1999:50,000

Solution2:
\$ awk '
BEGIN {FS=":"}
NR==FNR{arr[\$2]=\$0;next}
\$1 in arr && \$0=arr[\$1] FS \$2
' file1.txt file2.txt

Ciam:ID13:Dev:23:2000:12,300
Aina:ID12:Thomas:14:2003:34,300
Brian:ID64:Low:25:1999:50,000

Using "join":
Same thing if I have to do with "join" command, here are the steps:

\$ cat file1.txt
Alex:ID23:How:23:2004
Aina:ID12:Thomas:14:2003
Ciam:ID13:Dev:23:2000
Alot:ID34:Ya:24:2004
Brian:ID64:Low:25:1999

#Sort file1.txt based on 2nd field, store the sorted output on file1.txt.srt
\$ sort -t ":" +1 -2 -o file1.txt.srt file1.txt

\$ cat file1.txt.srt
Aina:ID12:Thomas:14:2003
Ciam:ID13:Dev:23:2000
Alex:ID23:How:23:2004
Alot:ID34:Ya:24:2004
Brian:ID64:Low:25:1999

\$ cat file2.txt
ID13:12,300
ID12:34,300
ID64:50,000

#Sort file2.txt based on 1st field, store the sorted output on file2.txt.srt
\$ sort -t ":" +0 -1 -o file2.txt.srt file2.txt

\$ cat file2.txt.srt
ID12:34,300
ID13:12,300
ID64:50,000

#Now perform "join" operation.
\$ join -t ":" -1 2 -2 1 -o "1.1 1.2 1.3 1.4 1.5 2.2" file1.txt.srt file2.txt.srt
Aina:ID12:Thomas:14:2003:34,300
Ciam:ID13:Dev:23:2000:12,300
Brian:ID64:Low:25:1999:50,000

More about "join" can be found here