Wednesday, September 26, 2012

Compare two numeric fields of two files - UNIX awk


Following example will show how we can use UNIX paste and join command with awk to find difference of two numeric columns of two similar files.

e.g. Input file:
$ cat file1.txt 
Aug-1,100
Aug-2,220
Aug-4,230
Aug-5,100

$ cat file2.txt 
Aug-1,100
Aug-2,120
Aug-4,400
Aug-5,250
Required output:
As you can see both file1.txt and file2.txt has got first field (comma separated) in the same order. We will have to find out the corresponding (2nd field of file1.txt - 2nd field of file2.txt) and the output required is something like this:
Aug-1,100,100,0
Aug-2,220,120,100
Aug-4,230,400,-170
Aug-5,100,250,-150
Solution#1:
$ paste -d "," file1.txt file2.txt
Aug-1,100,Aug-1,100
Aug-2,220,Aug-2,120
Aug-4,230,Aug-4,400
Aug-5,100,Aug-5,250

$ paste -d "," file1.txt file2.txt | awk 'BEGIN {FS=OFS=","} {print $1,$2,$NF,$2-$NF}'
Aug-1,100,100,0
Aug-2,220,120,100
Aug-4,230,400,-170
Aug-5,100,250,-150
Solution#2:
$ join -t, -1 1 -2 1 file1.txt file2.txt 
Aug-1,100,100
Aug-2,220,120
Aug-4,230,400
Aug-5,100,250

$ join -t, -1 1 -2 1 file1.txt file2.txt | awk ' BEGIN {FS=OFS=","} {print $0,$2-$NF}'
Aug-1,100,100,0
Aug-2,220,120,100
Aug-4,230,400,-170
Aug-5,100,250,-150
Some similar posts:
- A small tutorial on UNIX paste command
- Performing join using awk
- Join in bash
- Perform outer join in UNIX Bash
- Find diff based on field - UNIX Bash

1 comment:

Unknown said...


< file1.txt sed -e 's/\(.*\),/[\1] /' |
dc -e "
### Footprint ################
# macros: a b c d e q ?
# stacks: M X
# registers: / N 0
############################
[q]sq
[,]s/
[SM z 0 b z lN1- >c]sc
[zsN lax lcx lMpc]sd
[SXs0 z 0 <e]se
[? z 0 =q dlX-LXr ldx l?x]s?
#####

$(< file2.txt sed -e 's/\(.*\),/[\1] /')
lex

l?x
"

© Jadu Saikia www.UNIXCL.com