Wednesday, August 26, 2009

Search and print output using awk


Input files:

$ cat file1
id89
id21
id90
id12

$ cat file2
0|id12|QE|T
4|id89|AX|N
8|id20|AU|K
9|id90|AW|P
3|id21|PP|A
7|id13|LP|O

Required: Look-up the file1 fields(Ids) in file2(in 2nd field) and print the full record from file2.

Already I have made a lot of posts to perform file look-up in awk. Lets see some more alternatives to this.


$ awk 'NR==FNR{_[$1];next}$2 in _' FS=\| file1 file2

0|id12|QE|T
4|id89|AX|N
9|id90|AW|P
3|id21|PP|A

Another awk solution would be:

$ awk '{
if (NF==1)
_[$0]=$0
else
for ( i in _)
if ($2==i)
print
}
' FS=\| file1 file2

0|id12|QE|T
4|id89|AX|N
9|id90|AW|P
3|id21|PP|A

A simple bash script:

for id in $(cat file1)
do
awk -F "|" -v x=$id '$2==x {print}' file2
done

4|id89|AX|N
3|id21|PP|A
9|id90|AW|P
0|id12|QE|T


Using bash join(1) : Here the files needs to be sorted on the field of join.
Using bash process substitution technique to avoid creation of temporary files.

$ join -t"|" -j1 1 -j2 2 <(sort file1) <(sort -t"|" -k2 file2)

id12|0|QE|T
id21|3|PP|A
id89|4|AX|N
id90|9|AW|P


Related post:
- Delete lines based on another file using awk
-Update file based on another file using awk
-Perform join using awk
-Update a file based on another file using sed

3 comments:

rattus said...

I keep forgetting about _join_. Thanks for the reminders :)
Did you do some performance measurements with the various approaches?

priyanka said...

searching in awk : compare position (2nd col ) from file 1 to 2nd and third coloumn in file two . if value of 2nd coloumn of file one is in between value of 2nd and 3rd column of file2 print true and if not then false
file 1
id pos
p1 35
p1 78
p1 108
file 2
id start end
p1 1 100
p1 150 306
output
p1 35 true
p1 78 true
p1 108 false

Jadu Saikia said...

@priyanka,

Please correct me if I misunderstood your query:

Something like ?

$ cat file1.txt
id pos
p1 35
p1 78
p1 108

$ cat file2.txt
id start end
p1 1 10
p1 1 200
p1 150 306

$ paste file1.txt file2.txt
id pos id start end
p1 35 p1 1 10
p1 78 p1 1 200
p1 108 p1 150 306

$ paste file1.txt file2.txt | awk 'NR > 1 {if($2>=$4 && $2 <=$NF ) {print $1,$2,"yes"} else {print $1,$2,"no"}}'

Output:
p1 35 no
p1 78 yes
p1 108 no

© Jadu Saikia www.UNIXCL.com