Input files:
$ cat file1
id89
id21
id90
id12
$ cat file2
0|id12|QE|T
4|id89|AX|N
8|id20|AU|K
9|id90|AW|P
3|id21|PP|A
7|id13|LP|O
Required: Look-up the file1 fields(Ids) in file2(in 2nd field) and print the full record from file2.
Already I have made a lot of posts to perform file look-up in awk. Lets see some more alternatives to this.
$ awk 'NR==FNR{_[$1];next}$2 in _' FS=\| file1 file2
0|id12|QE|T
4|id89|AX|N
9|id90|AW|P
3|id21|PP|A
Another awk solution would be:
$ awk '{
if (NF==1)
_[$0]=$0
else
for ( i in _)
if ($2==i)
}
' FS=\| file1 file2
0|id12|QE|T
4|id89|AX|N
9|id90|AW|P
3|id21|PP|A
A simple bash script:
for id in $(cat file1)
do
awk -F "|" -v x=$id '$2==x {print}' file2
done
4|id89|AX|N
3|id21|PP|A
9|id90|AW|P
0|id12|QE|T
Using bash join(1) : Here the files needs to be sorted on the field of join.
Using bash process substitution technique to avoid creation of temporary files.
$ join -t"|" -j1 1 -j2 2 <(sort file1) <(sort -t"|" -k2 file2)
id12|0|QE|T
id21|3|PP|A
id89|4|AX|N
id90|9|AW|P
Related post:
- Delete lines based on another file using awk
-Update file based on another file using awk
-Perform join using awk
-Update a file based on another file using sed
3 comments:
I keep forgetting about _join_. Thanks for the reminders :)
Did you do some performance measurements with the various approaches?
searching in awk : compare position (2nd col ) from file 1 to 2nd and third coloumn in file two . if value of 2nd coloumn of file one is in between value of 2nd and 3rd column of file2 print true and if not then false
file 1
id pos
p1 35
p1 78
p1 108
file 2
id start end
p1 1 100
p1 150 306
output
p1 35 true
p1 78 true
p1 108 false
@priyanka,
Please correct me if I misunderstood your query:
Something like ?
$ cat file1.txt
id pos
p1 35
p1 78
p1 108
$ cat file2.txt
id start end
p1 1 10
p1 1 200
p1 150 306
$ paste file1.txt file2.txt
id pos id start end
p1 35 p1 1 10
p1 78 p1 1 200
p1 108 p1 150 306
$ paste file1.txt file2.txt | awk 'NR > 1 {if($2>=$4 && $2 <=$NF ) {print $1,$2,"yes"} else {print $1,$2,"no"}}'
Output:
p1 35 no
p1 78 yes
p1 108 no
Post a Comment