Thursday, October 2, 2008

awk - update file based on another file


classvi.txt is the student info database file for class vi of "unstable international school".

$ cat classvi.txt
Student1
Name Anna K
ID VI45

Student2
Name Anil Pri
ID VI12

Student3
Name Xena Len
ID VI90

...
...

Recently HNU university introduced university serial ID for each and every student of "unstable international school". The mapping of the school ID number to University ID number is sent in a file classvi_UID.txt.

$ cat classvi_UID.txt
VI90 UID43434
VI13 UID21221
VI12 UID31322
VI45 UID26767
VI01 UID12323
...
...

Output required: We need to update classvi.txt with the corresponding University ID from classvi_UID.txt, so that the final output will be:

Student1
Name Anna K
ID VI45 (UID26767)

Student2
Name Anil Pri
ID VI12 (UID31322)

Student3
Name Xena Len
ID VI90 (UID43434)

...
...


The awk solution:

$ awk 'NR==FNR { UID[$1] = $2; next }
/^ID/ { print $0,"("UID[$2]")"; next }1' classvi_UID.txt classvi.txt

Related posts related to NR==FNR can be found here

6 comments:

Anant Khaitan said...

Hey gone through your posts and am highly impressed.. you use awk and others as if you are doing some SQL injection.. Added you in Blog Roll and you have acquired a new reader...

mintu said...

Hi,

I am looking for an awk script which should help me to meet the following requirement:

File1 has records in following format



Code:
INF: FAILEd RECORD AB1234
INF: FAILEd RECORD PQ1145
INF: FAILEd RECORD AB3215
INF: FAILEd RECORD AB6114
.........................
.........................
Each record(basically the ID 'AB1234', 'PQ1145' and so on) of File1, should be validated against the following File2, which has format like:


Code:
AB0001-AB1111
AB1112-AB2222
.............
.............
PQ1111-PQ2222
.............
XY1234-XY5678
.............
Note that File2 has different ranges for code AB, PQ, XY and so on, so validation won't be straight forward and hence code should be compared first and then the ID should check if it lies in any of the ranges above, associated to that particular code. If it does then the record should be filtered out from File1.

I'll really appreciate if anyone please help me out. Thanks

mintu said...

Hi,

I am looking for an awk script which should help me to meet the following requirement:

File1 has records in following format



Code:
INF: FAILEd RECORD AB1234
INF: FAILEd RECORD PQ1145
INF: FAILEd RECORD AB3215
INF: FAILEd RECORD AB6114
.........................
.........................
Each record(basically the ID 'AB1234', 'PQ1145' and so on) of File1, should be validated against the following File2, which has format like:


Code:
AB0001-AB1111
AB1112-AB2222
.............
.............
PQ1111-PQ2222
.............
XY1234-XY5678
.............
Note that File2 has different ranges for code AB, PQ, XY and so on, so validation won't be straight forward and hence code should be compared first and then the ID should check if it lies in any of the ranges above, associated to that particular code. If it does then the record should be filtered out from File1.

I'll really appreciate if anyone please help me out. Thanks

Unknown said...

@mintu,

As I understood, assuming:

$ cat file1.txt
INF: FAILEd RECORD AB1234
INF: FAILEd RECORD PQ1145
INF: FAILEd RECORD AB3215
INF: FAILEd RECORD AB6114

$ cat file2.txt
AB0001-AB1111
AB1112-AB2222
PQ1111-PQ2222
XY1234-XY5678

Output required:
INF: FAILEd RECORD AB3215
INF: FAILEd RECORD AB6114

Could you please confirm if my assumption is correct. Thanks.

Unknown said...

I tried the same qn in different way following your other examples and got the desired output:) Your materials are very good and easy to understand.

awk -F" " 'NR==FNR {a[$1]=$2;next} { $2 = ($2 in a) ? $2 "("a[$2]")" : $2; print }' file2 file1

Unknown said...

@Sadhun Ganesh, thanks a lot.

© Jadu Saikia www.UNIXCL.com