Input files:
$ cat file1.txt
timestamp,records,attempts
1264723200,96,6117
1264809600,856,7298
1264896000,904,8628
1264982400,0,0
1265155200,644,1786
$ cat file2.txt
timestamp,records,attempts
1264723200,96,6117
1264809600,859,7298
1264896000,904,8628
1264982400,0,0
1265155200,641,1786
Required:
If file1.txt and file2.txt differ in the 2nd field (the 'records' field) on any line, print "BAD"; otherwise print "GOOD". Both files are sorted on the first field.
Since the two files are already sorted on the first field (timestamp), we can use the join command to pair up their lines and then use awk to compare the two records fields of the joined output.
$ join -t, -1 1 -2 1 file1.txt file2.txt
Output:
timestamp,records,attempts,records,attempts
1264723200,96,6117,96,6117
1264809600,856,7298,859,7298
1264896000,904,8628,904,8628
1264982400,0,0,0,0
1265155200,644,1786,641,1786
Print the lines for which the 2nd and 4th fields of the joined output do not match:
$ join -t, -1 1 -2 1 file1.txt file2.txt | awk '$2 != $4' FS=\,
1264809600,856,7298,859,7298
1265155200,644,1786,641,1786
How many such lines?
$ join -t, -1 1 -2 1 file1.txt file2.txt | awk '$2 != $4 {++c} END {print c}' FS=\,
2
Or you can use 'wc -l' for the count:
$ join -t, -1 1 -2 1 file1.txt file2.txt | awk '$2 != $4' FS=\, | wc -l
2
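Since the two files here also have identical timestamps line for line, the same mismatch count can be obtained without join, by diffing just the first two fields of each file. A minimal sketch (a.txt and b.txt are scratch files introduced for illustration):

```shell
# Recreate the sample input files from the top of the post
printf '%s\n' timestamp,records,attempts 1264723200,96,6117 \
    1264809600,856,7298 1264896000,904,8628 \
    1264982400,0,0 1265155200,644,1786 > file1.txt
printf '%s\n' timestamp,records,attempts 1264723200,96,6117 \
    1264809600,859,7298 1264896000,904,8628 \
    1264982400,0,0 1265155200,641,1786 > file2.txt

# Keep only timestamp,records from each file, then diff;
# lines starting with '<' are file1.txt rows whose records value differs.
cut -d, -f1,2 file1.txt > a.txt
cut -d, -f1,2 file2.txt > b.txt
diff a.txt b.txt | grep -c '^<'    # prints: 2
```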
Now, based on the above count, you can conclude whether file1.txt and file2.txt differ in the 2nd field (records).
$ unmatched_count=$(join -t, -1 1 -2 1 file1.txt file2.txt | awk '$2 != $4 {++c} END {print c}' FS=\,)
$ [ -z "$unmatched_count" ] && echo "GOOD" || echo "BAD"
Output:
BAD
or
$ unmatched_count=$(join -t, -1 1 -2 1 file1.txt file2.txt | awk '$2 != $4' FS=\, | wc -l)
$ [ "$unmatched_count" -eq 0 ] && echo "GOOD" || echo "BAD"
Output:
BAD
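A join-free alternative: a single awk pass can remember each timestamp's records value from file1.txt, then compare while reading file2.txt. A sketch; unlike the join approach, it does not require the files to be sorted:

```shell
# Recreate the sample input files from the top of the post
printf '%s\n' timestamp,records,attempts 1264723200,96,6117 \
    1264809600,856,7298 1264896000,904,8628 \
    1264982400,0,0 1265155200,644,1786 > file1.txt
printf '%s\n' timestamp,records,attempts 1264723200,96,6117 \
    1264809600,859,7298 1264896000,904,8628 \
    1264982400,0,0 1265155200,641,1786 > file2.txt

# First pass (NR==FNR): store the records value per timestamp from file1.txt.
# Second pass: flag any timestamp whose records value differs in file2.txt.
awk -F, 'NR==FNR {a[$1] = $2; next}
         $1 in a && a[$1] != $2 {bad = 1}
         END {print (bad ? "BAD" : "GOOD")}' file1.txt file2.txt
# prints: BAD
```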
Related posts:
- Bash join command explained
- Perform outer join using awk
- Performing join using awk
- Search and print output using awk
7 comments:
If you don't need to store the count:
join -t, -1 1 -2 1 file1.txt file2.txt | awk '$2 != $4 { exit 1 }' FS=\, && echo GOOD || echo BAD
@Derek, thanks for the solution.
Hi Jadu Saikia,
I have a problem with Checkpoint logs. Can you help me?
Example:
logA.csv: num;date;time;dst;src;service...
logB.csv: num;date;time;src;service;dst...
logC.csv: num;date;time;src;service;dst...
File log.csv contains:
num;date;time;src;dst;service
001;01march2010;00:01:11;1.1.1.1;2.2.2.2;137
...
...
I want to extract some important fields from these csv files.
How can I do that?
Thanks for helping me!
Can you give me your email?
@KHUNGBO, thanks for your comment. It would be very helpful if you could post some example input along with your expected output (you mentioned: "I want to filter some fields which is important from csv files"). I would definitely help you with this. Thanks.
Hi Jadu Saikia,
log1.csv
num;date;time;action;src;dst;proto;service;s_port;d_port;more_information
000;01032010;00:00:01;accept;tcp;1.1.1.1;2.2.2.2;tcp;http;33232;80;more_information
001;01032010;00:00:02;accept;tcp;1.1.1.1;3.3.3.3;tcp;http;33233;80;more_information
002;01032010;00:00:04;accept;tcp;1.1.1.1;2.2.2.2;tcp;http;33232;80;more_information
...
log2.csv
num;date;time;action;proto;service;src;dst;s_port;d_port;more_information
000;01032010;01:00:01;accept;tcp;http;1.1.1.2;3.3.3.3;33232;80;more_information
001;01032010;01:00:02;accept;tcp;http;1.1.1.2;4.4.4.4;33233;80;more_information
002;01032010;01:00:04;accept;tcp;http;1.1.1.2;5.4.3.2;33232;80;more_information
log3.csv
num;date;time;action;proto;src;dst;service;s_port;d_port;more_information
001;01032010;02:00:01;accept;tcp;1.1.1.2;3.3.3.3;http;33232;80;more_information
002;01032010;02:00:02;accept;tcp;1.1.1.2;4.4.4.4;http;33233;80;more_information
003;01032010;02:00:04;accept;tcp;1.1.1.2;5.4.3.2;http;33232;80;more_information
Notice that the field order is mixed up; it is not stable across the files.
I want to create a csv file containing only these fields, filtered from those csv files:
num;date;time;action;src;dst;service
The other fields are not needed.
The num field should be a new field, numbered after sorting by date and time.
Thanks for helping me!
My email: khungbo33@gmail.com
@KHUNGBO,
Let me know if this helps:
$ awk -v colseq="num;date;time;action;src;dst;service" '
BEGIN {
FS=OFS=";"
colsN=split(colseq, colsA, FS)
}
FNR==1 {
for(i=1; i<=NF; i++)
headA[$i] = i
print colseq
next
}
{
for(i=1; i<=colsN; i++)
printf("%s%s", $headA[colsA[i]], (i==colsN) ? ORS : OFS)
}' log*.csv
Output:
num;date;time;action;src;dst;service
000;01032010;00:00:01;accept;tcp;1.1.1.1;tcp
001;01032010;00:00:02;accept;tcp;1.1.1.1;tcp
002;01032010;00:00:04;accept;tcp;1.1.1.1;tcp
num;date;time;action;src;dst;service
000;01032010;01:00:01;accept;1.1.1.2;3.3.3.3;http
001;01032010;01:00:02;accept;1.1.1.2;4.4.4.4;http
002;01032010;01:00:04;accept;1.1.1.2;5.4.3.2;http
num;date;time;action;src;dst;service
001;01032010;02:00:01;accept;1.1.1.2;3.3.3.3;http
002;01032010;02:00:02;accept;1.1.1.2;4.4.4.4;http
003;01032010;02:00:04;accept;1.1.1.2;5.4.3.2;http
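One gap against the original request: the script above keeps each file's own num values and repeats the header once per input file. To get a single output renumbered after sorting by date and time, the merged rows could be post-processed. A sketch (merged.csv is a hypothetical file holding the merged rows; note that a ddmmyyyy date field only sorts chronologically as a plain string within the same month and year, which holds for the sample data):

```shell
# A small stand-in for the merged output (headers repeated, rows unordered)
cat > merged.csv <<'EOF'
num;date;time;action;src;dst;service
000;01032010;01:00:01;accept;1.1.1.2;3.3.3.3;http
num;date;time;action;src;dst;service
001;01032010;00:00:02;accept;1.1.1.1;3.3.3.3;http
EOF

# Drop the repeated headers, sort on date then time,
# and renumber field 1 sequentially.
grep -v '^num;' merged.csv |
sort -t';' -k2,2 -k3,3 |
awk 'BEGIN {FS = OFS = ";"; print "num;date;time;action;src;dst;service"}
     {$1 = sprintf("%03d", NR); print}'
```

For the two sample rows this prints the header, then the 00:00:02 row as num 001 and the 01:00:01 row as num 002.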
You might also find these two posts useful:
http://unstableme.blogspot.com/2008/10/find-column-number-based-on-field.html
http://unstableme.blogspot.com/2008/03/print-field-based-on-header-awk.html
Keep in touch. Thanks. // Jadu
Thanks a lot, Jadu Saikia.
I'll try it!