Wednesday, May 28, 2008

Associative arrays in awk


Associative arrays are one of awk's most powerful features. I have already written some posts that use them: awk report generation, counting the number of occurrences of a pattern in a particular field, etc.
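
If awk arrays are new to you: they are indexed by arbitrary strings, and an element simply springs into existence the first time you refer to it. A tiny warm-up (the fruit words here are just made-up sample input, and note that the order of a for..in loop is not guaranteed):

$ echo "apple orange apple" | awk '{for(i=1;i<=NF;i++) seen[$i]++} END{for(f in seen) print f,seen[f]}'
apple 2
orange 1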

Here are some more examples for you. Let's start with a simple but useful one:

sal.txt contains the working details of employees for a particular week (Name Day Post EffValue):


$ cat sal.txt
Jex Mon clerk 12001
Aji Tue sales 13003
Jex Wed clerk 13123
Salna Thu sales 34000
Aji Mon sales 13123


Now suppose you want to count how many times each employee worked in that week (basically, counting the number of occurrences of each value in the first field of sal.txt):


$ awk '{count[$1]++}END{for(j in count) print j,count[j]}' sal.txt
Aji 2
Jex 2
Salna 1
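
The for..in loop does not return the names in any particular order, so if you want the busiest employee first, one option is simply to pipe the same one-liner through sort (numeric, reverse on the 2nd column, with the name as a tie-breaker):

$ awk '{count[$1]++}END{for(j in count) print j,count[j]}' sal.txt | sort -k2,2nr -k1,1
Aji 2
Jex 2
Salna 1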


And if you want to add up the EffValue (4th field) for each employee separately over that week:


$ awk '{arr[$1]+=$4} END {for (i in arr) {print i,arr[i]}}' sal.txt
Aji 26126
Jex 25124
Salna 34000
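
If you want the totals lined up as a neat report, printf with field widths works nicely here (the %-8s and %6d widths below are just an arbitrary choice for this data):

$ awk '{arr[$1]+=$4} END {for (i in arr) printf "%-8s %6d\n", i, arr[i]}' sal.txt
Aji       26126
Jex       25124
Salna     34000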


Combining the above two together (Name NumDays TotalEff):


$ awk '{a[$1]++;b[$1]=b[$1]+$NF}END{for (i in a) print i,a[i],b[i]}' sal.txt
Aji 2 26126
Jex 2 25124
Salna 1 34000
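
And if you want a header row on top of that combined report, a BEGIN block is a handy place for it:

$ awk 'BEGIN{print "Name NumDays TotalEff"} {a[$1]++;b[$1]+=$NF}END{for (i in a) print i,a[i],b[i]}' sal.txt
Name NumDays TotalEff
Aji 2 26126
Jex 2 25124
Salna 1 34000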


Was that good? Here are some more.

status_nw1.txt contains information about some hosts (Time Status IP ICost)


$ cat status_nw1.txt
09:21:10 UP 172.21.67.60 20
09:21:11 UP 172.21.67.62 30
09:21:12 NA 172.21.67.60 10
09:21:12 DN 172.21.67.60 40
09:21:13 UP 172.21.67.66 45
09:21:14 DN 172.21.67.62 90
09:21:15 NA 172.21.67.66 35
09:21:16 NA 172.21.67.66 15
09:21:16 DN 172.21.67.66 25



Problem1:
Find the number of occurrences of each unique IP (3rd field) while adding up its ICost (last field) values, as we did in the previous example:


$ awk '{a[$3]++;b[$3]=b[$3]+$NF}END{for (i in a) print i,a[i],b[i]}' status_nw1.txt
172.21.67.60 3 70
172.21.67.62 2 120
172.21.67.66 4 120
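
Since we already have both the count and the sum per IP, the average ICost per IP is only a small step further (just divide the two in the END block):

$ awk '{a[$3]++;b[$3]+=$NF}END{for (i in a) printf "%s %.1f\n", i, b[i]/a[i]}' status_nw1.txt
172.21.67.60 23.3
172.21.67.62 60.0
172.21.67.66 30.0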


Problem2:
Count the number of occurrences of each status (2nd field) for each IP (3rd field):


$ awk '{count["Number of "$2" for "$3]++}END{for(j in count) print j":",count[j]}' status_nw1.txt
Number of UP for 172.21.67.66: 1
Number of NA for 172.21.67.60: 1
Number of DN for 172.21.67.60: 1
Number of NA for 172.21.67.66: 2
Number of DN for 172.21.67.62: 1
Number of UP for 172.21.67.60: 1
Number of UP for 172.21.67.62: 1
Number of DN for 172.21.67.66: 1
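
awk also allows a comma inside the subscript (the pieces get joined internally with the built-in SUBSEP separator), which is convenient if you want the IP and the status back as separate columns later. The same counts in that style (the output order may differ on your system):

$ awk '{count[$3,$2]++}END{for (k in count) {split(k,p,SUBSEP); print p[1],p[2],count[k]}}' status_nw1.txt
172.21.67.60 DN 1
172.21.67.60 NA 1
172.21.67.60 UP 1
172.21.67.62 DN 1
172.21.67.62 UP 1
172.21.67.66 DN 1
172.21.67.66 NA 2
172.21.67.66 UP 1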


Problem2A:
The same as above, but only for status=NA:


$ awk '$2=="NA" {count["Number of "$2" for "$3]++}END{for(j in count) print j":",count[j]}' status_nw1.txt
Number of NA for 172.21.67.60: 1
Number of NA for 172.21.67.66: 2


Problem3:
As you can see, all the times (1st field) in status_nw1.txt fall within the hour:minute 09:21. If you need the number of status=NA entries for the time "09:21" (ignoring the seconds in 09:21:XX), here is a way:


$ awk '$2=="NA" {count["Number of "$2" for "substr($1,1,5)]++}END{for(j in count) print j"(time):",count[j]}' status_nw1.txt
Number of NA for 09:21(time): 3
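
The same thing can be done with split instead of substr, which would also work if the time field were not fixed-width:

$ awk '$2=="NA" {split($1,t,":"); count["Number of NA for "t[1]":"t[2]]++}END{for(j in count) print j"(time):",count[j]}' status_nw1.txt
Number of NA for 09:21(time): 3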


I got a solution on unix.com which solves Problem2 and Problem3 together in a single script. Note that it sets FS to match either a space or a colon, so a time like 09:21:10 splits into $1=09, $2=21 and $3=10, while the status and IP become $4 and $5. Here it is:


$ awk '{
  a[$4 " in " $5]++
  b[$4" for "$1":"$2]++
}
END{
  for ( i in a )
    print "No of "i":" a[i]
  print "---------------------------"
  for ( i in b )
    print "No of "i"(time):" b[i]
}' FS="( )|(:)" status_nw1.txt
No of UP in 172.21.67.66:1
No of DN in 172.21.67.60:1
No of DN in 172.21.67.62:1
No of NA in 172.21.67.60:1
No of DN in 172.21.67.66:1
No of UP in 172.21.67.60:1
No of NA in 172.21.67.66:2
No of UP in 172.21.67.62:1
---------------------------
No of UP for 09:21(time):3
No of DN for 09:21(time):3
No of NA for 09:21(time):3
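
A final note on ordering: the output of a plain for (i in array) loop comes out in no particular order, as you can see above. If you are on GNU awk (gawk), the asorti() function can sort the array indices first; a sketch of Problem2 with sorted output, assuming gawk is available:

$ awk '{a["No of "$2" for "$3]++}END{n=asorti(a,idx); for(k=1;k<=n;k++) print idx[k]": "a[idx[k]]}' status_nw1.txt
No of DN for 172.21.67.60: 1
No of DN for 172.21.67.62: 1
No of DN for 172.21.67.66: 1
No of NA for 172.21.67.60: 1
No of NA for 172.21.67.66: 2
No of UP for 172.21.67.60: 1
No of UP for 172.21.67.62: 1
No of UP for 172.21.67.66: 1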


Related posts on the awk substr function used above:

- Printing last 2 characters using awk substr function
- About awk substr function


© Jadu Saikia www.UNIXCL.com