Input File: resVI.txt is a portion of the overall results of the class VI annual sports.
$ cat resVI.txt
AA:100m:Monday
DD:200m:Monday
AA:400m:Friday
AA:LOngJump:Tuesday
CC:HighJump:Wed
DD:1000m:Wed
BB:60kgarmrest:Mon
Now we have to calculate how many prizes each student (the first field) won, i.e.
Output Required:
AA (3 prizes)
BB (1 prizes)
CC (1 prizes)
DD (2 prizes)
We just have to count the occurrences of each distinct first field in resVI.txt, as each line in resVI.txt corresponds to one prize in a particular category of sports.
Awk code using array:
$ awk '{count[$1]++}END{for(j in count) print j,"("count[j]" prizes)"}' FS=: resVI.txt
Without the array, the same counts would have needed one command per student:
$ awk '$1 ~ /AA/ {++c} END {print c}' FS=: resVI.txt
3
$ awk '$1 ~ /BB/ {++c} END {print c}' FS=: resVI.txt
1
$ awk '$1 ~ /CC/ {++c} END {print c}' FS=: resVI.txt
1
$ awk '$1 ~ /DD/ {++c} END {print c}' FS=: resVI.txt
2
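One caveat about the array version: awk does not guarantee the order in which for (j in count) visits the keys, so the lines may not come out in the AA, BB, CC, DD order shown under "Output Required". A minimal sketch, assuming it is acceptable to pipe the result through sort (the variable names data and result are only for illustration):

```shell
# Same records as resVI.txt, held in a variable so the sketch is self-contained.
data='AA:100m:Monday
DD:200m:Monday
AA:400m:Friday
AA:LOngJump:Tuesday
CC:HighJump:Wed
DD:1000m:Wed
BB:60kgarmrest:Mon'

# Count per first field, then sort because for-in order is unspecified in awk.
result=$(printf '%s\n' "$data" |
  awk -F: '{ count[$1]++ } END { for (j in count) print j, "(" count[j] " prizes)" }' |
  sort)

echo "$result"
```

With the sample data this prints the four lines in alphabetical order of the student name.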
Similar Post of mine: Number of files modified in each month using awk
27 comments:
great example, have you got anything else with awk and counting instances ...
Thanks so much for this post.
I was working on a similar example and was worried about the counting. This was so useful to me.
AA 30
AA 30
AA 5
AA 33
BB 32
BC 30
BD 3
BC 38
AA 33
EE 34
BE 30
How do I compute the sum of all $2 values for each recurring $1?
@Arun, thanks for the question.
Are you looking for something like this ?
$ awk '{a[$1]++; b[$1]+=$2} END {for (i in a) print i,a[i],b[i]}' ar.txt
AA 5 131
BB 1 32
BC 2 68
BD 1 3
BE 1 30
EE 1 34
Please let me know if I have misunderstood your requirement.
I don't understand on what basis count[$1] and count[j] are related.
I always get hung up on stupid things. It's the story of my life.
Still any light you can cast would be gratefully received.
@Marmot,
$ cat file.txt
AA 30
AA 30
AA 5
AA 33
BB 32
BC 30
BD 3
BC 38
AA 33
EE 34
BE 30
$ awk '{arr[$1]++} END {for(i in arr) print i,arr[i]}' file.txt
AA 5
BB 1
BC 2
BD 1
BE 1
EE 1
Here arr[$1]++ keeps a running count of each distinct $1 value of file.txt in the associative array 'arr', using the value itself as the key. The for loop in the END block then walks over those keys and prints the count stored for each one. Please let me know if this helps.
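To make the link between the two blocks visible, here is a small trace, offered only as an illustrative sketch on a shortened three-line sample (not the full file.txt). The array written to in the main block is the same array read in END; $1 is the key while a line is being read, and the loop variable i later takes each stored key in turn.

```shell
# Shortened sample, for illustration only.
data='AA 30
BB 32
AA 5'

result=$(printf '%s\n' "$data" | awk '
  # Main block: runs once per input line, $1 is used as the array key.
  { arr[$1]++; printf "line %d: arr[%s] is now %d\n", NR, $1, arr[$1] }
  # END block: i takes the value of each key stored above (order unspecified).
  END { for (i in arr) printf "END: arr[%s] = %d\n", i, arr[i] }')

echo "$result"
```

The "line" messages show the counter for AA going from 1 to 2, and the END messages show that looking up arr[i] returns exactly those stored counters.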
What about sorting these results (let's say by the 2nd column)?
@Tambrea Cosmin, something like this ?
$ cat file.txt
AA 30
AA 30
AA 5
AA 33
BB 32
BC 30
BD 3
BC 38
AA 33
EE 34
BE 30
$ awk '{arr[$1]++} END {for(i in arr) print i,arr[i]}' file.txt | sort -n -k2
BB 1
BD 1
BE 1
EE 1
BC 2
AA 5
$ awk '{arr[$1]++} END {for(i in arr) print i,arr[i]}' file.txt | sort -nr -k2
AA 5
BC 2
EE 1
BE 1
BD 1
BB 1
Thanks for the examples!
What about computing the average number of occurrences?
For instance:
AA
AA
BB
BB
AA
CC
CC
I would like to get the value 2.33, as AA is repeated 3 times and BB and CC 2 times each.
Thanks in advance.
@Adri,
something like this ?
$ cat file.txt
AA
AA
BB
BB
AA
CC
CC
$ awk '{arr[$1]++} END {for(i in arr) print i,NR/arr[i]}' file.txt
AA 2.33333
BB 3.5
CC 3.5
Also if you want to calculate the percentage of each occurrence you can refer to http://unstableme.blogspot.com/2008/09/calculate-percentage-using-awk-in-bash.html
Hope this helps.
Thanks a lot for the question.
Thanks for your fast answer, Jadu!
Actually it is not exactly what I need.
I'm interested only in getting ONE value.
If I have the following file:
AA
AA
BB
BB
CC
CC
This value should be 2, as each value is repeated twice.
If I have the following file:
AA
AA
AA
BB
BB
CC
CC
The value should be 2.33, as AA is repeated 3 times, BB twice and CC twice.
I hope my explanation is clearer.
The point is that I can't use NR, because there are other lines in my file that are not relevant. If there are 7 lines with AA, BB or CC (as in the example above), the value "7" should be obtained by summing the individual counts and not from NR, because NR might be 10 (if I have 3 other lines in my file).
Thanks for your help!
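One possible approach to the question above, as a sketch only: count the relevant lines and the distinct values separately, then divide, so NR is never used. How to recognise a "relevant" line is not stated in the comment, so the pattern /^(AA|BB|CC)$/ and the three extra lines in the sample are assumptions made for illustration.

```shell
# 7 relevant lines (AA x3, BB x2, CC x2) plus 3 unrelated lines (hypothetical).
data='AA
header line
AA
AA
BB
some note
BB
CC
CC
footer line'

result=$(printf '%s\n' "$data" | awk '
  # Assumed filter: only lines whose first field is exactly AA, BB or CC.
  $1 ~ /^(AA|BB|CC)$/ {
    if (!($1 in arr)) keys++   # first time this value is seen
    arr[$1]++                  # per-value count
    total++                    # relevant lines only, independent of NR
  }
  END { if (keys) print total / keys }')

echo "$result"
```

Here total is 7 and keys is 3, so the printed value is 2.33333 even though NR is 10.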
How can we check how many instances of a C program are running at a particular time?
BalaJ
@Jadu Saikia, your awk group count & sum is useful to me. Thanks a lot.
I want to calculate the number of occurrences of each letter in each column.
A C T
A A T
B T A
A C B
B C C
C C B
T A T
C A A
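A possible sketch for the per-column question above: use the column number together with the letter as the array key, so each column gets its own set of counters. The output layout (column, letter, count) is an assumption, since the comment does not say what format is wanted.

```shell
# Data copied from the comment above.
data='A C T
A A T
B T A
A C B
B C C
C C B
T A T
C A A'

result=$(printf '%s\n' "$data" | awk '
  # Key is "<column number> <letter>", e.g. "2 C".
  { for (c = 1; c <= NF; c++) count[c " " $c]++ }
  END { for (k in count) print k, count[k] }' |
  sort -k1,1n -k2,2)   # sort by column number, then letter

echo "$result"
```

Each output line reads "column letter count"; for example "2 C 4" means the letter C appears four times in the second column.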
What if I want to print counts only above 2? or equal to 2?
@Sargar, you can do a comparison before printing, something like:
$ awk '{arr[$1]++} END {for(i in arr) {if (arr[i] > 2) print i,arr[i]}}' file.txt
AA 5
$ awk '{arr[$1]++} END {for(i in arr) {if (arr[i] >= 2) print i,arr[i]}}' file.txt
AA 5
BC 2
$ awk '{arr[$1]++} END {for(i in arr) {if (arr[i] == 2) print i,arr[i]}}' file.txt
BC 2
Hi, I have a file with 2 columns like this:
A B
A C
A F
B C
B D
C A
C B
D B
I would like to count how many times each letter appears in the left and in the right column. I would like something like:
A 3 1
B 2 3
C 2 2
D 1 1
F 0 1
@Valerio, something like this?
$ cat file.txt
A B
A C
A F
B C
B D
C A
C B
D B
$ awk 'BEGIN {OFS=","} {A[$1]++; B[$2]++} END {for(i in B) {print i,A[i],B[i]}}' file.txt
A,3,1
B,2,3
C,2,2
D,1,1
F,,1
Yes, thank you very much!!
Wait, there is just one little problem: I also want it to print a 0 when it can't find anything. Is that possible?
@Valerio,
This should work:
$ awk '{A[$1]++; B[$2]++} END {for(i in B) {print i,A[i]=="" ? 0:A[i], B[i]=="" ? 0:B[i]}}' file.txt
A 3 1
B 2 3
C 2 2
D 1 1
F 0 1
You can also refer to my awk if-else post: http://unstableme.blogspot.in/2009/09/if-else-examples-in-awk-bash.html
Hope this helps. Thanks.
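A small caveat on the command above: for (i in B) only visits letters that occur in the right column, so a letter found only on the left would be skipped. A minimal sketch of one way around that, assuming a third array (here called seen, a name chosen for illustration) that collects keys from both columns; adding 0 turns an unset count into 0. The last input row, "G A", is a hypothetical addition to the sample to show the left-only case.

```shell
# Valerio's data plus one hypothetical row ("G A") where G appears only on the left.
data='A B
A C
A F
B C
B D
C A
C B
D B
G A'

result=$(printf '%s\n' "$data" | awk '
  # seen[] gets a key for every letter from either column.
  { left[$1]++; right[$2]++; seen[$1]; seen[$2] }
  # An unset element is empty; "+ 0" makes it print as 0.
  END { for (k in seen) print k, left[k] + 0, right[k] + 0 }' |
  sort)

echo "$result"
```

With this input, F is reported as "F 0 1" and G as "G 1 0", so both one-sided cases show up with an explicit zero.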
How do I count the unique values in column 2 depending on the unique values of column 1 in a file like this :
$ cat file.txt
AA 30
AA 30
AA 5
AA 33
BB 32
BC 30
BD 3
BC 38
AA 33
EE 34
BE 30
where the output would be :
AA 3
BB 1
BC 2
BD 1
BE 1
EE 1
Thanks :) Great blog by the way.
@Frozen_toes Thanks for the question.
A not-so-elegant way, using two awk passes, would be:
$ awk '{arr[$1" "$2]++} END {for(i in arr) print i,arr[i]}' file.txt
EE 34 1
BC 30 1
AA 30 2
BB 32 1
BE 30 1
AA 33 2
AA 5 1
BC 38 1
BD 3 1
$ awk '{arr[$1" "$2]++} END {for(i in arr) print i,arr[i]}' file.txt | awk '{arr[$1]++} END {for(i in arr) print i,arr[i]}'
AA 3
BB 1
BC 2
BD 1
BE 1
EE 1
Hope this helps. Thanks.
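The same result can probably be had in a single awk pass. The sketch below assumes a helper array (called seen here, purely for illustration) that remembers each ($1, $2) pair, and it bumps the per-$1 counter only the first time a pair is encountered.

```shell
# Same data as file.txt above.
data='AA 30
AA 30
AA 5
AA 33
BB 32
BC 30
BD 3
BC 38
AA 33
EE 34
BE 30'

result=$(printf '%s\n' "$data" | awk '
  # seen[$1, $2]++ is 0 (false) only on the first occurrence of a pair.
  !seen[$1, $2]++ { distinct[$1]++ }
  END { for (k in distinct) print k, distinct[k] }' |
  sort)

echo "$result"
```

AA comes out as 3 because its values 30, 5 and 33 are distinct, even though 30 and 33 each occur twice.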
How would I go from this
AA 30
AA 30
BB 32
BB 30
CC 3
CC 33
to
AA 60
BB 62
CC 36
Thanks
@shabhonam, you can achieve this as follows:
$ cat file.txt
AA 30
AA 30
BB 32
BB 30
CC 3
CC 33
$ awk '{arr[$1]+=$2} END{for (i in arr) print i,arr[i]}' file.txt
AA 60
BB 62
CC 36
Thanks a lot.
Good morning. I have a file with 8 columns and I want the following output: group by the second column, sum the fifth column, and count the occurrences of the third column.
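The request above leaves some room for interpretation, so the following is only a sketch under stated assumptions: the columns are whitespace-separated, and "count the occurrences of the third column" is read as the number of distinct column-3 values within each column-2 group. The four sample rows are hypothetical, made up only to have 8 columns to work with.

```shell
# Hypothetical 8-column sample: column 2 is the group, column 3 a label, column 5 a number.
data='r1 G1 x 0 10 a b c
r2 G1 y 0 20 a b c
r3 G1 x 0 5 a b c
r4 G2 z 0 7 a b c'

result=$(printf '%s\n' "$data" | awk '
  {
    sum[$2] += $5                      # running total of column 5 per group
    if (!seen[$2, $3]++) kinds[$2]++   # distinct column-3 values per group
  }
  END { for (g in sum) print g, sum[g], kinds[g] }' |
  sort)

echo "$result"
```

Each output line is "group sum distinct-count". If a plain row count per group is wanted instead, replacing the second statement with an unconditional kinds[$2]++ would give that.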