Input File: resVI.txt is a portion of the overall results of the class VI annual sports.

$ cat resVI.txt

AA:100m:Monday

DD:200m:Monday

AA:400m:Friday

AA:LOngJump:Tuesday

CC:HighJump:Wed

DD:1000m:Wed

BB:60kgarmrest:Mon

Now we have to calculate how many prizes each student (the first field) won.

i.e.

Output Required:

AA (3 prizes)

BB (1 prizes)

CC (1 prizes)

DD (2 prizes)

We just have to count the occurrences of each first field in resVI.txt, as each line in resVI.txt corresponds to a prize in a particular category of sports.

Awk code using array:

$ awk '{count[$1]++}END{for(j in count) print j,"("count[j]" prizes)"}' FS=: resVI.txt
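The order in which `for (j in count)` visits the keys is unspecified in awk, so the lines may not come out alphabetically as in the required output. A minimal sketch, assuming the sample data above is saved as resVI.txt, that pipes the result through sort to get a predictable order:

```shell
# Recreate the sample input (the same seven lines as above).
cat > resVI.txt <<'EOF'
AA:100m:Monday
DD:200m:Monday
AA:400m:Friday
AA:LOngJump:Tuesday
CC:HighJump:Wed
DD:1000m:Wed
BB:60kgarmrest:Mon
EOF

# Count lines per first field, then sort so the order is stable.
result=$(awk -F: '{count[$1]++} END {for (j in count) print j, "(" count[j] " prizes)"}' resVI.txt | sort)
printf '%s\n' "$result"
```

Here `-F:` is just another way of writing `FS=:`; either form should work.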

Without an array, each student would have to be counted separately, like this:

$ awk '$1 ~ /AA/ {++c} END {print c}' FS=: resVI.txt

3

$ awk '$1 ~ /BB/ {++c} END {print c}' FS=: resVI.txt

1

$ awk '$1 ~ /CC/ {++c} END {print c}' FS=: resVI.txt

1

$ awk '$1 ~ /DD/ {++c} END {print c}' FS=: resVI.txt

2

Similar Post of mine: Number of files modified in each month using awk

## 27 comments:

great example, have you got anything else with awk and counting instances ...

contact@fir3net.com

Thanks so much for this post.

I was working on a similar example and was worried about counting. This was so useful to me.

AA 30

AA 30

AA 5

AA 33

BB 32

BC 30

BD 3

BC 38

AA 33

EE 34

BE 30

How do I compute the sum of all $2 values for each recurring $1?

@Arun, thanks for the question.

Are you looking for something like this ?

$ awk '{a[$1]++; b[$1]+=$2} END {for (i in a) print i, a[i], b[i]}' ar.txt

AA 5 131

BB 1 32

BC 2 68

BD 1 3

BE 1 30

EE 1 34

Please let me know if I have misunderstood your requirement.

I don't understand on what basis count[$1] and count[j] are related.

I always get hung up on stupid things. It's the story of my life.

Still any light you can cast would be gratefully received.

@Marmot,

$ cat file.txt

AA 30

AA 30

AA 5

AA 33

BB 32

BC 30

BD 3

BC 38

AA 33

EE 34

BE 30

$ awk '{arr[$1]++} END {for(i in arr) print i,arr[i]}' file.txt

AA 5

BB 1

BC 2

BD 1

BE 1

EE 1

Here arr[$1]++ records, in the associative array 'arr', the number of occurrences of each unique $1 value in file.txt. The for construct then retrieves and prints the count for each of those $1 values. Please let me know if this helps.
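To see how arr[$1] (count[$1] in the post) and arr[i] relate: while the file is being read, $1 supplies the key whose counter is incremented; in END, the loop variable i takes each stored key in turn. A small sketch with a shortened, made-up three-line input that prints every update:

```shell
# Three made-up input lines, fed on stdin instead of from file.txt.
result=$(printf 'AA 30\nBB 32\nAA 5\n' | awk '
{
    arr[$1]++                                   # $1 is the key, e.g. "AA"
    print "line " NR ": arr[" $1 "] is now " arr[$1]
}
END {
    # i walks over the keys stored above ("AA" and "BB"), in unspecified order
    for (i in arr) print "key " i " was seen " arr[i] " time(s)"
}')
printf '%s\n' "$result"
```

The name of the loop variable (i, j, or anything else) does not matter; it only has to be used consistently inside the loop.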

What about sorting these results (let's say by the 2nd column)?

@Tambrea Cosmin, something like this ?

$ cat file.txt

AA 30

AA 30

AA 5

AA 33

BB 32

BC 30

BD 3

BC 38

AA 33

EE 34

BE 30

$ awk '{arr[$1]++} END {for(i in arr) print i,arr[i]}' file.txt | sort -n -k2

BB 1

BD 1

BE 1

EE 1

BC 2

AA 5

$ awk '{arr[$1]++} END {for(i in arr) print i,arr[i]}' file.txt | sort -nr -k2

AA 5

BC 2

EE 1

BE 1

BD 1

BB 1

Thanks for the examples!

What about computing the average number of occurrences?

For instance:

AA

AA

BB

BB

AA

CC

CC

I would like to get the value 2.33, as AA is repeated 3 times and BB,CC 2 times.

Thanks in advance.

@Adri,

something like this ?

$ cat file.txt

AA

AA

BB

BB

AA

CC

CC

$ awk '{arr[$1]++} END {for(i in arr) print i,NR/arr[i]}' file.txt

AA 2.33333

BB 3.5

CC 3.5

Also if you want to calculate the percentage of each occurrence you can refer to http://unstableme.blogspot.com/2008/09/calculate-percentage-using-awk-in-bash.html

Hope this helps.

Thanks a lot for the question.

Thanks for your fast answer, Jadu!

Actually it is not exactly what I need.

I'm interested only in getting ONE value.

If I have the following file:

AA

AA

BB

BB

CC

CC

This value should be 2, as each variable is repeated twice

If I have the following file:

AA

AA

AA

BB

BB

CC

CC

The value should be 2.33, as AA is repeated 3 times, BB twice and CC twice.

I hope my explanation is clearer.

The point is that I can't use NR

because there are other lines in my file that are not relevant. If there are 7 lines with AA, BB or CC (as in my 1st example), the value "7" should be obtained as a sum of these counts and not from NR, because NR might be 10 (if I have 3 other lines in my file).

Thanks for your help!
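One possible way to get the single value described here, as a sketch only: count just the lines of interest, then in END divide the sum of the counts by the number of distinct keys, so NR is never used. The file name occ.txt, the two filler lines, and the pattern /^(AA|BB|CC)$/ are assumptions standing in for the real file and for whatever identifies the relevant lines.

```shell
# Hypothetical input: 7 relevant lines plus 2 irrelevant ones.
cat > occ.txt <<'EOF'
AA
AA
AA
header line
BB
BB
another irrelevant line
CC
CC
EOF

result=$(awk '
$1 ~ /^(AA|BB|CC)$/ { arr[$1]++ }      # count relevant lines only
END {
    for (i in arr) { total += arr[i]; keys++ }   # sum of counts, number of keys
    if (keys) printf "%.2f\n", total / keys      # 7 / 3 for this input
}' occ.txt)
printf '%s\n' "$result"
```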

How do we check how many instances of a C program are running at a particular time?

BalaJ

@Jadu Saikia, your awk group count & sum is useful to me. Thanks a lot.

I want to calculate the number of occurrences of each letter in each column.

A C T

A A T

B T A

A C B

B C C

C C B

T A T

C A A
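This question did not get a reply in the thread. One possible reading is a count per (column, letter) pair; a sketch under that assumption, with cols.txt as a hypothetical file name for the data above:

```shell
cat > cols.txt <<'EOF'
A C T
A A T
B T A
A C B
B C C
C C B
T A T
C A A
EOF

# Output format: column number, letter, count.
result=$(awk '
{ for (c = 1; c <= NF; c++) cnt[c " " $c]++ }    # key is "column letter"
END { for (k in cnt) print k, cnt[k] }' cols.txt | sort -k1,1n -k2,2)
printf '%s\n' "$result"
```

Letters that never occur in a column (B in column 2 here) simply produce no line.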

What if I want to print counts only above 2? or equal to 2?

@Sargar, you can do a comparison before printing, something like:

$ awk '{arr[$1]++} END {for(i in arr) {if (arr[i] > 2) print i,arr[i]}}' file.txt

AA 5

$ awk '{arr[$1]++} END {for(i in arr) {if (arr[i] >= 2) print i,arr[i]}}' file.txt

AA 5

BC 2

$ awk '{arr[$1]++} END {for(i in arr) {if (arr[i] == 2) print i,arr[i]}}' file.txt

BC 2

Hi, I have a file with 2 columns like this:

A B

A C

A F

B C

B D

C A

C B

D B

I would like to count how many times a letter appears in the left and in the right column. I would like something like:

A 3 1

B 2 3

C 2 2

D 1 1

F 0 1

@Valerio, something like this?

$ cat file.txt

A B

A C

A F

B C

B D

C A

C B

D B

$ awk 'BEGIN {OFS=","} {A[$1]++; B[$2]++} END {for(i in B) {print i,A[i],B[i]}}' file.txt

A,3,1

B,2,3

C,2,2

D,1,1

F,,1

Yes, thank you very much!!

Wait, there is just one little problem: I also want it to print a 0 when it can't find anything. Is that possible?

@Valerio,

This should work:

$ awk '{A[$1]++; B[$2]++} END {for(i in B) {print i, (A[i]=="" ? 0 : A[i]), (B[i]=="" ? 0 : B[i])}}' file.txt

A 3 1

B 2 3

C 2 2

D 1 1

F 0 1

You can also refer to my awk if-else post: http://unstableme.blogspot.in/2009/09/if-else-examples-in-awk-bash.html

Hope this helps. Thanks.
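Note that the loop above visits only letters that occur in the second column, so a letter appearing only in the first column would be skipped. A sketch of a variant that remembers every letter from either column and uses +0 to turn a missing count into 0. The file name pairs.txt and the last input line (G A) are made up here to show the first-column-only case:

```shell
cat > pairs.txt <<'EOF'
A B
A C
A F
B C
B D
C A
C B
D B
G A
EOF

result=$(awk '
{ L[$1]++; R[$2]++; seen[$1]; seen[$2] }     # seen[] holds letters from both columns
END { for (i in seen) print i, L[i] + 0, R[i] + 0 }' pairs.txt | sort)
printf '%s\n' "$result"
```

Merely referencing seen[$1] is enough to create the key in awk, which is why no value is assigned to it.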

How do I count the unique values in column 2 depending on the unique values of column 1 in a file like this :

$ cat file.txt

AA 30

AA 30

AA 5

AA 33

BB 32

BC 30

BD 3

BC 38

AA 33

EE 34

BE 30

where the output would be :

AA 3

BB 1

BC 2

BD 1

BE 1

EE 1

Thanks :) Great blog by the way.

@Frozen_toes Thanks for the question.

A not so good way would be:

$ awk '{arr[$1" "$2]++} END {for(i in arr) print i,arr[i]}' file.txt

EE 34 1

BC 30 1

AA 30 2

BB 32 1

BE 30 1

AA 33 2

AA 5 1

BC 38 1

BD 3 1

$ awk '{arr[$1" "$2]++} END {for(i in arr) print i,arr[i]}' file.txt | awk '{arr[$1]++} END {for(i in arr) print i,arr[i]}'

AA 3

BB 1

BC 2

BD 1

BE 1

EE 1

Hope this helps. Thanks.
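The same result can probably be had in a single awk pass by counting a (column 1, column 2) pair only the first time it is seen. A sketch using the same file.txt as above; the array names seen and uniq are arbitrary:

```shell
cat > file.txt <<'EOF'
AA 30
AA 30
AA 5
AA 33
BB 32
BC 30
BD 3
BC 38
AA 33
EE 34
BE 30
EOF

# !seen[$1, $2]++ is true only on the first occurrence of a pair.
result=$(awk '!seen[$1, $2]++ { uniq[$1]++ } END { for (i in uniq) print i, uniq[i] }' file.txt | sort)
printf '%s\n' "$result"
```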

How would I go from this

AA 30

AA 30

BB 32

BB 30

CC 3

CC 33

to

AA 60

BB 62

CC 36

Thanks

@shabhonam, you can achieve this as follows:

$ cat file.txt

AA 30

AA 30

BB 32

BB 30

CC 3

CC 33

$ awk '{arr[$1]+=$2} END {for (i in arr) print i,arr[i]}' file.txt

AA 60

BB 62

CC 36

Thanks a lot.

Good morning. I have a file with 8 columns and I want the following output: group by the second column, sum the fifth column, and count the occurrences of the third column.
