## Sunday, September 28, 2008

### Sum of and group by using awk

$ cat cont_details.txt
continent|mval|cval|kval
SA|2345|10|2.3
AF|123|12|4.5
SA|89|12.67
OC|890|10|2.3
EU|24|45|2.4
AF|90|10|10
NA|5678|12|89
AF|345|12|3.5
OC|90|78|5.6
OC|23|12|4.5
SA|1234|12|6.7
EU|90|12|10
AF|12|12|34
SA|909|12|56

Output required, the equivalent of this SQL:

select continent, sum(mval), sum(cval), sum(kval)
from cont_details
group by continent;

i.e. required output:

continent|mval|cval|kval
NA|5678|12|89
OC|1003|100|12.4
AF|570|46|52
SA|4577|46.67|77.67
EU|114|57|12.4

Awk solution:

$ awk -F"|" '
NR==1 {print}
NR!=1 {OFS="|"; a[$1]+=$2; b[$1]+=$3; c[$1]+=$NF}
END {for (i in a) {print i, a[i], b[i], c[i]}}
' cont_details.txt

Output:

continent|mval|cval|kval
NA|5678|12|89
OC|1003|100|12.4
AF|570|46|52
SA|4577|46.67|77.67
EU|114|57|12.4
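A minor variation on the script above: OFS can be set once in a BEGIN block instead of on every data line, with the same output (a sketch, not the original post's version):

```shell
# Group-by with sums, OFS set once up front.
awk -F'|' 'BEGIN{OFS="|"}
NR==1{print; next}                          # pass the header through
{a[$1]+=$2; b[$1]+=$3; c[$1]+=$NF}          # accumulate per continent
END{for (i in a) print i, a[i], b[i], c[i]}' cont_details.txt
```

Note that `for (i in a)` visits keys in an unspecified order, so the grouped rows may come out in any order.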

Related post:
Implement group by clause functionality using awk

Ruturaj Vartak said...

That is great! Thanks a lot.

Prashant Joshi said...

Nice one

yoosfg said...

Excellent... much needed help indeed. Thanks!

awkseeker said...

I used two of your tricks and they work perfectly for me. I just wonder how your script works: you refer to $1 as the array index and then add $2 to it, which is a big question in my head. Could you please explain it more clearly for me? Thank you very much for your contribution here. Hope to hear from you, my professor.

Unknown said...

@awkseeker,
I am putting an example below that will help you understand awk associative arrays. Please let me know if you still have any doubts on this.

$ cat file1.txt
Alex,147,1
Brian,056,7
Nanda,559,2
Lisa,221,0
Nanda,268,2
Alex,200,5
Alex,408,8

i)
#a[$1] registers the first field of the lines seen in the associative array "a" (arrays are always associative in awk)

$ awk -F "," '{a[$1]} END {for (i in a) print i}' file1.txt

Nanda
Lisa
Alex
Brian

ii)
#Count of first field, sum of 2nd field, sum of 3rd field, grouped by the first field

$ awk -F "," '{a[$1]++; b[$1]+=$2; c[$1]+=$3 } END {for (i in a) print i,a[i],b[i],c[i]}' file1.txt

Nanda 2 827 4
Lisa 1 221 0
Alex 3 755 14
Brian 1 56 7

iii)
#Sum of 2nd field, sum of 3rd field, grouped by the first field

$ awk -F "," '{a[$1]+=$2; b[$1]+=$3 } END {for (i in a) print i,a[i],b[i]}' file1.txt

Nanda 827 4
Lisa 221 0
Alex 755 14
Brian 56 7

iv)
#Sum of 2nd field, sum of 3rd field, grouped by the first field; same as iii), just iterating over array b in the END block

$ awk -F "," '{a[$1]+=$2; b[$1]+=$3 } END {for (i in b) print i,a[i],b[i]}' file1.txt

Nanda 827 4
Lisa 221 0
Alex 755 14
Brian 56 7

Hope this helps.
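One small addition to the examples above (not from the original comment): `for (i in a)` visits keys in an unspecified order, so pipe through sort when you need deterministic output.

```shell
# Sum of the 2nd field grouped by the 1st, sorted by group key.
awk -F ',' '{a[$1]+=$2} END {for (i in a) print i, a[i]}' file1.txt | sort
# Alex 755
# Brian 56
# Lisa 221
# Nanda 827
```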

awkseeker said...

Thanks Juda, your blog is so valuable and saves huge time in my work. I used your trick on tabular text files and it is much more efficient compared to traditional SQL.

I am new to awk, with some basics of Turbo C from earlier; your good explanation makes things much clearer for me, and I will bother you again if I have some doubts.

Best regards,

Unknown said...

@Awkseeker: thanks so much for your comment. It's really inspiring. Feel free to ask any doubt on Unix one-liners, awk, or sed and I will definitely try to help.

cef_luxan said...

Nice tutorial.
How about grouping by more than one column, i.e.:
Data:
a, b, c, d
Required:
select a, b, sum(c), sum(d) group by a, b;

Thanks

Unknown said...

@cef_luxan, something like this?

$ cat file.txt
US|A|23
IN|A|12
US|B|12
LK|C|20
US|B|12

$ awk -F"|" '
{OFS="|"; a[$1]+=$NF}
END {for (i in a) {print i, a[i]}}
' file.txt

LK|20
IN|12
US|47

$ awk -F"|" '
{OFS="|"; a[$1"|"$2]+=$NF}
END {for (i in a) {print i, a[i]}}
' file.txt

IN|A|12
LK|C|20
US|A|23
US|B|24
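As a side note (not from the original comment), awk also has native multi-dimensional subscripts: a[$1,$2] joins the keys with the built-in SUBSEP character, and split() recovers them. A sketch equivalent to the $1"|"$2 trick above:

```shell
# Two-column group-by using awk's SUBSEP-joined subscripts.
awk -F'|' '{a[$1,$2]+=$NF}
END{for (k in a){split(k, p, SUBSEP)      # recover the two key parts
                 print p[1] "|" p[2] "|" a[k]}}' file.txt
# output order is unspecified
```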

awkseeker said...

Hi Juda, love to bother you again. I tried some SOAP calls and got the result as a big XML. I want to use awk to extract only the needed parameters. Could you help me?

Unknown said...

@awkseeker, sure, could you please put your input and expected output. Thanks.

Anonymous said...

This post was a starting point for me... I am stumped at something that I am not sure how to start on... I tried some of your scripts but honestly lost my way... please help
COL1 COl2 COL3
12222 AUH FLUEH
12222 SSC OPERA
12222 SSC OPERA
(SSC shows 2 occurrences and AUH 1 occurrence)
12223 SSC OPERA
12223 AUH FLUEH
12223 SSC OPERA
12223 SSC OPERA
(SSC shows 3 occurrences and AUH 1 occurrence)

RESULT -
12222 SSC-2 AUH-1 OPERA-2 FLUEH-1
12223 SSC-3 AUH-1 OPERA-3 FLUEH-1

How should I do this?
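One possible approach for the question above (a sketch, not from the original thread): count occurrences of each COL2 and COL3 value per COL1 key in one SUBSEP-indexed array, then emit the "value-count" pairs per key. It assumes the input file has only the header and data rows, and that COL2 and COL3 values never collide.

```shell
# Per-key occurrence counts of the 2nd and 3rd column values.
awk 'NR>1 {c[$1, $2]++; c[$1, $3]++; keys[$1]}
END {for (k in keys) {
       line = k
       for (ck in c) {
         split(ck, p, SUBSEP)               # p[1]=key, p[2]=value
         if (p[1] == k) line = line " " p[2] "-" c[ck]
       }
       print line
     }}' file.txt
```

The value-count pairs within a line come out in an unspecified order.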

Juan C. Rivera said...

I have an input like this

Nanda 827 4
Lisa 221 0
Alex 755 14
Brian 56 7

I need to sum the last column only, my output would be:
25

Unknown said...

@Juan, you can do something like this:

$ cat file.txt
Nanda 827 4
Lisa 221 0
Alex 755 14
Brian 56 7

$ awk '{sum+=$NF} END {print sum}' file.txt
25

Which is the same as:
$ awk '{sum+=$3} END {print sum}' file.txt
25

You can check this post for other alternatives to sum numbers in a file in UNIX bash shell.

http://unstableme.blogspot.in/2009/11/sum-of-numbers-in-file-unix.html

Hope this helps.

Anonymous said...

Great explanation above of how to do group-bys in awk! However, suppose you need to sum up 1000s of columns. Is there a better solution than the one above?

Terry

Anonymous said...

Sorry, I meant: what happens if you need to sum 1000s of columns but group by only a few columns?

Terry

Anonymous said...

Ok I came up with the following solution for cases where you have 1000's of columns and want to do a Group By:

gawk -F"|" 'NR==1 {print}
NR!=1 {OFS="|";
    for (i=1; i<=43; i++)
        a[i,$1]+=$(i+1)
}
END {
    for (i in a)
        b[substr(i,index(i,"\034")+1)]="";
    for (i in b) {
        output="";
        for (j=1; j<=43; j++)
            output = output "," a[j,i]
        print i,output
    }
}' cont_details.txt

Feel free to post it on your blog! :)

Terry

Anonymous said...

One mistake in the code I sent you earlier: all occurrences of '43' should be replaced by the number of columns you have.

Terry
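A variation on Terry's idea (a sketch, not from the original thread) that derives the column count from the header itself, so nothing is hardcoded: with the group-by key in column 1, there are NF-1 columns to sum.

```shell
# Sum every column after the key, grouped by column 1;
# the column count n is taken from the header line.
awk -F'|' 'NR==1{print; n=NF-1; next}
{for (i=1; i<=n; i++) a[i, $1]+=$(i+1); keys[$1]}
END{for (k in keys){
      line = k
      for (i=1; i<=n; i++) line = line "|" a[i, k]
      print line
    }}' cont_details.txt
```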

awkseeker said...

Hi Juda, I tried to comment earlier but it does not seem to have gone through, so I will bother you again.
infile:
info1=1,2
info2=a
info3=a,s
info1=1
info2=a,d
info3=f
info1=1
info2=a
info3=s

output:
1,2|a|a,s
1|a,d|f
1|a|s

I tried to apply an array, but have no idea how to start. Please advise.

Unknown said...

@awkseeker something like this ?

$ awk -F "=" '{print $NF}' infile.txt | awk 'NR%3{printf $0"|";next;}1'

1,2|a|a,s
1|a,d|f
1|a|s

awkseeker said...

How cool this simple code is!
Let me bother you further about the code. Please explain 'condition {dosomething;next;}1':
why there are no if condition needed?

Sincerely

Unknown said...

@awkseeker, Thanks.

Actually it's acting like an if condition: in awk, the pattern before an action block is itself the condition, and the trailing 1 is an always-true pattern whose default action is to print the line.

Something like, following two are same:

$ awk 'NR==3' infile.txt

$ awk '{if(NR==3) {print $0}}' infile.txt

I have made your query a blog post today, you can have a look.

http://www.unixcl.com/2013/07/unix-merge-multiple-consecutive-lines.html

Also one post on awk if else that you might like:

http://www.unixcl.com/2009/09/if-else-examples-in-awk-bash.html

Thanks!

RajIyengar said...

Nice work.

Suppose I have data in 2 rows and want to show it in one row :)

Eg. Raw Data

date value flg asset
20131001 1234 B AAA
20131001 1423 S AAA
20131001 1324 B AAB
20131001 4321 S AAB

Expected output:

date b-value s-value asset
20131001 1234 1423 AAA
20131001 1324 4321 AAB

How would I do that?

Thanks.
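One possible awk approach for the pivot above (a sketch, not from the original thread): collect B and S values in two arrays keyed on date and asset, then print one line per pair. It assumes exactly one B row and one S row per date/asset pair, and a hypothetical input file rawdata.txt.

```shell
# Pivot B/S rows onto one line per (date, asset) pair.
awk 'NR==1 {print "date b-value s-value asset"; next}
$3=="B" {b[$1, $4] = $2}
$3=="S" {s[$1, $4] = $2}
END {for (k in b) {split(k, p, SUBSEP)    # p[1]=date, p[2]=asset
                   print p[1], b[k], s[k], p[2]}}' rawdata.txt
```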

Unknown said...

A Perl alternative for the same group-by:

< cont_details.txt \
perl -wMstrict -Mvars='@v,$i' -F'\|' -lane '
print,next if $. == 1;
$v[$_]->{$F[0]} += $F[$_+1] for 0..2}{$,="|";
$i=$_,print $i, map { $_->{$i} } @v for sort keys %{$v[0]}
'