Sunday, September 28, 2008
Sum and group by using awk
$ cat cont_details.txt
continent|mval|cval|kval
SA|2345|10|2.3
AF|123|12|4.5
SA|89|12.67
OC|890|10|2.3
EU|24|45|2.4
AF|90|10|10
NA|5678|12|89
AF|345|12|3.5
OC|90|78|5.6
OC|23|12|4.5
SA|1234|12|6.7
EU|90|12|10
AF|12|12|34
SA|909|12|56
Output required, equivalent to the SQL:
SELECT continent, SUM(mval), SUM(cval), SUM(kval) FROM cont_details GROUP BY continent;
i.e. required output:
continent|mval|cval|kval
NA|5678|12|89
OC|1003|100|12.4
AF|570|46|52
SA|4577|46.67|77.67
EU|114|57|12.4
Awk solution:
$ awk -F"|" '
NR==1 {print}                                        # pass the header through
NR!=1 {OFS="|";a[$1]+=$2;b[$1]+=$3;c[$1]+=$NF}       # accumulate per-continent sums
END{for (i in a){print i,a[i],b[i],c[i]}}            # emit one row per group
' cont_details.txt
Output:
continent|mval|cval|kval
NA|5678|12|89
OC|1003|100|12.4
AF|570|46|52
SA|4577|46.67|77.67
EU|114|57|12.4
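Note that awk's `for (i in a)` walks the array in an unspecified order, so the grouped rows may come out in any order. One portable way to get a stable listing is to print the header separately and sort only the data rows, as in this sketch (the heredoc just recreates the sample file):

```shell
# for (i in a) returns groups in an unspecified order; print the header
# separately and sort only the data rows for deterministic output
cat > cont_details.txt <<'EOF'
continent|mval|cval|kval
SA|2345|10|2.3
AF|123|12|4.5
SA|89|12.67
OC|890|10|2.3
EU|24|45|2.4
AF|90|10|10
NA|5678|12|89
AF|345|12|3.5
OC|90|78|5.6
OC|23|12|4.5
SA|1234|12|6.7
EU|90|12|10
AF|12|12|34
SA|909|12|56
EOF

head -n 1 cont_details.txt
awk -F"|" 'NR > 1 { a[$1] += $2; b[$1] += $3; c[$1] += $NF }
           END { OFS = "|"; for (i in a) print i, a[i], b[i], c[i] }' cont_details.txt |
    sort -t"|" -k1,1
```

GNU awk users can instead set `PROCINFO["sorted_in"]` inside the script, but the sort pipeline above works with any POSIX awk.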
Related post:
Implement group by clause functionality using awk
© Jadu Saikia www.UNIXCL.com
27 comments:
that is great! thanks a lot
Excellent.. Much needed help indeed..Thanks
I used two of your tricks and they worked perfectly for me. I just wonder how your script works: you refer to $1 and then add $2 to it, which is a big question in my head. Could you please explain it more clearly for me? Thank you very much for your contribution here. Hope to hear from you, my professor.
@awkseeker,
Thanks for reading this blog.
I am putting an example below that will help you to understand awk associative-array. Please let me know if you still have any doubt on this.
$ cat file1.txt
Alex,147,1
Brian,056,7
Nanda,559,2
Lisa,221,0
Nanda,268,2
Alex,200,5
Alex,408,8
i)
#a[$1] registers the first field of the lines seen in the associative-array "a" (arrays are always associative in Awk)
$ awk -F "," '{a[$1]} END {for (i in a) print i}' file1.txt
Nanda
Lisa
Alex
Brian
ii)
#Count of first field, sum of 2nd field, sum of 3rd field group by each first field
$ awk -F "," '{a[$1]++; b[$1]+=$2; c[$1]+=$3 } END {for (i in a) print i,a[i],b[i],c[i]}' file1.txt
Nanda 2 827 4
Lisa 1 221 0
Alex 3 755 14
Brian 1 56 7
iii)
#Sum of 2nd field, sum of 3rd field group by each first field
$ awk -F "," '{a[$1]+=$2; b[$1]+=$3 } END {for (i in a) print i,a[i],b[i]}' file1.txt
Nanda 827 4
Lisa 221 0
Alex 755 14
Brian 56 7
iv)
#Sum of 2nd field, sum of 3rd field group by each first field, same as iii) but iterating over array b instead of a
$ awk -F "," '{a[$1]+=$2; b[$1]+=$3 } END {for (i in b) print i,a[i],b[i]}' file1.txt
Nanda 827 4
Lisa 221 0
Alex 755 14
Brian 56 7
Hope this helps.
Thanks Jadu, your blog is so valuable and saves me huge time in my work. I used your trick on a tabular text file and it is much more efficient for me than traditional SQL.
I am new to awk and have only some basics of Turbo C from earlier, but your good explanation makes things much clearer, and I may bother you again if I have doubts.
Best regards,
@Awkseeker: thanks so much for your comment. It's really inspiring. Feel free to ask any doubts on Unix one-liners, awk, sed, and I will definitely try to help.
Nice tutorial.
How about grouping by more than one column, i.e.:
Data:
a, b, c, d
Required:
select a, b, sum(c), sum(d) group by a, b;
Thanks
@cef_luxan, something like this?
$ cat file.txt
US|A|23
IN|A|12
US|B|12
LK|C|20
US|B|12
$ awk -F"|" '
{OFS="|";a[$1]+=$NF}
END{for (i in a){print i,a[i]}}
' file.txt
LK|20
IN|12
US|47
$ awk -F"|" '
{OFS="|";a[$1"|"$2]+=$NF}
END{for (i in a){print i,a[i]}}
' file.txt
IN|A|12
LK|C|20
US|A|23
US|B|24
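An equivalent way to group on two columns is awk's built-in multi-dimensional subscript: a comma in the array index joins the parts with the control character SUBSEP, which `split()` can undo later. A sketch on the same sample (the heredoc recreates `file.txt`):

```shell
# group by two columns using awk's (i, j) subscripts instead of
# embedding "|" in the key; split() on SUBSEP recovers the parts
cat > file.txt <<'EOF'
US|A|23
IN|A|12
US|B|12
LK|C|20
US|B|12
EOF

awk -F"|" '
{ a[$1, $2] += $NF }
END {
    OFS = "|"
    for (k in a) {
        split(k, p, SUBSEP)    # recover the two key columns
        print p[1], p[2], a[k]
    }
}' file.txt
```

As elsewhere, the group order in the output is unspecified; pipe through `sort` if you need it stable.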
Hi Mr. Jadu,
Thank you for your excellent answer
Hi Jadu, love to bother you again. I tried some SOAP calls and got the result as a big XML. I want to use awk to extract only the needed parameters. Could you help me?
@awkseeker, sure, could you please put your input and expected output. Thanks.
This post is a start for me. I am stumped by something similar and am not sure how to begin; I tried some of your scripts but honestly lost my way. Please help:
COL1 COl2 COL3
12222 AUH FLUEH
12222 SSC OPERA
12222 SSC OPERA
(SSC shows 2 occurrences & AUH 1 occurrence)
12223 SSC OPERA
12223 AUH FLUEH
12223 SSC OPERA
12223 SSC OPERA
(SSC shows 3 occurrences & AUH 1 occurrence)
RESULT -
12222 SSC-2 AUH-1 OPERA-2 FLUEH-1
12223 SSC-3 AUH-1 OPERA-3 FLUEH-1
How should I do this?
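In the spirit of the post, one hedged sketch for this request: tally each COL2 and COL3 value per COL1 key in a single associative array, then assemble the pairs per key (`occ.txt` is a made-up copy of the sample; the order of the label-count pairs within a line is unspecified, since it follows awk's hash order):

```shell
# count occurrences of each COL2/COL3 value per COL1 key
cat > occ.txt <<'EOF'
COL1 COL2 COL3
12222 AUH FLUEH
12222 SSC OPERA
12222 SSC OPERA
12223 SSC OPERA
12223 AUH FLUEH
12223 SSC OPERA
12223 SSC OPERA
EOF

awk 'NR > 1 {
    count[$1 SUBSEP $2]++        # COL2 value per key
    count[$1 SUBSEP $3]++        # COL3 value per key
}
END {
    for (k in count) {
        split(k, p, SUBSEP)
        out[p[1]] = out[p[1]] " " p[2] "-" count[k]
    }
    for (key in out) print key out[key]
}' occ.txt
```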
HI Jadu,
I have an input like this
Nanda 827 4
Lisa 221 0
Alex 755 14
Brian 56 7
I need to sum the last column only, my output would be:
25
@Juan, you can do something like this:
$ cat file.txt
Nanda 827 4
Lisa 221 0
Alex 755 14
Brian 56 7
$ awk '{sum+=$NF} END {print sum}' file.txt
25
Which is the same as:
$ awk '{sum+=$3} END {print sum}' file.txt
25
You can check this post for other alternatives to sum numbers in a file in UNIX bash shell.
http://unstableme.blogspot.in/2009/11/sum-of-numbers-in-file-unix.html
Hope this helps.
Hi Jadu,
Great explanation above of how to do group-bys in awk! However, suppose you need to sum up thousands of columns? Is there a better solution than the one above?
Terry
Sorry, I meant: what happens if you need to sum thousands of columns but group by only a few?
Terry
Hi Jadu,
OK, I came up with the following solution for cases where you have thousands of columns and want to do a group by:
gawk -F"|" 'NR==1 {print}
NR!=1 {OFS="|"
       for (i=1; i<=43; i++)
           a[i,$1] += $(i+1)
}
END {for (i in a)
         b[substr(i, index(i, SUBSEP)+1)] = ""   # strip the column index, keep the group key
     for (i in b) {
         output = i
         for (j=1; j<=43; j++)
             output = output OFS a[j,i]
         print output
     }
}' cont_details.txt
Feel free to post it on your blog! :)
Terry
Hi Jadu,
One mistake in the code I sent you earlier: all occurrences of '43' should be replaced by the number of value columns you have.
Terry
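A variant of Terry's approach can read the value-column count from the header instead of hard-coding it. A sketch (`wide.txt` is a hypothetical sample; this assumes every row has the header's column count, which the post's short `SA|89|12.67` row would violate):

```shell
# sum every column after the key, taking the column count from the header
cat > wide.txt <<'EOF'
key|a|b|c
X|1|2|3
Y|4|5|6
X|7|8|9
EOF

awk -F"|" '
NR == 1 { print; ncol = NF; next }   # remember total column count
{
    seen[$1]
    for (i = 2; i <= NF; i++)
        sum[$1, i] += $i             # keyed by group and column position
}
END {
    OFS = "|"
    for (g in seen) {
        line = g
        for (i = 2; i <= ncol; i++)
            line = line OFS sum[g, i]
        print line
    }
}' wide.txt
```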
Hi Jadu, I tried to comment earlier but it seems it did not go through, so I will bother you again.
infile:
info1=1,2
info2=a
info3=a,s
info1=1
info2=a,d
info3=f
info1=1
info2=a
info3=s
output:
1,2|a|a,s
1|a,d|f
1|a|s
I tried to apply an array, but have no idea how to start. Please advise.
@awkseeker something like this ?
$ awk -F "=" '{print $NF}' infile.txt | awk 'NR%3{printf $0"|";next;}1'
1,2|a|a,s
1|a,d|f
1|a|s
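The same can be done in a single awk invocation by switching the output record separator per record, a sketch that, like the pipeline above, assumes the records always come in groups of three:

```shell
# end every third record with a newline, the others with "|"
cat > infile.txt <<'EOF'
info1=1,2
info2=a
info3=a,s
info1=1
info2=a,d
info3=f
info1=1
info2=a
info3=s
EOF

awk -F"=" '{ ORS = (NR % 3 ? "|" : "\n"); print $NF }' infile.txt
```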
How cool this simple code is!
I would like to bother you about the code further. Please explain 'condition {dosomething;next;}1': why is no if condition needed?
Sincerely
@awkseeker, Thanks.
Actually it's acting like an if condition.
For example, the following two are the same:
$ awk 'NR==3' infile.txt
$ awk '{if(NR==3) {print $0}}' infile.txt
I have made your query a blog post today, you can have a look.
http://www.unixcl.com/2013/07/unix-merge-multiple-consecutive-lines.html
Also one post on awk if else that you might like:
http://www.unixcl.com/2009/09/if-else-examples-in-awk-bash.html
Thanks!
Nice work.
Suppose I have data in 2 rows and want to show it in one row :)
Eg. Raw Data
date value flg asset
20131001 1234 B AAA
20131001 1423 S AAA
20131001 1324 B AAB
20131001 4321 S AAB
Expected output:
date b-value s-value asset
20131001 1234 1423 AAA
20131001 1324 4321 AAB
How would I do that?
Thanks.
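A hedged sketch for this pivot: remember each (date, asset) pair in input order and fill in the B and S values as they arrive (`trades.txt` is a made-up copy of the sample; this assumes exactly one B row and one S row per pair):

```shell
# pivot B/S rows into one row per (date, asset) pair
cat > trades.txt <<'EOF'
date value flg asset
20131001 1234 B AAA
20131001 1423 S AAA
20131001 1324 B AAB
20131001 4321 S AAB
EOF

awk '
NR == 1 { print "date", "b-value", "s-value", "asset"; next }
{
    key = $1 SUBSEP $4
    if ($3 == "B") bval[key] = $2
    if ($3 == "S") sval[key] = $2
    if (!(key in seen)) { seen[key]; order[++n] = key }  # keep input order
}
END {
    for (i = 1; i <= n; i++) {
        split(order[i], p, SUBSEP)
        print p[1], bval[order[i]], sval[order[i]], p[2]
    }
}' trades.txt
```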
< cont_details.txt \
perl -wMstrict -Mvars='@v,$i' -F'\|' -lane '
print,next if $. == 1;
$v[$_]->{$F[0]} += $F[$_+1] for 0..2}{$,="|";
$i=$_,print $i, map { $_->{$i} } @v for sort keys %{$v[0]}
'