Tuesday, October 30, 2012

Time bucketing using Awk in Unix


Each line of input file file.txt is having UNIX epoch timestamp and some value.
$ cat file.txt 
Epoch,Value
1351605000,120
1351605060,130
1351605120,340
1351605180,200
1351605240,120
1351605300,890
1351605360,124
1351605420,450
1351605480,120
1351605540,120
1351605600,200
1351605660,120
1351605720,340
1351605780,670
1351605840,990
The UNIX epoch timestamps are of 1 minute interval. Converting the UNIX epoch timestamps to human readable format using awk 'strftime' function:
$ awk 'BEGIN {FS=OFS=","} NR != 1 {$1=strftime("%c",$1)} {print}' file.txt 
Output:
Epoch,Value
Tue 30 Oct 2012 01:50:00 PM GMT,120
Tue 30 Oct 2012 01:51:00 PM GMT,130
Tue 30 Oct 2012 01:52:00 PM GMT,340
Tue 30 Oct 2012 01:53:00 PM GMT,200
Tue 30 Oct 2012 01:54:00 PM GMT,120
Tue 30 Oct 2012 01:55:00 PM GMT,890
Tue 30 Oct 2012 01:56:00 PM GMT,124
Tue 30 Oct 2012 01:57:00 PM GMT,450
Tue 30 Oct 2012 01:58:00 PM GMT,120
Tue 30 Oct 2012 01:59:00 PM GMT,120
Tue 30 Oct 2012 02:00:00 PM GMT,200
Tue 30 Oct 2012 02:01:00 PM GMT,120
Tue 30 Oct 2012 02:02:00 PM GMT,340
Tue 30 Oct 2012 02:03:00 PM GMT,670
Tue 30 Oct 2012 02:04:00 PM GMT,990
Required: Group each 5 minutes entries together to one time bucket and perform a corresponding sum(Value) of the entries. E.g. All entries from '01:50:00 PM' (1351605000) till '01:54:00 PM' (1351605240) should be grouped together to '01:50:00 PM' (1351605000) time bucket, '01:55:00 PM' (1351605300) till '01:59:00 PM' (1351605540) to '01:55:00 PM' (1351605300) bucket and so on.
$ awk 'BEGIN {FS=OFS=","}  NR != 1 {arr[$1-($1%300)]+=$2} END {for (i in arr) print i,arr[i]}' file.txt 
Output:
 
1351605000,910
1351605300,1704
1351605600,2320
Converting UNIX epoch timestamps to human readable format:
$ awk 'BEGIN {FS=OFS=","}  NR != 1 {arr[$1-($1%300)]+=$2} END {for (i in arr) print i,arr[i]}' file.txt  | awk '
BEGIN {FS=OFS=","} {$1=strftime("%c",$1)} {print}'
Output:
Tue 30 Oct 2012 01:50:00 PM GMT,910
Tue 30 Oct 2012 01:55:00 PM GMT,1704
Tue 30 Oct 2012 02:00:00 PM GMT,2320
Related posts:
- Grouping files using Awk in Bash shell
- Awk convert epoch to date format
- SUM and GROUP BY using Awk in UNIX post1 post2
- COUNT number of occurrences using Awk
- Awk Associative array

3 comments:

Derek Schrock said...

Keep in mind this is gawk not awk. You awk is either symlinked to gawk or is the gawk binary?

Derek Schrock said...

awk doesn't have strftime, this is gawk not awk. Either your awk in PATH is a symlink to gawk or the awk binary is really gawk. gawk is fine however it's best to let me people know this might not be a very portable solution since gawk isn't POSIX.

Unknown said...

@Derek Schrock, thanks.

© Jadu Saikia www.UNIXCL.com