Friday, October 17, 2008

system in awk - external command execution


Definition:
system(cmd-line) : Execute the command cmd-line, and return the exit status.

system() in awk returns the exit status of the command, not its output. The command runs and writes its output directly to standard output; only its exit status (0 on success) is returned, and that is what gets stored if you assign the result of system() to a variable.
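A minimal sketch of that behaviour, using the standard true and false utilities as stand-in commands (they are not part of the original example). The variable names are illustrative.

```shell
# system() returns only the exit status; anything the command prints
# goes straight to standard output and is never seen by awk.
status_true=$(awk 'BEGIN { print system("true") }')
status_false=$(awk 'BEGIN { print system("false") }')
echo "true -> $status_true, false -> $status_false"
```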

Input file: My input file is pipe-delimited, with the unix epoch time as the first field and geo_continent as the 2nd field.

$ cat g_details.txt
1219071600|AF
1219158000|AF
1220799600|AS
1220886000|AS
1220972400|EU
1221058800|OC

Required: Convert the unix epoch times (1st field) to human readable date time format.

I already had a one-line script (epochcnvrt) that converts an epoch time to human readable date time format.

$ cat epochcnvrt
date --date '1970-01-01 UTC '$1' seconds'

So this is how I can run my script epochcnvrt on the first field of each line of the above file:

$ awk '{ system("sh epochcnvrt "$1)} {print $2}' FS="|" g_details.txt

The output:

Mon Aug 18 15:00:00 UTC 2008
AF
Tue Aug 19 15:00:00 UTC 2008
AF
Sun Sep 7 15:00:00 UTC 2008
AS
Mon Sep 8 15:00:00 UTC 2008
AS
Tue Sep 9 15:00:00 UTC 2008
EU
Wed Sep 10 15:00:00 UTC 2008
OC
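Because system() does not hand the command's output back to awk, the date and the continent land on separate lines above. One way to keep them on a single line is to read the command's output with `cmd | getline`. The following is only a sketch: it assumes GNU date (for `-d @N`), calls date directly instead of the epochcnvrt script, and the function name convert_first_field and the ISO-style output format are illustrative choices that are not part of the original post.

```shell
# Read date's output into an awk variable with "cmd | getline"
# instead of letting system() print it on a line of its own.
convert_first_field() {
  awk -F'|' '
    $1 !~ /^[0-9]+$/ { print; next }           # leave non-numeric lines untouched
    {
      cmd = "date -u -d @" $1 " +%Y-%m-%dT%H:%M:%SZ"
      if ((cmd | getline stamp) <= 0) stamp = $1   # fall back to the raw value
      close(cmd)                                # close the same command string
      print stamp "|" $2
    }'
}

converted=$(printf '1219071600|AF\n1221058800|OC\n' | convert_first_field)
echo "$converted"
```

The numeric check matters because the field is pasted into a shell command line; anything other than digits is passed through unchanged rather than executed.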

5 comments:

Michael Chen said...

it is an interesting example. I have an opposite problem, in my file I have date in human readable format, and I would like to convert it to seconds since epoch time, see the second field. adapting your script didn't work out. Any suggestions?
=========================
41538,2000-06-12 13:25:38.000,1.470000
=========================

Also, in epochcnvrt the usage of '$1' is confusing. Shouldn't we use "$1" to let the shell expand the parameter? But your usage works, so I am kind of lost. Any explanation? Thanks!

Jadu Kumar Saikia said...

Michael Chen,

Thanks for commenting.

For the opposite one (i.e. human readable to epoch seconds) you need to have your convert script like this:

$ cat myep
date +%s -d"$1"

For example:

$ date +%s -d"2000-06-12 13:25:38"
960796538

But it does not support the date format you have mentioned (as it has .000); probably you need to remove this extra piece.
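If your date does reject the fractional part, a small sketch of removing it inside awk before the field is handed to any command could look like this. The sample record is the one quoted in the comment above; the variable name stripped is illustrative.

```shell
# Drop a trailing ".digits" from the second comma-separated field.
# Assigning to a field makes awk rebuild the record using OFS.
stripped=$(printf '41538,2000-06-12 13:25:38.000,1.470000\n' |
  awk -F, -v OFS=, '{ sub(/\.[0-9]+$/, "", $2); print }')
echo "$stripped"
```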

Please let me know if you need any help on this.

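And the other question: your confusion is valid. In the script the single quotes are not nested. '1970-01-01 UTC ' is one single-quoted string, then $1 sits outside any quotes so the shell expands it, and then ' seconds' is a second single-quoted string; the shell joins the three pieces into one argument. If I had put "$1" inside the single quotes instead, the shell would not expand it.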

e.g.

$ VAR=1221058800

$ echo $VAR
1221058800

$ echo "$VAR"
1221058800

$ echo '$VAR'
$VAR

$ echo 'my var is '$VAR''
my var is 1221058800

$ echo 'my var is "$VAR"'
my var is "$VAR"

Please let me know if you have any doubts or queries; I would love to answer them. Thank you.

//Jadu

Michael Chen said...

Thanks. However, even if I delete the extra .000, the code doesn't work as expected. 2000-06-12 23:01:06 should be 960865266 seconds from the epoch, but the script gives me 960782400 seconds, which actually corresponds to 2000-06-12 00:00:00. It seems that in awk the usage
--------------------
system("command " $3)
--------------------
does not expand the way we expected if $3 itself contains a space. In this case $3="2000-06-12 23:01:06" in awk, but the command only gets the first part of it because of the space. The usage
--------------------
system("command " '$3')
--------------------
is simply wrong again, since awk is prevented from expanding $3.

My conclusion is that awk can only pass a field to the system command if the field doesn't contain spaces or other characters that are special to the shell. In shell programming we can use quotes to escape these special characters; however, by using quotes inside awk, such as "$3", we prevent awk from sending out the field.

By the way, it seems that a single quote within single quotes is better read as two consecutive pairs of single quotes. In your example
'my variable is '$1''
is equivalent to
'my variable is '$1

all the best,
Michael

Martin said...

@Michael Chen: I've managed to pass strings with spaces to the command line by escaping the quotes:
--------------------
system("command \"" $3 "\"")
--------------------
That is, the variable needs to be quoted in the shell, so you'll have to pass the quotes from awk, too.
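A minimal sketch of this quoting, assuming GNU date and using -u so the result does not depend on the local time zone. The sample record is the one from the first comment with the .000 removed; with the comma as field separator the date is field $2.

```shell
# The escaped double quotes reach the shell, so the date string
# arrives at the command as one argument despite the space.
line='41538,2000-06-12 13:25:38,1.470000'
epoch=$(printf '%s\n' "$line" |
  awk -F, '{ system("date -u +%s -d \"" $2 "\"") }')
echo "$epoch"
```

This only protects against spaces. A field that itself contains a double quote, a backquote or a $ would still be interpreted by the shell, so it is best kept to input you trust.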

newfather888 said...

Excellent, Jadu.
I have used your example to a certain extent but with getline and
my epochcvd converts to yyyymmddhh24miss format.

I have a csv and this works fine with one epoch field:

awk -F, '{ system("epochcvd "$1 | getline $1) close(epochcvd)} {print $1","$2"}'

Input of
1220886000,AF
1220799600,AF
1347260420,AS

returns this Output:
20080908160000,AF
20080907160000,AF
20120910080020,AS

But now I have a third field which is also an epoch timestamp

My input of
1347260420,AS,1347260430
1220886000,US,1220886020
1220799600,US,1220799620

turns out like this:
20120910080020,AS,20120910080030
20080908160000,US,1220886000
20080907160000,US,1220799600

i.e. only the first record gets all the fields converted

My new awk statement is
awk -F, '{ system("epochcvd "$1 | getline $1 ) }
{ system("epochcvd "$3 | getline $3 ) close(epochvd) }
{print $1","$2","$3"}'

Can you see anything wrong?
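One possible approach, sketched under assumptions: build each command string in a variable, read its output with getline into a plain variable, and pass that same string to close(). In the statement above, close() is given an unset awk variable (epochcvd, epochvd) rather than the command string, and getline sits inside system(), which may be part of the problem. GNU `date -u` stands in here for epochcvd, which is not shown in the comment, so the timestamps come out in UTC; conv is a hypothetical helper name.

```shell
# Convert the epoch values in fields 1 and 3, one command per value.
to_stamp() {
  awk -F, -v OFS=, '
    function conv(epoch,   cmd, out) {
      if (epoch !~ /^[0-9]+$/) return epoch    # only pass plain digits to the shell
      cmd = "date -u -d @" epoch " +%Y%m%d%H%M%S"
      if ((cmd | getline out) <= 0) out = epoch
      close(cmd)                               # close the exact string that was opened
      return out
    }
    { $1 = conv($1); $3 = conv($3); print }'
}

result=$(printf '1347260420,AS,1347260430\n1220886000,US,1220886020\n' | to_stamp)
echo "$result"
```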

© Jadu Saikia www.UNIXCL.com