Sunday, April 27, 2014

Unix xargs parallel execution of commands


xargs has an option that lets you take advantage of the multiple cores in your machine: -P, which makes xargs invoke the specified command multiple times in parallel. From the XARGS(1) man page:
-P max-procs
    Run up to max-procs processes at a time; the default is 1. If max-procs is 0, xargs will run as many processes as possible at a time. Use the -n option
    with -P; otherwise chances are that only one exec will be done.

-n max-args
    Use at most max-args arguments per command line. Fewer than max-args arguments will be used if the size (see the -s option) is exceeded, unless the -x
    option is given, in which case xargs will exit.

-i[replace-str]
    This option is a synonym for -Ireplace-str if replace-str is specified, and for -I{} otherwise. This option is deprecated; use -I instead.
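To see what -n and -I change before -P comes into play, here is a small sketch that feeds three words to echo. It assumes bash and an xargs that supports -n and -I (GNU and BSD xargs both do); the letters a, b, c are just placeholder input.

```shell
#!/bin/bash
# Default: xargs packs as many arguments as fit onto one command line,
# so echo runs once with all three words.
default_out=$(printf '%s\n' a b c | xargs echo)

# -n1: at most one argument per command line, so echo runs three times.
n1_out=$(printf '%s\n' a b c | xargs -n1 echo)

# -I{}: one input line per command, substituted wherever {} appears.
i_out=$(printf '%s\n' a b c | xargs -I{} echo "[{}]")

printf '%s\n---\n%s\n---\n%s\n' "$default_out" "$n1_out" "$i_out"
```

Because -n1 and -I{} both produce one command per input item, each of those commands can be handed to a separate process once -P is added.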
Here is an example where the parallel option of xargs helps. I have these 8 log files (each about 1.5 GB) and need to run a script named count_pipeline.sh on each of them; the script does some calculations over the lines of the log file.
$ ls -1 *.out
log1.out
log2.out
log3.out
log4.out
log5.out
log6.out
log7.out
log8.out
The script count_pipeline.sh takes about 20 seconds for a single log file:
$ time ./count_pipeline.sh log1.out

real 0m20.509s
user 0m20.967s
sys 0m0.467s
Running count_pipeline.sh on the 8 log files one after the other takes:
$ time ls *.out | xargs -i ./count_pipeline.sh {}           

real 2m45.862s
user 2m48.152s
sys 0m5.358s
Running 4 processes in parallel (my machine has 4 CPU cores):
$ time ls *.out | xargs -i -P4 ./count_pipeline.sh {} 

real 0m44.764s
user 2m55.020s
sys 0m6.224s
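We saved time: the wall-clock (real) time dropped from about 2m46s to about 45s, while the total CPU (user) time stayed roughly the same. Isn't this useful? You can also use the -n1 option instead of the -i option used above (the man page marks -i as deprecated). -n1 passes one argument at a time to the command, instead of the xargs default of putting as many arguments as fit onto each command line.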
$ time ls *.out | xargs -n1 -P4 ./count_pipeline.sh

real 0m43.229s
user 2m56.718s
sys 0m6.353s
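To try -P without the real log files, the sketch below uses `sleep 1` as a hypothetical stand-in for count_pipeline.sh and times 8 such jobs run 4 at a time. It assumes bash (for the `SECONDS` variable) and an xargs that supports -P, such as GNU or BSD xargs; the log file names are only passed as arguments and do not need to exist.

```shell
#!/bin/bash
start=$SECONDS

# Hypothetical stand-in for count_pipeline.sh: each job sleeps 1 second and
# reports its argument. "sh -c '...' _" makes the file name available as $1.
out=$(printf 'log%d.out\n' 1 2 3 4 5 6 7 8 |
    xargs -n1 -P4 sh -c 'sleep 1; echo "done $1"' _ | sort)

elapsed=$((SECONDS - start))
printf '%s\n' "$out"
# 8 one-second jobs, 4 at a time: expect roughly 2 seconds rather than 8.
echo "elapsed: ${elapsed}s"
```

On systems with GNU coreutils, `nproc` prints the number of available processing units, so `-P"$(nproc)"` is a common way to avoid hard-coding 4.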

Related posts:
- UNIX handle kill when no PID available 

Friday, July 26, 2013

Unix - merge multiple consecutive lines

Input file:
$ cat infile.txt 
aid=33
pw=3
nn=90
aid=32
pw=30
nn=70
aid=56
pw=3
nn=93
Required:
Combine or merge every three consecutive lines of the above file so that the output becomes:
aid=33,pw=3,nn=90
aid=32,pw=30,nn=70
aid=56,pw=3,nn=93
Awk solution: print each line followed by a newline (\n) if its line number is divisible by 3, otherwise followed by a comma (,):
$ awk '{printf("%s%s", $0, (NR%3 ? "," : "\n"))}' infile.txt 
aid=33,pw=3,nn=90
aid=32,pw=30,nn=70
aid=56,pw=3,nn=93
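One thing to watch with the NR%3 approach: if the number of lines is not a multiple of 3, the last group ends with a comma and no trailing newline. Below is a minimal sketch of a variant that takes the group size as a variable (n, an assumed name passed with -v) and closes a partial last group cleanly; the 7-line input is made up to show the leftover case.

```shell
#!/bin/bash
# Print a comma before every line except the first of each group of n,
# end each full group with a newline, and terminate a partial last group in END.
out=$(printf '%s\n' aid=33 pw=3 nn=90 aid=32 pw=30 nn=70 aid=56 |
    awk -v n=3 '
        { printf "%s%s", ((NR - 1) % n ? "," : ""), $0 }
        NR % n == 0 { print "" }
        END { if (NR % n) print "" }')
printf '%s\n' "$out"
```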
Another way using Awk:
$ awk 'NR%3{printf "%s,", $0; next} 1' infile.txt
aid=33,pw=3,nn=90
aid=32,pw=30,nn=70
aid=56,pw=3,nn=93
Using the UNIX paste command:
$ paste -d"," - - - < infile.txt 
aid=33,pw=3,nn=90
aid=32,pw=30,nn=70
aid=56,pw=3,nn=93
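paste reads one line from standard input for each `-` operand, so the number of dashes sets the group size. A small sketch, assuming bash and a hypothetical group size n=4, that builds the dash list instead of typing it; `seq 8` is just sample input.

```shell
#!/bin/bash
n=4   # hypothetical group size
# printf repeats its format once per argument; %.0s consumes an argument
# but prints nothing, so this yields "- - - - ".
dashes=$(printf -- '- %.0s' $(seq "$n"))
# $dashes is left unquoted on purpose so each "-" becomes its own argument.
result=$(seq 8 | paste -d, $dashes)
printf '%s\n' "$result"
```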
A pure bash solution (IFS= and -r keep leading spaces and backslashes intact):
$ while IFS= read -r line1; do IFS= read -r line2; IFS= read -r line3; echo "$line1,$line2,$line3"; done < infile.txt
aid=33,pw=3,nn=90
aid=32,pw=30,nn=70
aid=56,pw=3,nn=93
Related posts:
  1. Join multiple lines using Awk
  2. Combine related consecutive lines using Awk
  3. Merging lines in UNIX

Wednesday, May 8, 2013

Unix - Append 0 to single digit date

Input file file.txt has dates in month/day/year format.
$ cat file.txt 
3/4/2013
3/10/2013
10/4/2013
12/10/2012
Required: prefix a 0 to the first and second fields (month and day) when they are a single digit.

Awk solution:
$ awk 'BEGIN {FS=OFS="/"}
{
    if (length($1) == 1) $1 = "0" $1
    if (length($2) == 1) $2 = "0" $2
    print
}' file.txt
Output:
03/04/2013
03/10/2013
10/04/2013
12/10/2012
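An alternative sketch zero-pads the month and day with sprintf("%02d") instead of checking their length. It assumes both fields are plain integers (%02d converts the field to a number) and feeds the sample dates through a here-document so it runs without file.txt.

```shell
#!/bin/bash
# %02d pads a number to at least two digits with leading zeros.
padded=$(awk 'BEGIN { FS = OFS = "/" }
    { $1 = sprintf("%02d", $1); $2 = sprintf("%02d", $2); print }' <<'EOF'
3/4/2013
3/10/2013
10/4/2013
12/10/2012
EOF
)
printf '%s\n' "$padded"
```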
Related posts:
- A newbie tutorial on Unix Awk
- Awk if else
- Convert date format in unix using awk and sed

© Jadu Saikia www.UNIXCL.com