Sunday, April 27, 2014

Unix xargs parallel execution of commands

xargs has an option that lets you take advantage of multiple cores on your machine: -P, which allows xargs to run the specified command multiple times in parallel. From the XARGS(1) man page:
-P max-procs
   Run up to max-procs processes at a time; the default is 1.  If max-procs is 0, xargs will run as many processes as possible at a time.   Use  the  -n  option
   with -P; otherwise chances are that only one exec will be done.

-n max-args
    Use at most max-args arguments per command line.  Fewer than max-args arguments will be used if the size (see the -s option) is exceeded, unless the  -x  
    option is given, in which case xargs will exit.

-i[replace-str]
    This option is a synonym for -Ireplace-str if replace-str is specified, and for -I{} otherwise.  This option is deprecated; use -I instead.
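To see what -n changes in practice, here is a minimal sketch that uses echo as a stand-in command (it is not the script from this post, and the variable names are only for illustration). It runs the same four arguments through xargs with and without -n:

```shell
# Without -n, xargs packs all four arguments into a single echo call.
all_at_once=$(printf '%s\n' a b c d | xargs echo)

# With -n2, xargs starts a new echo for every two arguments.
two_at_a_time=$(printf '%s\n' a b c d | xargs -n2 echo)

echo "$all_at_once"
echo "$two_at_a_time"
```

The first echo prints one line and the second prints two, which is why the man page suggests combining -n with -P: with only one command line to run, there is nothing to parallelize.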
Here is an example that makes use of the parallel option available in xargs. I have 8 log files (each about 1.5 GB), and for each one I have to run a script that does some calculation on the log lines.
$ ls -1 *.out
The script takes about 20 seconds for a single log file, e.g.
$ time ./ log1.out

real 0m20.509s
user 0m20.967s
sys 0m0.467s
Running it on each of the 8 log files one after the other takes:
$ time ls *.out | xargs -i ./ {}           

real 2m45.862s
user 2m48.152s
sys 0m5.358s
Running with 4 parallel processes at a time (my machine has 4 CPU cores):
$ time ls *.out | xargs -i -P4 ./ {} 

real 0m44.764s
user 2m55.020s
sys 0m6.224s
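The measured wall-clock times are close to what a rough estimate predicts. The sketch below works through that arithmetic; the variable names are mine, and it assumes every file takes the same 20 seconds and that the 4 cores are otherwise idle:

```shell
files=8          # number of log files
procs=4          # value passed to -P
secs_per_file=20 # approximate time for one file

# Number of "rounds" needed when procs files run side by side (ceiling division).
rounds=$(( (files + procs - 1) / procs ))

serial=$(( files * secs_per_file ))
parallel=$(( rounds * secs_per_file ))

echo "serial: ${serial}s, parallel: ${parallel}s"
```

That gives 160 seconds serially and 40 seconds in parallel, against the measured 2m45s and 0m44s; the few extra seconds are presumably overhead such as disk contention between the parallel jobs.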
We saved time! Isn't this useful? You can also use the -n1 option instead of the -i option that I am using above. -n1 passes one argument at a time to the command (instead of the xargs default of packing as many arguments as fit onto one command line).
$ time ls *.out | xargs -n1 -P4 ./

real 0m43.229s
user 2m56.718s
sys 0m6.353s
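One caveat with piping ls into xargs is that file names containing spaces get split into several arguments. A possible variant, sketched here under my own assumptions, separates the names with NUL bytes and reads them with -0. It uses a scratch directory with two made-up file names, and wc -c stands in for the real script:

```shell
# Work in a scratch directory so nothing real is touched.
tmpdir=$(mktemp -d)
printf 'x\n' > "$tmpdir/log 1.out"   # hypothetical file name with a space
printf 'y\n' > "$tmpdir/log 2.out"

# printf '%s\0' emits each name followed by a NUL byte;
# xargs -0 splits on NUL, so "log 1.out" stays one argument.
# -n1 -P2 runs one wc per file, two at a time.
count=$(cd "$tmpdir" && printf '%s\0' *.out | xargs -0 -n1 -P2 wc -c | wc -l)

echo "commands run: $count"
rm -rf "$tmpdir"
```

Each wc invocation prints one line, so the count should match the number of files. With plain ls and no -0, the same two files would have been passed as four separate words.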



Ole Tange said...

If you like xargs -P you might want to check out GNU Parallel, which has much better control of how the jobs are run:

Unknown said...

@Ole Tange, thanks a lot for sharing this, this is useful.

© Jadu Saikia