xargs has an option that lets you take advantage of the multiple cores in your machine: the -P option, which allows xargs to invoke the specified command multiple times in parallel.
From XARGS(1) man page:
-P max-procs
    Run up to max-procs processes at a time; the default is 1. If max-procs is 0, xargs will run as many processes as possible at a time. Use the -n option with -P; otherwise chances are that only one exec will be done.

-n max-args
    Use at most max-args arguments per command line. Fewer than max-args arguments will be used if the size (see the -s option) is exceeded, unless the -x option is given, in which case xargs will exit.

-i[replace-str]
    This option is a synonym for -Ireplace-str if replace-str is specified, and for -I{} otherwise. This option is deprecated; use -I instead.

Let me give one example where we can make use of this parallel option available in xargs. I have these 8 log files (each about 1.5 GB in size) on which I have to run a script named count_pipeline.sh, which does some calculations on the lines of a log file.
$ ls -1 *.out
log1.out
log2.out
log3.out
log4.out
log5.out
log6.out
log7.out
log8.out

The script count_pipeline.sh takes nearly 20 seconds for a single log file, e.g.:
$ time ./count_pipeline.sh log1.out

real    0m20.509s
user    0m20.967s
sys     0m0.467s

If we have to run count_pipeline.sh for each of the 8 log files one after the other, the total time needed is:
$ time ls *.out | xargs -i ./count_pipeline.sh {}

real    2m45.862s
user    2m48.152s
sys     0m5.358s

Running with 4 parallel processes at a time (my machine has 4 CPU cores):
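$ time ls *.out | xargs -i -P4 ./count_pipeline.sh {}

real    0m44.764s
user    2m55.020s
sys     0m6.224s

We saved time! The wall-clock time dropped from about 2m46s to about 45s, while the total CPU time (user + sys) stayed roughly the same. Isn't this useful?

You can also use the -n1 option instead of the -i option that I am using above. -n1 passes one argument at a time to the command being run (instead of the xargs default of passing as many arguments as fit on one command line).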
$ time ls *.out | xargs -n1 -P4 ./count_pipeline.sh

real    0m43.229s
user    2m56.718s
sys     0m6.353s
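
If you want to try this without 1.5 GB log files at hand, here is a minimal sketch that should be reproducible anywhere xargs supports -P (GNU and BSD xargs both do). It is not the original count_pipeline.sh: the job below is a hypothetical stand-in that just sleeps for one second, and the log*.out names are only strings fed to xargs, so no real files are needed. It first shows why the man page says to combine -n with -P, then runs the stand-in job 4 at a time.

```shell
#!/bin/sh
# Hypothetical stand-in for count_pipeline.sh (an assumption, not the real script):
# it sleeps for 1 second and then reports the name it was given.
job='sleep 1; echo "done: $1"'

# Without -n, xargs packs all four names into a single command line,
# so there is only one process and -P would have nothing to parallelize.
one_call=$(printf '%s\n' log1.out log2.out log3.out log4.out | xargs echo | wc -l | tr -d ' ')

# With -n1, every name gets its own invocation of the command.
four_calls=$(printf '%s\n' log1.out log2.out log3.out log4.out | xargs -n1 echo | wc -l | tr -d ' ')

echo "invocations without -n: $one_call"
echo "invocations with -n1:   $four_calls"

# Run the stand-in job once per name, up to 4 processes at a time.
# The trailing "sh" becomes $0 of the inner shell, so each name arrives as $1.
start=$(date +%s)
out=$(printf '%s\n' log1.out log2.out log3.out log4.out | xargs -n1 -P4 sh -c "$job" sh)
end=$(date +%s)

printf '%s\n' "$out"
echo "elapsed with -P4: $((end - start))s"
```

With -P4 the four one-second jobs should finish in roughly one second instead of four. The order of the "done:" lines is not guaranteed, since the processes finish independently of each other.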
2 comments:
If you like xargs -P you might want to check out GNU Parallel, which has much better control of how the jobs are run: http://pi.dk/1/ http://www.gnu.org/software/parallel/parallel_tutorial.html
@Ole Tange, thanks a lot for sharing this, this is useful.