Tuesday, February 17, 2009

Handling argument list too long - bash


I have nearly 200,000 files in one of my log directories, of which about 120,000 match the name format "ka.log.*". So whenever I try to apply a command such as rm, ls or cp to that big set of "ka.log.*" files, I get:

$ ls ka.log.*
bash: /bin/ls: Argument list too long

$ cp ka.log.* new/
bash: /bin/cp: Argument list too long

$ mv ka.log.* new/
bash: /bin/mv: Argument list too long

$ rm ka.log.*
bash: /bin/rm: Argument list too long

The "Argument list too long" error above is not really a limitation of the commands themselves (rm, mv, ls, cp). It comes from the kernel's limit (ARG_MAX) on the total size of the argument list and environment that can be passed to a newly executed program: the shell expands "ka.log.*" into 120,000 file names before the command even starts, and that expanded list exceeds the limit.
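The limit is imposed by the kernel, not by bash, and its value (which varies by system) can be checked directly:

```shell
# ARG_MAX is the kernel's limit, in bytes, on the combined length of
# the argument list and environment handed to an exec'ed program.
getconf ARG_MAX
```

Any single command line whose expanded arguments exceed this number of bytes will fail, no matter which utility is being run.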

The Linux 'find' command is useful for performing these operations (ls, cp, mv, rm etc.) on such a big set of files, because it hands the file names to the command itself rather than expanding them all on one command line.

e.g.

To copy those "ka.log.*" to directory /somedir

$ find . -name "ka.log.*" -exec cp {} /somedir/ \;

Looping through while:

find . -name "ka.log.*" | while IFS= read -r FILE
do
...
<some operation on $FILE>
...
done
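A pipe into `while read` can still mishandle unusual file names (for example, names containing newlines). With GNU or BSD find, a NUL-delimited sketch is more robust; /somedir stands in for the real destination as in the earlier example:

```shell
# -print0 and read -d '' delimit file names with NUL bytes, so names
# containing spaces, quotes or even newlines survive intact; IFS= and
# -r stop read from trimming whitespace or eating backslashes.
find . -name "ka.log.*" -print0 |
while IFS= read -r -d '' FILE
do
    cp -- "$FILE" /somedir/
done
```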


Another way is to assign the file names to a variable, e.g.

FILES=$(echo /mydir/ka.log.*)

for FILE in $FILES
do
...
<some operation on $FILE>
...
done

12 comments:

housetier said...

maybe xargs could also help with dealing with so many files. Just read about it somewhere on nixCraft.tld so it's still fresh in my memory

Ian Kelling said...

I gotta call you out:

FILES=$(echo /mydir/ka.log.*)

instead do

FILES=(/mydir/ka.log.*)

good stuff though.

Jadu Saikia said...

@housetier, ya xargs can be used with find for operations like rm etc. Thanks for commenting.
e.g.
$ find . -name "ka.log.*" | xargs rm
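(With GNU find and xargs, the NUL-separated form is safer when file names may contain spaces or quotes:)

```shell
# -print0 / -0 use NUL as the delimiter, so odd file names can't be
# split or misquoted; '--' guards against names beginning with a dash.
find . -name "ka.log.*" -print0 | xargs -0 rm --
```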

Jadu Saikia said...

@Ian, thanks for commenting.

FILES=(/mydir/ka.log.*)
is not going to work (even for a smaller set of files); the variable FILES is going to hold only one file name here.

FILES=$(echo /mydir/ka.log.*)
is going to work here.

Jadu Saikia said...

One interesting article on the same topic
http://www.linuxjournal.com/article/6060

Ian Kelling said...

your echo thing does not handle whitespace, etc. correctly. Mine puts them all in an array, which does handle whitespace etc. correctly. Saying it only puts 1 file? I'm disappointed in you.

Jadu Saikia said...

@Ian, please don't be disappointed with me :-)
I feel I have to prove with an example:

$ ls new10/ka.log.*
new10/ka.log.1 new10/ka.log.2 new10/ka.log.3

$ FILES=(./new10/ka.log.*)
$ echo $FILES
./new10/ka.log.1

$ FILES1=$(echo ./new10/ka.log.*)
$ echo $FILES1
./new10/ka.log.1 ./new10/ka.log.2 ./new10/ka.log.3

leprasmurf said...

find . -name "ka.log.*" -exec cp {} /somedir/ \;

wouldn't handle whitespace either.

find . -name "ka.log.*" -exec cp '{}' /somedir/ \;

Jadu Saikia said...

@leprasmurf, thanks for your comment. Keep in touch.

navaho said...

Hello Jadu,

I stumbled on your blog through google looking for something else, but was browsing through your "bash tricks" posts because, well, an old dog can always learn some new ones... ;-)

Anyway, I just wanted to let you know that the method Ian suggests works perfectly. Though perhaps you used it incorrectly, yielding only one result -- the for..in has to be slightly modified.

Try this:

FILES=(/tmp/*)
for FILE in "${FILES[@]}"; do
    cp "$FILE" /somedir/
done

(Note: the quotes around ${FILES[@]} are necessary for whitespace to be handled correctly.)

IMHO this method is much cleaner than using "echo".

Jadu Saikia said...

My bad. Thanks Navaho, Ian.

I went wrong as:

$ FILES=(/root/demo/*)

I was trying this (and I just saw one file)

$ echo $FILES
/root/demo/log.1

I could have tried:

$ echo "${FILES[@]}"

And this works perfectly:

$ for FILE in "${FILES[@]}"; do ls "$FILE"; done

Thank you so much for pointing this.

Anirudh said...

> find . -name "ka.log.*" -exec cp {} /somedir/ \;

The above construct is the safest way to copy files (even ones that have whitespace or IFS characters in them).

Someone mentioned that we need '{}'.
Please note that that is ***unnecessary***,
simply because -exec is not the shell.
It is the shell that strips off whitespace etc. before handing commands to utilities.
With -exec you bypass the shell altogether and hand the command directly to the "cp" utility in this case.

However, there is one issue with the find command here: speed. It invokes the "cp" utility once per file, which can be slow, especially when thousands of files are to be copied.
We can speed it up by means of the {} + construct in place of {} \;

for example,

find . -type f -name ka.log.\* -exec sh -c 'shift $1; cp -- "$@" /somedir/.' 2 1 {} +
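With GNU cp, the helper shell can be avoided entirely: -t names the target directory up front, so find can append a large batch of file operands after it (a GNU-specific sketch, with /somedir again standing in for the real destination):

```shell
# GNU coreutils cp: -t takes the destination first, letting find's
# '{} +' append as many source paths as fit per cp invocation.
find . -type f -name 'ka.log.*' -exec cp -t /somedir/ -- {} +
```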

© Jadu Saikia www.UNIXCL.com