Sunday, January 10, 2010

Randomize lines using Linux rl command


Input file:

$ cat leader.txt
Mr B
Mrs C
Mrs A
Mrs X
Mr Y

Question: How do you read a random line from the above file on the Unix command line?

A few solutions:

Using the UNIX/Linux shuf command:

$ shuf -n 1 leader.txt
Mr B

Using sed with the bash RANDOM variable:

$ sed -n $((RANDOM%$(wc -l < leader.txt)+1))p leader.txt
Mrs C

Recently I came across the UNIX/Linux rl command (randomize-lines). The solution using 'rl' is:

$ rl -c 1 leader.txt

'rl' is similar to the UNIX/Linux 'shuf' utility; both can be used to randomize the lines in a file.

rl was developed by Arthur de Jong.

On my Ubuntu desktop, I installed the rl utility this way:

$ sudo apt-get install randomize-lines

From rl(1) man page:

rl reads lines from an input file or stdin, randomizes the lines and outputs a
specified number of lines.
It does this with only a single pass over the input while trying to use as little
memory as possible.
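
The single-pass, low-memory idea in that description can be illustrated with reservoir sampling. This is only a sketch of the general technique, not rl's actual implementation. The helper name pick_one is hypothetical, and the randomness comes from awk's rand(), seeded from the clock.

```shell
# pick_one: hypothetical helper, prints one uniformly chosen line from stdin.
# Reservoir sampling with a reservoir of size 1: line number NR replaces the
# current choice with probability 1/NR, so only one line is held in memory.
pick_one() {
    awk 'BEGIN { srand() }                 # seed from the time of day
         rand() * NR < 1 { chosen = $0 }   # keep this line with probability 1/NR
         END { if (NR > 0) print chosen }'
}

# Example: choose one line from a small in-line sample.
printf '%s\n' "Mr B" "Mrs C" "Mrs A" "Mrs X" "Mr Y" | pick_one
```

Because awk's srand() seeds from the current second, two calls within the same second are likely to return the same line.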

A few of its important command-line options are:

-c, --count=N
Select the number of lines to be returned in the output.
If this argument is omitted all the lines in the file will be returned in random order.
If the input contains less lines than specified and the --reselect option below is not
specified a warning is printed and all lines are returned in random order.

-o, --output=FILE
Send randomized lines to FILE instead of stdout.

-d, --delimiter=DELIM
Use specified character as a "line" delimiter instead of the newline character.

-n, --line-number
Output lines are numbered with the line number from the input file.

-r, --reselect
When using this option a single line may be selected multiple times.
The default behavior is that any input line will only be selected once.
This option makes it possible to specify a --count option with more lines than
the file actually holds.
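
To make the difference between the default behavior and --reselect concrete, here is a rough sketch of drawing lines with repeats allowed, written in plain awk. The helper name sample_with_replacement is hypothetical, and unlike rl this version holds the whole input in memory.

```shell
# sample_with_replacement N: hypothetical helper that prints N lines drawn
# from stdin, where the same input line may be drawn more than once.
sample_with_replacement() {
    awk -v n="$1" '
        BEGIN { srand() }
        { line[NR] = $0 }                         # hold every input line in memory
        END {
            if (NR == 0) exit                     # nothing to draw from
            for (i = 0; i < n; i++)
                print line[int(rand() * NR) + 1]  # index in 1..NR, repeats allowed
        }'
}

# Draw 8 lines from a 5-line input; duplicates are expected.
printf '%s\n' "Mr B" "Mrs C" "Mrs A" "Mrs X" "Mr Y" | sample_with_replacement 8
```

Asking for 8 lines from 5 only works because repeats are allowed, which is the situation the --reselect option is meant for.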

The program uses the rand() system random function.
This function returns a number between 0 and RAND_MAX,
which may not be very large on some systems.
This will result in non-random results for files containing more lines than RAND_MAX.

A few common uses of 'rl':

Randomize lines of file 'leader.txt':

$ rl leader.txt
Mr Y
Mrs C
Mrs X
Mr B
Mrs A

With -n option:

$ rl -n leader.txt
2: Mrs C
1: Mr B
3: Mrs A
5: Mr Y
4: Mrs X

Sending the output to a file using the -o option:

$ rl -o /tmp/leader.txt.sfl -n -c 3 leader.txt
$ cat /tmp/leader.txt.sfl
1: Mr B
4: Mrs X
2: Mrs C

Shuffle the comma-separated fields of a string:

$ echo -n "A,B,C,D,E," | rl -d ","
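
If rl is not installed, a similar shuffle of comma-separated fields can be approximated with tr, shuf and paste. This is a sketch that assumes GNU coreutils shuf is available; the helper name shuffle_fields is hypothetical.

```shell
# shuffle_fields: hypothetical helper that shuffles comma-separated fields
# read from stdin, using only tr, shuf and paste.
shuffle_fields() {
    tr ',' '\n' |      # one field per line
    shuf |             # randomize the line order
    paste -sd ',' -    # join the lines back with commas
}

printf 'A,B,C,D,E\n' | shuffle_fields
```

The output contains the same five fields in a random order, joined by commas again.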

And a practical use inspired by the 'rl' man page:

Play a random .mp3 song (from ~/lovesongs/ directory) after 5 minutes:

$ sleep 300 ; play "$(find ~/lovesongs/ -name "*.mp3" -print | rl -c 1)"
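
File names containing spaces or newlines can trip up a line-based pipeline like this one. Below is a hedged alternative that picks the file with NUL-separated records; it assumes a find that supports -print0 and GNU shuf with the -z option. The helper name random_file and the demo file names are made up for illustration.

```shell
# random_file DIR PATTERN: hypothetical helper that prints the path of one
# randomly chosen file under DIR whose name matches PATTERN.
random_file() {
    find "$1" -type f -name "$2" -print0 |  # NUL-separated, safe for odd names
    shuf -z -n 1 |                          # pick one NUL-terminated record
    tr '\0' '\n'                            # end the result with a newline
}

# Example with a throwaway directory; the file names are made up.
demo_dir=$(mktemp -d)
touch "$demo_dir/first song.mp3" "$demo_dir/second song.mp3" "$demo_dir/notes.txt"
random_file "$demo_dir" '*.mp3'
```

The result can then be passed to a player as a single quoted argument, for example play "$(random_file ~/lovesongs '*.mp3')".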

3 comments:

Jake said...

Ah, that's good to know; thanks. Last time I did this I used:

$ cat file | sort -R | head -n 1

If I remember correctly, there is no -R flag on OS X and possibly other versions of sort.

Jadu Saikia said...

@Jake, ya -R is a good option with sort.

$ cat file | sort -R --random-source=/dev/urandom
(Anyway this random source is by default)

--random-source=/dev/zero
always gives the same order.

Thanks for your comment.

Linux/Unix/System Admin etc said...

very nice tutorials...Thanks for your time.

© Jadu Saikia www.UNIXCL.com