Wednesday, December 23, 2009

Bash cat command space issue explained


The input file contains four student names:

$ cat file.txt
Alex C M
Peter S
Dhiren K
Prahlad G N

Required: I was trying to produce the following output:

1) Alex C M [3]
2) Peter S [2]
3) Dhiren K [2]
4) Prahlad G N [3]

That is: a serial number, the student's name, and the number of words in the name.
Let's try a bash for loop like this:

$ cat lp1.sh
#!/bin/bash

c=0
for line in $(cat file.txt)
do
    ((c+=1))
    # count the whitespace-separated fields in this item
    numfields=$(echo "$line" | awk '{print NF}')
    echo "$c) $line [$numfields]"
done

And this is the output it produced:

$ ./lp1.sh
1) Alex [1]
2) C [1]
3) M [1]
4) Peter [1]
5) S [1]
6) Dhiren [1]
7) K [1]
8) Prahlad [1]
9) G [1]
10) N [1]

So what went wrong?
I tried echo "$line" as well, but the output was the same.

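The cause is the shell's IFS variable, which controls how the unquoted $(cat file.txt) in the for list is split into words. The splitting has already happened by the time the loop body runs, which is why quoting "$line" makes no difference. From the Bash man page: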

IFS:
The Internal Field Separator that is used for word splitting after expansion and to split lines into words with the read builtin command. The default value is <space><tab><newline>.

Because the lines in the input file contain spaces, the for loop receives every word as a separate item instead of every line.
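The difference can be seen in isolation by counting how many items the same command substitution yields under two IFS values. This is only a minimal sketch: it writes the sample names to a scratch file (names.txt in a temporary directory, an assumption made so the snippet is self-contained) and uses `set --` purely as a way to count the resulting words.

```shell
#!/bin/bash
# Illustration only: names.txt is a scratch copy of the sample input.
tmp=$(mktemp -d)
printf '%s\n' 'Alex C M' 'Peter S' 'Dhiren K' 'Prahlad G N' > "$tmp/names.txt"

# Default IFS (space, tab, newline): every whitespace-separated word
# becomes a separate item.
set -- $(cat "$tmp/names.txt")
default_count=$#

# IFS limited to newline: only line breaks separate the items.
old_ifs=$IFS
IFS=$'\n'
set -- $(cat "$tmp/names.txt")
newline_count=$#
IFS=$old_ifs

echo "default IFS: $default_count items"
echo "newline IFS: $newline_count items"
rm -r "$tmp"
```

With the four sample names this should report 10 items for the default IFS and 4 items once IFS holds only a newline.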
We can temporarily change IFS in the shell script like this:

$ cat lp3.sh
#!/bin/bash

OLD_IFS=$IFS    # save the current separator
IFS=$'\n'       # split on newlines only
c=0
for line in $(cat file.txt)
do
    ((c+=1))
    numfields=$(echo "$line" | awk '{print NF}')
    echo "$c) $line [$numfields]"
done
IFS=$OLD_IFS    # restore the original separator

Output:

$ ./lp3.sh
1) Alex C M [3]
2) Peter S [2]
3) Dhiren K [2]
4) Prahlad G N [3]
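Two caveats apply to the for loop even with IFS set to a newline: the unquoted $(cat file.txt) is still subject to pathname expansion, so a line consisting of * would be replaced by file names, and the changed IFS stays in effect if the script exits the loop early. A possible sketch that confines both settings to a subshell is shown here. The function name number_lines and the scratch file are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# number_lines is a hypothetical helper name, not one of the scripts above.
# The body is wrapped in ( ... ), so it runs in a subshell and the
# IFS and set -f changes vanish when the function returns.
number_lines() (
    IFS=$'\n'   # split the file contents on newlines only
    set -f      # disable pathname expansion so a line like "*" stays literal
    c=0
    for line in $(cat "$1"); do
        c=$((c + 1))
        numfields=$(printf '%s\n' "$line" | awk '{print NF}')
        printf '%s) %s [%s]\n' "$c" "$line" "$numfields"
    done
)

# Scratch input that includes a line made of a glob character.
tmp=$(mktemp -d)
printf '%s\n' 'Alex C M' '*' 'Peter S' > "$tmp/names.txt"
result=$(number_lines "$tmp/names.txt")
echo "$result"
rm -r "$tmp"
```

Because nothing is changed in the calling shell, there is no OLD_IFS to save and restore.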

A bash while loop with read also works, without changing IFS, because read consumes one line at a time:

$ cat lp2.sh
#!/bin/bash

c=0
while read -r line    # -r keeps backslashes in the input literal
do
    ((c+=1))
    numfields=$(echo "$line" | awk '{print NF}')
    echo "$c) $line [$numfields]"
done < "file.txt"

Output:

$ ./lp2.sh
1) Alex C M [3]
2) Peter S [2]
3) Dhiren K [2]
4) Prahlad G N [3]
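Note that read itself strips leading and trailing whitespace from each line, and read returns a failure status on a last line that has no trailing newline, so that line is skipped by a plain while read loop. For the sample names this does not matter, but a more defensive variant might look like the sketch below. The function name number_lines_read is hypothetical, and the word count uses a bash array in place of awk, which is a swapped-in technique rather than what the scripts above do.

```shell
#!/bin/bash
# number_lines_read is a hypothetical name used only for this sketch.
number_lines_read() {
    local c=0 line words
    # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal;
    # the || test picks up a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        c=$((c + 1))
        read -r -a words <<< "$line"    # split a copy into words (default IFS)
        printf '%s) %s [%s]\n' "$c" "$line" "${#words[@]}"
    done < "$1"
}

# Scratch input: an indented first line and no newline after the last line.
tmp=$(mktemp -d)
printf '  Alex C M\nPeter S' > "$tmp/names.txt"
result=$(number_lines_read "$tmp/names.txt")
echo "$result"
rm -r "$tmp"
```

The IFS= prefix applies only to that single read command, so the rest of the script still runs with the default IFS.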

Note: The example above is mainly meant to show the use of the Bash IFS variable. With awk, the same output can be produced with either of these one-liners:

$ awk '{print NR")",$0,"["NF"]"}' file.txt
$ awk '{++c}{print c")",$0,"["NF"]"}' file.txt


1 comment:

Karan Bohra said...


Here's how one can do that wholly with "sed":

< file.txt \
sed -e '
#
# Notation:
#
# $. = current line number
# $_ = current line
# $# = number of fields in line
# ps = pattern space register
# hs = hold space register

1{x;s/.*/0/;x}

/./!s/^/ /

H # hs <= $.-1 \n $_

y/\t/ /
s/[ ][ ][ ]*/ /g
s/[^ ][^ ]*/!/g
s/[ ][ ]*//g

:a
s/!!!!!!!!!!/_/g
s/_\([0-9]*\)$/_0\1/
s/!!!!!!!!!/9/
s/!!!!!!!!/8/
s/!!!!!!!/7/
s/!!!!!!/6/
s/!!!!!/5/
s/!!!!/4/
s/!!!/3/
s/!!/2/
s/!/1/
y/_/!/
ta
/./!s/^/0/

H # hs <= $.-1 \n $_ \n $#
g # ps <= $.-1 \n $_ \n $#
s/\n.*// # ps <= $.-1

x
s/\n/&&/; s/.*\n\n//
x

# hs <= $_ \n $#
# ps <= $.-1

:b
s/9\(_*\)$/_\1/
tb

s/^\(_*\)$/1\1/; tc
s/8\(_*\)$/9\1/; tc
s/7\(_*\)$/8\1/; tc
s/6\(_*\)$/7\1/; tc
s/5\(_*\)$/6\1/; tc
s/4\(_*\)$/5\1/; tc
s/3\(_*\)$/4\1/; tc
s/2\(_*\)$/3\1/; tc
s/1\(_*\)$/2\1/; tc
s/0\(_*\)$/1\1/

:c
y/_/0/

# hs <= $_ \n $#
# ps <= $.

G # ps <= $. \n $_ \n $#
h # hs <= ps

s/\n/) /; s/\n/ [/; s/$/]/
# ps <= $.) $_ [$#]

x
s/\n.*//
x

# hs <= $.
# ps <= $.) $_ [$#]
'


Another method with bash:

k=1
while IFS= read -r Line; do
set -f; set X $Line; shift
echo "$k) $Line [$#]"
k=$(expr $k + 1)
done < file.txt

Or this,

perl -pale '$_ = "$.) $_ [@{[0+@F]}]"' < file.txt

© Jadu Saikia www.UNIXCL.com