Monday, August 31, 2009

Sort strings by length using awk and bash


Input file:

$ cat usragnt.txt
Win3.11 for Windows 3.11
WinNT3.51 for Windows NT 3.11
WinNT4.0 for Windows NT 4.0
Windows NT 5.0 for Windows 2000
Windows NT 5.1 for Windows XP
Windows NT 5.2 for Windows Server 2003; Windows XP x64
Windows NT 6.0 for Windows Vista
Win95 for Windows 95
Win98 for Windows 98
Win 9x 4.90 for Windows Me
WindowsCE for Windows CE


Required: I was trying to find the length of the longest useragent value in the above file.
This is how we can sort the lines by their length (here the whole record, $0; use $1, $2, ... to sort on a particular field's length).

The awk one liner:

$ awk '{ print length($0),$0 | "sort -n"}' usragnt.txt

Output:

20 Win95 for Windows 95
20 Win98 for Windows 98
24 Win3.11 for Windows 3.11
24 WindowsCE for Windows CE
26 Win 9x 4.90 for Windows Me
27 WinNT4.0 for Windows NT 4.0
29 WinNT3.51 for Windows NT 3.11
29 Windows NT 5.1 for Windows XP
31 Windows NT 5.0 for Windows 2000
32 Windows NT 6.0 for Windows Vista
54 Windows NT 5.2 for Windows Server 2003; Windows XP x64
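If only the length of the longest line is needed (the original requirement), a single awk pass that tracks the maximum avoids the sort altogether. A minimal sketch, wrapped in a shell function whose name (longest_len) is only illustrative:

```shell
# longest_len FILE: print the length of the longest line in FILE
# (longest_len is an illustrative name, not a standard command)
longest_len() {
  awk '
    length($0) > max { max = length($0) }   # remember the longest length seen
    END { print max + 0 }                   # +0 prints 0 for an empty file
  ' "$1"
}
```

For the usragnt.txt file above, `longest_len usragnt.txt` should print 54, the same value as the last line of the sorted output.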

Sunday, August 30, 2009

Linux comm command brief tutorial

From the COMM(1) man page, the available options are:

-1 suppress lines unique to FILE1
-2 suppress lines unique to FILE2
-3 suppress lines that appear in both files

Input files:

$ cat a.txt
sl-9023
sl-2112
sl-9029
sl-1210
sl-1215

$ cat b.txt
sl-9029
sl-9023
sl-1215
sl-2112
sl-9012
sl-9016


1) Find only those lines which are common to both files

$ comm -12 a.txt b.txt
sl-9029

But the above output is wrong: a.txt and b.txt actually have four lines in common (sl-9023, sl-2112, sl-9029 and sl-1215).

As the COMM(1) man page says:

comm - compare two sorted files line by line

Let's sort the files:

$ sort -o /tmp/a.txt.srt a.txt
$ sort -o /tmp/b.txt.srt b.txt

Now:

$ comm -12 /tmp/a.txt.srt /tmp/b.txt.srt
sl-1215
sl-2112
sl-9023
sl-9029

Or, using bash process substitution (without creating those temporary files):

$ comm -12 <(sort a.txt) <(sort b.txt)
sl-1215
sl-2112
sl-9023
sl-9029

From the GREP(1) man page:
-f FILE, --file=FILE (Obtain patterns from FILE, one per line)

$ grep -f a.txt b.txt
sl-9029
sl-9023
sl-1215
sl-2112
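One caveat: grep -f treats every line of a.txt as a regular expression that may match anywhere in a line, so a pattern such as sl-90 would also match sl-9012. To keep only exact, whole-line matches, add -F (fixed strings) and -x (match the whole line). A small sketch; common_lines is an illustrative name:

```shell
# common_lines FILE1 FILE2: print the lines of FILE2 that also appear,
# exactly and as whole lines, in FILE1 (output keeps FILE2's order)
common_lines() {
  grep -Fxf "$1" "$2"
}
```

Usage: `common_lines a.txt b.txt` gives the same four lines as above for these inputs, but it no longer reports partial matches.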

2) Find lines which are unique to first file (a.txt) only (w.r.t 2nd file b.txt)

$ comm -23 <(sort a.txt) <(sort b.txt)
sl-1210

3) Find lines which are unique to second file (b.txt) only (w.r.t 1st file a.txt)

$ comm -13 <(sort a.txt) <(sort b.txt)
sl-9012
sl-9016

With no options, comm produces three-column output:
column one contains lines unique to FILE1,
column two contains lines unique to FILE2,
and column three contains lines common to both files.

$ comm <(sort a.txt) <(sort b.txt)
sl-1210
		sl-1215
		sl-2112
	sl-9012
	sl-9016
		sl-9023
		sl-9029
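comm tells the three columns apart only by leading TAB characters (its default output delimiter), so this output is easy to misread once the indentation is lost. A sketch that labels each line by counting tab-separated fields, assuming the lines themselves contain no tabs; label_comm and the labels are illustrative:

```shell
# label_comm: read comm's default three-column output on stdin and prefix
# each line with the column it belongs to (columns are told apart by tabs)
label_comm() {
  awk -F'\t' '
    NF == 3 { print "both:  " $3; next }   # two leading tabs -> column 3
    NF == 2 { print "file2: " $2; next }   # one leading tab  -> column 2
            { print "file1: " $1 }         # no leading tab   -> column 1
  '
}
```

Usage: `comm <(sort a.txt) <(sort b.txt) | label_comm`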

Thursday, August 27, 2009

Bash - save command without executing it

In Bash, if you have typed a very long command (applicable to shorter commands too :-)), and then realize you don't want to execute it yet, don't delete it.

Then what?

Ans:

Simply add a # at the beginning of the line (command), and then hit Enter.

Bash will not execute the command (as it's commented out), but it will still store it in the history. So later you can go back, remove the # from the front, and execute it.

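e.g.

[screenshot: a long command typed with a leading #]

Let's see my history (output of the history command):

[screenshot: history output listing the commented command]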

Well, if you are wondering how I added the "command execution timestamp" to the history output, here is the tip.

And if you want to know how I made my primary bash prompt (PS1) colorful, here is another tip.
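On the timestamp question: the linked tip is not reproduced here, and it is an assumption that it uses this exact approach. One common way is bash's HISTTIMEFORMAT variable; when it is set and not null, the history builtin prints each entry's time stamp using it as a strftime(3) format. The format string below is only an example:

```shell
# In ~/.bashrc: make the history builtin show when each command was run,
# e.g. "2009-08-27 10:15:02 " in front of the command text
export HISTTIMEFORMAT='%F %T '
```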

Wednesday, August 26, 2009

Search and print output using awk

Input files:

$ cat file1
id89
id21
id90
id12

$ cat file2
0|id12|QE|T
4|id89|AX|N
8|id20|AU|K
9|id90|AW|P
3|id21|PP|A
7|id13|LP|O

Required: Look up the IDs from file1 in the 2nd field of file2 and print the full matching record from file2.

I have already written a lot of posts on performing file look-ups in awk. Let's see some more alternatives.


$ awk 'NR==FNR{_[$1];next}$2 in _' FS=\| file1 file2

0|id12|QE|T
4|id89|AX|N
9|id90|AW|P
3|id21|PP|A

Another awk solution would be:

$ awk '{
if (NF == 1)          # a line from file1: remember the ID
  _[$0] = $0
else                  # a line from file2: compare its 2nd field with every ID
  for (i in _)
    if ($2 == i)
      print
}' FS=\| file1 file2

0|id12|QE|T
4|id89|AX|N
9|id90|AW|P
3|id21|PP|A

A simple bash loop (one awk run per ID, so the output follows the order of file1):

while read -r id
do
  awk -F "|" -v x="$id" '$2 == x' file2
done < file1

4|id89|AX|N
3|id21|PP|A
9|id90|AW|P
0|id12|QE|T


Using join(1): both files need to be sorted on the join field.
Bash process substitution is used again to avoid creating temporary files.

$ join -t"|" -1 1 -2 2 <(sort file1) <(sort -t"|" -k2,2 file2)

id12|0|QE|T
id21|3|PP|A
id89|4|AX|N
id90|9|AW|P
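join prints the join field first. If the records should come out in file2's original field order, the -o option can list the output fields explicitly as FILENUM.FIELD. A sketch, assuming file2 always has exactly four fields as above; it needs bash for the process substitution, and lookup_join is an illustrative name:

```shell
# lookup_join IDFILE RECFILE: print the RECFILE records whose 2nd field
# ('|'-separated) is listed in IDFILE, keeping RECFILE's field order
lookup_join() {
  join -t'|' -1 1 -2 2 -o 2.1,2.2,2.3,2.4 \
    <(sort "$1") <(sort -t'|' -k2,2 "$2")
}
```

Usage: `lookup_join file1 file2` should print the same four records as the first awk solution, ordered by ID (id12, id21, id89, id90).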


Related posts:
- Delete lines based on another file using awk
- Update file based on another file using awk
- Perform join using awk
- Update a file based on another file using sed

Sunday, August 16, 2009

Favorite command line trick in Bash

A question to all unstableme followers, readers and visitors

What's your favorite command line trick in Bash?

My favorite one is Bash process substitution:


diff <(sort file1) <(sort file2)


Please share your favorite one in the comments: a useful Bash command, command line trick, one-liner, vi editor tip, or anything else related to bash.

Thanks,
unstableme

Friday, August 14, 2009

Awk - find column number of a pattern

The input file contains the results of the annual sports day for a class.

The format is:

gameId:1st Place Student ID,2nd Place Student ID and so on


$ cat file.txt
Gid034:s9823,s1290,s9034,s1230
Gid309:s9034,s5678,s1293,s4590
Gid124:s2145,s9008,s2381,s0234
Gid213:s9012,s9034,s8913,s9063


Required:
We need to search the above file and find the details (game ID, line number where the student is found, and the student's rank in that game) for a given student ID, say "s9034".

The awk solution:


$ awk -v id="s9034" '
BEGIN { FS = "[:,]" }          # split on both : and ,
{
  for (i = 2; i <= NF; i++)    # field 1 is the game ID, fields 2.. are ranks 1..
    if ($i == id)
      print "GID=" $1 "(line no " NR ")", "Rank " (i - 1)
}' file.txt


output of the above script:


GID=Gid034(line no 1) Rank 3
GID=Gid309(line no 2) Rank 1
GID=Gid213(line no 4) Rank 2
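Since each line has a single : followed by a comma-separated list, another way is to split on : only and then split() the student list, so the array index is directly the rank. A sketch; find_rank is an illustrative name:

```shell
# find_rank ID FILE: for each line "gameId:id1,id2,..." containing ID,
# print the game ID, the line number and the place of ID in that game
find_rank() {
  awk -F: -v id="$1" '{
    n = split($2, place, ",")          # place[1] = 1st place, place[2] = 2nd, ...
    for (r = 1; r <= n; r++)
      if (place[r] == id)
        print "GID=" $1 "(line no " NR ")", "Rank " r
  }' "$2"
}
```

Usage: `find_rank s9034 file.txt` should print the same three lines as the script above.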


Important takeaway from the above post:

- How to specify multiple field separators (FS) in awk (read my earlier post)

Note this:

$ echo "x:a,b,c" | awk 'BEGIN{FS="[:,]"} {print $1,$3}'

o/p:

x b

Wednesday, August 12, 2009

Bash script for sequential subtraction of numbers

The input file epoch.txt contains UNIX epoch times, one per line:

$ cat epoch.txt
1249887102
1249887121
1249887181
1249887241
1249887433
1249887481
1249887541
1249887601
1249887661

Required:
Subtract the 1st number from the 2nd (i.e. 2nd - 1st), the 2nd from the 3rd, and so on ...

The bash script:

#!/bin/sh

FILE=epoch.txt
N=1
total=$(sed -n '$=' "$FILE")   # number of lines in the file
until [ "$N" -eq "$total" ]
do
  S1=$N
  S2=$((N + 1))                 # POSIX arithmetic; ((...)) is bash-only
  N=$S2
  VAL1=$(sed -n "${S1}p" "$FILE")
  VAL2=$(sed -n "${S2}p" "$FILE")
  echo "$VAL2 - $VAL1 = $((VAL2 - VAL1))"
done

$ sh difcal.sh
1249887121 - 1249887102 = 19
1249887181 - 1249887121 = 60
1249887241 - 1249887181 = 60
1249887433 - 1249887241 = 192
1249887481 - 1249887433 = 48
1249887541 - 1249887481 = 60
1249887601 - 1249887541 = 60
1249887661 - 1249887601 = 60
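The script above runs sed twice per pair, so the file is re-read for every line. A single-pass awk sketch that keeps the previous value in a variable produces the same lines; seq_diff is an illustrative name:

```shell
# seq_diff FILE: for every line after the first, print
# "current - previous = difference", reading FILE only once
seq_diff() {
  awk 'NR > 1 { print $1 " - " prev " = " ($1 - prev) }
       { prev = $1 }' "$1"
}
```

Usage: `seq_diff epoch.txt`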

Wednesday, August 5, 2009

Remove path from find result in Bash

Find all .sh files under the current directory (recursively):

$ find . -name "*.sh"

./als.sh
./engine/a.sh
./engine/mysc.sh
./tools/new/slv.sh
./tools/instant-parse/iparse.sh

Now, to remove the directory part from the above find output:

$ find . -name "*.sh" -exec basename {} \;

als.sh
a.sh
mysc.sh
slv.sh
iparse.sh


Same as

$ find . -name "*.sh" | xargs -I {} basename {}

And using sed:

$ find . -name "*.sh" | sed 's!.*/!!'

i.e. delete everything up to and including the last / (here ! is used as the sed delimiter instead of the usual /, so the / in the pattern needs no escaping)
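The basename variants start one basename process per file. A sketch that strips the directories in the shell itself with the ${var##*/} parameter expansion (remove the longest prefix ending in /), reading one path per line; strip_dirs is an illustrative name, and like the other variants it assumes file names contain no newlines:

```shell
# strip_dirs: read paths on stdin, one per line, and print only the
# part after the last / of each path
strip_dirs() {
  while IFS= read -r path; do
    printf '%s\n' "${path##*/}"
  done
}
```

Usage: `find . -name "*.sh" | strip_dirs`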

Sunday, August 2, 2009

Array sorting with Awk asort function

asort is a gawk-specific extension which sorts an array. More about asort can be found here and here.

Let's see some array sorting examples using the awk asort function.
Input file:

$ cat file
KP
AS
DN

AO
BM
SE

ZW
ER
AI

Required output: Sort the records within each paragraph (paragraphs are separated by a blank line), i.e. the output required is:

AS
DN
KP

AO
BM
SE

AI
ER
ZW


Awk solution using asort:

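$ awk '
BEGIN{RS=""}             # paragraph mode: records are separated by blank lines
{
n = split($0, arr)       # one array element per line of the paragraph
asort(arr)               # gawk-only: sort the values, reindexed from 1
if (NR > 1) print ""     # blank line between paragraphs
for (i = 1; i <= n; i++)
  print arr[i]
}' file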


Another example using asort: Horizontal sorting of fields in a file using awk asort.

Input file:

$ cat file
435 121 1 32
87 644 12 34
323 121 111 10


Required output: Sort the fields of each row in descending order, i.e. the output required is:

435 121 32 1
644 87 34 12
323 121 111 10

Awk solution:

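$ awk '{
n = split($0, a)          # a[1..n] = the fields of this row
asort(a)                  # gawk-only: sort a[] in ascending order
for (i = n; i > 0; i--)   # walk backwards for descending order
  printf("%s%s", a[i], (i > 1 ? " " : "\n"))
}' file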
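asort exists only in gawk; mawk and the original BWK awk do not have it. For those, here is a sketch of the same descending row sort using a hand-written insertion sort, assuming every field is a number; sort_row_desc is an illustrative name:

```shell
# sort_row_desc [FILE...]: sort the numeric fields of each line in
# descending order with an insertion sort (no gawk asort needed)
sort_row_desc() {
  awk '{
    n = split($0, a)
    for (i = 2; i <= n; i++) {           # insertion sort, largest first
      v = a[i] + 0
      for (j = i - 1; j >= 1 && a[j] + 0 < v; j--)
        a[j + 1] = a[j]                  # shift smaller values right
      a[j + 1] = v
    }
    line = a[1]
    for (i = 2; i <= n; i++)
      line = line " " a[i]
    print line
  }' "$@"
}
```

Usage: `sort_row_desc file` should print the same three rows as the asort version.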

© Jadu Saikia www.UNIXCL.com