Thursday, July 9, 2009

Print all duplicate lines using awk


Input file:

$ cat cnr.txt
HIDDENHAUSEN|99.60
FIEBERBRUNN|99.07
MELLENDORF|99.04
HERBSTEIN|99.02
ACHTERWEHR|98.82
GOLM|98.82
PARA|98.82
BOGEN|98.61
SAINTANDRE|98.55
OLSZTYN|98.61
HYDERABAD|99.02

Output required: print those lines whose 2nd field value occurs more than once in the file. The required output is:

HERBSTEIN|99.02
ACHTERWEHR|98.82
GOLM|98.82
PARA|98.82
BOGEN|98.61
OLSZTYN|98.61
HYDERABAD|99.02

Awk solution:

$ awk 'NR==FNR && a[$2]++ {b[$2];next} $2 in b' FS="|" cnr.txt cnr.txt
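How the one-liner works: cnr.txt is named twice, so NR==FNR holds only during the first read. On that pass a[$2]++ evaluates to zero (false) the first time a value is seen and to non-zero afterwards, so every value that repeats becomes a key of b, and `next` skips the print test for those lines. First occurrences do fall through to `$2 in b` during the first read, but their value is not in b yet, so nothing is printed. On the second read, `$2 in b` prints each line whose 2nd field was marked. The sketch below wraps the same two-pass idea in a shell function, using a plain counter (`count[$2] > 1`) instead of the separate b[] array, and adds a one-pass variant that holds every line in memory so it can also read from a pipe or stdin. The function names dup_field2_twopass and dup_field2_onepass are hypothetical, made up for this example; both assume `|` as the field separator, as in cnr.txt.

```shell
# Hypothetical helper names, chosen for this example.

# Two-pass: the file is passed twice, so NR == FNR only while awk reads it
# the first time. Usage: dup_field2_twopass FILE
dup_field2_twopass() {
    awk -F'|' '
        NR == FNR { count[$2]++; next }  # pass 1: count each 2nd-field value
        count[$2] > 1                    # pass 2: print lines whose value repeats
    ' "$1" "$1"
}

# One-pass: keeps every line in memory, so it also works on a pipe or stdin.
# Usage: dup_field2_onepass [FILE...]
dup_field2_onepass() {
    awk -F'|' '
        { line[NR] = $0; key[NR] = $2; count[$2]++ }  # store line and tally its 2nd field
        END {
            for (i = 1; i <= NR; i++)                  # replay lines in input order
                if (count[key[i]] > 1) print line[i]
        }
    ' "$@"
}
```

Usage: `dup_field2_twopass cnr.txt` prints the seven lines listed above, and `cat cnr.txt | dup_field2_onepass` gives the same result in the same order. The one-pass version trades memory (it stores the whole input) for not needing to read the file twice.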

Related posts:
- Difference between awk NR and FNR variables
- Posts on awk NR==FNR
- Awk FNR variable usage example
- Remove duplicates based on field using awk
- Remove duplicates from file without sorting using awk

1 comment:

Karan Bohra said...

sed -e '

$!{
N;s/^/\n/;D
}

/\n$/!G

/^\n$/d

/^[^|]*|m/{
s/^\([^|]*|\)m/\1/
P;D
}

/^\([^|]*|\)\([0-9][0-9]*\.[0-9][0-9]*\)\(\n.*|\)\(\2\n\)/!D

:mark
s//\1\2\3m\4/
tmark

P;D

' cnr.txt

© Jadu Saikia www.UNIXCL.com