Thursday, September 10, 2009

Replace duplicate line with blank - awk


I just received a query as a comment on one of my older posts, "removing duplicates based on fields using awk".

The question was:

Any idea how to replace duplicate lines with blank lines instead of deleting them?
e.g.

Input:

test1
test1
test2
test2
test2
test3

Output:

test1

test2


test3

I thought of making it a separate post here.

The solution using awk:

$ awk 'x[$0]++ {$0=""} {print}' file.txt
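For readers who want to see why the one-liner works, here is the same logic spelled out with comments and run against the sample input from the question. This is only a sketch: the scratch file from mktemp and the names seen and result are placeholders introduced for illustration, not part of the original command.

```shell
# Write the sample input from the question to a scratch file.
tmp=$(mktemp)
printf '%s\n' test1 test1 test2 test2 test2 test3 > "$tmp"

# seen[$0]++ evaluates to 0 (false) the first time a line appears
# and to a non-zero count (true) on every later occurrence.
result=$(awk '
    seen[$0]++ { $0 = "" }   # repeated line: blank the whole record
    { print }                # print every record, blanked or not
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Because the array remembers every line seen so far, a repeat is blanked even when it is not directly below its first occurrence.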


Related post:
- Remove duplicate without sorting file using awk

6 comments:

handband said...

How would you go about the following using awk?

file1
Data Field

Data Field
Data Field

Data Field

file2
a - Insert Data
b - Insert Data
c - Insert Data
d - Insert Data
e - Insert Data
f - Insert Data

Output file
file3
Data Field
b - Insert Data
Data Field
Data Field
e - Insert Data
Data Field

Jadu Saikia said...

@handband, thanks for your query. I can think of this quick solution:

$ paste -d "|" f1 f2 | awk -F "|" '
{ if ( $1=="" ) { print $2}
else {print $1}
}' > f3

I will post in case I find a better one. Thanks. Keep visiting.
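One possible awk-only alternative, sketched here under the assumption that the second file is not empty: read the insert file into an array first, then walk the first file and fill each blank line from the same line number. The array name fill and the temporary files are hypothetical. Since it does not join the files with "|", it should also cope with data that contains that character.

```shell
# Recreate the two sample files from the comment in scratch files.
f1=$(mktemp); f2=$(mktemp)
printf '%s\n' 'Data Field' '' 'Data Field' 'Data Field' '' 'Data Field' > "$f1"
printf '%s\n' 'a - Insert Data' 'b - Insert Data' 'c - Insert Data' \
              'd - Insert Data' 'e - Insert Data' 'f - Insert Data' > "$f2"

# First pass (NR == FNR): remember each line of the insert file by line number.
# Second pass: print the line from the first file, or the remembered line
# with the same number when the line is empty.
merged=$(awk '
    NR == FNR { fill[FNR] = $0; next }
    { print ($0 == "" ? fill[FNR] : $0) }
' "$f2" "$f1")

printf '%s\n' "$merged"
rm -f "$f1" "$f2"
```

Note that the insert file is passed first on the command line, so that it is fully loaded before the first file is processed.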

Jadu Saikia said...

And here is a post on plain merging of two files:

http://unstableme.blogspot.com/2008/03/merge-alternate-lines-of-files-bashawk.html

handband said...

Jadu,

Thanks! That worked perfectly!

Karan Bohra said...

< file1 sed -e 's/.*/[$&]/' |
dc -e "
$(< file2 sed 's/.*/[&]/')
[q]sq
[SM]sa
[z 1 <a SM z 0 <b]sb
[LMps0]sc
[pLMs0]sd
[? z 0 =q dZ0=c dZ0!=d c l?x]s?
lbx l?x
"

Karan Bohra said...



## 'Bash'
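# Blanks a line only when it repeats the line immediately before it.
unset prev_line
while IFS= read -r line
do
# ${prev_line++} expands to "+" once prev_line is set, so the first line is never compared
case ${prev_line++} in ?) case $prev_line in "$line") echo;continue;; esac;; esac
printf '%s\n' "$line"; prev_line=$line
done < yourfile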


## 'Perl'
perl -lpe '$_=""if$h{$_}++' < yourfile


## 'Sed'
sed -ne '
G
/^\(.*\)\n\1$/s///p
/./!d
P;s/\n.*//;h
' < yourfile
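A point worth noting: the Bash and sed versions above compare each line only with the line before it, while the awk and Perl versions remember every line seen so far. On sorted input the results are the same, but they differ when a duplicate shows up later in the file. The sketch below contrasts the two behaviours in awk; the variable names prev, adjacent and anywhere are made up for illustration.

```shell
# Input where test1 reappears after a different line.
input=$(printf '%s\n' test1 test1 test2 test1 test3)

# Adjacent-only: blank a line only if it equals the previous input line.
adjacent=$(printf '%s\n' "$input" | awk '
    NR > 1 && $0 == prev { print ""; next }
    { print; prev = $0 }
')

# Anywhere: blank a line if it has appeared at any earlier point.
anywhere=$(printf '%s\n' "$input" | awk 'x[$0]++ {$0=""} {print}')

printf 'adjacent-only:\n%s\n\nanywhere:\n%s\n' "$adjacent" "$anywhere"
```

With this input, the adjacent-only variant keeps the fourth line (test1), whereas the hash-based variant blanks it.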

© Jadu Saikia www.UNIXCL.com