Thursday, September 10, 2009

Replace duplicate line with blank - awk


I just received a query as a comment on one of my older posts on "removing duplicates based on fields using awk".

Question was:

Any idea how to replace duplicate lines with blank lines instead of deleting them?
e.g.

Input:

test1
test1
test2
test2
test2
test3

Output:

test1

test2


test3

I thought I'd make it a separate post.

The solution using awk:

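$ awk 'x[$0]++ {$0=""} {print}' file.txt

The pattern x[$0]++ uses each line as a key in the array x. It evaluates to 0 (false) the first time a line is seen and to a non-zero count on every later occurrence, so {$0=""} empties only the repeats, while the pattern-less {print} prints every line, blank or not.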
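To try the one-liner on the sample input from the question, here is a small self-contained shell sketch (POSIX sh plus any awk). Note that the one-liner blanks every repeat, even one that turns up much later in the file. If only consecutive repeats should be blanked, the way uniq treats duplicates, a variant that remembers just the previous line should do; that variant is an extra suggestion, not part of the original answer, and the names sample2.txt, out*.txt and prev are placeholders.

```shell
#!/bin/sh
# Sample input from the question.
printf 'test1\ntest1\ntest2\ntest2\ntest2\ntest3\n' > file.txt

# The one-liner from this post: empty every line seen before, print all lines.
awk 'x[$0]++ {$0=""} {print}' file.txt > out.txt
cat out.txt

# A value that repeats, but not only on consecutive lines.
printf 'a\na\nb\na\n' > sample2.txt

# Original one-liner: the last "a" is blanked as well.
awk 'x[$0]++ {$0=""} {print}' sample2.txt > out2.txt

# Hypothetical variant: blank a line only if it equals the line just above it.
# Appending "" forces a string comparison, so "1" and "1.0" are not equal.
awk '($0 "") == (prev "") { print ""; next } { prev = $0; print }' sample2.txt > out3.txt
cat out3.txt
```

For sample2.txt, the original one-liner prints a, a blank line, b and another blank line, while the variant prints the final a again.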


Related post:
- Remove duplicate without sorting file using awk

4 comments:

handband said...

How would you go about the following using awk?

file1
Data Field

Data Field
Data Field

Data Field

file2
a - Insert Data
b - Insert Data
c - Insert Data
d - Insert Data
e - Insert Data
f - Insert Data

Output file
file3
Data Field
b - Insert Data
Data Field
Data Field
e - Insert Data
Data Field

Jadu Saikia said...

@handband, thanks for your query. I can think of this quick solution:

$ paste -d "|" f1 f2 | awk -F "|" '
{
    if ($1 == "") { print $2 }
    else          { print $1 }
}' > f3

I will post in case I find a better one. Thanks. Keep visiting.
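The paste reply depends on "|" never appearing in either file, because awk splits each pasted line on it. As an alternative sketch (not part of the original thread), awk can read both files itself: it stores f2 in an array keyed by line number, then prints each line of f1, swapping in the matching f2 line wherever f1 is blank. The file names f1, f2 and f3 follow the reply above; the array name fill is arbitrary, and the sample data is handband's.

```shell
#!/bin/sh
# Sample data from the comment: f1 has blank lines that should be filled in.
printf 'Data Field\n\nData Field\nData Field\n\nData Field\n' > f1
printf 'a - Insert Data\nb - Insert Data\nc - Insert Data\nd - Insert Data\ne - Insert Data\nf - Insert Data\n' > f2

# NR == FNR holds only while reading the first file named (f2): remember each
# line under its line number and move on. For f1, print the line itself, or
# the f2 line with the same number when the f1 line is empty.
# Assumes f2 is non-empty; otherwise NR == FNR would also be true for f1.
awk 'NR == FNR { fill[FNR] = $0; next }
     { print ($0 == "" ? fill[FNR] : $0) }' f2 f1 > f3

cat f3
```

Because no delimiter is involved, the "Insert Data" lines may contain any character. Like the paste version, it assumes the two files line up line by line.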

Jadu Saikia said...

And here is a post on merging alternate lines of two files:

http://unstableme.blogspot.com/2008/03/merge-alternate-lines-of-files-bashawk.html

handband said...

Jadu,

Thanks! That worked perfectly!

© Jadu Saikia www.UNIXCL.com