Thursday, April 9, 2009

Replace field other than first occurrence in awk - bash


I had a CSV file in which the same first-field value repeats on several lines.

I had to blank out the first field on every line except the first occurrence of each unique value, leaving the remaining fields intact.


This is how I achieved that.

Input file:
$ cat file.txt
3245,1239035917,560,0
3245,1239035919,560,0
3245,1239035947,460,0
3247,1239035967,60,0
3247,1239036917,560,0
3295,1239035917,70,0

Awk solution:

$ awk -F "," '!a[$1]++{print; next}{sub(/^[^,]*/,""); print}' file.txt

Output:
3245,1239035917,560,0
,1239035919,560,0
,1239035947,460,0
3247,1239035967,60,0
,1239036917,560,0
3295,1239035917,70,0
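
A variation on the same idea, as a minimal sketch: instead of running a substitution over the line, assign an empty string to $1 and let awk rebuild the record with a comma as the output separator. That way a first-field value that also appears in a later column (560 in the sample below) is left alone there. The function name blank_repeats and the sample data are made up for illustration.

```shell
# Hypothetical helper: reads CSV from stdin (or from files given as arguments)
# and blanks the first field on every line whose first field was seen before.
blank_repeats() {
    awk 'BEGIN { FS = OFS = "," }   # split and rejoin records on commas
         seen[$1]++ { $1 = "" }     # key seen earlier: empty the first field
         { print }' "$@"
}

# Sample run where the key also occurs in the third field.
blank_repeats <<'EOF'
560,1239035917,560,0
560,1239035919,560,0
EOF
```

The second line comes out as ,1239035919,560,0 with the third field untouched. The same function can be run against the input above as blank_repeats file.txt.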

Done.
Please post any alternative solutions in the comments section.

2 comments:

Jadu Saikia said...

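I just wrote a Python alternative to this:

# Assumes lines sharing a first field are adjacent in the file.
with open("file.txt") as fp:
    f_f = " "  # first field of the previous line
    for line in fp:
        f = line.rstrip().split(",")
        if f[0] == f_f:
            # Same first field as the previous line: blank it out.
            print("," + ",".join(f[1:]))
        else:
            f_f = f[0]
            print(line.rstrip())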

Jadu Saikia said...

The above comment is to be used with proper python indentation.

© Jadu Saikia www.UNIXCL.com