Thursday, November 19, 2009

Convert fixed length file to csv - awk


Input file:

$ cat data.txt
k12582927001611USNA
k12582990001497INAS
k12583053001161LNEU

Required output:

k,1258292700,1611,US,NA
k,1258299000,1497,IN,AS
k,1258305300,1161,LN,EU

Awk solutions:

Using GAWK(1)

From GAWK(1) man page:

If the FIELDWIDTHS variable is set to a space separated list of numbers, each field is expected to have fixed width, and gawk splits up the record using the specified widths. The value of FS is ignored.
Assigning a new value to FS overrides the use of FIELDWIDTHS, and restores the default behavior.

$ awk -v FIELDWIDTHS='1 10 4 2 2' -v OFS=',' '
{ $1=$1 ""; print }
' data.txt

Another alternative using awk substr function

$ awk '{
one=substr($0,1,1)
two=substr($0,2,10)
three=substr($0,12,4)
four=substr($0,16,2)
rest=substr($0,18)
printf ("%s,%s,%s,%s,%s\n", one, two, three, four, rest)
}' data.txt

6 comments:

Slestak said...

Is it pretty trivial to go the other direction? CSV to fixed width? Say I have a csv file that I want to format 30L,15R,10L?

Jadu Saikia said...

@Slestak, thanks for your comment. Do you mean this ?

http://unstableme.blogspot.com/2009/07/text-alignment-with-awk-printf-function.html

Keep in touch.

John Dondapati said...

I extended it to include trimming values.

Here's the command I used to parse about 850k records and it created a CSV in about 65 seconds.

Thanks a bunch Man!

awk '{
ROLLNO=substr($0,1,10)
DIST=substr($0,11,2)
CNAME=substr($0,13,30)

YR1PC1=substr($0,43,2)
YR1MKS1=substr($0,45,3)
YR1IND1=substr($0,48,1)
YR1RS1=substr($0,49,1)

YR1PC2=substr($0,50,2)
YR1MKS2=substr($0,52,3)
YR1IND2=substr($0,55,1)
YR1RS2=substr($0,56,1)

YR1PC3=substr($0,57,2)
YR1MKS4=substr($0,59,3)
YR1IND3=substr($0,62,1)
YR1RS3=substr($0,63,1)

YR1PC4=substr($0,64,2)
YR1MKS4=substr($0,66,3)
YR1IND4=substr($0,69,1)
YR1RS4=substr($0,70,1)

YR1PC5=substr($0,71,2)
YR1MKS5=substr($0,73,3)
YR1IND5=substr($0,76,1)
YR1RS5=substr($0,77,1)

YR1PC6=substr($0,78,2)
YR1MKS6=substr($0,80,3)
YR1IND6=substr($0,83,1)
YR1RS6=substr($0,84,1)

TOTAL=substr($0,85,3)
RESULT=substr($0,88,6)
ADD_FLG=substr($0,94,1)
Linkno=substr($0,95)

gsub(/[[:space:]]*/,"",CNAME );
gsub(/[[:space:]]*/,"",RESULT);
gsub(/[[:space:]]*/,"",YR1MKS1);
gsub(/[[:space:]]*/,"",YR1IND1);
gsub(/[[:space:]]*/,"",YR1MKS2);
gsub(/[[:space:]]*/,"",YR1IND2);
gsub(/[[:space:]]*/,"",YR1MKS3);
gsub(/[[:space:]]*/,"",YR1IND3);
gsub(/[[:space:]]*/,"",YR1MKS4);
gsub(/[[:space:]]*/,"",YR1IND4);
gsub(/[[:space:]]*/,"",YR1MKS5);
gsub(/[[:space:]]*/,"",YR1IND5);
gsub(/[[:space:]]*/,"",YR1MKS6);
gsub(/[[:space:]]*/,"",YR1IND6);

gsub(/[[:space:]]*/,"",TOTAL);


printf ("%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n", ROLLNO, DIST, CNAME, YR1PC1, YR1MKS1, YR1IND1, YR1RS1, YR1PC2, YR1MKS2,YR1IND2 ,YR1RS2 ,YR1PC3 ,YR1MKS3 ,YR1IND3 ,YR1RS3 ,YR1PC4 ,YR1MKS4 ,YR1IND4 ,YR1RS4 ,YR1PC5 ,YR1MKS5 ,YR1IND5 ,YR1RS5 ,YR1PC6 ,YR1MKS6 ,YR1IND6 ,YR1RS6 ,TOTAL ,RESULT ,ADD_FLG ,Linkno )
}' YR1WEB.TXT > YR1WEB.CSV

Jadu Saikia said...

@John Dondapati, that's great. Thanks for sharing.

nasri.najib said...

Thank you for your help John. It work for me in SunOS as part of a script.

My code snippet:
awk '{
currencyOne=substr($0, 3, 3);
currencyTwo=substr($0, 6, 3);
printf ("MDS,%s/%s\n", currencyOne,currencyTwo}' test.dat > answer.log

Anbu said...

awk '{sub(".","&,");sub(".{12}","&,");sub(".{17}","&,");sub(".{20}","&,")} 1' data.txt

© Jadu Saikia www.UNIXCL.com