Sunday, December 6, 2009

Awk - numbering lines ignoring blank lines


In an earlier post I discussed numbering the lines of a file using awk; this is a similar post on numbering lines while ignoring any blank lines present.

The input file 'file.txt' has two fixed HEADER lines at the top, followed by a number of record lines (lines starting with k, i.e. ^k).

$ cat file.txt
h1|456|v1|1
h2|190|-|5
k|rn|90.67|12|90
k|rn|90.43|22|35
k|rn|90.62|71|90
k|rn|90.51|16|96
k|rn|90.37|18|71

Required: in the above file, replace the 2nd field of each record line (i.e. ^k) with its serial record number, starting from zero (0).
Required output:

h1|456|v1|1
h2|190|-|5
k|0|90.67|12|90
k|1|90.43|22|35
k|2|90.62|71|90
k|3|90.51|16|96
k|4|90.37|18|71

A solution using the awk NR variable:

$ awk '
BEGIN {FS=OFS="|"}
$1=="k" {$2=NR-3} {print}
' file.txt

i.e. for the lines ("|" delimited) whose first field is "k", replace the 2nd field with NR-3, then print every line.
NR is the ordinal number of the current record. The first record line is line 3 of the file, so NR-3 gives 0 for it, 1 for the next, and so on. A separate post describes the awk NR variable in more detail.
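
To make the NR-3 arithmetic concrete, here is a minimal sketch that prints NR next to the serial number derived from it for each record line. The temporary file and the variable names tmp and out are illustrative assumptions, not part of the original one-liner, and only the first four lines of the sample data are used.

```shell
# Illustrative only: recreate part of the sample data in a temporary file.
tmp=$(mktemp)
printf '%s\n' \
  'h1|456|v1|1' \
  'h2|190|-|5' \
  'k|rn|90.67|12|90' \
  'k|rn|90.43|22|35' > "$tmp"

# For each record line (first field "k"), show NR and the value NR-3.
out=$(awk -F'|' '$1=="k" {print "NR=" NR, "serial=" (NR-3)}' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

The two header lines take up NR values 1 and 2, so the record lines start at NR=3 and the subtraction brings the numbering back to zero.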

Now suppose the input file contains some blank lines, like this:

$ cat file.txt
h1|456|v1|1
h2|190|-|5
k|rn|90.67|12|90
k|rn|90.43|22|35


k|rn|90.62|71|90

k|rn|90.51|16|96
k|rn|90.37|18|71

Executing the same awk one-liner:

$ awk '
BEGIN {FS=OFS="|"}
$1=="k" {$2=NR-3} {print}
' file.txt

Output:

h1|456|v1|1
h2|190|-|5
k|0|90.67|12|90
k|1|90.43|22|35


k|4|90.62|71|90

k|6|90.51|16|96
k|7|90.37|18|71

As you can see, the record numbers are no longer in serial order, because NR also counts the blank lines.
A different solution, which keeps its own counter (c) that is incremented only on non-blank lines:

$ awk '
BEGIN{FS=OFS="|"}
!/^$/ {++c}
$1=="k" {$2=c-3} {print}
' file.txt

Output:

h1|456|v1|1
h2|190|-|5
k|0|90.67|12|90
k|1|90.43|22|35


k|2|90.62|71|90

k|3|90.51|16|96
k|4|90.37|18|71
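
As an alternative sketch (not the method shown above), the counter can be tied to the record lines themselves, so that neither the blank lines nor the number of header lines has to be accounted for. The counter name n, the temporary file and the shortened sample data are assumptions made for illustration.

```shell
# Illustrative only: a shortened sample with one blank line between records.
tmp=$(mktemp)
printf '%s\n' \
  'h1|456|v1|1' \
  'h2|190|-|5' \
  'k|rn|90.67|12|90' \
  '' \
  'k|rn|90.43|22|35' > "$tmp"

# n is uninitialized at first, so n++ yields 0 for the first record line
# and is incremented only when a line whose first field is "k" is seen.
out=$(awk 'BEGIN {FS=OFS="|"} $1=="k" {$2=n++} {print}' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

Because n advances only on record lines, this variant does not depend on there being exactly two header lines, whereas c-3 would need adjusting if the header size changed.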



© Jadu Saikia www.UNIXCL.com