Monday, January 19, 2009

Count non empty fields in a file - awk


Input file: "4w_prod_u_1.txt" is the attendance report for production unit1 of a factory for a particular week.
If the employee was present for a day there will be a "y" for that day, if absent nothing will be put.

$ cat 4w_prod_u_1.txt
#Name,Designation,attendance details
Naveen A,Unit Mngr,y,y,,y,y,y
S Puri,Driller,,,y,,y,y
N Nandan,Factory Mngr,y,y,y,y,y,
K Asif,Unit Mngr,,,y,y,y,y

Required:
We need to count total days an employee was present in that week, basically to count total number of non empty fields after the first two fields in each line (except the first hear line) or to be more specific counting total number of "y" for each line (after the 1st two fields)

The awk solution:

$ awk > 4w_prod_u_1.out '
BEGIN {print "Name[Num Days Present]"}
!/^#/ {cnt=0; for (i=3; i<=NF; i++)
{if ($i != "") {cnt++} } {print $1,"["cnt" days]"} }
' FS=, 4w_prod_u_1.txt

Output file:
$ cat 4w_prod_u_1.out
Name[Num Days Present]
Naveen A [5 days]
S Puri [3 days]
N Nandan [5 days]
K Asif [4 days]

or the

if ($i != "")

in the above solution can be

if ($i == "y")
will produce the same result.

10 comments:

JuanPablo said...

a awk question,
is posible use a bash command in awk
example, fila have epoch dates

cat file
1231254900
1231255800
1231256700

with
$ date -d @1231256700
Tue Jan 6 12:45:00 CLST 2009
convert to human date

how convert every vaulue in file to human date?
cat file | awk 'a=$(date -d @$1) {print a}'


thanks

Unknown said...

Juan,

One idea I have got

$ cat epochcnvrt
date -d @$1

$ awk '{ system("sh epochcnvrt "$1)}' file

Should work.
But there should be a good way of doing this. Let me give a thought. Thanks for commenting. Keep in touch.

Unknown said...

or directly you can put the command like this

$ awk '{ system("date -d @"$1)}' file

But the problem is

suppose your file is like this:

$ cat file.3
1231254900 f
1231255800 d
1231256700 n

$ awk '{ system("date -d @"$1)}{print $2}' file.3

Tue Jan 6 20:45:00 IST 2009
f
Tue Jan 6 21:00:00 IST 2009
d
Tue Jan 6 21:15:00 IST 2009
n

I am trying to see how we can get the following output:

Tue Jan 6 20:45:00 IST 2009 f
Tue Jan 6 21:00:00 IST 2009 d
Tue Jan 6 21:15:00 IST 2009 n

JuanPablo said...

awk '{system("echo -n $(date -d @" $1 " )")}{print "\t" $2}' file.3

thank you very much for your help

Unknown said...

@JuanPablo, thanks, I forgot of the echo -n , great. thanks for the comment. Its helpful for me.

Sumit said...

Hi,

I have a text file "mytest.txt" where columns are separated by ^A (Ctrl-A) characters:

ID^AName^ATitle
1^AJohn^AManager
2^APeter^AProgrammer
3^A^APresident
4^AMike^AProgrammer

and it contains several rows with second column as blank (3^A^APresident). How can print/write to a file all those rows with second column as blank?

Thanks,
Sumit

Unknown said...

@Sumit, you can something like this:

awk -F "^A" '$2==""' file.txt > somefile.txt

(^A as you know is CtrlV-CtrlA if you are copying above line)

Hope this helps.

Sumit said...

Hi Jadu,

I tried something like you suggested:

awk -F^A '{if ($2 == "") print $0 }' > somefile.txt

and it worked.

Thanks!
Sumit

Anirudh said...

perl -F, -pale '
$_ = "$F[0] [" . ($a=join",",@F[2..$#F])=~tr/y//d . "days]";
'

Unknown said...


FS=","; # your field separator
ATF="4w_prod_u_1.txt"; # your attendance filename

echo 'Name[Num Days Present]'

while IFS= read -r line
do
case $line in '' | "#"* ) continue;; esac

set -f; IFS=$FS; set -- $line

case $# in [0-2] ) continue;; esac

name=$1 knt=; shift 2

while case $# in 0 ) break;; esac
do knt=$1${1:+"$FS"}$knt; shift; done

set -- $knt
echo "$name[$# days]"
done < $ATF

© Jadu Saikia www.UNIXCL.com