Friday, February 20, 2009

Print characters before and after a pattern - awk


e.g.

$ var="abcdefghij0123456789"

$ echo "$var"
abcdefghij0123456789

To print the 3 characters before the pattern "h" in $var:
$ echo "$var" | awk 'match($0,"h"){print substr($0,RSTART-3,3)}'

Output:
efg

To print the 4 characters after the pattern "h" in $var (RSTART+1 skips the one-character match itself):
$ echo "$var" | awk 'match($0,"h"){print substr($0,RSTART+1,4)}'

Output:
ij01

Combining the two, i.e. printing the 3 characters before "h" and the 4 characters after "h" (the comma in print puts a space between them):
$ echo "$var" | awk 'match($0,"h"){print substr($0,RSTART-3,3),substr($0,RSTART+1,4)}'

Output:
efg ij01
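
The one-liners above assume a one-character pattern and a match that sits at least 3 characters into the string. Here is a minimal sketch of a more general version. The function name context_around and the awk variables pat, before and after are hypothetical names chosen for this illustration. It uses RSTART + RLENGTH to skip a pattern of any length, and clamps the start position explicitly because awk implementations have not always agreed on how substr() treats a start below 1.

```shell
# Hypothetical helper: print up to "before" characters preceding the first
# match of a pattern and up to "after" characters following it.
# Usage: context_around STRING PATTERN BEFORE AFTER
context_around() {
    printf '%s\n' "$1" | awk -v pat="$2" -v before="$3" -v after="$4" '
    match($0, pat) {
        start = RSTART - before
        len = before
        # Clamp when the match is closer to the start than "before" characters
        if (start < 1) { len = RSTART - 1; start = 1 }
        # RSTART + RLENGTH is the first character after the whole match
        print substr($0, start, len), substr($0, RSTART + RLENGTH, after)
    }'
}

context_around "abcdefghij0123456789" h 3 4
context_around "abcdefghij0123456789" b 3 4
```

The first call prints "efg ij01", the same as the combined one-liner. The second prints "a cdef", since only one character precedes "b".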

Some awk terms:

match(string, regex): Returns the position (counting from 1) where the leftmost match of the regular expression regex begins in string, or 0 if there is no match. Sets the RSTART and RLENGTH variables.

substr(string, start [,length]): Returns length characters from string, starting at position start (counting from 1). If length is not specified, returns the rest of the string.

RSTART: Index of the first character matched by a successful call to the match() function; 0 if the match failed.

RLENGTH: Length of the string matched by a successful call to the match() function; -1 if the match failed.
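
To see these terms in action, here is a small sketch that prints the return value of match() together with RSTART and RLENGTH. The function name match_info and the awk variable re are made up for this illustration.

```shell
# Hypothetical helper: show what match() returns and what it sets.
# Usage: match_info STRING REGEX
match_info() {
    printf '%s\n' "$1" | awk -v re="$2" '{
        pos = match($0, re)          # also sets RSTART and RLENGTH
        print pos, RSTART, RLENGTH
    }'
}

match_info "abcdefghij0123456789" '[0-9]+'
match_info "abcdefghij0123456789" 'xyz'
```

The first call prints "11 11 10": the digits start at position 11 and the match is 10 characters long. The second prints "0 0 -1", the values for a failed match.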

Related posts:
- Awk substr function explained
- Replace digit with serial number - awk
- Check for presence of a pattern in a line - awk

2 comments:

sync said...

Great stuff!

About combining: how would you do it if you wanted everything in between 2 patterns?
Let's say I would like to print "ghi" from "abcdefghij0123456789", with pattern 1 = "def" and pattern 2 = "j0".

Thanks.

Unknown said...

@synchro, thanks for commenting.

Well, I can think of this solution using sed

$ echo "abcdefghij0123456789def" | sed 's/.*def\(.*\)j0.*/\1/'

Let me think of the awk solution. Keep in touch.
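
One possible awk approach, offered only as a sketch: match the first pattern, keep what follows it, then match the second pattern in that remainder. The function name between and the awk variables p1, p2 and rest are hypothetical names for this illustration. Unlike the greedy sed expression above, this picks the first occurrence of pattern 1 and the first occurrence of pattern 2 after it.

```shell
# Hypothetical helper: print the text between the first match of P1
# and the first match of P2 that follows it.
# Usage: between STRING P1 P2
between() {
    printf '%s\n' "$1" | awk -v p1="$2" -v p2="$3" '
    match($0, p1) {
        rest = substr($0, RSTART + RLENGTH)   # everything after pattern 1
        if (match(rest, p2))                  # RSTART now refers to rest
            print substr(rest, 1, RSTART - 1)
    }'
}

between "abcdefghij0123456789" def j0
```

For the string in the question this prints "ghi".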

© Jadu Saikia www.UNIXCL.com