Even though sed or awk is not suggested to work with XMLs (generally a programing language like Perl or Python is more preferred), there can be simple cases where we can make use of sed or awk to perform certain search and replace in simple XML s. Here is an simple example.
Input XML file:
$ cat test.xml
<?xml version="1.0" encoding="UTF-8"?>
<StudentInfo Version="1">
<Student>
<StudentId>INS469</StudentId>
<ClassId>21</ClassId>
<Amount>90</Amount>
<Location>AA</Location>
</Student>
<Student>
<StudentId>CSI150</StudentId>
<ClassId>71</ClassId>
<Amount>82</Amount>
<Location>AX</Location>
</Student>
<Student>
<StudentId>E687</StudentId>
<ClassId>12</ClassId>
<Amount>30</Amount>
<Location>AR</Location>
</Student>
</StudentInfo>
Required: Add value 20 t0 the existing 'Amount' values for each student entry in the above XML, i.e. the required output will be something like this:
<?xml version="1.0" encoding="UTF-8"?>
<StudentInfo Version="1">
<Student>
<StudentId>INS469</StudentId>
<ClassId>21</ClassId>
<Amount>110</Amount>
<Location>AA</Location>
</Student>
<Student>
<StudentId>CSI150</StudentId>
<ClassId>71</ClassId>
<Amount>102</Amount>
<Location>AX</Location>
</Student>
<Student>
<StudentId>E687</StudentId>
<ClassId>12</ClassId>
<Amount>50</Amount>
<Location>AR</Location>
</Student>
</StudentInfo>
The awk solution:
$ awk '
BEGIN { FS = "[<|>]" }
{
if ($2 == "Amount") {
sub($3,$3+20)
}
}
' test.xml
And to replace the 'Amount' value in the input XML with a constant number say '100' here is a sed solution:
$ sed 's#\(<Amount>\)[0-9]*\(</Amount>\)#\1'100'\2#g' test.xml
Output:
<?xml version="1.0" encoding="UTF-8"?>
<StudentInfo Version="1">
<Student>
<StudentId>INS469</StudentId>
<ClassId>21</ClassId>
<Amount>100</Amount>
<Location>AA</Location>
</Student>
<Student>
<StudentId>CSI150</StudentId>
<ClassId>71</ClassId>
<Amount>100</Amount>
<Location>AX</Location>
</Student>
<Student>
<StudentId>E687</StudentId>
<ClassId>12</ClassId>
<Amount>100</Amount>
<Location>AR</Location>
</Student>
</StudentInfo>
The \1 and \2 (Specifying which occurrence) used above can be well understood from few of the related posts I am mentioning below. Also there is a useful description of the same on grymoire.com
Related posts:
- Printing single quote with awk in bash
- Printing column of file using sed
- Use sed to replace part of a file in bash
- Convert date format using sed and awk
- Format lines using sed
3 comments:
Hi Juda, I have asked you about xml in post about "sum group by with awk". But I found my answer here, so I deleted my comments and would ask more in this section.
Similar with this sample, suppose I want to get Amount of StudentId E687 or I want to list all StudentId and Amount of ClassId 12. I prefer oneliner in awk. Please advise.
Best regards,
never use sed or awk to parse xml. Use xmlstarlet command
sed -e '
#
# implement increment a number by 20 in sed
# n + 20
#
# store the lsbit in hold space
# increment by 1 the remaining two times
# append the lsb to this result
\|$|,\|$|!b
\|\([1-9][0-9]*\)$|!b
h
s//\1/
s/[^0-9]//g
s/.$//
:incr
s/9\(X*\)$/X\1/
tincr
s/^\(X*\)$/0\1/
s/8\(X*\)$/9\1/
s/7\(X*\)$/8\1/
s/6\(X*\)$/7\1/
s/5\(X*\)$/6\1/
s/4\(X*\)$/5\1/
s/3\(X*\)$/4\1/
s/2\(X*\)$/3\1/
s/1\(X*\)$/2\1/
s/0\(X*\)$/1\1/
y/X/0/
x
/^\n/bdone
s/^/\n/
x
bincr
:done
H;g
s/^\([1-9][0-9]*\)\n\n\(.*\)[1-9][0-9]*\([0-9].*\)/\2\1\3/
' < test.xml
Post a Comment