tips for parsing xml in a shell script?

Hello,

I'm writing a shell script to extract info from an xml file to create various text files that share data in the form of custom entities declared at the top of the document (product id, name, version, copyright date, etc.). So far I'm using xmllint which is working well. I'm extracting text for my project files using xmllint --shell:
echo cat lls_cd_product/plist | xmllint --noent --shell xyz.xml | sed -e '1d; $d'
This works, but --shell mode adds an extra line at the beginning and end, which I let sed clean up, and then I still need to strip the enclosing tags as well.
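
Both cleanup steps could be folded into one small helper. This is only a rough sketch: the name strip_wrapper is made up, the input below is simulated rather than real xmllint output, and it assumes the opening tag starts the first remaining line and the closing tag ends the last one.

```shell
#!/bin/sh
# strip_wrapper TAG: hypothetical helper, not part of xmllint.
# Reads "xmllint --shell" output on stdin, drops the first and last
# lines (as sed '1d; $d' does), then removes the opening <TAG ...>
# from the first remaining line and the closing </TAG> from the last.
# A line left empty by that removal is deleted.
strip_wrapper() {
    tag=$1
    sed -e '1d; $d' |
    sed -e "1{s|^[[:space:]]*<${tag}[^>]*>||;/^\$/d;}" \
        -e "\${s|</${tag}>[[:space:]]*\$||;/^\$/d;}"
}

# Simulated input standing in for real xmllint --shell output.
printf '%s\n' 'first extra line' '<plist version="1.0">' '<dict/>' '</plist>' 'last extra line' |
    strip_wrapper plist
```

In the pipeline above it would replace the final sed, e.g. `... | xmllint --noent --shell xyz.xml | strip_wrapper plist`.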

Xmllint does a good job of filling in my entities with the exception of MacRoman bullet characters that I've entered as &#165;. These are just converted to the same entity in hex, &#xA5;. I suppose I can replace those with sed, but that's starting to feel kludgey.
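
A likely reason for this: numeric character references name Unicode code points, not MacRoman byte values. So &#165; is U+00A5 (the yen sign), and xmllint is just rewriting it in hex. The bullet is U+2022, i.e. &#8226; or &#x2022;. Changing the source document is probably the cleaner fix. If sed stays in the pipeline, a stopgap might look like this (fix_bullets is an invented name):

```shell
#!/bin/sh
# fix_bullets: hypothetical filter.
# Rewrites the yen-sign reference (&#xA5; or &#165;) to the Unicode
# bullet reference &#x2022;. The "\&" in the replacement stops sed from
# treating "&" as "the matched text".
fix_bullets() {
    sed -e 's/&#xA5;/\&#x2022;/g' -e 's/&#165;/\&#x2022;/g'
}

printf '%s\n' '&#xA5; 64 MB available RAM' | fix_bullets
```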

I'm beginning to wonder if there may be a better way to do this. I'm not very familiar with XSL, but I am considering ramping up on that.

Any suggestions would be welcome.

--
Cole

15 PB, Mac OS X (10.3.x)

Posted on Apr 1, 2008 8:41 AM

7 replies

Apr 1, 2008 11:45 AM in response to etresoft

etresoft wrote:
It may be a few hours before I get time to work on it.


Or not.

Here is your XML file:
<?xml version="1.0" encoding="UTF-8"?>
<info>
<ver>2.1.1</ver>
<tag>abc</tag>
<name>Product Name</name>
<filename>productname</filename>
<copyright>2008</copyright>
</info>

This generates the cd product file:
<?xml version="1.0"?>
<xsl:stylesheet version = '1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:output method="xml" indent="yes" version="1.0" encoding="UTF-8"
doctype-system="http://www.apple.com/DTDs/PropertyList-1.0.dtd"
doctype-public="-//Apple Computer//DTD PLIST 1.0//EN"/>
<!-- You could access additional data from some other XML file. -->
<!-- <xsl:variable name="data" select="document('data.xml')/data"/> -->
<xsl:template match="info">
<plist version="1.0">
<dict>
<key>hfs-openfolder</key>
<string>.</string>
<key>hfs-volume-name</key>
<string><xsl:value-of select="name"/></string>
<key>hide-hfs</key>
<string>./{Norton,*.txt,*.exe,.inf}</string>
<key>hide-iso</key>
<string>./{PDS,*Rename,readme,.Volume*,Norton*,*icns,Run*.app,Icon*,Desktop*,TheFolder}</string>
<key>hide-joliet</key>
<string>./{PDS,*Rename,readme,.Volume*,Norton*,*icns,Run*.app,Icon*,Desktop*,TheFolder}</string>
<key>iso-volume-name</key>
<string><xsl:value-of select="tag"/></string>
<key>joliet-volume-name</key>
<string><xsl:value-of select="name"/></string>
</dict>
</plist>
</xsl:template>
</xsl:stylesheet>

and this generates the Mac read me file in XML:

<?xml version="1.0"?>
<xsl:stylesheet version = '1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:output method="xml" indent="yes" version="1.0" encoding="UTF-8"/>
<!-- You could access additional data from some other XML file. -->
<!-- <xsl:variable name="data" select="document('data.xml')/data"/> -->
<xsl:template match="info">
<!-- You don't have to deal with these entities anymore. You should be
able to save this xsl file in UTF-8 format and just type in the
bullets. But the entities work too. -->
<readme_mac><xsl:value-of select="concat(name, ' ', ver)"/>
Copyright <xsl:value-of select="copyright"/>, Laureate Learning Systems®, Inc.
Minimum System Requirements:
• 300 MHz or faster PowerPC, Intel or better CPU
• Mac OS 8.1 or later, including any Mac OS X
• 64 MB available RAM
• 60 MB available disk space
</readme_mac>
</xsl:template>
</xsl:stylesheet>

This one will output the readme in HTML:

<?xml version="1.0"?>
<xsl:stylesheet version = '1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:output method="html" indent="yes" version="4.0" encoding="UTF-8"/>
<!-- You could access additional data from some other XML file. -->
<!-- <xsl:variable name="data" select="document('data.xml')/data"/> -->
<xsl:template match="info">
<!-- You don't have to deal with these entities anymore. You should be
able to save this xsl file in UTF-8 format and just type in the
bullets. But the entities work too. -->
<html>
<head>
<title><xsl:value-of select="concat(name, ' ', ver)"/></title>
</head>
<body>

<xsl:value-of select="concat(name, ' ', ver)"/>


Copyright <xsl:value-of select="copyright"/>, Laureate Learning Systems®, Inc.


  • Minimum System Requirements:
  • 300 MHz or faster PowerPC, Intel or better CPU
  • Mac OS 8.1 or later, including any Mac OS X
  • 64 MB available RAM
  • 60 MB available disk space

</body>
</html>
</xsl:template>
</xsl:stylesheet>


Yes. I do enjoy XSL quite a bit 🙂
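
To tie this back to the shell script: stylesheets like these can be run with xsltproc, the command-line tool that comes with libxslt. Here is a minimal sketch. The file names (info.xml, title.xsl) are made up, and a deliberately tiny text-output stylesheet stands in for the ones above so the whole thing fits in one script.

```shell
#!/bin/sh
# Sketch: run an XSL transform from a shell script with xsltproc.
# File names and the small stylesheet are illustrative assumptions.
workdir=$(mktemp -d)

cat > "$workdir/info.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<info>
<ver>2.1.1</ver>
<tag>abc</tag>
<name>Product Name</name>
<filename>productname</filename>
<copyright>2008</copyright>
</info>
EOF

cat > "$workdir/title.xsl" <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="info">
<xsl:value-of select="concat(name, ' ', ver)"/>
<xsl:text>&#10;</xsl:text>
</xsl:template>
</xsl:stylesheet>
EOF

# xsltproc takes the stylesheet first, then the input document.
if command -v xsltproc >/dev/null 2>&1; then
    title=$(xsltproc "$workdir/title.xsl" "$workdir/info.xml")
    echo "$title"
else
    echo "xsltproc not found" >&2
fi

rm -rf "$workdir"
```

For the real stylesheets, something like `xsltproc -o abc.plist cdproduct.xsl info.xml` would write the result to a file (cdproduct.xsl being whatever name the first stylesheet is saved under).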

Apr 1, 2008 9:47 AM in response to Cole Tierney

Cole Tierney wrote:
Xmllint does a good job of filling in my entities with the exception of MacRoman bullet characters that I've entered as &#165;. These are just converted to the same entity in hex, &#xA5;. I suppose I can replace those with sed, but that's starting to feel kludgey.


I know you are the master of sed, but for this you might want to take a look at Perl. See if you can get XML::Simple to read the data directly into a Perl hash. I need true event-driven parsing, so I'm now using XML::Parser. There are about a dozen other modules to choose from as well.

Also, NSDictionary has quite a few XML abilities. It could probably read this data too. This might be a good option if you can't or don't want to install lots of Perl modules.

I'm beginning to wonder if there may be a better way to do this. I'm not very familiar with XSL, but I am considering ramping up on that.


I think XSL could replace those entities with the correct text, but XSL doesn't sound right for this application. Don't get me wrong, I love XSL, but Perl would be easier.

Could you post some example input and desired output?

Apr 1, 2008 10:36 AM in response to etresoft

etresoft wrote:
I think XSL could replace those entities with the correct text, but XSL doesn't sound right for this application. Don't get me wrong, I love XSL, but Perl would be easier.

Could you post some example input and desired output?


Thanks for the feedback. Perl is sounding like a good balance. I'll try to summarize what I'm trying to do. For each of our cd products I have a set of text files:
1.) a plist to specify hdiutil makehybrid options
2.) a plist for mac installer
3.) an ini file shared by both mac and windows installers
4.) an autorun.inf for windows
5.) mac and windows readme files
6.) misc. data related to the project

Each of these files contains strings common to all. I normally just hand edit these, but it's easy to miss a version number or copyright date here and there. If I combine all these files into a single xml doc, I can declare custom entities at the top and use those throughout the file. This will go a long way toward reducing my optical coaster production rate. 🙂

Here's an abbreviated version of my xml file.

--
Cole

Apr 1, 2008 11:04 AM in response to Cole Tierney

Here are the two files that would be created from abc.xml:
abc.plist
macreadme (those are supposed to be MacRoman bullet characters, Option-8)

I'm still trying to figure out a nice way to tuck a plist into an xml doc. My current example cheats by hiding the first lines of it in a comment and relies on the shell script to write those two lines.
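
One possible way around the comment trick is to have the script always write the XML declaration and DOCTYPE itself, and keep only the plist element in the source document. This is a sketch under assumptions: with_plist_header is an invented name, and the body is assumed to arrive on stdin. The DOCTYPE identifiers are the ones used for Apple property lists.

```shell
#!/bin/sh
# with_plist_header: hypothetical helper.
# Prints the XML declaration and the plist DOCTYPE, then copies stdin
# (expected to be the <plist>...</plist> element) through unchanged.
with_plist_header() {
    printf '%s\n' \
        '<?xml version="1.0" encoding="UTF-8"?>' \
        '<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">'
    cat
}

# Simulated plist body; in practice this would come from the extraction step.
printf '%s\n' '<plist version="1.0">' '<dict/>' '</plist>' | with_plist_header
```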

--
Cole

Apr 1, 2008 11:12 AM in response to Cole Tierney

Cole Tierney wrote:
Thanks for the feedback. Perl is sounding like a good balance.


Actually, from this post, XSL seems like it might be a better fit. Or, perhaps a mix of XSL and Perl. Since you are creating your own source XML file, XSL becomes a much more viable option for someone who doesn't know XSL that well. Just because some random file is XML, doesn't mean it isn't a nightmare to parse. If you are creating your own XML files, you can create them correctly and they'll be easy to parse or transform with XSL.

Each of these files contains strings common to all. I normally just hand edit these, but it's easy to miss a version number or copyright date here and there. If I combine all these files into a single xml doc, I can declare custom entities at the top and use those throughout the file. This will go a long way toward reducing my optical coaster production rate. 🙂


I do something very similar. I have a system that uses Perl to ingest satellite and aerial data. I currently have about 100 TB of image data and 126K database records. For each scene, I create an XML file that has all the metadata I need. Then, I use XSL to generate an HTML description for each scene and a Google Earth KML file for the whole batch. I have a Perl module that spits out shapefiles and CSV files for the old-school types. The XSL uses data from the database as well as some data from a static XML file. I can change the output quite a bit without ever touching any code.

You could do something very similar starting with one or more XML files that you create. Start off with your data in a Perl hash, that can be easily output in XML or other formats. Then, for each of your outputs:
1.) a plist to specify hdiutil makehybrid options

Easy XSL transform

2.) a plist for mac installer

Another easy XSL transform

3.) an ini file shared by both mac and windows installers

You could do these with XSL transforms, but it would be easier and more maintainable to do it in Perl.

4.) an autorun.inf for windows

Same as the ini file. XSL can do non-XML, but the further you get from XML, the messier your XSL gets.

5.) mac and windows readme files

You didn't specify the format for these. If you can output in HTML, it is easy. I have also used XSL to generate RTF files.

6.) misc. data related to the project

Well, there is always Perl for "other" data.

I will see if I can hack up something for you with XML and XSL. It may be a few hours before I get time to work on it.
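
For the simplest non-XML outputs, the shell script itself may be enough, with no Perl or XSL involved. Here is a rough sketch for autorun.inf, assuming the shared values live in shell variables. The installer name setup.exe and the function name are invented; name and version come from the sample data in this thread.

```shell
#!/bin/sh
# Shared values (taken from the sample info.xml in this thread).
name='Product Name'
ver='2.1.1'

# Hypothetical installer name; not from the thread.
setup_exe='setup.exe'

# make_autorun: hypothetical helper that prints an autorun.inf.
# Each line ends in CRLF, the usual convention for Windows text files.
make_autorun() {
    printf '%s\r\n' \
        '[autorun]' \
        "open=$setup_exe" \
        "label=$name $ver"
}

# Prints to stdout; a real script would redirect this to autorun.inf.
make_autorun
```

Because the version and name are set once at the top, a missed edit in one of the generated files becomes much less likely.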

