<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Dec 5, 2013 at 4:46 PM, Chris Frey <span dir="ltr"><<a href="mailto:cdfrey@foursquare.net" target="_blank">cdfrey@foursquare.net</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="im"><span style="color:rgb(34,34,34)">XSD is your table definition.</span><br>

</div>

XML is your database.<br>

XPATH is your SQL.<br>

XSLT is your programming language.<br><br></blockquote><div><br></div><div>Good analogy.</div><div><br></div><div>> <span style="font-family:arial,sans-serif;font-size:13px">copious free time</span></div><div><br></div>

<div>That's the rub. For me, XSLT was unfamiliar enough, and difficult enough to debug, that doing my task in it was not feasible. For more complex tasks, it is worth considering an XML processing library in your favourite programming language, as John suggested. Generally speaking, such libraries take one of two approaches.</div>

<div><br></div><div>The first approach is a streaming parser that reads the file and raises events (callbacks into your code) as it encounters elements, text and other parts of the document. The keyword to search for is SAX (Simple API for XML). The main benefit of the SAX-based approach is that you can process files essentially as large as you can store, since the document never needs to fit in memory all at once.</div>
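<div><br></div><div>To make the event-driven style concrete, here is a minimal sketch using Python's standard xml.sax module. The sample document, the TitleCollector class and the collect_titles helper are all made up for illustration; they are not from this thread.</div>

```python
import xml.sax

# Hypothetical sample document, invented for this illustration.
SAMPLE = b"""<catalog>
  <book id="b1"><title>First</title></book>
  <book id="b2"><title>Second</title></book>
</catalog>"""


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every title element as the parser streams by."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._chunks = []

    def startElement(self, name, attrs):
        # Called when the parser reads an opening tag.
        if name == "title":
            self._in_title = True
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._in_title:
            self._chunks.append(content)

    def endElement(self, name):
        # Called when the parser reads a closing tag.
        if name == "title":
            self.titles.append("".join(self._chunks))
            self._in_title = False


def collect_titles(data):
    """Parse XML bytes and return the title texts in document order."""
    handler = TitleCollector()
    xml.sax.parseString(data, handler)
    return handler.titles


print(collect_titles(SAMPLE))
```

<div>For a genuinely large file you would probably pass a filename or an open stream to xml.sax.parse() instead of loading the bytes first. Note that the handler has to keep track of where it is in the document itself, which is the price of not holding the whole tree in memory.</div>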

<div><br></div><div>Another approach is for the library to build a hierarchical (tree) representation of the whole document, which can then be queried, often using XPath. The keyword here is DOM (Document Object Model). This generally uses more memory, since the entire document is held at once, but it is often considerably more convenient. These days, I think this is probably the better approach unless you have extraordinarily large documents.</div>
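<div><br></div><div>As a rough sketch of the tree-based style, here is the same kind of lookup using Python's standard xml.etree.ElementTree, which supports only a limited subset of XPath. The sample document and the title_of helper are hypothetical, invented for this example.</div>

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = """<catalog>
  <book id="b1"><title>First</title></book>
  <book id="b2"><title>Second</title></book>
</catalog>"""


def title_of(root, book_id):
    """Return the title of the book with the given id, or None if absent."""
    # ElementTree's XPath subset includes attribute predicates like this one.
    node = root.find(f".//book[@id='{book_id}']/title")
    return None if node is None else node.text


# The whole document is parsed into an in-memory tree up front.
root = ET.fromstring(SAMPLE)

# Once the tree exists, queries are one-liners.
all_titles = [t.text for t in root.findall("./book/title")]

print(all_titles)
print(title_of(root, "b2"))
```

<div>Compared with the event-driven style, there is no state to track by hand: you ask the tree for what you want. If you need full XPath 1.0, a third-party library is probably the way to go.</div>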

</div></div></div>