[kwlug-disc] BASH compare items in two files

Raul Suarez rarsa at yahoo.com
Wed Nov 3 14:55:08 EDT 2010

--- On Wed, 11/3/10, John Van Ostrand <john at netdirect.ca> wrote:

> From: John Van Ostrand <john at netdirect.ca>
> Subject: Re: [kwlug-disc] BASH compare items in two files
> To: "KWLUG discussion" <kwlug-disc at kwlug.org>
> Received: Wednesday, November 3, 2010, 2:17 PM
> ----- Original Message -----
> > ----- Original Message -----
> > > A higher level language (python, perl, C, etc) program would be your
> > > best bet as they already have the XML parsing and data handling
> > > libraries that will make this task a cinch.
> > >
> > > 1. Load the list of IDs into a searchable list
> > > 2. Using a SAX parser compare the ID of every node against the list
> > > 3. Done!
> >
> > Oops I should read more carefully. He asked what a real programmer
> > would do.
> So my take on it is this. Use the document object model in
> some language. I presume that's what sax does.

No, SAX and DOM are two different ways of handling XML.

In SAX, processing is sequential: the parser reads the document top-down, one node at a time.
In DOM, you load the whole document into memory and then query (or traverse) it.

DOM gives more flexibility.
SAX is simpler to implement, has lower resource requirements (it never holds the whole document in memory) and is usually faster.
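To make the DOM side of the contrast concrete, here is a minimal sketch using Python's standard xml.dom.minidom. The sample document, the item element and the name attribute holding the ID are assumptions for illustration only. Note that the whole tree is built before any query runs: that is where DOM's flexibility comes from, and also its memory cost.

```python
import xml.dom.minidom

# Hypothetical sample data: <item> elements carrying their ID in a "name" attribute.
SAMPLE = """<items>
  <item name="a1"/>
  <item name="b2"/>
  <item name="a1"/>
</items>"""

def count_with_dom(xml_text, wanted_ids):
    """Load the whole document into a tree, then query it."""
    doc = xml.dom.minidom.parseString(xml_text)  # entire tree is now in memory
    # Because the tree already exists, it can be queried in any order.
    items = doc.getElementsByTagName("item")
    # getAttribute() returns "" when the attribute is missing.
    return sum(1 for node in items if node.getAttribute("name") in wanted_ids)

print(count_with_dom(SAMPLE, {"a1"}))  # 2
```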

Usually, with a SAX parser you just pass the file name and an event handler (a callback object) whose methods are called for each node (event).
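A minimal sketch of that pattern with Python's standard xml.sax module: xml.sax.parse() takes the file name and a ContentHandler, then calls the handler's startElement and endElement methods as it streams through the file. The TagLogger class and the temporary sample file are made up for illustration. Recording the events shows the sequential, top-down order described above.

```python
import os
import tempfile
import xml.sax

class TagLogger(xml.sax.ContentHandler):
    """Hypothetical event handler: records each event in the order SAX fires it."""
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append("start " + name)

    def endElement(self, name):
        self.events.append("end " + name)

def log_tags(path):
    handler = TagLogger()
    # Pass the file name and the handler; parse() drives the callbacks.
    xml.sax.parse(path, handler)
    return handler.events

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as f:
        f.write("<root><a/><b><c/></b></root>")
        path = f.name
    try:
        for event in log_tags(path):
            print(event)
    finally:
        os.remove(path)
```

Running it prints start root, start a, end a, start b, start c, end c, end b, end root, one per line.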

Here is a simple example, 

I think that for Richard's purpose the code would be even simpler: he is just counting, not storing the data, so there is no need for the characters or endElement methods.

The __init__ and startElement methods would look like this:

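import xml.sax

# IDCounter is a hypothetical name; the original showed only the two methods.
class IDCounter(xml.sax.ContentHandler):
    def __init__(self, idFile):
        super().__init__()
        self.nodesDict = {}  # element name -> count of nodes with a listed ID
        # Load the IDs into a set (assumes one ID per line in idFile).
        with open(idFile) as f:
            self.setOfIDs = {line.strip() for line in f if line.strip()}

    def startElement(self, name, attrs):
        # Called once per opening tag; counting needs no other callbacks.
        self.nodesDict.setdefault(name, 0)
        nodeID = attrs.get('name', "")  # ID attribute, as in the original
        if nodeID in self.setOfIDs:
            self.nodesDict[name] += 1

# Usage (file names are placeholders):
#   handler = IDCounter("ids.txt")
#   xml.sax.parse("data.xml", handler)
#   print(handler.nodesDict)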

Raul Suarez

Technology consultant
Software, Hardware and Practices
Twitter: rarsamx
An eclectic collection of random thoughts
