[kwlug-disc] BASH compare items in two files

John Van Ostrand john at netdirect.ca
Wed Nov 3 14:37:47 EDT 2010


----- Original Message -----
> ----- Original Message -----
> > ----- Original Message -----
> > > A higher level language (python, perl, C, etc) program would be your
> > > best bet as they already have the XML parsing and data handling
> > > libraries that will make this task a cinch.
> > >
> > > 1. Load the list of IDs into a searchable list
> > > 2. Using a SAX parser compare the ID of every node against the list
> > > 3. Done!
> >
> > Oops I should read more carefully. He asked what a real programmer
> > would do.
> 
> So my take on it is this. Use the document object model in some
> language. I presume that's what sax does. Javascript can do this
> (jQuery), as can the perl-XML-XPathEngine. I suspect php-domxml can as
> well. One uses a query language to search for the XML tags and
> iterates through them looking for matches or misses.

Here is a PHP version:

#!/usr/bin/php
<?php

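# Load ID file
if (!$fd = fopen("/tmp/id.txt", 'r')) {
	die("Unable to open /tmp/id.txt\n");
}

$id_list = array();
# fgets() returns false at end of file, so no separate feof() check is needed
while (($line = fgets($fd)) !== false) {
	# Trim whitespace
	$line = trim($line);
	# Skip blank lines, then add to array
	if ($line !== '') {
		$id_list[$line] = true;
	}
}
fclose($fd);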

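# Load the XML file (load() is an instance method, so create the object first)
$d = new DOMDocument();
if (!$d->load("/tmp/id.xml")) {
	die("Unable to parse /tmp/id.xml\n");
}

# Get a list of all <foo> tags
$nl = $d->getElementsByTagName("foo");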

# Cycle through them
$match = 0;
$nomatch = 0;
$l = $nl->length;
for ($i = 0; $i < $l; $i++) {
	# Extract ID (empty string if the attribute is missing)
	$id = $nl->item($i)->getAttribute("id");
	# Check array:
	if (isset($id_list[$id])) {
		$match++;
	} else {
		$nomatch++;
	}
}


echo "Matching    : $match\n";
echo "Non-matching: $nomatch\n";
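
The "query language" approach mentioned above could look roughly like the sketch below, which uses DOMXPath from PHP's DOM extension instead of walking the node list by index. The inline $ids and $xml strings are made-up sample data standing in for /tmp/id.txt and /tmp/id.xml, so treat this as an illustration under those assumptions rather than a drop-in replacement.

```php
<?php

# Hypothetical sample data standing in for /tmp/id.txt and /tmp/id.xml
$ids = "a1\nb2\nc3\n";
$xml = '<root><foo id="a1"/><foo id="zz"/><bar><foo id="c3"/></bar><foo/></root>';

# Build a lookup table keyed by ID, skipping blank lines
$id_list = array_fill_keys(
	array_filter(array_map('trim', explode("\n", $ids)), 'strlen'),
	true
);

# Parse the XML and prepare an XPath evaluator for it
$d = new DOMDocument();
$d->loadXML($xml);
$xp = new DOMXPath($d);

$match = 0;
$nomatch = 0;

# //foo/@id selects the id attribute of every <foo> that has one
foreach ($xp->query('//foo/@id') as $attr) {
	if (isset($id_list[$attr->value])) {
		$match++;
	} else {
		$nomatch++;
	}
}

echo "Matching    : $match\n";
echo "Non-matching: $nomatch\n";
```

With this sample data it reports 2 matching and 1 non-matching. One difference from the loop version: a <foo> with no id attribute is not selected by the query at all, so it is not counted as a non-match.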

-- 
John Van Ostrand 
CTO, co-CEO 
Net Direct Inc. 
564 Weber St. N. Unit 12, Waterloo, ON N2L 5C6 
Ph: 866-883-1172 x5102 
Fx: 519-883-8533 

Linux Solutions / IBM Hardware 



