[kwlug-disc] Image Comparison

Chris Irwin chris at chrisirwin.ca
Thu Aug 13 14:54:36 EDT 2009


(Sorry if this doesn't thread properly. I use IMAP via gmail and it
doesn't echo my own mail back to me, so I can't actually reply to myself
on-list.)

I've got a bit of an update on my progress.

On Tue, 2009-08-11 at 20:32 -0400, Chris Irwin wrote:
> Does anybody know of any way of mass comparing jpeg files by a image
> content rather than a file sum or name? I've got two sets of several
> thousand images I need to sort through. Basically I want to find unique
> files in directories A & B, deleting duplicates from B.

ImageMagick has a utility called 'compare' that will take two images as
input, and output a third image illustrating the difference (it seems
to create a composite and overlay in red where they don't match).

But by specifying '-metric AE' it will also output the total number of
pixels that differ between the two. This is essentially what I want
(not necessarily pixel count, just some sort of indicator that the
image content is not the same). There are other metrics to guess
percentage difference (that should cover resizing and things like
that), but that is not necessary for my task.

Here is an example. Kelley315.orig.jpg is the file from my uncle's
photos. Kelley315.jpg is the version I have now in my library after
import: it has been modified to include tags and other such things in
the EXIF data. Kelley316.jpg is a completely different image, included
simply for comparison.

$ ls -l Kelley*
-r-x------ 1 chris users 74904 2009-08-13 14:26 Kelley315.jpg
-r-x------ 1 chris users 74280 2009-08-13 14:12 Kelley315.orig.jpg
-r-x------ 1 chris users 69335 2009-08-13 14:39 Kelley316.jpg

$ md5sum Kelley*
91ff4dedc2cf37f34f9bd42016fa27e4  Kelley315.jpg
3db86a3eec1bf49b05f6caf0cd36ea4d  Kelley315.orig.jpg
1cb082952c06e81a095529003c7d1d90  Kelley316.jpg


As you can see, all three files have different sizes and md5 sums,
despite the first two being identical images other than EXIF data.

$ compare -metric AE Kelley315.orig.jpg Kelley315.jpg \
  /dev/null 2>&1
0

The compare command above (split over two lines) compares the identical
images, writing the composite comparison image to /dev/null for safe
keeping, and outputting the actual pixel difference in image content.
The count comes out on stderr, so I'm redirecting it to stdout so I can
use this in a script. 0 is a good result here :)


$ compare -metric AE Kelley315.orig.jpg Kelley316.jpg \
  /dev/null 2>&1
426270

Here is a comparison of different images. There are a couple of
differing pixels. The image is 533x800, so there are only 426400 pixels
to begin with, which makes this about as poor a match as you can get.
For my purposes, anything >0 is not a match, so that cuts down on some
comparisons.
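
To put that count in proportion, it can be divided by the total number
of pixels. This is only a rough sketch: percent_differing is a made-up
helper name, and the two numbers are the ones quoted above.

```shell
# Hypothetical helper: percentage of pixels that differ.
# $1 = differing pixel count (as printed by compare -metric AE)
# $2 = total number of pixels in the image
percent_differing() {
    awk -v diff="$1" -v total="$2" \
        'BEGIN { printf "%.2f\n", diff / total * 100 }'
}

percent_differing 426270 426400   # prints 99.97
```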

I've now got something I can use in a script. Thanks to those who
emailed me off-list with suggestions :)
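
For anyone curious, a script built on this might look roughly like the
sketch below. The function names and the idea of passing two directories
are my own placeholders, nothing is deleted (it only lists candidates),
and it assumes compare prints a bare number as in the output above.
Comparing every file in B against every file in A is slow for thousands
of images, so treat it as a starting point only.

```shell
# Print the number of differing pixels between two images.
# compare writes the metric to stderr, so redirect it to stdout.
pixel_diff() {
    compare -metric AE "$1" "$2" /dev/null 2>&1
}

# Succeed only if compare reports exactly 0 differing pixels.
# Any error message from compare is treated as "not a match".
same_image() {
    [ "$(pixel_diff "$1" "$2")" = "0" ]
}

# List each .jpg in directory $2 whose image content matches
# some .jpg in directory $1. Stops at the first match per file.
list_duplicates() {
    for img_b in "$2"/*.jpg; do
        for img_a in "$1"/*.jpg; do
            if same_image "$img_a" "$img_b"; then
                echo "$img_b"
                break
            fi
        done
    done
}
```

Running 'list_duplicates A B' would then print the files in B that
already exist in A, and the list can be reviewed before removing
anything.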

-- 
Chris Irwin <chris at chrisirwin.ca>