[kwlug-disc] OCR to web

Insurance Squared Inc. gcooke at insurancesquared.com
Thu Jan 6 16:51:22 EST 2011

As some of you know, I scan out-of-copyright books and publish them on
the web. I've struggled with this process for years. I'd like your
input on the following:
- Does anyone know of decent Linux OCR software with a GUI that can
handle, say, a 500-page book?
- Let's say I've got the book(s) OCR'ed, so I have hundreds or thousands
of .txt files and the same number of image files.
     - How do I get those into a usable web platform, i.e. get them into
a CMS?
     - Which CMS suits this type of application?
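
To make the import question concrete, here is a minimal Python sketch of
the first step: pairing each page's OCR text with its scanned image. It
assumes (hypothetically) that a page's text and image share a base name,
e.g. page_0001.txt and page_0001.png, and that names are zero-padded so
they sort in page order. The function name build_manifest and the
dictionary keys are made up for illustration and do not belong to any
particular CMS.

```python
from pathlib import Path

# Assumed image extensions; adjust to whatever the scanner produces.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def build_manifest(text_dir, image_dir):
    """Pair each OCR .txt file with the image that shares its base name."""
    # Index images by base name (file name without extension).
    images = {
        p.stem: p
        for p in Path(image_dir).iterdir()
        if p.suffix.lower() in IMAGE_EXTS
    }
    pages = []
    # Sorting by path gives page order only if names are zero-padded.
    for txt in sorted(Path(text_dir).glob("*.txt")):
        image = images.get(txt.stem)
        pages.append({
            "page": txt.stem,
            "text": txt.read_text(encoding="utf-8", errors="replace"),
            # None marks a page whose image is missing.
            "image": str(image) if image else None,
        })
    return pages
```

The resulting list could be written out with json.dump and fed to
whatever bulk-import mechanism the chosen CMS offers; pages with no
matching image show up with "image" set to None so they can be checked
by hand.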

Any thoughts appreciated. I'm hoping to put another couple of projects
online shortly and don't want to use my old platform.


