[kwlug-disc] OCR to web

Chris Frey cdfrey at foursquare.net
Thu Jan 6 17:21:06 EST 2011


On Thu, Jan 06, 2011 at 04:51:22PM -0500, Insurance Squared Inc. wrote:
> As some of you know, I scan out of copyright books and publish them on 
> the web.  I've struggled with this process for years.  I'd like your 
> input on the following:
> - any knowledge of decent Linux OCR software with a GUI that will let me
> OCR, say, a 500-page book?
> - let's say I've got the book(s) OCR'ed.  So I've got hundreds or thousands
> of .txt files and the same number of image files.
>     - how do I get those into a usable web platform, i.e., get them into
> a CMS?
>     - what CMS suits this type of application?


Anyone know what Project Gutenberg uses?  They have text and HTML versions
of their books, and some of the HTML versions look pretty good.
Might be worth asking them.

- Chris
