[time-nuts] OT magazine scanning

bg at lysator.liu.se
Sat Mar 4 03:14:29 EST 2006


> I'm not sure how useful scanned magazines are going to be to you.  The
> scanning produces graphical images of the pages and not OCR'd text and
> graphics (The DocuTech excepted).  You can't search for keywords and
> such unless you build a separate database.
>
> A friend of mine has been on an almost fanatical quest to digitize
> every car magazine published during the muscle car era.  He's mostly
> succeeded.  Many, many gigabytes.  He gave me a copy.  It's mostly
> worthless since it can't be searched.  It is fun to sit down and page
> through old issues of Hot Rod and the like but as a useful tool for
> finding specifics, it's no good.
>
> John

This is the same problem that projects like Gutenberg (.org) and
Runeberg (also .org) face when digitizing old books. The actual scanning
is just the first part of the job. You still have the OCR and then the
proofreading to do. But given enough interest and effort, the problem is
solvable, as those projects have shown. I am sure descriptions of their
exact workflows are available for reuse.

--

    Björn Gabrielsson



