Re: Scanned Docs to Revise

Subject: Re: Scanned Docs to Revise
From: Dick Margulis <margulisd -at- comcast -dot- net>
To: "TECHWR-L" <techwr-l -at- lists -dot- techwr-l -dot- com>
Date: Sat, 18 Jun 2005 04:58:20 -0400

twriter01 -at- hotmail -dot- com wrote:

> Help! Suggestions on getting hundreds of scanned docs (.pdf) into MS Word
> format. These docs were hard copies scanned into .pdf format.
>
> Only options I can think of are re-typing everything or using
> voice-recognition. Any suggestions? Any technologies? TIA!

If the scans are fairly clean, you can use the Capture plug-in for Acrobat (free download if you don't already have it). Or you can use some other OCR product that accepts PDF as input (some are reputedly better, but they're not free). The advantage of the Acrobat solution is that you can then save the document in Word format and do your formatting and cleanup at that point (applying paragraph styles, removing weirdnesses that Acrobat inserts, etc.).

In any case, OCR still requires manual work on every page, both to resolve words the OCR software can't decipher and to check for errors in general. However, you can skip at least some of this at the OCR stage and catch it later using spell check in Word.
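
If you'd rather pre-screen the OCR output before it ever reaches Word, the same idea as a spell check can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the tiny KNOWN_WORDS set and the suspect_words function are hypothetical stand-ins (a real run would load a full dictionary file), and it only catches non-words, not wrong-but-valid words.

```python
import re

# Hypothetical, deliberately tiny word list for illustration only.
# In practice you would load a full dictionary file instead.
KNOWN_WORDS = {"the", "scanned", "document", "was", "converted", "to", "word", "format"}

def suspect_words(text, known=KNOWN_WORDS):
    """Return tokens not found in the known-word set, in order of appearance."""
    # Keep digits inside tokens so OCR slips like "c0nverted" stay in one piece.
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    return [t for t in tokens if t.lower() not in known]

sample = "The scanned docurnent was c0nverted to Word forrnat"
print(suspect_words(sample))  # prints ['docurnent', 'c0nverted', 'forrnat']
```

A list like this tells the proofreader where to look first; it does not replace reading the page against the original.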

If it really is hundreds of documents (thousands of pages??), you might look into bringing in a temp to do the bulk of the scutwork. Maybe bring in a second temp to proofread the output against the originals, so you don't have to.

If the scans are not clean enough to OCR with some level of efficiency, then indeed it might be cheaper to have them all rekeyed. There are companies that specialize in this kind of work, mostly located in the Caribbean and in Israel (or at least they used to be--they may have moved elsewhere by now). There is still a residual error rate that requires a final check on your end by a skilled proofreader, but the basic keying should be affordable.
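
One rough way to decide which pile a document belongs in is to OCR a sample page and see what fraction of the words come out as non-words. The sketch below assumes you already have the OCR text as a string; suspect_ratio, triage, and the 0.2 threshold are all hypothetical choices for illustration, not a tested cutoff.

```python
import re

def suspect_ratio(text, known_words):
    """Fraction of tokens in text that are not in known_words."""
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    if not tokens:
        return 1.0  # nothing recognizable at all: treat the page as unusable
    unknown = sum(1 for t in tokens if t.lower() not in known_words)
    return unknown / len(tokens)

def triage(text, known_words, threshold=0.2):
    """Return 'rekey' if too many tokens look wrong, otherwise 'ocr'.

    The threshold is an assumed value; tune it against pages you have
    already proofread by hand.
    """
    return "rekey" if suspect_ratio(text, known_words) > threshold else "ocr"

# Hypothetical mini dictionary and sample pages.
known = {"clean", "scan", "of", "a", "page"}
print(triage("clean scan of a page", known))   # prints ocr
print(triage("c1ean scarn 0f a pagc", known))  # prints rekey
```

Running this on one or two pages per document is usually enough to sort a large batch before committing anyone's time to cleanup.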

