Scanned docs to revise? (Take II)

Subject: Scanned docs to revise? (Take II)
From: Geoff Hart <ghart -at- videotron -dot- ca>
To: "TECHWR-L" <techwr-l -at- lists -dot- techwr-l -dot- com>
Date: Sun, 19 Jun 2005 11:13:26 -0400

David Neeley responded to my comment that "OCR of really crisp scans easily exceeds 99% accuracy nowadays, but that's not as good as it sounds; it still means one error roughly every 20 words. Budget time for proofreading to catch typos.": <<I have gotten *much* better accuracy than you suggest.>>

A clarification: I said "easily exceeds 99%", not "only hits 99% on a good day".

<<A dozen years ago, the early versions of OmniPage were seeing about 2 or 3 errors per page of xerographically reproduced printed pages, and TypeReader was even better.>>

Then you were getting much better results than PC Magazine reported only a few years back. The rates have unquestionably improved, particularly with high-quality scans and source material. I cited that number only because it makes for a simple calculation (1 error in 100 characters @ 5 characters per word = 1 error every 20 words; @ 500 words per page, that amounts to 25 errors per page), not to suggest that you should inevitably expect that many errors.
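The back-of-the-envelope arithmetic can be sketched in a few lines of Python. This is a minimal sketch that assumes, as the calculation above does, that accuracy is measured per character, that words average 5 characters, and that a page holds 500 words. The function name `expected_errors` is hypothetical.

```python
def expected_errors(accuracy: float,
                    chars_per_word: float = 5,
                    words_per_page: float = 500) -> dict:
    """Estimate OCR error density from a per-character accuracy figure.

    Assumes errors are spread evenly across characters (a simplification).
    """
    error_rate = 1.0 - accuracy                # fraction of characters misread
    errors_per_word = error_rate * chars_per_word
    return {
        "words_per_error": 1.0 / errors_per_word,          # words between errors
        "errors_per_page": errors_per_word * words_per_page,
    }

print(expected_errors(0.99))   # about 20 words per error, about 25 errors per page
print(expected_errors(0.999))  # about 200 words per error, about 2.5 errors per page
```

By the same arithmetic, 99.9% accuracy works out to roughly 2.5 errors per 500-word page, which is in line with the 2 or 3 errors per page mentioned earlier.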

In fact, the improved accuracy of more modern OCR software carries a subtle danger: After reading most of a page or even a few pages and finding no errors, it's very tempting to assume the software did a perfect job and to stop checking. Not a wise strategy, but who said we humans were always wise? This is why God invented editors and proofreaders...

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - --
Geoff Hart ghart -at- videotron -dot- ca
(try geoffhart -at- mac -dot- com if you don't get a reply)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

