Re: OCR/barcode reliability

From: Arthur Keller <arthur_at_kellers_dot_org>
Date: Thu Jun 03 2004 - 12:55:42 CDT

At 1:28 PM -0400 6/3/04, David Mertz wrote:
>On Jun 3, 2004, at 1:06 PM, Popkin, Laird (WMG Corp) wrote:
>>In general, barcodes are more accurate than OCR. The error rates I
>>see quoted are:
>>Barcodes: 1 error per 1 million barcode reads.
>>OCR: 2% error rate (20,000 errors per 1 million character reads).
>This is surely apples-to-oranges.
>I can't imagine a 2% error rate reading OCR-A/B. It is quite
>believable for "some unknown font that might be used in a book or
>magazine" though.
>I think it is safe to say that if we decide to go with OCR of
>ballots (either solely, or redundantly with a barcode), we should
>print the votes in an OCR font. Maybe even MICR, if that's actually
>more reliable than OCR-A.

Does MICR encode as many characters as OCR-A?

>>Even though OCR is typically much less accurate than barcodes, in
>>our case if we're clever we can perform an "OCR" that takes
>>advantage of our knowledge of what we're looking for. So we don't
>>need to recognize free text, we only have to determine which
>>candidate's name is printed, which should make it much more
>Agreed. In fact, I gave some examples about Levenshtein distance
>and the like.
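
For illustration, the constrained matching idea quoted above could look
roughly like the following. This is only a minimal sketch, not code from
the thread: the candidate names, the function names, and the sample OCR
string are all hypothetical, and the distance is plain Levenshtein edit
distance computed against a known list of names.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def closest_candidate(ocr_text, candidates):
    """Return (distance, name) for the candidate nearest to the OCR text."""
    return min((levenshtein(ocr_text.upper(), name.upper()), name)
               for name in candidates)


# Hypothetical contest; these names are invented for the example.
CANDIDATES = ["ALICE ADAMS", "BOB BAKER", "CAROL CHAVEZ"]

# "1" misread for "I" and "N" misread for "M": two substitutions.
result = closest_candidate("AL1CE ADANS", CANDIDATES)  # (2, 'ALICE ADAMS')
```

Note that this always returns the nearest name, however poor the read;
whether that is acceptable is exactly what the reply below takes up.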

Modularity precludes using the contests for decoding the OCR. (For
verifying a correct read, yes, but not for determining *what* was

Apparently a message of mine got truncated. Here's the story:

Too error-prone. It reminds me of the classic story of the speech
recognition system at CMU that was tied to a chess playing program
(to get a domain of vocabulary). When it was the human's turn to
move, the potential moves would be computed and compared with the
spoken input, and the best match was taken as the move. However,
when there was a "mate-in-one" move, all you had to do was cough into
the microphone, and that would be the best match.
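
The failure mode in that story can be sketched in a few lines. This is a
hedged illustration only, not the CMU system: text strings stand in for
the spoken input, the similarity score is the standard library's
difflib.SequenceMatcher ratio, the move strings and the 0.6 threshold
are invented, and the mate-in-one case is assumed to narrow the
candidate list to a single move.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Score in [0, 1]; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(heard, legal_moves):
    """Pick the highest-scoring move. There is no way to answer 'none'."""
    return max(legal_moves, key=lambda move: similarity(heard, move))


def best_match_with_reject(heard, legal_moves, threshold=0.6):
    """Same as best_match, but return None when the best score is too
    low. The threshold value is an arbitrary assumption."""
    best = best_match(heard, legal_moves)
    return best if similarity(heard, best) >= threshold else None


# Hypothetical position where only one move is under consideration.
mate_in_one = ["queen to h7 mate"]

cough = "ahem"
accepted = best_match(cough, mate_in_one)              # the mating move
rejected = best_match_with_reject(cough, mate_in_one)  # None
```

Without a reject option, the closest match wins even when nothing
resembling a candidate was actually read.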

Best regards,

Arthur M. Keller, Ph.D., 3881 Corina Way, Palo Alto, CA  94303-4507
tel +1(650)424-0202, fax +1(650)424-0424
= The content of this message, with the exception of any external
= quotations under fair use, is released to the Public Domain
Received on Wed Jun 30 23:17:05 2004