RE: OCR/barcode reliability

From: Popkin, Laird (WMG Corp)
Date: Thu Jun 03 2004 - 18:14:45 CDT

Yes, the error rates aren't "apples to apples". As I indicated, the
barcode error rate is per barcode read, while the OCR error rate is
per character for "generic OCR". Constraining the character set and
vocabulary would clearly help compensate for errors in the base OCR
recognition. For example, if you know that you're looking for "Fred" or
"Bob" you can safely assume that a read of "B0b" isn't a vote for Fred. That
being said, barcodes get their accuracy through redundant coding, ECCs,
etc., that OCR lacks (unless we add ECC, etc., as a layer of output in our

I found an error rate for MICR on checks -- one error per 20,000 to 30,000
checks. That's still much higher than barcodes, but is much, much better
than generic OCR.
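
To put the per-character and per-read figures on the same footing, here is a
rough sketch that converts a per-character error rate into a per-field error
rate. It assumes character errors are independent (real OCR errors often are
not), and the 20-character field length is a made-up value for illustration,
not a number from this thread.

```python
def per_read_error(per_char_error: float, field_length: int) -> float:
    """Probability that at least one character in a field is misread.

    Assumes each character is misread independently with the same
    probability, which is a simplification.
    """
    return 1.0 - (1.0 - per_char_error) ** field_length


if __name__ == "__main__":
    ocr_per_char = 0.02          # "generic OCR" figure quoted in the thread
    barcode_per_read = 1 / 1e6   # barcode figure quoted in the thread
    field_length = 20            # hypothetical length of one printed field

    ocr_per_read = per_read_error(ocr_per_char, field_length)
    print(f"OCR per-read error (assumed {field_length} chars): {ocr_per_read:.4f}")
    print(f"Barcode per-read error: {barcode_per_read:.6f}")
```

Under these assumptions the gap between generic OCR and barcodes gets wider,
not narrower, once both are expressed per read.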

I bet that adding redundancy and ECC to the OCR output might get its accuracy
to match barcodes. One of the reasons that barcodes are so accurate is that
misreads can be detected and then fixed (either corrected by the ECC or
re-scanned); since OCR lacks error-correcting codes, you can't detect errors
the same way, leading to higher error rates. We could add ECC to our ballots by
putting a line of 'gibberish' at the end of the ballot, and that would push
our accuracy rates way up.
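
A minimal sketch of the "line of gibberish" idea, using a CRC-32 check line.
Note the swap: a CRC only detects misreads (so the ballot can be re-scanned);
it does not correct them the way a true ECC such as Reed-Solomon would. The
function names, the ballot line format, and the hex encoding of the check
line are all assumptions made up for this example.

```python
import zlib


def check_line(selections: list[str]) -> str:
    """Return an 8-character hex check line for the printed selections."""
    payload = "\n".join(selections).encode("ascii")
    return format(zlib.crc32(payload), "08X")


def verify(selections_read: list[str], check_read: str) -> bool:
    """True if the OCR'd selections are consistent with the OCR'd check line."""
    return check_line(selections_read) == check_read


if __name__ == "__main__":
    printed = ["PRESIDENT: FRED", "SENATOR: BOB"]   # hypothetical ballot text
    gibberish = check_line(printed)                 # printed at the ballot's end

    good_read = ["PRESIDENT: FRED", "SENATOR: BOB"]
    bad_read = ["PRESIDENT: FRED", "SENATOR: B0B"]  # one misread character

    print(verify(good_read, gibberish))
    print(verify(bad_read, gibberish))
```

A mismatch only says that something was misread, either in the selections or
in the check line itself, so the sensible response is a re-scan.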

Based on your wonderful story about the chess match, I wonder if someone
could challenge an election based on OCR technology, if an error (e.g. the
equivalent of the 'cough into the microphone') were recognized as one
candidate more often than the other.
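
One way to guard against the "cough" problem is to refuse to pick a candidate
unless the read is close to exactly one name. The sketch below uses
Levenshtein distance against the known candidate list and rejects reads that
are too far from every name or not clearly closer to one than to another. The
function names and the thresholds (`max_distance`, `min_margin`) are
illustrative assumptions, not anything the project has settled on.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def match_candidate(read: str, candidates: list[str],
                    max_distance: int = 1,
                    min_margin: int = 1) -> Optional[str]:
    """Return the matching candidate, or None if the read should be rejected."""
    scored = sorted((levenshtein(read.lower(), c.lower()), c)
                    for c in candidates)
    best_distance, best = scored[0]
    if best_distance > max_distance:
        return None   # too garbled to trust: the "cough" case
    if len(scored) > 1 and scored[1][0] - best_distance < min_margin:
        return None   # ambiguous between two names
    return best


if __name__ == "__main__":
    names = ["Fred", "Bob"]
    print(match_candidate("B0b", names))
    print(match_candidate("cough", names))
```

Rejected reads would go to a re-scan or manual review instead of being
counted for whichever name happens to be nearest.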

- LP

-----Original Message-----
On Behalf Of Arthur
Sent: Thursday, June 03, 2004 1:56 PM
Subject: Re: [voting-project] OCR/barcode reliability

At 1:28 PM -0400 6/3/04, David Mertz wrote:
>On Jun 3, 2004, at 1:06 PM, Popkin, Laird (WMG Corp) wrote:
>>In general, barcodes are more accurate than OCR. The error rates I
>>see quoted are:
>>Barcodes: 1 error per 1m barcode reads.
>>OCR: 2% error rate (20,000 errors per 1m character reads).
>This is surely apple-to-oranges.
>I can't imagine a 2% error rate reading OCR-A/B. It is quite
>believable for "some unknown font that might be used in a book or
>magazine" though.
>I think it is safe to say that if we decide to go with OCR of
>ballots (either solely, or redundantly with a barcode), we should
>print the votes in an OCR font. Maybe even MICR, if that's actually
>more reliable than OCR-A.

Does MICR encode as many characters as OCR-A?

>>Even though OCR is typically much less accurate than barcodes, in
>>our case if we're clever we can perform an "OCR" that takes
>>advantage of our knowledge of what we're looking for. So we don't
>>need to recognize free text, we only have to determine which
>>candidate's name is printed, which should make it much more
>Agreed. In fact, I gave some examples about Levenshtein distance
>and the like.

Modularity precludes using the contests for decoding the OCR. (For
verifying a correct read, yes, but not for determining *what* was

Apparently a message of mine got truncated. Here's the story:

Too error prone. It reminds me of the classic story of the speech
recognition system at CMU that was tied to a chess playing program
(to get a domain of vocabulary). When it was the human's turn to
move, the potential moves would be computed and compared with the
spoken input, and the best match was taken as the move. However,
when there was a "mate-in-one" move, all you had to do was cough into
the microphone, and that would be the best match.

Best regards,

Arthur M. Keller, Ph.D., 3881 Corina Way, Palo Alto, CA  94303-4507
tel +1(650)424-0202, fax +1(650)424-0424
= The content of this message, with the exception of any external 
= quotations under fair use, are released to the Public Domain    
Received on Wed Jun 30 23:17:05 2004