OCR/barcode reliability

From: David Mertz <voting-project_at_gnosis_dot_cx>
Date: Wed Jun 02 2004 - 20:21:41 CDT

On Jun 2, 2004, at 8:01 PM, Arthur Keller wrote:
> Barcodes are also more reliable than OCR.

I find this claim unlikely if we are talking about OCR fonts like
OCR-A. At the least it is not supportable without some empirical
evidence. Strong counter-evidence is provided by the banking industry,
which processes billions of checks every year, with extremely low
error rates, using OCR fonts.

Incidentally, I think that Karl's suggestion to go by field length
alone is probably slightly tongue-in-cheek. It's certainly not a good
idea to use field length alone. However, you CAN utilize general
features of fields that are selected from known lists. In my prior
example, the Hamming distance or Levenshtein distance between

   Krishnamurthi Ramavissipan
   Krish***urthi Ram**issipan

is quite small. Even if a couple of those asterisks turned into wrong
characters, it's recoverable. Perhaps a write-in value would always
need to carry some special mark so we know not to try Levenshtein
measures. I.e., if you write in a name, you might have a ballot with:

   Moose Catcher ---> [W] Krishnamurti Ramirassipan

The initial '[W]' mark will prevent a false match to the listed
candidate.
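
A minimal sketch of that matching idea, assuming Python and a plain
dynamic-programming Levenshtein distance. The names `resolve`,
`WRITE_IN_MARK`, and `MAX_DISTANCE`, the threshold value, and the two
filler candidates are illustrative assumptions, not part of any
existing system:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


WRITE_IN_MARK = "[W]"  # the special mark proposed above
MAX_DISTANCE = 6       # hypothetical threshold; would need empirical tuning


def resolve(ocr_text, candidates, max_distance=MAX_DISTANCE):
    """Map an OCR'd field to a listed candidate, a write-in, or neither."""
    text = ocr_text.strip()
    if text.startswith(WRITE_IN_MARK):
        # Marked write-ins are never fuzzy-matched against the list.
        return ("write-in", text[len(WRITE_IN_MARK):].strip())
    scored = sorted((levenshtein(text, c), c) for c in candidates)
    best_dist, best = scored[0]
    tied = len(scored) > 1 and scored[1][0] == best_dist
    if best_dist > max_distance or tied:
        return ("unresolved", text)  # too far off, or ambiguous
    return ("listed", best)


# Hypothetical candidate list; only the first name comes from the example.
candidates = ["Krishnamurthi Ramavissipan", "Jane Smith", "John Q. Public"]

damaged = "Krish***urthi Ram**issipan"
print(levenshtein(candidates[0], damaged))  # 5
print(resolve(damaged, candidates))
print(resolve("[W] Krishnamurti Ramirassipan", candidates))
```

With this threshold, the damaged string resolves to the listed
candidate, and the marked write-in is returned as a write-in. Without
the mark, the same write-in text would fall within the threshold and
be matched to the listed name, which is exactly the false match the
'[W]' is meant to prevent.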

While barcodes indeed have excellent built-in error correction, if any
single-bit errors make it through the ECCs, that amounts to a changed
vote. In contrast, a non-write-in name in which a character or two is
misread is still easily matched. This probably amounts to even better
error correction, even though it is a side effect of human readability
(English carries about two bits of entropy per character, while ASCII
uses 7 bits per character; that's quite a bit of Arthur's favorite
thing: redundancy).
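
The redundancy arithmetic can be worked through in a few lines of
Python. This is a rough sketch: the two-bit figure is the estimate
quoted above (published estimates vary), a candidate name is not
typical English text, and the eight-candidate contest is a made-up
assumption for illustration:

```python
import math

ASCII_BITS_PER_CHAR = 7      # as stated above
ENGLISH_BITS_PER_CHAR = 2.0  # rough figure quoted above; estimates vary


def redundancy(information_bits: float, encoded_bits: float) -> float:
    """Fraction of the encoded bits that carry no new information."""
    return 1.0 - information_bits / encoded_bits


name = "Krishnamurthi Ramavissipan"
encoded_bits = len(name) * ASCII_BITS_PER_CHAR  # 26 characters * 7 bits

# Redundancy of English text stored as 7-bit ASCII, per character.
per_char = redundancy(ENGLISH_BITS_PER_CHAR, ASCII_BITS_PER_CHAR)

# If the field is one choice among a known list, the information
# content is only log2(number of candidates) bits.
NUM_CANDIDATES = 8  # hypothetical contest size
choice_bits = math.log2(NUM_CANDIDATES)
per_field = redundancy(choice_bits, encoded_bits)

print(f"encoded bits for the name: {encoded_bits}")
print(f"per-character redundancy:  {per_char:.1%}")
print(f"per-field redundancy:      {per_field:.1%}")
```

Under these assumptions, most of the printed bits are redundant, and
a field chosen from a short known list is more redundant still.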
==================================================================
= The content of this message, with the exception of any external
= quotations under fair use, are released to the Public Domain
==================================================================
Received on Wed Jun 30 23:17:03 2004
