Re: OCR/barcode reliability

From: David Mertz <voting-project_at_gnosis_dot_cx>
Date: Thu Jun 03 2004 - 19:45:04 CDT

charlie strauss <cems@earthlink.net> wrote:
|Or as I mentioned earlier in this thread you could encode the
|proportional spacing. hide the ECC code there.

Another kind of transparent redundancy would be to just include the
ballot order as part of the vote. Unlike hiding information in
variable-width spaces, it isn't sneaky steganography.

For instance, my earlier example might appear on a ballot as:

  Moose Catcher ---> (3) Krishnamurthi Ramavissipan

The numeral three refers to the fact that the candidate is the third one
listed on the ballot screen (or the third one read out in the RII).
It's not hidden information; every voter can perfectly well recognize
what it encodes (and the system would be documented in voter manuals,
etc). But if Krishnamurthi isn't the third candidate, this flags some
problem (maybe a software problem in the vote station, maybe something
smudged on the paper, whatever).
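A minimal sketch of that cross-check, in Python. The ballot table, the
two filler candidate names, and the function name are hypothetical, made
up only for illustration; a real system would load the ballot order from
the election definition.

```python
# Hypothetical ballot order for one contest. Only the third name comes
# from the example above; the other two are placeholders.
BALLOT_ORDER = {
    "Moose Catcher": [
        "Alice Example",
        "Bob Placeholder",
        "Krishnamurthi Ramavissipan",
    ],
}


def ordinal_matches(contest, ordinal, name):
    """Return True if `name` sits at 1-based position `ordinal` in `contest`."""
    candidates = BALLOT_ORDER.get(contest, [])
    if not 1 <= ordinal <= len(candidates):
        return False  # unknown contest or out-of-range numeral: flag it
    return candidates[ordinal - 1] == name


# A printed line "Moose Catcher ---> (3) Krishnamurthi Ramavissipan"
# passes; the same name with a (2) in front would be flagged for review.
print(ordinal_matches("Moose Catcher", 3, "Krishnamurthi Ramavissipan"))
print(ordinal_matches("Moose Catcher", 2, "Krishnamurthi Ramavissipan"))
```

A False result says nothing about the cause; it only marks the ballot as
one that needs a closer look.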

It's not *exactly* an ECC, but it achieves much the same thing.

This suggestion is neutral as to whether we use OCR per se, or use
Charlie's direct bitmap comparison.

Laird Popkin wrote:
|I found an error rate for MICR on checks -- one error per 20,000 to 30,000
|checks. That's still much higher than barcodes, but is much, much better
|than generic OCR.

I find this rate quite acceptable. Arthur hinted that MICR might not
have a full character set, but I think OCR-A/B should have similar
reliability. For us, a misread most likely doesn't mean a wrong vote; it
just means a hand count of that one ballot out of 20 thousand.
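To put rough numbers on that, here is a back-of-the-envelope sketch in
Python. It assumes the per-check MICR rates Laird quoted carry over
unchanged to a per-ballot rate, which is only an assumption, and the
figure of one million ballots is hypothetical.

```python
def expected_hand_counts(ballots_cast, misread_rate):
    """Expected number of ballots kicked out to a hand count."""
    return ballots_cast * misread_rate


# Quoted range: one error per 20,000 to 30,000 items scanned.
for one_in in (20_000, 30_000):
    n = expected_hand_counts(1_000_000, 1 / one_in)
    print(f"1 in {one_in}: about {n:.1f} hand counts per million ballots")
```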

Since checks have little redundancy (most account numbers are valid), we
should be able to do (much) better than 1/20k, even without adding
explicit ECCs to an OCR-ish system. The candidate names have lots of
redundancy. One wrong character breaks a bank check scan, but one wrong
character is no big thing to us, since it is correctable.
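As a rough illustration of how that redundancy can be used, the sketch
below snaps a scanned name to the closest entry in the known candidate
list, using difflib from the Python standard library. The candidate
list, the function name, and the 0.8 cutoff are all assumptions chosen
for the example, not a proposal for the actual reader.

```python
import difflib

# Hypothetical candidate list for one contest.
CANDIDATES = [
    "Alice Example",
    "Bob Placeholder",
    "Krishnamurthi Ramavissipan",
]


def correct_name(scanned, candidates=CANDIDATES, cutoff=0.8):
    """Map an OCR result to a known candidate, or None if unsure."""
    if scanned in candidates:
        return scanned  # clean read, nothing to correct
    close = difflib.get_close_matches(scanned, candidates, n=2, cutoff=cutoff)
    if len(close) == 1:
        return close[0]  # exactly one plausible candidate
    return None  # no match, or ambiguous: send to a hand count


# One misread character ("s" for "a") is still recoverable.
print(correct_name("Krishnamurthi Ramavissipsn"))
```

Returning None rather than guessing keeps the failure mode the same as
before: the ballot goes to a hand count instead of becoming a wrong vote.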

I'm just agreeing with Laird here, of course. However, I would resist
adding back in explicit ECCs... I think a string of random-looking
characters at the bottom of the ballot would raise the very same doubts
in voters' minds as a non-human-readable barcode does.

Overall... I think I lean in the direction of hiding any crypto
information in sneaky steganography in the watermark. The globally
unique ballot identifiers can go in plain English at the top of the
ballot (date, state, precinct, ballot-id, etc)... we don't need to
squeeze it into the 36 or 40 bits I've written of before.

Most voters won't be bothered by the stego, because they won't think of
it. But those who do think of it are welcome to read the documentation
of how it works, and the source code that implements it. Voter
confidence is important, even if that confidence is not accompanied by a
complete understanding of the system by all voters.
==================================================================
= The content of this message, with the exception of any external
= quotations under fair use, is released to the Public Domain
==================================================================
Received on Wed Jun 30 23:17:05 2004