Re: Notes #1 from KSG/NSF symposium

From: Arthur Keller <arthur_at_kellers_dot_org>
Date: Wed Jun 02 2004 - 18:57:38 CDT

At 10:17 AM -0600 6/2/04, Charlie Strauss wrote:
>OCR fonts may not be needed at all. The problem of OCR scanning a
>ballot is greatly simplified by the fact that the text on the ballot
>is not general text but a selection from a small menu of possible
>choices in a known format.

Context-based scanning is too complex. OCR is modular.
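
For concreteness, the menu-based matching Charlie describes might look roughly like the sketch below. It is only an illustration: the candidate names, the function name match_choice, and the 0.6 cutoff are all made up, and the fuzzy matching uses Python's standard difflib rather than anything a real scanner vendor ships.

```python
import difflib

# Hypothetical menu for a single contest; in practice this would come from
# the election definition, not be hard-coded.
CANDIDATES = ["Alice Anderson", "Robert Brown", "Carla Chavez"]


def match_choice(ocr_text, menu=CANDIDATES, cutoff=0.6):
    """Return the menu entry most similar to the noisy OCR text, or None.

    cutoff is an assumed similarity threshold (0..1), not a tuned value.
    """
    matches = difflib.get_close_matches(ocr_text, menu, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(match_choice("R0bert Brovvn"))  # a misread of one menu entry
    print(match_choice("zzzz"))           # nothing on the menu is close
```

Note that the general-purpose OCR step and the menu lookup stay separate here, which is the modularity point: the recognizer knows nothing about the ballot.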

> in most cases one would not even need to resolve the font but
>merely measure the length of the words in the text.

Too error prone. It reminds me of the classic story of the speech
recognition system at CMU that was tied to a chess playing program
(to get a domain of vocabulary). Each move would be computed and
compared with

> Not that you would want to rely on this exclusively, but one
>could even tweak the proportional spacing to guarantee line length
>produced a unique encoding.
>
>of course actually implementing the above any OCR would be tricky
>and require much more than a bar code scanner and might even be a
>tad slow.
>
>Another desirable feature of a barcode scanner is that it is dumb.
>You can deliberately starve the bar code of excess information
>content and thus mitigate if not eliminate hypothetical collusion
>between the voting machine and the scanning machine.
>
>If one uses a true OCR font then if you are thinking about the ones
>I've seen then printing small fonts becomes problematic. Maybe
>there are more modern ones?

I wouldn't use anything smaller than 12 point for human readability.

>The advantage of relying on OCR is that as long as you are willing
>to trust the OCR scanner, and it can be made fast enough, then
>perhaps one could avoid the need for a separate verification station
>entirely.

You need such a verification station for those who can't read.

Best regards,
Arthur

-- 
-------------------------------------------------------------------------------
Arthur M. Keller, Ph.D., 3881 Corina Way, Palo Alto, CA  94303-4507
tel +1(650)424-0202, fax +1(650)424-0424
==================================================================
= The content of this message, with the exception of any external 
= quotations under fair use, are released to the Public Domain    
==================================================================
Received on Wed Jun 30 23:17:03 2004
