Re: OCR/barcode reliability

From: Arthur Keller <arthur_at_kellers_dot_org>
Date: Thu Jun 03 2004 - 14:14:42 CDT

Here are some hardware-decoding barcode readers I found through a few
Internet searches.

See http://www.consumerschoicepos.com/metrologic_ms9544_barcode_scanner.html

http://www.buy.com/retail/product.asp?sku=10336465&loc= and
http://www.waspbarcode.com/scanners/wlp_4170_ccd_barcode_scanner.asp

http://www.posmicro.com/Scanners/WALLYN/IMAGETEAM%203800.htm and
http://www.hhp.com/hhp/products/product.tpl?prodsku=90391990996489

The last one can do 270 PDF417 2-D scans per *second*.

That's well beyond what a typical sheet feeder can deliver. Those are
typically rated at maybe 100 sheets per *minute*.
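
As a rough sanity check, the two ratings above can be set against the
100,000-ballots-per-day figure Charlie suggests in the quoted message
below. This is only back-of-the-envelope arithmetic: the variable names
are made up for illustration, and it assumes one barcode per sheet and
ignores jams, reloading, and re-scans.

```python
# Figures taken from this thread; everything else is an assumption.
SHEETS_PER_MINUTE = 100   # "maybe 100 sheets per minute" feeder rating
SCANS_PER_SECOND = 270    # PDF417 rate claimed for the last reader listed
BALLOTS = 100_000         # Charlie's suggested daily recount target

# Time for one feeder to pass every ballot, in hours.
feeder_hours = BALLOTS / SHEETS_PER_MINUTE / 60

# Time the reader alone would need if sheets arrived instantly, in minutes.
reader_minutes = BALLOTS / SCANS_PER_SECOND / 60

# How many times faster the reader is than the feeder.
speed_ratio = SCANS_PER_SECOND * 60 / SHEETS_PER_MINUTE

print(f"feeder: {feeder_hours:.1f} h, reader: {reader_minutes:.1f} min, "
      f"ratio: {speed_ratio:.0f}x")
```

On these numbers the feeder, not the barcode reader, is the bottleneck,
and one feeder would need most of a day to get through 100,000 ballots.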

Best regards,
Arthur

At 11:59 AM -0700 6/3/04, charlie strauss wrote:
>I'd like to suggest a requirement that any method of reading,
>barcode or OCR, has to be fast enough to keep up with a sheet
>feeder. That is, it would be a good idea to be able to recount
>100,000 ballots per day on a reasonably sized piece of machinery.
>Or make up your own reasonable number. The point is that if you
>have to do a recount after the precincts have closed, it would be
>nice to be able to do it with a small staff using automated
>scanners, and that is going to require high throughput.
>
>Barcodes can do this. Bank checks, which are pretty simple, can do
>this. Generalized OCR, like you use to scan arbitrary documents,
>will be hard pressed. So my supposition is that it's going to have
>to be very tailored OCR if you want to meet the above criteria.
>
>As long as we are brainstorming about how to resolve ambiguous OCR
>reads (e.g. Levenshtein distance), I'll make two comments. One is
>that I would assume this is a well-plowed row in some body of
>literature, but I'm not personally familiar with it (aside from
>sequence alignment in genomics). Two is that recognizing and then
>aligning characters might not be as good or as fast as simply
>matching the image directly. For example, once the page orientation
>is known, simply use a standard Fourier transform to convolve it
>against the possible correctly typed items, and voila. One could
>do better than a Fourier convolution given a proper error model
>(e.g. use wavelets if image stretching is allowed). Given we're
>matching to a very, very, very small set of possible answers, it
>makes sense to match the responses holistically as an image, not
>character by character. The FT approach would be potentially very
>fast and able to keep up with a sheet feeder, if this were to be
>done on a lot of ballots.
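
The image-matching idea in the paragraph above might look roughly like
the sketch below. It is only an illustration under stated assumptions:
NumPy is assumed to be available, the scanned field and every candidate
template are assumed to be same-sized grayscale arrays that have already
been deskewed, and the function names are hypothetical. The correlation
is computed through the FFT, so it is circular (shifts wrap around the
edges), and it makes no attempt to handle stretching.

```python
import numpy as np

def normalize(img):
    """Return a zero-mean, unit-norm float copy of a 2-D image."""
    a = np.asarray(img, dtype=float)
    a = a - a.mean()
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a

def peak_correlation(scan, template):
    """Peak of the circular cross-correlation of two same-shape images.

    Multiplying one spectrum by the conjugate of the other and inverting
    gives the correlation at every shift at once. With both inputs
    normalized, the peak is at most 1.0 and reaches 1.0 only when the
    scan is a (circularly shifted) copy of the template.
    """
    s = normalize(scan)
    t = normalize(template)
    corr = np.fft.ifft2(np.fft.fft2(s) * np.conj(np.fft.fft2(t))).real
    return float(corr.max())

def best_template(scan, templates):
    """Score the scan against each named template; return (name, scores)."""
    scores = {name: peak_correlation(scan, t) for name, t in templates.items()}
    return max(scores, key=scores.get), scores
```

A caller would pass the cropped name field as `scan` and a dict of
pre-rendered candidate names as `templates`, and would still need some
threshold on the scores before trusting the winner.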
>
>
>
>
>-----Original Message-----
>From: David Mertz <voting-project@gnosis.cx>
>Sent: Jun 3, 2004 11:20 AM
>To: voting-project@lists.sonic.net
>Subject: Re: [voting-project] OCR/barcode reliability
>
>Arthur Keller <arthur@kellers.org> wrote:
>|Modularity precludes using the contests for decoding the OCR. (For
>|verifying a correct read, yes, but not for determining *what* was
>|read.)
>
>Did you read my note that gave a Levenshtein distance example?
>
>I'm not sure what principle you think you have in mind under the name
>modularity. Certainly the OCR software should be generic, if only
>because such packages are widely tested outside OVC (and it takes
>thousands of programmer hours to develop such software; we might as well
>benefit from the Free Software community).
>
>But that's only a first pass. After you generically read a ballot, you
>can apply another pass to make sense of it. That is, if a given name
>does not exactly match any available candidate (and is not marked as a
>write-in), there is no reason not to figure out what the intention was.
>Certainly not anything having to do with modularity.
>
>Obviously, we need to decide parameters. If what was read has a
>Levenshtein distance of 2 from one valid candidate, and a distance of
>over 50 from every other candidate in that contest, I feel entirely
>comfortable declaring the intention as the near match. However, if two
>of the candidates are:
>
> Maria Cruz
> Mario Crump
>
>I wouldn't want to make any guesses about the apparent value of:
>
> Ma*ic Cruo
>
>It's not just the absolute Levenshtein distance we should look at, but
>the distribution of them to all the valid names. Enough skew is enough
>confidence. In any case, it's easy to flag EVERY non-perfect match as
>requiring manual confirmation (call the fuzzy match "provisional
>results").
>
>In edge cases, Karl's chocolate-covered voters might necessitate manual
>examination of ballots.
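
The rule David describes above (accept a near match only when the
distances to the other candidates are heavily skewed, and flag every
non-exact match as provisional) could be sketched as follows. The
thresholds `max_dist` and `min_gap` and the function names are
hypothetical; the thread does not settle on actual parameters.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def provisional_match(read, candidates, max_dist=2, min_gap=5):
    """Return (candidate, status) for an OCR read.

    status is "exact", "provisional" (near match, needs manual
    confirmation), or "manual" (too ambiguous to guess).
    max_dist and min_gap are illustrative values, not agreed parameters.
    """
    ranked = sorted((levenshtein(read, c), c) for c in candidates)
    best_dist, best = ranked[0]
    if best_dist == 0:
        return best, "exact"
    # Distance to the second-closest name, if there is one.
    runner_up = ranked[1][0] if len(ranked) > 1 else None
    skewed = runner_up is None or runner_up - best_dist >= min_gap
    if best_dist <= max_dist and skewed:
        return best, "provisional"
    return None, "manual"
```

With the example names above, `provisional_match("Ma*ic Cruo",
["Maria Cruz", "Mario Crump"])` falls through to "manual", because the
read is about equally far from both candidates.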

-- 
-------------------------------------------------------------------------------
Arthur M. Keller, Ph.D., 3881 Corina Way, Palo Alto, CA  94303-4507
tel +1(650)424-0202, fax +1(650)424-0424
==================================================================
= The content of this message, with the exception of any external 
= quotations under fair use, are released to the Public Domain    
==================================================================
Received on Wed Jun 30 23:17:05 2004
