Re: OCR/barcode reliability

From: David Mertz <voting-project_at_gnosis_dot_cx>
Date: Thu Jun 03 2004 - 15:25:27 CDT

On Jun 3, 2004, at 2:59 PM, charlie strauss wrote:
> I'd like to suggest a requirement that any method of reading, barcode
> or OCR, has to be capable of being done fast enough to keep up with a
> sheet feeder.

Seems reasonable.

> Barcodes can do this. Bank checks, which are pretty simple, can do
> this. Generalized OCR like you use to scan arbitrary documents will
> be hard pressed. So my supposition is it's going to have to be very
> tailored OCR if you want to meet the above criteria.

I entirely agree that generalized OCR isn't something we should worry
about. Limiting the font to OCR-A/B (or something similar) makes the
problem much more specialized. Moreover, even past that, we should be
able to limit the regions analyzed to a relatively small portion of the
ballot page (the regions will differ for each election).
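
To make the "limit the regions" point concrete, here is a minimal
sketch of mapping a region given in inches to pixel bounds and cropping
it out of a scanned page. The function names, the 300 dpi default, and
the list-of-rows page representation are all assumptions made for
illustration, not anything we have specified for the project.

```python
# Hypothetical helpers; names, dpi default, and page layout are assumptions.

def region_to_pixels(left_in, top_in, width_in, height_in, dpi=300):
    """Return (x0, y0, x1, y1) pixel bounds for a region measured in inches."""
    x0 = round(left_in * dpi)
    y0 = round(top_in * dpi)
    x1 = round((left_in + width_in) * dpi)
    y1 = round((top_in + height_in) * dpi)
    return x0, y0, x1, y1


def crop(page, bounds):
    """Crop a page stored as a list of pixel rows to the given bounds."""
    x0, y0, x1, y1 = bounds
    return [row[x0:x1] for row in page[y0:y1]]
```

Only the cropped regions would then need to be analyzed, which should
help with keeping up with a sheet feeder.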

> As long as we are brainstorming about how to resolve ambiguous OCR
> reads (e.g. Levenshtein distance)... recognizing and then aligning
> characters might not be as good or as fast as simply matching the
> image directly. For example, once the page orientation is known,
> simply use a standard Fourier transform to convolve it against the
> possible correctly typed items. And voila.

This is a really interesting idea. I have not worked in image
processing, so I have few intuitions about this area. But it does seem
like if you know you are looking 5.3" down and 2.7" from the left
margin for one of the three strings:

   Lu Win
   Tomas Singleton
   Krishnamurthi Ramavissipan

...or looking for something that starts with [W] in the case of a
write-in (or some other special mark), then it really shouldn't be hard
to discern among the few possibilities using just pixels and
statistics. Even Karl's chocolate smudges shouldn't prevent
discernment, since most of the pixels will still match the desired
reference image.
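
As a rough illustration of the "pixels and statistics" approach, here
is a minimal sketch that scores a scanned region against each candidate
reference image and picks the best one. Note that this uses plain
Pearson correlation at a known position rather than the Fourier-based
convolution Charlie suggested; that simplification assumes the region
is already aligned. The function names and the list-of-rows image
representation are hypothetical.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation between two equal-sized images (lists of rows)."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same number of pixels")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat image carries no pattern to compare
    return cov / sqrt(var_x * var_y)


def best_match(scanned, references):
    """Return (name, scores) for the reference most correlated with the scan."""
    scores = {name: correlation(scanned, img) for name, img in references.items()}
    return max(scores, key=scores.get), scores


# Toy 3x3 "images" standing in for rendered candidate names (made-up data).
references = {
    "A": [[1, 1, 1], [0, 0, 0], [1, 1, 1]],
    "B": [[1, 0, 1], [1, 0, 1], [1, 0, 1]],
    "C": [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
}
# Reference "A" with one smudged pixel in the center.
smudged = [[1, 1, 1], [0, 1, 0], [1, 1, 1]]
winner, scores = best_match(smudged, references)
```

A real system would presumably also want a threshold on the winning
score, and on its margin over the runner-up, so that badly damaged
ballots get flagged rather than guessed at.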
= The content of this message, with the exception of any external
= quotations under fair use, is released to the Public Domain
Received on Wed Jun 30 23:17:05 2004
