Re: Bar code choice

From: <Adechert_at_aol_dot_com>
Date: Fri Sep 12 2003 - 18:53:05 CDT

In a message dated 9/12/03 3:35:19 PM Pacific Daylight Time, writes:

> There is an additional requirement (discussed and agreed earlier) that
> is not reflected in Jan's otherwise quite excellent samples. In order
> to prevent easy visual identification of cast votes via the barcodes, we
> will pad the position of actual vote data by a random amount, per ballot.

Respectfully, David, the scheme we agreed on (proposed by Arthur) also
included compression.

Jan's proposal does not use compression. Padding with many more symbols will
make a very long barcode, and we don't know which readers will be able to read
it accurately. Jan is encoding the 35 digits (plus leading zero) with 22
symbols. It's already about 2.5 inches long. If we increase that to 40 symbols
(a decimal number of 60 digits or so) plus 2 more for the ballot number, that
may be a barcode of 5 inches or more. From my research on this topic, a barcode
this long will be a problem.
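
As a rough check on that estimate, here is a back-of-the-envelope sketch. It
assumes the printed length grows linearly with the symbol count and ignores
start/stop symbols and quiet zones, so treat the result as a ballpark only.
The function name is just for illustration.

```python
SYMBOLS_NOW = 22      # symbols in Jan's current barcode (from this message)
LENGTH_NOW_IN = 2.5   # its approximate printed length, in inches

def estimated_length(symbols):
    """Scale the printed length linearly with symbol count (an assumption)."""
    return LENGTH_NOW_IN * symbols / SYMBOLS_NOW

# 40 padded symbols plus 2 for the ballot number
print(round(estimated_length(40 + 2), 2))
```

That lands a little under 5 inches before any reader-specific overhead is
added, which is in line with the concern above.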

Jan's work proves that we can get by for the demo without compression.
However, if we add the padding feature (to help ensure non-human readability and
establish a fixed length for the barcode), we may also need the compression.

The 116-bit string will be highly compressible since only a tiny fraction of
possible 116-bit strings can result from the pattern of selections on our
sample ballot.

If someone could write a function using Arthur's compression scheme that
would take the 116-bit string (or any length up to 500 or so) and return a short
string of symbols (and, of course, give us a function to decode the compressed
string), then, presumably, we'd have a very short string of symbols (fewer than
10 certainly) and then you can add lots of padding without making an
excessively long barcode.
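
Arthur's scheme is not spelled out in this message, so the following is NOT
that scheme. It is only a sketch of one way the redundancy could be exploited,
assuming each contest allows a single selection and is therefore fully
described by one index. The contest sizes and function names are hypothetical.
The idea is mixed-radix packing: a contest with k possible outcomes costs a
factor of k rather than k separate bits.

```python
def encode_selections(choices, radices):
    """Pack one choice index per contest into a single integer."""
    number = 0
    for choice, radix in zip(choices, radices):
        if not 0 <= choice < radix:
            raise ValueError("choice out of range for its contest")
        number = number * radix + choice
    return number

def decode_selections(number, radices):
    """Recover the per-contest choice indices from the packed integer."""
    choices = []
    for radix in reversed(radices):
        number, choice = divmod(number, radix)
        choices.append(choice)
    return choices[::-1]

# Hypothetical ballot: five contests with 8, 5, 3, 2 and 2 possible outcomes
# (each count includes "no selection").
radices = [8, 5, 3, 2, 2]
packed = encode_selections([7, 0, 2, 1, 0], radices)
assert decode_selections(packed, radices) == [7, 0, 2, 1, 0]
```

With one bit per outcome those five contests would take 20 bits, while the
packed value is always below 8*5*3*2*2 = 480 and so fits in 9 bits.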

If no one has the time to write the compression (and decompression) function,
then before we add the padding, we might want to check to see if anyone can
eyeball the barcode and figure anything out -- especially after adding two
leading symbols representing the ballot number (a random 4-digit number).

Before we go for the non-compressed padding, maybe we could do a test. Say
we give Jan 10 different electronic ballot images (i.e., ten 116-bit strings
that could result from voting on our example ballot). Let Jan add random
4-digit ballot numbers (encoded with two leading symbols) then give us the ten
barcodes. Let's see if anyone could match up the barcodes with the ballots just by
looking at them. If no one can do a decent job of matching them up then we
should not worry about the padding, for the demo at least.

If anyone can figure out which is which by looking at the barcodes, then
we could look at some other alternative that would not increase the
length of the barcode.

For example, we could vary the starting point for the string of symbols and
allow it to wrap. Let's say the 24 symbols were (22 for selections and 2 for
ballot number):


Where YZ represents the ballot number and the rest represents the selections.
We could confuse the eyeballer by starting in a different place... like so:


and we could add a symbol that tells where the string really begins.

Say we add N, which represents the position the string is really supposed to
start at:


So now the barcode for identical ballots will look different but it will
still be short enough without using compression.
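
A minimal sketch of the wrap-around idea, under a couple of assumptions that
are mine rather than agreed: symbols are represented as a Python list, N is
placed in front of the rotated string, and N is the index at which the
original first symbol now sits. The function names are hypothetical.

```python
import secrets

def rotate_symbols(symbols, shift=None):
    """Rotate the symbol list and prepend N, the index where it really starts."""
    if shift is None:
        shift = secrets.randbelow(len(symbols))  # random rotation per ballot
    rotated = symbols[shift:] + symbols[:shift]
    start = (len(symbols) - shift) % len(symbols)
    return [start] + rotated

def restore_symbols(coded):
    """Undo rotate_symbols using the leading position value N."""
    start, rotated = coded[0], coded[1:]
    return rotated[start:] + rotated[:start]

# 24 placeholder symbol values: 2 for the ballot number, 22 for selections
original = list(range(24))
assert restore_symbols(rotate_symbols(original)) == original
```

This adds one symbol (25 instead of 24) regardless of how far the string is
rotated.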

Note that even with the non-compressed string that Jan proposes, we still
need to be able to convert a long binary number to a long (35 digit) decimal
number. Do we know if this capability is readily available in Python?
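
For what it's worth, Python's built-in integers are arbitrary precision, so
the conversion needs nothing beyond int() and str(). A small sketch, assuming
the ballot image arrives as a string of '0' and '1' characters; the function
names and default widths are only illustrative.

```python
def bits_to_decimal(bits, width=35):
    """Convert a binary string to a zero-padded decimal string."""
    return str(int(bits, 2)).zfill(width)

def decimal_to_bits(digits, width=116):
    """Convert a decimal string back to a zero-padded binary string."""
    return format(int(digits), "b").zfill(width)

# Round trip on an arbitrary 116-bit pattern
bits = "10" * 58
digits = bits_to_decimal(bits)
assert len(digits) == 35
assert decimal_to_bits(digits) == bits
```

The largest 116-bit value, 2**116 - 1, has 35 decimal digits, so a fixed width
of 35 always suffices.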

> It looks like Jan's sample encodes about 36 digits of information. We
> can probably pad that within the size of the ballot to 60 or 70 digits.
> The first two digits will simply indicate the offset to get to the
> real data. So, for example, Jan's data is:
> 083076749736557242056487941267521536
> Two different voters who vote identically will have on their ballot
> distinct encoded strings, such as:
> 13ddddddddddddd083076749736557242056487941267521536dddddddddddddd
> and
> 21ddddddddddddddddddddd083076749736557242056487941267521536dddddd
> (or rather, the barcode version of these).
> Where the 'd's are random decimal digits. I am confident that that is
> sufficient to prevent elections workers who see numerous exposed edges
> during a day from beginning to recognize vote patterns.
I am confident of that too. I am less confident that we know that our
barcode readers will be able to handle very long barcodes without problems.
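
To make the quoted padding scheme concrete, here is a sketch under my reading
of it: the two leading digits give the number of random digits that precede
the real data, and the total length is fixed. The default total length, the
data length, and the function names are assumptions for illustration.

```python
import secrets

DIGITS = "0123456789"

def random_digits(count):
    """Return `count` random decimal digits as a string."""
    return "".join(secrets.choice(DIGITS) for _ in range(count))

def pad_vote(data, total_len=65, offset=None):
    """Prefix a 2-digit offset, then hide `data` among random digits."""
    room = total_len - 2 - len(data)      # digits available for padding
    if not 0 <= room <= 99:
        raise ValueError("padding must fit a two-digit offset")
    if offset is None:
        offset = secrets.randbelow(room + 1)
    if not 0 <= offset <= room:
        raise ValueError("offset out of range")
    return f"{offset:02d}" + random_digits(offset) + data + random_digits(room - offset)

def unpad_vote(padded, data_len=36):
    """Read the offset and slice out the real data."""
    start = 2 + int(padded[:2])
    return padded[start:start + data_len]

vote = "083076749736557242056487941267521536"
assert unpad_vote(pad_vote(vote)) == vote
```

Every padded string has the same length, which is what makes the barcode
length the open question here.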

Alan D.
= The content of this message, with the exception of any external
= quotations under fair use, are released to the Public Domain
Received on Tue Sep 30 23:17:03 2003
