Re: Proposed Electronic Ballot Image Format for the Demo

From: Arthur Keller <arthur_at_kellers_dot_org>
Date: Sat Aug 23 2003 - 15:20:56 CDT

The "Y" below is for ballot type. For example, for primary races by
party, the "Y" represents the party.

I don't understand why the ballot selections are fixed length.
Shouldn't it depend on the length of the ballot? Also, two-digit
decimal codes for ballot contests (at most 100) may not be enough. If
you use 2 hexadecimal characters (0-9, A-F) instead, you support 256
choices in the same space.

As far as ballot vote coding is concerned, here is a proposal:

Assume the races on any given ballot are in fixed order.

Assume the all-zeros code is invalid.
Assume the number 1 means no vote was selected.
Assume the highest number (I'll define this later) means that there
is a write-in vote for this race. The write-in names are listed in
the same order as the races themselves, one for each race on this
ballot that has a write-in vote.
This means that if there are 6 declared candidates, then the
encodings are 1 (no vote), 2-7 (the candidates in a defined order),
and 8 (write-in).
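A minimal Python sketch of this per-race coding, assuming candidates are indexed from 0 in the race's defined order and that a write-in is passed as the name string. The names `race_code` and `encode_ballot` are hypothetical, not part of any existing system.

```python
from typing import List, Optional, Tuple, Union

NO_VOTE = 1  # code 0 (all zeros) is never emitted, so it stays invalid


def race_code(num_candidates: int, choice: Optional[int], write_in: bool = False) -> int:
    """Code for one race: 1 = no vote, 2..n+1 = candidates, n+2 = write-in."""
    if write_in:
        return num_candidates + 2
    if choice is None:
        return NO_VOTE
    if not 0 <= choice < num_candidates:
        raise ValueError("candidate index out of range")
    return choice + 2


Selection = Union[int, str, None]  # candidate index, write-in name, or no vote


def encode_ballot(races: List[int], selections: List[Selection]) -> Tuple[List[int], List[str]]:
    """Encode a ballot whose races are in fixed order.

    races[i] is the number of declared candidates in race i. Write-in names
    are collected in race order, one per race that has a write-in.
    """
    if len(races) != len(selections):
        raise ValueError("one selection per race is required")
    codes: List[int] = []
    write_ins: List[str] = []
    for n, sel in zip(races, selections):
        if isinstance(sel, str):
            codes.append(race_code(n, None, write_in=True))
            write_ins.append(sel)
        else:
            codes.append(race_code(n, sel))
    return codes, write_ins
```

With six declared candidates, `race_code(6, None)` is 1, `race_code(6, 0)` through `race_code(6, 5)` are 2 through 7, and a write-in is 8, matching the encodings above.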
Numbers are in binary coded octal, transformed to make them
self-delimiting. See my paper with Jeffrey D. Ullman, "A Version
Numbering Scheme with a Useful Lexicographical Order," Int. Conf. on
Data Engineering, Taipei, Taiwan, March 1995 (available, also as a
PostScript file, at http://www-db.stanford.edu/pub/keller), which
includes a discussion of a self-delimiting number scheme. With 3 bits
(it's octal, remember?), you can code up to 3 choices; with 6 bits,
up to 19 choices; and with 9 bits, up to 147 choices. (If you want,
you can have more short codes, but at the risk of making the longer
codes even longer.)
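The paper's exact transformation is not reproduced here. As an illustration only, the sketch below assumes one first-digit layout that yields exactly the counts quoted above (3, 19 and 147 codes in 3, 6 and 9 bits): leading octal digit 0 is never used, 1-3 are complete one-digit codes, 4-5 start a two-digit code, and 6-7 start a three-digit code. The function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical first-digit layout (an assumption; not necessarily the paper's):
#   leading octal digit 0 -> never used, so an all-zeros code is invalid
#   leading digit 1-3     -> complete 1-digit code, values 1..3
#   leading digit 4-5     -> 2-digit code, values 4..19
#   leading digit 6-7     -> 3-digit code, values 20..147


def encode_octal(value: int) -> str:
    """Return the self-delimiting code for value as a string of bits."""
    if 1 <= value <= 3:
        digits = [value]
    elif 4 <= value <= 19:
        off = value - 4                          # 0..15
        digits = [4 + off // 8, off % 8]
    elif 20 <= value <= 147:
        off = value - 20                         # 0..127
        digits = [6 + off // 64, (off // 8) % 8, off % 8]
    else:
        raise ValueError("value must be in 1..147")
    return "".join(format(d, "03b") for d in digits)


def decode_octal(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Read one code starting at bits[pos]; return (value, next position)."""
    def digit(i: int) -> int:
        chunk = bits[i:i + 3]
        if len(chunk) != 3:
            raise ValueError("truncated code")
        return int(chunk, 2)

    first = digit(pos)
    if first == 0:
        raise ValueError("leading octal digit 0 is invalid")
    if first <= 3:
        return first, pos + 3
    if first <= 5:
        return 4 + (first - 4) * 8 + digit(pos + 3), pos + 6
    return 20 + (first - 6) * 64 + digit(pos + 3) * 8 + digit(pos + 6), pos + 9


def decode_all(bits: str) -> List[int]:
    """Decode back-to-back codes with no separators between them."""
    values: List[int] = []
    pos = 0
    while pos < len(bits):
        value, pos = decode_octal(bits, pos)
        values.append(value)
    return values
```

Because the leading digit alone fixes a code's length, `decode_all` can walk a concatenation of race codes without separators. Under this layout the codes also sort in the same order as their values when compared as bit strings.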
An advantage of self-delimiting variable-length codes is that they
make it harder to compare votes, particularly if there are many
votes.
We could use binary coded hexadecimal instead. Then 4 bits code 7
choices, 8 bits code 71 choices, and 12 bits code 1095 choices.
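One hypothetical first-digit layout reproduces both sets of counts: leading digit 0 is unused, the nonzero digits below half the base are complete one-digit codes, and the remaining leading digits are split evenly between two- and three-digit prefixes (octal: 1-3, 4-5, 6-7; hex: 1-7, 8-B, C-F). A small capacity calculator under that assumption, with a hypothetical function name:

```python
from typing import List, Tuple


def code_capacity(base: int) -> List[Tuple[int, int]]:
    """(bits, cumulative number of codes) for 1-, 2- and 3-digit codes.

    Assumes base is a power of two and at least 8 (octal or hexadecimal here).
    """
    if base < 8 or base & (base - 1):
        raise ValueError("base must be a power of two >= 8")
    bits_per_digit = base.bit_length() - 1  # 3 for octal, 4 for hex
    single = base // 2 - 1                  # first digits 1 .. base/2 - 1
    prefixes = base // 4                    # first digits per longer length
    counts = [single, prefixes * base, prefixes * base ** 2]
    result: List[Tuple[int, int]] = []
    total = 0
    for n_digits, count in enumerate(counts, start=1):
        total += count
        result.append((n_digits * bits_per_digit, total))
    return result
```

`code_capacity(8)` gives 3, 19 and 147 codes in 3, 6 and 9 bits; `code_capacity(16)` gives 7, 71 and 1095 codes in 4, 8 and 12 bits.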
It's best to stick with one coding scheme and not switch for each race.
Now, calculate the maximum number of bits possible for any ballot and
add 9 bits (or some number that isn't equal to one bar-code character
and is "big enough").
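A sketch of this sizing step, assuming each ballot style is described by a hypothetical list of `(declared_candidates, allows_write_in)` pairs and using the code lengths implied by the octal counts above (values up to 3 fit in 3 bits, up to 19 in 6, up to 147 in 9). The longest code in a race is its highest value: n + 2 with a write-in, n + 1 without.

```python
from typing import List, Tuple

PREFIX_BITS = 9  # extra bits for the start-offset prefix, per the text


def code_bits(value: int) -> int:
    """Bits needed for value under the octal counts 3 / 19 / 147."""
    if 1 <= value <= 3:
        return 3
    if 4 <= value <= 19:
        return 6
    if 20 <= value <= 147:
        return 9
    raise ValueError("value out of range")


Race = Tuple[int, bool]  # (declared candidates, write-in allowed)


def max_ballot_bits(styles: List[List[Race]]) -> int:
    """Largest possible vote-code length over all ballot styles."""
    def style_bits(races: List[Race]) -> int:
        # Worst case per race is its highest code: the write-in if allowed.
        return sum(code_bits(n + 2 if write_in else n + 1) for n, write_in in races)
    return max(style_bits(races) for races in styles)


def frame_bits(styles: List[List[Race]]) -> int:
    """Maximum ballot length plus the offset prefix."""
    return max_ballot_bits(styles) + PREFIX_BITS
```

For example, a style with one six-candidate race allowing write-ins and two yes/no propositions needs 6 + 3 + 3 = 12 bits at most.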

Now suppose that the maximum number of bits in a ballot is 500 bits
(not counting the extra 9 bits) and suppose some ballot is 300 bits
long. Then you have 200 bits extra. Since you don't want to make
"short" ballots or "long" ballots stand out, except for write-ins,
you can place the 300 bits starting anywhere between bits 1 and 201.
The 9-bit (or whatever) prefix tells you where the 300 bits start.
So choose a pseudo-random number between 1 and 201. The "extra" 200
bits are placed at the beginning and/or the end of the ballot, and
they are pseudo-randomly chosen bits. Now it is impossible for a
human to compare ballots by eyeballing them. Yet it is extremely
easy to code and decode.
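A minimal sketch of the random placement, using Python's `secrets` module as the pseudo-random source (an assumption; any unpredictable generator would do). The prefix here stores the 0-based offset, so offsets 0..200 correspond to starting bits 1..201 in the text. The function names are hypothetical.

```python
import secrets

PREFIX_BITS = 9  # the text's 9-bit start-offset prefix


def random_bits(n: int) -> str:
    """n pseudo-random filler bits."""
    return "".join(secrets.choice("01") for _ in range(n))


def pad_ballot(ballot: str, max_bits: int) -> str:
    """Hide a ballot's bits at a random offset inside a max_bits-wide field."""
    slack = max_bits - len(ballot)
    if slack < 0:
        raise ValueError("ballot is longer than the maximum")
    if slack >= 2 ** PREFIX_BITS:
        raise ValueError("prefix too narrow for this much slack")
    offset = secrets.randbelow(slack + 1)  # 0 .. slack inclusive
    field = random_bits(offset) + ballot + random_bits(slack - offset)
    return format(offset, f"0{PREFIX_BITS}b") + field


def unpad_ballot(padded: str) -> str:
    """Return the field from the ballot's first bit onward.

    Trailing filler remains; the decoder stops on its own because the races
    are in fixed order and each code is self-delimiting.
    """
    offset = int(padded[:PREFIX_BITS], 2)
    return padded[PREFIX_BITS + offset:]
```

With the 500-bit maximum and a 300-bit ballot the slack is 200, which the 9-bit prefix (0 to 511) holds easily.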

Now suppose we use a barcode that has 7-bit characters. Then adjust
the "maximum" number of bits to be 7 more (1 more character) than
the original maximum number of bits rounded up to the next multiple
of 7. This way, every ballot has the possibility for *some*
shifting, even if it is a maximum length ballot.
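The rounding rule can be sketched as below. Whether the "original maximum" includes the 9-bit prefix is not stated; this sketch assumes it does, so the whole symbol is a whole number of 7-bit characters. `barcode_frame` is a hypothetical name.

```python
from typing import Tuple


def barcode_frame(max_ballot_bits: int, prefix_bits: int = 9, char_bits: int = 7) -> Tuple[int, int]:
    """Return (total symbol bits, width of the field the ballot floats in)."""
    needed = max_ballot_bits + prefix_bits
    rounded = -(-needed // char_bits) * char_bits  # round up to a whole character
    total = rounded + char_bits                    # plus one spare character
    return total, total - prefix_bits
```

For the 500-bit example this gives 518 bits (74 characters), leaving a maximum-length ballot 9 bits of room to shift; because a whole character is always added, that room is never less than 7 bits.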

Now, I suggest that we use 3-bit (octal) codes, as they work best
with this scheme for yes/no propositions, which have 3 choices (no
vote, yes, no; the write-in choice is invalid, but the "no-vote"
choice *is* valid), and such propositions proliferate on the ballot,
at least in California.

Thanks.

Arthur

At 11:23 AM -0700 8/23/03, Alan Dechert wrote:
>I propose that the raw vote data be stored as follows in a plain comma
>delimited character format -- one row per ballot.
>
>Like so,
>
>STCOPRECTXXXXYCCCCCCCCCC, 02 MY WRITE-IN FOR SENATE, 05 MY WRITE-IN FOR ATTY GEN
>
>Where STCOPRECT stands for State, County, and Precinct. XXXX is the ballot
>number. Y is an extra character we may or may not use -- it could be used
>to designate alternative schemes for mapping characters to bit patterns ...
>or it could be used for something else. The ten "C"s hold the encoded
>ballot selections. Write-in names follow with the 2-digit number indicating
>the applicable contest (should the write-in names be encrypted?).
>
>This fixes the basic part of the ballot image to 25 characters. This will
>help us decide on a bar code scheme. We don't plan to bar code write-in
>names at this point.
>
>We will need to write a routine to "uncompress" the data. With a table of
>ballot images that look like what I've described above, we should be able to
>punch an "uncompress" button and get the data written out like so:
>
>STCOPRECTXXXXY, RACHEL CARSON JOHN MUIR,MY WRITE-IN FOR SENATE,LILLIAN HELLMAN, .. ETC
>
>Note that write-ins will be inserted in the correct position and the other
>selections are written-out.
>
>Please sign off on this design if you think it's okay. Let us know right
>away if you want it different.
>
>Keep in mind that this is for the demo. The production system will likely
>be a little different -- maybe another character or two for other attributes
>and, of course, the ten characters for the encoded selections could be more
>or less than ten.
>
>-- Alan Dechert
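A rough Python sketch of both halves of the quoted design: parsing one row into its fields, and the "uncompress" step that writes selections out with write-ins in place. Field widths are read off the template string. How the ten "C" characters map to selections is not specified, so the sketch assumes they have already been decoded into one 0-based candidate index (or `None` for no vote) per contest; it also assumes contests are numbered from 1 and that write-in names contain no commas. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Widths read off the template STCOPRECT | XXXX | Y | CCCCCCCCCC
WIDTHS = {"location": 9, "ballot": 4, "scheme": 1, "selections": 10}
FIXED_LEN = sum(WIDTHS.values())


def parse_row(row: str) -> Tuple[Dict[str, str], Dict[int, str]]:
    """Split a row into its fixed fields and a {contest number: write-in} map."""
    fixed, *rest = [part.strip() for part in row.split(",")]
    if len(fixed) != FIXED_LEN:
        raise ValueError("unexpected fixed-part length")
    fields: Dict[str, str] = {}
    pos = 0
    for key, width in WIDTHS.items():
        fields[key] = fixed[pos:pos + width]
        pos += width
    write_ins: Dict[int, str] = {}
    for item in rest:
        contest, text = item.split(" ", 1)  # "02 MY WRITE-IN ..." -> 2, name
        write_ins[int(contest)] = text
    return fields, write_ins


def uncompress(header: str, contests: List[List[str]],
               picks: List[Optional[int]], write_ins: Dict[int, str]) -> str:
    """Write out one row: the header, then one entry per contest."""
    if len(contests) != len(picks):
        raise ValueError("one pick per contest is required")
    out = [header]
    for number, (names, pick) in enumerate(zip(contests, picks), start=1):
        if number in write_ins:
            out.append(write_ins[number])  # write-in goes in its contest's slot
        elif pick is None:
            out.append("")                 # no vote
        else:
            out.append(names[pick])
    return ",".join(out)
```

The header passed to `uncompress` would be the location, ballot number and "Y" fields joined together, e.g. `STCOPRECTXXXXY`.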

-- 
-------------------------------------------------------------------------------
Arthur M. Keller, Ph.D., 3881 Corina Way, Palo Alto, CA  94303-4507
tel +1(650)424-0202, fax +1(650)424-0424
==================================================================
= The content of this message, with the exception of any external 
= quotations under fair use, is released to the Public Domain     
==================================================================
Received on Sun Aug 31 23:17:14 2003
