Re: Compression, encoding, entropy

From: Alan Dechert <alan_at_openvotingconsortium_dot_org>
Date: Sun May 02 2004 - 14:03:31 CDT

Charlie,

> As someone who actually got their PHd doing information theory
> I'm really bewildered by the barcod length thread, .....
>

I'll try to summarize:

For a variety of reasons, I decided to include a barcode representation of
the selections on the ballot (It was actually first suggested to me over 3
years ago by the assistant registrar of voters, Ryan Ronco of Placer
County--the county in which I reside).

For the demo, a very inexpensive barcode system was an absolute requirement
since we had no budget for this. I needed, 1) Free barcode software, 2)
very inexpensive readers, and 3) adequate data capacity.

After a little research, I decided that Code128 was the best way to go for
the demo. I found GPL software (by Jan Karrman), I was able to buy 18
readers for about a dollar each including shipping, and the data capacity
was more than adequate. I mailed the barcode readers to various project
participants at a postage cost of $1.37 ea (slightly more in some
circumstances... I mailed two of them to Jan in Sweden). The cost for
postage was more than the cost of the units.

There is a practical limit to the length of the barcode. I thought about
methods to "compress" the data. It turns out, the use of this term caused
some confusion. Maybe "optimal encoding" or something like that would have
been more palatable. In any case, we went for a straight conversion of the
bit pattern to a decimal number. This scheme was hashed out between me and
Jan Karrman (David Mertz suggested an obfuscation routine so that similarly
voted ballots would have different-looking barcodes... this was unnecessary
for the demo but it was a nice touch and easy enough to do. Jan Karrman
made the last call on how this was done).

The encoding wasn't very efficient but it didn't matter since the resulting
3-inch barcode was no problem for barcode printing/reading. I thought of a
method that would have shortened the barcode to about 1.5 in. Arthur
discussed a technique he had that.

On Aug 23, Arthur summarized,
http://gnosis.python-hosting.com/voting-project/August.2003/0237.html

> Numbers are in binary coded octal, transformed
> to make them self-delimiting. See my paper in
> http://www-db.stanford.edu/pub/keller (Arthur M.
> Keller and Jeffrey D. Ullman, "A Version Numbering
> Scheme with a Useful Lexicographical Order,"
> appeared in Int. Conf. on Data Engineering , Taipei,
> Taiwan, March 1995. Also a Postscript file), which
> includes a discussion on a self-delimiting number
> scheme. With 3 bits (it's octal, remember?), you can
> code up to 3 choices, and with 6 bits you can code
> up to 19 choices. And 9 bits can code up to 147
> choices. (If you want, you can have more short codes,
> but at the risk of making the longer codes even longer.)
> An advantage of a self-delimiting variable-length codes
> is that it makes it harder to compare votes, particularly
> if there are many votes.
>
I would like to have explored this and other methods for encoding choices
with as few bits as possible. It wasn't necessary for the demo so I dropped
the idea.

Now as we start to think about the production system, natually people wonder
if the 1-d Code128 system will be adequate. Some people say, "let's go to
2-d." My response has been, "I don't know. This needs to be studied."

We did not scratch the surface on possible encoding schemes. This wasn't
necessary for the demo. So I'm not prepared to say that the 1-d Code128
system is inadequate. I think it's safe to say that if all jurisdictions
went to ranked preference voting, it would be inadequate. But should we let
the tail wag the dog? If one county out of 2,219 counties goes for ranked
preference, do we go to a 2-d system even if the other 2,218 counties don't
need it? What are the cost differences?

Here are some more research questions:

What all will we include in the barcode?

How big of a ballot can we safely encode in the barcode?

What percentage of ballots does this encompass?

If the percentage is not 100%, what difference does it make?

Can our system support both 1-d and 2-d barcodes?

Consider that the largest county (Los Angeles) has a population about 8
times that of the 25th largest county. What if 2100 counties requirements
could be met with 1-d and the other 119 need 2-d? Would it make sense for
our system to require more expensive readers for the 2100 counties that
don't need them?

No decision has been made on going to 2-d barcodes. No decision will be
made about this until these questions--and other questions--are thoroughly
examined. Very likely, this level of research will only come after we get
some funding. I think this is too involved for an all-volunteer effort.

I suspect we can support both 1-d and 2-d barcodes no problem, and that 1-d
will be adequate for the vast majority of counties.

Alan D.

==================================================================
= The content of this message, with the exception of any external
= quotations under fair use, are released to the Public Domain
==================================================================
Received on Mon May 31 23:17:02 2004

This archive was generated by hypermail 2.1.8 : Mon May 31 2004 - 23:18:15 CDT