Re: bar code bit encoding

From: Alan Dechert <adechert_at_earthlink_dot_net>
Date: Mon Sep 15 2003 - 15:47:19 CDT

In a message dated 9/13/03 1:30:00 PM Pacific Daylight Time,
evm@1reality.org (Chris Schaefer) writes:

> Architectural document is a HUGE improvement. Thank you.
>
I agree. Good work, David!

I see several big-picture issues in your post, Chris.

Let's keep three things somewhat separated:

1) The production system
2) The demo barcode
3) The tabulation demo

You are working on the tabulation demo. Strictly speaking, the demo barcode
has nothing to do with the tabulation demo. The data format you use for the
tabulation demo will be whatever we come up with for vote-selection.xml.
This will include write-in names, which will not be in the barcoded data.
The data you use in the demo will be dummy data, the raw format of which
will be what's recorded on the voting machine PCs.

Certainly, your ideas on the barcode scheme are welcome but we want to make
sure we don't get bogged down here. I generally favor looking forward to
the production system in our design for the demo, so long as we don't impede
progress on the demo working on issues for the development of the production
system. The scope of the demo is already quite large.

My problem is that Jan (with some input from me) has described a system
that, while not optimal, we know will work for the demo. He needs 22
symbols to represent the choices and two more to represent the ballot
number. That's 24 symbols and we are pretty confident that this can be read
easily by cheap commodity barcode scanners. I mentioned that 30 symbols has
been mentioned by some experts with 1-D barcodes as a practical limit.
David confirmed this number researching this independently. He said,

     The limit Alan states of aproximately 30 symbols
     for a readable 1-D barcode is on target.

http://gnosis.python-hosting.com/voting-project/September.2003/0085.html

David has also pointed out that long integer math is no sweat in Python. So
there appears to be no technical barriers to implementing the barcode as Jan
has described. We all know it's not optimal but we also know it will work.

You have made some suggestion for improving the design. David has endorsed
those suggestions and added more suggestions. I am interested especially if
it will result in a shorter barcode. Jan's is 24 symbols and about 2.5
inches. You mention 11 bytes although I'm not sure if equates to 11 symbols
since we only have 100 (not 128 or 256) to work with in character set C of
Code 128. We want to keep the barcode short to ensure readability, and I
think it will be aesthetically better to have a shorter one as opposed to
longer.

One thing you said is just plain wrong:

     It's important to remember that compression system
     will not work well on very short lengths of data like ours.
     They're designed for kBytes at a minimum, not
     segments of 50 bytes or so.

It may be that some readily available compression algorithms won't work with
our data, we know others that will work. I described a scheme to get it
down to 10 bytes (actually it was a little bit wrong because I was figuring
7-bit characters with the full 128 character set... we've since learned that
we will only be able to use 100 symbols so my scheme might result in 11 or
12 symbols for encoding all the selections). Arthur countered my crummy
demo-specific system with a more general scientific solution.

Maybe you could have a look at the Arthur's paper ("A Version Numbering
Scheme with a Useful Lexicographical Order") mentioned here:
http://gnosis.python-hosting.com/voting-project/August.2003/0237.html

However, no one has written any code to implement his "bit-dense continuous
coding scheme with varying length codes" for our demo.

If your idea will result in a shorter bar code AND satisfy developer
aesthetics, then I'm all for it. However, the idea needs to be completed.
David is out-or-town and so if you want to see the barcode data encoded this
way, then please help to see that this gets done soon. It behooves us --
for a great many reasons -- to get the demo done as soon as we can.
Re-doing the barcode scheme could delay us for a week or more.

Please, for the demo, let's not get too wound up about how many races we can
handle or how many candidates or choices per contest we can handle. Taking
some of this into consideration is fine but the demo only involves a total
of 13 contests. We are not building the production system right now.
Please let's not expand the demo beyond the already large amount of stuff we
are demo-ing.

Jan has agreed to take on the ballot printing routine. Please communicate
with Jan about implementing your idea.

Alan D.

> here are some of my thoughts:
>
> I started by trying to find the real amount of information ( ie the
> minimum size we could compress to )
> For each item on the ballot I see the following 4 types of responses:
>
> i) YesOrNo - requires 2 bits or log2(3) ( remember lack of
> choice needs to be represented )
> ii) PickOne - requires log2(M+1) bits where M is
> number of choices
> iii) PickNofM - requires N * log2(M+1) bits where M is number of
> choices and N is number to pick
> iv) Rank - requires M * log2(M+1) bits where M is number
> of choices to rank.
> (note all fractional bits are rounded UP )
>
> (Obviously a write-in situations complicates things a bit on the bit
> counting. But, we can encode the standard 26 letters plus a space in
> 5 bits so that's 5 bits X #letters )
>
> For purposes of discussion I have separated out PickOne and YesOrNo
> from PickNofM just for consideration in vote count systems, even
> though it's clear that one is a subset of the other. Further you
> might argue that a single rank item Rank is a composed of a sequence
> of PickOne items ( where you pick a single position for each
> candidate ).
>
> So I'm getting somewhere around 72 bits of data to represent our
> sample ballot without write-in text. I don't believe any compression
> scheme will find much to squeeze out of that. It's important to
> remember that compression system will not work well on very short
> lengths of data like ours. They're designed for kBytes at a minimum,
> not
> segments of 50 bytes or so. The best way to compress this data is to
> represent it in as compact a fashion as possible.
>
> Now Jans encoding scheme is to move the 72 bits representation as a
> single _VERY_ large binary number to decimal, and then use the code
> 128 decimal scheme to represent that. Before I continue, let me
> just say as a programmer that dealing with weird bit-aligned fields
> in data is very difficult and error prone. In the interests of
> clarity my inclination would be to move to byte aligned data, or
> perhaps nibble (4 bit) aligned data.
>
> In this fashion each PickOne field would be represented by either a
> nibble for most election or a byte for exceptional election ( like
> our recall governor race here in CA ). That would give us room for
> 14 candidates, a write-in and a no-mark in the case of a nibble or
> 253 candidates, a write-in and a no-mark in the case of the byte.
>
> If we go the nibble route we can represent the sample data in 22
> nibbles or 11 bytes. (88 bits for those of you who are counting
> bits ).
>
> A final option would further increase the size slightly but eliminate
> the need for large integer math. Basically store up to 8 candidates
> and a write-in and a nomark. This would be a single decimal digit.
> The expanded version would allow for 98 candidates ( yuck, doesn't
> fit CA election ).
>
> Each of these proposals could utilize the XOR visual obfuscation method.
>
> Before we continue to much further can we find out some salient info
> 1) How many ballot items will we need to represent?
> 2) How long a barcode can we tolerate?
>
> These will allow us to determine how much additional info ( like
> ballot item # , and vote type ) we can include along with the vote
> data.

==================================================================
= The content of this message, with the exception of any external
= quotations under fair use, are released to the Public Domain
==================================================================
Received on Tue Sep 30 23:17:05 2003

This archive was generated by hypermail 2.1.8 : Tue Sep 30 2003 - 23:17:09 CDT