bar code bit encoding

From: Chris Schaefer <evm_at_1reality_dot_org>
Date: Sat Sep 13 2003 - 15:29:17 CDT

Architectural document is a HUGE improvement. Thank you.

here are some of my thoughts:

I started by trying to find the real amount of information ( ie the
minimum size we could compress to )
For each item on the ballot I see the following 4 types of responses:

i) YesOrNo - requires 2 bits or log2(3) ( remember lack of
choice needs to be represented )
ii) PickOne - requires log2(M+1) bits where M is
number of choices
iii) PickNofM - requires N * log2(M+1) bits where M is number of
choices and N is number to pick
iv) Rank - requires M * log2(M+1) bits where M is number
of choices to rank.
                            (note all fractional bits are rounded UP )

(Obviously a write-in situations complicates things a bit on the bit
counting. But, we can encode the standard 26 letters plus a space in
5 bits so that's 5 bits X #letters )

For purposes of discussion I have separated out PickOne and YesOrNo
from PickNofM just for consideration in vote count systems, even
though it's clear that one is a subset of the other. Further you
might argue that a single rank item Rank is a composed of a sequence
of PickOne items ( where you pick a single position for each
candidate ).

So I'm getting somewhere around 72 bits of data to represent our
sample ballot without write-in text. I don't believe any compression
scheme will find much to squeeze out of that. It's important to
remember that compression system will not work well on very short
lengths of data like ours. They're designed for kBytes at a minimum,
segments of 50 bytes or so. The best way to compress this data is to
represent it in as compact a fashion as possible.

Now Jans encoding scheme is to move the 72 bits representation as a
single _VERY_ large binary number to decimal, and then use the code
128 decimal scheme to represent that. Before I continue, let me
just say as a programmer that dealing with weird bit-aligned fields
in data is very difficult and error prone. In the interests of
clarity my inclination would be to move to byte aligned data, or
perhaps nibble (4 bit) aligned data.

In this fashion each PickOne field would be represented by either a
nibble for most election or a byte for exceptional election ( like
our recall governor race here in CA ). That would give us room for
14 candidates, a write-in and a no-mark in the case of a nibble or
253 candidates, a write-in and a no-mark in the case of the byte.

If we go the nibble route we can represent the sample data in 22
nibbles or 11 bytes. (88 bits for those of you who are counting
bits ).

A final option would further increase the size slightly but eliminate
the need for large integer math. Basically store up to 8 candidates
and a write-in and a nomark. This would be a single decimal digit.
The expanded version would allow for 98 candidates ( yuck, doesn't
fit CA election ).

Each of these proposals could utilize the XOR visual obfuscation method.

Before we continue to much further can we find out some salient info
1) How many ballot items will we need to represent?
2) How long a barcode can we tolerate?

These will allow us to determine how much additional info ( like
ballot item # , and vote type ) we can include along with the vote


Chris Schaefer                                   Email: 
             Professional Bit Twiddler and student of reality.
                 "Global Information, Local Production"
= The content of this message, with the exception of any external 
= quotations under fair use, are released to the Public Domain    
Received on Tue Sep 30 23:17:04 2003

This archive was generated by hypermail 2.1.8 : Tue Sep 30 2003 - 23:17:09 CDT