# Re: How big could a bar code be if a bar code could be big?

From: David Mertz <voting-project_at_gnosis_dot_cx>
Date: Fri Apr 30 2004 - 16:58:14 CDT

On Apr 30, 2004, at 5:31 PM, Karl Auerbach wrote:
> It's my sense that we need to move out of "the demo" thinking and
> begin thinking of the first pilot system.

Yes! Totally right.

> Compression doesn't always compress - it's a gamble.

That's exactly the point I was trying to make also. But we CAN decide
to use efficient encodings of the total "vote space" It's easy to
compute the exact inherent entropy of different types of contests:

Single-selection:
for N = Number of candidates/options
V(N) = log2(N)

N of M (unordered):
for N = slots
M = candidates
C = combinations N of M = M!/N!(M-N)!
V(N/M) = C(M/N) + C(M/N-1) + ... + 1

Ranked order:
See my other post. Basically, this is the largest factor, by far.

For example, the demo ballot with 8 slots, and 8 candidates (including
write-in) has 17 bits of entropy.

I haven't gone through the demo for a precise count... but eyeballing,
I think there's probably less than 30 bits of entropy in the demo
ballot, total. I could write a little program to determine exact
entropy given any collection of contests, if someone helps that
illustrate some real races. We could test the information content of
some known past elections.

>> As discussed in this OCT thread, we added error detection code...
> I haven't seen that in the code. I've seen range and value checks.
> But I
> don't see anything that resembles a CRC or ECC.

It's more just the fact that a sparse vote encoding only makes a small
portion of all digit strings "valid." It's not really ECC, per se.

> (A Hamming ECC code would be a nice touch.

Sure.

> In light of the potential of having the bar-code digitally signed -
> this
> implies a message digest as part of the data represented by the bar
> code.
> Such digests are typically at least 32 or 64 bits long. An ECC would,
> I
> believe, be best applied as the outermost wrapper, i.e. done after and
> appended after the message digest.

If we can get the globally unique ballot ID to fit in, say, 40 bits
(that's a trillion voters over an historical range). Maybe we can find
a 32 bit digest to use, which also seems sufficient (the threat model
makes it infeasible to spend CPU-years to forge one ballot, we don't
need SHA or similar).

So that's 72 bits overhead. Which might give us 78 bits of votes.
That's about 2.5 times the size of the demo ballot, assuming 150 bits
is the limit for easily scanned 1D barcodes. Does that look like
enough? Or will some be bigger? If we have 79 judicial retentions, for
example, we've definitely blown our limit.

Looking at it this way, compression is ABSOLUTELY no allowable. By
looking at the raw entropy of votes, ipso facto, some legal votes are
non-compressible.
==================================================================
= The content of this message, with the exception of any external
= quotations under fair use, are released to the Public Domain
==================================================================
Received on Fri Apr 30 23:17:28 2004

This archive was generated by hypermail 2.1.8 : Fri Apr 30 2004 - 23:17:29 CDT