Re: Compression, encoding, entropy

From: David Mertz <voting-project_at_gnosis_dot_cx>
Date: Sun May 02 2004 - 17:23:17 CDT

On May 2, 2004, at 4:03 PM, Alan Dechert wrote:
> David, have you read Arthur's paper? If so, can you comment on it?
>> With 3 bits (it's octal, remember?), you can code up to 3 choices,
>> and with 6 bits you can code
>> up to 19 choices. And 9 bits can code up to 147 choices.

I haven't read the paper (I glanced at it when it was referenced some
months ago). However, just based on the abstract, I can see that it's
not an -optimal- encoding. It's still quite possible that it would be
worth spending a few extra bits to make contests self-delimiting.

My hunch is that this isn't really important, since we can decide based
on the type and number of options the bit length--and we HAVE the
ballot-election.xml configuration in advance. Arthur's technique is
more likely to be helpful when we encode a sequence of data whose
complexity we don't know in advance of an individual datum. But I'll
say that this question is open, in principle.

The script election-entropy.py that I posted gives the bits needed for
optimal encoding of an arbitrary election. I posted that to let the
group make some judgment on the barcode requirements. For example, the
demo needs an optimal 53 bits (assuming bit-boundaries between
contests). Any non-optimal encoding needs more bits.

I'm not judging if those 53 bits are a lot or a little, that's just
what it takes. I was hoping the people familiar with other historical
elections could calculate their information needs. If pretty much all
of them come out no more than about 80 bits, I'm pretty happy with
Code128 as a default. If a bunch of people find elections that would
need over 100 bits, 2-D starts to seem more likely. Keep in mind that
ranked preference has MUCH more entropy than other contest types. Even
a whole bunch of simpler contests won't add up to all that many bits.

Yours, David...
==================================================================
= The content of this message, with the exception of any external
= quotations under fair use, are released to the Public Domain
==================================================================
Received on Mon May 31 23:17:03 2004

This archive was generated by hypermail 2.1.8 : Mon May 31 2004 - 23:18:15 CDT