Re: Compression, encoding, entropy

From: Arthur Keller <arthur_at_kellers_dot_org>
Date: Sun May 02 2004 - 23:22:13 CDT

At 6:23 PM -0400 5/2/04, David Mertz wrote:
>On May 2, 2004, at 4:03 PM, Alan Dechert wrote:
>>David, have you read Arthur's paper? If so, can you comment on it?
>>>With 3 bits (it's octal, remember?), you can code up to 3 choices,
>>>and with 6 bits you can code
>>>up to 19 choices. And 9 bits can code up to 147 choices.
>
>I haven't read the paper (I glanced at it when it was referenced
>some months ago). However, just based on the abstract, I can see
>that it's not an -optimal- encoding. It's still quite possible that
>it would be worth spending a few extra bits to make contests
>self-delimiting.

Please try again. There is an optimality proof for self-delimiting
encodings in there done jointly with Prof. Jeff Ullman.

>My hunch is that this isn't really important, since we can decide
>based on the type and number of options the bit length--and we HAVE
>the ballot-election.xml configuration in advance. Arthur's
>technique is more likely to be helpful when we encode a sequence of
>data whose complexity we don't know in advance of an individual
>datum. But I'll say that this question is open, in principle.

It allows for a uniform algorithm that is independent of the number
of possibilities. The advantage of such a scheme is that the
decoding is simpler when the encoding relies less on context.

>The script election-entropy.py that I posted gives the bits needed
>for optimal encoding of an arbitrary election. I posted that to let
>the group make some judgment on the barcode requirements. For
>example, the demo needs an optimal 53 bits (assuming bit-boundaries
>between contests). Any non-optimal encoding needs more bits.

Relying on context is problematic and a potential source of errors.
Redundancy is helpful to reduce errors.

>I'm not judging if those 53 bits are a lot or a little, that's just
>what it takes. I was hoping the people familiar with other
>historical elections could calculate their information needs. If
>pretty much all of them come out no more than about 80 bits, I'm
>pretty happy with Code128 as a default. If a bunch of people find
>elections that would need over 100 bits, 2-D starts to seem more
>likely. Keep in mind that ranked preference has MUCH more entropy
>than other contest types. Even a whole bunch of simpler contests
>won't add up to all that many bits.

was it Steve Chessin who suggested placing phonemes (for pronouncing
the name) as part of each vote in the bar code, so the BVA wouldn't
need its own collection of sound files? I haven't though enough
about the implications of this idea.

Coding by phonemes could enable the eventual creation of a voice
recognition voting system.

Best regards,
Arthur

-- 
-------------------------------------------------------------------------------
Arthur M. Keller, Ph.D., 3881 Corina Way, Palo Alto, CA  94303-4507
tel +1(650)424-0202, fax +1(650)424-0424
==================================================================
= The content of this message, with the exception of any external 
= quotations under fair use, are released to the Public Domain    
==================================================================
Received on Mon May 31 23:17:03 2004

This archive was generated by hypermail 2.1.8 : Mon May 31 2004 - 23:18:15 CDT