Re: Compression, encoding, entropy

From: David Mertz <voting-project_at_gnosis_dot_cx>
Date: Sun May 02 2004 - 23:50:15 CDT

>>>> With 3 bits (it's octal, remember?), you can code up to 3 choices,
>>>> and with 6 bits you can code
>>>> up to 19 choices. And 9 bits can code up to 147 choices.
>> I haven't read the paper (I glanced at it when it was referenced some
>> months ago). However, just based on the abstract, I can see that
>> it's not an -optimal- encoding.

On May 3, 2004, at 12:22 AM, Arthur Keller wrote:
> There is an optimality proof for self-delimiting encodings

Right... optimal *for self-delimiting encodings*

Which is to say: NOT optimal among all encodings. Or specifically, an
optimal (non-delimited) 3-bit code encodes 8 options, not just 3 (and
similarly for other bit lengths).

>> My hunch is that this isn't really important, since we can decide
>> based on the type and number of options the bit length--and we HAVE
>> the ballot-election.xml configuration in advance.
> It allows for a uniform algorithm that is independent of the number of
> possibilities. The advantage of such a scheme is that the decoding is
> simpler when the encoding relies less on context.

I tend to disagree here. But not so much that I couldn't perhaps be
convinced otherwise.

To my mind, requiring the encoding to EXPLICITLY rely on prior
knowledge of the contest context acts as an additional cross-check on
consistency between ballot template and cast ballot. ECCs do a better
job with the integrity constraint you want than do self-delimiting
contests.

FWIW though, could you (Arthur) produce a little utility similar to
election-entropy.py that would let members determine the exact
bit-length requirement to encode ballots using optimal self-delimiting
encodings? That would let us do a comparison of the info requirements
of historical elections, comparing delimited and non-delimited optimal
encoding styles.

Another possible thing is to explicitly list where the contest breaks
occur within the ballot, i.e. "contest 1: 3 bits; contest 2: 7 bits;
etc." I'm not sure exactly how much space that description would take
(if encoded compactly). But the info would not normally need to be
part of the BRP or BVA scan, it would just act as a validation during
debugging or a challenge (i.e. it could go elsewhere on the ballot:
another barcode or printed numbers or an annotation to the barcode).

> Relying on context is problematic and a potential source of errors.
> Redundancy is helpful to reduce errors.

I think the contest delimiter is a much less useful type of redundancy
than are ECCs or election context. Or Karl's globally unique election
identifier prefix.

> was it Steve Chessin who suggested placing phonemes (for pronouncing
> the name) as part of each vote in the bar code, so the BVA wouldn't
> need its own collection of sound files? I haven't though enough about
> the implications of this idea.

That's a neat idea.

Certainly if we do that, we are nowhere near the 100, or even 150, bits
we can fit in Code128. But something like that should fit in common
2-D codes.
==================================================================
= The content of this message, with the exception of any external
= quotations under fair use, are released to the Public Domain
==================================================================
Received on Mon May 31 23:17:03 2004

This archive was generated by hypermail 2.1.8 : Mon May 31 2004 - 23:18:15 CDT