Re: Compression, encoding, entropy

From: Arthur Keller <arthur_at_kellers_dot_org>
Date: Mon May 03 2004 - 03:57:09 CDT

At 12:50 AM -0400 5/3/04, David Mertz wrote:
>>>>>With 3 bits (it's octal, remember?), you can code up to 3
>>>>>choices, and with 6 bits you can code
>>>>>up to 19 choices. And 9 bits can code up to 147 choices.
>>>I haven't read the paper (I glanced at it when it was referenced
>>>some months ago). However, just based on the abstract, I can see
>>>that it's not an -optimal- encoding.
>
>On May 3, 2004, at 12:22 AM, Arthur Keller wrote:
>>There is an optimality proof for self-delimiting encodings
>
>Right... optimal *for self-delimiting encodings*
>
>Which is to say: NOT optimal among all encodings. Or specifically,
>an optimal (non-delimited) 3-bit code encodes 8 options, not just 3
>(and similarly for other bit lengths).

True enough.

>>>My hunch is that this isn't really important, since we can decide
>>>based on the type and number of options the bit length--and we
>>>HAVE the ballot-election.xml configuration in advance.
>>It allows for a uniform algorithm that is independent of the number
>>of possibilities. The advantage of such a scheme is that the
>>decoding is simpler when the encoding relies less on context.
>
>I tend to disagree here. But not so much that I couldn't perhaps be
>convinced otherwise.
>
>To my mind, requiring the encoding to EXPLICITLY rely on prior
>knowledge of the contest context acts as an additional cross-check
>on consistency between ballot template and cast ballot. ECCs do a
>better job with the integrity constraint you want than do
>self-delimiting contests.

It's a both - and, not an either - or.

>FWIW though, could you (Arthur) produce a little utility similar to
>election-entropy.py that would let members determine the exact
>bit-length requirement to encode ballots using optimal
>self-delimiting encodings? That would let us do a comparison of the
>info requirements of historical elections, comparing delimited and
>non-delimited optimal encoding styles.

I'm not a python programmer, so I can't. But the formula is:

Let n be the highest number being encoded
Let d be the number of bits in a digit
Let b be the base (2**n)
Let m be one-half of the base
The number of digits, g = floor (log_m(n) +1)
The number of bits is d*g
The efficiency in encoding compared with straight binary is:
(d*g)/floor(lg(n)+1)

>Another possible thing is to explicitly list where the contest
>breaks occur within the ballot, i.e. "contest 1: 3 bits; contest 2:
>7 bits; etc." I'm not sure exactly how much space that description
>would take (if encoded compactly). But the info would not normally
>need to be part of the BRP or BVA scan, it would just act as a
>validation during debugging or a challenge (i.e. it could go
>elsewhere on the ballot: another barcode or printed numbers or an
>annotation to the barcode).

That can be done with a length prefix. The problem is how long is a
length prefix is needed?

>>Relying on context is problematic and a potential source of errors.
>>Redundancy is helpful to reduce errors.
>
>I think the contest delimiter is a much less useful type of
>redundancy than are ECCs or election context. Or Karl's globally
>unique election identifier prefix.

I'm saying that it is easier to parse without context, and therefore
less bug prone than using context. Certainly ECCs (and digital
signatures) are useful as well, and context is a further check.
Remember, redundancy is your friend.

>>was it Steve Chessin who suggested placing phonemes (for
>>pronouncing the name) as part of each vote in the bar code, so the
>>BVA wouldn't need its own collection of sound files? I haven't
>>though enough about the implications of this idea.
>
>That's a neat idea.
>
>Certainly if we do that, we are nowhere near the 100, or even 150,
>bits we can fit in Code128. But something like that should fit in
>common 2-D codes.

That's correct. Personally, I like the idea of a fixed station under
which you place the ballot rather than a wand scanner type device.
If you place the ballot against a guide, you get easy registration.
And if you place a scanner above and below the ballot, you can read
the ballot in any orientation, although at a higher cost.

Best regards,
Arthur

-- 
-------------------------------------------------------------------------------
Arthur M. Keller, Ph.D., 3881 Corina Way, Palo Alto, CA  94303-4507
tel +1(650)424-0202, fax +1(650)424-0424
==================================================================
= The content of this message, with the exception of any external 
= quotations under fair use, are released to the Public Domain    
==================================================================
Received on Mon May 31 23:17:03 2004

This archive was generated by hypermail 2.1.8 : Mon May 31 2004 - 23:18:15 CDT