Re: Compression, encoding, entropy

From: Arthur Keller <arthur_at_kellers_dot_org>
Date: Tue May 04 2004 - 02:33:49 CDT

At 3:20 AM -0400 5/4/04, David Mertz wrote:
>On May 4, 2004, at 2:56 AM, Arthur Keller wrote:
>>You can code in hexadecimal (4 bits, base 16), octal (3 bits, base
>>8), or quad (2 bits, base 4). d is the number of bits in your
>>digital encoding.
>
>I managed to figure out in conversation with Charlie Strauss what
>Arthur was talking about. The missing part of Arthur's description
>is that he is talking about the symbology of a particular barcode.
>I.e. one barcode has a 16-letter alphabet, another has a 4-letter
>alphabet, etc.
>
>So Arthur's "base" is the size of the barcode symbol set.
>Unfortunately, this doesn't allow us to specify in a general way
>what the bit-length required for an optimal self-delimited encoding
>is. It's always relative to the particular alphabet we've already
>chosen. That is, it's of less use in deciding the parameters for
>what is an acceptable barcode to use.
>
>I kinda wish Arthur would have indicated that this was what he was
>talking about... but I guess since it was a hidden assumption in his
>own thinking, it seemed obvious to him (even though it wasn't stated
>on-thread).

I thought I did. I learned "base arithmetic" (like binary, base 4,
base 8, etc.) in grade school. What better terminology would you
have preferred?

>However, we can still produce an example, assuming a priori a
>particular symbology for our yet-undecided barcode. For example, if
>we assume our barcode uses a 16-letter alphabet, applying this to
>the demo election we get:
>
>$ ./election-entropy.py < demo-election.data
>Election summary for OVC demo ballot (write-ins count as candidate)
>
>269995136716800 distinct votes are possible
>Optimal encoding is approximately 48 bits
>Contests at bit-boundaries, approx 53 bits
>Contests self-delimited, approx 88 bits
>
>Vote Space Optimal Self-Delim
>---------- ------- ----------
> 9 4 8
> 9 4 8
> 4 2 4
> 4 2 4
> 5 3 4
> 4 2 4
> 5 3 4
> 4 2 4
> 3 2 4
> 3 2 4
> 3 2 4
> 176 8 12
> 109601 17 24
>
>To me the self-delimited case looks enough less efficient that
>optimal encoding to rule it out if we are still contemplating
>Code128. Especially since Code128 in decimal code includes some
>error-detection of its own. And even more so if we use Karl's
>globally unique identifier for ballots: that ID will include
>reference to the particular ballot template that the ballot refers
>to, i.e. the bit positions of each contest. So self-delimiting
>doesn't get us anything extra.
>
>Yours, David...

Thanks, please try a 3-bit (base 8) encoding, for your example. Thanks.

Best regards,
Arthur

-- 
-------------------------------------------------------------------------------
Arthur M. Keller, Ph.D., 3881 Corina Way, Palo Alto, CA  94303-4507
tel +1(650)424-0202, fax +1(650)424-0424
==================================================================
= The content of this message, with the exception of any external 
= quotations under fair use, are released to the Public Domain    
==================================================================
Received on Mon May 31 23:17:09 2004

This archive was generated by hypermail 2.1.8 : Mon May 31 2004 - 23:18:15 CDT