Re: Fwd: Compression, encoding, entropy

From: Arthur Keller <arthur_at_kellers_dot_org>
Date: Mon May 03 2004 - 15:52:06 CDT

At 1:16 PM -0400 5/3/04, David Mertz wrote:
>>On May 3, 2004, at 4:57 AM, Arthur Keller wrote:
>>>I'm not a python programmer, so I can't. But the formula is:
>>Arthur: Does this look right to you? If so, I'll post the source
>>code. But if not, I must have made a programming error in
>>implementing your formulas:
>>Vote Space Optimal Self-Delim
>>---------- ------- ----------
>> 3 2 3
>> 3 2 3
>> 176 8 15
>> 109601 17 37
>As I stared at it, it started to look wrong that it should take so
>many bits to self-delimit contests. So I figured I did something
>wrong. Naturally, I did. However, looking at the formulae again, I
>think they must be wrong too:
>>Let n be the highest number being encoded [DQM: "Vote Space"]
>>Let d be the number of bits in a digit
>>Let b be the base (2**n)
>>Let m be one-half of the base
>>The number of digits, g = floor (log_m(n) +1)
>>The number of bits is d*g
>>The efficiency in encoding compared with straight binary is:
>The term 'g' very quickly converges to 1, i.e. for n >= 3.

g doesn't converse to 1. It grows with the log of n.

>Assuming you mean 'b' by "the base"

'b' is the base, like base 8 arithmetic is encoded in 3 bits, d=3.
Try varying 'd' from 2 to 4.

> m is larger than n for almost all n's

m is larger than n only for small n's.

>And therefore log_m(n) is always less than one (for n>2). And
>therefore g is exactly equal to one.

For d=2, b=4, m=2, the encoding sequence is:


So for n=5, 4 bits are needed. The optimal is 3 with a schema.

>Since d is constant, d*g is also constant. Which would mean that
>every contest gets encoded in the same number of bits. And at that
>point I start to think something's wrong. :-)

'd' is a system-wide parameter. It isn't constant. Try varying 'd'
from 2 to 4.

Best regards,

Arthur M. Keller, Ph.D., 3881 Corina Way, Palo Alto, CA  94303-4507
tel +1(650)424-0202, fax +1(650)424-0424
= The content of this message, with the exception of any external 
= quotations under fair use, are released to the Public Domain    
Received on Mon May 31 23:17:05 2004

This archive was generated by hypermail 2.1.8 : Mon May 31 2004 - 23:18:15 CDT