Re: Compression, encoding, entropy

From: Arthur Keller <arthur_at_kellers_dot_org>
Date: Mon May 03 2004 - 20:48:59 CDT

At 9:14 PM -0400 5/3/04, David Mertz wrote:
>Hi Arthur (or anyone who sees that I'm doing something wrong),
>I'm afraid that there still seems to be an some other error in your
>description of the length of optimal self-delimiting encodings, even
>after the correction of your substitution of 'n' for 'd'. Here's
>the latest code, as actually run:
>>>> from math import e, log, ceil, floor
>>>> def SD(n):
>... """Self-delimited data bit-length requirement
>... Let n be the highest number being encoded
>... Let d be the number of bits in a digit
>... Let b be the base (2**d)
>... Let m be one-half of the base
>... The number of digits, g = floor (log_m(n) +1)
>... The number of bits is d*g
>... """
>... d = 1/log(2) # 1.443 bits per base-e digit
>... m = e/2 # b == 2**(1/log(2)) == e
>... g = ceil(log(n)/log(m)) # log_m(n) == log(n)/log(m)
>... return d*g
>>>> SD(3)
>>>> SD(19)
>>>> SD(147)
>The variable 'g' does scale logarithmically, as desired. But The
>bit-lengths are longer than they should be according to your
>abstract (and according to my intuition about about how much info
>delimiting takes):
>>With 3 bits (it's octal, remember?), you can code up to 3 choices,
>>and with 6 bits you can code up to 19 choices. And 9 bits can code
>>up to 147 choices.
>It looks like I might be able to choose a different scaling factor
>'d' that gives the right answers. I considered that you might have
>meant "digits in a bit" (i.e. a fraction) rather than "bits in a
>digit", but that's only closer, not exactly right.
>So at this point I'm just guessing about the correct formula.

David, your definitions of d and m don't match mine.

d is an integer, probably in the range of 2 to 4. It's a parameter,
not some function of a log.
Please use an algorithm closer to my formula, and use other (unused)
variable names for your helper logarithms.

Best regards,

Arthur M. Keller, Ph.D., 3881 Corina Way, Palo Alto, CA  94303-4507
tel +1(650)424-0202, fax +1(650)424-0424
= The content of this message, with the exception of any external 
= quotations under fair use, are released to the Public Domain    
Received on Mon May 31 23:17:07 2004

This archive was generated by hypermail 2.1.8 : Mon May 31 2004 - 23:18:15 CDT