Re: Compression, encoding, entropy

From: Arthur Keller <arthur_at_kellers_dot_org>
Date: Mon May 03 2004 - 20:48:59 CDT

At 9:14 PM -0400 5/3/04, David Mertz wrote:
>Hi Arthur (or anyone who sees that I'm doing something wrong),
>
>I'm afraid that there still seems to be an some other error in your
>description of the length of optimal self-delimiting encodings, even
>after the correction of your substitution of 'n' for 'd'. Here's
>the latest code, as actually run:
>
>>>> from math import e, log, ceil, floor
>>>> def SD(n):
>... """Self-delimited data bit-length requirement
>... Let n be the highest number being encoded
>... Let d be the number of bits in a digit
>... Let b be the base (2**d)
>... Let m be one-half of the base
>... The number of digits, g = floor (log_m(n) +1)
>... The number of bits is d*g
>... """
>... d = 1/log(2) # 1.443 bits per base-e digit
>... m = e/2 # b == 2**(1/log(2)) == e
>... g = ceil(log(n)/log(m)) # log_m(n) == log(n)/log(m)
>... return d*g
>...
>>>> SD(3)
>5.7707801635558535
>>>> SD(19)
>14.426950408889635
>>>> SD(147)
>24.525815695112378
>
>The variable 'g' does scale logarithmically, as desired. But The
>bit-lengths are longer than they should be according to your
>abstract (and according to my intuition about about how much info
>delimiting takes):
>
>>With 3 bits (it's octal, remember?), you can code up to 3 choices,
>>and with 6 bits you can code up to 19 choices. And 9 bits can code
>>up to 147 choices.
>
>It looks like I might be able to choose a different scaling factor
>'d' that gives the right answers. I considered that you might have
>meant "digits in a bit" (i.e. a fraction) rather than "bits in a
>digit", but that's only closer, not exactly right.
>
>So at this point I'm just guessing about the correct formula.

David, your definitions of d and m don't match mine.

d is an integer, probably in the range of 2 to 4. It's a parameter,
not some function of a log.
Please use an algorithm closer to my formula, and use other (unused)
variable names for your helper logarithms.

Best regards,
Arthur

-- 
-------------------------------------------------------------------------------
Arthur M. Keller, Ph.D., 3881 Corina Way, Palo Alto, CA  94303-4507
tel +1(650)424-0202, fax +1(650)424-0424
==================================================================
= The content of this message, with the exception of any external 
= quotations under fair use, are released to the Public Domain    
==================================================================
Received on Mon May 31 23:17:07 2004

This archive was generated by hypermail 2.1.8 : Mon May 31 2004 - 23:18:15 CDT