Re: Compression, encoding, entropy

From: Arthur Keller <arthur_at_kellers_dot_org>
Date: Tue May 04 2004 - 01:56:51 CDT

At 10:13 PM -0400 5/3/04, David Mertz wrote:
>On May 3, 2004, at 9:48 PM, Arthur Keller wrote:
>>>>>> from math import e, log, ceil, floor
>>>>>> def SD(n):
>>>... """Self-delimited data bit-length requirement
>>>... Let n be the highest number being encoded
>>>... Let d be the number of bits in a digit
>>>... Let b be the base (2**d)
>>>... Let m be one-half of the base
>>>... The number of digits, g = floor (log_m(n) +1)
>>>... The number of bits is d*g
>>>... """
>>>... d = 1/log(2) # 1.443 bits per base-e digit
>>>... m = e/2 # b == 2**(1/log(2)) == e
>>>... g = ceil(log(n)/log(m)) # log_m(n) == log(n)/log(m)
>>>... return d*g
>>David, your definitions of d and m don't match mine.
>I'm just trying to program your description. But apparently I
>haven't managed to understand what it is you are describing yet.
>Does anyone else, who might help me?

You can code in hexadecimal (4 bits, base 16), octal (3 bits, base
8), or quad (2 bits, base 4). d is the number of bits in your
digital encoding.

m is (2**d)/2, or if you prefer 2**(d-1). It has absolutely nothing
to do with e.

>>d is an integer, probably in the range of 2 to 4.
>>It's a parameter, not some function of a log.
>I don't understand that. When I read "number of bits in a digit"
>the only thing that comes to my mind is the number of bits of
>information that it takes to represent a digit in a given base.
>I.e. 1.443 (approx) for base e, or 3.324 for base 10.

I'm not talking about base e. You introduced that because of some
programming language limitation.

The encoding is digital in some arbitrary base. That base, b, is a
power of two, and is represented by d bits in a binary encoding. For
example, base 4 has digits (0, 1, 2, 3) represented by two
consecutive bits (00, 01, 10, 11). b is 4, d is 2.

>I'm not trying to be difficult here, but I have no idea what you
>mean by 'd' from your description. Why is it probably in the range
>2 to 4? How would we know what it was?

There are 2 bits in base 4, 3 bits in base 8, and 4 bits in base 16.

>>Please use an algorithm closer to my formula, and use other
>>(unused) variable names for your helper logarithms.
>I wish I knew how. I've tried my best to use your identical names.
>What ARE your formulae?

I can't tell from your program where m gets a value that is an
*integer*. The value of m is 2 for base 4, 4 for base 8, and 8 for
base 16.

Also my formula doesn't have a "ceil" but instead has a "floor" function.

>Does someone else on this list understand what Arthur is trying to
>describe? I'm guessing that whatever it is only take a few lines of
>code to implement... but I can't figure out which lines those might

So as far as I can tell, your algorithm has only a passing resemblance to mine.

Every single one of my variables is a positive integer.

Please try again.

Best regards,

Arthur M. Keller, Ph.D., 3881 Corina Way, Palo Alto, CA  94303-4507
tel +1(650)424-0202, fax +1(650)424-0424
= The content of this message, with the exception of any external 
= quotations under fair use, are released to the Public Domain    
Received on Mon May 31 23:17:08 2004

This archive was generated by hypermail 2.1.8 : Mon May 31 2004 - 23:18:15 CDT