Compression, encoding, entropy

From: David Mertz <voting-project_at_gnosis_dot_cx>
Date: Mon May 03 2004 - 20:14:15 CDT

Hi Arthur (or anyone who sees that I'm doing something wrong),

I'm afraid that there still seems to be an some other error in your
description of the length of optimal self-delimiting encodings, even
after the correction of your substitution of 'n' for 'd'. Here's the
latest code, as actually run:

>>> from math import e, log, ceil, floor
>>> def SD(n):
... """Self-delimited data bit-length requirement
... Let n be the highest number being encoded
... Let d be the number of bits in a digit
... Let b be the base (2**d)
... Let m be one-half of the base
... The number of digits, g = floor (log_m(n) +1)
... The number of bits is d*g
... """
... d = 1/log(2) # 1.443 bits per base-e digit
... m = e/2 # b == 2**(1/log(2)) == e
... g = ceil(log(n)/log(m)) # log_m(n) == log(n)/log(m)
... return d*g
>>> SD(3)
>>> SD(19)
>>> SD(147)

The variable 'g' does scale logarithmically, as desired. But The
bit-lengths are longer than they should be according to your abstract
(and according to my intuition about about how much info delimiting

> With 3 bits (it's octal, remember?), you can code up to 3 choices, and
> with 6 bits you can code up to 19 choices. And 9 bits can code up to
> 147 choices.

It looks like I might be able to choose a different scaling factor 'd'
that gives the right answers. I considered that you might have meant
"digits in a bit" (i.e. a fraction) rather than "bits in a digit", but
that's only closer, not exactly right.

So at this point I'm just guessing about the correct formula.
= The content of this message, with the exception of any external
= quotations under fair use, are released to the Public Domain
Received on Mon May 31 23:17:07 2004

This archive was generated by hypermail 2.1.8 : Mon May 31 2004 - 23:18:15 CDT