Compression, encoding, entropy

From: David Mertz <voting-project_at_gnosis_dot_cx>
Date: Tue May 04 2004 - 23:48:45 CDT

OK, here is a new version of election-entropy.py. It incorporates
Arthur's idea for self-delimiting encoding, now that I've worked out
his description of the formula for optimal self-delimiting encoding
size. Unlike the globally optimal size (even when contests are made to
obey bit-boundaries), the self-delimited size depends on the symbology
used (specifically, its alphabet size). The latest version of the
script lets you (optionally) specify an alphabet size. The script
docstring gives its usage:

    USAGE: election-entropy.py [alphabet_size] [-verbose] < election.summary

    Where the input data has lines of the form:
      contest-type slots [candidates] [# comment]

"Verbose" means that it prints out the encoded size of each contest,
not just a summary. If no alphabet size is specified, we assume a
16-character (hexadecimal) encoding. Note that Code128 (our demo
barcode) uses a 10-char symbology (decimal digits), and PDF417 uses a
100-200 char alphabet.

For example:

   $ ./election-entropy.py 10 < demo-election.data
   Election summary for OVC demo ballot (write-ins count as candidate)

   269995136716800 distinct votes are possible
   Optimal encoding is approximately: 48 bits
   Contests at bit-boundaries, approx: 53 bits
   Self-delimited (10 char symbology): 88 bits

   $ ./election-entropy.py 200 < demo-election.data
   Election summary for OVC demo ballot (write-ins count as candidate)

   269995136716800 distinct votes are possible
   Optimal encoding is approximately: 48 bits
   Contests at bit-boundaries, approx: 53 bits
   Self-delimited (200 char symbology): 128 bits
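
For anyone who wants to check the first two figures by hand, here is a
minimal sketch of the arithmetic. It is not the code from
election-entropy.py, and it does not attempt the self-delimited figure.
It assumes you already have the number of distinct possible votes in
each contest; the function names and the sample counts are made up for
illustration.

```python
import math

def optimal_bits(counts):
    """Return (total distinct votes, bits needed to number them all).

    `counts` holds the number of distinct possible votes per contest
    (a hypothetical input, not the script's own data format).
    """
    total = math.prod(counts)
    # (total - 1).bit_length() equals ceil(log2(total)) for total >= 1,
    # computed exactly on integers rather than with floating point.
    return total, (total - 1).bit_length()

def bit_boundary_bits(counts):
    """Bits needed when every contest gets its own whole number of bits."""
    return sum((n - 1).bit_length() for n in counts)

if __name__ == "__main__":
    # Made-up per-contest counts, purely for illustration.
    sample = [8, 3, 5, 24]
    total, bits = optimal_bits(sample)
    print(total, "distinct votes are possible")
    print("Optimal encoding:", bits, "bits")
    print("Contests at bit-boundaries:", bit_boundary_bits(sample), "bits")
```

Rounding each contest up to a whole number of bits can never need fewer
bits than the global figure, which is why the second number in the
output above is the larger one. As a sanity check, the
269995136716800 total from the demo ballot does come out to 48 bits
this way.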

See the beginning of this thread in the archive for a sample data file
(and write your own to match an historical election).

==================================================================
= The content of this message, with the exception of any external
= quotations under fair use, are released to the Public Domain
==================================================================

Received on Mon May 31 23:17:14 2004