From: Alan Dechert <alan_at_openvotingconsortium_dot_org>

Date: Sat May 01 2004 - 16:23:44 CDT

Date: Sat May 01 2004 - 16:23:44 CDT

Karl,

*> Compression doesn't always compress - it's a gamble. Sometimes
*

*> compression actually results in more bits than the original.
*

*>
*

Right. But if you come up with a good scheme on compressible data, you

should have a good savings.

After discussing this some with David Mertz, it occurred to me I may just

not be speaking your language--and I fully admit to being less qualified

than many on this project to talk about some of these things.

Consider this:

Take a sequence of thirty consecutive integers--large or small. How many

bits or bytes do you think you'd need to represent the prime/composite

property of all 30 integers?

I got it down to one byte. I think of that as compression, although that

may or may not be the right term. I figured out that there are only 254

possible patterns of prime/composite integers for a range of 30 integers. I

mapped 8-bit characters to each of these patterns. You could store the

prime/composite sequence for any range of 1,000,000 integers in a 34k file.

(in case you're wondering why.... I wrote a Sieve of Eratosthenes benchmark

utility while testing dBASE. Ten years ago, I was sitting in a cube at

Borland trying to figure out why Foxpro 2.5 kicked dBASE 5.0's ass on this

benchmark. dBASE was faster at some things, but Foxpro kicked our ass on

other tests).

In case you're curious, here's the code http://www.go2zero.com/se-prg.txt

The Char2Bin array maps all possible sequences where 0 is prime and 1 is

composite. Even numbers are ignored since 2 is the only prime even number

so the pattern of 30 only needs 15 bits.

The selected/not-selected pattern on our ballot is also compressible since

only a small percentage of possible patterns of zeros and ones can represent

valid ballots

The possibilities for the first 7 bits are:

1000000

0100000

0010000

0001000

0000100

0000010

0000001

0000000

Currently, we indicate each bit one-by-one so we use 7 bits. However, there

are only 8 possible patterns for these 7 bits. These 8 patterns could be

represented by 3 bits if you had a way to map these patterns to 000 001 010

011 100 101 110 111. So what if we made a table like so:

000 = 1000000

001 = 0100000

010 = 0010000

011 = 0001000

100 = 0000100

101 = 0000010

110 = 0000001

111 = 0000000

In the case of the prime/composite sequence, I developed the mapping table

semi-manually. But what if we had an algorithm so that this mapping could

be determined programatically? The first 7 bits on the ballot could be

represented by 3 bits. I figured at the time that our 3-inch barcode could

be reduced to about 1.5 inches. After testing with the scanner, we found

the 3-inch barcode was fine so I dropped the idea. Besides, Doug brought up

the transparency issue.

This was all discussed on this list back in August. Arthur had a better

scheme, I believe. If there are ways to reduce it further, it might be

worth spending some time looking into that.

Alan D.

==================================================================

= The content of this message, with the exception of any external

= quotations under fair use, are released to the Public Domain

==================================================================

Received on Mon May 31 23:17:01 2004

*
This archive was generated by hypermail 2.1.8
: Mon May 31 2004 - 23:18:15 CDT
*