Data format and compression (was Re: Bar code choice)

From: Edward Cherlin <edward_dot_cherlin_at_etssg_dot_com>
Date: Sat Sep 13 2003 - 13:58:29 CDT

On Saturday 13 September 2003 11:00 am, wrote:
> The ballot number is supposed to be a 4-digit number. I
> mentioned that it could be represented with two symbols. If
> I'm understanding you correctly, we will have 95 symbols ...
> 95*95 = 9025 possible ballot numbers.

So we can't represent all 10,000 4-digit numbers that way.
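
As a rough check on that arithmetic, here is a minimal Python sketch. It
assumes the 95 symbols are the printable ASCII characters (space through
tilde), which nothing in this thread confirms; the names symbols_needed,
encode and decode are hypothetical.

```python
ALPHABET_SIZE = 95   # assumption: printable ASCII, codes 32..126
FIRST_CODE = 32      # assumption: symbol 0 is the space character

def symbols_needed(count, base=ALPHABET_SIZE):
    """Return the smallest n such that base**n >= count."""
    n, capacity = 0, 1
    while capacity < count:
        capacity *= base
        n += 1
    return n

def encode(number, width, base=ALPHABET_SIZE):
    """Encode a non-negative integer as exactly `width` symbols."""
    if not 0 <= number < base ** width:
        raise ValueError("number does not fit in %d symbols" % width)
    chars = []
    for _ in range(width):
        number, digit = divmod(number, base)
        chars.append(chr(FIRST_CODE + digit))
    return "".join(reversed(chars))  # most significant symbol first

def decode(text, base=ALPHABET_SIZE):
    """Inverse of encode()."""
    value = 0
    for ch in text:
        value = value * base + (ord(ch) - FIRST_CODE)
    return value

if __name__ == "__main__":
    print(symbols_needed(10000))      # symbols for every 4-digit number
    print(decode(encode(9999, 3)))    # round trip through three symbols
```

With two symbols, encode() rejects anything from 9025 upward; a third
symbol covers all 10,000 ballot numbers with room to spare.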

> We will want more for
> the production system but for the demo this is plenty. If
> we're not going to use compression, I'd rather save the extra
> character from the barcode.

This is all premature optimization. Why are we trying to save two
bytes in a string of still unknown length, and in the face of
our intention to compress the results?

> For the production system, we will definitely want to use
> compression since it's possible to have far more than 116
> bits. Doug Jones shows us 228 and 235 position punch card
> ballots. There is even a 312 position ballot there.

What do punch cards have to do with our demo? We only need to
represent the numeric codes for the voter's chosen candidates,
and a set of binary choices on yes-or-no ballot measures. It
makes no difference how it used to be encoded in some other

> I still would not want to rule out compression for the demo.

Compression is trivial. You can use free code for a variety of
compression algorithms, or call a standard compression utility
(zip, bzip, gzip, etc.).
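
For instance, a minimal sketch using the zlib module from the Python
standard library. The 116-bit sample ballot and the helper names
(pack_bits, compress_ballot, decompress_ballot) are made up for
illustration; the real ballot layout has not been settled.

```python
import zlib

def pack_bits(bits):
    """Pack a string of '0'/'1' characters into bytes.

    The final byte is padded on the right with zero bits.
    """
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))

def compress_ballot(raw):
    """Compress raw ballot bytes at zlib's highest compression level."""
    return zlib.compress(raw, 9)

def decompress_ballot(blob):
    """Recover the raw ballot bytes."""
    return zlib.decompress(blob)

if __name__ == "__main__":
    sample_bits = "10" * 58          # hypothetical 116-bit ballot
    raw = pack_bits(sample_bits)
    blob = compress_ballot(raw)
    assert decompress_ballot(blob) == raw
    print(len(raw), "bytes raw,", len(blob), "bytes compressed")
```

It is worth printing both lengths on realistic data before committing to
this: zlib adds a small header and checksum, so a ballot of only a few
bytes may not get any smaller.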

> With your scheme, you still have to come up with a way to do
> long integer math within Python to convert the 116 digit
> binary number to a 35 (or less) digit decimal number.

Why decimal? You can convert to hex four bits at a time.

Anyway, I'm sure there is a long-integer library for Python. I
was working on long-integer math in APL in the 1970s. In binary,
division is just a matter of shift and subtract.
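
A minimal sketch of both conversions follows. In current Python the
built-in int type is arbitrary precision, so no separate library is
needed; the function names and the all-ones sample value are only
illustrative.

```python
HEX_DIGITS = "0123456789abcdef"

def bits_to_hex(bits):
    """Convert a '0'/'1' string to hex, four bits at a time."""
    padded = "0" * (-len(bits) % 4) + bits   # left-pad to a multiple of 4
    return "".join(
        HEX_DIGITS[int(padded[i:i + 4], 2)]
        for i in range(0, len(padded), 4)
    )

def bits_to_decimal(bits):
    """Convert a '0'/'1' string to decimal using Python's big integers."""
    return str(int(bits, 2))

if __name__ == "__main__":
    ballot = "1" * 116                       # largest 116-bit value
    print(bits_to_hex(ballot))               # 29 hex digits
    print(bits_to_decimal(ballot))           # 35 decimal digits
```

The hex form needs no long arithmetic at all, since each group of four
bits maps to one digit independently of the others.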

> It might
> not be that time consuming to write the compression algorithm
> we need (or maybe we can find one already done).

There are several.

> I would ask
> Arthur to do it but right now he is contributing to the
> overall project by working on an NSF grant proposal. If the
> choice is taking away time from the NSF proposal to work on
> the compression function, I'd rather have him stick to the NSF
> proposal.


> Alan D.

Edward Cherlin, Simputer Evangelist
Encore Technologies (S) Pte. Ltd.
Computers for all of us,
= The content of this message, with the exception of any external 
= quotations under fair use, are released to the Public Domain    
Received on Tue Sep 30 23:17:04 2003
