Initial thoughts on technical design

From: David Mertz, Ph.D. <voting-project_at_gnosis_dot_cx>
Date: Wed Jul 30 2003 - 12:59:54 CDT

- - This is minor and premature, but I would assume that it is much
   easier to arrive at a bug-free final system in Python than in C/C++,
   as was raised in a couple of notes. The alleged "auditability" gain for
   C++ is an illusion, since it seems to assume a priori that a C++
   compiler has fewer security/accuracy issues than a Python interpreter.
   But I did, after all, write a book about Python--so you can take this
   with a grain of salt.

- - More important: I concur with a suggestion upthread that XML should be
   used for all internal document formats, including ballot definition
   files, vote result records, and any other data formats needed. I do
   not have a great love for XML (despite or because of getting paid
   money to write about it), but it is indeed relatively human readable,
   and is parsable by a large set of well-tested tools.
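
   As a rough illustration only, here is one way a ballot file could be
   written and read back with Python's standard xml.etree.ElementTree
   module. The element names (ballot, machine-id, session-id, selections,
   selection) and the contest attribute are hypothetical placeholders, not
   an agreed format.

```python
import xml.etree.ElementTree as ET

# Hypothetical element/attribute names; no ballot schema has been agreed on.
def ballot_to_xml(machine_id, session_id, selections):
    """Serialize one ballot as a small, human-readable XML document."""
    root = ET.Element("ballot")
    ET.SubElement(root, "machine-id").text = machine_id
    ET.SubElement(root, "session-id").text = session_id
    votes = ET.SubElement(root, "selections")
    for contest, choice in selections:
        sel = ET.SubElement(votes, "selection", contest=contest)
        sel.text = choice
    return ET.tostring(root, encoding="unicode")

def xml_to_ballot(text):
    """Parse the XML produced by ballot_to_xml back into plain values."""
    root = ET.fromstring(text)
    selections = [(sel.get("contest"), sel.text)
                  for sel in root.find("selections")]
    return root.findtext("machine-id"), root.findtext("session-id"), selections

doc = ballot_to_xml("M-0042", "9f1c2ab7",
                    [("mayor", "Alice"), ("measure-7", "yes")])
print(doc)
print(xml_to_ballot(doc))
```

   Any XML tool can read the same file, which is the point of the
   suggestion: nothing here depends on this particular library.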

- - I saw a suggestion that an XML document standard related to this was
   created by another organization, but was described by the poster as
   "complicated." Without having looked at specifics, I believe we
   should not take on additional complication in data formats simply to
   match a previously existing DTD.

- - I recommend (frequently) my own gnosis.xml.objectify package as an
   extremely simple and flexible library for manipulating XML internally
   within a Python application. Way better than DOM or SAX, for example.

- - Security issues should be considered carefully at all levels. But in
   particular, I believe we should try to minimize the disruption caused
   by damage to voting machines at each possible stage. I.e. a machine
   might crash and/or corrupt data before, during, or after the voting
   period; each scenario should be well understood.

- - I believe that ballots should contain authentication information to
   provide a detection mechanism if forged ballots are placed in ballot
   boxes (or saved on removable media or transmitted over wires). The
   same mechanism should be applicable to both the paper ballot and to
   the XML data file that contains the ballot content.

   The scheme I propose is as follows:

   1. When a machine is "initialized" at the beginning of the voting
      period, it generates a random secret key (priv, the machine's
      "private key") for a symmetric encryption algorithm like AES. This
      key is held internally within the machine, and is NOT accessible
      externally during the voting period.

   2. Each machine has a permanent plaintext serial number that is
      printed on paper ballots and stored in XML ballot files. The
      number could be printed on the front of the physical machine if
      desired--i.e. it is public information.

   3. Each time a voter accesses a machine to vote, a random session
      number is generated. This session number should NOT be tied to a
      timestamp in a way that allows reconstruction of voting times
      (since that might be correlatable with individual voters, and
      puncture anonymity).

   4. A voter selects a set of candidates and issues according to
      whatever interface is decided on. But basically, these selections
      are just a list of data values from a security perspective.

   5. Printed on the paper ballot and stored in the XML ballot document
      is the following information:

      MACHINE-ID
      SESSION-ID
      VOTE-SELECTIONS
      HASH = Hash( MACHINE-ID + SESSION-ID + VOTE-SELECTIONS )
      SIGNATURE = E{priv}(HASH)

   6. When the voting period ends, the machine is "finalized", which
      discloses the private key, priv. At this point priv is public
      information--it can be published in a newspaper or generally
      circulated. One private key is disclosed per machine per voting
      period.
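
   A minimal Python sketch of steps 1 through 6, under several
   assumptions. The VotingMachine class and its method names are
   hypothetical. Python's standard library has no AES, so HMAC-SHA256
   stands in for E{priv}(HASH); it plays the same role here, a keyed value
   that only the holder of priv can compute. The "+" in step 5 is taken to
   mean concatenation, with fields joined by newlines (assuming no field
   contains one) so that field boundaries stay unambiguous.

```python
import hashlib
import hmac
import secrets

class VotingMachine:
    """Hypothetical sketch of steps 1-6; HMAC-SHA256 stands in for E{priv}(HASH)."""

    def __init__(self, machine_id):
        self.machine_id = machine_id   # step 2: public serial number
        self._priv = None
        self.finalized = False

    def initialize(self):
        # Step 1: fresh random 256-bit secret key, held only in memory.
        self._priv = secrets.token_bytes(32)

    def new_session_id(self):
        # Step 3: random, not derived from a clock, so it reveals no voting time.
        return secrets.token_hex(16)

    def cast(self, session_id, selections):
        # Steps 4-5: selections are treated as opaque data values.
        if self._priv is None or self.finalized:
            raise RuntimeError("machine is not in its voting period")
        # Assumed encoding of MACHINE-ID + SESSION-ID + VOTE-SELECTIONS.
        payload = "\n".join([self.machine_id, session_id, *selections]).encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()
        signature = hmac.new(self._priv, digest.encode("ascii"),
                             hashlib.sha256).hexdigest()
        return {"machine_id": self.machine_id, "session_id": session_id,
                "selections": list(selections), "hash": digest,
                "signature": signature}

    def finalize(self):
        # Step 6: close the voting period and disclose priv for publication.
        self.finalized = True
        return self._priv.hex()

machine = VotingMachine("M-0042")
machine.initialize()
ballot = machine.cast(machine.new_session_id(), ["mayor=Alice", "measure-7=yes"])
published_priv = machine.finalize()   # hex string, safe to publish only now
```

   After finalize() the machine refuses further ballots, which mirrors the
   requirement that no ballots be accepted once priv is public.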

   Suppose Mallory wishes to create false ballots. She can create a
   false set of VOTE-SELECTIONS (e.g. for her candidates). Mallory can
   even create a plausible SESSION-ID (though presumably not one that was
   actually issued). MACHINE-ID is public, as is the algorithm Hash()
   (e.g. SHA). However, Mallory is unable to generate SIGNATURE, since
   she does not know the machine's private key before the close of the
   voting period.

   After voting closes, priv is public, so any voter can verify the
   integrity of a given ballot (in either paper or XML form). Obviously,
   after voting is closed, Mallory can also create false ballots; so
   measures must be taken to disallow the introduction of late (possibly
   false) ballots. Disclosing the complete voting results at the moment
   the voting period closes is a transparent means to prevent later
   tampering with the vote set (and is perfectly feasible for the
   electronic/XML form of the ballot documents).
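
   The verification step can be sketched in Python under these
   assumptions: fields joined by newlines, SHA-256 for Hash(), and
   HMAC-SHA256 in place of AES encryption of the hash. The helper names
   are illustrative only. Once priv is published, anyone can recompute
   both HASH and SIGNATURE, while a ballot Mallory signed with a key she
   had to guess fails the check.

```python
import hashlib
import hmac
import secrets

def ballot_hash(machine_id, session_id, selections):
    # Hypothetical encoding: newline-joined fields, then SHA-256.
    payload = "\n".join([machine_id, session_id, *selections]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def sign(key, digest):
    # HMAC-SHA256 as a stand-in for E{priv}(HASH).
    return hmac.new(key, digest.encode("ascii"), hashlib.sha256).hexdigest()

def make_ballot(key, machine_id, session_id, selections):
    digest = ballot_hash(machine_id, session_id, selections)
    return {"machine_id": machine_id, "session_id": session_id,
            "selections": selections, "hash": digest,
            "signature": sign(key, digest)}

def verify(ballot, published_key_hex):
    """Check one ballot after priv has been published at the close of voting."""
    digest = ballot_hash(ballot["machine_id"], ballot["session_id"],
                         ballot["selections"])
    if not hmac.compare_digest(digest, ballot["hash"]):
        return False   # contents no longer match the stored HASH
    expected = sign(bytes.fromhex(published_key_hex), digest)
    return hmac.compare_digest(expected, ballot["signature"])

priv = secrets.token_bytes(32)   # secret during the voting period
real = make_ballot(priv, "M-0042", secrets.token_hex(16), ["mayor=Alice"])

# Mallory knows MACHINE-ID and Hash() and can invent a SESSION-ID,
# but before disclosure she can only guess at priv.
mallory_key = secrets.token_bytes(32)
forged = make_ballot(mallory_key, "M-0042", secrets.token_hex(16),
                     ["mayor=Mallory"])

published = priv.hex()           # disclosed at finalization
print(verify(real, published))   # True
print(verify(forged, published)) # False
```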

   There are some failure modes to consider. A machine may crash prior
   to the end of the voting period, but in such a way as to allow
   finalization (i.e. revelation of the private key used up to that
   point). Assuming additional/replacement machines are available, I
   believe this event has no security implications.

   A machine may crash prior to the end of the voting period in such a
   way as to prevent recovery of the private key. At this point the
   SIGNATURE attached to each ballot (paper/XML) is worthless.
   However, the votes themselves are not altered. The stored hash
   still provides protection against simple data corruption of the
   ballot, but not against insertion of malicious ballots. At this
   point, election officials would need to decide whether the physical
   procedures in place were sufficient to allow counting these partially
   non-verifiable ballots (I do not know the precise legal and political
   concerns). However, even with such a crash, we are no worse off than
   with current paper-and-pencil ballots.
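
   The lost-key case can be illustrated with a short Python sketch, again
   assuming a hypothetical encoding (newline-joined fields hashed with
   SHA-256). Without priv, only the stored hash can be checked: it catches
   accidental corruption, but since Hash() is public, a forger simply
   recomputes it for an inserted ballot.

```python
import hashlib
import secrets

def ballot_hash(machine_id, session_id, selections):
    # Hypothetical encoding: newline-joined fields, then SHA-256.
    payload = "\n".join([machine_id, session_id, *selections]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def hash_ok(ballot):
    """The only check left if priv is lost: does the content match HASH?"""
    return ballot_hash(ballot["machine_id"], ballot["session_id"],
                       ballot["selections"]) == ballot["hash"]

def unsigned_ballot(machine_id, session_id, selections):
    return {"machine_id": machine_id, "session_id": session_id,
            "selections": selections,
            "hash": ballot_hash(machine_id, session_id, selections)}

genuine = unsigned_ballot("M-0042", secrets.token_hex(16), ["mayor=Alice"])

# Accidental corruption: the content changes but the stored hash does not.
corrupted = dict(genuine, selections=["mayor=Alicf"])

# Deliberate insertion: Hash() is public, so the forger just recomputes it.
inserted = unsigned_ballot("M-0042", secrets.token_hex(16), ["mayor=Mallory"])

print(hash_ok(genuine), hash_ok(corrupted), hash_ok(inserted))  # True False True
```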
Received on Wed, 30 Jul 2003 13:59:54 -0400