Renewed anonymity concern in OVC design

From: David Mertz <voting-project_at_gnosis_dot_cx>
Date: Thu Jul 08 2004 - 12:35:48 CDT

A little bit jogged by Yoshi Kohno's good expression of anonymity
concerns, I'd like to raise what I now believe is a flaw in the OVC
design.

The OVC ballot displays a prominent ballot-ID on every ballot. This
number (currently a four digit number) is used to store EBIs, and to
correlate EBIs with REBIs/ballots. I am not happy with this design.

Specifically, a four digit number displayed in human-readable form is
something that a voter can easily remember or write down. Disclosure
of this ballot-ID can be used as part of a vote-buying or vote-coercion
attack; admittedly, such an attack requires a certain degree of
insider-collusion. However, such collusion might potentially occur at
many levels of canvassing, and the OVC design does not provide very
tight guarantees on preserving the secrecy of ballot-IDs during
canvassing.

Let's be very concrete here: The guys with the brass knuckles come to
my house, and tell me that I better vote to re-elect Mayor Dechert (to
pick an arbitrary name :-)). To prove my vote, they say, I should tell
them my ballot-ID after I leave the polling place. Perhaps I wonder
how they will know what vote matches that ballot-ID, but they assure me
that it's not my business to worry about that issue. When I reveal my
ballot-ID of 4567, collaborators at the polling-place (or county, etc.)
check off that 4567 really contains a vote for Dechert. They might do
this in comparison with the paper ballots, or they might do it
electronically by looking at the EBI-4567.xml file.

Problem mentioned, here's what I would suggest instead. My solution is
more tentative than is my identification of the problem. Other ideas
are welcomed. Revised design:

1. No ballot IDs associated with votes.

2. When preferences are selected, generate a hash of the exact choices
made, using MD5 or similar.
   a. You cannot re-create a vote based on the hash only
   b. You probably *can* do an exhaustive search of all the votes
      (excluding write-ins) in order to find a ballot that *would*
      hash to the calculated value.
   c. If (b) is a problem, perhaps use some sort of seed, maybe
      based on a timestamp that is not itself stored.

3. Store the EBI as name EBI-<md5sum>.xml or similar (where the value
'<md5sum>' is said hash, not the literal string I write: e.g.
EBI-2F29A39C.xml)

4. Some voters may vote identically to each other.
    a. The EBIs can include an extra sequence number or random
       padding to disambiguate filenames.
    b. Given two paper ballots with identical votes, the hashes
       will be identical.
    c. You cannot determine which EBI corresponds to which of
       two identical paper ballots. However, you *can* determine
       that exactly N voters voted in that exact manner, and then
       expect exactly N EBIs containing those votes (per precinct
       or whatever).
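
To make steps 2 through 4 a bit more concrete, here is a rough Python
sketch. The canonical form of the choices (sorted "contest=selection"
lines), the function names, and the "-<n>" sequence suffix are all
illustrative assumptions of this sketch, not anything that exists in
OVC code. It uses the full 32-character MD5 digest rather than a
shortened one like the example filename above.

```python
import hashlib
from collections import Counter

def vote_hash(choices):
    """Return an uppercase MD5 hex digest of the exact choices made.

    `choices` maps contest name -> selection (hypothetical structure).
    Sorting makes the hash independent of the order of selection.
    """
    canonical = "\n".join(
        f"{contest}={selection}"
        for contest, selection in sorted(choices.items())
    )
    return hashlib.md5(canonical.encode("utf-8")).hexdigest().upper()

def ebi_filenames(ballots):
    """Name each EBI as EBI-<md5sum>-<n>.xml.

    The <n> suffix only disambiguates filenames when several voters
    vote identically (step 4a); it is an assumed naming scheme.
    """
    seen = Counter()
    names = []
    for choices in ballots:
        digest = vote_hash(choices)
        seen[digest] += 1
        names.append(f"EBI-{digest}-{seen[digest]}.xml")
    return names
```

Note that a sequence suffix assigned in casting order would itself
reveal something about when a ballot was cast; random padding, the
other option in 4a, avoids that.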

In our paper, Arthur added a concern about ballot-IDs not being re-used
between machines. I never actually cared about that matter, from either
a security or an auditability perspective. But it wasn't worth
arguing for the paper deadline. To me, if ballot-ID 1234 was used by
two machines, then matching each such EBI to exactly one of the two
paper ballots with that ID doesn't seem permutationally difficult to
sort out. Even if the ballots had the
same votes, no biggie: two people voted for the same candidates. If
the two 1234 ballots contain different votes, comparing those votes to
each of two (or even of 10) EBIs is simple for computers, and not hard
for eyeballs.

The fact my design may contain multiple indistinguishable ballots is
similar. It's not a big deal. If five people voted in a specific
pattern, we still know that it should not be four voters, and should
not be six voters.
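
The count check described here amounts to comparing two multisets of
hashes. A minimal Python sketch, assuming the hashes of the paper
ballots and of the EBIs have already been collected into two lists
(the function name and return shape are hypothetical):

```python
from collections import Counter

def reconcile(paper_hashes, ebi_hashes):
    """Report every vote pattern whose paper and EBI counts differ.

    Returns {hash: (paper_count, ebi_count)}; an empty dict means
    each pattern appears exactly as often on paper as in the EBIs.
    """
    paper = Counter(paper_hashes)
    ebi = Counter(ebi_hashes)
    return {
        h: (paper[h], ebi[h])
        for h in paper.keys() | ebi.keys()
        if paper[h] != ebi[h]
    }
```

If five paper ballots share a pattern but only four EBIs do, that
hash shows up in the result with counts (5, 4), without anyone needing
to know which EBI belongs to which sheet of paper.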
==================================================================
= The content of this message, with the exception of any external
= quotations under fair use, are released to the Public Domain
==================================================================
Received on Sat Jul 31 23:17:12 2004