Re: [EILeg] [ei] Re: A 3-Step Audit Protocol w/ 99%confidence

From: charlie strauss <cems_at_earthlink_dot_net>
Date: Fri Jan 26 2007 - 16:10:41 CST

-----Original Message-----
>From: Ron Crane <>
>Sent: Jan 26, 2007 4:39 PM
>To: charlie strauss <>, Open Voting Consortium discussion list <>
>Subject: Re: [OVC-discuss] [EILeg] [ei] Re: A 3-Step Audit Protocol w/ 99%confidence
>A related problem with the audit schemes is that their assurance factors
>rely upon the assumption that each precinct has the same probability of
>being miscounted.

Ron, I think I follow what you say below, but I think your initial sentence can easily be misconstrued. That is, it conflates two different phenomena.

First, there is no assumption in the intended calculation that each precinct has the same probability of being miscounted. The intended computation is of how many precincts to audit to achieve a certain statistical certainty that a margin shift above a specified level would be detected in the sample. If you don't agree with that then we need to hash this through, as I'm fairly certain of what I'm saying.
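As a rough illustration of that intended calculation, here is a minimal sketch in Python. It assumes equal-sized counting units and that a margin shift of the specified size would require at least `corrupted` bad units out of `total`; those names and the example numbers are hypothetical, not taken from any of the proposed schemes. Note that it says nothing about which units are more likely to be bad, only how many must be.

```python
from math import comb

def detection_probability(total, corrupted, sample):
    """Chance that a uniform random sample (without replacement) of
    `sample` units contains at least one of the `corrupted` bad units."""
    if corrupted <= 0:
        return 0.0
    if sample > total - corrupted:
        return 1.0  # too few clean units to fill the whole sample
    # Probability of missing every bad unit is C(total-corrupted, sample) / C(total, sample).
    return 1.0 - comb(total - corrupted, sample) / comb(total, sample)

def min_sample_size(total, corrupted, confidence):
    """Smallest sample size whose detection probability reaches `confidence`."""
    for n in range(total + 1):
        if detection_probability(total, corrupted, n) >= confidence:
            return n
    return total

# Hypothetical example: 100 precincts, at least 10 would have to be bad.
print(min_sample_size(100, 10, 0.90))
```

The only inputs are the number of units and the minimum number that would have to be bad, which is why unequal per-unit miscount probabilities do not enter into it.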

Assuming that is what you really mean to say, then there are two distinct conditions to consider:
1) if the probability of a given machine (or whatever the integral counting unit is) being miscounted is not uniform
2) if the machines have grossly different batch sizes (number of counts).

In the first case there is no problem at all. The aforementioned calculation still achieves the required certainty, even though one is selecting random samples uniformly. End of story.

(Now if you happened to know something about how the non-uniformity was distributed you could in principle sample more cleverly and achieve an even tighter certainty. But you would never achieve a lower statistical certainty than the random sampled case.)

In the second case we come to why I use the word "intended" in my description above. When the machines differ grossly in size, the simple calculations being proffered for the number of machines to randomly sample are wrong. There is a correct way to do this. I just don't know offhand what it is. That is, there is nothing wrong in principle with the conceptual approach of sampling a certain number of machines to get a certain bound. But that certain number is a function of the machine size distribution. So far, I believe, no one has actually written down or even tabulated that function.
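To see why size matters, here is a small Python sketch. It assumes (my assumption, for illustration only) that no single machine can shift more than a fraction `max_shift_fraction` of its own count, and it compares the fewest machines needed to cover a margin when sizes are taken as they are against the same question when every machine is treated as average-sized. The function name and the numbers are hypothetical.

```python
def min_machines_to_flip(sizes, margin_votes, max_shift_fraction):
    """Fewest machines whose combined maximum shift covers the margin,
    taking the largest machines first. Returns None if even corrupting
    every machine could not cover the margin."""
    remaining = margin_votes
    count = 0
    for size in sorted(sizes, reverse=True):
        if remaining <= 0:
            break
        remaining -= max_shift_fraction * size
        count += 1
    return count if remaining <= 0 else None

# Hypothetical county: two big machines and eight small ones (3600 votes total).
sizes = [1000, 1000] + [200] * 8
average = sum(sizes) / len(sizes)

print(min_machines_to_flip(sizes, 900, 0.5))                      # 2
print(min_machines_to_flip([average] * len(sizes), 900, 0.5))     # 5
```

With the real sizes, two bad machines suffice; the equal-size shortcut says five. A sample size computed for "at least five bad machines" would therefore overstate the chance of catching the two-machine case.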

While this is a stumbling block for writing down the procedure in general, in principle it can be worked out for any given machine size distribution. There are several ways to approach this:
1) just assume the worst case scenario where all the machines have counts as large as the largest one. This causes the number of machines you need to count, computed with the simple-minded formulas, to be an overestimate.

2) you can squeeze down some of the overestimate by iterating the calculation using a ranking of the machines by size. I suspect that this is what Ron is proposing below. It's still an overestimate of the actual number.

3) For a given machine size distribution, perhaps the easiest thing to do is to Monte Carlo sample it using a computer to get an estimate of the optimal way to sample.
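A minimal Python sketch of what approach 3 might look like follows. Everything in it is an assumption made for illustration: the adversary corrupts the fewest (largest) machines needed to cover the margin, each machine can shift at most `max_shift_fraction` of its count, and the two sampling rules compared are uniform selection and selection weighted by machine size. The function names are hypothetical.

```python
import random

def corrupted_set(sizes, margin_votes, max_shift_fraction):
    """Indices of the largest machines needed to cover the margin
    (all machines if the margin cannot be covered)."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    remaining, chosen = margin_votes, set()
    for i in order:
        if remaining <= 0:
            break
        remaining -= max_shift_fraction * sizes[i]
        chosen.add(i)
    return chosen

def weighted_sample(sizes, k, rng):
    """Draw k distinct machines, each draw proportional to size."""
    pool, picked = list(range(len(sizes))), []
    for _ in range(k):
        choice = rng.choices(pool, weights=[sizes[i] for i in pool], k=1)[0]
        pool.remove(choice)
        picked.append(choice)
    return picked

def estimate_detection(sizes, bad, k, weighted, trials=20000, seed=0):
    """Monte Carlo estimate of the chance the audit hits a bad machine."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if weighted:
            sample = weighted_sample(sizes, k, rng)
        else:
            sample = rng.sample(range(len(sizes)), k)
        if bad.intersection(sample):
            hits += 1
    return hits / trials

sizes = [1000, 1000] + [200] * 8          # hypothetical size distribution
bad = corrupted_set(sizes, 900, 0.5)      # the two big machines
print(estimate_detection(sizes, bad, 3, weighted=False))
print(estimate_detection(sizes, bad, 3, weighted=True))
```

The estimates hold only under the assumed adversary. Someone who knew the audit favored large machines might spread the shift over smaller ones instead, so a real analysis would have to try other corruption patterns as well.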

In any case, if the law is written to say that sufficient machines are sampled to assure, with a confidence of, say, 80%, that a vote shift exceeding the observed margin between the candidates would be detected, then this is sufficient. All three of the above methods satisfy this. Some just do it more parsimoniously.

Therefore I would not try to write the method of calculation into the laws. Just state the objective and leave it to the implementor to find the appropriate method of getting a satisfactory bound on the desired outcome.

At the end of the day, the preciseness of the recount is much less important than the satisfaction of the people and candidates that the election was fair. As I pointed out in the last e-mail, something like a TAR would have far more importance in gaining that confidence than mincing over exact confidence values.

>This is arguable with respect to accidental miscounts,
>such as those caused by using the wrong ballot description files. It is
>suspect with respect to fraud. Indeed the schemes seem partially to
>recognize this when they calculate the minimum number of precincts (M)
>needed to flip the election (i.e. sort from largest to smallest and
>reduce overall margin by each precinct's margin until remaining margin
>is <= 0). But then they throw all the precincts back in one bin and pick
>the audit candidates uniformly randomly. This means that, though fraud
>is probably significantly more likely in larger precincts (better yield
>per conspirator), such precincts are no more likely to be audited than
>much smaller precincts (= low yield per conspirator).
>This shouldn't be a problem when precinct size varies only slightly. But
>sometimes it doesn't. For example, in San Francisco's recent election,
>precinct size by registered voters (ignoring mail-in-only precincts)
>ranged from 249 (#1136) to 1134 (#3631).
>One way to approach this problem is to stratify the audit, always
>auditing the M largest precincts and selecting another N precincts to
>audit randomly. You could calculate N by using the existing schemes'
>sort method on a precinct list that excludes the M largest precincts.
>Another approach might be to weight a precinct's probability of
>selection based upon its size (either by number of RVs or number of
>ballots cast). But it isn't totally clear that this approach is
>mathematically valid, and it'd be difficult to choose the precincts
>without using computers.
>charlie strauss wrote:
>> I agree with Arthur, this fudge factor is the Achilles heel of the recount strategy. The good news, however, is twofold.
>> First, by asking the question in the right way, "provide a sampling procedure that gives a 90% chance of discovering at least one fraudulent machine if one exists, assuming that no machines are shifting more than Fudge%", you have reduced an enormously slippery problem down to a single parameter we can argue over. (Actually there are two parameters, sort of.)
>> Second, we already know how to fix the fudge factor problem. The solution is to provide a limited TAR to go along with the sampling.
>> Why do we need this TAR? Well, the problem with the fudge factor is that we already know that any apparent vote shift, which can only be argued by statistical analysis of polls, registration data, and comparison to other voting modalities, has proven fairly unconvincing. Witness Sarasota, FL in the last election, where it appears some precincts shifted, it is claimed, by figures exceeding 25% in some reports. Yet the judge labeled it as insufficient speculation. And we are all familiar with the ad nauseam studies of Ohio and Florida, which use statistical evidence to claim large vote shifts in certain precincts, yet no investigation results. Candidates seldom challenge even bigger suspect vote shifts because they feel they lost in aggregate regardless of apparent shifts.
>> Thus this presumed "fudge factor" might be a lot larger than anyone is really comfortable with. Yet that creates an enormous problem for recount designs. If we were to set the fudge factor at some ridiculously high number, like, say, a 75% vote shift, as being an undeniable, self-evident situation that would be automatically recounted, then one computes a rather painfully stringent sampling rate (possibly too high for practical value). When you combine this with the fact that precincts vary in size it gets a bit worse (a 75% shift in a big precinct is worse than a 75% shift in a tiny one).
>> Thus the solution here is to put the fudge factor at a small value. Then satisfy the candidates and voters with a TAR that can probe specifically contested results, not simply random sampling.
>> Random sampling + TAR lacks the Achilles heel.
>> I don't believe the TAR needs to be very big either.
>> _______________________________________________
>> OVC-discuss mailing list

= The content of this message, with the exception of any external
= quotations under fair use, are released to the Public Domain
Received on Tue Jan 1 14:12:49 2008
