From: Charlie Strauss <cems_at_earthlink_dot_net>

Date: Wed Nov 22 2006 - 01:55:03 CST

On Nov 21, 2006, at 8:30 PM, Ron Crane wrote:

> charlie strauss wrote...
>
>>> 1/4 was chosen so that we could treat the TAR machines as though
>>> they were randomly chosen without excessively diluting the
>>> statistical power of the random sample. That is, it will introduce
>>> a slight bias, but not one big enough to argue over. Most of the
>>> time it serves its main purpose well: to satisfy the candidates
>>> that anomalies were simply anomalous, not errors.
>>>
>>> In typical elections the number of recounted machines in NM will
>>> be small: if the election margin between two candidates was 2%,
>>> then to achieve 90% confidence all of NM would have to count 33
>>> machines.
>>
>> Which formula did you use to calculate this number?
>>
>> ANSWER: Don't worry. I don't have the formula on hand, but this
>> is pretty easy mathematics. (By the way, the original calculation
>> was done by people who compute reliability statistics for nuclear
>> weapons assurance.)
>
> This isn't sufficient. We are all here because we worry about these
> things. I am quite sure that the appropriate formula is not "pretty
> easy mathematics," since Kathy Dopp and others spent considerable
> effort deriving it just this past summer, and, in the end, I
> believe that they could not find a closed-form solution. Did you
> use Dopp's spreadsheet?

Well, I guess it's all relative. I don't see it as that complicated.
Maybe I'm missing something.

As I see it, the axiomatic starting point is that there is some size
of vote shift in a precinct that would be plainly wrong. It's an
axiom, so it's not defensible from the context of the consequent
statistics. So pick a number: maybe 15%, maybe 30%. That's the
ceiling on the vote shift that would pass unquestioned on a given
machine. Once we have that, it's all downhill. For a given election
margin you can compute how many machines would have to have a vote
shift. The worst case for detection by sampling is when the fewest
possible machines are altering votes, which means they would all be
at the maximum vote shift. So now we can simply compute how many
machines in the total population you would have to sample to have an
X% chance (say 90%) of your sample containing at least one of those
machines.
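To make that concrete, here is a minimal sketch of the calculation.
The machine counts below are illustrative numbers of my own, not NM's
actual figures, and the conversion from margin to tainted-machine
count is one crude accounting (the exact fraction depends on how a
"shift" is defined). The chance that a random sample of n machines
misses all b bad machines among N total is C(N-b, n)/C(N, n), so we
just search for the smallest n that drives the miss probability under
10%:

```python
from math import comb

def min_sample_size(total, bad, confidence=0.90):
    """Smallest audit sample whose probability of containing at least
    one of `bad` tainted machines among `total` meets `confidence`.
    Hypergeometric: P(miss all) = C(total - bad, n) / C(total, n)."""
    for n in range(1, total + 1):
        p_miss = comb(total - bad, n) / comb(total, n)
        if 1 - p_miss >= confidence:
            return n
    return total

# Illustrative: 1000 equal-sized machines, a 2% margin, and a 15%
# maximum shift per machine imply roughly 0.02 / 0.15 of the machines
# must be tainted in the worst (fewest-machines) case.
bad = round(1000 * 0.02 / 0.15)
print(min_sample_size(1000, bad))
```

Because sampling is without replacement, the answer is never worse
than the simpler binomial approximation (1 - b/N)^n would suggest.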

Now, for this to be valid, all the machines in question need to
process about the same number of votes. This is roughly true for
precinct scan, but it is grossly violated by absentee central scan.
Rather than deal with complex weighting of the statistics, it's
easier just to batch the absentee ballots into groups that are about
the same size as a precinct machine count. At that point the
remaining fluctuations in precinct sizes are small enough that their
effect on the statistics is second order. For the most part we can
expect those to be damped further if the sample size itself is not
too small, pushing them out to even higher-order effects.
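The batching step itself is trivial; here is a sketch (the function
name and the 500-ballot batch size are my own, chosen only for
illustration):

```python
def batch_absentee(ballots, batch_size):
    """Split a central-scan absentee pool into precinct-sized batches,
    so each batch enters the sampling pool as one more 'machine'."""
    return [ballots[i:i + batch_size]
            for i in range(0, len(ballots), batch_size)]

batches = batch_absentee(list(range(2350)), 500)
print([len(b) for b in batches])  # four batches of 500, one of 350
```

A short final batch could be folded into its neighbor if one wanted
every batch to sit within the normal range of precinct sizes.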

One could end the discussion there, since that's all one needs to
know if one is content to ignore the second-order effects. Namely:
pick a maximal undetectable machine deviation, pick a desired
detection threshold, and assume that all the precincts are about the
same size. Then you can readily compute the number of machines to
audit out of the total number of machines for a given margin in a
contest.

However, I sense you perhaps do want to sweat the higher-order
effects, so I'll continue the exposition of my rationale a tad
further. First, if we really cared about it, it's not a lot of work
to write a program that simulates sampling from a set of
variable-sized precincts to determine bootstrapped empirical
confidence limits for different sample sizes.
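For what it's worth, such a simulation is only a few lines. This is a
sketch under my own assumptions, not a vetted implementation: the
adversary corrupts as few precincts as possible (the largest first,
each shifted by at most the maximal undetectable fraction), and we
estimate the empirical detection rate for a given sample size:

```python
import random

def detection_rate(precinct_sizes, margin_votes, max_shift,
                   sample_size, trials=10000, seed=1):
    """Monte Carlo estimate of the chance that auditing `sample_size`
    randomly chosen precincts catches at least one corrupted one,
    assuming the worst case: as few corrupted precincts as possible."""
    rng = random.Random(seed)
    # Corrupt the largest precincts first until the shiftable votes
    # cover the margin -- the fewest-precincts (hardest) case.
    order = sorted(range(len(precinct_sizes)),
                   key=lambda i: precinct_sizes[i], reverse=True)
    corrupt, shifted = set(), 0.0
    for i in order:
        if shifted >= margin_votes:
            break
        corrupt.add(i)
        shifted += precinct_sizes[i] * max_shift
    hits = sum(
        any(i in corrupt
            for i in rng.sample(range(len(precinct_sizes)), sample_size))
        for _ in range(trials))
    return hits / trials
```

With equal precinct sizes this reproduces the hypergeometric answer;
with variable sizes it yields the bootstrapped empirical confidence
limits directly.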

But that obfuscates the essential process with unnecessary
complications. These second-order effects are much less important
than the defects in the primary data model itself. The glaring
problem is the axiomatic assumption about the worst-case scenario
that would be considered undetectable: maybe 30% is better than 15%.
If it were 30% and we assumed 15%, then the number of machines we
need to audit is much larger. Maybe one has a prior distribution for
this detection probability? Okay, let's use that. But even that's not
enough, since you then need some sort of cost function to use for
your decision. And we've never stated one in the model.

As a relevant aside, I note that one of the hidden beauties of
tossing in a limited Targeted (TAR) selection is that it helps us
determine whether 15% or 30% is the worst-case bound, by
cherry-picking the machines that prior information outside the model
tells us are the likely worst cases. If we were clever enough to
codify that prior information, then presumably one could write a nice
Bayesian scheme for integrating the targeted and random recounts.
But we're not that clever.

Other improperly modeled aspects are things like correlated error
patterns and asymmetric binomial statistics for deviations from
voter registration. Since we can expect the mechanisms of both fraud
and error to exhibit more correlated deviations within a district
than between unrelated districts, it would make some sense not to
choose purely at random but to spread out the choices so that they
maximally span the districts. We can also expect correlations
between races and geography. Even local weather, cultural
homogeneity, and any number of sociological factors will have
profound effects on the optimal pattern of choosing precincts to
audit.

Which is simply to say that fretting about the second-order
statistics when the primary model is not correct anyhow just
unnecessarily complicates things. (And, as I said, it's not that
difficult to bootstrap or jackknife the second-order numbers if one
wants to insist on this.)

Is this sufficient?

_______________________________________________

OVC-discuss mailing list

OVC-discuss@listman.sonic.net

http://lists.sonic.net/mailman/listinfo/ovc-discuss

==================================================================

= The content of this message, with the exception of any external

= quotations under fair use, are released to the Public Domain

==================================================================

Received on Thu Nov 30 23:17:10 2006

This archive was generated by hypermail 2.1.8 : Thu Nov 30 2006 - 23:17:19 CST