# Re: approximate solutions

From: Kathy Dopp <kathy_dot_dopp_at_gmail_dot_com>
Date: Sat Jul 22 2006 - 17:54:57 CDT

On 7/21/06, laird popkin <lairdp@gmail.com> wrote:
> I'd agree that it would be much better to present the required
> sample size given the various inputs as a formula (if possible, and it seems
> to me that it ought to be) because it's much easier to analyze, and thus
> trust, a formula than a computer program.

I agree completely, but we seem unable to solve it. If you'd like to
try, I'll send the equation to you along with the farthest along that
we got in trying some approaches to solve it, including Jerry
Lobdill's approach.

Frank has now written a Maple program to solve it that he says will
work better than the computer algorithm I suggested earlier, but I
don't have Maple on my computer, so I have not checked its results
yet. Would you or anyone else be willing to help check the results of
his Maple program against the results in the trial-and-error
spreadsheet I created?

> You could do a reasonable job of a basic
> sensitivity analysis with only four graphs (e.g. "typical values for three
> inputs, then chart the fourth input against the confidence level",

The confidence level (the desired probability of detecting one or more
miscounts) is one of the inputs; the sample size is the output.

We could pick a desired probability level (say 95%), pick a constant
maximum % of margin shifted on one machine (say 30% = 2*15% votes
shifted), pick typical values for the number of total vote counts in a
county (one for each of your four charts) and plot the resulting
sample size needed against the smallest margin between candidates.
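
As a rough illustration of how the data behind such charts could be generated, here is a minimal Python sketch. It is not the spreadsheet or Frank's Maple program. It assumes the detection probability follows the standard model of sampling without replacement, that all vote counts are the same size, and that margins and shifts are whole percentages. The function names and the four county sizes are hypothetical.

```python
def corrupt_counts(n_counts, margin_pct, shift_pct):
    """Smallest number of corrupted vote counts that could erase the margin.

    Assumption: equal-sized vote counts, percentages given as whole numbers.
    Integer ceiling division avoids floating-point rounding surprises.
    """
    needed = -(-n_counts * margin_pct // shift_pct)  # ceil(N * margin / shift)
    return max(1, min(n_counts, needed))


def min_sample_size(n_counts, n_corrupt, prob):
    """Smallest s whose chance of hitting >= 1 corrupt count is at least prob."""
    miss = 1.0  # probability that a sample of size s contains no corrupt count
    for s in range(n_counts - n_corrupt + 1):
        if 1.0 - miss >= prob:
            return s
        # Extend the sample by one more clean count, drawn without replacement.
        miss *= (n_counts - n_corrupt - s) / (n_counts - s)
    return n_counts - n_corrupt + 1  # at this size a corrupt count is certain


def chart_series(n_counts, prob=0.95, shift_pct=30, margins=range(1, 16)):
    """One curve for one chart: (margin %, sample size) pairs for a fixed N."""
    return [
        (m, min_sample_size(n_counts, corrupt_counts(n_counts, m, shift_pct), prob))
        for m in margins
    ]


if __name__ == "__main__":
    for n in (100, 500, 1000, 5000):  # illustrative county sizes, not from the thread
        print(n, chart_series(n))
```

Each call to `chart_series` yields the points for one chart: sample size on the vertical axis against the smallest margin on the horizontal axis, for one value of N.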

> it's enough to answer questions like "how do my assumptions
> of the maximum rate of margin-shifting per machine affect the required vote
> count audit sample size?"

Good question. The bigger the maximum rate of margin-shifting, the
bigger the number of vote counts which must be audited, but it would
be necessary to show this to novices. I say the Brennan Center
estimate of 30% max margin-shifting is a good starting point.
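
One way to show the direction of this effect, assuming the usual model of sampling without replacement (the equation discussed on the list is not reproduced in this message, so this is an assumption), is to write the detection probability as a product. Here N, C, P and s are as used in this thread; m (smallest margin as a fraction of all votes) and r (maximum margin shift per vote count) are symbols introduced only for this sketch, and equal-sized vote counts are assumed.

```latex
P_{\text{detect}}(s) \;=\; 1 - \frac{\binom{N-C}{s}}{\binom{N}{s}}
\;=\; 1 - \prod_{j=0}^{C-1} \frac{N-s-j}{N-j},
\qquad
C = \left\lceil \frac{mN}{r} \right\rceil
```

For s >= 1, every factor in the product is less than 1. Raising r lowers C, which removes factors from the product, so the chance of missing every corrupt count at a given s goes up and a larger s is needed to reach the same P.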

Would you or anyone on this list want to write up a joint paper on
this topic with me? We could possibly release it under the banner of
both the Open Voting Consortium (or Foundation, whichever is most
appropriate legally) and the National Election Data Archive.

> To be more thorough, such as for a deeper academic analysis, you might want
> to show all combinations of three values per input (min, mean and max, for
> example). That would be 108 charts (3*3*3*4), which nobody would ever read.
> But you could put three lines on each chart (i.e. the lines for the
> min/mean/max of one of the inputs, or "family of curves" you mentioned) and
> that would get you down to 36 charts, which is (IMO) manageable for a
> detailed, academic analysis.

I like your idea of putting three curves, for three values of one of
the other inputs, on each chart, but I am still not sure how the 4 got
into the calculation, rather than 3^4 and 3^3.

> Hmm. I don't know whether that explanation of how the charting could be
> done is sufficiently clear. If you can send me the confidence formula, I can
> quickly generate a set of the charts as I've described, so that you can see
> what I'm talking about. If the "formula" is a computer program, this might
> be trickier, depending on the program's complexity and what it's written in.

I can send you:

1. the as-yet unsolved equation;

2. the spreadsheet where trial and error can quickly home in on the
correct sample sizes; plus

3. Frank's Maple program, which he says is short, for calculating s
(the sample size) from N, C, and P, where:

N = total # of vote counts

C = minimum number of corrupt precincts = (smallest candidate
margin)/(maximum margin shift per machine)

P = desired probability of detecting one or more corrupted precincts.
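
For anyone without Maple, the same inputs can be explored with a short Python sketch. This is not Frank's program or the unsolved equation itself, neither of which appears in this message. It assumes the standard hypergeometric model for drawing s of N vote counts without replacement, and all function names are hypothetical. The closed-form estimate comes from the bound that the chance of missing every corrupt count is at most (1 - s/N)^C, so it should never fall below the exact answer.

```python
from math import ceil, comb


def detection_probability(N, C, s):
    """Chance that s of N vote counts, drawn without replacement, include
    at least one of the C corrupt ones (hypergeometric model, an assumption)."""
    # comb(n, k) is 0 when k > n, so the result is 1.0 once s > N - C.
    return 1.0 - comb(N - C, s) / comb(N, s)


def exact_sample_size(N, C, P):
    """Smallest s with detection_probability(N, C, s) >= P, by direct search."""
    if not (1 <= C <= N and 0.0 < P <= 1.0):
        raise ValueError("need 1 <= C <= N and 0 < P <= 1")
    for s in range(N + 1):
        if detection_probability(N, C, s) >= P:
            return s
    return N  # not reached for valid inputs; kept as a safe fallback


def approx_sample_size(N, C, P):
    """Closed-form estimate from the bound miss(s) <= (1 - s/N)**C."""
    return ceil(N * (1.0 - (1.0 - P) ** (1.0 / C)))


if __name__ == "__main__":
    N, C, P = 1000, 10, 0.95  # illustrative values only
    print(exact_sample_size(N, C, P), approx_sample_size(N, C, P))
```

Comparing the two outputs over a range of N, C and P is one way to check a spreadsheet or a Maple result against an independent calculation.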

What would you like? Feel free to call me at 435-658-4657

> On 7/22/06, Kathy Dopp <kathy.dopp@gmail.com> wrote:
> > Actually the graphs and tables are still quite useful
> > for checking and cutting down orders of magnitude
> > errors.
> > While it's great to get precise solutions with
> > today's calculators and computers, it's much easier to
> > display how things relate to the variables with a
> > graphical approximation.
>
> Ed and Jerry,
>
> I love graphs, but there are four independent variables (election
> margin, total #vote counts, desired probability, and the assumed
> maximum rate of margin-shifting per machine) and one dependent
> variable - the vote count audit sample size. We could make graphs by
> holding three of the independent variables fixed and vary one of the
> independent variables for each graph, but you would end up with LOTS
> of graphs. The three variables with the fewest real life values for
> them that would be needed, would be assumed maximum margin
> shift/machine, probability and candidate margin, so taking candidate
> margins from 1% to say 15% and probabilities of say 90% to 98%, there
> could be 15*8 =120 charts that election officials could use to look up
> how many vote counts to select for audit if they have N vote counts in
> their county - assuming we give them only one option for max vote
> counts.
>
> I think that you are suggesting that we could simply do something
> like have 15 charts for the 15 margins of interest (assuming that all
> races in a county are audited with the same sample size determined by
> the race with the smallest margin -- I don't know enough yet to
> conclude that would be the case with optical scan ballots that you
> could sort to count, but I would assume that would have to be the best
> approach with stupid DRE paper rolls) each with nine curves on them
> for the nine probabilities they may want.
>
> To do a family of curves for a family of charts to make it easy to
> look up values for vote count auditing would require some
> decision-making re. the range of values to use to create charts and
> curves for.
>
> I agree with you that presenting a book of charts and/or tables for
> people to use to calculate vote count audit margins is a good idea
> providing the "family of curves" for each desired probability or
> candidate vote count margin don't overlap each other illegibly because
> then say 120 charts would be needed rather than 15 - but regardless,
> this is a big project that a computer would need to generate.
>
> So we still need a program or spreadsheet that will generate the
> charts for us, whether from an exact formula if we obtain one, or from
> an algorithm.
>
> A book of tables and charts may be more likely to be used than a
> spreadsheet that required trial and error or even a computer program
> that let users input the four independent vars and found the exact
> answer - is that what you're saying?
>
> Thanks.
>
> Kathy
>
> "Enlighten the people generally, and tyranny and oppressions of body
> and mind will vanish like evil spirits at the dawn of day," wrote
> Thomas Jefferson in 1816
Kathy Dopp
http://electionarchive.org
National Election Data Archive
Dedicated to Accurately Counting Elections
