Re: Limitations of Many Eyes -- in the context of voting systems (fwd)

From: Danny Swarzman <danny_at_stowlake_dot_com>
Date: Tue May 20 2008 - 14:28:13 CDT

David Wagner discussed the limitations of source code review in his
report to SoS Debra Bowen last year. Of course, this finding doesn't
argue that closed source software is better but the contrary. Open
source means more eyes.

Nobody should trust software even if it is open source. That's why we
should have voter marked paper ballots that can be audited.


On May 20, 2008, at 8:29 AM, Richard C. Johnson wrote:

> Just a few comments on the implications of the cited study. Source
> and version control is very important; access to the code under
> development for voting systems obviously in need of control. One
> would also want to go beyond eyes only to compile, run, and analyse
> the code for code unresponsive under test. The obvious sign of
> suspicious code is simply that it lies hidden during test in actual
> use. Code that is not exercised under test is either there for some
> specific reason to be documented in comments and reviewed, or it may
> be excised.
> I am sure that the reviewers understood the "bugs" when given lines
> of code were pointed out. Eyes alone do not do the job. It is also
> people compiling, running, and analysing the code that makes a
> difference in Open Source. Finally, there is the matter of how
> the Open Source code came into being. Strict code control leaves a
> limited number of suspects for any detected "problem" and the chance
> of getting caught is a real limitation on skullduggery. I would
> think there is very little chance of catching a nasty inserted by a
> commercial vendor or employee. Open Source offers a far better
> opportunity to catch errors and malware by analysis as well as
> eyeballs.
> -- Dick
> Brian Behlendorf <> wrote:
> I'm forwarding the below email with David's permission. The email was
> sent to a list of folks interested in the intersection of security and
> open source. It's self explanatory, and many of you have probably
> already
> heard about the study. I think it's important, as advocates of open
> source voting systems, that we understand this study carefully;
> David's
> message sets it in an appropriate context.
> In my own opinion, instead of spending a lot of time and energy
> refuting
> the study (which could still have its flaws, and will be
> overgeneralized
> and spun by foes of open source voting), advocates of the security
> advantages of open source voting should instead acknowlege it as a
> reality
> of all software, and argue that open source still gives us the best
> shot
> of detecting exploits, rather than providing any sort of assurance
> that
> exploits could not exist because they would be quickly found. It
> should
> also reinforce the importance of differentiating between "published
> source" and true open source, as "published source" is still
> unlikely to
> get anywhere near the amount of scrutiny per line of code. It could
> also
> point out the importance of funding review efforts; open sourcing
> the tech
> doesn't cause code reviewers to appear, it only makes them possible.
> Most importantly, it should point out why publishing source code in
> any
> manner does not obviate the need for an auditable (and actually
> audited)
> voting process.
> Brian
> ---------- Forwarded message ----------
> Date: Sun, 18 May 2008 12:47:57 -0700 (PDT)
> From: David Wagner
> Subject: Re: [ROS] Limitations of Many Eyes -- in the context of
> voting systems
> > The original reason behind the research that led to the static
> analysis
> > tools MALPAS, SPADE and SPARK was to detect trojan code in UK
> Government
> > security software.
> >
> > If we could implement open source software using programming
> languages
> > that support sound static analysis of data flow and path
> expressions, it
> > would be far easier to check for trojans.
> I used to think so, too. But Ping Yee and I recently did an
> experiment whose results have raised serious doubt about this in
> my mind.
> In Ping's PhD dissertation, Ping came up with an alternative
> architecture
> for a voting machine that vastly reduces its trusted computing base
> (down
> to < 500 lines of simple Python code). We hypothesized that this would
> make it much easier for reviewers to find unintentional and
> intentional
> flaws in the software. We decided to test this hypothesis by inviting
> several security experts to review our code.
> As part of the experiment, we injected three artificial bugs into our
> code, to see if the reviewers would find them. These bugs were an
> attempt to simulate a malicious backdoor that could influence election
> results. We chose these bugs so they would be unlikely to be detected
> by ordinary pre-election testing. We crafted the bugs to offer a range
> of difficulty in finding them: one that we expected to be easy to
> find,
> one that we expected to be of medium difficulty to spot, and one that
> we expected to be difficult for a code reviewer to notice.
> We then ran a code review, with a total of 4 expert reviewers. They
> were
> broken down into 3 teams of 1-2 reviewers, each team working
> separately
> for one day to review the buggy code. At the start of the day we
> told them "there exists at least one bug in this code!" and told them
> that we had inserted an artificial bug (we didn't tell them that we
> had inserted more than one until they found their first bug). We even
> revealed that the bug was in a narrow subrange of 100 lines of code,
> to
> give them an advantage and narrow the focus of their review. In fact,
> all 3 articially inserted bugs were in that range. We then sat back to
> see whether they found the bugs.
> One team found both the "easy" bug and the "medium" bug, without much
> difficulty, but never found the "difficult" bug. Another team found
> the "easy" bug after about four hours, but never found the "medium" or
> "difficult" bugs. A third team found the "easy" bug after about two
> hours but never found the "medium" or "difficult" bugs. After a total
> of about 33 reviewer-hours, no one ever found the "difficult" bug.
> Of course, had we actually been malicious, we would have inserted just
> the "difficult-to-find" bug, and we would have gotten away with it.
> (The only reason we inserted that we also inserted the "easy" and
> "medium" bugs was because we were worried that they wouldn't find the
> "difficult" bug!) I should also point out that we came up with these
> malicious backdoors after only 2-3 hours of brainstorming -- we only
> came up with the idea of inserting artificial bugs the night before,
> so we were in a hurry.
> In short, in 2-3 hours we were able to come up with a malicious
> backdoor
> that 4 reviewers could spending a total of 33 reviewer-hours could not
> find, even with the information that there was a bug and with the side
> information that it could be found within a particular subrange of 100
> lines of code. This suggests that malicious backdoors can be very,
> very
> difficult to detect, even in situations carefully crafted to be the
> best
> possible for the defenders.
> Let's compare to current commercial practice. Typical code reviewing
> practice today for commercial code involves 1-2 reviewers reviewing
> code
> at a rate of 100-200 lines/hour. In comparison, in our experiment the
> reviewers reviewed code at a rate of about 14 lines/hour -- they had a
> lot more time, and we had more eyeballs, and the malicious backdoor
> was
> still not detected.
> Or, we can put this in context by looking at the cost. I estimate that
> it cost something like $10,000 to hire 4 reviewers with these
> qualifications
> to perform this kind of code review. But in this experiment we
> reviewed
> only 100 lines of code; real-world systems are much larger than that.
> Imagine, for example, a industrial system with 10,000 lines of code;
> that's pretty tiny by commercial standards, yet the experiment we ran
> suggests that one could spend $1 million and still not detect a
> malicious backdoor. (And this isn't even taking into account the fact
> that in larger systems there are probably more opportunities for even
> more subtle kinds of backdoors.)
> For me, the results of this experiment were eye-opening, because
> they violated my intuition and my expectations so dramatically.
> The conventional wisdom says that there is no reliable way to check
> for malicious backdoors in a large, complex software system. I had
> been hoping that the situation might be different if one followed
> sound
> engineering principles, kept the system simple, and minimized the size
> of the codebase. Alas, it didn't work out that way. This experiment
> suggests to me that there is no feasible, reliable way to check for
> malicious backdoors in software of any size -- not even for small,
> simple systems that have been thoughtfully designed, architected, and
> implemented with the malicious code problem in mind.
> More details about this experiment can be found in Chapter 8 of
> Ping's thesis ( Even more details
> can be found in his techreport:
> What about programming languages designed to support sound static
> analysis and formal program verification? Would they provide a
> reliable
> way to detect deliberately inserted backdoors and malicious logic? I
> don't know, but I'm not convinced yet.
> Our experiment wasn't designed to test that question. But, Ping
> put a lot of effort into manual verification of the code. He put
> together a 70-page assurance document, which was made available to the
> reviewers. This document contained his best attempt at a requirements
> analysis, a careful explanation of the threat model and the
> correctness
> claims he was making about his software, as well as rigorous
> analysis of
> each of those claims. He also translated those claims into invariants,
> preconditions, and postconditions on the code and performed a line-
> by-line
> analysis of the code, showing the pre- and post-conditions needed at
> each line to prove these claims. In other words, this analysis was
> along the lines of what an automated program verification tool might
> perform, except that Ping did it by hand. The assurance document is
> available here:
> Ping did use a programming language subset that was explicitly
> designed
> to support human code review and chosen with simplicity in mind
> (details
> in Chapter 3 of the assurance document), though sadly we didn't have
> automated static analysis tools.
> I don't believe that "sound static analysis of data flow and path
> expressions" would have made it any easier to find any of the three
> malicious backdoors we artificially inserted.
> But let me be more charitable, and imagine the best possible program
> verification and analysis tool I can plausibly envision (even though
> I think it's beyond the state of the art to build such a thing today,
> this is the most I can think to ask for). I think such a hypothetical
> tool would have fared as follows:
> * A good program verification tool could have detected the "easy"
> bug. It falls squarely into the category of the kind of bugs that
> formal verification is intended to find.
> * A strict static analysis tool might have helped to detect the
> "medium" bug, though this is debateable -- I could argue it
> either way.
> * Hard to say, but I think formal verification would not have
> made a huge difference in the difficulty of detecting the "difficult"
> bug (I suspect developing a spec that would be sufficient to detect
> the bug is as hard as finding the bug -- the hard part is arguably
> recognizing the need for this particular special-case requirement,
> and once you realize that, I suspect you'll have a decent chance of
> finding the bug either way).
> One challenge here is that such tools require specifications. If
> we are assuming the possibility of malicious logic in the program, we
> also have to assume the possibility of malicious logic in the
> specification. Since formal verification tools can, at best, only
> verify the consistency of the program and the specification, a
> matching
> bug in both the program and the spec can evade detection. And checking
> detailed, program-level specs for malicious logic does not seem likely
> to be appreciably easier than checking programs for malicious logic.
> But that's not even the worst of the it. The worst of it is that
> we have to assume that a malicious insider will know that formal
> verification is being used and will craft the bug to minimize the
> chances that the formal verification tool will detect the bug. (After
> all, the malicious insider can run the formal verification tools
> himself
> and check whether his bug will be detected, and he can keep trying
> until
> he finds a bug that is unlikely to be detected by that tool.) In
> other words, we have to assume that the attacker will adapt to the
> defenses you put in place. If there is any chink in the armor, we
> have to assume the attacker will find it. That makes this problem
> challenging, even if we assume major advances in program verification.
> So for me, I've gradually come to the hypothesis that detecting
> malicious backdoors is really hard, even in the best possible case.
> I don't know how to do it with any confidence or assurance, for
> any non-trivial system, no matter how well-designed or implemented.
> Sounds like a difficult open problem to me.
> -- David
> _______________________________________________
> OVC-discuss mailing list
> By sending email to the OVC-discuss list, you thereby agree to
> release the content of your posts to the Public Domain--with the
> exception of copyrighted material quoted according to fair use,
> including publicly archiving at
> _______________________________________________
> OVC-discuss mailing list
> By sending email to the OVC-discuss list, you thereby agree to
> release the content of your posts to the Public Domain--with the
> exception of copyrighted material quoted according to fair use,
> including publicly archiving at

OVC-discuss mailing list
By sending email to the OVC-discuss list, you thereby agree to release the content of your posts to the Public Domain--with the exception of copyrighted material quoted according to fair use, including publicly archiving at

= The content of this message, with the exception of any external
= quotations under fair use, are released to the Public Domain
Received on Sat May 31 16:17:03 2008

This archive was generated by hypermail 2.1.8 : Sat May 31 2008 - 16:17:04 CDT