XML MATTERS #38: OASIS Election Markup Language
Standardization of XML Formats for Voting and Elections

David Mertz, Ph.D.
Bean Counter, Gnosis Software, Inc.
September 2004

The Organization for the Advancement of Structured Information Standards (OASIS) has developed many XML standards in use within government, law, and business. The Election Markup Language is OASIS' foray into the world of elections--with special attention to voting within governmental jurisdictions. In this installment, David gives readers an introductory look at the structure and purpose of EML, with an eye toward how the largely European EML will shape future data standards used in the United States.

XML AND VOTING SYSTEMS
------------------------------------------------------------------------

Readers of my prior _XML Matters_ installment on the use of XML in an open source voting machine will recognize my motivation for investigating the OASIS standard for Election Markup Language (EML). My direct interest is further piqued by my recent membership in the still-fledgling IEEE Project 1622 (Voting Systems Electronic Data Interchange).

OASIS' EML covers quite a bit more ground than the Open Voting Consortium's narrow demo system, or even than is anticipated for P-1622. Specifically, EML is intended to be rich enough to accommodate governmental elections across many jurisdictional levels, as well as elections within many different kinds of organizations (community or corporate, for example); to allow voting over many channels, both traditional voting booths (perhaps electronic ones) and remote systems like Web pages, telephone voting, kiosks, and so on; to enable many tabulation and voting rules, such as ranked preference and cumulative voting; to handle security, encryption, and authentication requirements; and to record and convey information about voter registration, organization membership, and other voter metadata.
EML has seen significant real-world use in European government, and in some non-governmental organizations worldwide. In my opinion, EML suffers somewhat (but not outrageously) from the over-engineering common among XML technologies (think SOAP, W3C XML Schemas, or even XSLT). Committees tend to produce standards with too many details, handling too many corner cases centrally, and with too many levels of indirection. Of course, having joined another standards committee myself, I suppose I will soon be guilty of participating in feature creep too.

Nonetheless, our tentative plan in IEEE P-1622 is to start with a simpler data model provided by a commercial election system vendor (but released on non-proprietary terms), rather than adopt EML wholesale towards standardization of USAian elections data. Our target in P-1622 is only to accommodate the needs of governmental elections, not every possible voting scenario; moreover, the fifty-some US states and territories have somewhat less procedural variation than do (for example) the 45 member nations of the Council of Europe. Even so, having several other contributed data models to reconcile into the final design already makes for a nascent featuritis.

WHAT DOES EML INCLUDE?
------------------------------------------------------------------------

To get a sense of the scope of EML version 3.0, let me quote from the Executive Summary to the standard:

    The primary deliverable of the committee [is] the Election Markup Language (EML). This is a set of data and message definitions described as XML schemas. At present EML includes specifications for:

    * Candidate Nomination, Response to Nomination and Approved Candidate Lists
    * Voter Registration information, including eligible voter lists
    * Various communications between voters and election officials, such as polling information, election notices, etc.
    * Logical Ballot information (races, contests, candidates, etc.)
    * Voter Authentication
    * Vote Casting and Vote Confirmation
    * Election counts and results
    * Audit information pertinent to some of the other defined data and interfaces

EML addresses a good number of distinct data requirements. The schemas associated with the logical aspects of an election process are given numeric prefixes to indicate their general category. So the 400 series schemas are associated with voting as such; the 500 series with tabulation (called -canvassing- in American terminology); the 100 series with an overall election specification; the 200 series with candidates; and the 300 series with voters (eligibility, etc.). Within each schema series, one or more W3C XML Schemas describe the documents that fill those requirements.

Besides the numbered schema families, EML contains a collection of supporting schemas, mainly dealing with common datatypes. For example, most or all of the numbered schemas include the schema 'emlcore.xsd' (in some cases indirectly, via some other include). Such a schema will have a line like:

#---------- Include line for EML core datatypes -----------------#

The EML core, in turn, includes 'emlexternals.xsd' and imports 'emltimestamp.xsd' and the W3C's 'xmldsig-core-schema.xsd'. I have not listed everything incorporated, but this shows the style. The lines to include or import the mentioned schemas are:

#------------ External resources used by emlcore.xsd --------------#

So far, so good; let us look deeper. The schema 'emlexternals.xsd' only defines formats for addresses and personal details about voting-eligible citizens. My sense, however, is that the includes are structured as they are with an eye to expanding the element and type definitions within 'emlexternals.xsd' when or if the need arises.
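Because include and import lines are ordinary XML, the dependency chain can be traced mechanically. The following Python sketch uses the standard library's xml.etree.ElementTree to list the xs:include and xs:import elements of one schema, then walks those references breadth-first. The SAMPLE schemas are hypothetical stand-ins that reproduce only the 'emlcore.xsd' dependencies named above (the real files contain much more); the function names and the 'urn:example:timestamp' namespace are my own inventions for illustration.

```python
import xml.etree.ElementTree as ET
from collections import deque

# W3C XML Schema namespace; ElementTree reports tags as "{uri}localname".
XSD = "http://www.w3.org/2001/XMLSchema"
INCLUDE = f"{{{XSD}}}include"
IMPORT = f"{{{XSD}}}import"


def schema_dependencies(schema_text):
    """Return (kind, schemaLocation) pairs for the top-level
    xs:include and xs:import elements of one schema document."""
    root = ET.fromstring(schema_text)
    deps = []
    for child in root:  # include/import appear as direct children of xs:schema
        if child.tag == INCLUDE:
            deps.append(("include", child.get("schemaLocation")))
        elif child.tag == IMPORT:
            # schemaLocation is optional on xs:import, so it may be None
            deps.append(("import", child.get("schemaLocation")))
    return deps


def dependency_closure(start, library):
    """Breadth-first walk of schemaLocation references, starting at
    `start`, over an in-memory {filename: schema_text} dict. Files not
    present in `library` are listed but not descended into."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        name = queue.popleft()
        order.append(name)
        text = library.get(name)
        if text is None:
            continue  # e.g. a remote W3C schema we have no local copy of
        for _kind, location in schema_dependencies(text):
            if location and location not in seen:
                seen.add(location)
                queue.append(location)
    return order


# HYPOTHETICAL stand-ins mimicking only the include/import lines described
# in the text; these are NOT the real EML 3.0 schema files.
SAMPLE = {
    "emlcore.xsd": f"""<xs:schema xmlns:xs="{XSD}">
  <xs:include schemaLocation="emlexternals.xsd"/>
  <xs:import namespace="urn:example:timestamp"
             schemaLocation="emltimestamp.xsd"/>
  <xs:import namespace="http://www.w3.org/2000/09/xmldsig#"
             schemaLocation="xmldsig-core-schema.xsd"/>
</xs:schema>""",
    "emlexternals.xsd": f'<xs:schema xmlns:xs="{XSD}"/>',
}

if __name__ == "__main__":
    for name in dependency_closure("emlcore.xsd", SAMPLE):
        print(name)
```

Run as a script, this prints 'emlcore.xsd' followed by its three direct dependencies in document order. A real tool would also need to resolve each schemaLocation relative to the including file and fetch remote schemas such as the W3C signature schema; the sketch simply lists files it cannot find locally without descending into them.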
In the main, 'emlexternals.xsd' does its work with yet more includes:

#----- Citizen info datatypes imported to emlexternals.xsd ------#

Of course, once you follow the path still further, into 'AddressTypes-v1.xsd', you find still more external definitions, not as includes or imports, but via namespaces such as that of the Dublin Core Metadata Initiative.

Sidebar

In a prior review of XML editors, I looked at an early version of oXygen. I continue to be more and more impressed with this editor each time a new version comes out, or even when I merely dig around for new features. Specifically, oXygen 4.2 includes a wonderful tool to create a friendly HTML documentation page for a W3C XML Schema. Included with this page is an automatically generated valid XML instance. Contrast reading a schema directly, such as http://gnosis.cx/download/eml3/410-ballots.xsd, with reading oXygen's generated documentation, such as http://gnosis.cx/download/eml3/Ballots.html.

WHAT MAKES UP A BALLOT?
------------------------------------------------------------------------

The schema '410-ballots.xsd' specifies the format for an uncast ballot. The format is relatively unremarkable, but it is worth noticing that it includes a number of features to accommodate ballots in general, not merely governmental elections. For example, I am not familiar with any governmental elections that provide a "Reason" for Election/Contest qualification. Even so, such a reason (e.g. "Initiative met signature threshold") may be worth conveying to election officials, even while not displaying it to voters.

The schema '440-castvote.xsd' specifies an actual vote made in response to a ballot. In the Open Voting Consortium (OVC) design that I presented in an earlier installment, I called these root elements '' and '' to emphasize their connection. In contrast to the OVC (preliminary) design, EML does not create any particular relationship between '' and ''.
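To make the contrast concrete, the OVC-style derivation of a cast ballot from a ballot amounts to subtraction: copy the ballot and, within each contest, drop every selection except the one the voter chose. The Python sketch below uses xml.etree.ElementTree and assumes hypothetical element names ('ballot', 'contest', 'selection', 'cast_ballot') and attributes ('name', 'id'); they stand in for the OVC vocabulary rather than reproduce it, and cast_from_ballot() is likewise an illustrative name.

```python
import copy
import xml.etree.ElementTree as ET


def cast_from_ballot(ballot, choices, cast_tag="cast_ballot"):
    """Derive a cast-ballot tree by subtraction from a ballot tree.

    ballot   -- root Element of a (hypothetical) <ballot> document
    choices  -- dict mapping contest name -> id of the chosen selection;
                a contest absent from the dict keeps no selection at all
    cast_tag -- root tag for the derived document (hypothetical name)
    """
    cast = copy.deepcopy(ballot)  # never mutate the blank ballot itself
    cast.tag = cast_tag
    # Snapshot contests and selections with list() before removing
    # anything, rather than mutating the tree while iterating over it.
    for contest in list(cast.iter("contest")):
        chosen = choices.get(contest.get("name"))
        for selection in list(contest.findall("selection")):
            if selection.get("id") != chosen:
                contest.remove(selection)
    return cast


# A made-up two-contest ballot; element and attribute names are
# illustrative only, not the OVC or EML vocabulary.
BALLOT = """<ballot>
  <contest name="Mayor">
    <selection id="m1">Alice</selection>
    <selection id="m2">Bob</selection>
  </contest>
  <contest name="Measure A">
    <selection id="a-yes">Yes</selection>
    <selection id="a-no">No</selection>
  </contest>
</ballot>"""

if __name__ == "__main__":
    blank = ET.fromstring(BALLOT)
    # Voter picks Bob for Mayor and skips Measure A (an undervote).
    cast = cast_from_ballot(blank, {"Mayor": "m2"})
    print(ET.tostring(cast, encoding="unicode"))
```

The trick works only because the hypothetical cast-ballot format is a strict subset of the ballot format. EML's '410-ballots.xsd' and '440-castvote.xsd' are designed independently, so nothing guarantees that a comparable subtraction would turn one EML document into the other.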
Recall that the OVC design nearly generates a '' simply by removing non-supported selections from a ''. For example, if a '' contains several selections for a '', a '' is just the same XML fragment with all but one selection (candidate) removed.

The independent design of schemas within EML leads to certain pitfalls, in my opinion--albeit minor ones. For example, in '410-ballots.xsd' '' may contain -either- a list of '' elements or list of '