XML MATTERS #38: OASIS Election Markup Language
Standardization of XML Formats for Voting and Elections
David Mertz, Ph.D.
Bean Counter, Gnosis Software, Inc.
September 2004
The Organization for the Advancement of Structured Information
Standards (OASIS) has developed many XML standards in use within
government, law, and business. The Election Markup Language is
OASIS' foray into the world of elections--with special attention to
voting within governmental jurisdictions. In this installment, David
gives readers an introductory look at the structure and purpose of
EML, with an eye toward how the largely European EML will shape
future data standards used in the United States.
XML AND VOTING SYSTEMS
------------------------------------------------------------------------
Readers of my prior _XML Matters_ installment on the use of XML in an
open source voting machine will recognize my motivation for
investigating the OASIS standard for Election Markup Language (EML).
My interest is further piqued by my recent membership in the
still-fledgling IEEE Project 1622 (Voting Systems Electronic Data
Interchange). OASIS' EML covers quite a bit more ground than the Open
Voting Consortium's narrow demo system, or even than is anticipated
for P-1622.
Specifically, EML is intended to be rich enough to:
* accommodate governmental elections across many jurisdiction levels,
  as well as elections within many different kinds of organizations
  (community or corporate, for example);
* allow voting over many channels, both traditional voting booths
  (perhaps electronic ones) and remote systems such as Web pages,
  telephone voting, kiosks, and so on;
* enable many tabulation and voting rules, such as ranked preference
  and cumulative voting;
* handle security, encryption, and authentication requirements; and
* record and convey information about voter registration,
  organization membership, and other voter metadata.
EML has seen significant real-world use in European government, and in
some non-governmental organizations worldwide.
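To make "cumulative voting" concrete, here is a minimal Python sketch of
tallying it. It assumes each ballot is a plain dict mapping candidate
names to points, with a fixed point allotment per voter; the ballot
representation and the names 'tally_cumulative' and 'points_per_voter'
are hypothetical, and this is not EML's own format for cast votes.

```python
#---------- Sketch: tallying cumulative votes (illustrative only) ----------#
from collections import Counter

def tally_cumulative(ballots, points_per_voter):
    """Sum point allocations across ballots.

    Each ballot (a hypothetical dict format) maps candidate -> points.
    Under cumulative voting a voter may spread points across candidates
    or pile them all onto one, but may not exceed the allotment.
    """
    totals = Counter()
    for ballot in ballots:
        if any(p < 0 for p in ballot.values()):
            raise ValueError("negative allocation")
        if sum(ballot.values()) > points_per_voter:
            raise ValueError("ballot exceeds point allotment")
        totals.update(ballot)  # adds each candidate's points to the total
    return totals

ballots = [
    {"Alice": 3},
    {"Alice": 1, "Bob": 2},
    {"Bob": 1, "Carol": 2},
]
print(tally_cumulative(ballots, points_per_voter=3).most_common())
# -> [('Alice', 4), ('Bob', 3), ('Carol', 2)]
```

Ranked-preference rules need more machinery (elimination rounds and
vote transfers), which is the kind of variation a general election
markup has to leave room for.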
EML, in my opinion, suffers somewhat (but not outrageously) from an
over-engineering common among XML technologies (think SOAP, W3C XML
Schemas, or even XSLT). Committees have a tendency to produce
standards with too many details, handling too many corner cases
centrally, and with too many levels of indirection. Of course, having
joined another standards committee, I suppose I will soon be guilty
of contributing to feature creep myself. Nonetheless, our tentative
plan in IEEE P-1622 is to start with a simpler data model provided by
a commercial election system vendor (but released on non-proprietary
terms), rather than to adopt EML wholesale as the basis for
standardizing USAian elections data. Our target in P-1622 is only to
accommodate the needs of governmental elections, not every possible
voting scenario; moreover, the fifty-some US states and territories
have somewhat less procedural variation than do the 45 member nations
of the Council of Europe (for example). Even so, the several other
contributed data models we must reconcile into the final design
already make for a nascent featuritis.
WHAT DOES EML INCLUDE?
------------------------------------------------------------------------
To get a sense of the scope of EML version 3.0, let me quote from the
Executive Summary to the standard:
The primary deliverable of the committee [is] the Election Markup
Language (EML). This is a set of data and message definitions
described as XML schemas. At present EML includes specifications
for:
* Candidate Nomination, Response to Nomination and Approved
Candidate Lists
* Voter Registration information, including eligible voter lists
* Various communications between voters and election officials, such
as polling information, election notices, etc.
* Logical Ballot information (races, contests, candidates, etc.)
* Voter Authentication
* Vote Casting and Vote Confirmation
* Election counts and results
* Audit information pertinent to some of the other defined data and
interfaces
There are a good number of distinct data requirements addressed by the
various aspects of EML. The schemas associated with the logical
aspects of an election process are given numeric prefixes to indicate
general category. The 100 series schemas concern the overall election
specification; the 200 series, candidates; the 300 series, voters
(eligibility, etc.); the 400 series, voting as such; and the 500
series, tabulation (called -canvassing- in American terminology).
Within each schema series, one or more W3C XML Schemas are provided to
describe documents filling those requirements.
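Since the category lives in the leading digit of the file name, the
numbering scheme can be captured in a few lines. This minimal Python
sketch maps a numbered schema file name to its series category as
described above; the 'EML_SERIES' table and 'schema_category()' helper
are hypothetical conveniences, not part of EML.

```python
#---------- Sketch: classifying EML schemas by series prefix ----------#
import re

# Series categories as described in the text; this table is an
# illustrative helper, not something EML itself defines.
EML_SERIES = {
    100: "election specification",
    200: "candidates",
    300: "voters",
    400: "voting",
    500: "tabulation (canvassing)",
}

def schema_category(filename):
    """Return the series category for a numbered schema file, or None."""
    m = re.match(r"(\d)\d\d-", filename)
    if m is None:
        return None  # supporting schemas such as emlcore.xsd are unnumbered
    return EML_SERIES.get(int(m.group(1)) * 100)

for name in ["410-ballots.xsd", "440-castvote.xsd", "emlcore.xsd"]:
    print(name, "->", schema_category(name))
```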
As well as the numbered schema families, EML contains a collection of
supporting schemas, mainly dealing with common datatypes. For
example, most or all of the numbered schemas include 'emlcore.xsd' (in
some cases indirectly, via some other include). Such a schema will
have a line like:
#---------- Include line for EML core datatypes -----------------#
The EML core, in turn, includes 'emlexternals.xsd' and imports
'emltimestamp.xsd' and the W3C's 'xmldsig-core-schema.xsd'. I have not
listed everything incorporated, but this shows the style. The lines
that include or import the mentioned schemas are:
#------------ External resources used by emlcore.xsd --------------#
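In W3C XML Schema, 'xs:include' pulls in components that share the
including schema's target namespace, while 'xs:import' brings in
components from a different namespace; that is why the timestamp and
XML Signature schemas arrive via import. The minimal Python sketch
below lists both kinds of reference in document order. The sample
schema is a hypothetical stand-in for 'emlcore.xsd' that models only
the include/import shape, and its timestamp namespace URI is made up.

```python
#---------- Sketch: listing a schema's includes and imports ----------#
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical stand-in for emlcore.xsd; the emltimestamp namespace URI
# is a placeholder, not the real EML value.
SAMPLE = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="emlexternals.xsd"/>
  <xs:import namespace="urn:example:emltimestamp"
             schemaLocation="emltimestamp.xsd"/>
  <xs:import namespace="http://www.w3.org/2000/09/xmldsig#"
             schemaLocation="xmldsig-core-schema.xsd"/>
</xs:schema>
"""

def schema_dependencies(xsd_text):
    """Return (kind, schemaLocation, namespace) for each include/import."""
    root = ET.fromstring(xsd_text)
    deps = []
    for el in root:  # top-level children, in document order
        if el.tag in (XSD + "include", XSD + "import"):
            kind = el.tag[len(XSD):]
            deps.append((kind, el.get("schemaLocation"), el.get("namespace")))
    return deps

for dep in schema_dependencies(SAMPLE):
    print(dep)
```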
So far, so good; let us look deeper. The schema 'emlexternals.xsd'
only defines formats for addresses and personal details about
voting-eligible citizens. My guess is that the includes are
structured as they are with an eye toward expanding the element and
type definitions within 'emlexternals.xsd' when or if the need arises.
In the main, 'emlexternals.xsd' does its work with yet more includes:
#----- Citizen info datatypes imported to emlexternals.xsd ------#
Of course, once you follow the path still further, into
'AddressTypes-v1.xsd', you find still more external definitions,
brought in not as includes or imports, but via namespaces such as that
of the Dublin Core Metadata Initiative.
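The chain of includes and imports described above can be followed
mechanically. The minimal Python sketch below walks the dependency
tree depth-first, visiting each schema once. It runs against
hypothetical in-memory stand-ins: only the file names come from the
text, while the exact wiring (for instance, '410-ballots.xsd'
including 'emlcore.xsd' directly) and all namespace URIs are
placeholders. Reading real files from disk would replace the
'SCHEMAS' lookup.

```python
#---------- Sketch: following the include/import chain ----------#
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

def stub(*children):
    """Build a minimal schema document containing the given children."""
    return ('<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
            + "".join(children) + "</xs:schema>")

# Hypothetical stand-ins; real EML schemas contain far more, and the
# namespace URIs here are placeholders.
SCHEMAS = {
    "410-ballots.xsd": stub('<xs:include schemaLocation="emlcore.xsd"/>'),
    "emlcore.xsd": stub(
        '<xs:include schemaLocation="emlexternals.xsd"/>',
        '<xs:import namespace="urn:example:ts" schemaLocation="emltimestamp.xsd"/>',
        '<xs:import namespace="http://www.w3.org/2000/09/xmldsig#"'
        ' schemaLocation="xmldsig-core-schema.xsd"/>'),
    "emlexternals.xsd": stub(
        '<xs:import namespace="urn:example:addr" schemaLocation="AddressTypes-v1.xsd"/>'),
    "emltimestamp.xsd": stub(),
    "xmldsig-core-schema.xsd": stub(),
    "AddressTypes-v1.xsd": stub(),
}

def referenced(name):
    """Schema locations directly included or imported by one schema."""
    root = ET.fromstring(SCHEMAS[name])
    return [el.get("schemaLocation") for el in root
            if el.tag in (XSD + "include", XSD + "import")]

def walk(name, depth=0, seen=None):
    """Print the dependency tree depth-first; return all schemas visited."""
    seen = set() if seen is None else seen
    if name in seen:
        return seen
    seen.add(name)
    print("  " * depth + name)
    for child in referenced(name):
        walk(child, depth + 1, seen)
    return seen

walk("410-ballots.xsd")
```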
Sidebar
In a prior review of XML editors, I looked at an early version of
oXygen. I continue to be more and more impressed with this editor
each time a new version comes out, or even when I merely dig around
for new features. Specifically, oXygen 4.2 includes a wonderful tool
to create a friendly HTML documentation page for a W3C XML Schema.
Included with this page is an automatically generated valid XML
instance. Contrast reading a schema directly, such as
http://gnosis.cx/download/eml3/410-ballots.xsd with reading oXygen's
generated documentation, such as
http://gnosis.cx/download/eml3/Ballots.html.
WHAT MAKES UP A BALLOT?
------------------------------------------------------------------------
The schema '410-ballots.xsd' specifies the format for an uncast
ballot. The format is relatively unremarkable, but it is worth
noticing that it includes a number of features to accommodate ballots
in general, not merely governmental elections. For example, I am not
familiar with any governmental elections that provide a "Reason" for
Election/Contest qualification. Still, such a reason (e.g.
"Initiative met signature threshold") may be worth conveying to
election officials without displaying it to voters.
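One way to honor that distinction is to keep the reason in the ballot
data but filter it at display time. The minimal Python sketch below
assumes a simplified, hypothetical ballot fragment: the 'Ballot',
'Contest', 'ContestName', 'Reason', and 'Choice' elements and the
sample measure are invented for illustration and do not follow
'410-ballots.xsd' exactly. A hypothetical 'render()' function shows
the reason only in the officials' view.

```python
#---------- Sketch: hiding a contest "Reason" from voters ----------#
import xml.etree.ElementTree as ET

# Simplified, hypothetical ballot fragment (not the real EML structure).
BALLOT = """\
<Ballot>
  <Contest>
    <ContestName>Measure 12</ContestName>
    <Reason>Initiative met signature threshold</Reason>
    <Choice>Yes</Choice>
    <Choice>No</Choice>
  </Contest>
</Ballot>
"""

def render(ballot_xml, for_officials=False):
    """Return display lines; Reason text appears only for officials."""
    lines = []
    for contest in ET.fromstring(ballot_xml).iter("Contest"):
        lines.append(contest.findtext("ContestName"))
        reason = contest.findtext("Reason")
        if for_officials and reason:
            lines.append("  (reason: %s)" % reason)
        lines.extend("  [ ] " + c.text for c in contest.findall("Choice"))
    return lines

print("\n".join(render(BALLOT)))
print("\n".join(render(BALLOT, for_officials=True)))
```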
The schema '440-castvote.xsd' specifies an actual vote made in
response to a ballot. In the Open Voting Consortium (OVC) design that
I presented in an earlier installment, I called these root elements
'' and '' to emphasize their connection. In
contrast to the OVC (preliminary) design, EML does not create any
particular relationship between '' and ''. Recall
that the OVC design nearly generates a '' simply by
removing non-supported selections from a ''. For example, if a
'' contains several selections for a '',
a '' is just the same XML fragment with all but one
selection (candidate) removed.
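The OVC pruning approach recalled above is easy to express in code.
Below is a minimal Python sketch using 'xml.etree.ElementTree'; the
element names ('ballot', 'contest', 'selection', 'vote'), the 'name'
attribute, and the 'cast()' helper are illustrative assumptions, not
the OVC design's actual vocabulary. It drops every non-chosen
selection from each contest and renames the root.

```python
#---------- Sketch: deriving a cast vote by pruning a ballot ----------#
import xml.etree.ElementTree as ET

# Hypothetical element names, chosen for illustration only.
BALLOT = """\
<ballot>
  <contest name="President">
    <selection>Alice</selection>
    <selection>Bob</selection>
    <selection>Carol</selection>
  </contest>
</ballot>
"""

def cast(ballot_xml, choices):
    """Keep only the chosen selection in each contest.

    `choices` maps contest name -> selected candidate text.  A contest
    with no recorded choice ends up with no selections (an undervote).
    """
    vote = ET.fromstring(ballot_xml)  # fresh tree; the input is untouched
    vote.tag = "vote"
    for contest in vote.findall("contest"):
        wanted = choices.get(contest.get("name"))
        for sel in contest.findall("selection"):  # findall returns a list copy
            if sel.text != wanted:
                contest.remove(sel)
    return vote

print(ET.tostring(cast(BALLOT, {"President": "Bob"}), encoding="unicode"))
```

By contrast, EML's '440-castvote.xsd' defines its own structure
rather than a pruned copy of '410-ballots.xsd', so no such one-step
derivation applies there.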
The independent design of schemas within EML leads to certain
pitfalls, in my opinion--albeit minor ones. For example, in
'410-ballots.xsd' '' may contain -either- a list of
'' elements or list of '