XML Programming Paradigms (part four)

Functional Programming approached to XML processing

David Mertz, Ph.D. <[email protected]>
Gnosis Software, Inc. <http://gnosis.cx/publish/>
January 2002

This series looks at several distinct conceptual models for manipulating XML documents. Functional programming languages, in many ways, provide a stronger fit for XML transformation tasks than do procedural, OOP, or XSLT-style declarative techniques. This article introduces general concepts of functional programming, then illustrates their advantages using the HaXml library for the Haskell language. Resources are given for libraries that exist in other functional languages.

About This Series

As XML has developed into a widely used data format, a number of programming models have arisen for manipulating XML documents. Some of these models--or "paradigms"--have been enshrined as standards, while others remain only informally specified (but equally widely used nonetheless). In a general way, the several models available for manipulating XML documents closely mirror the underlying approaches and techniques that programmers in different traditions bring to the task of working with XML. It is worth noticing that "models" are at a higher level of abstraction than a particular programming language; most of the models discussed in this series are associated with APIs that have been implemented in multiple programming languages.

In part, the richness of available XML programming models simply allows programmers and projects to work in the ways that are most comfortable and familiar to them. In many ways, there is overlap--at least in achievable outcomes--between all the XML programming models. However, different models also carry with them specific pros and cons in the context of XML manipulation; and these might urge the use of particular models for particular projects. This series of five articles aims to provide readers with an overview of the costs, benefits, and motivations for all of the major approaches to programmatic manipulation of XML documents (manipulation here, should be understood also to mean "using XML to drive or communicate other application processes").

The first article, Part 1, discussed the OOP-style Document Object Model (DOM), which is a W3C Recommendation. Part 2 discussed the Simple API for XML (SAX) and similar event-driven and procedural styles of XML programming. Part 3, covered eXtensible Stylesheet Language Transformations (XSLT) The final installment, Part 5, will look briefly at a number of tools and techniques that did not quite fit into the previous discussion, but that readers would do well to be aware of.

This article, Part 4, addresses functional programming (FP) approaches to XML processing. In contrast to the imperative programming style of object-oriented and event-driven paradigms discussed in earlier installments, FP approaches use a declarative programming style. XSLT, discussed in Part 3 of this serious, is also declarative; but the use of higher order functions and generalized computation is difficult and awkward (but not quite impossible) with XSLT. General FP libraries like HaXml allow direct and powerful application of higher-order functions, recursive computation, type-safe data structures, and other FP techniques.

Functional Programming's Conceptual Framework

There are a number of ways in which functional programming languages are strikingly different from more common imperative programming languages. In the concrete, many languages have aspects of each paradigm, but most languages can be easily classified as either functional or imperative. Imperative languages include C, Pascal, Fortran, C++, Java, Cobol, Ada, Perl, TCL, REXX, JavaScript, Visual Basic, and many others (object-orientation is just a variant on an imperative style); functional (and "logical") programming languages include Lisp, Scheme, Erlang, Clean, Mercury, ML, OCaml, and the one we will look at in this article, Haskell. Several "little languages" that are widely used for special purposes are also functional--SQL is one example, but of most interest to us is XSLT.

Functional programming (FP) emphasizes the declaration of functions, in a mathematical sense. The last installment of this series, on XSLT, used the analogy of linear equations to illustrate the declarative style of XSLT. This example applies equally well to more general functional languages, but the example might be extended with the ways one talks in higher algebra, calculus, and the like. That is, in FP as in mathematics, a program consists of a set of named definitions of functions. Every function is stateless , it says that given certain arguments, a certain result occurs; that result is in the nature of the function, it does not depend upon program flow or the state of variables when it is called.

In particular, since functions in FP are stateless, their declarations can occur in any order, not only according to the "program order." Moreover, everything in an FP program is a definition of a function; there are no separate flow-control structures like loops, branches and jumps. Yet another thing one goes without in "pure" functional languages is mutable variables--there are not assignment statements, only definitions. Variables exist, but they are simply place holders for arguments that are referred to in the result definitions, not "buckets" that store and are emptied of values. All of these features might seem strange to those unaccustomed to FP languages; but many of those same readers have been using SQL or XSLT all along without realizing they operated under the same constraints.

The power of FP comes from several places. It turns out that all flow-control structures can be expressed in terms of recursive definitions, so nothing is lost there (except a few habits of thought). But the real strength comes from two main places: 1) the elimination of that large majority of programming errors that result from misunderstanding of program state at a given point; 2) the use of higher-order functions. The latter takes some particular explanation. In FP's, it is very common to define functions that take (collections of) other functions as arguments and/or return functions as results. Rather than merely play with ordinary values like strings, integers, floats, etc, FP's manufacture their own functions to produce complex result mappings.

Some functional programming languages are also "lazy." What this means is that every data structure (including a function definition itself) is evaluated only when, and as much as needed. In many cases, laziness saves work for the CPU running a program; but this savings can be extended as far as defining true infinite data structures (for example, a list of all the prime numbers). Infinite memory is not required, because only those elements actually utilized are concretely computed at runtime. Haskell is a lazy language, as well as a pure functional language.

For further introduction to functional programming, look at the resources at bottom.

HaXml at Work

Let me describe a quite realistic scenario, that shows weaknesses in the techniques described in the earlier installments. XSLT is typically the most direct way previously examined to describe a transformation of an XML document into an output. For example, we might want to create an HTML representation of an XML document. Let us say we have an XML version of the I Ching that looks something like:

XML Version of the I Ching

<?xml version="1.0"?>
<IChing>
    <title>Some Hexagrams from the I Ching</title>
    <hexagram>
        <number>1</number>
        <name>Ch'ien / The Creative</name>
        <judgement>
            The Creative works sublime success,
            Furthering through perseverance.
        </judgement>
    </hexagram>
    <hexagram>
        <number>2</number>
        <name>K'un / The Receptive</name>
        <judgement>
            The Receptive brings about sublime success,
            Furthering through the perseverance of a mare.
        </judgement>
    </hexagram>
    <hexagram>
        <number>3</number>
        <name>Chun / Difficulty at the Beginning</name>
        <judgement>
            Difficulty at the Beginning works supreme success,
            Furthering through perseverance.
        </judgement>
    </hexagram>
</IChing>

XSLT and its shortcomings

To present this information in an HTML table, we might use something like the below XSLT instructions:

XSLT Instructions for an I Ching HTML Table

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/TR/xhtml1/strict">
    <xsl:output method="html" indent="yes" encoding="UTF-8"/>
    <xsl:template match="IChing">
      <html>
        <head><title><xsl:value-of select="title"/></title></head>
        <body><table border="1"><xsl:apply-templates/></table></body>
      </html>
    </xsl:template>
    <xsl:template match="hexagram">
        <tr><xsl:apply-templates/></tr>
    </xsl:template>
    <xsl:template match="number">
        <td><xsl:apply-templates/></td>
    </xsl:template>
    <xsl:template match="name">
        <td><xsl:apply-templates/></td>
    </xsl:template>
    <xsl:template match="judgement">
        <td><xsl:apply-templates/></td>
    </xsl:template>
    <xsl:template match="*"></xsl:template>
</xsl:stylesheet>

This XSLT seems simple and direct: we just create a template to describe how we would like each element formatted. What could be easier? The problem comes as soon as we want to filter or compute something for the output--something that is not included in the few comparisons available to XSLT. For example, maybe we want (in a numerological spirit) to display only the even numbered hexagrams, or only the prime ones. With XSLT, we are out of luck for something this simple.

At this point, we might turn to DOM or SAX for the task. This will work, but we first have to simply throw away the work that went into earlier XSLT transformations. DOM or SAX are completely different models, and share no significant code or concepts with XSLT. For something as simple as my toy stylesheet, it hardly matters; but for large production-quality transformations, we might lose a lot.

Moreover, neither SAX nor DOM have particularly elegant or maintainable models for output. In order to output the simple (filtered) HTML table described, we just have to litter our application code with print statements or printf() functions (or whatever our general language uses). These output statements are themselves buried in conditional blocks that test for element types and other conditions. In such code, there is no way to tell at a glance how the output flows or make sure that a print "</tr>" is reached to correspond with every print "<tr>" that occurred earlier. There is certainly no guarantee given by imperative or OOP code that our output HTML or XML will be well-formed (let alone valid).

A more uniform solution

What would be ideal for XML transformations would be a system that both let us express output declaratively (as XSLT does) and lets us include arbitrary filters and computations (as the implementation languages underlying DOM and SAX do). While we are at it, it couldn't hurt to be guaranteed the well-formedness, or even validity, of our output. And a compact and direct syntax would be nice too.

HaXml gives us everything requested in the previous paragraph. Actually, the power of HaXml is more general than was even asked for. Taking advantage of the higher order combinatorial functions that a functional programming language like Haskell allows, we can arbitrarily combine multiple filters in specifying output. In XSLT, each <xsl:template> acts as a sort of filter between the input and output. But the only real combination of filters created by an XSL file is a union on all the filters. In contrast, HaXml lets us specify much more fine grained chains of filters for each element of the output. Actually, much of the same can be achieved using XPATH with XSLT, but HaXml is much more concise, and strictly more powerful.

As well providing numerous combinators, HaXml allows the inclusion of arbitrary computations in the Haskell language as part of the filtering. An extra bonus is that output can be specified in a much more coherent block that allows readable intermixing of output terms with filtering conditions. The result is much more concise than XSLT, and has fewer punctuation symbols to make visual scanning difficult:

HaXml program to output an I Ching HTML Table

module Main where
import XmlLib
-- Concise XSLT-like specification of output
main = processXmlWith (hexagrams `o` tag "IChing")
hexagrams =
    html [
      hhead [htitle [keep /> tag "title" /> txt] ],
      hbody [htableBorder [rows `o` children `with` tag "hexagram"] ]
    ]
htableBorder = mkElemAttr "TABLE" [("BORDER",("1"!))]
rows f =
    let
      num = keep /> tag "number" /> txt
      nam = keep /> tag "name" /> txt
      jdg = keep /> tag "judgement" /> txt
    in
      if (condition (num f) (nam f) (jdg f))
      then hrow [hcol [num], hcol [nam], hcol [jdg]] f
      else []
condition num nam jdg = isPrime (makeInt num)

-- Supporting computations for rows condition
makeInt = toInteger . read . unwrap -- Turn [Content] into an Integer
unwrap [(CString b c)] = c          -- Turn [Content] into a String
isPrime = ordSearch (sieve [2..])   -- ordered search of Sieve of Eratosthenes
    where
    ordSearch (x:xs) n
        | x < n     = ordSearch xs n
        | x == n    = True
        | otherwise = False
    sieve (x:xs) = x : sieve [y | y <- xs, y `mod` x > 0]

As with XSLT, definitions may occur in any order. The first twenty lines of the code specify the output format, with some definitions broken out into supporting functions, simply for readability. In Haskell syntax, a function is whatever thing occurs to the left of an equal sign, and the definition is to the right of the equal sign. The where and let clauses specify what we might call "inner functions" in other languages. The first lines are conceptually very much like the XSLT approach. But as an improvement, the HaXml version lets us define ad hoc filters at each point where we need it. For example, one filter is:

rows `o` children `with` tag "hexagram"

The `o` is somewhat fancifully called "Irish Composition", and can also be pronounced "of". rows is our very own filter that we have defined out of convenience; we are just as free to reuse rows as to use any of the standard filters, and in unlimited combination.

The latter half of the program contains a few computational functions. One nice example is a six line primality test that does a lot to illustrate the power and elegance of Haskell as a language. But as the program is structured, the function condition could equally well perform any tests it wanted on the Content of the <number> , <name> or <judgement> elements. "Content" is a special data type, so the first thing we need to do is unwrap the String contained in a Content. After that, we can convert it to an Integer, and test it (or do whatever else we want with it).

There are a lot more details to the HaXml library--and learning Haskell itself requires a learning curve for programmers accustomed to imperative and OOP styles of programming. But if one limits one's interest to just the capabilities one would find in XSLT, the top half of the example program is quite easy to expand upon (I would argue, with less of a learning curve than XSLT, while using a similar functional style). Not only is the syntax more readable than XSLT, but one holds in reserve learning how to do the sort of thing in the bottom half of the program (and much more).

Enforcing Validity

In the above example, an XML document was treated as a generic tree structure. For most purposes, doing this is the easiest and quickest approach. But HaXml provides something else that is rare. If one has a DTD for a document type one plans to work with, a set of native Haskell data structures can be generated from a DTD. From that point forward, applications can be written that utilize the native DTD in their manipulations. The generation mentioned involves several steps. The first thing is to create the data structure as a module, something like:

Create a HaXml data structure from a DTD

% DtdToHaskell MyFile.DTD MyFileDTD.hs

Once that is available, one can write an application that handles XML documents that must conform to the DTD. Such applications will generally contain at least the following lines:

Custom HaXml app for MyFile.DTD XML documents

import Xml2Haskell (readXml)
import MyFileDTD
[...]

From there, one can use all the higher order techniques Haskell provides for dealing with recursive data structures. The first thing, naturally, will probably be to readXml in order to work with a particular input XML document.

A reader can be forgiven for thinking at this point, "so what?" It would appear that this is no different from a DOM approach--or even SAX--where one can perfectly well work with structured data, or even validate against a DTD.

There is much more here than meets the eye. Haskell, unlike almost all other programming languages, is thoroughly type safe about data structures. It is quite simply impossible in Haskell to perform any computation on the generated data structure that would result in an invalid XML document. In contrast, the best one can do in languages like C/C++, Java, Perl, Smalltalk, Python is do a sanity check (validation) on the way in, and another one on the way out, and hope everything goes well. It might be possible in something like Eiffel to add enough contractual constraints on every "adder" and "deleter" to make sure DTD validity is maintained (or, with enough work, in all the mentioned languages), but doing so involves custom programming within every .addSpamTag() method. Moreover, all the DTD can do in the mentioned languages is provide a "cheat sheet" for an application programmer to look at.

With HaXml, the data structure generated programmatically from a DTD automatically includes every validity constraint. Mind you, just enforcing the constraints doesn't make an application programmer write correct code; but at least any bad code that would result in invalid documents is caught at compile time. The other caveat, of course, is that programming a custom application that enforces validity is just plain going to be more work than programming one that does not need to do this. But for "mission critical" requirements, HaXml could well provide the quickest and safest path to the rigorous goal.

The Good and the Bad of FP for XML

For the most part, I think that functional programming offers many benefits and few disadvantages for XML manipulation. Of course, the first disadvantage one must acknowledge is that most programmers already know imperative languages, while learning FP languages is extra work. But I do not think that consideration should be weighed too heavily for long-term projects that will be maintained and modified over time. Allowing better code is worth the extra learning curve for individual programmers (who can be more efficient after the learning stage).

One real disadvantage to understand with HaXml is that it is basically DOM-like in memory usage. Even though Haskell allows lazy IO, HaXml does not take advantage of it, but rather--like DOM--reads an entire XML document into a memory image. However, the HXML library mentioned in the Resources is intended as a superset of HaXml functionality, and adds the SAX-like lazy data structure XMLEvent and functions parseInstance and buildTree (the latter function can be replaced with a more event-driven style). One loses some of the elegance of Haskell data structures by moving to an event-driven style; but FP is not inherently contradictory to memory-frugal techniques (laziness encourages it in some ways).

The final disadvantage that I think is worth mentioning is that each FP XML library is just that, but nothing more. That is, SAX and DOM are general API's that have implementation is in many programming languages. These API's are standards, not merely someone's good idea about what to put in a library. As a consequence, each of the libraries pointed to in the Resources will have different conventions and function names. They will share many of the same advantages of HaXml that were addressed in this article, but the particular API of using them will have little in common. Moreover, this situation is extremely unlikely to change with time; functional programming languages have arisen, largely in academia, each to express a special collection of features and techniques. A library in one functional language will not make much sense in another FP language, even at a conceptual level (for example, if you assume laziness, a non-lazy/strict implementation might be extremely inefficient; or vice-versa). All the OOP languages, in contrast, are much closer to one another.

Resources

The examples, and much of the text, in this article is adapted from an XML Matters column on IBM developerWorks. The earlier article can be found at:

http://www-106.ibm.com/developerworks/library/x-matters14.html

Bijan Parsia has a very interesting related essay called, Functional Programming and XML at XML.com. Parsia makes the argument that functional programming styles are generally better suited to XML manipulation than are more familiar OOP techniques. He discusses HaXml and several other tools:

http://www.xml.com/lpt/a/2001/02/14/functional.html

XML Support in Erlang:

http://sowap.sourceforge.net/xml.html

The SSAX parser for a number of Scheme dialects can be found at:

http://ssax.sourceforge.net/

XDuce: A Typed XML Processing Language

http://xduce.sourceforge.net/

HXML is a Haskell XML parser that is largely compatible with--but an enhancement of--the older HaXml library. The main improvement by HXML is an improvement in the space behavior compared to HaXml. For very large XML documents, HXML is likely to be practical where HaXml is not:

http://www.flightlab.com/~joe/hxml/

A detailed discussion of HaXml was written by its original authors, Malcolm Wallace and Colin Runciman. Haskell and XML: Generic Combinators or Type-Based Translation may be found at the below URL. Its tone and level presume a greater familiarity with Haskell and functional programming than is requisite for this column, but Wallace and Runciman's paper thereby contains many details not addressed herein:

http://www.cs.york.ac.uk/fp/HaXml/icfp99.html

Information about Haskell, including several tutorials, numerous papers, and various compilers and interpreters can be found at the Haskell language website:

http://www.haskell.org/

I have myself written an introductory tutorial to Haskell, which is aimed at beginners:

http://gnosis.cx/publish/programming/Haskell.pdf

The files mentioned in this article may be found in the below archive:

http://gnosis.cx/download/xml_matters_14.zip

Some sample outputs of this transformations discussed in the article can be found at:

http://gnosis.cx/download/iching.html

The HaXml version at (whitespace and layout are not identical):

http://gnosis.cx/download/iching_hs.html

It turns out that from a computer-science perspective, XSLT is a "complete" programming language. Dimitre Novatchev ([email protected]) has written an as-yet-unpublished article that I recently reviewed which shows that some fully general functional programming techniques can be implemented in XSLT using a "hack" on namespaces in XSLT.