David Mertz, Ph.D.
Attributor, Gnosis Software, Inc.
November 2001
This article discusses XML-RPC as a way of modelling object
data. General background on what XML-RPC does is given. We
compare, specifically, XML-RPC as a means of representing
Python objects--and objects in other languages--with the
xml_pickle
module discussed in earlier columns.
XML-RPC is a remote function invocation protocol that has the
great virtue of being worse than all its competitors. Compared
to Java RMI, or CORBA, or COM, XML-RPC is impoverished in the
data it can transmit and obese in message size. XML-RPC abuses
the HTTP protocol to circumvent firewalls that exist for good
reasons, and as a consequence transmits messages lacking
statefullness and incurs a channel bottleneck. Compared to
SOAP, XML-RPC lacks both important security mechanisms and a
robust object model. Just as a data representation, XML-RPC is
slow, cumbersome, and incomplete compared to native programming
language mechanisms like Java's serialize
, Python's pickle
,
Perl's Data::Dumper
, or similar modules for Ruby, Lisp, PHP,
and many other languages.
In other words, XML-RPC is the perfect embodiment Richard Gabriel's motto "worse is better." I can hardly write more glowingly about XML-RPC than the prior paragraph, and I think the protocol is a perfect match for a huge variety of tasks. To understand why, I think it is worth quoting Gabriel characterizing the "worse-is-better philosophy":
* Simplicity: the design must be simple, both in implementation and interface. It is more important for the implementation to be simple than the interface. Simplicity is the most important consideration in a design.
* Correctness: the design must be correct in all observable aspects. It is slightly better to be simple than correct.
* Consistency: the design must not be overly inconsistent. Consistency can be sacrificed for simplicity in some cases, but it is better to drop those parts of the design that deal with less common circumstances than to introduce either implementational complexity or inconsistency.
* Completeness: the design must cover as many important situations as is practical. All reasonably expected cases should be covered. Completeness can be sacrificed in favor of any other quality. In fact, completeness must sacrificed whenever implementation simplicity is jeopardized. Consistency can be sacrificed to achieve completeness if simplicity is retained; especially worthless is consistency of interface.
Writing years before the specific technology existed, Gabriel identifies the virtues of XML-RPC perfectly.
I have written a moderately popular module for Python called
xml_pickle
. The purpose of that module (discussed previously
in this column) is to serialize Python objects, using an
interface mostly the same as the standard cPickle
and
pickle
modules. The only difference is that the
representation is in XML. My intention all along with that
module was to create a very lightweight format that could also
be read from other programming languages (and across Python
versions). A DTD accompanies the module for users wanting to
validate "XML pickles", but feedback from users suggest that
formal validation is rarely bothered with.
A recurrent question I have received from users of xml_pickle
is whether XML-RPC would be a better choice, given its more
widespread use and existing implementations in many programming
languages. While the answer to the narrow question probably
favors xml_pickle
, the comparison is worthwhile--and it
raises some points about data-type richness.
On the face of things, XML-RPC would seem to do something
different than xml_pickle
. XML-RPC calls remote procedures,
and gets results back. This below typical usage example
appears at the XML-RPC website and in the Programming Web
Services with XML-RPC book:
>>> import xmlrpclib >>> betty = xmlrpclib.Server("http://betty.userland.com") >>> print betty.examples.getStateName(41) South Dakota
By contrast, xml_pickle
creates string representations of
local in-memory objects. Those do not seem the same. But in
fact, most of what XML-RPC needs to do to call a remote
procedure is first to convert its arguments to suitable XML
representations--in other words, pickle/serialize the
parameters. A return value from an XML-RPC call can similarly
contain a nested data structure. Moreover the .dumps()
method of the xmlrpclib
is both named identically to an
xml_pickle
module (both inspired by several standard
modules), and does the same thing--write the XML serialization
without peforming any actual call.
At a first level of examination, xml_pickle
and xmlrpclib
are functionally interchangeable--at least if one only cares
about the serialization aspect. As we will see, a closer look
finds some differences.
Let us create an object, then serialize it by two means. Some contrasts will come to the fore:
>>> import xmlrpclib >>> class C: pass ... >>> c = C() >>> c.bool, c.int, c.tup = (xmlrpclib.True, 37, (11.2, 'spam') ) >>> print xmlrpclib.dumps((c,),'PyObject') <?xml version='1.0'?> <methodCall> <methodName>PyObject</methodName> <params> <param> <value><struct> <member> <name>tup</name> <value><array><data> <value><double>11.2</double></value> <value><string>spam</string></value> </data></array></value> </member> <member> <name>bool</name> <value><boolean>1</boolean></value> </member> <member> <name>int</name> <value><int>37</int></value> </member> </struct></value> </param> </params> </methodCall>
A few things should be noted already. First is that the whole
XML document has a root <methodCall>
element which is
irrelevant for our current purposes. Other than a few bytes
extra, however, it makes no difference. Likewise, the
<methodName>
is superfluous, but the example gives a name
that indicates the role of the document. Moreover, a call to
xmlrpclib.dumps()
accepts a tuple of objects, but we are only
interested in "pickling" one--if there were others, they would
have their own <param>
element. But other than some
wrapping, the attributes of our object are well contained in
the <struct>
element's <member>
elements.
Now let us look at what xml_pickle
does (the object is the
same as above):
>>> from xml_pickle import XML_Pickler >>> print XML_Pickler(c).dumps() <?xml version="1.0"?> <!DOCTYPE PyObject SYSTEM "PyObjects.dtd"> <PyObject class="C" id="1840428"> <attr name="bool" type="PyObject" class="Boolean" id="1320396"> <attr name="value" type="numeric" value="1" /> </attr> <attr name="int" type="numeric" value="37" /> <attr name="tup" type="tuple" id="1130924"> <item type="numeric" value="11.199999999999999" /> <item type="string" value="spam" /> </attr> </PyObject>
There is both less and more to the xml_pickle
version (the
actual size is similar between the two). Notice that even
though Python does not have a builtin boolean type, when a
class is used to represent a new type, xml_pickle
adjusts
with aplomb (albeit more verbosely). XML-RPC, by contrast, is
limited to serializing its 8 data types, and nothing else. Of
course, two of those types--'<array>' and <struct>
--are
themselves collections, and can be compound. There is also the
little matter that xml_pickle
can point multiple collection
members to the same underlying object; but that is absent by
design from XML-RPC (and introduced in later versions of
xml_pickle
also). As a small matter, xml_pickle
contains
only a single numeric
type attribute, but the actual pattern
of the value
attribute allows decoding to integer, float,
complex, etc. No real generality is lost or gained by these
strategies, although the XML-RPC style will appeal
aesthetically to programmers in statically-typed languages.
The problem with XML-RPC as an object serialization format is that it just plain does not have enough types to handle the objects in most high-level programming languages. The most obvious demonstration of this fact is below:
>>> c = C() >>> c.foo = 'bar' >>> d = {'foo':'bar'} >>> print xmlrpclib.dumps((c,d),'PyObjects') <?xml version='1.0'?> <methodCall> <methodName>PyObjects</methodName> <params> <param> <value><struct> <member> <name>foo</name> <value><string>bar</string></value> </member> </struct></value> </param> <param> <value><struct> <member> <name>foo</name> <value><string>bar</string></value> </member> </struct></value> </param> </params> </methodCall>
Two things were serialized--an object instance and a
dictionary. While it is fair enough to say that Python objects
are particularly "dictionary-like", there is a lot of
information lost by representing a dictionary and an object in
exactly the same way. Moreover, the excessively generic
meaning for <struct>
in XML-RPC affects pretty much any OOP
language, or at least any language that has native
hash/dictionary constructs; it is not a Python quirk here. On
the other hand, failing to distinguish Python tuples and lists
within the <array>
type of XML-RPC is a fairly
Python-specific limitation.
xml_pickle
handles all the Python types much better
(including data types defined by user classes, as we saw).
There is actually no direct pickling of dictionaries in
xml_pickle
, basically because no one has asked for that (it
would be easy to add). But dictionaries that are object
attributes get pickled:
>>> c, c2 = C(), C() >>> c2.foo = 'bar' >>> d = {'foo':'bar'} >>> c.c, c.d = c2, d >>> print XML_Pickler(c).dumps() <?xml version="1.0"?> <!DOCTYPE PyObject SYSTEM "PyObjects.dtd"> <PyObject class="C" id="1917836"> <attr name="c" type="PyObject" class="C" id="1981484"> <attr name="foo" type="string" value="bar" /> </attr> <attr name="d" type="dict" id="1917900"> <entry> <key type="string" value="foo" /> <val type="string" value="bar" /> </entry> </attr> </PyObject>
Another virtue of the xml_pickle
approach that is implied in
the example is that dictionary keys need not be strings, as
they must be in XML-RPC <struct>
elements. However, Perl,
PHP and most langauges are closer to the XML-RPC model in this.
xml_pickle
Unfortunately, xml_pickle
lacks some types that many
programming languages have. If our goal is not simply to save
and restore Python objects, but to exchange objects across
languages, xml_pickle
is not currently quite adequate. The
issue of floats and integers is not really important in
principle; but designing an "unpickler" for, say, Java would be
made easier by having the XML parser be able to determine the
type needed, rather than defer that until the format of the
value
attribute is analyzed.
Of more serious concern for cross-language pickling are the
<boolean>
and <dateTime.iso8601>
tags that XML-RPC has, but
Python lacks as a built-in type. Even though I claimed that
xml_pickle
handled user classes that define custom data
types easily and well, that is not quite as true when it comes
to the cross-language case. For example, the below fragment of
an xml_pickle
representation describes an iso8601 Date/Time:
<attr name="dte" type="PyObject" class="DateTime" id="1984076"> <attr name="value" type="string" value="20011122T17:28:55" /> </attr>
There are two issues that make it difficult to utilize this
data in, say, Perl or REBOL or PHP. One matter is
the namespace the restored class. In Python, the namespace of
the restored xmlrpclib.DateTime
becomes, by default,
xml_pickle.DateTime
(but the namespaces can be manually
manipulated prior to unpickling). The way Python's
instantiation and namespaces work, little rests on this
fact--or at least not if our interest is in the instance
attributes rather than its methods. But various languages
handle scoping matters in very different ways.
More important, by far, is the fact that this custom class
cannot be easily recognized as a native type in languages where
it is one. Perl and PHP do not have a native DateTime type
anyway, so nothing is really lost as long as unpicklers in
those languages restore the value
instance attribute. But
something like REBOL has many more native data types. Not just
dates, but also exotic types like email addresses and URLs.
Those are lost in the xml_pickle
process. Of course, XML-RPC
also loses those data types. Either way, we are left with
plain string type to represent something more specific (or
<base64>
in XML-RPC, which xml_pickle
handles by escaping
highbit values--e.g. "\xff").
Neither XML-RPC nor xml_pickle
are entirely satisfactory as
means of representing the object instances of popular
programming languages. But both of them come pretty close.
Let me some suggest some approaches to patching up the short
gap between these protocols and a general object serialization
format.
"Fixing" xml_pickle
is actually amazingly simple. Just add
more types to the format. For example, since xml_pickle
was
first developed, the UnicodeType
has been added to Python.
Adding complete support for it took exactly four lines of new
code (although that was simplified slightly by the fact XML is
natively Unicode). Or again, at the requirement of users, the
numeric
module's ArrayType
was added with little more work.
Even if a type is not present in Python, a custom class could
be defined within xml_pickle
to add the behavior of that
type--e.g. maybe REBOL's "email address" type will be supported
with a fragment like:
<attr name="my_address" type="email" value="[email protected]" />
Once unpickled, xml_pickle
could either just treat "email" as
a synonym for "string", or we could implement an EmailAddress
class with some useful behaviors. One such behavior, if we
took the latter route would be pickling into the above
xml_pickle
fragment.
"Fixing" XML-RPC is more difficult. It would be easy to
suggest just adding a bunch of new data types. And purely
technically there would be no particular problem with this.
But as a social matter, XML-RPC's success makes it difficult to
introduce incompatible changes. A hypothetical "data-enhanced"
XML-RPC would not play nice with all the existing
implementations and installations. Actually, some implementors
have felt sufficiently bothered by the lack of a "nil" type
that they have added a non-standard (or at best semi-standard)
type to correspond to Java null
, Python None
, Perl undef
,
SQL NONE
and the like. But a bunch more types that only some
programming languages use is not going to fly.
One approach to enhancing XML-RPC as an object serializer is to
coopt the <struct>
element to do double-duty. Everything
that is incompletely typed by standard XML-RPC could be wrapped
in a <struct>
with single <member>
, where the <name>
indicates the special type. While existing XML-RPC libraries
do not do this, the XML-RPC protocol and DTD are so simple that
adding this behavior is fairly trivial (but requires modifying
the libraries, not just wrapping them, in most cases).
For example, XML-RPC cannot naively describe the difference between Python lists and tuples. So the below fragment is incomplete as a description of a Python object:
<array> <data> <value> <double>11.2</double> </value> <value> <string>spam</string> </value> </data> </array>
One could substitue the following representation, which is valid XML-RPC, and a suitable implementation could restore to a specific Python object:
<struct> <member> <name>NEWTYPE:tuple</name> <value> <array> <data> <value> <double>11.2</double> </value> <value> <string>spam</string> </value> </data> </array> </value> </member> </struct>
Representing a true <struct>
could happen two (or more)
ways. (1) every <struct>
could be wrapped in another
<struct>
(maybe with the <name>
"OLDTYPE:struct" or the
like). For Python, this is probably best anyway, since
dictionaries and object instances are both "NEWTYPE"s. (2)
The namespace-like prefix "NEWTYPE:" could be reserved for
this special usage (accidental collision seems unlikely).
Userland's XML-RPC homepage is, naturally, the place to start investigating XML-RPC. Many useful resources can be found there:
http://xmlrpc.com/
While at the XML-RPC homepage, it is particularly worthwhile to investigate the tutorials and articles they provide links for:
http://www.xmlrpc.com/directory/1568/tutorialspress
Kate Rhodes has written a nice comparison called "XML-RPC vs. SOAP." In it, she points to a number of details that belie SOAP's description as a "lightweight" protocol:
http://weblog.masukomi.org/writings/xml-rpc_vs_soap.htm
Richard P. Gabriel's rather famous paper "Lisp: Good News, Bad
News, How to Win Big" can be found in full at the below URL.
What everyone reads and refers to is the section called "The
Rise of Worse is Better
":
http://www.ai.mit.edu/docs/articles//good-news/good-news.html
The O'Reilly title Programming Web Services with XML-RPC, by Simon St. Laurent, Joe Johnston & Edd Dumbill, is quite excellent. It's spirit matches that of XML-RPC itself.
xml_pickle
can be found at:
http://gnosis.cx/download/xml_pickle.py
The associated DTD lives at:
http://gnosis.cx/download/PyObjects.dtd
Secret Lab's xmlrpc
Python module can be found at:
http://www.pythonware.com/products/xmlrpc/index.htm
David Mertz puts the "lite" in "lightweight". David may be reached at [email protected]; his life pored over at http://gnosis.cx/publish/. Suggestions and recommendations on this, past, or future, columns are welcomed.