XML ZONE TIP: Make your CGI scripts available via XML-RPC
Providing a programmatic interface to web services
David Mertz, Ph.D.
Interfacer, Gnosis Software, Inc.
April, 2003
Many CGI scripts are, at heart, a form of remote procedure
call. For this large class of CGI scripts, it is both easy and
useful to provide an alternate XML-RPC interface to the same
calculation or lookup. If you do this, other developers can
quickly use the information you provide within their own
larger applications.
INTRODUCTION
------------------------------------------------------------------------
Many CGI scripts are, at their heart, just a form of remote
procedure call. A user specifies some information--perhaps in
an HTML form--and your web server returns a formatted page that
contains an answer to their inquiry. The data on such a return
page is surrounded in some prettifying HTML markup, but
basically it is the data that is of interest. Examples of
data-oriented CGI interfaces are search engines, stock price
checks, weather reports, personal information lookup, catalog
inventory, and so on.
A web browser is a fine interface for human eyeballs, but a returned
HTML page is far from an optimal form for integration within
custom applications. What programmers often do to utilize the
data that comes from CGI queries is "screen-scraping" of
returned pages--that is, look for identifiable markup and
contents, and pull data elements from the texts. But
screen-scraping is error-prone: page layout may change over
time and/or be dependent on the specific results. A more
formal API is better for programmatic access to your CGI
functionality.
XML-RPC is intended for exactly this: application access to
queryable results over an HTTP channel. Its big sibling SOAP
can do a similar job, but the XML format of SOAP is more
complicated than is needed for most purposes. An ideal system
is one where people can make queries in a web browser,
while custom applications can make the same queries via
XML-RPC. The underlying server can do almost exactly the same
thing in either case.
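To make the comparison with SOAP concrete, the sketch below
prints the XML that an XML-RPC call puts on the wire. It
assumes Python 3, where the standard library module is named
[xmlrpc.client]; the method name 'anonym' and its two string
arguments are borrowed from the example later in this tip.

```python
#--------------- xmlrpc-wire-format-sketch.py ------------#
from xmlrpc.client import dumps, loads

# Marshal a call to a method named 'anonym' with two string arguments.
request_xml = dumps(('perm', 'mertz@gnosis.cx'), methodname='anonym')
print(request_xml)

# Unmarshalling recovers the argument tuple and the method name.
params, method = loads(request_xml)
print(method, params)
```

The whole request is a short <methodCall> document holding a
method name and a list of typed parameters; no network
connection is needed to inspect it.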
AN EXAMPLE
------------------------------------------------------------------------
I have created a service within my website that enables users
to send email to anonymized recipients. You can read about the
goals and architecture of Gnosis-Anon at its home page (see
Resources). At the same URL, you can enter a query into
an HTML form, and in return be presented with an HTML page
informing you of an anonym. From there, you need to use a
pencil to write the information down, or perhaps use
cut-and-paste into something besides your web browser.
Suppose you wanted to utilize the anonym automatically in an
application--e.g. a Mail User Agent (MUA) or Mail Transport
Agent (MTA). You might do some screen-scraping like:
#--------------- get-anonym-cgi.py ----------------------#
#!/usr/bin/env python
from urllib import urlencode, urlopen
from sys import argv
base_url = 'http://gnosis.cx/cgi-bin/encode_address.cgi'
query = urlencode({'duration':argv[1], 'email':argv[2]})
html_answer = urlopen(base_url+'?'+query).readlines()
result = "NO ANONYM FOUND!"
for line in html_answer:
    if line.find("Anonym:") >= 0:
        # Tag names are an assumption: the anonym is taken to sit
        # between a four-character opening tag such as <tt> and
        # its matching closing tag.
        start = line.find('<tt>')+4
        end = line.find('</tt>')
        result = line[start:end]
        break
print result
You can run this with a command line like:
#--------------- Running get-anonym-cgi -----------------#
% get-anonym-cgi.py perm mertz@gnosis.cx
.rNCOolqsVQYu@gnosis.cx
This works if I do not change the format of the HTML. That is
a big if. A more robust (and simpler) client application might
look like:
#--------------- get-anonym-xmlrpc.py -------------------#
#!/usr/bin/env python
import sys, xmlrpclib
server = xmlrpclib.Server("http://gnosis.cx:8000")
print server.anonym(sys.argv[1], sys.argv[2])
This XML-RPC application behaves exactly the same as the
CGI-based one.
SETTING UP THE XML-RPC SERVER
------------------------------------------------------------------------
There is not much difference between writing an XML-RPC server
and writing a CGI script. The actual calculation or lookup
code is identical; only the format of the output changes, and
the CGI version needs a little extra work to parse its inputs.
My CGI script looks something like:
#--------------- encode_address.py ----------------------#
#!/usr/bin/env python
import cgi
query = cgi.FieldStorage()
email = query.getvalue('email','test@test.lan')
duration = query.getvalue('duration', 'Unknown')
anonym = FIND_THE_ANONYM(duration, email)
html_template = open('template').read()
html = html_template % (email, anonym, duration)
print "Content-Type: text/html"
print
print html
This leaves out the details of how 'FIND_THE_ANONYM()' works,
and what the HTML template looks like; but those details are
unimportant here. The XML-RPC server wraps the same function:
#------------ anonym-xmlrpc-server.py -------------------#
from SimpleXMLRPCServer import SimpleXMLRPCServer
class Anonym:
    def anonym(self, duration, email):
        return FIND_THE_ANONYM(duration, email)
    def container_test(self):
        return {'spam':'eggs', 'bacon':'toast'}
server = SimpleXMLRPCServer(('', 8000))
server.register_instance(Anonym())
server.serve_forever()
As you see, the self-same lookup function is used; its return
value is what a remote call to the '.anonym()' method
receives. On the wire, return values are encoded as XML-RPC, but
Python's [xmlrpclib] module, like analogous libraries in other
languages, automatically translates XML-RPC encoded structures
back into native data structures. The method
'.container_test()' above can be called remotely as well; what
the client sees is a Python dictionary.
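The round trip described above can be tried on a single
machine. The following is a minimal sketch, not the code that
runs at gnosis.cx: it assumes Python 3, where the modules are
named [xmlrpc.server] and [xmlrpc.client], and it substitutes
a hypothetical 'find_the_anonym()' stub for the real lookup,
whose details are not given here.

```python
#--------------- anonym-roundtrip-sketch.py --------------#
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def find_the_anonym(duration, email):
    # Hypothetical stand-in; the real lookup is not shown in this tip.
    return 'anonym-for-%s-%s' % (duration, email)

class Anonym:
    def anonym(self, duration, email):
        return find_the_anonym(duration, email)
    def container_test(self):
        return {'spam': 'eggs', 'bacon': 'toast'}

# Port 0 asks the OS for any free port; logRequests=False keeps it quiet.
server = SimpleXMLRPCServer(('127.0.0.1', 0), logRequests=False)
server.register_instance(Anonym())
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client sees ordinary Python values, not XML.
client = ServerProxy('http://127.0.0.1:%d' % port)
anonym = client.anonym('perm', 'mertz@gnosis.cx')
container = client.container_test()
print(anonym)
print(type(container), container)

server.shutdown()
server.server_close()
```

Running the server in a background thread is only a
convenience for the demonstration; a deployed server would
normally be its own long-running process.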
NOTES
------------------------------------------------------------------------
The code examples given use Python, but implementations of both
XML-RPC clients and servers exist for a large number of
programming languages. Moreover, XML-RPC itself is completely
language-neutral: multiple clients written in different
languages can call the same server, and none of them care what
language the server was written in.
There -is- a difference in the way that a CGI script runs and
the way an XML-RPC server runs. The XML-RPC server is its own
process (and uses its own port); CGI scripts, on the other
hand, are automatically spawned by a general HTTPd server. But
both still travel over HTTP (or HTTPS) layers, so any issues
with firewalls, statefulness, and the like remain identical.
Some general-purpose HTTPd servers, moreover, support XML-RPC
internally. But if, like me, you do not control the
configuration of your web host, it is easiest to write a
standalone XML-RPC server like the 8-line version above.
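The point that XML-RPC calls travel over plain HTTP can be
checked without any XML-RPC client library. The sketch below
assumes Python 3 and a throwaway local server; the
'echo_upper()' method is hypothetical and exists only so there
is something to call. It sends the request as an ordinary HTTP
POST and decodes the reply by hand.

```python
#--------------- xmlrpc-over-http-sketch.py --------------#
import http.client
import threading
from xmlrpc.client import dumps, loads
from xmlrpc.server import SimpleXMLRPCServer

def echo_upper(text):
    # Hypothetical method, used only to have something to call.
    return text.upper()

server = SimpleXMLRPCServer(('127.0.0.1', 0), logRequests=False)
server.register_function(echo_upper)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Build the request body by hand and send it as a plain HTTP POST.
body = dumps(('spam',), methodname='echo_upper').encode('utf-8')
conn = http.client.HTTPConnection('127.0.0.1', port)
conn.request('POST', '/RPC2', body, {'Content-Type': 'text/xml'})
response = conn.getresponse()
status = response.status
reply = response.read()
conn.close()

# The reply body is an XML-RPC methodResponse document.
result, _ = loads(reply)
print(status, result)

server.shutdown()
server.server_close()
```

Because the exchange is nothing more than a POST with an XML
body, anything that affects HTTP traffic to a CGI script
affects an XML-RPC server in the same way.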
RESOURCES
------------------------------------------------------------------------
Dave Winer has created an XML-RPC interface to Google. It can
be found, along with usage examples, at:
http://www.xmlrpc.com/googleGateway
Google themselves chose to implement a programmatic API in the
somewhat heavier-weight XML dialect of SOAP. The principle is
the same as the XML-RPC version:
http://www.google.com/apis/
A prior article by the author looks at the data model of
XML-RPC:
http://www-106.ibm.com/developerworks/xml/library/x-matters15.html
Information and lookups for the Gnosis-Anon Mail Transport
Agent can be found at:
http://gnosis.cx/cgi-bin/encode_address.cgi
ABOUT THE AUTHOR
------------------------------------------------------------------------
{Picture of Author: http://gnosis.cx/cgi-bin/img_dqm.cgi}
David Mertz knows a little bit about a lot of things, but a lot
about fewer things than he once did. The smooth overcomes the
striated. David can be reached at mertz@gnosis.cx; his life
pored over at http://gnosis.cx/publish/.