Xml Zone Tip: Make Your Cgi Scripts Available Via Xml-rpc

Providing a programmatic interface to web services


David Mertz, Ph.D.
Interfacer, Gnosis Software, Inc.
April, 2003

For this large class of CGI scripts, it is both easy and useful to provide an alternate XML-RPC interface to the same calculation or lookup. If you do this, other developers can quickly utilize the information you provide within their own larger applications.

Introduction

Many CGI scripts are, at their heart, just a form of remote procedure call. A user specifies some information--perhaps in an HTML form--and your web server returns a formatted page that contains an answer to their inquiry. The data on such a return page is surrounded in some prettifying HTML markup, but basically it is the data that is of interest. Examples of data-oriented CGI interfaces are search engines, stock price checks, weather reports, personal information lookup, catalog inventory, and so on.

A web browser is a fine interface human eyeballs; but a returned HTML page is far from an optimal form for integration within custom applications. What programmers often do to utilize the data that comes from CGI queries is "screen-scraping" of returned pages--that is, look for identifiable markup and contents, and pull data elements from the texts. But screen-scraping is error-prone: page layout may change over time and/or be dependent on the specific results. A more formal API is better for programmatic access to your CGI functionality.

XML-RPC is exactly intended to enable application access to queryable results over an HTTP channel. Its big sibling SOAP can do a similar job, but the XML format of the SOAP is more complicated than is needed for most purposes. An ideal system is one where people can make queries in a web browser, while custom applications can make the same queries via XML-RPC. The underlying server can do almost exactly the same thing in either case.

An Example

I have created a service within my website that enables users to send email to anonymized recipients. You can read about the goals and architecture of Gnosis-Anon at its home page (see Resources). At the same URL, you can enter a query into an HTML form, and in return be presented with an HTML page informing you of an anonym. From there, you need to use a pencil to write the information down, or perhaps use cut-and-paste into something besides your web browser.

Suppose you wanted to utilize the anonym automatically in an application--e.g. a Mail User Agent (MUA) or Mail Transport Agent (MTA). You might do some screen-scraping like:

get-anonym-cgi.py

#!/usr/bin/env python
from urllib import urlencode, urlopen
from sys import argv
base_url = 'http://gnosis.cx/cgi-bin/encode_address.cgi'
query = urlencode({'duration':argv[1], 'email':argv[2]})
html_answer = urlopen(base_url+'?'+query).readlines()
result = "NO ANONYM FOUND!"
for line in html_answer:
    if line.find("<dt>Anonym:</dt>") >= 0:
        start = line.find('<dd>')+4
        end = line.find('</dd>')
        result = line[start:end]
        break
print result

You can run this with a command line like:

Running get-anonym-cgi

% get-anonym-cgi.py perm [email protected]
[email protected]

This works if I do not change the format of the HTML. That is a big if. A more robust (and simpler) client application might look like:

get-anonym-xmlrpc.py

#!/usr/bin/env python
import sys, xmlrpclib
server = xmlrpclib.Server("http://gnosis.cx:8000")
print server.anonym(sys.argv[1], sys.argv[2])

This XML-RPC application behaves exactly the same as the CGI based one.

Setting Up The Xml-rpc Server

There is not much difference between writing an XML-RPC server and writing a CGI script. The actual calculation or lookup code is identical--the only thing you need to change is the format of the output, and a little extra work parsing the inputs for CGI. My CGI script looks something like:

encode_address.py

#!/usr/bin/env python
import cgi
query = cgi.FieldStorage()
email = query.getvalue('email','[email protected]')
duration = query.getvalue('duration', 'Unknown')
anonym = FIND_THE_ANONYM(duration, email)
html_template = open('template').read()
html = html_template % (email, anonym, duration)
print "Content-Type: text/html"
print
print html

This leaves out the details of how FIND_THE_ANONYM() works, and what the HTML template looks like; but those details are unimportant here.

anonym-xmlrpc-server.py

from SimpleXMLRPCServer import SimpleXMLRPCServer
class Anonym:
    def anonym(self, duration, email):
        return FIND_THE_ANONYM(duration, email)
    def container_test(self):
        return {'spam':'eggs', 'bacon':'toast'}
server = SimpleXMLRPCServer(('', 8000))
server.register_instance(Anonym())
server.serve_forever()

As you see, the self-same lookup function is used; its return value is what is returned to a remote call to the .anonym() method. On the wire, return values are encoded as XML-RPC, but Python's xmlrpclib module, as do analogous libraries in other languages, automatically translates XML-RPC encoded structures back into native data structures. The method .container_test() above can be called remotely as well, what the client will see is a Python dictionary.

Notes

The code examples given use Python, but implementations of both XML-RPC clients and servers exist for a large number of programming languages. Moreover, XML-RPC itself is completely language-neutral: multiple clients written in different languages can call the same server, and none of them care what language the server was written in.

There is a difference in the way that a CGI script runs and the way an XML-RPC server runs. The XML-RPC server is its own process (and uses its own port); CGI scripts, on the other hand, are automatically spawned by a general HTTPd server. But both still travel over HTTP (or HTTPS) layers, so any issues with firewalls, statefulness, and the like remain identical. Some general-purpose HTTPd servers, moreover, support XML-RPC internally. But if like me, you do not control the configuration of your web host, it is easiest to write a standalone XML-RPC server like the 8-line version above.

Resources

Dave Winer has created an XML-RPC interace to Google. It can be found, along with usage examples, at:

http://www.xmlrpc.com/googleGateway

Google themselves choose to implement a programmatic API in the somewhat heavier-weight XML dialect of SOAP. The principle is the same as the XML-RPC version:

http://www.google.com/apis/

A prior article by the author looks at the data model of XML-RPC:

http://www-106.ibm.com/developerworks/xml/library/x-matters15.html

Information and lookups for the Gnosis-Anon Mail Transport Agent can be found at:

http://gnosis.cx/cgi-bin/encode_address.cgi

About The Author

Picture of Author David Mertz knows a little bit about a lot of things, but a lot about fewer things than he once did. The smooth overcomes the striated. David can be reached at [email protected]; his life pored over athttp://gnosis.cx/publish/.