LINUX ZONE FEATURE: The Twisted Framework
Part Four, Secure Clients and Servers

David Mertz, Ph.D.
Selector, Gnosis Software, Inc.
August, 2003

    In this final installment of his series on Twisted, David looks at
    specialized protocols and servers contained in the Twisted package,
    with a focus on secure connections.

INTRODUCTION
------------------------------------------------------------------------

  One thing the servers and clients in my prior installments had in
  common is that they operated completely in the clear,
  cryptographically speaking. Sometimes, however, you want to keep your
  connection free from prying eyes (or from tampering/spoofing).

  While protocols for determining permissions on server resources are
  interesting, for this installment, I want to look at protocols
  involving actual wire-level encryption. But for general background,
  readers might want to investigate web-oriented mechanisms like Basic
  Authentication, which is described in RFC-2617 and implemented in
  Apache and other web servers. The Twisted package [twisted.cred] is a
  general, but complex, framework for providing authentication services
  in general-purpose Twisted servers (not limited to web ones).

  There are two widespread APIs for wire-level encryption over the
  Internet: SSL and SSH. The former, SSL (Secure Sockets Layer), is
  widely implemented in web browsers and web servers; in principle,
  however, there is no reason SSL is specifically tied to the HTTP
  protocol. SSL combines a public-key infrastructure, complete with a
  "web-of-trust" based on Certificate Authorities, with creation of a
  session key for standard symmetrical encryption during the life of a
  particular connection.
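
  As a rough illustration of the certificate side of SSL, here is a
  minimal sketch that uses the 'ssl' module from the standard library of
  later Python versions. It is not Twisted's SSL support, and it is not
  something tested for this article. The helper names
  'make_client_context()' and 'describe()' are made up for the example.

```python
import ssl

def make_client_context():
    """Return a client-side context that verifies the server certificate."""
    # create_default_context() loads the system's trusted CA certificates
    # and switches on certificate and hostname checking.
    return ssl.create_default_context()

def describe(context):
    """Summarize the verification settings of a context as a dict."""
    return {
        "verifies_certificate": context.verify_mode == ssl.CERT_REQUIRED,
        "checks_hostname": context.check_hostname,
    }

if __name__ == "__main__":
    print(describe(make_client_context()))
```

  A context like this would normally be handed a connected socket via
  'context.wrap_socket(sock, server_hostname=host)'; the session key
  negotiation described above happens during that call.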

  Twisted -does- come with an SSL framework; however, as with most
  things in Twisted, exactly how it might work is poorly documented--I
  tried downloading two likely support packages to try to get the
  Twisted v.1.0.6 script 'test_ssl.py' to work (see Resources). I am
  sure that with -some- version of the right 3rd party libraries (and
  some Twisted version)--and perhaps after corrections to erroneous
  examples--it is possible to use SSL with Twisted, but I have not done
  so for this article.

  The other widespread API for wire-level encryption is SSH (Secure
  Shell), well known from the tool of the same name (in lowercase:
  'ssh'). Many of the underlying cryptographic algorithms are shared
  between SSL and SSH, but SSH is focussed on creating encrypted shell
  connections (rather than using snooper-friendly programs/protocols
  like 'telnet' and 'rsh'). Twisted lets you write your own custom SSH
  clients and servers, which is quite nice. While you certainly -can-
  write a basic interactive remote shell, like that provided by the
  client and server 'ssh' and 'sshd', you can also create more
  specialized tools to use these secure connections for higher-level
  purposes.

AN SSH WEBLOG CLIENT
------------------------------------------------------------------------

  Continuing with the running example of this series of articles, I
  created a tool to examine hits to my webserver log file, but to do so
  over an encrypted SSH channel. This purpose is realistic,
  actually--perhaps I do not want to publicly reveal the hits I get to
  someone monitoring my packet stream.

  Before I could get far in my efforts, I needed to figure out what the
  line 'import Crypto' in the [twisted.conch] package was trying to
  find. The name is obviously a hint, but I was also somewhat familiar
  with the Python cryptography library maintained by Andrew Kuchling
  (see Resources). A bit of googling, a download, and an install later,
  Twisted's 'test_conch.py' would run without complaint.  So on to the
  project of creating a custom SSH client.

  I based my client on the example provided in the Twisted file
  'doc/examples/sshsimpleclient.py'. I have simplified somewhat (as well
  as customizing); you might want to look at what else is in the
  distributed example.  As with most Twisted components, [twisted.conch]
  consists of several layers, each of which can be customized.  I guess
  the name "conch" is a play on the word "shell" in Secure Shell.

  The -transport- level is a customization of 'SSHClientTransport'.  We
  may define several methods, but need to at least define
  '.verifyHostKey()' and '.connectionSecure()'.  In our implementation,
  we trust every host key, and simply give control back to the
  asynchronous reactor core by returning a 'defer.succeed' object.  Of
  course, if you wanted to verify a host against a known key, you could
  do that in '.verifyHostKey()'.
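
  A check against a known key might look roughly like the sketch below.
  It is standalone code rather than a Twisted method. It assumes the
  fingerprint is the colon-separated MD5 digest of the public key blob
  (the 16-byte format shown in the sample session suggests this, but
  treat it as an assumption). The table 'KNOWN_FINGERPRINTS' and both
  function names are hypothetical.

```python
import hashlib

# Hypothetical table of trusted hosts; the fingerprint is the one that
# appears in the sample session of this article.
KNOWN_FINGERPRINTS = {
    "gnosis.cx": "56:54:76:b6:92:68:85:bb:61:d0:f0:0e:3d:91:ce:34",
}

def md5_fingerprint(key_blob):
    """Format the MD5 digest of a key blob as colon-separated hex pairs."""
    digest = hashlib.md5(key_blob).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

def host_is_trusted(host, fingerprint, known=KNOWN_FINGERPRINTS):
    """True only if the host is listed and its fingerprint matches."""
    expected = known.get(host)
    return expected is not None and expected == fingerprint.lower()
```

  Inside '.verifyHostKey()', a match would return 'defer.succeed(1)' and
  a mismatch would return a failed Deferred, so that the connection is
  dropped instead of continuing to authentication.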

  Creating the channel is where the other layers come in.  A child of
  'SSHUserAuthClient' performs the actual login authentication; and if
  successful, it establishes a connection (for which I define a child of
  'SSHConnection').  This connection, in turn, creates a channel--a
  child of 'SSHChannel'.  It is the channel, which I named simply
  'Channel', that does the actual custom work.  Specifically, the channel
  does things like send and receive data and commands.  Let us look at
  my specific client:

      #---------------------- ssh-weblog.py ---------------------------#
      #!/usr/bin/env python
      """Monitor a remote weblog over SSH

        USAGE: ssh-weblog.py user@host logfile
      """
      from twisted.conch.ssh import transport, userauth, connection, channel
      from twisted.conch.ssh.common import NS
      from twisted.internet import defer, protocol, reactor
      from twisted.python import log
      from getpass import getpass
      import struct, sys, os
      import webloglib as wll
      #
      USER,HOST,CMD = None,None,None
      #
      class Transport(transport.SSHClientTransport):
          def verifyHostKey(self, hostKey, fingerprint):
              print 'host key fingerprint: %s' % fingerprint
              return defer.succeed(1)

          def connectionSecure(self):
              self.requestService(UserAuth(USER, Connection()))
      #
      class UserAuth(userauth.SSHUserAuthClient):
          def getPassword(self):
              return defer.succeed(getpass("password: "))
          def getPublicKey(self):
              return  # Empty implementation: always use password auth
      #
      class Connection(connection.SSHConnection):
          def serviceStarted(self):
              self.openChannel(Channel(2**16, 2**15, self))
      #
      class Channel(channel.SSHChannel):
          name = 'session'    # must use this exact string
          def openFailed(self, reason):
              print '"%s" failed: %s' % (CMD,reason)
          def channelOpen(self, data):
              self.welcome = data   # Might display/process welcome screen
              d = self.conn.sendRequest(self,'exec',NS(CMD),wantReply=1)
          def dataReceived(self, data):
              recs = data.strip().split('\n')
              for rec in recs:
                  hit = [field.strip('"') for field in wll.log_fields(rec)]
                  resource = hit[wll.request].split()[1]
                  referrer = hit[wll.referrer]
                  if resource=='/kill-weblog-monitor':
                      print "Bye bye..."
                      self.closed()
                      return
                  elif hit[wll.status]=='200' and hit[wll.referrer]!='-':
                      print referrer, ' -->', resource
          def closed(self):
              self.loseConnection()
              reactor.stop()
      #
      if __name__=='__main__':
          if len(sys.argv) < 3:
              sys.stderr.write(__doc__)   # the docstring, not the text '__doc__'
              sys.exit()
          USER, HOST = sys.argv[1].split('@')
          CMD = 'tail -f -n 1 '+sys.argv[2]
          protocol.ClientCreator(reactor, Transport).connectTCP(HOST, 22)
          reactor.run()

  The overall structure of the client is like most of the Twisted
  applications we have seen.  It creates a protocol, and monitors events
  in an asynchronous loop (i.e. 'reactor.run()').

  The interesting part comes in the methods of 'Channel()'. As soon as
  the channel is opened, we execute a custom command--in this case, a
  'tail -f' on the weblog file whose name is specified on the command
  line. Naturally, the host, which is still a completely generic 'sshd'
  server rather than anything Twisted specific, starts sending some data
  back. The method 'dataReceived()' parses the data as it comes in
  (incrementally as 'tail' produces more).  For this specific client, we
  decide when to terminate based on the actual content of the weblog
  being parsed--which amounts to having a web-based way to kill the
  monitoring application.  While that specific configuration is probably
  unusual, the example demonstrates the general concept of severing the
  connection when some condition is met (it could be any condition).  A
  session looks like:

      #-------------- Sample session of weblog monitor ----------------#
      $ ./ssh-weblog.py gnosis@gnosis.cx access-log
      host key fingerprint: 56:54:76:b6:92:68:85:bb:61:d0:f0:0e:3d:91:ce:34
      password:
      http://gnosis.cx/publish/  --> /publish/whatsnew.html
      http://gnosis.cx/publish/whatsnew.html  --> /home/hugo.gif
      Bye bye...

  This is pretty much the same as all the other weblog monitors this
  series created.  I ended the above session by pointing a browser at
  <http://gnosis.cx/kill-weblog-monitor> from another window (otherwise,
  it would watch indefinitely).
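
  One caveat about '.dataReceived()': the client above treats every
  chunk as a set of whole log records, but nothing guarantees that a
  chunk ends on a line boundary. A small buffer can hold back an
  unfinished record until the rest arrives. The sketch below is
  standalone and uses plain strings; the class name 'RecordBuffer' is
  made up for the example.

```python
class RecordBuffer(object):
    """Accumulate arbitrary chunks and hand back only complete lines."""

    def __init__(self):
        self._pending = ""

    def feed(self, chunk):
        """Add a chunk; return the list of lines it completed."""
        self._pending += chunk
        lines = self._pending.split("\n")
        # The last element is either '' (the chunk ended on a newline)
        # or an unfinished record that must wait for the next chunk.
        self._pending = lines.pop()
        return [line for line in lines if line.strip()]
```

  In the client, '.dataReceived()' could loop over
  'self.buffer.feed(data)' in place of 'data.strip().split()', assuming
  a 'RecordBuffer' instance was stored on the channel when it opened.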

MODIFYING THE SSH CLIENT
------------------------------------------------------------------------

  It is a simple matter to create other SSH clients that achieve other
  purposes.  For example, I copied 'ssh-weblog.py' to the name 'scp.py',
  and made just a few changes to the code.  The '__main__' body parses
  options slightly differently, and the docstring was adjusted; beyond
  that, I simply modified the '.dataReceived()' method to read:

      #------------ scp.py (modified Channel method) ------------------#
      def dataReceived(self, data):
          open(DST,'wb').write(data)
          self.closed()

  (The variable CMD was set to '"cat "+sys.argv[2]'.)

  Voila! I have implemented the tool 'scp' that accompanies many SSH
  clients.
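
  The three-line method is fine for small files, but a larger file will
  likely arrive in several chunks, and writing on the first call would
  then keep only the first one. A sketch of the alternative, collecting
  chunks and writing once at the end, follows. It is standalone code;
  'FileCollector' and 'demo()' are hypothetical names, and the method
  names only mirror the roles of '.dataReceived()' and '.closed()'.

```python
import os
import tempfile

class FileCollector(object):
    """Gather every chunk of a transfer and write the file once at the end."""

    def __init__(self, destination):
        self.destination = destination
        self.chunks = []

    def data_received(self, data):
        # Called once per chunk, in the role of Channel.dataReceived().
        self.chunks.append(data)

    def closed(self):
        # Called once when the remote command has finished.
        with open(self.destination, "wb") as out:
            out.write(b"".join(self.chunks))
        return os.path.getsize(self.destination)

def demo():
    """Feed two chunks through a collector and read the result back."""
    path = os.path.join(tempfile.mkdtemp(), "copy.bin")
    collector = FileCollector(path)
    collector.data_received(b"first chunk, ")
    collector.data_received(b"second chunk")
    size = collector.closed()
    with open(path, "rb") as copied:
        return size, copied.read()
```

  With this arrangement, the channel would not call 'self.closed()' from
  '.dataReceived()'; it would wait for the remote 'cat' to finish and
  the channel to close on its own.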

  These examples are both "run and collect" tools. That is, they are
  not interactive during the session. But you could easily create
  another tool that made additional calls to 'self.conn.sendRequest()'
  within 'Channel' methods. In fact, if the client were some kind of GUI
  client, you might add those data collection forms as callbacks within
  the reactor. That is, perhaps when certain forms are completed, new
  remote commands could be issued, and the results again collected for
  processing or presentation.

AN SSH WEBLOG SERVER
------------------------------------------------------------------------

  An SSH server uses much of the same structure as the client. As
  before, I simplify and customize 'doc/examples/sshsimpleserver.py' for
  my example. One twist is that a server is best created using an
  'SSHFactory' child that has been configured with appropriate keys and
  classes.

  In our SSH weblog server, we configure a password and username for an
  authorized user.  In the example, they are hardcoded, but you could
  obviously store them otherwise; perhaps configure a list of authorized
  weblog monitors.  Let us look at the example:

      #-------------------- ssh-weblog-server.py ----------------------#
      #!/usr/bin/env python2.3
      from twisted.cred import authorizer
      from twisted.conch import identity, error
      from twisted.conch.ssh import userauth, connection, channel, keys
      from twisted.conch.ssh.factory import SSHFactory
      from twisted.internet import reactor, protocol, defer
      import time
      #
      class Identity(identity.ConchIdentity):
          def validatePublicKey(self, data):
              return defer.succeed('')
          def verifyPlainPassword(self, password):
              if password=='password' and self.name == 'user':
                  return defer.succeed('')
              return defer.fail(error.ConchError('bad password'))
      #
      class Authorizer(authorizer.Authorizer):
          def getIdentityRequest(self, name):
              return defer.succeed(Identity(name, self))
      #
      class Connection(connection.SSHConnection):
          def gotGlobalRequest(self, *args):
              return 0
          def getChannel(self, channelType, windowSize, maxPacket, data):
              if channelType == 'session':
                  return Channel(remoteWindow=windowSize,
                            remoteMaxPacket=maxPacket, conn=self)
              return 0
      #
      class Channel(channel.SSHChannel):
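          def channelOpen(self, data):
              self.weblog = open('../access.log')
              self.weblog.readlines()     # skip records already logged
              self.sendNewRecords()
          def sendNewRecords(self):
              # Poll via the reactor; a 'time.sleep()' loop here would
              # block the event loop for every other connection
              for rec in self.weblog.readlines():
                  self.write(rec)
              reactor.callLater(5, self.sendNewRecords)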
          def request_pty_req(self, data):
              return 1    # ignore, but this gets sent for shell requests
          def request_shell(self, data):
              self.client = protocol.Protocol()
              self.client.makeConnection(self)
              self.dataReceived = self.client.dataReceived
              return 1
          def loseConnection(self):
              self.client.connectionLost()
              channel.SSHChannel.loseConnection(self)
      #
      from os.path import expanduser   # open() does not expand '~' itself
      class Factory(SSHFactory):
          publicKeys = {'ssh-rsa':keys.getPublicKeyString(
                      data=open(expanduser('~/.ssh/id_rsa.pub')).read())}
          privateKeys ={'ssh-rsa':keys.getPrivateKeyObject(
                      data=open(expanduser('~/.ssh/id_rsa')).read())}
          services = {'ssh-userauth': userauth.SSHUserAuthServer,
                      'ssh-connection': Connection}
          authorizer = Authorizer()
      #
      reactor.listenTCP(8022, Factory())
      reactor.run()
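
  The hardcoded comparison in '.verifyPlainPassword()' could give way to
  a lookup in a table of authorized monitors. The sketch below is one
  way to do that with salted hashes from the standard library of modern
  Python; it is independent of [twisted.cred], and the table
  'AUTHORIZED', the salt, and the function names are all hypothetical.

```python
import hashlib
import hmac

def hash_password(password, salt):
    """Derive a hex digest from password and salt with PBKDF2-HMAC-SHA256."""
    raw = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                              salt.encode("utf-8"), 100000)
    return raw.hex()

# Hypothetical user table: name -> (salt, stored password hash).
AUTHORIZED = {
    "user": ("pepper", hash_password("password", "pepper")),
}

def verify(name, password, users=AUTHORIZED):
    """True if the name is known and the password matches its stored hash."""
    if name not in users:
        return False
    salt, expected = users[name]
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(hash_password(password, salt), expected)
```

  In the server, '.verifyPlainPassword()' would return
  'defer.succeed('')' when 'verify(self.name, password)' is true, and
  the 'defer.fail(...)' shown in the listing otherwise.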

  For brevity, the parsing and formatting of the weblog records is
  omitted, but the idea of using an open channel to write new records as
  they become available is almost the same as with the client approach.
  Of course, in this case, any generic SSH client can connect to the
  specialized server:

      #-------------- Sample session of weblog monitor ----------------#
      $ ssh gnosis.python-hosting.com -p 8022 -l user
      user@gnosis.python-hosting.com's password:
      141.154.146.89 - - [26/Aug/2003:02:47:40 -0500]
      "GET /voting-project/August.2003/0010.html HTTP/1.1" 200 8986
      "http://gnosis.python-hosting.com/voting-project/August.2003/0009.html"
      "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/85
      (KHTML, like Gecko) Safari/85"
      [...]
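
  The server finds new records by calling '.readlines()' again on a file
  object that has already reached end-of-file. That behavior can be
  tried in isolation with the sketch below, which uses a temporary file
  in place of a real access log; 'new_records()' and 'demo()' are
  hypothetical names.

```python
import os
import tempfile

def new_records(logfile):
    """Return the lines appended to logfile since the previous read."""
    # Once a file object has reached end-of-file, a later readlines()
    # returns only what was appended in the meantime.
    return logfile.readlines()

def demo():
    """Append to a temporary log and collect only the fresh records."""
    path = os.path.join(tempfile.mkdtemp(), "access.log")
    with open(path, "w") as log:
        log.write("old hit 1\nold hit 2\n")
    reader = open(path)
    new_records(reader)              # discard records already present
    with open(path, "a") as log:     # the web server logs another hit
        log.write("new hit\n")
    fresh = new_records(reader)
    reader.close()
    return fresh
```

  A record that is only partly written when the poll happens would come
  back incomplete, so a production version would want to hold back any
  final line that lacks a newline.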

  Much as with the client approach, an enhanced version might become
  more interactive; the '.dataReceived()' method of the channel could be
  customized to do something useful with data sent from the (generic)
  client.
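
  As one possible shape for such an enhancement, the sketch below
  interprets lines typed by the client as commands that change what the
  monitor reports. The command names ("status", "all") and the 'state'
  dictionary are invented for illustration; nothing like them exists in
  the server above.

```python
def handle_line(line, state):
    """Apply one client command to the monitor state; return a reply."""
    parts = line.strip().split()
    if not parts:
        return ""
    command, args = parts[0].lower(), parts[1:]
    if command == "status" and len(args) == 1 and args[0].isdigit():
        # Restrict output to records with this HTTP status code.
        state["status"] = args[0]
        return "showing only status %s" % args[0]
    if command == "all":
        state["status"] = None
        return "showing all records"
    return "unknown command: %s" % command
```

  A channel could call 'handle_line()' from '.dataReceived()', write the
  reply back to the client, and consult 'state["status"]' before sending
  each new weblog record.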

SOCIAL DYNAMICS
------------------------------------------------------------------------

  The biggest reservation I have about recommending the Twisted
  framework is, unfortunately, the "wild west" feel among its developer
  group. The software itself is quite powerful. But even more than in
  most open source projects, there is insufficient API consistency
  between releases, the documentation remains rough, and a thick skin is
  the main prerequisite for seeking help on its mailing list; you can
  get helpful responses, but only after wading through the acerbic ones.

  As this installment demonstrated--especially in my attempts to fill in
  pieces missing from the examples and documentation--Twisted could
  really stand to have a helpful community behind it.  Hopefully, with
  time, both the documentation and mailing list will improve in quality;
  the facilities hiding in the various corners of the Twisted framework
  are quite impressive.

RESOURCES
------------------------------------------------------------------------

  Twisted Matrix comes with quite a bit of documentation, and
  many examples.  Browse around its homepage to glean a greater
  sense of how Twisted Matrix works, and what has been
  implemented with it:

    http://twistedmatrix.com

  The Python Cryptography Toolkit, maintained by Andrew Kuchling, can be
  downloaded at the following URL. This toolkit includes numerous
  well-investigated public-key, private-key, and cryptographic hash
  functions, as well as some miscellaneous other protocols:

    http://www.amk.ca/python/code/crypto.html

  The SourceForge project "Python OpenSSL Wrappers" (POW) looks like a
  useful tool for SSL programming in Python.  However, it does not
  appear (from my trial-and-error) to be what Twisted is looking for in
  its SSL subsystem:

    http://sourceforge.net/projects/pow

  Most likely, for Twisted, the SSL wrapper you want is pyOpenSSL. At
  least after I installed that, I got past an import exception in
  Twisted's 'test_ssl.py' (but only so far as what appears to be an
  error in the test script):

    http://sourceforge.net/projects/pyopenssl/

  Some background on HTTP authentication techniques can be found in
  RFC-2617:

    http://www.ietf.org/rfc/rfc2617.txt

  An introduction to the SSL protocol can be found at:

    http://developer.netscape.com/tech/security/ssl/howitworks.html

  A simple version of a weblog server was presented in the
  developerWorks tip, _Use Simple API for XML as a long-running
  event processor_:

    http://www-106.ibm.com/developerworks/xml/library/x-tipasysax.html

ABOUT THE AUTHOR
------------------------------------------------------------------------

  {Picture of Author: http://gnosis.cx/cgi-bin/img_dqm.cgi}
  David Mertz believes that it is turtles all the way down. David
  may be reached at mertz@gnosis.cx; his life pored over at
  http://gnosis.cx/publish/. And buy his book: _Text Processing
  in Python_ (http://tinyurl.com/jskh).