Charming Python #b9: Making Python Run As Fast As C

Psyco - The Python Specializing Compiler

David Mertz, Ph.D.
Just-in-time Writer, Gnosis Software, Inc.
August, 2002

In some ways the design of Python resembles the design of Java. Both utilize a virtual machine that interprets specialized pseudo-compiled byte-codes. One area where JVM's are more advanced than Python is in optimizing the execution of byte-codes. To Python's rescue comes Psyco--the Python Specializing Compiler--which right now is an external module, but which could someday be included in Python itself. With only a tiny amount of extra programming, Psyco can often be used to increase the speed of Python code by orders of magnitude. This article looks at what Psco is, and tests it in some applications.

Introduction

Python is usually fast enough for what you want it to do. Ninety percent of the concerns that novice programmers express about the execution speed of an interpreted/byte-compiled language like Python are simply naive. On modern hardware, most unoptimized Python programs run as fast as they need to, and there is really no point in spending extra programming effort to make an application run faster.

This installment, therefore, is only interested in that other 10%. Once in while, Python programs (or programs in other languages) run impracticably slowly. What is needed for a given purpose varies greatly--shaving off milliseconds is rarely compelling (but more often than never), but speeding up tasks that run for minutes, hours, days, or weeks is often worthwhile. Moreover, it should be noted that not everything that runs slowly is CPU bound; if a database query takes hours to complete, for example, it makes little difference whether the resulting dataset takes one or two minutes to process. This installment is also not about I/O issues.

There are a number of ways to speed up Python programs. The first technique that should come to every programmers mind is an improvement to the algorithms and data structures used. Micro-optimizing the steps of an inefficient algorithm is a fool's errand. For example, if the complexity order of the current technique is O(n**2), speeding up the steps by 10x is a lot less helpful where it matters than is finding an O(n) substitute. This comparative moral applies even where approaches as extreme as a rewrite in assembly are considered--the right algorithm in Python will often go faster than the wrong algorithm in hand-tuned assembly.

The second technique to consider is to profile your Python application, with an eye toward rewriting key portions as C extension modules. Using an extension wrapper like SWIG, you can create a C extension that executes the most time consuming elements of your program as C code. Extending Python in this manner is relatively straightforward, but does require a bit of a learning curve (and knowledge of C). Very often you will find that the large majority of the time spent in executing your Python application is spent in just a handful of functions, so considerable gains are possible.

A third technique builds on the second. Greg Ewing has created a language called Pyrex, which melds Python and C. In particular, to use Pyrex, you write functions in a Python-like language that adds type-declarations to selected variables. Pyrex (the tool) processes ".pyx" file into ".c" extensions. Once compiled with a C compiler, these Pyrex (the language) modules can be imported into and used in your regular Python applications. Since Pyrex uses almost the same syntax as Python itself--including loop, branch and exception statements; assigment forms; block indentation; and so on--a Pyrex programmer need not learn C to write extensions. Moreover, Pyrex allows more seamless mixing of C-level variables and Python-level variables (objects) within the same code than does an extension written directly in C.

A final technique is the subject of this installment. The extension module Psyco can plug in to the guts of the Python interpreter, and selectively substitute optimized machine code for portions of Python's interpreted byte-code. Unlike the other techniques described, Psyco operates strictly at Python runtime. That is, Python source code is compiled by the python command to byte-code in exactly the same manner as before (except for a couple import statements and function calls added to invoke Psyco). But while the Python interpreter is running an application, Psyco sometimes checks to see if it can substitute some specialized machine code for regular Python byte-code actions. This specialized compilation is both very similar to what Java just-in-time compilers do--broadly speaking, at least--and is architecture specific. As of right now, Psyco is only available for i386 CPU architectures. The nice thing about Psyco is that you can use the very same Python code you have been writing all along (and very same is quite literally true), but let it run much faster.

How Psyco Works

To understand Psyco completely, you probably need to have a good grasp of both the Python interpreter's eval_frame() function and i386 Assembly. Unfortunately, I myself can claim neither expertise--but I think I can explain Psyco in outline wihtout going too far wrong. For more definative word, consult Psyco creator Armin Rigo's essay entitled Psyco, the Python Specializing Compiler (see Resources).

In regular Python, the eval_frame() function is the inner loop of the Python interpreter. Basically, the eval_frame() function looks at the current byte-code in an execution context, and switches control out to a function appropriate for implementing that byte-code. The specifics of what this support function will do depend, in general, upon that states of various Python objects held in memory. To make it simple, adding the Python objects "2" and "3" produces a different result than adding the objects "5" and "6". But both operations are dispatched in a similar way.

Psyco replaces the eval_frame() function with a compound evaluation unit. There are several ways that Psyco is able to improve upon what Python does. In the first place, Psyco compiles operations to somewhat opitimized machine code; in itself this produces only slight improvements, since what the machine code needs to accomplish is the same as what Python's dispatched functions do. Moreover, what is "specialized" in Psyco compilation is more than the choice of Python byte-codes, Psyco also specializes over variable values that are known in execution contexts. For example, in code like the below, the variable x is knowable for the duration of the loop:

x = 5
l = []
for i in range(1000):
    l.append(x*i)

An optimized version of this code need not multiply each i by "the content of the x variable/object"--it is less expensive to simply multiply each i by 5, saving a lookup/dereference.

Aside from creating i386 specific codes for small operations, Psyco caches this compiled machine code for later reuse. If Psyco is able to recognize that a particular operation is the same as something that was performed (and "specialized") earlier, it can rely on this cached code, rather than need to recompile the segment. This saves a bit more time.

The real savings in Psyco, however, relates to Psyco's categorization of operations into three different levels. For Psyco, there are "run-time", "compile-time" and "virtual-time" variable. Psyco promotes and demotes variables between the levels as needed. Run-time variables are simply the raw byte-codes and object structures that the regular Python interpreter handles. Compile-time variables are represented in machine registers and directly accessed memory locations, once operations have been compiled by Psyco into machine code.

The most interesting level is virtual-time variables. A Python variable is, internally, a complete structure, with lots of members--even when the object only represents an integer, for example. Psyco virtual time variables represent Python objects that could potentially be built if the need arose, but whose details are omitted until such time. For example, consider an assignment like:

x = 15 * (14 + (13 - (12 / 11)))

Standard Python builds and destroys a number of objects to compute this value. An entire integer object is built to hold the value of (12/11); then a value is pulled out of the temporary object's structure, and used to compute a new temporary object (13-PyInt). Psyco skips the objects, and just computes the values--knowing that an object can be created "if needed" from the value.

Using Psyco

Explaining Psyco is relatively difficult. Using Psyco is far easier. Basically, all there is to it is telling the Psyco module which which functions/methods to "specialize." No code changes need be made to any of your Python functions and classes themselves.

There are a couple approaches to specifying what Psyco should do. The "shotgun" approach is to enable Psyco just-in-time operation everywhere. To do that, put the following lines at the top of your module:

import psyco ; psyco.jit()
from psyco.classes import *

The first line tells Psyco to do its magic on all global functions. The second line--in Python 2.2 and above--tells Psyco to do the same with class methods. To target Psyco's behavior a bit more precisely, you can use the commands:

psyco.bind(somefunc)          # or method, class
newname = psyco.proxy(func)

The second form leaves func as an standard Python function, but optimizes calls involving newname. In almost all cases other than testing and debugging, the psyco.bind() form is what you will use.

The Performance Of Psyco

As magic as Psyco is, using it still requires a little of thought and testing. The main thing to understand is that Psyco is useful for handling blocks that loop many times, and knows how to optimize operations involving integers and floating point numbers. By the way, Rigo's excellent, but year old, introduction to Psyco states that Psyco only optimizes integers--that is no longer true, floats benefit also. For non-looping functions, and for operations on other types of objects, Psyco mostly just adds overhead for its analysis and internal-compilation. Moreover, for applications with large numbers of functions and classes, enabling Psyco application-wide adds a large burden in machine-code compilation and memory-usage for this caching. It is far better to selectively bind those functions that can benefit most from Psyco's optimizations.

I started my testing in a completely naive fashion. I simply considered what application I have run recently that I would not mind speeding up. The first example that came to mind was a text-manipulation program I use to convert drafts of my forthcoming book (Text Processing in Python) to LaTeX format. This application uses some string methods, some regular expressions, and some program logic driven mostly by regular expression and string matches. It is actually a terrible candidate for Psyco--but I use it, so I tried it.

First pass, all I did was add psyco.jit() to the top of my script. Painless enough. Unfortunately, the results were (expectedly) disappointing. Where the script initially took about 8.5 seconds to run, after Psyco "speedup" it ran in about 12 seconds. Not so good. Thought I: the just-in-time compilation probably has some startup overhead that swamps the running time. So next thing I tried was processing a much larger input file (consisting of multiple copies of the original one). This gave the very limited success of reducing running time from about 120 seconds to 110 seconds. The speedup was consistent across several runs, but fairly insignificant either way.

Second pass with my--still poor--text processing candidate. Instead of adding a blanket psyco.jit() call, I added only the line psyco.bind(main), since the main() function does loop a number of times (but only makes minimal use of integer arithmetic). The results here were nominally better. This approach shaved a few tenths of a second of the normal running time, and a few seconds off the large-input version. But still nothing spectacular (but also no harm done).

For a more relevant test of Psyco, I dug up some neural network code that I had written about in an earlier article (see Resources). This "code_recognizer" application can be trained to recognize the likely distribution of different ASCII values in different programming languages. Potentially something like this could be useful in guessing file types (say of lost network packets); but the code is actually completely generic as to what it is trained on--it could learn to recognize faces, or sounds, or tidal patterns just as easily. In any case, "code_recognizer" is based on the Python library bpnn which is also included (in modified form) as a test case with the Psyco 0.4 distribution. The important thing to know about "code_recognizer" for this article is that it does a lot of looping floating point math, and it takes a long time to run. We have got a good candidate for Psyco to work on here.

After a little playing around, I established several details about how to use Psyco. For this application, with just a small number of classes and functions, it does not make too much difference whether you use just-in-time or targetted binding. But the best result, by a few percentage points, still comes about by selectively binding the best optimizable classes. More significantly, however, it is important to understand the scope of Psyco binding.

The code_recognizer.py script contains lines like:

from bpnn import NN
class NN2(NN):
    # customized output methods, math core inherited

That is, the interesting stuff from Psyco's point-of-view is in the class bpnn.NN. Adding either psyco.jit() or psyco.bind(NN2) to the code_recognizer.py script has little effect. To get Psyco to do the desired optimization, you need to either add psyco.bind(NN) to code_recognizer.py or add psyco.jit() to bpnn.py. Contrary to what you might assume, just-in-time does not happen when an instance is created, or methods run, but rather in the scope where the class is defined. In addition, binding descendent classes does not specialize their methods that are inherited from elsewhere.

Once the small details of proper Psyco binding are worked out, the resultant speedups are rather impressive. Using the same test cases and training regime the referenced article presented (500 training patterns, 1000 training iterations), neural net training time was reduced from about 2000 seconds to about 600 seconds--better than a 3x speedup. Reducing the iterations as low as 10 showed proportional speedups (but worthless neural net recognition)--as did intermediate numbers of iterations.

I find bringing running time down from more than 1/2 hour to about 10 minutes with two lines of new code to be quite interesting. This speedup is still probably less than the speed of a similar application in C--and it is certainly less than the 100x speedup that a few isolated Psyco test cases exhibit. But this application is fairly "real life" and the improvements are enough to be significant in many contexts.

Whither Psyco

Psyco currently does not perform any sort of internal statistics or profiling, and does only minimal optimization of generated machine code. Potentially, a later version might know how to target those Python operations that could actually benefit most, and discard cached machine code for non-optimizable sections. In addition, perhaps a future Psyco could decide to perform more extensive (but more costly) optimizations on heavily run operations. Such runtime analysis would be similar to what Sun's HotSpot technology does for Java; the fact that Java, unlike Python, has type-declarations is actually less significant than many people assume (but prior work in optimization of Self, Smalltalk, Lisp, and Scheme make this point also).

Although I suspect it will never actually happen, it would be exciting to have Psyco-type technology integrated into some future version of Python itself. A few lines for imports and bindings is not much to do, but letting Python just inherently run much faster would be even more seamless. We will see.

Resources

Further information about Psyco can be found at the below URL. Armin Rigo's essay entitled Psyco, the Python Specializing Compiler is particularly useful in undestanding the theory and guts of Psyco:

http://homepages.ulb.ac.be/~arigo/psyco/

The Simplified Wrapper and Interface Generator (SWIG) is a very widely--perhaps predominantly--used tool for writing C/C++ modules for Python and other "scripting" languages. Information about SWIG can be found at:

http://www.swig.org

Greg Ewing has created the language Pyrex, which is used for writing Python extension modules. The idea behind Pyrex is to define a language that looks very close to Python itself, and that allows a mixture of Python and C datatypes to be combined, but which is ultimately transformed and compiled into a Python C-extension. Pyrex information (and the system itself) can be found at:

http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/

John Max Skaller's Vyper language was intended to be an enhanced Python, implemented in OCaml. One upshot that was hoped for in the project was compilation to the same machine code OCaml generates, which is generally comparable with the speed of C. Unfortunately, Vyper is a dead project, and a compiling version was never completed. Back when the project was alive, I interviewed Skaller about the project:

http://www-106.ibm.com/developerworks/library/l-pyth7.html

I wrote with Andrew Blais "An Introduction to Neural Networks for IBM developerWorks. In that article, we provided some code based on Neil Schemenauer's Python module bpnn. The current article utilizes that neural network code to demonstrate Psyco's capabilities. For background on the code example--and on neural networks generally--please see:

http://www-106.ibm.com/developerworks/library/l-neural/

The bpnn module is included in the current Psyco distribution, in modified form, as a test case. The original module can be found at:

http://www.enme.ucalgary.ca/~nascheme/python/bpnn.py

About The Author

Picture of Author David Mertz' failures as a hunter, fisherman, and shepherd have led him to his life of critical criticism. Tomorrow he may try something else. David may be reached at [email protected]; his life pored over athttp://gnosis.cx/publish/. Suggestions and recommendations on this, past, or future, columns are welcomed.