CHARMING 2 PYTHON #B14: Numerical Python
Working with the [numeric] and [numarray] packages

David Mertz, Ph.D.
Idempotentate, Gnosis Software, Inc.
September, 2003

    Numerical Python (often called NumPy) is a widely used extension
    library for fast operations on fixed-type arrays, of any
    dimensionality, in Python. Since the underlying code is
    well-optimized C, any speed limitations of Python's interpreter
    usually go away when major operations are performed in NumPy calls.
    As successful as NumPy has been, its developers have decided to
    supercede NumPy with a new module called Numarray that is mostly,
    but not quite entirely, compatible with NumPy. This installment
    looks both at the general features of NumPy, and at the specific
    improvements forthcoming with Numarray.

INTRODUCTION
------------------------------------------------------------------------

  The first thing to know about the Numerical Python package is that it
  doesn't let you do anything you cannot already do with standard
  Python.  What it does, is let you do many of the same things -a heck
  of a lot faster-.  Actually, that's not quite all; there are a number
  of operations on arrays that are much more elegant to express in
  [numeric] or [numarray] than they are with standard Python datatypes
  and syntax.  But mostly it is the impressive speed that draws users to
  Numerical Python.

  In essence, Numerical Python just gives you a new datatype, the array.
  Numerical arrays contain only elements of a homogeneous datatype, in
  contrast to to lists, tuples, and dictionaries which may contain
  heterogeneous elements. The other useful thing about Numerical arrays
  is that they may be multi-dimensional--but the dimensionality of
  arrays is a bit different that the simple nestability of lists.
  Numerical Python draws on the experience of programmers--particularly
  those from scientific backgrounds--who have abstracted the best
  features of arrays languages like APL, FORTRAN, MATLAB, and S, and
  created arrays whose shape and dimensionality is easily changeable.
  Back to this soon.

  Operations on arrays in Numerical Python are performed -elementwise-.
  Even though two dimensional arrays are similar to matrices from linear
  algebra, operations (like multiply) have nothing to do with the
  operations in linear algebra (like matrix multiplication).

  Let us look at a quick example of all these points.  In plain Python,
  you could create a "2-D list" with, e.g.:

      >>> pyarr = [[1,2,3],
      ...          [4,5,6],
      ...          [7,8,9]]
      >>> print pyarr
      [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
      >>> pyarr[1][1] = 0
      >>> print pyarr
      [[1, 2, 3], [4, 0, 6], [7, 8, 9]]

  Good enough, but about all you can do with this structure is set and
  retrieve elements by individual (multi-dimesional) index.  In
  contrast, a Numerical array is a bit more flexible:

      >>> from numarray import *
      >>> numarr = array(pyarr)
      >>> print numarr
      [[1 2 3]
       [4 0 6]
       [7 8 9]]

  Not much has changed, but what about operations?

      >>> numarr2 = numarr * 2
      >>> print numarr2
      [[ 2  4  6]
       [ 8  0 12]
       [14 16 18]]
      >>> print numarr2 + numarr
      [[ 3  6  9]
       [12  0 18]
       [21 24 27]]

  And changing the shape:

      >>> numarr2.shape = (9,)
      >>> print numarr2
      [ 2  4  6  8  0 12 14 16 18]

CHANGES BETWEEN NUMERIC AND NUMARRAY:
------------------------------------------------------------------------

  For the most part, the newer package [numarray] is API compatible
  with the earlier [numeric].  However, experience using [numeric] by
  many users prompted developers to make a few incompatible
  improvements.  Instead of breaking the any application that relies on
  [numeric], developers launched a new project called [numarray].  As of
  this writing, [numarray] lacks a few features in [numeric], but
  implementation of most is planned.

  Some of the improvements for [numarray] are:

  (1) Element types are arranged in a hierarchical class structure to
  support isinstance() checks. [numeric] used only character type codes
  when specifying data types (but initializers in [numarray] can still
  accept the old character codes).

  (2) The type coercion rules have changed to preserve types in arrays
  (more often), rather than converting to the type of a Python scalar.

  (3) Added array attributes (not just getters and setters).

  (4) More flexible exception handling.

  New users probably do not to worry about any of these changes; it is
  best to start out with [numarray] at this point, in most cases.

TIMING EXAMPLE
------------------------------------------------------------------------

  Let us try to get a sense of just how much faster operations in
  Numerical Python can be than standard Python.  As a toy task, we will
  create a sequence of numbers, then double each one.  First, a few
  variants on standard Python approaches:

      def timer(fun, n, comment=""):
          from time import clock
          start = clock()
          print comment, len(fun(n)), "elements",
          print "in %.2f seconds" % (clock()-start)
      def double1(n): return map(lambda n: 2*n, xrange(n))
      timer(double1, 5000000, "Running map() on xrange iterator:")
      def double2(n): return [2*n for n in xrange(n)]
      timer(double2, 5000000, "Running listcomp on xrange iter: ")
      def double3(n):
          double = []
          for n in xrange(n):
              double.append(2*n)
          return double
      timer(double3, 5000000, "Building new list from iterator: ")

  We can check whether any speed differences exist between 'map()', list
  comprehensions, and traditional loops.  What about standard module
  [array], which also requires homogeneous element types, perhaps it
  will run faster:

      import array
      def double4(n): return [2*n for n in array.array('i',range(n))]
      timer(double4, 5000000, "Running listcomp on array.array: ")

  Finally, let us look at how [numarray] does.  As an extra check, we
  should see whether we still benefit if it is necessary to convert an
  array back into a standard list:

      from numarray import *
      def double5(n): return 2*arange(n)
      timer(double5, 5000000, "Numarray scalar multiplication:  ")
      def double6(n): return (2*arange(n)).tolist()
      timer(double6, 5000000, "Numarray mult, returning list:   ")

  Now running it:

      $ python2.3 timing.py
      Running map() on xrange iterator: 5000000 elements in 13.61 seconds
      Running listcomp on xrange iter:  5000000 elements in 16.46 seconds
      Building new list from iterator:  5000000 elements in 20.13 seconds
      Running listcomp on array.array:  5000000 elements in 25.58 seconds
      Numarray scalar multiplication:   5000000 elements in 0.61 seconds
      Numarray mult, returning list:    5000000 elements in 3.70 seconds

  The minor differences between techniques for working with lists is
  perhaps worth noticing, as is the misstep in trying standard module
  [array]. But the general moral is that [numarray] performs the
  operation in less than 1/20th the time. Coercing back to a standard
  list loses a lot of that gain.

  Do not put too much weight in such a simple comparison, but the
  speedup is probably about typical.  For big scientific calculations,
  moving from months to days--or days to hours--become quite worthwhile.

MODELING SYSTEMS
------------------------------------------------------------------------

  The typical use case for Numerical Python is in scientific modeling,
  or perhaps in related areas like graphic manipulations and
  convolutions or signal processing. To illustrate a number of features
  of [numarray], I will use a moderately realistic problem. Suppose you
  have a 3D physical space over which a quantity varies. Abstractly, any
  parametric space, with any number of dimensions, is equally applicable
  to [numarray]. But it is familiar and easy to imagine, for example, a
  room in which temperature varies from point to point. As winter comes
  to my New England home, the example starts to seem germane.

  While this point is perhaps obvious, it is worth mentioning
  explicitly.  My below examples work with smallish arrays because it is
  easier to illustrate with them.  But [numarray] remains fast for
  arrays with millions of elements, rather than merely dozens; the
  former are probably more common for real scientific models.

  As a start, let us create the "room".  There are a number of ways to
  do this--the most general is with the callable 'array()'.  Using that,
  we can generate Numerical arrays with a variety of initialization
  parameters, including initial data from any Python sequences.  But for
  our room, the convenience function 'zeros()' produces a nice--evenly
  chilly--room:

      >>> from numarray import *
      >>> room = zeros((4,3,5),Float)
      >>> print room
      [[[ 0.  0.  0.  0.  0.]
        [ 0.  0.  0.  0.  0.]
        [ 0.  0.  0.  0.  0.]]

       [[ 0.  0.  0.  0.  0.]
        [ 0.  0.  0.  0.  0.]
        [ 0.  0.  0.  0.  0.]]

       [[ 0.  0.  0.  0.  0.]
        [ 0.  0.  0.  0.  0.]
        [ 0.  0.  0.  0.  0.]]

       [[ 0.  0.  0.  0.  0.]
        [ 0.  0.  0.  0.  0.]
        [ 0.  0.  0.  0.  0.]]]

  Each 2-D "matrix" printed represents on horizontal slice of the
  3-D room--we will call it top-to-bottom.  First thing, let us raise
  the whole room temperature to a comfortable 70 degree Farenheit:

      >>> room += 70
      >>> print room
      [[[ 70.  70.  70.  70.  70.]
        [ 70.  70.  70.  70.  70.]
        [ 70.  70.  70.  70.  70.]]

       [[ 70.  70.  70.  70.  70.]
        [ 70.  70.  70.  70.  70.]
        [ 70.  70.  70.  70.  70.]]

       [[ 70.  70.  70.  70.  70.]
        [ 70.  70.  70.  70.  70.]
        [ 70.  70.  70.  70.  70.]]

       [[ 70.  70.  70.  70.  70.]
        [ 70.  70.  70.  70.  70.]
        [ 70.  70.  70.  70.  70.]]]

  There is a very important difference between Numerical arrays and
  Python lists to keep in mind for the rest of our operations. When you
  take slices of arrays--and we will see that slicing is quite flexible
  and powerful for multiple dimensions--what you get is not a copy, but
  a "view." There are multiple ways to point to the same data. Let us
  take a look. Suppose our room has a draft that comes in and cools the
  floor by four degrees:

      >>> floor = room[3]
      >>> floor -= 4
      >>> print room
      [[[ 70.  70.  70.  70.  70.]
        [ 70.  70.  70.  70.  70.]
        [ 70.  70.  70.  70.  70.]]

       [[ 70.  70.  70.  70.  70.]
        [ 70.  70.  70.  70.  70.]
        [ 70.  70.  70.  70.  70.]]

       [[ 70.  70.  70.  70.  70.]
        [ 70.  70.  70.  70.  70.]
        [ 70.  70.  70.  70.  70.]]

       [[ 66.  66.  66.  66.  66.]
        [ 66.  66.  66.  66.  66.]
        [ 66.  66.  66.  66.  66.]]]

  In contrast, the fireplace on the north wall heats each position next
  to it by 8 degrees, while itself being an ambient 90 degrees:

      >>> north = room[:,0]
      >>> near_fireplace = north[2:4,2:5]
      >>> near_fireplace += 8
      >>> north[3,2] = 90  # the fireplace cell itself
      >>> print room
      [[[ 70.  70.  70.  70.  70.]
        [ 70.  70.  70.  70.  70.]
        [ 70.  70.  70.  70.  70.]]

       [[ 70.  70.  70.  70.  70.]
        [ 70.  70.  70.  70.  70.]
        [ 70.  70.  70.  70.  70.]]

       [[ 70.  78.  78.  78.  70.]
        [ 70.  70.  70.  70.  70.]
        [ 70.  70.  70.  70.  70.]]

       [[ 66.  74.  90.  74.  66.]
        [ 66.  66.  66.  66.  66.]
        [ 66.  66.  66.  66.  66.]]]

  Here we get into some tricker indexing, where slices can be indicated
  along multiple dimensions.  These views are nice to keep around.  For
  example, you might wonder what the current temperatures are over the
  north wall:

      >>> print north
      [[ 70.  70.  70.  70.  70.]
       [ 70.  70.  70.  70.  70.]
       [ 70.  78.  78.  78.  70.]
       [ 66.  74.  90.  74.  66.]]

MORE OPERATIONS
------------------------------------------------------------------------

  This introductory look will only a small part of all the convenience
  functions and array methods/attributes that are available in
  [numarray].  My hope is only to give reader a general feel; the
  [numarray] documentation is quite excellent.

  Now that our room has been subject to various temperature fluctuation,
  we might want to judge its overall state.  For example, what is the
  current average temperature in the room:

      >>> add.reduce(room.flat)/len(room.flat)
      70.066666666666663

  This merits a bit of explanation. All the operations you can perform
  on arrays have corresponding -Universal Functions- (ufuncs).  So where
  we had done something like 'floor-= 4' in the above code, we could
  have equally well used 'subtract(floor,4,floor)'.  With three
  arguments, to 'substract()', the operation is performed in-place. You
  could also create a copy of 'floor' using 'floor=subtract(floor,4)';
  but this is probably not what you want, since the change will be in a
  new array, not a subsection of 'room'.

  However, ufuncs are not merely functions. They are callable objects
  that also have some methods of their own: '.reduce()' is probably the
  most useful of these. It works just like the Python built-in function
  'reduce()', with each operation being the underlying ufunc (the
  methods are much faster though when applied to Numerical arrays). In
  other words 'add.reduce()' means 'sum()' and 'multiply.reduce()' means
  'product()'.(those shortcut names are defined too).

  Before you can sum the room cell temperatures, you need to take a
  flat, 1-D, view of the data.  Otherwise, you add along the primary
  dimension, producing an new array of reduced dimension, e.g.:

      >>> add.reduce(room)
      array([[ 276.,  292.,  308.,  292.,  276.],
             [ 276.,  276.,  276.,  276.,  276.],
             [ 276.,  276.,  276.,  276.,  276.]])

  Such dimensional summary can be useful, but it is not what we want in
  this case.

  Since we are modeling a physical system, let us add a bit more
  realism.  Rooms have air micro-currents in them that mix up
  temperature.  We might model that by supposing that every once in a
  while, a cell adjusts to the temperatures around it:

      >>> def equalize(room):
      ...     z,y,x = map(randint, (1,1,1), room.shape)
      ...     zmin,ymin,xmin = maximum([z-2,y-2,x-2],[0,0,0]).tolist()
      ...     zmax,ymax,xmax = [z+1,y+1,x+1]
      ...     region = room[zmin:zmax,ymin:ymax,xmin:xmax].copy()
      ...     room[z-1,y-1,x-1] = sum(region.flat)/len(region.flat)
      ...     return room

  This model is a bit unrealistic, of course: cells pick up on the
  temperature around them, but do not impart a differential back to
  their neighbors. Still, let us look at what it does. First we pick a
  random cell--or actually we pick one more than the cell itself in each
  dimension, since '.shape' gives us the length rather than the maximum
  index. The value 'zmin' and friends make sure we only index down to
  zero, not in negative ranges; 'zmax' and friends are not really
  needed, since indexing past the size of an array dimension is just
  treated as the maximum value (much as with Python lists).

  Next thing, we need to define the region of adjacent cells.  Given our
  small room dimensions, picking a cell at a room surface, edge, or
  corner is rather common--the 'region' of a cell is likely to be
  smaller than the maximum 27-elements of a 3x3x3 subsection.  That is
  not a problem, we just need to be sure to use the right denominator
  for our average.  This new average temperature is assigned to the
  randomly chosen cell.

  You can perform as many equalization events as you wish to in your
  model.  Each call just adjusts one cell.  Asymptotically, a lot of
  calls will lead to equalization of the temperature in parts of the
  room.  Even though arrays are mutable, the function 'equalize()'
  returns its array.  This is useful if you only want to equalize a
  -copy- of the model:

      >>> print equalize(room.copy())
      [[[ 70.        70.        70.        70.        70.      ]
        [ 70.        70.        70.        70.        70.      ]
        [ 70.        70.        70.        70.        70.      ]]

       [[ 70.        70.        71.333333  70.        70.      ]
        [ 70.        70.        70.        70.        70.      ]
        [ 70.        70.        70.        70.        70.      ]]

       [[ 70.        78.        78.        78.        70.      ]
        [ 70.        70.        70.        70.        70.      ]
        [ 70.        70.        70.        70.        70.      ]]

       [[ 66.        74.        90.        74.        66.      ]
        [ 66.        66.        66.        66.        66.      ]
        [ 66.        66.        66.        68.        66.      ]]]

CONCLUSION
------------------------------------------------------------------------

  This introduction touched on only a bit of what [numarry] can do.  The
  full capabilities are quite fancy.  For example, you can populate
  arrays based on populating functions, which is likely to be useful for
  physical models.  You can specify subsections of arrays not only with
  slices, but also with index arrays--this not only lets you operate on
  non-contiguous segments of an array, it also lets you--via the
  'take()' function, redimension and reform arrays in interesting ways.
  Moreover, for the most part, I have presented only operations between
  scalars and arrays; you can also perform operations between arrays,
  including ones of different dimensionality.  There is a lot to it, but
  the API manages to remain fairly intuitive throughout.

  I encourage readers to install [numarray] and/or [numeric] on their
  own systems.  It is not hard to get started, and the fast
  manipulations provided have a surprisingly large area of
  application--often where you would not, at first, expect it.

RESOURCES
------------------------------------------------------------------------

  The Numerical Python homepage is at the below URL.  This has good
  links to documentation on both [numarray] and [numeric].

    http://www.pfdubois.com/numpy/

  The project itself is hosted by Sourceforge, and you can find the
  Numerical Python project page at:

    http://sourceforge.net/projects/numpy

ABOUT THE AUTHOR
------------------------------------------------------------------------

  {Picture of Author: http://gnosis.cx/cgi-bin/img_dqm.cgi}
  While David Mertz also likes laziness and impatience, this
  installment is about hubris.  David may be reached at
  mertz@gnosis.cx; his life pored over at http://gnosis.cx/publish/.
  Suggestions and recommendations on this, past, or future, columns
  are welcomed.  Check out his book, _Text Processing in Python_ at
  <http://gnosis.cx/TPiP>.