THIS IS ENDMATTER
APPENDIX -- A Selective and Impressionistic Short Review of Python
-------------------------------------------------------------------
A reader who is coming to Python for the first time would be well
served by reading Guido van Rossum's _Python Tutorial_, which is
freely available online, or by picking up one of the
several excellent books devoted to teaching Python to novices. As
indicated in the Preface, the audience of this book is a bit
different.
That said, some readers of this book might use Python only
infrequently, or not have used Python for a while, or may be
sufficiently versed in numerous other programming languages that
a quick review of Python constructs suffices for understanding.
This appendix will briefly mention each major element of the
Python language itself, but will not address any libraries (even
standard and ubiquitous ones that may be discussed in the main
chapters). Not all fine points of syntax and semantics will be
covered here, either. This review, however, should suffice for a
reader to understand all the examples in this book.
Even readers who are familiar with Python might enjoy skimming
this review. The focus and spin of this summary are a bit
different from most introductions. I believe that the way I
categorize and explain a number of language features can
provide a moderately novel--but equally accurate--perspective
on the Python language. Ideally, a Python programmer will come
away from this review with a few new insights on the familiar
constructs she uses every day. This appendix does not shy
away from using some abstract terms from computer science--if
a particular term is not familiar to you, you will not lose
much by skipping over the sentence it occurs in; some of these
terms are explained briefly in the Glossary.
SECTION -- What Kind of Language is Python?
--------------------------------------------------------------------
Python is a byte-code compiled programming language that supports
multiple programming paradigms. Python is sometimes called an
interpreted and/or scripting language because no separate
compilation step is required to run a Python program; in more
precise terms, Python uses a virtual machine (much like Java or
Smalltalk) to run machine-abstracted instructions. In most
situations a byte-code compiled version of an application is
cached to speed future runs, but wherever necessary compilation
is performed "behind the scenes."
In the broadest terms, Python is an imperative programming
language, rather than a declarative (functional or logical) one.
Python is dynamically and strongly typed, with very late binding
compared to most languages. In addition, Python is an
object-oriented language with strong introspective facilities,
and one that generally relies on conventions rather than
enforcement mechanisms to control access and visibility of names.
Despite its object-oriented core, much of the syntax of Python is
designed to allow a convenient procedural style that masks the
underlying OOP mechanisms. Although Python allows basic
functional programming (FP) techniques, side effects are the
norm, evaluation is always strict, and no compiler optimization
is performed for tail recursion (nor on almost any other
construct).
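As a small illustration of "dynamically and strongly typed," the
sketch below rebinds one name to objects of different types and
then shows that mixing unrelated types raises an error instead of
coercing silently. The names 'value' and 'outcome' are invented
for the example, and the spelling assumes a present-day Python 3
interpreter rather than the Python 2 used elsewhere in this
appendix.

```python
# Dynamic typing: a name carries no type; the object it is bound to does.
value = 37
first_type = type(value).__name__       # 'int'
value = "thirty-seven"                  # same name, rebound to a str
second_type = type(value).__name__      # 'str'

# Strong typing: no silent coercion between unrelated types.
try:
    outcome = value + 1
except TypeError:
    outcome = "TypeError"
```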
Python has a small set of reserved words, delimits blocks and
structure based on indentation only, has a fairly rich collection
of built-in data structures, and is generally both terse and
readable compared to other programming languages. Much of the
strength of Python lies in its standard library and in a flexible
system of importable modules and packages.
SECTION -- Namespaces and Bindings
--------------------------------------------------------------------
The central concept in Python programming is that of a namespace.
Each context (i.e., scope) in a Python program has available to
it a hierarchically organized collection of namespaces; each
namespace contains a set of names, and each name is bound to an
object. In older versions of Python, namespaces were arranged
according to the "three-scope rule" (builtin/global/local), but
Python 2.2 and later (and Python 2.1, as an opt-in feature) add
lexically nested scoping. In
most cases you do not need to worry about this subtlety, and
scoping works the way you would expect (the special cases that
prompted the addition of lexical scoping are mostly ones with
nested functions and/or classes).
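A minimal sketch of the nested-function case may help. The
function names 'make_counter' and 'step' are hypothetical; the
point is only that the inner function can see a name bound in the
enclosing function's scope, which the old three-scope rule did not
allow.

```python
def make_counter(start):
    count = [start]              # lives in the enclosing function's scope

    def step():
        count[0] += 1            # inner function sees the enclosing 'count'
        return count[0]

    return step

counter = make_counter(10)
results = [counter(), counter()]   # [11, 12]
```

A one-element list is used so that the inner function only
mutates an object and never rebinds the enclosing name.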
There are quite a few ways of binding a name to an object
within the current namespace/scope and/or within some other
scope. These various ways are listed below.
TOPIC -- Assignment and Dereferencing
--------------------------------------------------------------------
A Python statement like 'x=37' or 'y="foo"' does a few things. If
an object--e.g., '37' or '"foo"'--does not exist, Python creates
one. If such an object -does- exist, Python locates it. Next, the
name 'x' or 'y' is added to the current namespace, if it does not
exist already, and that name is bound to the corresponding
object. If a name already exists in the current namespace, it is
re-bound. Multiple names, perhaps in multiple scopes/namespaces,
can be bound to the same object.
A simple assignment statement binds a name into the current
namespace, unless that name has been declared as global. A
name declared as global is bound to the global (module-level)
namespace instead. A qualified name used on the left of an
assignment statement binds a name into a specified
namespace--either to the attributes of an object, or to the
namespace of a module/package, for example:
>>> x = "foo"              # bind 'x' in global namespace
>>> def myfunc():          # bind 'myfunc' in global namespace
...     global x, y        # specify namespace for 'x', 'y'
...     x = 1              # rebind global 'x' to 1 object
...     y = 2              # create global name 'y' and 2 object
...     z = 3              # create local name 'z' and 3 object
...
>>> import package.module  # bind name 'package' (with attr 'module')
>>> package.module.w = 4   # bind 'w' in namespace package.module
>>> from mymod import obj  # bind name 'obj' in global namespace
>>> obj.attr = 5           # bind name 'attr' in namespace of 'obj'
Whenever a (possibly qualified) name occurs on the right side of
an assignment, or on a line by itself, the name is dereferenced to
the object itself. If a name has not been bound inside some
accessible scope, it cannot be dereferenced; attempting to do so
raises a 'NameError' exception. If the name is followed by left
and right parentheses (possibly with comma-separated expressions
between them), the object is invoked/called after it is
dereferenced. Exactly what happens upon invocation can be
controlled and overridden for Python objects; but in general,
invoking a function or method runs some code, and invoking a
class creates an instance. For example:
>>> pkg.subpkg.func() # invoke a function from a namespace
>>> x = y # deref 'y' and bind same object to 'x'
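The two behaviors just described, several names bound to one
object and a 'NameError' for an unbound name, can be sketched as
follows. The name 'unbound_name_for_demo' is deliberately never
bound, and the 'except ... as' spelling assumes Python 2.6 or
later (including Python 3).

```python
x = ["spam"]
y = x                      # deref 'x'; bind the same object to 'y'
y.append("eggs")           # the change is visible through both names
same_object = x is y

try:
    unbound_name_for_demo  # never bound in any accessible scope
except NameError as err:
    message = str(err)
```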
TOPIC -- Function and Class Definitions
--------------------------------------------------------------------
Declaring a function or a class is simply the preferred way of
describing an object and binding it to a name. But the 'def' and
'class' declarations are "deep down" just types of assignments.
In the case of functions, the `lambda` operator can also be used
on the right of an assignment to bind an "anonymous" function to
a name. There is no equally direct technique for classes, but
their declaration is still similar in effect:
>>> add1 = lambda x,y: x+y # bind 'add1' to function in global ns
>>> def add2(x, y):          # bind 'add2' to function in global ns
...     return x+y
...
>>> class Klass:             # bind 'Klass' to class object
...     def meth1(self):     # bind 'meth1' to method in 'Klass' ns
...         return 'Myself'
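To see that 'def' really is a kind of assignment, the following
sketch (with made-up names 'plus', 'same', and 'total') binds a
second name to a function object, removes the original binding,
and still calls the function through the surviving name.

```python
def add2(x, y):
    return x + y

plus = add2                 # a second name bound to the same function object
add1 = lambda x, y: x + y   # an anonymous function bound by plain assignment

same = plus is add2
names = (add2.__name__, add1.__name__)   # ('add2', '<lambda>')
del add2                    # removes one binding; the object survives via 'plus'
total = plus(2, 3)
```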
TOPIC -- 'import' Statements
--------------------------------------------------------------------
Importing, or importing -from-, a module or a package adds or
modifies bindings in the current namespace. The 'import'
statement has two forms, each with a slightly different effect.
Statements of the forms:
>>> import modname
>>> import pkg.subpkg.modname
>>> import pkg.modname as othername
add a new module object to the current namespace. These
module objects themselves define namespaces that you can
bind values in or utilize objects within.
Statements of the forms:
>>> from modname import foo
>>> from pkg.subpkg.modname import foo as bar
...instead add the names 'foo' or 'bar' to the current namespace.
In any of these forms of 'import', any statements in the imported
module are executed--the difference between the forms is simply
the effect upon namespaces.
There is one more special form of the 'import' statement; for
example:
>>> from modname import *
The asterisk in this form is not a generalized glob or regular
expression pattern; it is a special syntactic form. "Import star"
imports every name in a module namespace into the current
namespace (except those named with a leading underscore, which
can still be explicitly imported if needed). Use of this form is
somewhat discouraged because it risks adding names to the current
namespace that you do not explicitly request and that may rebind
existing names.
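The underscore rule can be checked without writing a file to
disk. The sketch below is only an illustration: it builds a
throwaway in-memory module under the hypothetical name 'demo_mod',
registers it in `sys.modules`, and performs the imports inside a
scratch namespace dictionary 'ns'. It assumes a Python 3
interpreter, where 'exec' is a function.

```python
import sys
import types

# Build a throwaway module in memory (hypothetical name 'demo_mod').
demo_mod = types.ModuleType("demo_mod")
demo_mod.public = 1
demo_mod._private = 2
sys.modules["demo_mod"] = demo_mod

ns = {}
exec("from demo_mod import *", ns)          # star import into namespace 'ns'
star_names = sorted(n for n in ns if not n.startswith("__"))

exec("from demo_mod import _private", ns)   # explicit import still works
del sys.modules["demo_mod"]                 # clean up the registry
```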
TOPIC -- 'for' Statements
--------------------------------------------------------------------
Although 'for' is a looping construct, the way it works is by
binding successive elements of an iterable object to a name (in
the current namespace). The following constructs are (almost)
equivalent:
>>> for x in somelist:  # repeated binding with 'for'
...     print x
...
>>> ndx = 0             # rebinds 'ndx' if it was defined
>>> while 1:            # repeated binding in 'while'
...     x = somelist[ndx]
...     print x
...     ndx = ndx+1
...     if ndx >= len(somelist):
...         del ndx
...         break
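A closer equivalent in later Python versions goes through the
iterator protocol instead of an index. The sketch below assumes
the built-in `iter()` and `next()` functions (Python 2.6+ and
Python 3) and shows that the loop name stays bound after the loop
finishes; 'seen_for' and 'seen_iter' are names invented for the
comparison.

```python
somelist = ["a", "b", "c"]

seen_for = []
for x in somelist:          # binds 'x' to each element in turn
    seen_for.append(x)
last_bound = x              # 'x' stays bound after the loop ends

seen_iter = []
it = iter(somelist)         # the iterator that 'for' uses internally
while True:
    try:
        x = next(it)
    except StopIteration:   # exhaustion is signalled by an exception
        break
    seen_iter.append(x)
```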
TOPIC -- 'except' Statements
--------------------------------------------------------------------
The 'except' statement can optionally bind a name to an exception
argument:
>>> try:
...     raise "ThisError", "some message"
... except "ThisError", x:    # Bind 'x' to exception argument
...     print x
...
some message
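String exceptions of the kind shown above were removed from later
Python versions, which require exceptions to be class instances
and spell the binding with 'as'. A sketch of the same idea under
that assumption, using a hypothetical exception class named
'ThisError':

```python
class ThisError(Exception):
    pass

try:
    raise ThisError("some message")
except ThisError as x:          # bind 'x' to the exception instance
    caught = str(x)
    args = x.args
```

In Python 3 the name 'x' is unbound again when the 'except' block
ends, which is why the values of interest are copied to other
names inside the block.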
SECTION -- Datatypes
--------------------------------------------------------------------
Python has a rich collection of basic datatypes. All of Python's
collection types allow you to hold heterogeneous elements inside
them, including other collection types (with minor limitations).
It is straightforward, therefore, to build complex data
structures in Python.
Unlike many languages, Python datatypes come in two varieties:
mutable and immutable. All of the atomic datatypes are immutable,
as is the collection type 'tuple'. The collections 'list' and
'dict' are mutable, as are class instances. The mutability of a
datatype is simply a question of whether objects of that type can
be changed "in place"--an immutable object can only be created
and destroyed, but never altered during its existence. One upshot
of this distinction is that immutable objects may act as
dictionary keys, but mutable objects may not. Another upshot is
that when you want a data structure--especially a large one--that
will be modified frequently during program operation, you should
choose a mutable datatype (usually a list).
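The consequences of mutability can be sketched in a few lines.
The names below are illustrative only; the behavior shown (lists
rejected as dictionary keys, tuples rejecting item assignment)
holds in both old and current Python versions.

```python
point = (3, 4)                 # immutable tuple
lookup = {point: "tuple key"}  # fine: a tuple of hashables is hashable

try:
    lookup[[3, 4]] = "list key"
except TypeError:
    list_key_ok = False
else:
    list_key_ok = True

items = [1, 2, 3]
before = id(items)
items.append(4)                # changed in place: same object
same_list = id(items) == before

try:
    point[0] = 99
except TypeError:
    tuple_changed = False
```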
Most of the time, if you want to convert values between different
Python datatypes, an explicit conversion/encoding call is
required, but numeric types contain promotion rules to allow
numeric expressions over a mixture of types. The built-in
datatypes are listed below with discussions of each. The built-in
function `type()` can be used to check the datatype of an object.
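A brief sketch of the difference between numeric promotion (which
is automatic) and other conversions (which must be requested
explicitly). The variable names are invented for the example, and
the type names reported are those of a Python 3 interpreter.

```python
mixed = 1 + 2.5                  # int promoted to float
promoted = type(mixed).__name__
cplx = type(2 + 1.5 + 1j).__name__

as_int = int("42")               # explicit str -> int
as_float = float("1e3")          # explicit str -> float
truncated = int(3.99)            # float -> int truncates toward zero

try:
    "42" + 1                     # no implicit str/number conversion
except TypeError:
    implicit = False
```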
TOPIC -- Simple Types
--------------------------------------------------------------------
bool
Python 2.3+ supports a Boolean datatype with the possible
values 'True' and 'False'. In earlier versions of Python,
these values are typically called '1' and '0'; even in
Python 2.3+, the Boolean values behave like numbers in
numeric contexts. Some earlier micro-releases of Python
(e.g., 2.2.1) include the -names- 'True' and 'False', but
not the Boolean datatype.
int
A signed integer in the range indicated by the register
size of the interpreter's CPU/OS platform. For most current
platforms, integers range from -(2**31) to (2**31)-1.
You can find the largest value on your platform by
examining `sys.maxint`. Integers are the bottom numeric
type in terms of promotions; nothing gets promoted -to- an
integer, but integers are sometimes promoted to other
numeric types. A float, long, or string may be explicitly
converted to an int using the `int()` function.
SEE ALSO, [int]
long
An (almost) unlimited size integral number. A long literal
is indicated by an integer followed by an 'l' or 'L' (e.g.,
'34L', '9876543210l'). In Python 2.2+, operations on ints
that overflow `sys.maxint` are automatically promoted to
longs. An int, float, or string may be explicitly
converted to a long using the `long()` function.
float
An IEEE754 floating point number. A literal floating point
number is distinguished from an int or long by containing a
decimal point and/or exponent notation (e.g., '1.0', '1e3',
'.453e-12', '37.'). A numeric expression that involves both
int/long types and float types promotes all component types
to floats before performing the computation. An int, long,
or string may be explicitly converted to a float using the
`float()` function.
SEE ALSO, [float]
complex
An object containing two floats, representing real and
imaginary components of a number. A numeric expression
that involves both int/long/float types and complex types
promotes all component types to complex before performing
the computation. There is no way to spell a literal
complex in Python, but an addition such as '1.1+2j' is the
usual way of computing a complex value. A 'j' or 'J'
following a float or int literal indicates an imaginary
number. An int, long, float, or string may be explicitly
converted to a complex using the `complex()` function. If
two float/int arguments are passed to `complex()`, the
second is the imaginary component of the constructed
number (e.g., 'complex(1.1,2)').
string
An immutable sequence of 8-bit character values. Unlike in
many programming languages, there is no "character" type
in Python, merely strings that happen to have length one.
String objects have a variety of methods to modify strings,
but such methods always return a new string object rather
than modify the initial object itself. The built-in
`chr()` function will return a length-one string whose
ordinal value is the passed integer. The `str()` function
will return a string representation of a passed in object.
For example:
>>> ord('a')
97
>>> chr(97)
'a'
>>> str(97)
'97'
SEE ALSO, [string]
unicode
An immutable sequence of Unicode characters. There is no
datatype for a single Unicode character, but unicode
strings of length one contain a single character. Unicode
strings contain a similar collection of methods to string
objects, and like the latter, unicode methods return new
unicode objects rather than modify the initial object. See
Chapter 2 and Appendix C for additional discussion of
Unicode.
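The immutability of strings described above can be sketched as
follows. The example assumes Python 3, where the single 'str'
type holds Unicode text, but the behavior shown is the same for
the string types of this appendix.

```python
s = "spam"
t = s.upper()              # returns a new string; 's' is untouched

try:
    s[0] = "S"             # item assignment is not allowed
except TypeError:
    assignable = False

rebuilt = "S" + s[1:]      # build a new string from pieces instead
one_char = s[0]            # a "character" is just a length-one string
```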
TOPIC -- String Interpolation
--------------------------------------------------------------------
Literal strings and unicode strings may contain embedded format
codes. When a string contains format codes, values may be
-interpolated- into the string using the '%' operator and a
tuple or dictionary giving the values to substitute in.
Strings that contain format codes may follow either of two
patterns. The simpler pattern uses format codes with the syntax
'%[flags][len[.precision]]<type>'. Interpolating a string with
format codes on this pattern requires '%' combination with a
tuple of matching length and content datatypes. If only one
value is being interpolated, you may give the bare item rather
than a tuple of length one. For example:
>>> "float %3.1f, int %+d, hex %06x" % (1.234, 1234, 1234)
'float 1.2, int +1234, hex 0004d2'
>>> '%e' % 1234
'1.234000e+03'
>>> '%e' % (1234,)
'1.234000e+03'
The (slightly) more complex pattern for format codes embeds a
name within the format code, which is then used as a string key
to an interpolation dictionary. The syntax of this pattern is
'%(key)[flags][len[.precision]]<type>'. Interpolating a string
with this style of format codes requires '%' combination with a
dictionary that contains all the named keys, and whose
corresponding values contain acceptable datatypes. For example:
>>> dct = {'ratio':1.234, 'count':1234, 'offset':1234}
>>> "float %(ratio)3.1f, int %(count)+d, hex %(offset)06x" % dct
'float 1.2, int +1234, hex 0004d2'
You -may not- mix tuple interpolation and dictionary
interpolation within the same string.
I mentioned that datatypes must match format codes. Different
format codes accept a different range of datatypes, but the
rules are almost always what you would expect. Generally,
numeric data will be promoted or demoted as necessary, but
strings and complex types cannot be used for numbers.
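A short sketch of these matching rules, as observed on a Python 3
interpreter. The variable names are illustrative only.

```python
demoted = "%d" % 3.7           # float truncated for an integer code
promoted = "%f" % 3            # int shown as a float
texty = "%s and %r" % (3, "3")

try:
    "%d" % "3"                 # a string is not accepted as a number
except TypeError:
    string_as_number = False

try:
    "%f" % (1 + 2j)            # nor is a complex value
except TypeError:
    complex_as_float = False
```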
One useful style of using dictionary interpolation is against
the global and/or local namespace dictionary. Regular
bound names defined in scope can be interpolated into strings.
>>> s = "float %(ratio)3.1f, int %(count)+d, hex %(offset)06x"
>>> ratio = 1.234
>>> count = 1234
>>> offset = 1234
>>> s % globals()
'float 1.2, int +1234, hex 0004d2'
If you want to look for names across scope, you can create an
ad hoc dictionary with both local and global names:
>>> vardct = {}
>>> vardct.update(globals())
>>> vardct.update(locals())
>>> interpolated = somestring % vardct
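Inside a function the order of the two updates matters: because
the local names are merged last, they take precedence over global
names on a clash. A sketch, using a hypothetical function named
'report':

```python
ratio = 9.99                       # module-level name, shadowed below

def report(count):
    ratio = 1.234                  # local name
    vardct = {}
    vardct.update(globals())
    vardct.update(locals())        # locals win on a name clash
    return "ratio %(ratio)3.1f, count %(count)d" % vardct

line = report(1234)                # 'ratio 1.2, count 1234'
```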
The flags for format codes consist of the following:
#*--------------- Format code flags ----------------------#
0          Pad to length with leading zeros
-          Align the value to the left within its length
_          (space) Leave a blank before positive numbers
+          Explicitly indicate the sign of positive values
When a length is included, it specifies the -minimum- length of
the interpolated formatting. Numbers that will not fit within
a length simply occupy more characters than specified. When a
precision is included, the digits to the right of the decimal
point are counted as part of the total length:
>>> '[%f]' % 1.234
'[1.234000]'
>>> '[%5f]' % 1.234
'[1.234000]'
>>> '[%.1f]' % 1.234
'[1.2]'
>>> '[%5.1f]' % 1.234
'[ 1.2]'
>>> '[%05.1f]' % 1.234
'[001.2]'
The formatting types consist of the following:
#*-------------- Format type codes -----------------------#
d   Signed integer decimal
i   Signed integer decimal
o   Unsigned octal
u   Unsigned decimal
x   Lowercase unsigned hexadecimal
X   Uppercase unsigned hexadecimal
e   Lowercase exponential format floating point
E   Uppercase exponential format floating point
f   Floating point decimal format
g   Floating point: exponential format if exp < -4 or
    exp >= precision, decimal format otherwise
G   Uppercase version of 'g'
c   Single character: integer for chr(i) or length-one string
r   Converts any Python object using repr()
s   Converts any Python object using str()
%   The '%' character, e.g.: '%%%d' % (1) --> '%1'
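A few of the type codes are easier to grasp from examples. The
sketch below exercises 'g' on both sides of its switch-over
points (it uses exponential form when the exponent is below -4 or
not below the precision, which defaults to 6), and contrasts 's'
with 'r'. The list name 'samples' is invented for the example.

```python
samples = [
    "%g" % 123456.0,       # exponent 5 < precision 6: decimal form
    "%g" % 1234567.0,      # exponent 6 >= precision 6: exponential form
    "%g" % 0.0001,         # exponent -4: decimal form
    "%g" % 0.00001,        # exponent -5 < -4: exponential form
    "%s|%r" % ("a", "a"),  # str() versus repr()
    "%c%c" % (97, "b"),    # integer code point or length-one string
    "%x %X %o" % (255, 255, 8),
]
```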
One more special format code style allows the use of a '*' in
place of a length. In this case, the interpolated tuple must
contain an extra element for the formatted length of each
format code, preceding the value to format. For example:
>>> "%0*d # %0*.2f" % (4, 123, 4, 1.23)
'0123 # 1.23'
>>> "%0*d # %0*.2f" % (6, 123, 6, 1.23)
'000123 # 001.23'
TOPIC -- Printing
--------------------------------------------------------------------
The least-sophisticated form of textual output in Python is
writing to open files. In particular, the STDOUT and STDERR
streams can be accessed using the pseudo-files `sys.stdout` and
`sys.stderr`. Writing to these is just like writing to any
other file; for example:
>>> import sys
>>> try:
...     # some fragile action
...     sys.stdout.write('result of action\n')
... except:
...     sys.stderr.write('could not complete action\n')
...
result of action
You cannot seek within STDOUT or STDERR--generally you should
consider these as pure sequential outputs.
Writing to STDOUT and STDERR is fairly inflexible, and most of
the time the 'print' statement accomplishes the same purpose
more flexibly. In particular, methods like `sys.stdout.write()`
only accept a single string as an argument, while 'print' can
handle any number of arguments of any type. Each argument is
coerced to a string using the equivalent of 'str(obj)'. For
example:
>>> print "Pi: %.3f" % 3.1415, 27+11, {3:4,1:2}, (1,2,3)
Pi: 3.142 38 {1: 2, 3: 4} (1, 2, 3)
Each argument to the 'print' statement is evaluated before it is
printed, just as when an argument is passed to a function. As a
consequence, the canonical representation of an object is
printed, rather than the exact form passed as an argument. In my
example, the dictionary prints in a different order than it was
defined in, and the spacing of the tuple and dictionary is
slightly different. String interpolation is also performed and is
a very common means of defining an output format precisely.
There are a few things to watch for with the 'print' statement.
A space is printed between each argument to the statement. If
you want to print several objects without a separating space,
you will need to use string concatenation or string
interpolation to get the right result. For example:
>>> numerator, denominator = 3, 7
>>> print repr(numerator)+"/"+repr(denominator)
3/7
>>> print "%d/%d" % (numerator, denominator)
3/7
By default, a 'print' statement adds a linefeed to the end of
its output. You may eliminate the linefeed by adding a
trailing comma to the statement, but you still wind up with a
space added to the end:
>>> letlist = ('a','B','Z','r','w')
>>> for c in letlist: print c, # inserts spaces
...
a B Z r w
Assuming these spaces are unwanted, you must either use
`sys.stdout.write()` or otherwise calculate the space-free
string you want:
>>> for c in letlist+('\n',):    # no spaces
...     sys.stdout.write(c)
...
aBZrw
>>> print ''.join(letlist)
aBZrw
There is a special form of the 'print' statement that redirects
its output somewhere other than STDOUT. The 'print' statement
itself can be followed by two greater-than signs, then a
writable file-like object, then a comma, then the remainder of
the (printed) arguments. For example:
>>> print >> open('test','w'), "Pi: %.3f" % 3.1415, 27+11
>>> open('test').read()
'Pi: 3.142 38\n'
Some Python programmers (including your author) consider this
special form overly "noisy," but it -is- occasionally useful
for quick configuration of output destinations.
If you want a function that would do the same thing as a
'print' statement, the following one does so, but without any
facility to eliminate the trailing linefeed or redirect output:
#*--------- Functional version of print statement --------#
def print_func(*args):
    import sys
    # 'print' displays str(obj), not repr(obj), for each argument
    sys.stdout.write(' '.join(map(str,args))+'\n')
Readers could enhance this to add the missing capabilities, but
using 'print' as a statement is the clearest approach,
generally.
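One possible enhancement is sketched here, under some
assumptions: the keyword names 'end' and 'file' are my choice for
this illustration (they happen to match the built-in `print()`
function of Python 3), each argument is converted with `str()` to
match what 'print' displays, and `io.StringIO` stands in for any
writable file-like object. The code is written for Python 3.

```python
import sys
import io

def print_func(*args, **kw):
    # 'end' and 'file' are hypothetical keyword names chosen for this sketch
    end = kw.get("end", "\n")
    out = kw.get("file", sys.stdout)
    out.write(" ".join(map(str, args)) + end)

buf = io.StringIO()
print_func("Pi: %.3f" % 3.1415, 27 + 11, (1, 2, 3), file=buf)
print_func("no", "newline", end="", file=buf)
captured = buf.getvalue()
```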
SEE ALSO, `sys.stderr`, `sys.stdout`
TOPIC -- Container Types
--------------------------------------------------------------------
tuple
An immutable sequence of (heterogeneous) objects. Being
immutable, the membership and length of a tuple cannot be
modified after creation. However, tuple elements and
subsequences can be accessed by subscripting and slicing,
and new tuples can be constructed from such elements and
slices. Tuples are similar to "records" in some other
programming languages.
The constructor syntax for a tuple is commas between listed
items; in many contexts, parentheses around a constructed
tuple are required to disambiguate it from other
constructs such as function arguments, but it is the commas,
not the parentheses, that construct a tuple. Some examples:
>>> tup = 'spam','eggs','bacon','sausage'
>>> newtup = tup[1:3] + (1,2,3) + (tup[3],)
>>> newtup
('eggs', 'bacon', 1, 2, 3, 'sausage')
The function `tuple()` may also be used to construct a
tuple from another sequence type (either a list or custom
sequence type).
SEE ALSO, [tuple]
list
A mutable sequence of objects. Like a tuple, list elements
can be accessed by subscripting and slicing; unlike a
tuple, list methods and index and slice assignments can
modify the length and membership of a list object.
The constructor syntax for a list is surrounding square
brackets. An empty list may be constructed with no objects
between the brackets; a length-one list can contain simply an
object name; longer lists separate each element object with
commas. Indexing and slices, of course, also use square
brackets, but the syntactic contexts are different in the
Python grammar (and common sense usually points out the
difference). Some examples:
>>> lst = ['spam', (1,2,3), 'eggs', 3.1415]
>>> lst[:2]
['spam', (1, 2, 3)]
The function `list()` may also be used to construct a
list from another sequence type (either a tuple or custom
sequence type).
SEE ALSO, [list]
dict
A mutable mapping between immutable keys and object values.
At most one entry in a dict exists for a given key; adding
the same key to a dictionary a second time overrides the
previous entry (much as with binding a name in a
namespace). Dicts are unordered, and entries are accessed
by using a key as an index; by creating lists of contained
objects using the methods '.keys()', '.values()', and
'.items()'; or--in recent Python versions--with the
'.popitem()' method. All the dict methods generate
contained objects in an unspecified order.
The constructor syntax for a dict is surrounding curly
brackets. An empty dict may be constructed with no objects
between the brackets. Each key/value pair entered into a
dict is separated by a colon, and successive pairs are
separated by commas. For example:
>>> dct = {1:2, 3.14:(1+2j), 'spam':'eggs'}
>>> dct['spam']
'eggs'
>>> dct['a'] = 'b' # add item to dict
>>> dct.items()
[('a', 'b'), (1, 2), ('spam', 'eggs'), (3.14, (1+2j))]
>>> dct.popitem()
('a', 'b')
>>> dct
{1: 2, 'spam': 'eggs', 3.14: (1+2j)}
In Python 2.2+, the function `dict()` may also be used to
construct a dict from a sequence of pairs or from a custom
mapping type. For example:
>>> d1 = dict([('a','b'), (1,2), ('spam','eggs')])
>>> d1
{'a': 'b', 1: 2, 'spam': 'eggs'}
>>> d2 = dict(zip([1,2,3],['a','b','c']))
>>> d2
{1: 'a', 2: 'b', 3: 'c'}
SEE ALSO, [dict]
sets.Set
Python 2.3+ includes a standard module that implements a
set datatype. For earlier Python versions, a number of
developers have created third-party implementations of
sets. If you have at least Python 2.2, you can obtain the
[sets] module from the Python CVS repository and use it
directly--you will need to add the definition
'True,False=1,0' to your local version, though.
A set is an unordered collection of hashable objects.
Unlike a list, no object can occur in a set more than once;
a set resembles a dict that has only keys but no values.
Sets utilize bitwise and Boolean syntax to perform basic
set-theoretic operations; a subset test does not have a
special syntactic form, instead using the '.issubset()' and
'.issuperset()' methods. You may also loop through set
members in an unspecified order. Some examples illustrate
the type:
>>> from sets import Set
>>> x = Set([1,2,3])
>>> y = Set((3,4,4,6,6,2)) # init with any seq
>>> print x, '//', y # make sure dups removed
Set([1, 2, 3]) // Set([2, 3, 4, 6])
>>> print x | y # union of sets
Set([1, 2, 3, 4, 6])
>>> print x & y # intersection of sets
Set([2, 3])
>>> print y-x # difference of sets
Set([4, 6])
>>> print x ^ y # symmetric difference
Set([1, 4, 6])
You can also check membership and iterate over set members:
>>> 4 in y # membership check
1
>>> x.issubset(y) # subset check
0
>>> for i in y:
...     print i+10,
...
12 13 14 16
>>> from operator import add
>>> plus_ten = Set(map(add, y, [10]*len(y)))
>>> plus_ten
Set([16, 12, 13, 14])
`sets.Set` also supports in-place modification of sets;
`sets.ImmutableSet`, naturally, does not allow
modification.
>>> x = Set([1,2,3])
>>> x |= Set([4,5,6])
>>> x
Set([1, 2, 3, 4, 5, 6])
>>> x &= Set([4,5,6])
>>> x
Set([4, 5, 6])
>>> x ^= Set([4,5])
>>> x
Set([6])
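Python versions after those covered here (2.4 onward, including
Python 3) provide built-in `set` and `frozenset` types with much
the same interface as `sets.Set` and `sets.ImmutableSet`. The
following sketch repeats the operations above under that
assumption and also shows the comparison-operator spelling of a
subset test; the variable names are illustrative.

```python
x = set([1, 2, 3])
y = set((3, 4, 4, 6, 6, 2))       # duplicates are dropped

union = x | y
common = x & y
only_y = y - x
either = x ^ y
is_subset = common <= x           # operator spelling of .issubset()

frozen = frozenset(x)             # immutable counterpart
try:
    frozen.add(4)                 # no mutating methods exist
except AttributeError:
    frozen_mutable = False
```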
TOPIC -- Compound Types
--------------------------------------------------------------------
class instance
A class instance defines a namespace, but this namespace's
main purpose is usually to act as a data container (but a
container that also knows how to perform actions; i.e., has
methods). A class instance (or any namespace) acts very
much like a dict in terms of creating a mapping between
names and values. Attributes of a class instance may be
set or modified using standard qualified names and may
also be set within class methods by qualifying with the
namespace of the first (implicit) method argument,
conventionally called 'self'. For example:
>>> class Klass:
...     def setfoo(self, val):
...         self.foo = val
...
>>> obj = Klass()
>>> obj.bar = 'BAR'
>>> obj.setfoo(['this','that','other'])
>>> obj.bar, obj.foo
('BAR', ['this', 'that', 'other'])
>>> obj.__dict__
{'foo': ['this', 'that', 'other'], 'bar': 'BAR'}
Instance attributes often dereference to other class
instances, thereby allowing hierarchically organized
namespace qualification to indicate a data structure.
Moreover, a number of "magic" methods named with leading
and trailing double-underscores provide optional syntactic
conveniences for working with instance data. The most
common of these magic methods is '.__init__()', which
initializes an instance (often utilizing arguments). For
example:
>>> class Klass2:
...     def __init__(self, *args, **kw):
...         self.listargs = args
...         for key, val in kw.items():
...             setattr(self, key, val)
...
>>> obj = Klass2(1, 2, 3, foo='FOO', bar=Klass2(baz='BAZ'))
>>> obj.bar.blam = 'BLAM'
>>> obj.listargs, obj.foo, obj.bar.baz, obj.bar.blam
((1, 2, 3), 'FOO', 'BAZ', 'BLAM')
There are quite a few additional "magic" methods that
Python classes may define. Many of these methods let class
instances behave more like basic datatypes (while still
maintaining special class behaviors). For example, the
'.__str__()' and '.__repr__()' methods control the string
representation of an instance; the '.__getitem__()' and
'.__setitem__()' methods allow indexed access to instance
data (either dict-like named indices, or list-like numbered
indices); methods like '.__add__()', '.__mul__()',
'.__pow__()', and '.__abs__()' allow instances to behave in
number-like ways. The _Python Reference Manual_ discusses
magic methods in detail.
In Python 2.2 and above, you can also let instances behave
more like basic datatypes by inheriting classes from these
built-in types. For example, suppose you need a datatype
whose "shape" contains both a mutable sequence of elements
and a '.foo' attribute. Two ways to define this datatype
are:
>>> class FooList(list):              # works only in Python 2.2+
...     def __init__(self, lst=[], foo=None):
...         list.__init__(self, lst)
...         self.foo = foo
...
>>> foolist = FooList([1,2,3], 'FOO')
>>> foolist[1], foolist.foo
(2, 'FOO')
>>> class OldFooList:                 # works in older Pythons
...     def __init__(self, lst=None, foo=None):
...         if lst is None: lst = []  # avoid one shared default list
...         self._lst, self.foo = lst, foo
...     def append(self, item):
...         self._lst.append(item)
...     def __getitem__(self, item):
...         return self._lst[item]
...     def __setitem__(self, item, val):
...         self._lst[item] = val
...     def __delitem__(self, item):
...         del self._lst[item]
...
>>> foolst2 = OldFooList([1,2,3], 'FOO')
>>> foolst2[1], foolst2.foo
(2, 'FOO')
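The magic methods mentioned earlier can be sketched with a small
class. 'Vec2' is a hypothetical two-component vector invented for
this illustration; it defines just enough methods to support
indexing, addition, `abs()`, and a readable representation.

```python
class Vec2:
    """Hypothetical two-component vector used only for illustration."""
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __repr__(self):
        return "Vec2(%r, %r)" % (self.x, self.y)
    def __getitem__(self, index):          # list-like numbered access
        return (self.x, self.y)[index]
    def __add__(self, other):              # number-like addition
        return Vec2(self.x + other.x, self.y + other.y)
    def __abs__(self):                     # magnitude via abs()
        return (self.x ** 2 + self.y ** 2) ** 0.5

v = Vec2(1, 1) + Vec2(2, 3)
shown, first, size = repr(v), v[0], abs(v)
```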
If you need more complex datatypes than the basic types, or even
than an instance whose class has magic methods, often these can
be constructed by using instances whose attributes are bound in
link-like fashion to other instances. Such bindings can be
constructed according to various topologies, including circular
ones (such as for modeling graphs). As a simple example, you
can construct a binary tree in Python using the following
node class:
>>> class Node:
...     def __init__(self, left=None, value=None, right=None):
...         self.left, self.value, self.right = left, value, right
...     def __repr__(self):
...         return str(self.value)    # also safe when value is None
...
>>> tree = Node(Node(value="Left Leaf"),
...             "Tree Root",
...             Node(left=Node(value="RightLeft Leaf"),
...                  right=Node(value="RightRight Leaf") ))
>>> tree,tree.left,tree.left.left,tree.right.left,tree.right.right
(Tree Root, Left Leaf, None, RightLeft Leaf, RightRight Leaf)
In practice, you would probably bind intermediate nodes to
names, in order to allow easy pruning and rearrangement.
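A minimal sketch of that practice follows, assuming a current Python
3 interpreter. The helper 'inorder()' and the "Right Branch" value
are hypothetical additions for illustration; only the 'Node' class
shape comes from the example above.

```python
class Node:
    def __init__(self, left=None, value=None, right=None):
        self.left, self.value, self.right = left, value, right

def inorder(node):
    """Return the values of a tree in left-to-right order."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)

# Intermediate nodes are bound to names before being linked together
left_leaf = Node(value="Left Leaf")
right_branch = Node(left=Node(value="RightLeft Leaf"),
                    value="Right Branch",
                    right=Node(value="RightRight Leaf"))
tree = Node(left_leaf, "Tree Root", right_branch)

values = inorder(tree)
right_branch.left = None     # pruning is just rebinding an attribute
pruned = inorder(tree)
```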
SEE ALSO, [int], [float], [list], [string], [tuple],
[UserDict], [UserList], [UserString]
SECTION -- Flow Control
--------------------------------------------------------------------
Depending on how you count them, Python has about a half-dozen flow
control mechanisms, far fewer than most programming languages
offer. Fortunately, Python's collection of mechanisms is well
chosen, with a high--but not obsessively high--degree of
orthogonality between them.
From the point of view of this introduction, exception handling
is mostly one of Python's flow control techniques. In a language
like Java, an application is probably considered "happy" if it
does not throw any exceptions at all, but Python programmers find
exceptions less "exceptional"--a perfectly good design might exit
a block of code -only- when an exception is raised.
Two additional aspects of the Python language are not usually
introduced in terms of flow control, but nonetheless amount to
such when considered abstractly. Both functional programming
style operations on lists and Boolean shortcutting are, at the
heart, flow control constructs.
TOPIC -- 'if'/'elif'/'else' Statements
--------------------------------------------------------------------
Choice between alternate code paths is generally performed with
the 'if' statement and its optional 'elif' and 'else' components.
An 'if' block is followed by zero or more 'elif' blocks; at the
end of the compound statement, zero or one 'else' blocks occur.
An 'if' statement is followed by a Boolean expression and a
colon. Each 'elif' is likewise followed by a Boolean expression
and colon. The 'else' statement, if it occurs, has no Boolean
expression after it, just a colon. Each statement introduces a
block containing one or more statements (indented on the
following lines or on the same line, after the colon).
Every expression in Python has a Boolean value, including every
bare object name or literal. The special value 'None' is false;
any empty container (list, dict, tuple) is considered false; an
empty string or unicode string is false; the number 0 (of any
numeric type) is false. As well, an instance whose class defines a
'.__nonzero__()' or '.__len__()' method is false if the relevant
method returns a false value ('.__nonzero__()' takes precedence
when both are defined). Without these special methods, every
instance is true. Much of the time, Boolean expressions consist of
comparisons between objects, where comparisons actually evaluate
to the canonical objects "0" or "1" (in Python 2.3+, the Boolean
objects 'False' and 'True', which compare equal to 0 and 1).
Comparisons are '<', '>', '==', '>=', '<=', '<>', '!=', 'is',
'is not', 'in', and 'not in'. Sometimes the unary operator 'not'
precedes such an expression.
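The truth-value rules can be checked directly. The sketch below
assumes a current Python 3 interpreter, where the '.__nonzero__()'
hook is spelled '.__bool__()'; the '.__len__()' hook works the same
way in both lines of Python. The classes 'Basket' and 'Plain' are
hypothetical names used only for illustration.

```python
class Basket:
    """Hypothetical container; its truth value follows '.__len__()'."""
    def __init__(self, items=()):
        self.items = list(items)
    def __len__(self):
        return len(self.items)

class Plain:
    """No special methods, so every instance is true."""
    pass

falsy = [[], {}, (), "", 0, 0.0, None, Basket()]
truthy = [[0], {"k": None}, (0,), " ", -1, Basket(["egg"]), Plain()]

all_false = not any([bool(obj) for obj in falsy])
all_true = all([bool(obj) for obj in truthy])
```

Note that a container holding only false items, such as '[0]', is
itself true; only emptiness matters.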
Only one block in an "if/elif/else" compound statement is executed
during any pass--if multiple conditions hold, the first one that
evaluates as true is followed. For example:
>>> if 2+2 <= 4:
...     print "Happy math"
...
Happy math
>>> x = 3
>>> if x > 4: print "More than 4"
... elif x > 3: print "More than 3"
... elif x > 2: print "More than 2"
... else: print "2 or less"
...
More than 2
>>> if isinstance(2, int):
...     print "2 is an int"     # 2.2+ test
... else:
...     print "2 is not an int"
...
2 is an int
Python has no "switch" statement to compare one value with
multiple candidate matches. Occasionally, the repetition of
an expression being compared on multiple 'elif' lines looks
awkward. A "trick" in such a case is to use a dict as a
pseudo-switch. The following are equivalent, for example:
>>> if var.upper() == 'ONE': val = 1
... elif var.upper() == 'TWO': val = 2
... elif var.upper() == 'THREE': val = 3
... elif var.upper() == 'FOUR': val = 4
... else: val = 0
...
>>> switch = {'ONE':1, 'TWO':2, 'THREE':3, 'FOUR':4}
>>> val = switch.get(var.upper(), 0)
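The same trick extends from values to actions, since functions are
ordinary objects that can be stored in a dict. The following is a
sketch only, written for a current Python 3 interpreter; the handler
names and command strings are hypothetical.

```python
def start():
    return "starting"

def stop():
    return "stopping"

def unknown():
    return "unknown command"

# Map each command name to the function that handles it
dispatch = {'START': start, 'STOP': stop}

def run(command):
    # Look up the handler, fall back to 'unknown', then call it
    handler = dispatch.get(command.upper(), unknown)
    return handler()
```

Adding a new case is then a matter of adding one dict entry rather
than another 'elif' line.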
TOPIC -- Boolean Shortcutting
--------------------------------------------------------------------
The Boolean operators 'or' and 'and' are "lazy." That is, an
expression containing 'or' or 'and' evaluates only as far as it
needs to determine the overall value. Specifically, if the
first disjunct of an 'or' is true, the value of that disjunct
becomes the value of the expression, without evaluating the
rest; if the first conjunct of an 'and' is false, its value
likewise becomes the value of the whole expression.
Shortcutting is formally sufficient for switching and is
sometimes more readable and concise than "if/elif/else" blocks.
For example:
>>> if this:          # 'if' compound statement
...     result = this
... elif that:
...     result = that
... else:
...     result = 0
...
>>> result = this or that or 0 # boolean shortcutting
Compound shortcutting is also possible, but not necessarily
easy to read; for example:
>>> (cond1 and func1()) or (cond2 and func2()) or func3()
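To make the laziness visible, the sketch below records which
operands actually get evaluated. It assumes a current Python 3
interpreter, and 'noisy()' is a hypothetical helper. The last line
shows a pitfall of compound shortcutting: when the "chosen" value is
itself false, evaluation falls through to the next alternative.

```python
calls = []

def noisy(value):
    """Record that we were evaluated, then hand back 'value'."""
    calls.append(value)
    return value

# Stops at the first true operand; the last call never happens
first = noisy(0) or noisy('') or noisy('found') or noisy('unreached')

# Stops at the first false operand
second = noisy([]) and noisy('also unreached')

# Pitfall: the condition is true, but its false result is skipped
cond = True
surprise = (cond and 0) or 'fallback'
```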
TOPIC -- 'for'/'continue'/'break' Statements
--------------------------------------------------------------------
The 'for' statement loops over the elements of a sequence. In
Python 2.2+, looping utilizes an iterator object (which
may not have a predetermined length)--but standard sequences
like lists, tuples, and strings are automatically transformed to
iterators in 'for' statements. In earlier Python versions, a
few special functions like 'xreadlines()' and 'xrange()' also
act as iterators.
Each time a 'for' statement loops, a sequence/iterator element is
bound to the loop variable. The loop variable may be a tuple with
named items, thereby creating bindings for multiple names in
each loop. For example:
>>> for x,y,z in [(1,2,3),(4,5,6),(7,8,9)]: print x, y, z, '*',
...
1 2 3 * 4 5 6 * 7 8 9 *
A particularly common idiom for operating on each item in a
dictionary is:
>>> dct = {1:2, 3:4, 5:6}
>>> for key,val in dct.items():
...     print key, val, '*',
...
1 2 * 3 4 * 5 6 *
When you wish to loop through a block a certain number of
times, a common idiom is to use the 'range()' or 'xrange()'
built-in functions to create ad hoc sequences of the needed
length. For example:
>>> for _ in range(10):
...     print "X",      # '_' is not used in body
...
X X X X X X X X X X
However, if you find yourself binding over a range just to repeat
a block, this often indicates that you have not properly
understood the loop. Usually repetition is a way of operating on
a collection of related -things- that could instead be explicitly
bound in the loop, not just a need to do exactly the same thing
multiple times.
If the 'continue' statement occurs in a 'for' loop, the next loop
iteration proceeds without executing later lines in the block. If
the 'break' statement occurs in a 'for' loop, control passes past
the loop without executing later lines (except the 'finally'
block if the 'break' occurs in a 'try').
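A small sketch of both statements, including the 'finally' case,
may help. It is written for a current Python 3 interpreter, and the
sample data is made up for illustration.

```python
seen = []
for n in [3, -1, 4, 0, 5]:
    if n < 0:
        continue        # skip negatives, move on to the next item
    if n == 0:
        break           # leave the loop entirely at the first zero
    seen.append(n)

log = []
for n in [1, 2, 3]:
    try:
        if n == 2:
            break       # 'finally' below still runs before leaving
        log.append(n)
    finally:
        log.append('cleanup')
```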
TOPIC -- 'map()', 'filter()', 'reduce()', and List Comprehensions
--------------------------------------------------------------------
Much like the 'for' statement, the built-in functions `map()`,
`filter()`, and `reduce()` perform actions based on a sequence of
items. Unlike a 'for' loop, these functions explicitly return a
value resulting from this application to each item. Each of these
three functional-programming style functions accepts a function
object as a first argument and sequence(s) as subsequent
argument(s).
The `map()` function returns a list of items of the same length
as the input sequence, where each item in the result is a
"transformation" of one item in the input. Where you
explicitly want such transformed items, use of `map()` is often
both more concise and clearer than an equivalent 'for' loop;
for example:
>>> nums = (1,2,3,4)
>>> str_nums = []
>>> for n in nums:
...     str_nums.append(str(n))
...
>>> str_nums
['1', '2', '3', '4']
>>> str_nums = map(str, nums)
>>> str_nums
['1', '2', '3', '4']
If the function argument of `map()` accepts (or can accept)
multiple arguments, multiple sequences can be given as later
arguments. If such multiple sequences are of different lengths,
the shorter ones are padded with 'None' values. The special value
'None' may be given as the function argument, producing a
sequence of tuples of elements from the argument sequences.
>>> nums = (1,2,3,4)
>>> def add(x, y):
...     if x is None: x=0
...     if y is None: y=0
...     return x+y
...
>>> map(add, nums, [5,5,5])
[6, 7, 8, 4]
>>> map(None, (1,2,3,4), [5,5,5])
[(1, 5), (2, 5), (3, 5), (4, None)]
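The padding described here is specific to the `map()` of the Python
versions this appendix covers. As a hedged aside, on a current
Python 3 interpreter `map()` stops at the shortest sequence instead,
and the standard library function 'itertools.zip_longest()' can be
used to get the same padded pairs. The sketch below assumes Python 3
and reuses the 'add()' function from the example.

```python
from itertools import zip_longest

def add(x, y):
    if x is None: x = 0
    if y is None: y = 0
    return x + y

# Shorter sequences are padded with None, as in the session above
pairs = list(zip_longest((1, 2, 3, 4), [5, 5, 5]))
sums = [add(x, y) for x, y in pairs]
```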
The `filter()` function returns a list of those items in the
input sequence that satisfy a condition given by the function
argument (if the input is a tuple or string, the result has that
same type instead). The function argument must accept one
parameter, and its return value is interpreted as a Boolean (in
the usual manner). For example:
>>> nums = (1,2,3,4)
>>> odds = filter(lambda n: n%2, nums)
>>> odds
(1, 3)
Both `map()` and `filter()` can use function arguments that
have side effects, thereby making it possible--but not usually
desirable--to replace every 'for' loop with a `map()` or
`filter()` function. For example:
>>> for x in seq:
...     # bunch of actions
...     pass
...
>>> def actions(x):
...     # same bunch of actions
...     return 0
...
>>> filter(actions, seq)
[]
Some epicycles are needed for the scoping of block variables and
for 'break' and 'continue' statements. But as a general picture,
it is worth being aware of the formal equivalence between these
very different-seeming techniques.
The `reduce()` function takes as a function argument a function
with two parameters. In addition to a sequence second argument,
`reduce()` optionally accepts a third argument as an initializer.
For each item in the input sequence, `reduce()` combines the
previous aggregate result with the item, until the sequence is
exhausted. While `reduce()`--like `map()` and `filter()`--has a
loop-like effect of operating on every item in a sequence, its
main purpose is to create some sort of aggregation, tally, or
selection across indefinitely many items. For example:
>>> from operator import add
>>> sum = lambda seq: reduce(add, seq)
>>> sum([4,5,23,12])
44
>>> def tastes_better(x, y):
...     # some complex comparison of x, y
...     # either return x, or return y
...     # ...
...
>>> foods = [spam, eggs, bacon, toast]
>>> favorite = reduce(tastes_better, foods)
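The optional initializer deserves a short illustration, since
without it `reduce()` raises a 'TypeError' on an empty sequence.
This sketch assumes a current Python 3 interpreter, where `reduce()`
lives in the [functools] module; 'total()' and 'longest()' are
hypothetical names.

```python
from functools import reduce
from operator import add

def total(seq):
    # The initializer 0 is the result for an empty sequence
    return reduce(add, seq, 0)

def longest(words):
    # A "selection": keep whichever candidate is longer so far
    return reduce(lambda best, w: w if len(w) > len(best) else best,
                  words, '')

amount = total([4, 5, 23, 12])
nothing = total([])
word = longest(['spam', 'eggs', 'bacon', 'toast'])
```

Because the comparison is strict, the earlier of two equally long
words is kept.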
List comprehensions (listcomps) are a syntactic form that was
introduced with Python 2.0. It is easiest to think of list
comprehensions as a sort of cross between 'for' loops and the
`map()` or `filter()` functions. That is, like the functions,
listcomps are expressions that produce lists of items, based on
"input" sequences. But listcomps also use the keywords 'for' and
'if' that are familiar from statements. Moreover, it is typically
much easier to read a compound list comprehension expression than
it is to read corresponding nested `map()` and `filter()`
functions.
For example, consider the following small problem: You have a
list of numbers and a string of characters; you would like to
construct a list of all pairs that consist of a number from the
list and a character from the string, but only if the ASCII
ordinal of the character is larger than the number. In
traditional imperative style, you might write:
>>> bigord_pairs = []
>>> for n in (95,100,105):
...     for c in 'aei':
...         if ord(c) > n:
...             bigord_pairs.append((n,c))
...
>>> bigord_pairs
[(95, 'a'), (95, 'e'), (95, 'i'), (100, 'e'), (100, 'i')]
In a functional programming style you might write the nearly
unreadable:
>>> dupelms=lambda lst,n: reduce(lambda s,t:s+t,
...                              map(lambda l,n=n: [l]*n, lst))
>>> combine=lambda xs,ys: map(None,xs*len(ys), dupelms(ys,len(xs)))
>>> bigord_pairs=lambda ns,cs: filter(lambda (n,c):ord(c)>n,
...                                   combine(ns,cs))
>>> bigord_pairs((95,100,105),'aei')
[(95, 'a'), (95, 'e'), (100, 'e'), (95, 'i'), (100, 'i')]
In defense of this FP approach, it has not -only- accomplished
the task at hand, but also provided the general combinatorial
function 'combine()' along the way. But the code is still
rather obfuscated.
List comprehensions let you write something that is both
concise and clear:
>>> [(n,c) for n in (95,100,105) for c in 'aei' if ord(c)>n]
[(95, 'a'), (95, 'e'), (95, 'i'), (100, 'e'), (100, 'i')]
As long as you have listcomps available, you hardly -need- a
general 'combine()' function, since it just amounts to
repeating the 'for' clause in a listcomp.
Slightly more formally, a list comprehension consists of the
following: (1) Surrounding square brackets (like a list
constructor, which it is). (2) An expression that usually, but
not by requirement, contains some names that get bound in the
'for' clauses. (3) One or more 'for' clauses that bind a name
repeatedly (just like a 'for' loop). (4) Zero or more 'if'
clauses that limit the results. Generally, but not by
requirement, the 'if' clauses contain some names that were
bound by the 'for' clauses.
List comprehensions may nest inside each other freely. Sometimes
a 'for' clause in a listcomp loops over a list that is defined by
another listcomp; once in a while a nested listcomp is even used
inside a listcomp's expression or 'if' clauses. However, it is
almost as easy to produce difficult-to-read code by excessively
nesting listcomps as it is by nesting `map()` and `filter()`
functions. Use caution and common sense about such nesting.
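Two modest cases of nesting are sketched below, assuming a current
Python 3 interpreter and a made-up 'matrix' value. The first puts a
listcomp inside another listcomp's expression; the second simply
repeats the 'for' clause.

```python
matrix = [[1, 2, 3],
          [4, 5, 6]]

# A listcomp inside another listcomp's expression: transpose rows
transposed = [[row[i] for row in matrix] for i in range(3)]

# Two 'for' clauses plus an 'if' clause: flatten, keeping evens
evens = [n for row in matrix for n in row if n % 2 == 0]
```

Anything much deeper than this is probably better written as named
intermediate lists or as ordinary loops.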
It is worth noting that list comprehensions are not as
referentially transparent as functional programming style
calls. Specifically, any names bound in 'for' clauses
remain bound in the enclosing scope (or global if the name is
so declared). These side effects put a minor extra burden on
you to choose distinctive or throwaway names for use in
listcomps.
TOPIC -- 'while'/'else'/'continue'/'break' Statements
--------------------------------------------------------------------
The 'while' statement loops over a block as long as the
expression after the 'while' remains true. If an 'else' block is
used within a compound 'while' statement, as soon as the
expression becomes false, the 'else' block is executed. The
'else' block is chosen even if the 'while' expression is
initially false.
If the 'continue' statement occurs in a 'while' loop, the next
loop iteration proceeds without executing later lines in the
block. If the 'break' statement occurs in a 'while' loop, control
passes past the loop without executing later lines (except the
'finally' block if the 'break' occurs in a 'try'). If a 'break'
occurs in a 'while' block, the 'else' block is not executed.
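The interaction of 'break' and 'else' is easiest to see in a search
loop. The function 'find_factor()' below is a hypothetical example,
written for a current Python 3 interpreter.

```python
def find_factor(n):
    """Return the smallest factor of n between 2 and n-1, or None."""
    candidate = 2
    while candidate * candidate <= n:
        if n % candidate == 0:
            result = candidate
            break               # leaving this way skips the 'else'
        candidate += 1
    else:
        result = None           # the test became (or started) false
    return result

composite = find_factor(91)
prime = find_factor(97)
tiny = find_factor(2)           # test is false on the first check
```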
If a 'while' statement's expression is to go from being true
to being false, typically some name in the expression will be
re-bound within the 'while' block. At times an expression will
depend on an external condition, such as a file handle or a
socket, or it may involve a call to a function whose Boolean
value changes over invocations. However, probably the most
common Python idiom for 'while' statements is to rely on a
'break' to terminate a block. Some examples:
>>> command = ''
>>> while command != 'exit':
...     command = raw_input('Command > ')
...     # if/elif block to dispatch on various commands
...
Command > someaction
Command > exit
>>> while socket.ready():
...     socket.getdata()    # do something with the socket
... else:
...     socket.close()      # cleanup (e.g. close socket)
...
>>> while 1:
...     command = raw_input('Command > ')
...     if command == 'exit': break
...     # elif's for other commands
...
Command > someaction
Command > exit
TOPIC -- Functions, Simple Generators, and the 'yield' Statement
--------------------------------------------------------------------
Both functions and object methods allow a kind of nonlocality in
terms of program flow, but one that is quite restrictive. A
function or method is called from another context, enters at its
top, executes any statements encountered, then returns to the
calling context as soon as a 'return' statement is reached (or
the function body ends). The invocation of a function or method
is basically a strictly linear nonlocal flow.
Python 2.2 introduced a flow control construct, called
generators, that enables a new style of nonlocal branching. If a
function or method body contains the statement 'yield', then it
becomes a -generator function-, and invoking the function returns
a -generator iterator- instead of a simple value. A generator
iterator is an object that has a '.next()' method that returns
values. Any instance object can have a '.next()' method, but a
generator iterator's method is special in having "resumable
execution."
In a standard function, once a 'return' statement is encountered,
the Python interpreter discards all information about the
function's flow state and local name bindings. The returned value
might contain some information about local values, but the flow
state is always gone. A generator iterator, in contrast,
"remembers" the entire flow state, and all local bindings,
between each invocation of its '.next()' method. A value is
returned to a calling context each place a 'yield' statement is
encountered in the generator function body, but the calling
context (or any context with access to the generator iterator) is
able to jump back to the flow point where this last 'yield'
occurred.
In the abstract, generators seem complex, but in practice they
prove quite simple. For example:
>>> from __future__ import generators # not needed in 2.3+
>>> def generator_func():
...     for n in [1,2]:
...         yield n
...     print "Two yields in for loop"
...     yield 3
...
>>> generator_iter = generator_func()
>>> generator_iter.next()
1
>>> generator_iter.next()
2
>>> generator_iter.next()
Two yields in for loop
3
>>> generator_iter.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
The object 'generator_iter' in the example can be bound in
different scopes, and passed to and returned from functions,
just like any other object. Any context invoking
'generator_iter.next()' jumps back into the last flow point
where the generator function body yielded.
In a sense, a generator iterator allows you to perform jumps
similar to the "GOTO" statements of some (older) languages, but
still retains the advantages of structured programming. The most
common usage for generators, however, is simpler than this. Most
of the time, generators are used as "iterators" in a loop
context; for example:
>>> for n in generator_func():
...     print n
...
1
2
Two yields in for loop
3
In recent Python versions, the 'StopIteration' exception is used
to signal the end of a 'for' loop. The generator iterator's
'.next()' method is implicitly called as many times as possible
by the 'for' statement. The name indicated in the 'for'
statement is repeatedly re-bound to the values the 'yield'
statement(s) return.
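Because a generator iterator keeps its local bindings between
requests, it can describe a sequence with no predetermined length.
The sketch below assumes a current Python 3 interpreter, where the
built-in function `next()` is used in place of calling the '.next()'
method directly; 'fibonacci()' is a hypothetical example.

```python
def fibonacci():
    """Yield Fibonacci numbers forever; state lives in 'a' and 'b'."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b     # execution resumes here on each request

small = []
for n in fibonacci():
    if n > 20:
        break               # the consumer decides when to stop
    small.append(n)

fib = fibonacci()
first_three = [next(fib), next(fib), next(fib)]
```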
TOPIC -- Raising and Catching Exceptions
--------------------------------------------------------------------
Python uses exceptions quite broadly and probably more naturally
than any other programming language. In fact there are certain
flow control constructs that are awkward to express by means
other than raising and catching exceptions.
There are two general purposes for exceptions in Python. On the
one hand, Python actions can be invalid or disallowed in various
ways. You are not allowed to divide by zero; you cannot open (for
reading) a filename that does not exist; some functions require
arguments of specific types; you cannot use an unbound name on
the right side of an assignment; and so on. The exceptions raised
by these types of occurrences have names of the form
'[A-Z].*Error'. Catching -error- exceptions is often a useful way
to recover from a problem condition and restore an application to
a "happy" state. Even if such error exceptions are not caught in
an application, their occurrence provides debugging clues since
they appear in tracebacks.
The second purpose for exceptions is for circumstances a
programmer wishes to flag as "exceptional." But understand
"exceptional" in a weak sense--not as something that indicates
a programming or computer error, but simply as something
unusual or "not the norm." For example, Python 2.2+ iterators
raise a 'StopIteration' exception when no more items can be
generated. Most such implied sequences are not of infinite
length, however; it is merely the case that they contain a
(large) number of items, and they run out only once at the end.
It's not "the norm" for an iterator to run out of items, but it
is often expected that this will happen eventually.
In a sense, raising an exception can be similar to executing a
'break' statement--both cause control flow to leave a block.
For example, compare:
>>> n = 0
>>> while 1:
...     n = n+1
...     if n > 10: break
...
>>> print n
11
>>> n = 0
>>> try:
...     while 1:
...         n = n+1
...         if n > 10: raise "ExitLoop"
... except:
...     print n
...
11
In two closely related ways, exceptions behave differently than
do 'break' statements. In the first place, exceptions could be
described as having "dynamic scope," which in most contexts is
considered a sin akin to "GOTO," but here is quite useful. That
is, you never know at compile time exactly where an exception
might get caught (if not anywhere else, it is caught by the
Python interpreter). It might be caught in the exception's block,
or a containing block, and so on; or it might be in the local
function, or something that called it, or something that called
the caller, and so on. An exception is a -fact- that winds its
way through execution contexts until it finds a place to settle.
The upward propagation of exceptions is quite opposite to the
downward propagation of lexically scoped bindings (or even to the
earlier "three-scope rule").
The corollary of exceptions' dynamic scope is that, unlike
'break', they can be used to exit gracefully from deeply nested
loops. The "Zen of Python" offers a caveat here: "Flat is better
than nested." And indeed it is so; if you find yourself nesting
loops -too- deeply, you should probably refactor (e.g., break
loops into utility functions). But if you are nesting -just
deeply enough-, dynamically scoped exceptions are just the thing
for you. Consider the following small problem: A "Fermat triple"
(more commonly called a Pythagorean triple) is here defined as a
triple of integers (i,j,k) such that "i**2 + j**2 == k**2".
Suppose that you wish to determine if any Fermat
triples exist with all three integers inside a given numeric
range. An obvious (but entirely nonoptimal) solution is:
>>> def fermat_triple(beg, end):
...     class EndLoop(Exception): pass
...     range_ = range(beg, end)
...     try:
...         for i in range_:
...             for j in range_:
...                 for k in range_:
...                     if i**2 + j**2 == k**2:
...                         raise EndLoop, (i,j,k)
...     except EndLoop, triple:
...         # do something with 'triple'
...         return i,j,k
...
>>> fermat_triple(1,10)
(3, 4, 5)
>>> fermat_triple(120,150)
>>> fermat_triple(100,150)
(100, 105, 145)
By raising the 'EndLoop' exception in the middle of the nested
loops, it is possible to catch it again outside of all the
loops. A simple 'break' in the inner loop would only break out
of the most deeply nested block, which is pointless. One might
devise some system for setting a "satisfied" flag and testing
for this at every level, but the exception approach is much
simpler. Since the 'except' block does not actually -do-
anything extra with the triple, it could have just been
returned inside the loops; but in the general case, other
actions can be required before a 'return'.
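For comparison, here is a sketch of the "satisfied" flag approach
mentioned above, written for a current Python 3 interpreter. The
name 'fermat_triple_flag()' is hypothetical. Notice that the flag
must be tested again at every level of nesting.

```python
def fermat_triple_flag(beg, end):
    found = None
    for i in range(beg, end):
        for j in range(beg, end):
            for k in range(beg, end):
                if i**2 + j**2 == k**2:
                    found = (i, j, k)
                    break       # only leaves the innermost loop
            if found:
                break           # so test the flag again here
        if found:
            break               # and once more here
    return found
```

The results match the exception-based version, but each added loop
level would need yet another flag test.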
It is not uncommon to want to leave nested loops when something
has "gone wrong" in the sense of a "*Error" exception.
Sometimes you might only be in a position to discover a problem
condition within nested blocks, but recovery still makes better
sense outside the nesting. Some typical examples are problems
in I/O, calculation overflows, missing dictionary keys or list
indices, and so on. Moreover, it is useful to assign 'except'
statements to the calling position that really needs to handle
the problems, then write support functions as if nothing can go
wrong. For example:
>>> try:
...     result = complex_file_operation(filename)
... except IOError:
...     print "Cannot open file", filename
...
The function 'complex_file_operation()' should not be burdened
with trying to figure out what to do if a bad 'filename' is given
to it--there is really nothing to be done in that context.
Instead, such support functions can simply propagate their
exceptions upwards, until some caller takes responsibility for
the problem.
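The division of labor might look like the following sketch, which
assumes a current Python 3 interpreter. All three function names
and the "key=value" record format are hypothetical. The two support
functions contain no error handling at all; only the outermost
caller decides what a bad record means.

```python
def parse_record(line):
    key, value = line.split('=')      # may raise ValueError
    return key.strip(), int(value)    # may also raise ValueError

def load(lines):
    # Written as if nothing can go wrong
    return dict([parse_record(line) for line in lines])

def report(lines):
    try:
        return load(lines)
    except ValueError:
        return {}                     # this caller takes responsibility

good = report(["a = 1", "b = 2"])
bad = report(["a = 1", "oops"])
```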
The 'try' statement has two forms. The 'try/except/else' form is
more commonly used, but the 'try/finally' form is useful for
"cleanup handlers."
In the first form, a 'try' block must be followed by one or more
'except' blocks. Each 'except' may specify an exception or tuple
of exceptions to catch; the last 'except' block may omit an
exception (tuple), in which case it catches every exception that
is not caught by an earlier 'except' block. After the 'except'
blocks, you may optionally specify an 'else' block. The 'else'
block is run only if no exception occurred in the 'try' block.
For example:
>>> def except_test(n):
...     try: x = 1/n
...     except IOError: print "IO Error"
...     except ZeroDivisionError: print "Zero Division"
...     except: print "Some Other Error"
...     else: print "All is Happy"
...
>>> except_test(1)
All is Happy
>>> except_test(0)
Zero Division
>>> except_test('x')
Some Other Error
An 'except' test will match either the exception actually
listed or any descendant of that exception. It tends to make
sense, therefore, in defining your own exceptions to inherit
from related ones in the [exceptions] module. For example:
>>> class MyException(IOError): pass
>>> try:
...     raise MyException
... except IOError:
...     print "got it"
...
got it
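Since an 'except' clause also matches descendants, the order of the
clauses matters: the most specific exception should be listed
first. The sketch below assumes a current Python 3 interpreter, and
the exception classes 'ConfigError' and 'MissingOption' are
hypothetical.

```python
class ConfigError(LookupError): pass      # hypothetical hierarchy
class MissingOption(ConfigError): pass

def classify(exc):
    try:
        raise exc
    except MissingOption:         # most specific first
        return "missing option"
    except ConfigError:
        return "config problem"
    except LookupError:           # also matches KeyError, IndexError
        return "lookup problem"

a = classify(MissingOption())
b = classify(ConfigError())
c = classify(KeyError('x'))
```

Had 'LookupError' been listed first, the later two clauses could
never be reached.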
In the "try/finally" form of the 'try' statement, the 'finally'
statement acts as general cleanup code. If no exception occurs in
the 'try' block, the 'finally' block runs, and that is that. If
an exception -was- raised in the 'try' block, the 'finally' block
still runs, but the original exception is re-raised at the end of
the block. However, if a 'return' or 'break' statement is
executed in a 'finally' block--or if a new exception is raised in
the block (including with the 'raise' statement)--the 'finally'
block never reaches its end, and the original exception
disappears.
A 'finally' statement acts as a cleanup block even when its
corresponding 'try' block contains a 'return', 'break', or
'continue' statement. That is, even though a 'try' block might
not run all the way through, 'finally' is still entered to clean
up whatever the 'try' -did- accomplish. A typical use of this
compound statement opens a file or other external resource at the
very start of the 'try' block, then performs several actions that
may or may not succeed in the rest of the block; the 'finally' is
responsible for making sure the file gets closed, whether or not
all the actions on it prove possible.
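A minimal sketch of such a cleanup handler follows, assuming a
current Python 3 interpreter. An in-memory 'io.StringIO' object
stands in for a real file so that the example touches nothing on
disk, and 'first_number()' is a hypothetical name.

```python
import io

def first_number(stream):
    try:
        return int(stream.readline())   # may raise ValueError
    finally:
        stream.close()                  # runs on success and on failure

good = io.StringIO("42\n")
value = first_number(good)

bad = io.StringIO("oops\n")
try:
    first_number(bad)
    failed = False
except ValueError:
    failed = True
```

In both cases the stream ends up closed, even though the second
call never reaches its 'return' normally.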
The "try/finally" form is never strictly needed since a bare
'raise' statement will re-raise the last exception. It is
possible, therefore, to have an 'except' block end with the
'raise' statement to propagate an error upward after taking some
action. However, when a cleanup action is desired whether or not
exceptions were encountered, the "try/finally" form can save a
few lines and express your intent more clearly. For example:
>>> def finally_test(x):
...     try:
...         y = 1/x
...         if x > 10:
...             return x
...     finally:
...         print "Cleaning up..."
...     return y
...
>>> finally_test(0)
Cleaning up...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 3, in finally_test
ZeroDivisionError: integer division or modulo by zero
>>> finally_test(3)
Cleaning up...
0
>>> finally_test(100)
Cleaning up...
100
TOPIC -- Data as Code
--------------------------------------------------------------------
Unlike in languages in the Lisp family, it is -usually- not a
good idea to create Python programs that execute data values. It
is -possible-, however, to create and run Python strings during
program runtime using several built-in functions. The modules
[code], [codeop], [imp], and [new] provide additional
capabilities in this direction. In fact, the Python interactive
shell itself is an example of a program that dynamically reads
strings as user input, then executes them. So clearly, this
approach is occasionally useful.
Other than in providing an interactive environment for advanced
users (who themselves know Python), a possible use for the
"data as code" model is with applications that themselves
generate Python code, either to run later or to communicate
with another application. At a simple level, it is not
difficult to write compilable Python programs based on
templatized functionality; for this to be useful, of course,
you would want a program to contain some customization that was
determinable only at runtime.
eval(s [,globals=globals() [,locals=locals()]])
Evaluate the expression in string 's' and return the result
of that evaluation. You may specify optional arguments
'globals' and 'locals' to specify the namespaces to use for
name lookup. By default, use the regular global and local
namespace dictionaries. Note that only an expression can
be evaluated, not a statement suite.
Most of the time when a (novice) programmer thinks of
using `eval()` it is to compute some value--often
numeric--based on data encoded in texts. For example,
suppose that a line in a report file contains a list of
dollar amounts, and you would like the sum of these
numbers. A naive approach to the problem uses `eval()`:
>>> line = "$47 $33 $51 $76"
>>> eval("+".join([d.replace('$','') for d in line.split()]))
207
While this approach is generally slow, that is not an
important problem. A more significant issue is that
`eval()` runs code that is not known until runtime;
potentially 'line' could contain Python code that causes
harm to the system it runs on or merely causes an
application to malfunction. Imagine that instead of a
dollar figure, your data file contained 'os.rmdir("/")'. A
better approach is to use the safe type coercion functions
`int()`, `float()`, and so on.
>>> nums = [int(d.replace('$','')) for d in line.split()]
>>> from operator import add
>>> reduce(add, nums)
207
exec
The `exec` statement is a more powerful sibling of the
`eval()` function. Any valid Python code may be run if
passed to the `exec` statement. The format of the `exec`
statement allows optional namespace specification, as with
`eval()`:
'exec code [in globals [,locals]]'
For example:
>>> s = "for i in range(10):\n print i,\n"
>>> exec s in globals(), locals()
0 1 2 3 4 5 6 7 8 9
The argument 'code' may be either a string, a code object,
or an open file object. As with `eval()` the security
dangers and speed penalties of `exec` usually outweigh any
convenience provided. However, where 'code' is clearly
under application control, there are occasionally uses for
this statement.
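One case where the code is clearly under application control is a
function generated from a template. The sketch below is written for
a current Python 3 interpreter, where `exec` is a function called
as 'exec(code, namespace)' rather than a statement; the template
text and the names in it are hypothetical.

```python
template = "def %(name)s(x):\n    return x * %(factor)d\n"
source = template % {'name': 'triple', 'factor': 3}

# Run the generated code in its own dict, not the program's globals
namespace = {}
exec(source, namespace)
triple = namespace['triple']
result = triple(7)
```

Passing a dedicated namespace keeps the generated names from
silently overwriting anything in the running program.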
__import__(s [,globals=globals() [,locals=locals() [,fromlist]]])
Import the module named 's', using namespace dictionaries
'globals' and 'locals'. The argument 'fromlist' may be
omitted, but if specified as a nonempty list of
strings--e.g., '[""]'--the fully qualified subpackage will
be imported. For normal cases, the `import` statement is
the way you import modules, but in the special circumstance
that the value of 's' is not determined until runtime, use
`__import__()`.
>>> op = __import__('os.path',globals(),locals(),[''])
>>> op.basename('/this/that/other')
'other'
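As a hedged note for readers on later interpreters: Python 2.7 and
3.x add the standard library function 'importlib.import_module()',
which returns the named submodule directly without the 'fromlist'
argument. The sketch assumes a current Python 3 interpreter, and
'load_module()' is a hypothetical wrapper.

```python
import importlib

def load_module(name):
    # 'name' would typically be determined only at runtime
    return importlib.import_module(name)

op = load_module('os.path')
base = op.basename('/this/that/other')
```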
input([prompt])
Equivalent to 'eval(raw_input(prompt))', along with all the
dangers associated with `eval()` generally. Best practice
is to always use `raw_input()`, but you might see `input()`
in existing programs.
raw_input([prompt])
Return a string from user input at the terminal. Used to
obtain values interactively in console-based applications.
>>> s = raw_input('Last Name: ')
Last Name: Mertz
>>> s
'Mertz'
SECTION -- Functional Programming
--------------------------------------------------------------------
This section largely recapitulates briefer descriptions
elsewhere in this appendix; but a common unfamiliarity with
functional programming merits a longer discussion. Additional
material on functional programming in Python--mostly of a
somewhat exotic nature--can be found in articles at:
.
It is hard to find any consensus about exactly what functional
programming -is-, among either its proponents or detractors. It
is not really entirely clear to what extent FP is a feature of
languages, and to what extent a feature of programming styles.
Since this is a book about Python, we can leave aside discussions
of predominantly functional languages like Lisp, Scheme, Haskell,
ML, OCaml, Clean, Mercury, Erlang, and so on, and focus instead on
what makes a Python program more or less functional.
Programs that lean towards functional programming, within
Python's multiple paradigms, tend to have many of the following
features:
1. Functions are treated as first-class objects that are
passed as arguments to other functions and methods, and
returned as values from same.
2. Solutions are expressed more in terms of -what- is to be
computed than in terms of -how- the computation is
performed.
3. Side effects, especially rebinding names repeatedly, are
minimized. Functions are referentially transparent (see
Glossary).
4. Expressions are emphasized over statements; in particular,
expressions often describe how a result collection is
related to a prior collection--most especially list
objects.
5. The following Python constructs are used prevalently: the
built-in functions `map()`, `filter()`, `reduce()`,
`apply()`, `zip()`, and `enumerate()`; extended call
syntax; the `lambda` operator; list comprehensions;
and switches expressed as Boolean operators.
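To make the contrast concrete, here is a small sketch of one
calculation written both ways. It uses Python 3 spelling, where
`reduce()` lives in [functools] and `map()` and `filter()` return
lazy iterators (hence the 'list()' calls); the word list is
made-up sample data.

```python
from functools import reduce   # a built-in in the Python this text describes

words = ["spam", "eggs", "toast", "ham"]

# Imperative style: a result list is mutated step by step.
lengths_loop = []
for w in words:
    if len(w) > 3:
        lengths_loop.append(len(w))

# Functional style: the result is described as an expression
# relating it to the prior collection.
lengths_fp = list(map(len, filter(lambda w: len(w) > 3, words)))
total = reduce(lambda a, b: a + b, lengths_fp, 0)
```

Both versions compute the lengths of the words longer than three
characters; only the second states that relationship in a single
expression.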
Many experienced Python programmers consider FP constructs to
be as much of a wart as a feature. The main drawback of a
functional programming style (in Python, or elsewhere) is that
it is easy to write unmaintainable or obfuscated programming
code using it. Too many `map()`, `reduce()` and `filter()`
functions nested inside each other lose all the self-evidence
of Python's simple statement and indentation style. Adding
unnamed `lambda` functions into the mix makes matters that much
worse. The discussion in Chapter 1 of higher-order functions
gives some examples.
TOPIC -- Emphasizing Expressions using 'lambda'
--------------------------------------------------------------------
The `lambda` operator is used to construct an "anonymous"
function. In contrast to the more common 'def' declaration, a
function created with `lambda` can only contain a single
expression as a result, not a sequence of statements, nested
blocks, and so on. There are inelegant ways to emulate statements
within a `lambda`, but generally you should think of `lambda` as
a less-powerful cousin of 'def' declarations.
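One of those inelegant emulations is a switch built from Boolean
operators, sketched below next to the equivalent 'def'. The names
'sign' and 'sign_def' are hypothetical. Note that the trick
depends on every selected value being true; a branch that returns
'0' or an empty string would fall through to the next
alternative.

```python
# An if/elif/else emulated inside a single expression.
sign = lambda n: (n > 0 and "positive") or (n < 0 and "negative") or "zero"

# The same logic with statements, which a lambda cannot contain.
def sign_def(n):
    if n > 0:
        return "positive"
    elif n < 0:
        return "negative"
    else:
        return "zero"
```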
Not all Python programmers are happy with the `lambda`
operator. There is certainly a benefit in readability to
giving a function a descriptive name. For example, the second
style below is clearly more readable than the first:
>>> from math import sqrt
>>> print map(lambda (a,b): sqrt(a**2+b**2),((3,4),(7,11),(35,8)))
[5.0, 13.038404810405298, 35.902646142032481]
>>> sides = ((3,4),(7,11),(35,8))
>>> def hypotenuse(ab):
...     a,b = ab[:]
...     return sqrt(a**2+b**2)
...
>>> print map(hypotenuse, sides)
[5.0, 13.038404810405298, 35.902646142032481]
By declaring a named function 'hypotenuse()', the intention of
the calculation becomes much clearer. Once in a while, though,
a function used in `map()` or in a callback (e.g., in [Tkinter],
[xml.sax], or [mx.TextTools]) really is such a one-shot thing
that a name only adds noise.
However, you may notice in this book that I fairly commonly use
the `lambda` operator to define a name. For example, you might
see something like:
>>> hypotenuse = lambda (a,b): sqrt(a**2+b**2)
This usage is mostly for documentation. A side matter is that a
few characters are saved in assigning an anonymous function to a
name, versus a 'def' binding. But conciseness is not particularly
important. This function definition form documents explicitly
that I do not expect any side effects--like changes to globals
and data structures--within the 'hypotenuse()' function. While
the 'def' form is also side effect free, that fact is not
advertised; you have to look through the (brief) code to
establish it. Strictly speaking, there are ways--like calling
`setattr()`--to introduce side effects within a `lambda`, but as
a convention, I avoid doing so, as should you.
Moreover, a second documentary goal is served by a `lambda`
assignment like the one above. Whenever this form occurs, it is
possible to literally substitute the right-hand expression
anywhere the left-hand name occurs (although you usually need to
add surrounding parentheses). By using this form, I
am emphasizing that the name is simply a short-hand for the
defined expression. For example:
>>> hypotenuse = lambda a,b: sqrt(a**2+b**2)
>>> (lambda a,b: sqrt(a**2+b**2))(3,4), hypotenuse(3,4)
(5.0, 5.0)
Bindings with 'def', in general, lack substitutability.
TOPIC -- Special List Functions
--------------------------------------------------------------------
Python has two built-in functions that are strictly operations
on sequences, but that are frequently useful in conjunction
with the "function-plus-list" built-in functions.
zip(seq1 [,seq2 [,...]])
The `zip()` function, in Python 2.0+, combines multiple
sequences into one sequence of tuples. Think of the teeth
of a zipper for an image and the source of the name.
The function `zip()` is almost the same as 'map(None,...)',
but `zip()` truncates when it reaches the end of the
shortest sequence. For example:
>>> map(None, (1,2,3,4), [5,5,5])
[(1, 5), (2, 5), (3, 5), (4, None)]
>>> zip((1,2,3,4), [5,5,5])
[(1, 5), (2, 5), (3, 5)]
Especially in combination with `apply()`, extended call
syntax, or simply tuple unpacking, `zip()` is useful for
operating over multiple related sequences at once; for
example:
>>> hypotenuse = lambda (a,b): sqrt(a**2+b**2)  # takes one (a,b) tuple
>>> lefts, tops = (3, 7, 35), (4, 11, 8)
>>> map(hypotenuse, zip(lefts, tops))
[5.0, 13.038404810405298, 35.902646142032481]
A little quirk of `zip()` is that it is -almost- its own
inverse. A little use of extended call syntax is needed
for inversion, though. The expression 'zip(*zip(*seq))' is
idempotent (as an exercise, play with variations).
Consider:
>>> sides = [(3, 4), (7, 11), (35, 8)]
>>> zip(*zip(*sides))
[(3, 4), (7, 11), (35, 8)]
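The "almost" can be seen in a short sketch. It is written with
'list()' around `zip()` so that it also runs where `zip()`
returns an iterator, as in later Python versions; the sample
values are illustrative.

```python
sides = [(3, 4), (7, 11), (35, 8)]

# "Unzipping": the extended call syntax turns rows into columns.
lefts, tops = zip(*sides)

# Applying the inversion twice gives back the original pairs.
roundtrip = list(zip(*zip(*sides)))

# With unequal lengths, zip() truncates, so the dropped item
# cannot be recovered by inverting.
ragged = list(zip(*zip((1, 2, 3), [5, 5])))
```

Here 'ragged' holds '(1, 2)' and '(5, 5)'; the '3' that was
truncated by the inner `zip()` is gone for good.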
enumerate(collection)
Python 2.3 adds the `enumerate()` built-in function for
working with a sequence and its index positions at the same
time. Basically, 'enumerate(seq)' is equivalent to
'zip(range(len(seq)),seq)', but `enumerate()` is a lazy
iterator that need not construct the entire list to loop
over. A typical usage is:
>>> items = ['a','b']
>>> i = 0 # old-style explicit increment
>>> for thing in items:
...     print 'index',i,'contains',thing
...     i += 1
...
index 0 contains a
index 1 contains b
>>> for i,thing in enumerate(items):
...     print 'index',i,'contains',thing
...
index 0 contains a
index 1 contains b
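The laziness matters when the collection has no length to ask
for. In the sketch below, the generator 'chunks()' is a
hypothetical stand-in for any source that produces items one at
a time; the 'zip(range(len(...)))' spelling would fail on it,
while `enumerate()` works unchanged.

```python
items = ['a', 'b']

# For a real sequence, the two spellings agree.
eager = list(zip(range(len(items)), items))
lazy = list(enumerate(items))

def chunks():
    # Hypothetical lazy source; len(chunks()) would raise TypeError.
    yield 'first'
    yield 'second'

numbered = list(enumerate(chunks()))
```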
TOPIC -- List-Application Functions as Flow Control
--------------------------------------------------------------------
I believe that text processing is one of the areas of Python
programming where judicious use of functional programming
techniques can greatly aid both clarity and conciseness. A
strength of FP style--specifically the Python built-in functions
`map()`, `filter()`, and `reduce()`--is that they are not merely
about -functions-, but also about -sequences-. In text processing
contexts, most loops are ways of iterating over chunks of text,
frequently over lines. When you wish to do something to a
sequence of similar items, FP style allows the code to focus on
the action (and its object) instead of on side issues of loop
constructs and transient variables.
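As a small illustration of that focus, the sketch below removes
surrounding whitespace and blank lines from a block of text, once
with an explicit loop and once with list-application functions.
The sample text is made up, and the 'list()' call is only needed
where `filter()` returns an iterator, as in later Python
versions.

```python
text = "  alpha \n\n beta\n   \ngamma  \n"
lines = text.splitlines()

# Loop style: a transient name and a mutated list.
cleaned_loop = []
for line in lines:
    stripped = line.strip()
    if stripped:
        cleaned_loop.append(stripped)

# FP style: strip every line, then keep only the non-empty results.
cleaned_fp = list(filter(None, map(str.strip, lines)))
```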
In part, a `map()`, `filter()`, or `reduce()` call is a kind of
flow control. Just as a 'for' loop is an instruction to perform
an action a number of times, so are these list-application
functions. For example:
#*----------------- Explicit 'for' loop -----------------#
import sys
for x in range(100):
    sys.stdout.write(str(x))
and:
#*--------------- List-application loop -----------------#
import sys
filter(sys.stdout.write, map(str, range(100)))
are just two different ways of calling the 'str()' function 100
times (and the 'sys.stdout.write()' method with each result). The
two differences are that the FP style does not bother rebinding a
name for each iteration, and that each call to a list-application
function returns a value--a list for `map()` and `filter()`,
potentially any sort of value for `reduce()`. Functions/methods
like `sys.stdout.write` that are called wholly for their
side effects almost always return 'None'; by using `filter()`
rather than `map()` around these, you avoid constructing a
throwaway list--or rather you construct just an empty list.
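The difference in return values can be checked directly. In this
sketch the function 'emit()' is a hypothetical stand-in for
`sys.stdout.write`: it has a side effect and returns 'None'. The
'list()' calls are an assumption for later Python versions, where
`map()` and `filter()` are lazy and perform no calls until they
are consumed.

```python
written = []

def emit(s):
    # Called wholly for its side effect; implicitly returns None.
    written.append(s)

# map() collects every return value: a throwaway list of None's.
mapped = list(map(emit, map(str, range(5))))

# filter() keeps only items for which emit() returned a true value,
# which is none of them: an empty list.
filtered = list(filter(emit, map(str, range(5))))
```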
TOPIC -- Extended Call Syntax and 'apply()'
--------------------------------------------------------------------
To call a function in a dynamic way, it is sometimes useful to
build collections of arguments in data structures prior to the
call. Unpacking a sequence containing several positional
arguments is awkward, and unpacking a dictionary of keyword
arguments simply cannot be done with the Python 1.5.2 standard
call syntax. For example, consider the 'salutation()' function:
>>> def salutation(title,first,last,use_title=1,prefix='Dear'):
...     print prefix,
...     if use_title: print title,
...     print '%s %s,' % (first, last)
...
>>> salutation('Dr.','David','Mertz',prefix='To:')
To: Dr. David Mertz,
Suppose you read names and prefix strings from a text file or
database and wish to call 'salutation()' with arguments
determined at runtime. You might use:
>>> rec = get_next_db_record()
>>> opts = calculate_options(rec)
>>> salutation(rec[0], rec[1], rec[2],
...            use_title=opts.get('use_title',1),
...            prefix=opts.get('prefix','Dear'))
This call can be performed more concisely as:
>>> salutation(*rec, **opts)
Or as:
>>> apply(salutation, rec, opts)
The calls 'func(*args,**keywds)' and 'apply(func,args,keywds)'
are equivalent. The argument 'args' must be a sequence whose
items supply the positional arguments of 'func'. The (optional)
argument 'keywds' is a dictionary mapping argument names to
values; any keyword argument it omits simply keeps its default.
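A self-contained sketch of the equivalence follows. It assumes a
variant of 'salutation()' that returns its text instead of
printing it, so that results can be compared, and it uses a
hypothetical record and options dictionary in place of database
calls. Only the extended call syntax is shown, since `apply()` is
not available in later Python versions.

```python
def salutation(title, first, last, use_title=1, prefix='Dear'):
    # Same logic as the printing version, but returning the text.
    parts = [prefix]
    if use_title:
        parts.append(title)
    parts.append('%s %s,' % (first, last))
    return ' '.join(parts)

rec = ('Dr.', 'David', 'Mertz')   # hypothetical database record
opts = {'prefix': 'To:'}          # hypothetical runtime options

# Spelling out every argument by hand...
explicit = salutation(rec[0], rec[1], rec[2],
                      use_title=opts.get('use_title', 1),
                      prefix=opts.get('prefix', 'Dear'))

# ...versus unpacking the collections at the call site.
unpacked = salutation(*rec, **opts)
print(unpacked)
```

Because 'opts' has no 'use_title' key, that argument keeps its
default in the unpacked call, which is exactly what the explicit
'opts.get()' fallback spells out by hand.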
In most cases, the extended call syntax is more readable, since
the call closely resembles the -declaration- syntax of generic
positional and keyword arguments. But in a few
cases--particularly in higher-order functions--the older
`apply()` built-in function is still useful. For example,
suppose that you have an application that will either perform
an action immediately or defer it for later, depending on some
condition. You might program this application as:
#*----------- apply() as first-class function -----------#
defer_list = []
if some_runtime_condition():
    doIt = apply
else:
    doIt = lambda *x: defer_list.append(x)
#...do stuff like read records and options...
doIt(operation, args, keywds)
#...do more stuff...
#...carry out deferred actions...
map(lambda (f,args,kw): f(*args,**kw), defer_list)
Since `apply()` is itself a first-class function rather than a
syntactic form, you can pass it around--or in the example,
bind it to a name.
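The pattern can be exercised end to end with a few stand-ins.
Everything named here is an assumption for illustration: 'apply()'
is redefined as an ordinary function (later Python versions
dropped the built-in), 'operation()' is a hypothetical action with
a visible effect, and 'choose_doer()' takes the place of the
runtime condition.

```python
def apply(func, args=(), keywds=None):
    # Stand-in for the old built-in of the same name.
    return func(*args, **(keywds or {}))

results = []

def operation(a, b, scale=1):
    # Hypothetical action whose effect is easy to observe.
    results.append((a + b) * scale)

def choose_doer(immediate, defer_list):
    # Return either apply itself or a callable that records the call.
    if immediate:
        return apply
    return lambda *x: defer_list.append(x)

defer_list = []
doIt = choose_doer(False, defer_list)
doIt(operation, (1, 2), {'scale': 10})
pending = list(results)          # snapshot: nothing has run yet

for f, args, kw in defer_list:   # carry out deferred actions
    f(*args, **kw)

doNow = choose_doer(True, defer_list)
doNow(operation, (2, 3), {})     # runs immediately
```

The calling code is identical in both modes; only the object
bound to the name decides whether the work happens now or later.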