David Mertz, Ph.D.
Frog Prince, Gnosis Software
June, 2005
Since the "golden age" of Python 1.5.2--for a long time a stable and solid version--Python has greatly increased its number of syntactic features and built-in functions and types. One area where features have accumulated to a very large degree is in ways to access methods by using attribute syntax. A huge variety of "behind the scenes" techniques, each with pluses and minuses, have grown within the latest Python versions.
In this article, David discusses the non-obvious features and misfeatures that have been added to the last several Python versions; and weighs in on which are truly valuable, and which unnecessary complication. David hopes to provide a collection of valuable things to watch out for to all those programmers who use Python less than full time--either programmer in other languages, or people like scientists for whom programming is only a side task. Where some quandries are raised, solutions are suggested.
In most object-oriented languages, methods and attributes are almost-but-not-quite the same thing. Both things can be attached to a class and/or to an instance. Whatever the details of implementation though, there seems to be a difference too. Methods are things attached to an object that you can call to initiate actions and calculations; attributes simply have values that can be retrieved (and perhaps modified).
There are langauges--like Java--where what I have described is pretty
much the end of the story. Attributes are attributes, and methods are
methods. Java, by convention, puts a heavy emphasis on encapsulation
and data hiding; the result is an encouragement of the use of
"setters" and "getters" to access otherwise private attribute data. To
the Java way of thinking, using explicit method calls covers in
advance the case where you might want to add computation or
side-effects to data access or modification. Of course, the result of
the Java attitude is verboseness, and the imposition of a sometimes
artificial seeming discipline: you wind up writing foo.getBar()
instead of foo.bar
, and foo.setBar(value)
instead of
foo.bar=value
.
Ruby is worth mentioning as a somewhat odd creature. It actually
insists on data hiding to an even greater degree than Java does: all
attributes are always "private", you can never directly access
instance data. At the same time, Ruby uses some syntax conventions
that make method calls look like attribute access does in other
languages. The first element of this is Ruby's option parentheses in
method calls; the second part is its use of semi-special naming of
methods with symbols that are operators in most languages. So in
Ruby, foo.bar
is just a shorter option for calling foo.bar()
; and
"setting" foo.bar=value
is shorthand for foo.bar=(value)
. Behind
the scenes, everything goes through a method call.
Python is much more flexible than either Java or Ruby, but that fact
is as much a criticism as it is praise. If you access foo.bar
, or
set foo.bar=value
in Python, you might be using a simple data value,
or you might be calling some semi-hidden code. Moreover, in the latter
case, there are at least a half-dozen different ways you might reach
that block of code, each one having slightly different behavior than
the others, with dizzying subtleties and nuances. This deluge of
options harms the regularity of Python, and makes it harder to
understand for non-experts (or even for experts). I know why things
have gotten where they are: new capabilities have been added to
Python's object-oriented underpinnings in several steps. But I am not
terribly pleased by the chaos.
Since the old days (before Python 2.1), Python has had a magic method
called .__getattr__()
that class could define to return computed
values rather than simple data accesses. Correspondingly, the magic
methods .__setattr__()
and .__delattr__()
could cause code to run
with "attributes" were set or deleted. The problem with this old
system was that you never really knew whether or not the code would
actually be called, since it depended on whether an attribute with the
same name as the one accessed existed in obj.__dict__
. You could try
to create .__setattr__()
and .__delattr__()
methods that
controlled what wound up there, but even that did not prevent direct
manipulation of obj.__dict__
by other code. Both changed inheritance
trees and passing objects to external functions could often make it
non-obvious whether a method would or would not actually run when
working with an object. For example:
>>> class Foo(object): ... def __getattr__(self, name): ... return "Value of %s" % name >>> foo = Foo() >>> foo.just_this = "Some value" >>> foo.just_this 'Some value' >>> foo.something_else 'Value of something_else'
Accessing foo.just_this
skips the method code, while accessing
foo.something_else
runs it; other than the fact this shell session
is short, nothing makes the difference obvious. In fact, asking the
obvious hasattr()
question give you a misleading answer:
>>> hasattr(foo,'never_mentioned') True >>> foo2.__dict__.has_key('never_mentioned') # this works False >>> foo2.__dict__.has_key('just_this') True
With Python 2.2, we gained a new mechanism for creating "restricted"
classes. Exactly what the new-style class _slots_
attribute is
supposed to be used for is nowhere made unambiguous. For the most
part, the Python documentation advices to use .__slots__
only for
performance optimization in classes that might have a very large
number of instances--but specifically not as a way to declare
attributes. Nonetheless, the latter is what slots do: they create a
class without a .__dict__
attribute that only has the attributes
explicitly named (methods, however, are still declared as normal
within the class body). It is peculiar, but this gives you a way to
ensure that method code is called on attribute access:
>>> class Foo2(object): ... __slots__ = ('just_this') ... def __getattr__(self, name): ... return "Value of %s" % name >>> foo2 = Foo2() >>> foo2.just_this = "I'm slotted" >>> foo2.just_this "I'm slotted" >>> foo2.something_else = "I'm not slotted" AttributeError: 'Foo' object has no attribute 'something_else' >>> foo2.something_else 'Value of something_else'
The declaration of .__slots__
guarantees that only those attributes
you specify can be accessed directly, everything else will go through
the .__getattr__()
call. If you have also created a
.__setattr__()
method, you can make an assignment do something other
than raise an AttributeError
(but be sure to let the "slotted" value
pass through on assignment). E.g.:
>>> class Foo3(object): ... __slots__ = ('x') ... def __setattr__(self, name, val): ... if name in Foo.__slots__: ... object.__setattr__(self, name, val) ... def __getattr__(self, name): ... return "Value of %s" % name ... >>> foo3 = Foo3() >>> foo3.x 'Value of x' >>> foo3.x = 'x' >>> foo3.x 'x' >>> foo3.y 'Value of y' >>> foo3.y = 'y' # Doesn't do anything, but doesn't raise exception >>> foo3.y 'Value of y'
.__getattribute__()
method
In Python 2.2 and later, you have the option of using the method
.__getattribute__()
instead of the confusingly similarly named
old-style .__getattr__()
. Well, you have the option if you use
new-style classes (which generally you should anyway). The
.__getattribute__()
method is more powerful than its sibling in that
it intercepts all attribute access, whether or not an attribute
is defined in obj.__dict__
or obj.__slots__
. A drawback of using
.__getattribute__()
is that since all access goes though the method.
If you use this method, a bit of special programming is needed if you
want to return (or manipulate) the "real" value of the attribute:
usually you do this by calling .__getattribute__()
on the superclass
(usually object
). For example:
>>> class Foo4(object): ... def __getattribute__(self, name): ... try: ... return object.__getattribute__(self, name) ... except: ... return "Value of %s" % name ... >>> foo4 = Foo4() >>> foo4.x = 'x' >>> foo4.x 'x' >>> foo4.y 'Value of y'
In all versions of Python, .__setattr__()
and .__delattr__()
also
intercept all the write and delete access to attributes, rather than
merely those absent from obj.__dict__
.
We are moving along nicely in an enumeration of ways to make attributes act like methods. Within these magic methods, you can examine the specific attribute name being accessed, assigned, or deleted. In fact, if you like, you can check names by regular expression or by other computation. In principle you can make all sorts of runtime decisions about how to handle use of some given pseudo-attribute--for example, perhaps you do not simply want to compare the attribute name to a string pattern, but actually look up whether the name is an attribute that has been stored in a persistent database.
Much of the time, however, you would just like for a few "attributes" to act in a special manner, but let other attributes operate as plain attributes. The plain attributes should neither trigger any special code, nor suffer the time penalty of working through method code. In these cases you can use descriptors for attributes. Or closely related to descriptors, you can define properties. Behind the scenes, properties and descriptors amount to the same thing, but the syntax of defining them is rather different; with the difference in definition styles, you get advantages and disadvantages.
Let us first look at a descriptor. The idea here is to assign an
instance of a special kind of class to an attribute within another
class. This special "descriptor" class is a new-style class that
contains methods called .__get__()
, .__set__()
and __delete__()
,
or at least a subset of those. If the descriptor class implements at
least the first two, it is called a "data descriptor"; if it
implements only the first, it is called a "non-data descriptor".
The latter (non-data descriptor) is likely to be used to return a callable object. A non-data descriptor is, in a sense, often a fancy name for a method--but the particular method returned by descriptor access could be determined at runtime. This starts to get us into the scary realm of things similar to metaclasses and decorators, that I have written about before in this column. Of course, a regular method can also decide what code to run based on runtime conditions, so there is nothing fundamentally new about this concept of runtime determination of what a "method" does.
In any case, a data descriptor is more general, so let us look at one. Such a descriptor could return something callable--any Python function or method can return anything, after all. But our example just deals with simple values (and side effects). We want to make any use of a few attributes log the action to STDERR:
>>> class ErrWriter(object): ... def __get__(self, obj, type=None): ... print >> sys.stderr, "get", self, obj, type ... return self.data ... def __set__(self, obj, value): ... print >> sys.stderr, "set", self, obj, value ... self.data = value ... def __delete__(self, obj): ... print >> sys.stderr, "delete", self, obj ... del self.data >>> class Foo(object): ... this = ErrWriter() ... that = ErrWriter() ... other = 4 >>> foo = Foo() >>> foo.this = 5 set <__main__.ErrWriter object at 0x5cec90> <__main__.Foo object at 0x5cebf0> 5 >>> print foo.this get <__main__.ErrWriter object at 0x5cec90> <__main__.Foo object at 0x5cebf0> <class '__main__.Foo'> 5 >>> print foo.other 4 >>> foo.other = 6 >>> print foo.other 6
The class Foo
defines this
and that
as descriptors of the
ErrWriter
class. The attribute other
is just a plain class
attribute. Actually, there is a caveat here. On first access to
foo.other
, we read the class attribute; after we assign a value, we
actually read the instance attribute instead. The class attribute is
still there, just hidden, i.e.:
>>> foo.other 6 >>> foo.__class__.other 4
In contrast, the descriptor stays a class-level object, even though you can access it through the instance. This has the usually undesirable effect of making the descriptor something like a singleton. E.g.:
>>> foo2 = Foo() >>> foo2.this get <__main__.ErrWriter object at 0x5cec90> <__main__.Foo object at 0x5cebf0> <class '__main__.Foo'> 5
To simulate a usual "per instance" behavior, you would need to make
use of the obj
passed into ErrWriter
magic methods. This obj
is
the instance that has the descriptor. So you might define a
non-singleton descriptor like:
class ErrWriter(object): def __init__(self): self.inst = {} def __get__(self, obj, type=None): return self.inst[obj] def __set__(self, obj, value): self.inst[obj] = value def __delete__(self, obj): del self.inst[obj]
Properties work like descriptors, but are generally defined inside a
particular class rather than being created as "utility descriptors"
that various classes might utilize. As with "regular" descriptors,
the idea is to define "getters", "setters" and "deleters". After
that, you use the special function property()
to turn those methods
into a descriptor. For the hyper-technical readers (those whose
brains are still more-or-less operational): property
is not really a
function, but a type--don't worry about it.
Oddly, properties bring us full circle to the brief description I gave at top about how the Ruby programming languge works. A property is just a thing that looks like an attribute syntactically (as used), but is defined by defining all the getters, setters, and so on. If you wanted to, you could impose complete "Ruby-discipline" in Python, and never access "real" attributes. More likely you will want to mix-and-match though. Let us see how a property works:
class FooP(object): def getX(self): return self.__x def setX(self, value): self.__x = value def delX(self): del self.__x x = property(getX, setX, delX, "I'm the 'x' property.")
The names of the getter, setter, and deleter are nothing reserved. Usually you will want to use sensible names like the above. What they actually do can be anything, but often it is reasonable to use double-underscore versions of names for the attributes. These attributes get attached to the instance, just with the usual Python name mangling for "semi-hidden" attributes. Moreover, the methods remain perfectly usable too:
>>> foop = FooP() >>> foop.x = 'FooP x' >>> foop.getX() 'FooP x' >>> foop._FooP__x 'FooP x' >>> foop.x 'FooP x'
We have seen, in this installment, far too many ways to make Python instance attributes act like (or be) method calls. Unlike in some of these installments, I really do not have any clear advice for readers on how to cut through the complexity. I would like to be able to tell you to simply choose one of the described techniques, and ignore the others as inferior or less general. Unfortunately, each technique described has strengths, and has weaknesses. Each is pretty reasonable for certain programming contexts, even though the syntax and semantics of each is so radically different from the others.
Moreover, while I have not described it in this article, I have thought (vaguely) of a number of other even more obscure ways that a programmer might use metaclasses, class factories, and decorators to obtain similar effects to the "standard" half-dozen techniques I have outlined. Those ideas would truly probe into some dark corners of Python metaprogramming.
What would be nice would be if all the things I described were possible, but the variations among them were simply parameterized in some straightforward way rather than using wholly different syntax and organization. The grand goal of Python 3000 is a simplification along these lines; but I have not seen any concrete proposals on how such unification and simplification of attributes-as-methods might work. One idea that occurs to me is that Python might enable decorators for classes (along with the current use in methods and functions), and also provide some standard module of decorators for the most common "magic attribute" behaviors. This is speculation, and I do not know exactly how it might work, but I can just imagine such a thing could hide the complexity from the 95% of Python programmers who really do not wish to worry too much about Python internals and cryptic mojo.
Raymond Hettinger's "How-To Guide for Descriptors" is an excellent, albeit fairly dense, description of the low-down on exactly how descriptors work.
http://users.rcn.com/python/download/Descriptor.htm
The Python PEP 3000 describes the goals for Python 3.0 (also called "Python 3000"). The main purpose of this future version is removal of redundancy in programming techniques. Nothing is currently detailed about reducing the chaos about attributes-as-methods, but a programmer can dream:
http://www.python.org/dev/peps/pep-3000/
David Mertz almost enjoys problems because of the solutions they enable. David may be reached at [email protected]; his life pored over athttp://gnosis.cx/publish/. Check out David's book Text Processing in Python (http://gnosis.cx/TPiP/).