Charming Python #b26: Python Elegance, Python Warts, Part 2

Properties, attributes, methods and custom access


David Mertz, Ph.D.
Frog Prince, Gnosis Software
June, 2005

Since the "golden age" of Python 1.5.2--for a long time a stable and solid version--Python has greatly increased its number of syntactic features and built-in functions and types. One area where features have accumulated to a very large degree is in ways to access methods by using attribute syntax. A huge variety of "behind the scenes" techniques, each with pluses and minuses, have grown within the latest Python versions.

Introduction

In this article, David discusses the non-obvious features and misfeatures that have been added to the last several Python versions, and weighs in on which are truly valuable and which are unnecessary complication. David hopes to provide a collection of valuable things to watch out for to all those programmers who use Python less than full time--either programmers in other languages, or people like scientists for whom programming is only a side task. Where some quandaries are raised, solutions are suggested.

Attributes And Methods

In most object-oriented languages, methods and attributes are almost-but-not-quite the same thing. Both things can be attached to a class and/or to an instance. Whatever the details of implementation though, there seems to be a difference too. Methods are things attached to an object that you can call to initiate actions and calculations; attributes simply have values that can be retrieved (and perhaps modified).

There are languages--like Java--where what I have described is pretty much the end of the story. Attributes are attributes, and methods are methods. Java, by convention, puts a heavy emphasis on encapsulation and data hiding; the result is an encouragement of the use of "setters" and "getters" to access otherwise private attribute data. To the Java way of thinking, using explicit method calls covers in advance the case where you might want to add computation or side-effects to data access or modification. Of course, the result of the Java attitude is verboseness, and the imposition of a sometimes artificial-seeming discipline: you wind up writing foo.getBar() instead of foo.bar, and foo.setBar(value) instead of foo.bar=value.

Ruby is worth mentioning as a somewhat odd creature. It actually insists on data hiding to an even greater degree than Java does: all attributes are always "private"; you can never directly access instance data. At the same time, Ruby uses some syntax conventions that make method calls look like attribute access does in other languages. The first element of this is Ruby's optional parentheses in method calls; the second part is its use of semi-special naming of methods with symbols that are operators in most languages. So in Ruby, foo.bar is just a shorter option for calling foo.bar(); and "setting" foo.bar=value is shorthand for foo.bar=(value). Behind the scenes, everything goes through a method call.

Python is much more flexible than either Java or Ruby, but that fact is as much a criticism as it is praise. If you access foo.bar, or set foo.bar=value in Python, you might be using a simple data value, or you might be calling some semi-hidden code. Moreover, in the latter case, there are at least a half-dozen different ways you might reach that block of code, each one having slightly different behavior than the others, with dizzying subtleties and nuances. This deluge of options harms the regularity of Python, and makes it harder to understand for non-experts (or even for experts). I know why things have gotten where they are: new capabilities have been added to Python's object-oriented underpinnings in several steps. But I am not terribly pleased by the chaos.

The old-fashioned way

Since the old days (before Python 2.1), Python has had a magic method called .__getattr__() that a class could define to return computed values rather than simple data accesses. Correspondingly, the magic methods .__setattr__() and .__delattr__() could cause code to run when "attributes" were set or deleted. The problem with this old system was that you never really knew whether or not the code would actually be called, since it depended on whether an attribute with the same name as the one accessed existed in obj.__dict__. You could try to create .__setattr__() and .__delattr__() methods that controlled what wound up there, but even that did not prevent direct manipulation of obj.__dict__ by other code. Both changed inheritance trees and passing objects to external functions could often make it non-obvious whether a method would or would not actually run when working with an object. For example:

>>> class Foo(object):
...     def __getattr__(self, name):
...         return "Value of %s" % name
>>> foo = Foo()
>>> foo.just_this = "Some value"
>>> foo.just_this
'Some value'
>>> foo.something_else
'Value of something_else'

Accessing foo.just_this skips the method code, while accessing foo.something_else runs it; other than the fact this shell session is short, nothing makes the difference obvious. In fact, asking the obvious hasattr() question gives you a misleading answer:

>>> hasattr(foo,'never_mentioned')
True
>>> 'never_mentioned' in foo.__dict__  # this works
False
>>> 'just_this' in foo.__dict__
True
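
The earlier point about direct manipulation of obj.__dict__ can be made concrete. What follows is only a minimal sketch, not code from any library: the class name Guarded and the "secret" naming rule are hypothetical, and it is written so that it also runs on Python versions later than those discussed here.

```python
class Guarded(object):
    """Hypothetical class that tries to police its own attributes."""
    def __setattr__(self, name, value):
        # Refuse to store anything whose name starts with "secret"
        if name.startswith("secret"):
            raise AttributeError("cannot set %s" % name)
        object.__setattr__(self, name, value)

    def __getattr__(self, name):
        # Runs only when normal attribute lookup fails
        return "Value of %s" % name

g = Guarded()
before = g.secret_code           # computed by __getattr__
g.__dict__["secret_code"] = 42   # writes to the dict, never calls __setattr__
after = g.secret_code            # plain stored value; __getattr__ is skipped
```

Here before is the computed string and after is the integer stored behind the class's back; nothing at the point of access tells you which of the two you are going to get.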

The slots hack

With Python 2.2, we gained a new mechanism for creating "restricted" classes. Exactly what the new-style class .__slots__ attribute is supposed to be used for is nowhere made unambiguous. For the most part, the Python documentation advises using .__slots__ only for performance optimization in classes that might have a very large number of instances--but specifically not as a way to declare attributes. Nonetheless, the latter is what slots do: they create a class without a .__dict__ attribute that only has the attributes explicitly named (methods, however, are still declared as normal within the class body). It is peculiar, but this gives you a way to ensure that method code is called on attribute access:

>>> class Foo2(object):
...     __slots__ = ('just_this',)
...     def __getattr__(self, name):
...         return "Value of %s" % name
>>> foo2 = Foo2()
>>> foo2.just_this = "I'm slotted"
>>> foo2.just_this
"I'm slotted"
>>> foo2.something_else = "I'm not slotted"
AttributeError: 'Foo2' object has no attribute 'something_else'
>>> foo2.something_else
'Value of something_else'

The declaration of .__slots__ guarantees that only those attributes you specify can be accessed directly; everything else will go through the .__getattr__() call. If you have also created a .__setattr__() method, you can make an assignment do something other than raise an AttributeError (but be sure to let the "slotted" value pass through on assignment). E.g.:

>>> class Foo3(object):
...     __slots__ = ('x',)
...     def __setattr__(self, name, val):
...         if name in Foo3.__slots__:
...             object.__setattr__(self, name, val)
...     def __getattr__(self, name):
...         return "Value of %s" % name
...
>>> foo3 = Foo3()
>>> foo3.x
'Value of x'
>>> foo3.x = 'x'
>>> foo3.x
'x'
>>> foo3.y
'Value of y'
>>> foo3.y = 'y'   # Doesn't do anything, but doesn't raise exception
>>> foo3.y
'Value of y'

The .__getattribute__() method

In Python 2.2 and later, you have the option of using the method .__getattribute__() instead of the confusingly similarly named old-style .__getattr__(). Well, you have the option if you use new-style classes (which generally you should anyway). The .__getattribute__() method is more powerful than its sibling in that it intercepts all attribute access, whether or not an attribute is defined in obj.__dict__ or obj.__slots__. A drawback of using .__getattribute__() is that all access goes through the method, so a bit of special programming is needed if you want to return (or manipulate) the "real" value of the attribute: usually you do this by calling .__getattribute__() on the superclass (usually object). For example:

>>> class Foo4(object):
...     def __getattribute__(self, name):
...         try:
...             return object.__getattribute__(self, name)
...         except AttributeError:
...             return "Value of %s" % name
...
>>> foo4 = Foo4()
>>> foo4.x = 'x'
>>> foo4.x
'x'
>>> foo4.y
'Value of y'

In all versions of Python, .__setattr__() and .__delattr__() also intercept all the write and delete access to attributes, rather than merely those absent from obj.__dict__.
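
A small sketch may help show what "intercept all" means for writes and deletes. The class name Audited and its log attribute are hypothetical, chosen only for illustration:

```python
class Audited(object):
    """Hypothetical class that records every attribute write and delete."""
    def __init__(self):
        # Use object.__setattr__ so that creating the log is not itself logged
        object.__setattr__(self, "log", [])

    def __setattr__(self, name, value):
        self.log.append(("set", name))
        object.__setattr__(self, name, value)

    def __delattr__(self, name):
        self.log.append(("del", name))
        object.__delattr__(self, name)

a = Audited()
a.x = 1    # new attribute: intercepted
a.x = 2    # 'x' is already in a.__dict__: still intercepted
del a.x    # deletion is intercepted as well
```

Unlike .__getattr__(), the second assignment is logged even though 'x' already lives in a.__dict__ by then.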

Descriptors

We are moving along nicely in an enumeration of ways to make attributes act like methods. Within these magic methods, you can examine the specific attribute name being accessed, assigned, or deleted. In fact, if you like, you can check names by regular expression or by other computation. In principle you can make all sorts of runtime decisions about how to handle use of some given pseudo-attribute--for example, perhaps you do not simply want to compare the attribute name to a string pattern, but actually look up whether the name is an attribute that has been stored in a persistent database.
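
As one illustration of checking names by regular expression, here is a minimal sketch. The Record class and the get_<field> naming scheme are assumptions made up for this example; a real program might consult a database where this one consults a dictionary.

```python
import re

class Record(object):
    """Hypothetical class whose get_<field> pseudo-attributes are computed."""
    _pattern = re.compile(r"^get_(\w+)$")

    def __init__(self, **fields):
        self._fields = dict(fields)

    def __getattr__(self, name):
        # Called only for names that normal lookup did not find
        match = self._pattern.match(name)
        if match and match.group(1) in self._fields:
            return self._fields[match.group(1)]
        # Anything else behaves like an ordinary missing attribute
        raise AttributeError(name)

rec = Record(name="Ada", year=1815)
```

With this definition, rec.get_name and rec.get_year return the stored values, while hasattr(rec, 'get_missing') is false because the method raises AttributeError for names it does not recognize.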

Much of the time, however, you would just like for a few "attributes" to act in a special manner, but let other attributes operate as plain attributes. The plain attributes should neither trigger any special code, nor suffer the time penalty of working through method code. In these cases you can use descriptors for attributes. Or closely related to descriptors, you can define properties. Behind the scenes, properties and descriptors amount to the same thing, but the syntax of defining them is rather different; with the difference in definition styles, you get advantages and disadvantages.

Let us first look at a descriptor. The idea here is to assign an instance of a special kind of class to an attribute within another class. This special "descriptor" class is a new-style class that contains methods called .__get__(), .__set__() and .__delete__(), or at least a subset of those. If the descriptor class implements at least the first two, it is called a "data descriptor"; if it implements only the first, it is called a "non-data descriptor".

The latter (non-data descriptor) is likely to be used to return a callable object. A non-data descriptor is, in a sense, often a fancy name for a method--but the particular method returned by descriptor access could be determined at runtime. This starts to get us into the scary realm of things similar to metaclasses and decorators, that I have written about before in this column. Of course, a regular method can also decide what code to run based on runtime conditions, so there is nothing fundamentally new about this concept of runtime determination of what a "method" does.
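
A minimal sketch of a non-data descriptor that picks its callable at access time might look like the following. The names Greeter, Person and formal are hypothetical, used only for illustration:

```python
class Greeter(object):
    """Hypothetical non-data descriptor: it defines only __get__."""
    def __get__(self, obj, type=None):
        if obj is None:
            return self  # accessed on the class rather than an instance
        # Choose which callable to hand back based on instance state
        if getattr(obj, "formal", False):
            return lambda name: "Good day, %s." % name
        return lambda name: "Hi %s!" % name

class Person(object):
    greet = Greeter()
    def __init__(self, formal):
        self.formal = formal

formal_reply = Person(True).greet("Guido")
casual = Person(False)
casual_reply = casual.greet("Guido")
casual.greet = "shadowed"   # an instance attribute hides a non-data descriptor
```

The last line shows the practical difference from a data descriptor: because Greeter has no .__set__(), the assignment simply stores a value in casual.__dict__, and later reads of casual.greet find that value instead of calling .__get__().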

In any case, a data descriptor is more general, so let us look at one. Such a descriptor could return something callable--any Python function or method can return anything, after all. But our example just deals with simple values (and side effects). We want to make any use of a few attributes log the action to STDERR:

>>> import sys
>>> class ErrWriter(object):
...     def __get__(self, obj, type=None):
...         print >> sys.stderr, "get", self, obj, type
...         return self.data
...     def __set__(self, obj, value):
...         print >> sys.stderr, "set", self, obj, value
...         self.data = value
...     def __delete__(self, obj):
...         print >> sys.stderr, "delete", self, obj
...         del self.data
>>> class Foo(object):
...     this = ErrWriter()
...     that = ErrWriter()
...     other = 4
>>> foo = Foo()
>>> foo.this = 5
set <__main__.ErrWriter object at 0x5cec90>
    <__main__.Foo object at 0x5cebf0> 5
>>> print foo.this
get <__main__.ErrWriter object at 0x5cec90>
    <__main__.Foo object at 0x5cebf0> <class '__main__.Foo'>
5
>>> print foo.other
4
>>> foo.other = 6
>>> print foo.other
6

The class Foo defines this and that as descriptors of the ErrWriter class. The attribute other is just a plain class attribute. Actually, there is a caveat here. On first access to foo.other, we read the class attribute; after we assign a value, we actually read the instance attribute instead. The class attribute is still there, just hidden, i.e.:

>>> foo.other
6
>>> foo.__class__.other
4

In contrast, the descriptor stays a class-level object, even though you can access it through the instance. This has the usually undesirable effect of making the descriptor something like a singleton. E.g.:

>>> foo2 = Foo()
>>> foo2.this
get <__main__.ErrWriter object at 0x5cec90>
    <__main__.Foo object at 0x5cebf0> <class '__main__.Foo'>
5

To simulate a usual "per instance" behavior, you would need to make use of the obj passed into ErrWriter magic methods. This obj is the instance that has the descriptor. So you might define a non-singleton descriptor like:

class ErrWriter(object):
    def __init__(self):
        self.inst = {}
    def __get__(self, obj, type=None):
        return self.inst[obj]
    def __set__(self, obj, value):
        self.inst[obj] = value
    def __delete__(self, obj):
        del self.inst[obj]
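
One caveat with a plain dictionary keyed on instances is that it keeps every instance alive for as long as the descriptor exists, and reading an attribute that was never set raises KeyError. A possible variation, offered only as a sketch, uses weakref.WeakKeyDictionary from the standard library; the class name PerInstance and its default parameter are assumptions added for this example, and the instances must be hashable and weak-referenceable.

```python
import weakref

class PerInstance(object):
    """Hypothetical per-instance data descriptor backed by weak references."""
    def __init__(self, default=None):
        self.default = default
        # Entries vanish automatically when the owning instance is collected
        self.inst = weakref.WeakKeyDictionary()

    def __get__(self, obj, type=None):
        if obj is None:
            return self  # accessed on the class itself
        return self.inst.get(obj, self.default)

    def __set__(self, obj, value):
        self.inst[obj] = value

    def __delete__(self, obj):
        del self.inst[obj]

class Foo(object):
    this = PerInstance()

a, b, c = Foo(), Foo(), Foo()
a.this = 5
b.this = 7
```

Each instance now sees its own value: a.this is 5, b.this is 7, and c.this falls back to the default of None.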

Properties

Properties work like descriptors, but are generally defined inside a particular class rather than being created as "utility descriptors" that various classes might utilize. As with "regular" descriptors, the idea is to define "getters", "setters" and "deleters". After that, you use the special function property() to turn those methods into a descriptor. For the hyper-technical readers (those whose brains are still more-or-less operational): property is not really a function, but a type--don't worry about it.

Oddly, properties bring us full circle to the brief description I gave at the top about how the Ruby programming language works. A property is just a thing that looks like an attribute syntactically (as used), but is defined by defining all the getters, setters, and so on. If you wanted to, you could impose complete "Ruby-discipline" in Python, and never access "real" attributes. More likely you will want to mix-and-match though. Let us see how a property works:

class FooP(object):
    def getX(self): return self.__x
    def setX(self, value): self.__x = value
    def delX(self): del self.__x
    x = property(getX, setX, delX, "I'm the 'x' property.")

The names of the getter, setter, and deleter are not reserved. Usually you will want to use sensible names like the above. What they actually do can be anything, but often it is reasonable to use double-underscore versions of names for the attributes. These attributes get attached to the instance, just with the usual Python name mangling for "semi-hidden" attributes. Moreover, the methods remain perfectly usable too:

>>> foop = FooP()
>>> foop.x = 'FooP x'
>>> foop.getX()
'FooP x'
>>> foop._FooP__x
'FooP x'
>>> foop.x
'FooP x'
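
Since the getter and setter can do anything, a property can also hide a computation instead of a stored value. The following is a minimal sketch with hypothetical names (Celsius, getF, setF); it passes None as the deleter, so deleting the property is simply not allowed.

```python
class Celsius(object):
    """Hypothetical class: 'fahrenheit' looks like data but is computed."""
    def __init__(self, degrees=0.0):
        self.degrees = degrees

    def getF(self):
        return self.degrees * 9.0 / 5.0 + 32.0

    def setF(self, value):
        # Store the converted value in the one "real" attribute
        self.degrees = (value - 32.0) * 5.0 / 9.0

    fahrenheit = property(getF, setF, None, "Temperature in Fahrenheit.")

t = Celsius(100.0)
boiling = t.fahrenheit   # computed from t.degrees on every access
t.fahrenheit = 32.0      # runs setF, which updates t.degrees
```

After the last line, t.degrees is 0.0, even though no code ever assigned to it by name.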

Let Confusion Reign

We have seen, in this installment, far too many ways to make Python instance attributes act like (or be) method calls. Unlike in some of these installments, I really do not have any clear advice for readers on how to cut through the complexity. I would like to be able to tell you to simply choose one of the described techniques, and ignore the others as inferior or less general. Unfortunately, each technique described has strengths, and has weaknesses. Each is pretty reasonable for certain programming contexts, even though the syntax and semantics of each are so radically different from the others.

Moreover, while I have not described it in this article, I have thought (vaguely) of a number of other even more obscure ways that a programmer might use metaclasses, class factories, and decorators to obtain similar effects to the "standard" half-dozen techniques I have outlined. Those ideas would truly probe into some dark corners of Python metaprogramming.

What would be nice would be if all the things I described were possible, but the variations among them were simply parameterized in some straightforward way rather than using wholly different syntax and organization. The grand goal of Python 3000 is a simplification along these lines; but I have not seen any concrete proposals on how such unification and simplification of attributes-as-methods might work. One idea that occurs to me is that Python might enable decorators for classes (along with the current use in methods and functions), and also provide some standard module of decorators for the most common "magic attribute" behaviors. This is speculation, and I do not know exactly how it might work, but I can just imagine such a thing could hide the complexity from the 95% of Python programmers who really do not wish to worry too much about Python internals and cryptic mojo.
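
Class decorators did later become part of Python (in versions 2.6 and 3.0), so the idea can at least be sketched. What follows is only one guess at what such a "magic attribute" decorator could look like; the names logged, _make_property and access_log are hypothetical, and no standard module of this kind is implied.

```python
def _make_property(name):
    """Build a property that logs reads and writes of one attribute."""
    hidden = "_" + name
    def getter(self):
        type(self).access_log.append(("get", name))
        return getattr(self, hidden)
    def setter(self, value):
        type(self).access_log.append(("set", name))
        setattr(self, hidden, value)
    return property(getter, setter)

def logged(*names):
    """Hypothetical class decorator: make the named attributes log access."""
    def decorate(cls):
        cls.access_log = []
        for name in names:
            setattr(cls, name, _make_property(name))
        return cls
    return decorate

@logged("this", "that")
class Foo(object):
    other = 4

foo = Foo()
foo.this = 5        # logged
value = foo.this    # logged
foo.other = 6       # plain attribute, not logged
```

The class body stays free of getters and setters; which attributes are "magic" is stated once, as a parameter to the decorator.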

Resources

Raymond Hettinger's "How-To Guide for Descriptors" is an excellent, albeit fairly dense, description of the low-down on exactly how descriptors work.

http://users.rcn.com/python/download/Descriptor.htm

The Python PEP 3000 describes the goals for Python 3.0 (also called "Python 3000"). The main purpose of this future version is removal of redundancy in programming techniques. Nothing is currently detailed about reducing the chaos about attributes-as-methods, but a programmer can dream:

http://www.python.org/dev/peps/pep-3000/

About The Author

David Mertz almost enjoys problems because of the solutions they enable. David may be reached at [email protected]; his life pored over at http://gnosis.cx/publish/. Check out David's book Text Processing in Python (http://gnosis.cx/TPiP/).