itertools Module
David Mertz, Ph.D.
Idempotentate, Gnosis Software, Inc.
April, 2003
Python 2.2 introduced simple generators to the Python language, and reconceived standard loops in terms of underlying iterators. With Python 2.3, generators become standard (no need for __future__), and the new module itertools is introduced to work flexibly with iterators. Over time, I think more capabilities will be added to this module, but it is already a very interesting thing. The itertools module is essentially a set of combinatorial higher-order functions, but ones that work with lazy iterators rather than with finite lists. This installment explores the new module, and gives readers a sense of the new expressive power available with combinatorial iterators.
The idea of iterators was introduced into Python with version 2.2. Well, that is not quite true: hints of the idea were already present in the older function xrange(), and the file method .xreadlines(). Python 2.2 generalized the notion in much of its internal implementation, and made programming custom iterators much more straightforward by introducing the yield keyword (the presence of yield turns a function into a generator, which in turn returns an iterator).
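As a minimal sketch of that relationship (my own toy example, not part of itertools), any function containing yield returns an iterator when called:

def countdown(n):
    "Yield n, n-1, ..., 1 one value at a time"
    while n > 0:
        yield n
        n -= 1

it = countdown(3)                      # calling the generator returns an iterator
print it.next(), it.next(), it.next()  # prints: 3 2 1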
The motivation behind iterators is twofold: (1) Working with data as sequences is very frequently the most straightforward approach; (2) A sequence that is processed in linear order often does not need to actually exist all at once.
The x*() premonitions provide clear examples of these
principles. If you want to do something a billion times, your
program will probably take a while to execute, but there is no
general need for it to claim a lot of memory. Likewise, for many
types of files, processing can be a line-by-line matter, and
there is no need to store the whole file in memory. All sorts
of other sequences can best be treated lazily too--they might
depend on data arriving incrementally over a channel, or upon a
computation that proceeds step-by-step.
Most of the time, an iterator is utilized within a for loop, just like a true sequence is. Iterators provide a .next() method that can be invoked explicitly, but 99% of the time, what you will see is along the lines of:
for x in iterator:
    do_something_with(x)
The loop is terminated when the behind-the-scenes call to iterator.next() raises a StopIteration exception. By the way, a true sequence can be turned into an iterator by calling iter(seq)--this will not save any memory, but it can be useful in the functions discussed below.
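A short interactive session (my illustration, not from the original column) shows the protocol the for loop relies on:

>>> it = iter(['a', 'b', 'c'])   # wrap a true sequence in an iterator
>>> it.next()
'a'
>>> it.next()
'b'
>>> it.next()
'c'
>>> it.next()
Traceback (most recent call last):
  ...
StopIteration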
There is something schizophrenic in Python's attitudes towards functional programming (FP). On the one hand, many Python developers disparage the traditional FP functions map(), filter(), and reduce()--usually recommending list comprehensions in their place. But the whole of the itertools module is composed of functions of the very same sort, merely operating on "lazy sequences" (iterators) rather than on completed sequences (lists, tuples). Furthermore, there is no syntax in Python 2.3 for "iterator comprehensions," which would seem to have the same motivation as list comprehensions.
I suspect Python will eventually grow some form of iterator comprehension; but it depends on finding a suitably natural syntax for them. In the meanwhile, we have a number of useful combinatorial functions in the itertools module. Broadly, what every one of these functions does is accept some parameters--usually including some basis iterators--and return a new iterator. For example, the functions ifilter(), imap(), and izip() are directly equivalent to the respective builtins that lack the initial i.
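As a rough illustration (my own example), the lazy versions produce the same elements as their eager counterparts, just one at a time and only on demand:

>>> from itertools import ifilter, imap, izip
>>> map(abs, [-1, -2, 3])                # eager: builds the whole list
[1, 2, 3]
>>> list(imap(abs, [-1, -2, 3]))         # lazy: yields values on demand
[1, 2, 3]
>>> list(ifilter(lambda n: n % 2, [1, 2, 3, 4, 5]))
[1, 3, 5]
>>> list(izip('ab', [1, 2]))
[('a', 1), ('b', 2)]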
There is no ireduce() in itertools, although it might seem natural; a possible Python implementation is:
def ireduce(func, iterable, init=None):
    if init is None:
        iterable = iter(iterable)
        curr = iterable.next()
    else:
        curr = init
    for x in iterable:
        curr = func(curr, x)
        yield curr
The use case for ireduce() is similar to that for reduce(). For example, suppose you wanted to add a list of numbers contained in a large file, but stop when a condition is met. You could monitor the ongoing total with:
from operator import add
from itertools import *
nums = open('numbers')
for tot in takewhile(condition, ireduce(add, imap(int, nums))):
    print "total =", tot
A more real-world example is probably something like applying a stream of events to a stateful object, such as to a GUI widget. But even the simple example above shows the FP flavor of iterator combinators.
All the functions in itertools can easily be implemented in pure Python, as generators. The main point of including the module in Python 2.3+ is to provide standard behaviors and names for some useful functions. While programmers could write their own versions, each one would create a slightly incompatible variation in practice. The other point, however, is to implement iterator combinators in efficient C code. Using itertools functions will be a bit faster than writing your own combinators. The standard documentation shows equivalent pure-Python implementations for each itertools function, so there is no need to repeat those in this article.
The functions in itertools are basic enough--and distinctly named enough--that it probably makes sense to import all the names from the module. The function enumerate(), for example, might sensibly live in itertools, but is instead a builtin function in Python 2.3+. Notably, you can easily express enumerate() in terms of itertools functions:
from itertools import *
enumerate = lambda iterable: izip(count(), iterable)
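A quick check (my own) shows that the two spellings agree for the values actually consumed:

>>> from itertools import izip, count
>>> list(enumerate('abc'))           # the Python 2.3 builtin
[(0, 'a'), (1, 'b'), (2, 'c')]
>>> list(izip(count(), 'abc'))       # the itertools equivalent
[(0, 'a'), (1, 'b'), (2, 'c')]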
Let us look first at the few itertools functions that do not use other iterators as a basis, but simply create iterators "from scratch." times() returns an iterator that yields an identical object multiple times; in itself, this capability is moderately useful, but it is really nice as a substitute for the superfluous use of xrange() and an index variable simply to repeat an action. I.e., rather than:
for i in xrange(1000):
    do_something()
You can now use the more neutral:
for _ in times(1000):
    do_something()
If no second argument is given to times(), it simply yields None repeatedly. The function repeat() is similar to times(), but unboundedly returns the same object. This iterator is useful either where a loop has an independent break condition, or within combinators like izip() and imap().
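A small sketch of my own shows repeat() feeding a combinator--here squaring each number in a bounded iterable with imap() and the builtin pow():

>>> from itertools import imap, repeat
>>> list(imap(pow, range(5), repeat(2)))   # stops when range(5) is exhausted
[0, 1, 4, 9, 16]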
The function count() is a bit like a cross between repeat() and xrange(). count() returns consecutive integers unboundedly (starting at an optional argument). However, given that count() does not currently support overflow to longs correctly, you might as well use xrange(n, sys.maxint) still; it's not literally unbounded, but for most purposes it amounts to the same thing. Like repeat(), count() is particularly useful inside other iterator combinators.
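For example (my illustration), count() can number the items of another iterable, starting wherever you like:

>>> from itertools import izip, count
>>> for n, word in izip(count(1), ['foo', 'bar', 'baz']):
...     print n, word
...
1 foo
2 bar
3 baz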
A few of the real combinatorial functions in itertools have been mentioned in passing. ifilter(), izip(), and imap() act just as you would expect from their corresponding sequence functions. ifilterfalse() is a convenience so that you do not need to negate a predicate function in a lambda or a def (and it saves significant function call overhead). But functionally, you could define ifilterfalse() (approximately, ignoring the None predicate) as:
def ifilterfalse(predicate, iterable):
    return ifilter(lambda x: not predicate(x), iterable)
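Hypothetical usage of my own, just to show the two halves of the split:

>>> from itertools import ifilter, ifilterfalse
>>> is_even = lambda n: n % 2 == 0
>>> list(ifilter(is_even, range(10)))
[0, 2, 4, 6, 8]
>>> list(ifilterfalse(is_even, range(10)))
[1, 3, 5, 7, 9]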
The functions dropwhile() and takewhile() partition an iterator by a predicate. The former ignores yielded elements until a predicate is fulfilled, the latter yields while a predicate holds. dropwhile() skips an indefinite number of initial elements of an iterator, so might not begin iterating until after a delay. takewhile() starts right away, but terminates the iterator as soon as the passed-in predicate becomes false.
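A toy example of my own makes the division clearer:

>>> from itertools import takewhile, dropwhile
>>> small = lambda n: n < 3
>>> list(takewhile(small, [1, 2, 5, 1, 0]))   # yields until the predicate fails
[1, 2]
>>> list(dropwhile(small, [1, 2, 5, 1, 0]))   # skips until the predicate fails
[5, 1, 0]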
The function islice() is basically just the iterator version of list slicing. You can specify a start, stop, and step, just as with regular slices. If a start is given, a number of elements are dropped until the passed-in iterator reaches the element of interest. This is another case where I think refinement to Python is possible--the best thing would be for iterators to simply recognize slices, just as lists do (as a synonym for what islice() does).
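A brief sketch of my own, run against the unbounded count() iterator:

>>> from itertools import islice, count
>>> list(islice(count(), 5))            # stop only: like range(5)
[0, 1, 2, 3, 4]
>>> list(islice(count(), 10, 20, 3))    # start, stop, step
[10, 13, 16, 19]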
One final function, starmap(), is a slight variation on imap(). If the function passed in as an argument takes multiple arguments, the iterable passed should yield tuples of the right size. Basically this is the same as imap() with several iterables passed in, only with the collection of iterables previously combined with izip().
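For instance (my example), the two spellings below yield the same values:

>>> from itertools import imap, izip, starmap
>>> list(imap(pow, [2, 3, 10], [5, 2, 3]))
[32, 9, 1000]
>>> list(starmap(pow, izip([2, 3, 10], [5, 2, 3])))
[32, 9, 1000]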
The functions included in itertools make for a good start. If nothing else, they encourage Python programmers to become more comfortable with utilizing and combining iterators. In a general way, widespread use of iterators is clearly important to the future of Python. But past what is included, there are some others that I would recommend for future updates to the module. Readers can easily use these immediately--of course, if they are later included, the names or interface details could differ.
One category that would seem to be of general use is functions that take multiple iterables as arguments, then yield individual elements from each iterable. In contrast to this, izip() yields tuples of elements, and imap() yields values computed based on the basis elements. The two obvious arrangements, to my mind, are chain() and weave(). The first is similar in effect to concatenation of sequences (but appropriately lazy). That is, where for plain sequences you might use, e.g.:
for x in ('a','b','c') + (1, 2, 3):
    do_something(x)
For general iterables, you could use:
for x in chain(iter1, iter2, iter3):
    do_something(x)
A Python implementation is:
def chain(*iterables):
    for iterable in iterables:
        for item in iterable:
            yield item
With iterables, you might also combine several by interspersing them. There is no builtin syntax to do the same with sequences, but weave() itself works fine for completed sequences also. A possible implementation is below (Magnus Lie Hetland proposed a similar function on comp.lang.python):
def weave(*iterables):
    "Intersperse several iterables, until all are exhausted"
    iterables = map(iter, iterables)
    while iterables:
        for i, it in enumerate(iterables):
            try:
                yield it.next()
            except StopIteration:
                del iterables[i]
Let me illustrate the behavior of weave(), since it might not be immediately obvious from the implementation:
>>> for x in weave('abc', xrange(4), [10,11,12,13,14]):
...     print x,
...
a 0 10 b 1 11 c 2 12 13 3 14
Even after some iterators are exhausted, the remaining ones continue to yield values, until everything available is yielded at some point.
I will propose just one more possible itertools function. This one owes a lot to functional programming ways of conceiving problems. icompose() has a certain symmetry with the function ireduce() suggested above. But where ireduce() feeds a (lazy) sequence of values to a function, and yields each result, icompose() applies a sequence of functions to the return value of each predecessor function. A likely use of ireduce() is to feed a sequence of events to a long-lived object. icompose() might instead repeatedly pass an object to a mutator function that returns a new object. The first is a fairly traditional OOP way of thinking about events, while the second is more of an FP way of thinking.
Here is a possible implementation of icompose():
def icompose(functions, initval):
    currval = initval
    for f in functions:
        currval = f(currval)
        yield currval
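To illustrate (with throwaway functions of my own choosing), each yielded value is the previous value passed through the next function in the stream:

>>> double = lambda n: n * 2
>>> add_one = lambda n: n + 1
>>> for val in icompose([double, add_one, double, double], 1):
...     print val,
...
2 3 6 12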
Iterators--conceived as lazy sequences--are a powerful concept that opens new styles of Python programming. There is a subtle difference, though, between thinking of an iterator as just a data source, and thinking of it in a sequential style. Neither way of thinking is more true per se, but the latter opens the path towards a combinatorial shorthand for manipulating programmatic events. The combinatorial functions in itertools--and especially some it might grow, like those I suggest--come close to a declarative style of programming. To my mind, these declarative styles are generally less error-prone, and more powerful.
Charming Python: Iterators and simple generators
http://www-106.ibm.com/developerworks/linux/library/l-pycon
Charming Python: Implementing "weightless threads" with Python
http://www-106.ibm.com/developerworks/linux/library/l-pythrd.html
Charming Python: Generator-based state machines
http://www-106.ibm.com/developerworks/linux/library/l-pygen.html
While David Mertz also likes hubris and impatience, this installment is about laziness. David may be reached at [email protected]; his life is pored over at http://gnosis.cx/publish/. Suggestions and recommendations on this, past, or future columns are welcomed.