Charming Python: The itertools Module

David Mertz, Ph.D.
Idempotentate, Gnosis Software, Inc.
April, 2003
 
Python 2.2 introduced simple generators to the Python language, and reconceived standard loops in terms of underlying iterators. With Python 2.3, generators become standard (no need for __future__), and the new module itertools is introduced to work flexibly with iterators. Over time, I think more capabilities will be added to this module, but it is already a very interesting thing. The itertools module is essentially a set of combinatorial higher-order functions, but ones that work with lazy iterators rather than with finite lists. This installment explores the new module, and gives readers a sense of the new expressive power available with combinatorial iterators.
  The idea of iterators was introduced into Python with version
  2.2. Well, that is not quite true: hints of the idea were already
  present in the older function xrange(), and the file method
  .xreadlines(). Python 2.2 generalized the notion in much of its
  internal implementation, and made programming custom iterators
  much more straightforward by introducing the yield keyword (the
  presence of yield turns a function into a generator, which in
  turn returns an iterator).
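As a minimal sketch (the function name is just for illustration), a
generator and its iterator look like:

>>> def squares(n):
...     "Yield the squares of 0..n-1, one at a time"
...     for i in range(n):
...         yield i**2
...
>>> it = squares(3)
>>> it.next(), it.next(), it.next()
(0, 1, 4)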
The motivation behind iterators is twofold: (1) Working with data as sequences is very frequently the most straightforward approach; (2) A sequence that is processed in linear order often does not need to actually exist all at once.
The x*() precursors mentioned above provide clear examples of these
  principles. If you want to do something a billion times, your
  program will probably take a while to execute, but there is no
  general need for it to claim a lot of memory. Likewise, for many
  types of files, processing can be a line-by-line matter, and
  there is no need to store the whole file in memory.  All sorts
  of other sequences can best be treated lazily too--they might
  depend on data arriving incrementally over a channel, or upon a
  computation that proceeds step-by-step.
  Most of the time, an iterator is utilized within a for loop,
  just like a true sequence is. Iterators provide a .next()
method that can be invoked explicitly, but 99% of the time what
you will see is something along the lines of:
for x in iterator:
    do_something_with(x)
  The loop is terminated when the behind-the-scenes call to
  iterator.next() raises a StopIteration exception.  By the
  way, a true sequence can be turned into an iterator by calling
  iter(seq)--this will not save any memory, but it can be
  useful in the functions discussed below.
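To see the machinery by hand (purely for illustration):

>>> it = iter([1, 2])
>>> it.next()
1
>>> it.next()
2
>>> it.next()
Traceback (most recent call last):
  ...
StopIteration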
  There is something schizophrenic in Python's attitudes towards
  functional programming (FP). On the one hand, many Python
  developers disparage the traditional FP functions map(),
  filter(), and reduce()--usually recommending using list
comprehensions in their place. But the whole of the itertools
module is composed of functions of the very same sort, merely
operating on "lazy sequences" (iterators) rather than on
completed sequences (lists, tuples). Furthermore, there is no
syntax in Python 2.3 for "iterator comprehensions," which would
seem to have the same motivation as list comprehensions.
  I suspect Python will eventually grow some form of iterator
  comprehension; but it depends on finding a suitably natural
  syntax for them. In the meanwhile, we have a number of useful
combinatorial functions in the itertools module. Broadly, what
every one of these functions does is accept some
parameters--usually including some basis iterators--and return a
  new iterator. For example, the functions ifilter(), imap(),
  and izip() are directly equivalent to the respective builtins
  that lack the initial i.
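For instance, converting the results to lists just for display:

>>> from itertools import ifilter, imap, izip
>>> list(imap(abs, [-1, -2, 3]))
[1, 2, 3]
>>> list(ifilter(lambda n: n % 2, xrange(6)))
[1, 3, 5]
>>> list(izip('ab', xrange(10)))
[('a', 0), ('b', 1)]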
There is no ireduce() in itertools, although it might seem
natural; a possible Python implementation is:
def ireduce(func, iterable, init=None):
    "Like reduce(), but yield each running result as it goes"
    if init is None:
        iterable = iter(iterable)
        curr = iterable.next()   # first element seeds the accumulator
    else:
        curr = init
    for x in iterable:
        curr = func(curr, x)
        yield curr
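Given that definition, a quick check of the running results it
yields:

>>> from operator import add
>>> list(ireduce(add, [1, 2, 3, 4]))
[3, 6, 10]
>>> list(ireduce(add, [1, 2, 3, 4], init=10))
[11, 13, 16, 20]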
  The use case for ireduce() is similar to that for reduce().
  For example, suppose you wanted to add a list of numbers
  contained in a large file, but stop when a condition is met. You
  could monitor the ongoing total with:
from operator import add
from itertools import *
nums = open('numbers')
# 'condition' stands for any predicate, e.g. lambda tot: tot < 1000
for tot in takewhile(condition, ireduce(add, imap(int, nums))):
    print "total =", tot
A more real-world example is probably something like applying a stream of events to a stateful object, such as to a GUI widget. But even the simple example above shows the FP flavor of iterator combinators.
All the functions in itertools can easily be implemented in
  pure Python, as generators. The main point of including the
  module in Python 2.3+ is to provide standard behaviors and
  names for some useful functions.  While programmers could write
  their own versions, each one would create a slightly
  incompatible variation in practice.  The other point, however,
  is to implement iterator combinators in efficient C code.
  Using itertools functions will be a bit faster than writing
  your own combinators.  The standard documentation shows
  equivalent pure-Python implementations for each itertools
  function, so there is no need to repeat those in this article.
  The functions in itertools are basic enough--and distinctly
  named enough--that it probably makes sense to import all the
  names from the module. The function enumerate(), for example,
  might sensibly live in itertools, but is instead a builtin
  function in Python 2.3+. Notably, you can easily express
  enumerate() in terms of itertools functions:
from itertools import *
enumerate = lambda iterable: izip(count(), iterable)
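A quick check shows the lazy version produces just what the builtin
does:

>>> list(izip(count(), 'abc'))
[(0, 'a'), (1, 'b'), (2, 'c')]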
  Let us look first at the few itertools functions that do not
  use other iterators as a basis, but simply create iterators "from
  scratch." times() returns an iterator that yields an identical
  object multiple times; in itself, this capability is moderately
useful, but it is really nice as a substitute for the superfluous
use of xrange() and an index variable simply to repeat an action.
  I.e., rather than:
for i in xrange(1000):
    do_something()
You can now use the more neutral:
for _ in times(1000):
    do_something()
  If no second argument is given to times(), it simply yields
None repeatedly.  The function repeat() is similar to
times(), but returns the same object unboundedly.  This
  iterator is useful either where a loop has an independent
  break condition, or within combinators like izip() and
  imap().
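For example, repeat() can supply a constant second argument to
imap(), here squaring a few numbers:

>>> from itertools import imap, repeat
>>> list(imap(pow, xrange(5), repeat(2)))
[0, 1, 4, 9, 16]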
  The function count() is a bit like a cross between repeat()
  and xrange().  count() returns consecutive integers
unboundedly (starting at an optional argument).  However, given
that count() does not currently overflow to longs correctly, you
might as well still use xrange(n, sys.maxint); that is not
literally unbounded, but for most purposes it
  amounts to the same thing.  Like repeat(), count() is
  particularly useful inside other iterator combinators.
  A few of the real combinatorial functions in itertools have
  been mentioned in passing. ifilter(), izip(), and imap()
  act just as you would expect from their corresponding sequence
  functions. ifilterfalse() is a convenience so that you do not
  need to negate a predicate function in a lambda or a def (and
  it saves significant function call overhead). But functionally,
  you could define ifilterfalse() (approximately, ignoring the
  None predicate) as:
def ifilterfalse(predicate, iterable):
    return ifilter(lambda x: not predicate(x), iterable)
The functions dropwhile() and takewhile() partition an
iterator by a predicate.  The former skips elements as long as
the predicate holds, then yields every element from the first
failure onward; the latter yields elements only while the
predicate holds, and terminates the iterator as soon as the
predicate becomes false.  dropwhile() skips an indefinite
number of initial elements, so might not begin yielding until
after a delay; takewhile() starts right away, but stops as
soon as its predicate fails.
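Their complementary behavior on the same data (the predicate here is
just illustrative):

>>> from itertools import dropwhile, takewhile
>>> is_small = lambda n: n < 3
>>> list(takewhile(is_small, [1, 2, 5, 1, 0]))
[1, 2]
>>> list(dropwhile(is_small, [1, 2, 5, 1, 0]))
[5, 1, 0]

Notice that dropwhile() keeps the later small elements; both
functions look only at the initial run of the iterator.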
  The function islice() is basically just the iterator version
  of list slicing.  You can specify a start, stop, and step, just
  as with regular slices.  If a start is given, a number of
  elements are dropped until the passed in iterator reaches the
  element of interest.  This is another case where I think
  refinement to Python is possible--the best thing would be for
  iterators to simply recognize slices, just as lists do (as a
  synonym for what islice() does).
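For example, slicing an unbounded count() (unlike list slices,
negative indices are not supported):

>>> from itertools import islice, count
>>> list(islice(count(), 5, 20, 3))
[5, 8, 11, 14, 17]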
One final function, starmap(), is a slight variation on
imap().  If the function passed in as an argument takes
  multiple arguments, the iterable passed should yield tuples of
  the right size.  Basically this is the same as imap() with
  several iterables passed in, only with the collection of
  iterables previously combined with izip().
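That is, the two calls below compute the same thing:

>>> from itertools import starmap, imap
>>> list(starmap(pow, [(2, 5), (3, 2), (10, 3)]))
[32, 9, 1000]
>>> list(imap(pow, (2, 3, 10), (5, 2, 3)))
[32, 9, 1000]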
  The functions included in itertools make for a good start. If
  nothing else, they encourage Python programmers to become more
  comfortable with utilizing and combining iterators. In a general
  way, widespread use of iterators is clearly important to the
  future of Python.  But past what is included, there are some
  others that I would recommend for future updates to the module.
  Readers can easily use these immediately--of course, if they
  are later included, the names or interface details could
  differ.
  One category that would seem to be of general use is functions
  that take multiple iterables as arguments, then yield
  individual elements from each iterable.  In contrast to this,
  izip() yields tuples of elements, and imap() yields values
  computed based on the basis elements.  The two obvious
  arrangements, to my mind, are chain() and weave().  The
  first is similar in effect to concatenation of sequences (but
  appropriately lazy).  That is, where for plain sequences you
  might use, e.g.:
for x in ('a','b','c') + (1, 2, 3):
    do_something(x)
For general iterables, you could use:
for x in chain(iter1, iter2, iter3):
    do_something(x)
A Python implementation is:
def chain(*iterables):
    for iterable in iterables:
        for item in iterable:
            yield item
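A quick check of the behavior (a list is built just for display):

>>> list(chain('abc', (1, 2, 3)))
['a', 'b', 'c', 1, 2, 3]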
With iterables, you might also combine several by interspersing
them.  There is no builtin syntax to do the same with
sequences, but weave() itself works fine for completed
  sequences also.  A possible implementation is below (Magnus Lie
  Hetland proposed a similar function on comp.lang.python):
def weave(*iterables):
    "Intersperse several iterables, until all are exhausted"
    iterables = map(iter, iterables)
    while iterables:
        for i, it in enumerate(iterables):
            try:
                yield it.next()
            except StopIteration:
                # deleting mid-loop shifts the list, so interleaving
                # is only approximate once an iterable is exhausted
                del iterables[i]
  Let me illustrate the behavior of weave() since it might not be
  immediately obvious from the implementation:
>>> for x in weave('abc', xrange(4), [10,11,12,13,14]):
...     print x,
...
a 0 10 b 1 11 c 2 12 13 3 14
Even after some iterators are exhausted, the remaining ones continue to yield values, until everything available is yielded at some point.
  I will propose just one more possible itertools function. This
  one owes a lot to functional programming ways of conceiving
  problems. icompose() has a certain symmetry with the function
  ireduce() suggested above. But where ireduce() feeds a (lazy)
  sequence of values to a function, and yields each result,
  icompose() applies a sequence of functions to the return value
of each predecessor function. A likely use of ireduce() is to
feed a sequence of events to a long-lived object. icompose()
  might instead repeatedly pass an object to a mutator function
  that returns a new object. The first is a fairly traditional OOP
  way of thinking about events, while the second is more of an FP
  way of thinking.
  Here is a possible implementation of icompose():
def icompose(functions, initval):
    currval = initval
    for f in functions:
        currval = f(currval)
        yield currval
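Given that definition, each intermediate result is yielded in turn
(the functions here are just for illustration):

>>> step1 = lambda x: x + 1
>>> step2 = lambda x: x * 2
>>> list(icompose([step1, step2, str], 3))
[4, 8, '8']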
  Iterators--conceived as lazy sequences--are a powerful concept
  that opens new styles of Python programming. There is a subtle
  difference, though, between thinking of an iterator as just a
  data source, and thinking of it in a sequential style. Neither
  way of thinking is more true per se, but the latter opens the
  path towards a combinatorial shorthand for manipulating
  programmatic events. The combinatorial functions in
  itertools--and especially some it might grow, like those I
  suggest--come close to a declarative style of programming. To my
  mind, these declarative styles are generally less error-prone,
  and more powerful.
Charming Python: Iterators and simple generators
http://www-106.ibm.com/developerworks/linux/library/l-pycon
Charming Python: Implementing "weightless threads" with Python
http://www-106.ibm.com/developerworks/linux/library/l-pythrd.html
Charming Python: Generator-based state machines
http://www-106.ibm.com/developerworks/linux/library/l-pygen.html
  
While David Mertz also likes hubris and impatience, this
installment is about laziness.  David may be reached at
[email protected]; his life is pored over at
http://gnosis.cx/publish/.  Suggestions and recommendations on
this, past, or future columns are welcome.