Developer

Lazy list builders: Generators in Python

Sometimes your program is just too motivated, and does all this work you don't need or want it to do -- you want it to be lazier. That's where generators come in. Using a generator in Python lets you choose exactly how much you want done, and when.

Sometimes your program is just too motivated, and does all this work you don't need or want it to do — you want it to be lazier. That's where generators come in. Using a generator in Python lets you choose exactly how much you want done, and when.

Last week we introduced you to list comprehensions, which give you a more natural way of representing the contents of a list. This article demonstrates their cousins: generators, which can build a sequence piece by piece, doing as much or as little as you need.

This is called lazy evaluation, which delays the computation of a particular value up until the point where it is needed by the program. Lazy evaluation is useful in many types of programming since it provides a performance benefit by eliminating unnecessary calculation of values that are never used. Some languages, such as Haskell and Miranda, use lazy evaluation by default, while others, like Scheme, OCaml and Python allow it through special syntax. In Python the way you achieve lazy evaluation is through generators.

A generator allows you to write a function that can return a result and pause, resuming in the same place the next time you call the function. You don't need any special syntax to write a generator, except to denote when an intermediate value is to be returned using the yield statement. The easiest way to demonstrate this is with an example:

>>> def generator1():
...     yield "first"
...     yield "second"
...     yield "third"
... 
>>> gen = generator1()
>>> gen

>>> gen.next()
'first'
>>> gen.next()
'second'
>>> gen.next()
'third'
>>> gen.next()
Traceback (most recent call last):
  File "", line 1, in ?
StopIteration
>>> 

In this example, first we define a generator called generator1 which will yield three values, the strings "first", "second" and "third". When we create a new generator object (gen) it begins the function. Each time you call the generator's next method, it continues the function until the next yield. When the generator reaches the end of the block, or reaches a return statement, it throws a StopIteration exception.

Generators are essentially iterators, which means they can be used instead of sequences (lists etc.) in many expressions. For example, rather than the above you could just as easily write:

>>> gen2 = generator1()
>>> for i in gen2:
...     print i
... 
first
second
third

Likewise, you can convert a generator into a list all at once:

>>> gen3 = generator1()
>>> list(gen3)
['first', 'second', 'third']

Let's take an example of where this is actually useful. The following is a fairly standard function that produces combinations, that is, the number of unique ways a sequence can be chopped into smaller sequences:

def combination(seq, length):
	if not length: return [[]]
	else:
		l = []
		for i in xrange(len(seq)):
			for result in combination(seq[i+1:], length-1):
				l += [[seq[i]]+result]
		return l

It works as follows:

>>> combination("ABCDE", 3)
[['A', 'B', 'C'], ['A', 'B', 'D'], ['A', 'B', 'E'], ['A', 'C', 'D'], ['A', 'C', 'E'], ['A', 'D', 'E'], ['B', 'C', 'D'], ['B', 'C', 'E'], ['B', 'D', 'E'], ['C', 'D', 'E']]
>>> combination("ABCDE", 2)
[['A', 'B'], ['A', 'C'], ['A', 'D'], ['A', 'E'], ['B', 'C'], ['B', 'D'], ['B', 'E'], ['C', 'D'], ['C', 'E'], ['D', 'E']]
>>> combination("ABCDE", 5)
[['A', 'B', 'C', 'D', 'E']]

Now let's turn it into a version that uses generators. The trick is simply to replace every point at which you would add a value to the end results list and replace it with a yield statement:

def xcombination(seq,length):
if not length: yield []
else:
for i in xrange(len(seq)):
for result in xcombination(seq[i+1:], length-1):
yield [seq[i]]+result

Now we have a version that works exactly the same, except that it does the work only as you need it:

>>> comb = xcombination("ABCDE", 3)
>>> comb.next()
['A', 'B', 'C']
>>> comb.next()
['A', 'B', 'D']
>>> list(comb)
[['A', 'B', 'E'], ['A', 'C', 'D'], ['A', 'C', 'E'], ['A', 'D', 'E'], ['B', 'C', 'D'], ['B', 'C', 'E'], ['B', 'D', 'E'], ['C', 'D', 'E']]
>>> comb2 = xcombination("ABCDE",2)
>>> for i in xrange(3):
... print comb2.next()
... ['A', 'B']
['A', 'C']
['A', 'D']

In the last command, even though there are 10 different combinations of letters, only three are generated, saving us 70 percent of the computation time of the standard function.

The really useful thing about generators is that they can, for most needs, be used as a drop in replacement for list comprehensions. All you need to do is replace the brackets that would go around list comprehensions with parentheses. Take the last example in the article on list comprehensions. Rather than:

>>> guests = ['Chris', 'Brendan', 'Jimmy', 'Mel', 'Mike', 'Jess']
>>> [(seat1, seat2) for seat1 in guests for seat2 in guests if seat1 != seat2]
...

You can write:

>>> guests = ['Chris', 'Brendan', 'Jimmy', 'Mel', 'Mike', 'Jess']
>>> ((seat1, seat2) for seat1 in guests for seat2 in guests if seat1 != seat2)

And then you can use it like you would any other generator:

>>> seating = ((seat1, seat2) for seat1 in guests for seat2 in guests if seat1 != seat2)
>>> for i in xrange(10):
... print seating.next()
...
('Chris', 'Brendan')
('Chris', 'Jimmy')
('Chris', 'Mel')
('Chris', 'Mike')
('Chris', 'Jess')
('Brendan', 'Chris')
('Brendan', 'Jimmy')
('Brendan', 'Mel')
('Brendan', 'Mike')
('Brendan', 'Jess')

Up til now we've been using generators as a way of building lists to save computation time over other methods. That's great, but where they really shine is when computing the whole list would be impossible. Take the Fibonacci sequence, in which each number is the sum of the previous two. Say we want a function that generates the sequence up to a given number:

>>> def fib(n):
...     a, b = 0, 1
...     l = [a]
...     while b >> fib(20)
[0, 1, 1, 2, 3, 5, 8, 13]

Now this works fine, so long as we want to stop counting up when we reach a certain number. If we want to change the criteria for stopping, however, we have to rewrite the function. Instead we can implement it as a generator (implementation borrowed from the python PEP on generators):

>>> def xfib():
...     a,b = 0,1
...     while True:
...             yield b
...             a, b = b, a+b
... 
>>> fibseries = xfib()
>>> b = fibseries.next()
>>> while b 

Or, if we want to stop at the first palindrome (greater than one digit) we just need to change the loop condition:

>>> fibseries = xfib()
>>> b = fibseries.next()
>>> while b ... print b
... b = fibseries.next()
...
1
1
2
3
5
8
13
21
34

And that's it (The next value in the series is 55). By using a generator we can keep the list building implementation separate from the logic relating to when to stop generating it, while still only calculating as many values as we need.

When should you use generators over list comprehensions? Firstly, if you're going to use the whole list anyway, it's better to use a list comprehension — they're faster since they don't have the added overhead of a generator function call. If you're going to use the first part of the list, use a generator, since you'll save CPU time.

Editor's Picks

Free Newsletters, In your Inbox