Since I wrote about list comprehensions in Python a few weeks ago, I’ve been getting feedback from people impressed with their power and looking forward to refactoring their code to use comprehensions rather than explicit loops.

Like anything, though, there are situations where list comprehensions are useful and situations where you’re better served by some other form. In this article we’ll look at an example where a function factory is the better choice.

The example for this article is generating HTML colour codes in Python. If you needed to produce a smooth(ish) gradient in HTML, and you weren’t too fussy about how it was encoded, then you could knock something up using a table and changing the bgcolor attribute. In this case a list comprehension would be a good approach:

def colourslist(red, green, blue, steps):

    def f(channel, i, steps):
        # channel is a (start, end) pair; interpolate linearly between the two
        start, end = channel
        return int(start + i * (end - start) / float(steps))

    colours = [tuple([f(c, i, steps - 1)
                      for c in [red, green, blue]])
               for i in range(steps)]
    return ["%02x%02x%02x" % rgb for rgb in colours]

print("<html><body>")
print("<table cellspacing=0 cellpadding=0><tr>")
for colour in colourslist((90,90), (90,255), (90,90), 200):
    print("<td width=1px bgcolor=%s> </td>" % colour)
print("</tr></table>")
print("</body></html>")

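This renders as a single table row of 200 one-pixel-wide cells that fade smoothly from dark grey (RGB 90, 90, 90) to bright green (RGB 90, 255, 90).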

Design patterns are at times a pretty contentious issue among Python programmers. Those coming to Python from object-oriented languages often can’t live without them, while those who came from languages without an emphasis on patterns can regard them as a set of hammers for people who look at all programs as nails. Whatever you choose to call it, the factory pattern is a useful tool for a range of problems.

The factory pattern involves writing a class or method that creates and returns objects of different classes, depending on the parameters it is invoked with. A function factory is similar, but instead of returning objects of some class, we write a function that spits out other functions, customised to the situation. Since functions in Python are first-class objects, this is easy:

def colourfactory(red, green, blue, steps):
    def cf(i):
        def f(channel, i, steps):
            # channel is a (start, end) pair; interpolate linearly between the two
            start, end = channel
            return int(start + i * (end - start) / float(steps))
        rgb = tuple([f(c, i, steps - 1) for c in [red, green, blue]])
        return "#%02x%02x%02x" % rgb
    return cf

Calling colourfactory with the same parameters as colourslist returns a function that generates the same colour codes on demand, one step at a time (here with a leading #). In the previous example, where every entry in the list is used, this is less efficient because you pay for a function call per result. There are three main situations, though, where creating a function is a much better solution:

  • When you need a sparse list — when you don’t need the whole list there’s no reason to spend the time generating it. A function factory allows you to only use the results you need.
  • When you need the list out of order — if you want to access the results in a random order, it’s easier and cleaner to generate them in the order that you need them. A function factory approach will let you order the results any way you wish.
  • When keeping the whole list of results in memory is unfeasible — if you’ve got to generate 100 million results then you’ve got to do the work, but if you’re only using each one once then there’s no reason to store them all in RAM. Using a function factory allows you to build your results piece by piece.
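The first two situations can be sketched as follows. This is a minimal illustration rather than a definitive implementation: colourfactory is restated in a compact form so the block runs on its own, and the names colour_at, sparse and out_of_order are made up for the example.

```python
def colourfactory(red, green, blue, steps):
    """Return a function mapping a step index to an HTML colour code."""
    def cf(i):
        def f(channel):
            # channel is a (start, end) pair; interpolate linearly
            start, end = channel
            return int(start + i * (end - start) / float(steps - 1))
        return "#%02x%02x%02x" % tuple(f(c) for c in (red, green, blue))
    return cf

# Same gradient as the first example: grey (90, 90, 90) to green (90, 255, 90).
colour_at = colourfactory((90, 90), (90, 255), (90, 90), 200)

# Sparse: compute only every 50th step instead of all 200.
sparse = [colour_at(i) for i in range(0, 200, 50)]

# Out of order: ask for steps in whatever order the caller needs them.
out_of_order = [colour_at(i) for i in (199, 0, 100)]

print(sparse)
print(out_of_order)
```

Only seven colour codes are computed here, where the list version would have built all 200 before the first one could be used.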

This last advantage can be achieved just as easily with a generator instead of a list comprehension. A generator still has to produce its items in a predetermined order, though, so it can’t easily give you random access. An implementation of the colour-building generator is in the next code sample:

def coloursgenerator(red, green, blue, steps):
    def f(channel, i, steps):
        # channel is a (start, end) pair; interpolate linearly between the two
        start, end = channel
        return int(start + i * (end - start) / float(steps))
    for i in range(steps):
        rgb = tuple([f(c, i, steps - 1) for c in [red, green, blue]])
        yield "#%02x%02x%02x" % rgb
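To make the trade-off concrete, here is a small sketch of consuming such a generator. It restates coloursgenerator in a compact form so the block is self-contained, and it uses itertools.islice from the standard library; the names gen, first_three and fourth are only for illustration.

```python
from itertools import islice

def coloursgenerator(red, green, blue, steps):
    def f(channel, i):
        # channel is a (start, end) pair; interpolate linearly
        start, end = channel
        return int(start + i * (end - start) / float(steps - 1))
    for i in range(steps):
        yield "#%02x%02x%02x" % tuple(f(c, i) for c in (red, green, blue))

gen = coloursgenerator((90, 90), (90, 255), (90, 90), 200)

# Only the values actually consumed are computed; the remaining steps
# are never generated or stored.
first_three = list(islice(gen, 3))

# The generator remembers its position: the next value is step 3, and
# getting back to step 0 would mean creating a new generator.
fourth = next(gen)

print(first_three)
print(fourth)
```

The memory benefit is there, but the results only ever arrive in order, which is the limitation described above.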

Let’s look at an example where a function approach is more appropriate. Say we’ve got a list of system loads, like those generated by the uptime utility on UNIX or Linux, and we want to colour code them as a visual cue to how much load the system is under. What we want is something that looks like the following:

load average: 0.36 0.43 0.48
load average: 4.23 1.87 0.98
load average: 3.97 2.67 1.39
load average: 4.01 3.03 1.61

We’ll use the load amounts as the percentage along the gradient to colour the table cell (assuming that they max out at 5.00 for now). The implementation:

load_averages = [
    [0.36, 0.43, 0.48],
    [4.23, 1.87, 0.98],
    [3.97, 2.67, 1.39],
    [4.01, 3.03, 1.61]]

# White (255, 255, 255) at no load, fading to red (255, 0, 0) at a load of 5.00
colourfunc = colourfactory((255,255), (255,0), (255,0), 500)

print("<table cellspacing=5 cellpadding=0>")
for load in load_averages:
    print("<tr><td>load average: </td>")
    for l in load:
        print("<td bgcolor=%s>%s</td>" % (colourfunc(l*100), l))
    print("</tr>")
print("</table>")
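The code above relies on the assumption that loads max out at 5.00. One possible way to relax that, sketched here under assumptions of my own, is to clamp each load before turning it into a step index. The helper load_to_step and the constants MAX_LOAD and STEPS are hypothetical names, and the scaling differs slightly from l*100 in that a load of 5.00 maps onto the last step (499) rather than index 500.

```python
def colourfactory(red, green, blue, steps):
    """Return a function mapping a step index to an HTML colour code."""
    def cf(i):
        def f(channel):
            # channel is a (start, end) pair; interpolate linearly
            start, end = channel
            return int(start + i * (end - start) / float(steps - 1))
        return "#%02x%02x%02x" % tuple(f(c) for c in (red, green, blue))
    return cf

MAX_LOAD = 5.0  # assumed ceiling, matching the "max out at 5.00" assumption
STEPS = 500

def load_to_step(load, max_load=MAX_LOAD, steps=STEPS):
    """Hypothetical helper: map a load value to a valid step index."""
    clamped = min(max(load, 0.0), max_load)  # keep the load within 0..max_load
    return int(round(clamped / max_load * (steps - 1)))

colourfunc = colourfactory((255, 255), (255, 0), (255, 0), STEPS)

for load in (0.36, 4.23, 7.5):
    print("%.2f -> %s" % (load, colourfunc(load_to_step(load))))
```

With the clamp in place, a load above the assumed ceiling is shown in the same full red as a load of exactly 5.00 instead of producing an invalid colour code.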

There’s no approach that’s best 100 percent of the time, but by knowing all of the options you can be sure you’ve got the tools to write the best, most efficient and easiest to maintain code possible. There are situations where lists are definitely better, like when you want to precompute as much of the data as possible so as not to slow down the execution of the algorithm, but in situations where you’re doing random access and you only want some of the possible data you should try a function approach first.