The heap is an integral component in many algorithms -- a data structure that keeps elements organised so that it is always easy to find the smallest value. We'll show you how you can use the heapq module to implement heaps in Python in just a few lines of code.

The heap is an integral component in many algorithms -- a data structure that keeps elements organised so that it is always easy to find the smallest value. We'll show you how you can use the heapq module to implement heaps in Python in just a few lines of code.

Python has a module for implementing heaps on ordinary lists, allowing the creation of quick and easy priority queues for when you need a consistently sorted list between additions and removals. The heapq module defines functions for a minheap - which always returns the smallest item. To set up the heap, you add values using heappush and remove them using heappop:

>>> from heapq import heappush, heappop >>> heap = [] >>> heappush(heap, 2) >>> heappush(heap, 10) >>> heappush(heap, 4) >>> heappush(heap, 5) >>> heappop(heap) 2 >>> heappop(heap) 4 >>> heappop(heap) 5 >>> heappop(heap) 10

If you have a list populated already you can turn it into a heap by using the heapify function from heapq:

>>> from heapq import heapify >>> heap = [3,4,5,6,9,1,7,2,8] >>> heapify(heap) >>> heap [1, 2, 3, 4, 9, 5, 7, 6, 8] >>> heappop(heap) 1

It's important to note here that heapify does not sort the list. A heap, when implemented in a list, is closer to a flattened tree structure. In a heap, any element in the list heap[k] must be smaller or equal to the element heap[k*2 + 1] and heap[k*2 + 2]. The first item in the list is always the smallest. When it is removed, then the smaller of the second and third elements become the top, and the rest of the heap is adjusted.

If you're interested in more information on heaps, you should read the Wikipedia page.

Using a heap in this way (adding all the items at once and removing them all at once) has no benefits over simply sorting a list. The real benefit of priority queues come when you need to remove the minimum value in between adding values. Take the following example:

from random import randint def minheap(n, Print=True): numbers = [randint(0,100) for i in range(n)] heapify(numbers) if Print: print "Original: %s" % (numbers) for i in range(n): newint = randint(0,100) heappush(numbers, newint) oldint = heappop(numbers) if Print: print "new number: %s, minimum: %s, heap: %s" % (newint, oldint, numbers) >>> minheap(10) Original: [8, 33, 59, 48, 46, 74, 70, 97, 54, 78] new number: 59, minimum: 8, heap: [33, 46, 59, 48, 59, 74, 70, 97, 54, 78] new number: 58, minimum: 33, heap: [46, 48, 59, 54, 58, 74, 70, 97, 59, 78] new number: 30, minimum: 30, heap: [46, 48, 59, 54, 58, 74, 70, 97, 59, 78] new number: 66, minimum: 46, heap: [48, 54, 59, 59, 58, 74, 70, 97, 66, 78] new number: 54, minimum: 48, heap: [54, 54, 59, 59, 58, 74, 70, 97, 66, 78] new number: 12, minimum: 12, heap: [54, 54, 59, 59, 58, 74, 70, 97, 66, 78] new number: 13, minimum: 13, heap: [54, 54, 59, 59, 58, 74, 70, 97, 66, 78] new number: 91, minimum: 54, heap: [54, 58, 59, 59, 78, 74, 70, 97, 66, 91] new number: 31, minimum: 31, heap: [54, 58, 59, 59, 78, 74, 70, 97, 66, 91] new number: 17, minimum: 17, heap: [54, 58, 59, 59, 78, 74, 70, 97, 66, 91]

To demonstrate that using a heap can be much faster, we'll write a version that keeps the list sorted using an insertion sort:

def minlist(n, Print=True): numbers = sorted([randint(0,100) for i in range(n)]) if Print: print "Original: %s" % (numbers) for i in range(n): newint = randint(0,100) if newint >= numbers[n-1]: numbers.append(newint) else: for j in range(n): if newintNow let's see how they do against each other once we increase the number of elements:

>>> i = time(); minlist(10000, False); print time()-i 11.6881940365 >>> i = time(); minheap(10000, False); print time()-i 0.142065048218 >>> i = time(); minheap(50000, False); print time()-i 0.621912002563 >>> i = time(); minheap(100000, False); print time()-i 1.33778500557 >>> i = time(); minheap(1000000, False); print time()-i 14.6926519871Here you can see that the heap version can handle almost one hundred times as many numbers as the flat list based one and be the same speed.

If you need more control or more functionality, and some classical algorithms demand it, then you'll need to implement your own heap. For the basics though, the Python heapq module gives you what you need, with just a handful of functions to learn.