The unfortunate fact is although optimisation can only take you so far, the true efficiency issues are going to lie in your algorithm design.

When you're designing or writing software, one issue that can often be glossed over is the matter of efficiency. It's so easy at the beginning of a project to just concentrate on getting something working, so you can demonstrate progress, and then worry about making it fast later on. The unfortunate fact is though optimisation can only take you so far, the true efficiency issues are going to lie in your algorithm design. Most IT professionals have learned the basics at some point in their career, but in case you're a little rusty read on and we'll refresh your memory.

The first thing to consider is what kind of complexity you're looking to reduce. The two major complexity areas are time — that is, how long an operation will take to complete — and space, or how much memory is needed. When talking complexity, we tend to rate speed in terms of how many steps (or blocks of memory for space complexity) are taken per input variable, rather than in absolutes, since they are so dependent on the specifics of the hardware. Likewise, the length of time an individual step will take is largely disregarded since for large inputs this time will be dominated by the complexity class.

To make comparing two algorithms easier we group them into classes by using a special kind of notation. There are a number of different ways to do this, based upon the best case, average case and worst case input scenario. I like to use the worst case most of the time, since that's the time it's going to make the most difference to how you perceive performance. To express this we use what's called big O notation, which expresses the number of steps an algorithm will take for an input of size "n" in the worst case. So, take the following example, which simply sums the numbers in a list.

sum(a) { final_sum = 0 n = length(a) for (i = 0; i < n; i++) { sum += a[i] } return final_sum }

Treating each line as a single step, we can see that calling sum on a list of size n will take n+4 steps to complete, two for the initialisation of final_sum and n, one to set up the for loop, one for the return statement and then n times one for the loop body.

The problem has changed, and now you need to multiply each number by how many times it occurs in the list before adding it to the running total. Take the following implementation:

sum_multiple(a) { final_sum = 0 n = length(a) for (i = 0; i < n; i++) { num = 0 for (j = 0; j < n; j++) { if (a[j] == a[i]) { num++ } } final_sum += a[i] * num } return final_sum }

This does similarly to the last function, with the exception that before adding the current value to the running total, it goes through the list and counts the number of occurrences of each value. Calling this function of a list of size n means that 4 + n * (1 + n * 2) steps are carried out since the outer loop now contains 2n + 1 steps. In total this means that calling this function "costs" 2n^{2} + n + 4 steps. For a list of 10 numbers it takes 214 steps, but for a list of 100 numbers it will need more than 20,000 steps to complete. That's quite an increase. When we rewrite it in another way, however, this changes:

sum_multiple2(a) { final_sum = 0 n = length(a) numbers = dict() for (i = 0; i < n; i++) { if (numbers.has(a[i])) { numbers[a[i]]++ } else { numbers[a[i]] = 1 } } for (j = 0; j < n; j++) { final_sum += a[j] * numbers[a[j]] } return final_sum }

In this example we precompute the number of times each value occurs in the list. To do this we use a new data type which can store these values. It's not particularly important how this is implemented so long as we can be sure that we can insert and retrieve values in constant time. In languages that support them as standard this could be a hash or a dictionary, or if you're not that lucky (say you're using C) then you can think of it as an integer array of size max(a). The method simply returns true if this type contains a the given value.

Anyhow, you can see how rather than work out how many times each number occurs as we reach it we can do it all at the beginning and store it. Let's look at how this helps — sum_multiple2 takes 3n + 6 steps: the usual initialisation steps, plus two for each input to build the dictionary of number occurrences, and then one for each input to sum them. For 10 inputs this will take 36 steps, for one hundred: 306. That's more than 65 times faster for the second version when dealing with 100 inputs. If say, we had one million inputs it becomes two trillion vs three million and the second version is more than 650,000 times faster.

Now we've been taking a fairly casual view of the number of steps in each algorithm, treating each line as one step, when a statement like "sum += a[j] * numbers[a[j]]" contains multiple lookups and could be compiled into as many as 10 individual instructions on a hardware level. This is not really that important though, when you think about it, even if we assume that every step we've counted in the second example really takes 10, and the first program is unchanged then it still represents more than a 60,000 times improvement.

Really what we're interested in is the order of the algorithm, for convenience, we reduce it to the size of the largest part. For example, sum_multiples we say is O(n^{2}) whereas sum_multiples2 is O(n). This is often all you really need to know, for large enough values of n, O(n) algorithms will always beat O(n^{2}) algorithms, regardless of the details.

That's all well and good for time complexity, but what about memory? You may have noticed that sum_multiples2 achieves its speed increase by using more memory, namely the variable `numbers`

. Now in this case it's probably a good switch, since we can remove a factor of n for the cost of only n blocks of memory. If you're running in an environment where memory is very precious then you will want to look carefully at changes such as this. In general you can improve a lot of algorithm's time complexity by using more memory, by caching the results of frequently called functions. This is the theory behind dynamic programming.

Let's apply these ideas to a slightly less contrived example: the classic in this case is sort algorithms, but since they end up being fairly commonly used, it's still useful to take a look at them. We'll pick out two common sort algorithms and compare them: Selection sort and Quick sort. Even if your platform gives you sorting as part of the language or API, a customised sort algorithm can give significant speed benefits, so it's worth reading on.

Selection sort is perhaps the simplest of all the sorting algorithms to grasp immediately, and it's analogous to the way most people will intuitively sort, for example when they order their hand in a game of cards. It has two main steps: First, find the minimum value in the list, then swap it with the value at the start of the list. Repeat this process for the entire list while ignoring the already sorted values at the beginning and you'll end up with a sorted list. An implementation of selection sort follows:

selection_sort(a) { n = length(a) for (i = 0; i < n; i++) { min = i for (j = i + 1; j < n; j++) { if (a[j] < a[min]) { min = j } } temp = a[i] a[i] = a[min] a[min] = temp } }

You can easily see from the nested loops that this algorithm is O(n^{2}). This is a particularly bad sorting algorithm in terms of efficiency, and is almost never used in practice, much like the infamous Bubble sort.

Quick sort, on the other hand, is commonly used in the real world, as it provides much faster average case speeds and uses little additional memory. Quick sort also follows a relatively simple sequence of steps: First choose one item in the list, called the pivot, and then reorder the list so that everything that is smaller than the pivot is before it in the list and everything larger is after it. Then, having two new lists, sort each using Quick sort until they contain zero or one element, which are always sorted. From the Wikipedia entry for Quick sort, here is one implementation:

partition(array, left, right, pivotIndex) { pivotValue = array[pivotIndex] swap(array[pivotIndex], array[right]) // Move pivot to end storeIndex := left for (i = left; i < right; i++) { if (array[i] < pivotValue) { swap(array[storeIndex], array[i]) storeIndex := storeIndex + 1 } } swap(array[right], array[storeIndex]) // Move pivot to its final place return storeIndex } quicksort(array, left, right) { if (right > left) { select a pivot index (e.g. pivotIndex = left) pivotNewIndex := partition(array, left, right, pivotIndex) quicksort(array, left, pivotNewIndex-1) quicksort(array, pivotNewIndex+1, right) } }

In worst case Quick sort is O(n^{2}), just like selection sort, this occurs when the pivot that is chosen is always either the smallest or largest item in the list. However, for the average case Quick sort finishes in O(n log_{2} n) time, which is significantly better. If we're sorting a list of one million records, the difference is one trillion vs nineteen million — a speed up of 50,000 times. In fact, it can be proved that the best a general comparison sort algorithm can do in average case is n log n. I'll omit the proof, but it demonstrates that Quick sort is pretty close to as good as a general sort is going to get.

This brings us to the second main way in which you can improve your algorithms: by using your knowledge of the data to make assumptions about what sections of a standard algorithm you can avoid. Let me show you what I mean: Say you have an unsorted list containing all the numbers from 1 to n with no repetition. There is a sorting algorithm you can use that will sort the list in only O(n) time. Hang on, didn't I say just last paragraph that the best a sort can do is n log n? Well yes, but that's only for the general case where you have to compare items to establish an order. If you can cut that part out of the equation then your algorithm can run faster. Take a look at the code for Bucket sort:

bucket_sort(a) { n = length(a) sorted = array of size n for (i = 0; i < n; i++) { sorted[a[i]] = a[i] } return sorted }

By using the value of each item as its position in a sorted list, we can sort the list by running through it just once. This is a trivial example for integers, but say you have 1,000 records with sequential IDs from 0 to 1000, then you could use that field to sort the records quickly. Obviously this trick won't work if you can't use the value as an index, so you're limited to integral numbers in this kind of sort.

Now that you've seen the basics in action, take another look at that code — are you sure it's as efficient as it could be? It's worth the time if you can make improvements like the ones here; bringing a block of code into a smaller class can have major positive consequences in terms of efficiency. Before you jump straight into things next time, try to take a moment to think of the best way to go ahead — you will be thanking yourself for it later down the line.