
Efficient code is good but clean code is better

Efficiency can be important when writing code, but it should usually take a back seat to writing clear code. Chad Perrin explains why.
By Chad Perrin

This post was originally published in the Software Engineer Blog in June 2011.

There was a time, long ago, when the most important thing you could do with your code was make it more efficient -- in terms of how much functionality you could pack into every kilobyte of storage, how tightly it compiled, how little RAM it used, how much you could communicate in every network packet sent, and so on. In those days, many computers did not even have random access persistent storage, you could only run one program at a time, and RAM was measured in bytes rather than gigabytes. Those days are long gone.

There are still reasons to pay attention to efficiency when writing code. A poorly written Fibonacci sequence generator could take hours to produce the first 100 numbers in the series. An operating system that requires 20 gigabytes of storage just to be minimally useful is not well suited to embedded devices. A browser that suffers memory fragmentation problems can consume all the RAM and swap space on your system after a couple of days of very heavy use.
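
To make that concrete, here is a minimal Ruby sketch; the method names are illustrative, not taken from any particular codebase. The naive doubly recursive version does exponential work, while simply carrying the previous two values forward produces the series in linear time:

    # Naive recursion recomputes the same subproblems over and over:
    # slow_fib(100) alone would need on the order of 2**100 calls.
    def slow_fib(n)
      n < 2 ? n : slow_fib(n - 1) + slow_fib(n - 2)
    end

    # Carrying the previous two values forward makes the same series
    # linear; the first 100 numbers appear in a fraction of a second.
    def fast_fibs(count)
      fibs = []
      a, b = 0, 1
      count.times do
        fibs << a
        a, b = b, a + b
      end
      fibs
    end

    puts fast_fibs(100).last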

As computing resources continue to grow, though, efficiency falls further behind another concern when writing code: the cleanness of the code itself. Mostly, this boils down to readability and comprehensibility. Programmers need to be able to read and comprehend your code -- the programmers who come along after you have moved on, and you yourself when you come back to your own code in six months.

Without readability and comprehensibility, you cannot easily reuse your code. If you decide it is easier to write from scratch than to use what you have already written -- even when the existing code would otherwise serve perfectly well -- you will be forced to reinvent the wheel many times over. When writing open source software, the same problem applies to other people. Worse, if your open source code is unreadable or, once read, incomprehensible, nobody else will bother looking at it very much; you will not get any feedback on it other than (perhaps) complaints, you will not get code contributions, and ultimately your "open source" will be so opaque as to be effectively closed. It may be easier for others to just rewrite the needed functionality and put your project "out of business". This is happening to GNU Screen right now.

The problem is just as bad in closed source software development. Bugs are easier to fix when you can understand your own code. Features are easier to add, too.

When efficiency matters, it is easier to see where the efficiency bottlenecks reside if your code is clear.

I wrote a program that makes use of a plain text YAML flat file database. As you may recall from an earlier article, YAML is a popular format for simple data storage with Ruby (and other languages). When data structures start getting complex and large, YAML is not the most efficient tool in the toolbox. It offers some advantages, however; its format is very clearly human-readable, it is widely supported, it does not require complex software systems like typical relational database management systems to access data, it translates directly into common data structures, and interacting with it is easy with popular YAML libraries.
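
For example, a complete round trip through Ruby's standard yaml library takes only a few lines. The records hash below is a made-up stand-in for real data:

    require 'yaml'

    # A made-up record set standing in for real application data.
    records = {
      'alice' => { 'scores' => [3, 5, 8] },
      'bob'   => { 'scores' => [2, 7] }
    }

    # Writing: to_yaml serializes common Ruby structures directly.
    File.write('data.yml', records.to_yaml)

    # Reading: load_file hands back plain Hashes and Arrays -- and the
    # file stays human-readable and hand-editable in the meantime.
    data = YAML.load_file('data.yml')
    puts data['alice']['scores'].inject(:+)  # => 16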

The data stored in the YAML file used by this program is getting more complex, and there is more of it over time. I am starting to look at the need for a more robust data store, such as a SQL database. Basically the only question about this that lies before me right now is: Should I use SQLite or PostgreSQL? Before making that data migration, though, I need to add more functionality to my program, because I have been making edits directly to the YAML data file at times when it was easier to do that than to wrap simple methods around alterations in functionality. Yes, I need to rewrite part of the program to deal with a data format change, soon. No, I do not consider this a poor choice of data format in the early stages of development.

I chose YAML in part because I needed human readability in a structured format that was easy to manipulate with code. This was of critical importance early on, and development would not have proceeded with nearly as much alacrity in the early stages if I had not made that decision -- if I had chosen to use PostgreSQL from day one. In fact, the whole project might have just stalled and withered away. Even if that were not the case, the flexibility this data format afforded in how I dealt with my data saved me far more time than I will have to spend on the migration of data formats.

In working on that same project, I wrote code that correlated a short list in one record with a long list of values that resided in another record, summed values that matched between the two lists, and presented the result. This was written as an atomic operation whose code resided entirely within a single method (a sketch follows the list), including:

  • opening the YAML file
  • reading the data into a data structure
  • closing the file
  • correlating list items and producing a sum of values
  • returning the sum, and
  • letting the data structure go out of scope.
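
As a rough reconstruction -- the names and data layout here are illustrative guesses, not the original code -- the method had this shape:

    require 'yaml'

    DB_FILE = 'data.yml'  # illustrative path

    # Everything happens inside the one method: open and parse the
    # file, correlate the two lists, sum the matches, and let it go.
    def sum_of_matches(short_key, values_key)
      data       = YAML.load_file(DB_FILE)    # open, read, close
      short_list = data[short_key]['items']   # short list in one record
      values     = data[values_key]['values'] # item => value in another

      # Sum the values whose keys appear in the short list.
      short_list.inject(0) { |sum, item| sum + values.fetch(item, 0) }
    end  # the parsed data goes out of scope here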

It was all very clear and clean code. Unfortunately, it turned out to be horrifically inefficient, though I did not notice this fact at first. It was only later when I decided I needed to be able to perform this operation hundreds of times and produce tabular output to show a visible matrix of results that the failures of that approach became obvious. The problem was loading an increasingly large YAML file's data into a large, complex data structure every single time one of these values needed to be calculated. For efficiency's sake, this should obviously have been done outside the method.
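
Sketching the fix under the same illustrative assumptions: parse the file once and pass the resulting structure into the method, so that hundreds of calls share a single load instead of each re-reading an ever-growing file:

    require 'yaml'

    # Same correlation as before, but the caller supplies the
    # already-parsed data instead of the method re-reading the file.
    def sum_of_matches_in(data, short_key, values_key)
      short_list = data[short_key]['items']
      values     = data[values_key]['values']
      short_list.inject(0) { |sum, item| sum + values.fetch(item, 0) }
    end

    data = YAML.load_file('data.yml')     # one parse, reused below
    record_keys = data.keys - ['totals']  # illustrative layout
    record_keys.each do |key|
      puts "#{key}: #{sum_of_matches_in(data, key, 'totals')}"
    end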

As fellow TechRepublic contributor Sterling Camden can tell you, I was pretty hard on myself for that lapse at the time. I was simply flabbergasted that I had done something like that, building such a huge resource sink into such a simple little method. There are times when efficiency matters, and this was definitely one of them. I had managed to completely miss the importance of efficiency in this case and made what appeared to be a very amateurish mistake. This was not 10 years ago; I could not blame it on a significantly more naive me. It was closer to 10 weeks ago.

Now that I have had more time to think about it, I realize that I did not really do anything wrong. By wrapping up the entire operation in a single method, I had satisfied the needs of the functionality at hand. I did so in a way that made it incredibly clear where the method was getting its data, and that the data had not been altered in any way by the program before it was used as the basis for this operation. It was about as clear as it could possibly be under the circumstances, which aided in the task of constructing a series of algorithmic steps that would complete the needed operation as simply and straightforwardly as possible. It also made it much easier to test the code for unexpected edge cases.

When I first wrote the method, my focus was entirely on the clarity of my code, and as a result I had useful functionality that served me well -- until I needed more. Because the code was so clear, and because by then I had a better idea of the uses to which I would want to put this functionality, I was able to accomplish the needed rewrites quickly, easily, and, most importantly, clearly when the time came to support the more resource-intensive series of operations. If my focus had been on efficiency to satisfy needs I did not yet know I had, the structure of the program as a whole might have been significantly more complex and difficult to sort out when changes needed to be made. Ultimately, the highly inefficient but very clear way I wrote that method early on offered the readability, comprehensibility, and structural simplicity that made an important refactoring much easier to undertake.

As long as your code meets the basic requirements for efficiency needed to suit your needs for the immediate future, clarity trumps efficiency. As Donald Knuth put it, "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." There are those who disagree with this idea. Developer Joe Duffy said:

Mostly this quip is used to defend sloppy decision-making, or to justify the indefinite deferral of decision-making. In other words, laziness.

His attack on the "premature optimization" culture is well written and in-depth. His argument basically boils down to this statement from near the end of the page:

I'm not saying Knuth didn't have a good point. He did. But the "premature optimization is the root of all evil" pop-culture and witty statement is not a license to ignore performance altogether. It's not justification to be sloppy about writing code. Proactive attention to performance is priceless, especially for certain kinds of product- and systems-level software.

Based on the above description of how I ended up making certain choices about how to write my YAML-using program, he would probably call me lazy. His certainty of my laziness would probably only be reinforced if he knew that one of the reasons I chose YAML is the simple fact that I find it easy to use, and had never written Ruby code that used SQLite before.

It was not lazy in the sense he means, though. It was careful coding, approached with a focus on making things as clear as I could. I was not thinking about performance, about the efficiency of what I wrote, at all; I simply did not want to think about it at the time. What I did want to think about was, at the time, much more important -- and is often one of the most difficult things to achieve when writing code.

I wanted it to be simple, elegant, readable, and fully comprehensible. That takes work.

When I discovered my efficiency problem, I felt pretty stupid for a while. Having thought about it, though, I have come to realize that if I had that choice to make over again -- with the same knowledge of what I might need in the future (very little), but more awareness that I was writing horribly inefficient code -- I think I would have made the same decision. I hope I would have, because at the time efficiency did not matter; and when it came to matter later, the clarity of the code is what allowed me to change it to suit the needs that arose as easily as I did.

Just remember, the admonition that premature optimization is the root of all evil does not give you license to write bad code. It is an encouragement to consider factors other than efficiency whenever you can get away with it, because optimization can be added later, but as code grows more complex, clarity only suffers. More to the point, clarity helps you figure out how to make your code more efficient later, but efficiency never helps you figure out how to make it clearer.

Besides ... one of the benefits of clear code is that making your code clearer usually involves making it shorter too, contributing to the efficiency of source code storage.

15 comments
ZevGriner

Efficient programming is affected by the requirements. For example, a looping bubble sort is more efficient than a recursive merge sort for a small number of items, and a lot cleaner. But if you're dealing with a million items, recursive sorts are the way to go. Also, one's experience helps shape the approach and choice of algorithms. When I started, I did lots of maintenance programming. I kept wishing those coders had looked a little into the future and made their code more generic. Sometimes the algorithms were specific to the situation, so it took a lot of testing to make sure the upgrade didn't break what was running.

Tony Hopkinson

Over-engineering your design might end up with it not being perceived as a clean one. If you know a change is certainly going to happen, or if you can modularise and cope with classes of changes that might, all well and good; but be wary of simply considering a range of potential changes and implementing them all.

guillegr123

I think the best thing to do is, first of all, consider the most likely future requirements. Then create a solution that takes into account both the existing and the future requirements, and that is both clear and efficient enough.

bradleyross

When I was maintaining other people's software, I usually didn't consider optimization profitable unless it looked like it would cut the total execution time by about eighty percent. I cut the execution time of a lot of programs by eighty percent or more (sometimes much, much more). If you were saving less than five percent, the time saved usually wasn't worth the cost of testing the changes. (Of course, some of those fellows never tested their changes.)

cougar.b

I always consider myself a rank amateur. Currently, I've decided that the time has come to stop the trial and error and comment, comment, comment, as well as removing any obsolete lines from previous trials that didn't work out. My goal is completely readable code, and it feels wasteful to take this time, but I'm still committed to it, since I truly am concerned with future hacks to my work. Thanks for reinforcing what I decided to do in my rank amateurness.

HypnoToad72

Heck, tell that to Microsoft now, given how bloated Windows has become... 16 GB for an OS in which the average Joe couldn't tell any difference is sheer bloat...

Tony Hopkinson

It's been my opinion for about that many decades. Over that time I've never seen or heard a challenge to the contention that wasn't rubbish. Efficient code is code that you can change, because code changes. After you've had 100 blokes who lost 40% of their pay because you believed differently have a chat with you about it, you change your mind along with your underpants. Clean first, always. You can always make it messy later, if you have to. Going to management or 100 hairy colleagues and asking for more time to clean it up will not win you anything but enemies.

andybrucenet

If by "Efficiency" one means "Optimization" - this has always been a terrible way to approach a project. Even in the old days of x86 assembler TSRs / device drivers (remember the multiplex interrupt?) approaching a coding task from the optimization view leads to problems of maintenance, reliability, and general frustration. It is quite simple why this has always been so: Heisenberg's Principle. The act of optimizing code prior to its thorough debugging causes all sorts of entertaining results, yet the hubris involved is all too symptomatic of the quintessential programmer. (Despite its dweebish stereotype, I know of no male programmer who does not embrace machismo and a "laws of physics need not apply" attitude.) Remember the 90/10 rule and take it bed each night: 90% of program time is spent in 10% of program function. I still remember fondly a GUI developer from the mid-90s who spent a week of all-nighters analyzing and improving the initial screen display of a complex application. Net result? .27 seconds of time (we did time it). Unfortunately this did not ease the true problem of single-threaded database calls which caused the overall initial screen display to take 31 seconds. The lesson? Apply solid development techniques: design patterns, test-driven development, quantifiable metrics, and required documentation all go a long way to provide the reliable foundation upon which that 10% of truly time-intensive code can be: 1) Identified; 2) Analyzed; 3) Optimized; 4) Ruthlessly Tested. Oh - and avoid hubris. Never again shall I use the name "SmartThreads" as - by the time I was finished debugging them (except for that last pesky race condition I never found) - I knew I was anything but Smart.

Tony Hopkinson

Clean code would call sort on some collection, and that sort might be bubble or merge. Efficient and messy would be the sort code slapped in a button click event handler called DisplayItems. As for developers looking into the future, maybe they did. Maybe the resource couldn't be made available, maybe they couldn't sell it to management, maybe the future changed. Maybe the code had so much technical debt already that they daren't. Clean code is a habit. Start clean and stay as clean as you can; start messy and you are stuffed.

Tony Hopkinson

For micro-optimisation, 80%? And 80% of what, and how? Oh, and the tests should be done before you optimise.

Tony Hopkinson

Now, whether the bloat features are done with bloat code, I have no idea. I suspect it will be the usual mix of new code and technical debt, but a heck of a lot of it.

Tony Hopkinson

3) Tests written; 4) Implemented; 5) Ruthlessly tested; 6) Optimised; 7) Ruthlessly tested again. Other than that, total agreement.

cougar.b

Great! Another reason for my wife and I to go out for a celebration! Can I use you as a reference?
