Efficiency can be important when writing code, but it should usually take a back seat to writing clear code. Chad Perrin explains why.
There was a time, long ago, when the most important thing you could do with your code was to make it more efficient -- in terms of how much functionality you could pack into every kilobyte of storage, how tightly it compiled, how little RAM it used, how much you could communicate in every network packet sent, and so on. In those days, many computers did not even have random access persistent storage, you could only run one program at a time, and RAM was measured in bytes rather than gigabytes. Those days are long gone.
There are still reasons to pay attention to efficiency when writing code. A poorly written Fibonacci sequence generator could take hours to produce the first 100 numbers in the series. An operating system that requires 20 gigabytes of storage just to be minimally useful does not serve very well for use on embedded devices. A browser that suffers memory fragmentation problems can actually consume all the RAM and swap space on your system after a couple days of very heavy use.
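To make the Fibonacci case concrete, here is a minimal Ruby sketch. The method names `fib_naive` and `fib_sequence` are made up for illustration. The first method is the textbook recursive definition, which recomputes the same subproblems over and over, so its call count grows exponentially with `n`. The second builds the series iteratively, computing each number once from the previous two.

```ruby
# Textbook recursive definition. It recomputes the same subproblems
# repeatedly, so the number of calls grows exponentially with n.
def fib_naive(n)
  n < 2 ? n : fib_naive(n - 1) + fib_naive(n - 2)
end

# Iterative generator: each number is computed once from the previous two,
# so producing `count` numbers takes `count` steps.
def fib_sequence(count)
  sequence = []
  a, b = 0, 1
  count.times do
    sequence << a
    a, b = b, a + b
  end
  sequence
end

puts fib_sequence(10).inspect  # prints [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
puts fib_naive(20)             # prints 6765
puts fib_sequence(100).length  # prints 100
```

Both versions are easy to read. Only the recursive one becomes impractical as `n` grows, which is the kind of inefficiency that is still worth caring about.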
As computing resources continue to grow, though, efficiency falls further behind another concern when writing code: the cleanness of the code itself. Mostly, this boils down to readability and comprehensibility. Programmers need to be able to read and comprehend your code -- both the programmers who come along after you have moved on and you yourself, when you come back to your own code in six months.
Without readability and comprehensibility, you cannot easily reuse your code. You will be forced to reinvent the wheel many times over if you decide it is easier to write from scratch than to use what you have already written, even when the existing code would otherwise serve perfectly well. When writing open source software, the same problem applies to other people. Even worse, if your open source code is unreadable or -- once read -- incomprehensible, nobody else will bother looking at the code very much; you will not get any feedback on it other than (perhaps) complaints about its opacity, you will not get code contributions, and ultimately your "open source" will be so opaque as to be effectively closed. It may be easier for others to just rewrite the needed functionality and put your project "out of business". This is happening to GNU Screen right now.
The problem is just as bad in closed source software development. Bugs are easier to fix when you can understand your own code. Features are easier to add, too.
When efficiency matters, it is easier to see where the efficiency bottlenecks reside if your code is clear.
I wrote a program that makes use of a plain text YAML flat file database. As you may recall from an earlier article, YAML is a popular format for simple data storage with Ruby (and other languages). When data structures start getting complex and large, YAML is not the most efficient tool in the toolbox. It offers some advantages, however; its format is very clearly human-readable, it is widely supported, it does not require complex software systems like typical relational database management systems to access data, it translates directly into common data structures, and interacting with it is easy with popular YAML libraries.
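As a small illustration of how directly YAML maps onto common data structures, the following sketch parses a made-up record with Ruby's standard `yaml` library. The field names (`name`, `tags`, `counts`) are invented for this example and are not taken from the program described here.

```ruby
require 'yaml'

# A made-up record; the field names are purely illustrative.
document = <<~RECORD
  name: example
  tags:
    - ruby
    - yaml
  counts:
    open: 2
    closed: 5
RECORD

# Mappings become Hashes, sequences become Arrays, scalars become
# Strings and Integers.
record = YAML.safe_load(document)

puts record['tags'].inspect      # prints ["ruby", "yaml"]
puts record['counts']['closed']  # prints 5

# Writing the structure back out is just as direct.
puts record.to_yaml
```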
The data stored in the YAML file used by this program is getting more complex, and there is more of the data over time. I am starting to look at the need to use a more robust data store, such as an SQL database. Basically the only question about this that lies before me right now is: Should I use SQLite or PostgreSQL? Before making that data migration, though, I need to add more functionality to my program, because I have been making edits directly to the YAML data file at times when that was easier than writing simple methods to make the changes. Yes, I need to rewrite part of the program to deal with a data format change, soon. No, I do not consider this a poor choice of data format in the early stages of development.
I chose YAML in part because I needed human readability in a structured format that was easy to manipulate with code. This was of critical importance early on, and development would not have proceeded with nearly as much alacrity in early stages if I had not made that decision -- if I had chosen to use PostgreSQL from day one. In fact, the whole project might have just stalled and withered away. Even if that were not the case, the flexibility this data format afforded in how I dealt with my data saved me far more time than I will have to spend on the migration between data formats.
In working on that same project, I wrote code that correlated a short list in one record with a long list of values that resided in another record, summed values that matched between the two lists, and presented the result. This was written as an atomic operation whose code resided entirely within a single method, including:
- opening the YAML file
- reading the data into a data structure
- closing the file
- correlating list items and producing a sum of values
- returning the sum, and
- letting the data structure go out of scope.
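The steps above can be sketched roughly as follows. This is not the original code. The file layout (one record holding a short list of item names, another mapping many item names to numeric values), the method name `matching_sum`, and the record keys are all assumptions made for illustration.

```ruby
require 'yaml'
require 'tempfile'

# Hypothetical all-in-one method: everything from opening the file to
# returning the sum happens inside it.
def matching_sum(path, short_key, long_key)
  # File.read opens the file, reads it, and closes it again.
  data = YAML.safe_load(File.read(path))

  short_list = data.fetch(short_key)  # short list of item names
  long_list  = data.fetch(long_key)   # item name => numeric value

  # Sum the values whose names appear in both lists; unmatched names add 0.
  short_list.sum { |name| long_list.fetch(name, 0) }
end                                   # `data` goes out of scope here

# Demonstration against a throwaway file with made-up contents.
Tempfile.create(['records', '.yml']) do |file|
  file.write({
    'selection' => %w[apples pears],
    'prices'    => { 'apples' => 3, 'pears' => 4, 'plums' => 5 }
  }.to_yaml)
  file.flush
  puts matching_sum(file.path, 'selection', 'prices')  # prints 7
end
```

Written this way, it is obvious where the data comes from and that nothing has touched it before the calculation. It is equally obvious that every call pays the full cost of reading and parsing the file.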
It was all very clear and clean code. Unfortunately, it turned out to be horrifically inefficient, though I did not notice this fact at first. It was only later, when I decided I needed to be able to perform this operation hundreds of times and produce tabular output to show a visible matrix of results, that the failures of that approach became obvious. The problem was loading an increasingly large YAML file's data into a large, complex data structure every single time one of these values needed to be calculated. For efficiency's sake, the loading should obviously have been done once, outside the method.
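A rough sketch of the more efficient shape, under assumptions of my own: the data is parsed once, and the parsed structure is passed to the method that does the correlation. The record layout (short lists of item names, plus records mapping item names to numeric values) and the names `matching_sum_from` and `sum_matrix` are hypothetical, chosen only for this example.

```ruby
require 'yaml'

# Correlate one short list with one long list using data already in memory.
def matching_sum_from(records, short_key, long_key)
  long_list = records.fetch(long_key)
  records.fetch(short_key).sum { |name| long_list.fetch(name, 0) }
end

# Build a matrix of sums: one row per short list, one column per long list.
# The parsed data is reused for every cell.
def sum_matrix(records, short_keys, long_keys)
  short_keys.map do |short_key|
    long_keys.map { |long_key| matching_sum_from(records, short_key, long_key) }
  end
end

# Parse once. In a real program the string would come from reading the
# data file a single time; the contents here are made up.
records = YAML.safe_load(<<~RECORDS)
  basket_a: [apples, pears]
  basket_b: [plums]
  prices_2023: {apples: 3, pears: 4, plums: 5}
  prices_2024: {apples: 4, pears: 4, plums: 6}
RECORDS

matrix = sum_matrix(records, %w[basket_a basket_b], %w[prices_2023 prices_2024])
puts matrix.inspect  # prints [[7, 8], [5, 6]]
```

The trade-off is that the method no longer shows where its data came from. The caller now has to guarantee that `records` is fresh and unmodified.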
As fellow TechRepublic contributor Sterling Camden can tell you, I was pretty hard on myself for that lapse at the time. I was simply flabbergasted I had done something like that, building such a huge resource sink into such a simple little method. There are times when efficiency matters, and this was definitely one of them. I had managed to completely miss the importance of efficiency in this case, and made what appeared to be a very amateurish mistake. This was not 10 years ago; I could not blame this on a significantly more naive me. It was closer to 10 weeks ago.
Now that I have had more time to think about it, I realize that I did not really do anything wrong. By wrapping up the entire operation in a single method, I had satisfied the needs of the functionality at hand. I did so in a way that made it incredibly clear where the method was getting its data, and that the data had not been altered in any way by the program before it was used as the basis for this operation. It was about as clear as it could possibly be under the circumstances, which aided in the task of constructing a series of algorithmic steps that would complete the needed operation as simply and straightforwardly as possible. It also made it much easier to test the code for unexpected edge cases.
When I first wrote the method, my focus was entirely on the clarity of my code, and as a result I had useful functionality that served me well -- until I needed more. Because the code was so clear, and I had a better idea by then of the sort of uses to which I would want to put this functionality, I was able to accomplish the needed rewrites quickly, easily, and most importantly clearly when the time came to make changes to support the more resource-intensive series of operations I then needed. If my focus had been on efficiency to satisfy needs I did not yet know I had, the structure of the program as a whole might have been significantly more complex and difficult to sort out when changes needed to be made. Ultimately, the highly inefficient but very clear way I wrote that method early on offered the readability, comprehensibility, and structural simplicity to make an important refactoring much easier to undertake.
As long as your code meets the basic requirements for efficiency needed to suit your needs for the immediate future, clarity trumps efficiency. As Donald Knuth put it, "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." There are those who disagree with this idea. Developer Joe Duffy said:
Mostly this quip is used to defend sloppy decision-making, or to justify the indefinite deferral of decision-making. In other words, laziness.
His attack on the "premature optimization" culture is well written and in-depth. His argument basically boils down to this statement from near the end of the page:
I'm not saying Knuth didn't have a good point. He did. But the "premature optimization is the root of all evil" pop-culture and witty statement is not a license to ignore performance altogether. It's not justification to be sloppy about writing code. Proactive attention to performance is priceless, especially for certain kinds of product- and systems-level software.
Based on the above description of how I ended up making certain choices about how to write my YAML-using program, he would probably call me lazy. His certainty of my laziness would probably only be reinforced if he knew that one of the reasons I chose YAML is the simple fact that I find it easy to use, and had never written Ruby code that used SQLite before.
It was not lazy in the sense he means, though. It was carefully approached coding, with a focus on making things as clear as I could. I was not thinking about performance, about the efficiency of what I wrote, at all; I simply did not want to think about it at the time. What I did want to think about was, at the time, much more important -- and is often one of the most difficult things to achieve when writing code.
I wanted it to be simple, elegant, readable, and fully comprehensible. That takes work.
When I discovered my efficiency problem, I felt pretty stupid for a while. After having thought about it, though, I have come to realize that if I had that choice to make over again, with the same knowledge of what I might need in the future -- very little -- but more awareness of the fact I was writing horribly inefficient code, I think I would have made the same decision. I hope I would because, at the time, efficiency did not matter; when it would come to matter in the future, the clarity of the code I wrote is what allowed me to change the code to suit the needs that arose as easily as I did.
Just remember, the admonition that premature optimization is the root of all evil does not give you license to write bad code. It is an encouragement to consider factors other than efficiency whenever you can get away with it, because optimization can be added, but as our code gets more complex, clarity only suffers. More to the point, clarity helps you figure out how to make your code more efficient later, but efficiency never really helps you figure out how to make it clearer.
Besides . . . one of the benefits of clear code is that making your code clearer usually involves making it shorter too, contributing to the efficiency of source code storage.