I sat alone eating breakfast at my favorite diner. The cook ambled out of the kitchen to the soda machine to fill her cup with ice, but she couldn’t get any out. She called the owner, who came out of the back and fiddled with the machine a couple of times before it finally gave ice. “It just wanted some attention,” she said as she left.
A system that needs periodic attention indicates poorly designed automation, I thought to myself. Yet, most systems do seem to require this kind of babysitting. If someone cares for the systems on a regular basis, they run indefinitely without a hitch; if the systems are neglected for too long, they fall apart.
In software development, we call that code rot. When nobody pays attention to a section of working code, it suddenly doesn’t work any more. Sometimes after examining the problem, I conclude that it could never have worked — despite the fact that it had been running for years. This happens so frequently that it almost doesn’t surprise me any more. The code didn’t change itself (at least, most code can’t do that), so how did this happen?
You might suspect that the metaphor “rot” is somewhat inaccurate. Physical materials degrade spontaneously — or do they? If you could place a tomato in a sealed environment and remove all microbes from it, it wouldn’t rot. Nothing rots by itself — it needs some agent acting on it.
While extending a metaphor doesn’t establish fact, it seems to shed light on this case. Code rot often occurs because the requirements for the code have shifted imperceptibly — as imperceptibly as bacteria infect organic matter. Nobody remembers that the environment in which this code operated successfully for years differed subtly from that in which it is now expected to function. Changes made elsewhere place new, unforeseen, and undocumented demands on this code. Common sense says that it should handle this case, but nobody thought of that specific combination of circumstances when it was designed.
Code can also degrade because it relies on the behavior of other code. The documented behavior of a library or framework often doesn’t include the specific details on which clients rely. The author may feel free to change an undocumented behavior; he or she may even consider it a bug. Combine a few of those changes, and you can easily lose the trail back to “how did this ever work?”
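To make that concrete, here is a hypothetical sketch (the function and data are invented for illustration) of a client quietly depending on something a library never promised: the order of its results.

```python
def find_matches(records, key):
    """Library function. Documented: returns every record whose name
    starts with `key`. NOT documented: the order of the results."""
    return [r for r in records if r["name"].startswith(key)]

records = [
    {"name": "alpha", "id": 1},
    {"name": "alpine", "id": 2},
]

# Client code: assumes the first match is record 1. That holds only by
# accident of the current implementation; the author could reasonably
# reorder the results (or "fix" the order as a bug) and break this line.
first = find_matches(records, "alp")[0]
print(first["id"])
```

Nothing in the documented contract changed, yet the client would stop working, and nobody would remember why it ever worked.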
What can we do to prevent code rot?
We can take another cue from our metaphor: avoid exposure, because exposure hastens rot. The less your software relies on other software, and vice versa, the less trouble you’ll have. By “less,” I don’t mean fewer instances of usage, but rather fewer methods of usage. Consider the Unix ‘tee’ utility, for example. Other than the obligatory proliferation of GNU options, it hasn’t changed significantly in decades, and yet, it doesn’t rot. Why? Because from the beginning it defined a simple interface with clear expectations. Furthermore, for the purpose it serves, it confines its expectations of the code on which it relies (the standard C library) to clearly documented behavior.
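The whole of tee’s contract fits in a sentence: copy each line of input to standard output and to every file named. A minimal sketch of that contract (in Python rather than C, purely to illustrate how narrow the interface is):

```python
import sys

def tee(lines, files):
    """Copy each input line to standard output and to every
    file-like object in `files`. Nothing more is promised, and
    nothing more is expected of the underlying I/O layer."""
    for line in lines:
        sys.stdout.write(line)
        for f in files:
            f.write(line)

# Usage: duplicate stdin into a log file.
# with open("out.log", "w") as log:
#     tee(sys.stdin, [log])
```

A surface this small gives changing requirements almost nothing to grab onto, which is exactly why it has survived decades of change around it.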
Documentation helps, but documentation is not the solution — simplicity is. No matter how well-documented the system, if it has too many interdependencies with other systems, some of those dependencies will change in incompatible ways. Therefore, break the system down into simple, independent components that each do one thing well, and document those things. As requirements change, the alignment of those components may need to be altered, but not so much the components themselves.
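As a sketch of that philosophy, with hypothetical names invented for illustration: three components, each doing one documented thing, aligned into a pipeline at the edges.

```python
def parse(text):
    """Split raw text into whitespace-separated fields."""
    return text.split()

def normalize(fields):
    """Lowercase every field."""
    return [f.lower() for f in fields]

def render(fields):
    """Join fields with commas."""
    return ",".join(fields)

# When requirements shift, we re-align the pipeline, not the components:
# a case-sensitive report simply omits `normalize`.
print(render(normalize(parse("Alpha BETA gamma"))))
```

Each piece can be documented, tested, and replaced in isolation; only the one-line composition needs to change when the requirements do.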
You must employ discipline to stick to that philosophy. When a new requirement arises, don’t give in to the impulse to just hack in support for this special case by passing in a flag. That will introduce additional, unnecessary complexity to the interface. The more you do that, the more likely you’ll create a web of unseen interdependencies that will lead directly to rotten code.
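A hypothetical before-and-after (all names invented) shows what the flag hack costs. Each flag multiplies the ways the function can be called, and every caller now depends on the whole tangle:

```python
# The tempting hack: one more flag for each special case.
def report(items, csv=False, skip_empty=False):
    rows = [i for i in items if i or not skip_empty]
    return ",".join(rows) if csv else "\n".join(rows)

# The disciplined alternative: small pieces the caller composes.
def non_empty(items):
    return [i for i in items if i]

def as_csv(items):
    return ",".join(items)

print(as_csv(non_empty(["a", "", "b"])))
```

The flag version already has four calling modes; a few more requirements and it becomes the web of unseen interdependencies described above, while the composed version stays as simple as the day it was written.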