How to prevent code rot

Software developers call it code rot when neglected code stops working. Chip Camden explains why this often occurs and how to avoid such trouble.

I sat alone eating breakfast at my favorite diner. The cook ambled out of the kitchen to the soda machine to fill her cup with ice, but she couldn't get any out. She called the owner, who came out of the back and fiddled with the machine a couple of times before it finally gave ice. "It just wanted some attention," she said as she left.

A system that needs periodic attention indicates poorly designed automation, I thought to myself. Yet, most systems do seem to require this kind of babysitting. If someone cares for the systems on a regular basis, they run indefinitely without a hitch; if the systems are neglected for too long, they fall apart.

In software development, we call that code rot. When nobody pays attention to a section of working code, it suddenly doesn't work any more. Sometimes after examining the problem, I conclude that it could never have worked -- despite the fact that it had been running for years. This happens so frequently that it almost doesn't surprise me any more. The code didn't change itself (at least, most code can't do that), so how did this happen?

You might suspect that the metaphor "rot" is somewhat inaccurate. Physical materials degrade spontaneously -- or do they? If you could place a tomato in a sealed environment and remove all microbes from it, it wouldn't rot. Nothing rots by itself -- it needs some agent acting on it.

While extending a metaphor doesn't establish fact, it seems to shed light on this case. Code rot often occurs because the requirements for the code have shifted imperceptibly -- as imperceptibly as bacteria infect organic matter. Nobody remembers that the environment in which this code operated successfully for years differed subtly from that in which it is now expected to function. Changes made elsewhere place new, unforeseen, and undocumented demands on this code. Common sense says that it should handle this case, but nobody thought of that specific combination of circumstances when it was designed.

Code can also degrade because it relies on the behavior of other code. The documented behavior of a library or framework often doesn't include specific details on which clients rely. The author may feel free to change an undocumented behavior -- he or she may even consider it a bug. Combine a few of those changes together, and you can easily lose the trail back to "how did this ever work?"
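A small Python sketch of that kind of dependency: `os.listdir` is documented to return entries in arbitrary order, yet on some systems the names happen to come back in a convenient order, and code quietly starts to rely on it. The helper names (`newest_report_fragile`, `newest_report`) and the file-naming scheme are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def newest_report_fragile(directory):
    # Fragile: assumes os.listdir() returns names in sorted order.
    # The documentation says the order is arbitrary.
    return os.listdir(directory)[-1]


def newest_report(directory):
    # Relies only on documented behavior: sort explicitly, then pick the last.
    # Assumes names like "report-2024-01-31.txt", which sort chronologically.
    return sorted(os.listdir(directory))[-1]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("report-2024-03-01.txt", "report-2024-01-15.txt"):
            open(os.path.join(tmp, name), "w").close()
        print(newest_report(tmp))
```

The fragile version may pass every test on one machine for years and then fail after a filesystem or OS change, which is exactly the "how did this ever work?" situation.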

What can we do to prevent code rot?

We can take another cue from our metaphor: avoid exposure, because exposure hastens rot. The less your software relies on other software, and vice versa, the less trouble you'll have. By "less," I don't mean fewer instances of usage, but rather fewer methods of usage. Consider the Unix 'tee' utility, for example. Other than the obligatory proliferation of GNU options, it hasn't changed significantly in decades, and yet, it doesn't rot. Why? Because from the beginning it defined a simple interface with clear expectations. Furthermore, it confines its expectations of the code on which it relies (the standard C library) to clearly documented behavior in the purpose for which it was written.
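To make the point about interface size concrete, here is a minimal Python sketch of a tee-like function. It is not the real utility's implementation; the name `tee`, its signature, and the `chunk_size` parameter are assumptions for illustration. Its whole contract is: read text from one source and write each chunk, unchanged, to every sink.

```python
import io


def tee(source, *sinks, chunk_size=4096):
    """Copy everything read from `source` to every sink, unchanged.

    Relies only on documented file-object behavior: read(n) returns an
    empty string at end of input, and write(s) accepts a string.
    """
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty string means end of input
            break
        for sink in sinks:
            sink.write(chunk)


if __name__ == "__main__":
    out_a, out_b = io.StringIO(), io.StringIO()
    tee(io.StringIO("hello\nworld\n"), out_a, out_b)
    print(out_a.getvalue() == out_b.getvalue())
```

There is very little here for a change elsewhere to break: one input, any number of outputs, and no knowledge of what the data means.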

Documentation helps, but documentation is not the solution -- simplicity is. No matter how well-documented the system, if it has too many interdependencies with other systems, some of those dependencies will change in incompatible ways. Therefore, break the system down into simple, independent components that each do one thing well, and document those things. As requirements change, the alignment of those components may need to be altered, but not so much the components themselves.

You must employ discipline to stick to that philosophy. When a new requirement arises, don't give in to the impulse to just hack in support for this special case by passing in a flag. That will introduce additional, unnecessary complexity to the interface. The more you do that, the more likely you'll create a web of unseen interdependencies that will lead directly to rotten code.
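As a hedged illustration of the alternative to a special-case flag, the Python sketch below keeps each function to one job and handles a new requirement by rearranging the pipeline. All names (`parse_lines`, `drop_comments`, `to_upper`, `pipeline`) are hypothetical.

```python
def parse_lines(text):
    # One job: split text into stripped, non-empty lines.
    return [line.strip() for line in text.splitlines() if line.strip()]


def drop_comments(lines):
    # One job: remove lines that start with '#'.
    return [line for line in lines if not line.startswith("#")]


def to_upper(lines):
    # One job: upper-case every line.
    return [line.upper() for line in lines]


def pipeline(*steps):
    # Compose steps left to right into a single callable.
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


# The tempting hack would be parse_lines(text, skip_comments=True, upper=True).
# Instead, each caller aligns the components it needs:
load_config = pipeline(parse_lines, drop_comments)
load_shouted = pipeline(parse_lines, drop_comments, to_upper)

if __name__ == "__main__":
    print(load_config("# note\nalpha\n\nbeta\n"))
```

When the next requirement arrives, the change is a new small step or a new arrangement; the existing components and their interfaces stay as they were.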

This post first appeared in TechRepublic's IT Consultant blog.

About

Chip Camden has been programming since 1978, and he's still not done. An independent consultant since 1991, Chip specializes in software development tools, languages, and migration to new technology. Besides writing for TechRepublic's IT Consultant b...

30 comments
RMSx32767

"Stopped working" or did not execute as expected? There is a difference.

venerable Architect

Back in 1978 I coined the term "bit rot" when I was given some code that had been bastardized so many times that there was no way to salvage it. The demand was to convert from English units to metric. That in itself was not an issue, but there was no way to understand what had been done with this code. So I suggested rewriting the code. The ignorance really showed and I was forced to do the conversion. Needless to say, the code rotted out in the next year as the bits seemed to go. The real problem then was that only one person could make it run, and she quit!! Love this column. Congrats on the excellent use of terminology. John

anil_g

One day I'd like to see TechRepublic start to publish nice, quick "take aways" at the top of their articles, instead of the repetition of the title in slightly different words. A good take away for an article allows readers to decide if they want to read on for more detail and helps skim more content efficiently. If TechRepublic started using take aways their content would be more valuable and accessible to their readers. Here's a suggested take away for this article that I've kindly put together based on text from the actual article, to show how easy it is to do something like this. "Documentation helps prevent code rot, but documentation is not the solution - simplicity is. If a system has too many interdependencies some of those dependencies will change in incompatible ways. Therefore, break the system down into simple, independent components that each do one thing well, and document those things."

techrepublic@

Use assertions to verify the validity of function arguments. If a function expects an integer in the range [1,4], then put an assertion at the start of the function to verify that expectation. It makes the requirements and intent of the code much clearer. Use something like:

    #include <assert.h>

    void foobar( int arg )
    {
        /* precondition: arg is in the range [1,4] */
        assert( 1 <= arg && arg <= 4 );
        switch( arg ) {
            case 1:
            case 2:
            case 3:
            case 4:
                break;
        }
    }

instead of

    void foobar( int arg )
    {
        switch( arg ) {
            case 1:
            case 2:
            case 3:
            case 4:
                break;
            default:
                /* something wrong! */
                break;
        }
    }

p.s. Pseudo code that just happens to look like C.
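The same idea as a runnable Python sketch. `foobar` and the 1-to-4 range come from the comment above; the return values are placeholders assumed for illustration. One caveat: Python drops `assert` statements when run with `-O`, so an assertion documents intent for developers rather than guarding production input.

```python
def foobar(arg):
    # State the precondition up front: arg must be an integer in [1, 4].
    assert isinstance(arg, int) and 1 <= arg <= 4, f"arg out of range: {arg!r}"

    # Placeholder behavior (an assumption; the original leaves the cases empty).
    names = {1: "one", 2: "two", 3: "three", 4: "four"}
    return names[arg]


if __name__ == "__main__":
    print(foobar(2))
    try:
        foobar(5)
    except AssertionError as exc:
        print("rejected:", exc)
```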

just.a.guy

I hate documentation. Most of it is just a restatement of the code. Programmers have learned to read the code rather than the documentation, because documentation and code easily "rot" in different directions unless you have the FDA breathing down your throat.

Let's assume we have a perfect system at the start. Nothing breaks. Then change comes along. If the change does not violate existing assumptions, then nothing breaks. Suppose a change results in a contradicting assumption becoming part of the system. As long as the assumptions that come together are not contradictory, the system continues to work. It is only when data or code with contradictory assumptions are combined that errors occur.

What is the function of documentation?
1. Ensure that what was requested is what was delivered.
2. Assist in determining what to modify in a system, and where, when change occurs.

If its only use is to ensure that what was delivered is correct, then you can throw away the documentation after the system is delivered. Its main use after the initial system acceptance is to aid in change analysis as the system ages. If you want to improve the utility of this documentation, then you should focus on the source of errors (i.e. introducing contradictions into a system).

Back in the 1970s we used a method called IPO (input/process/output). It was too weighty for most systems. It resulted in volumes of documentation. In many cases it was just pseudo statements for the code actually implemented.

What I would like to see is an AIO (assumptions, input, output) form of documentation at the top of each module. And I would place most emphasis on the assumptions. You can even write your input/output definitions in terms of assumption statements. An assumption has a major subject which is being described or defined, and it states a condition ("rule") that is assumed to be "true", without proof. In writing assumptions it is better to write simpler sentences rather than complex ones. If you need a complex set of conditions, then I would create a type or classification for the set of things that share the combined set of assumptions, and then refer to an object as an object of type X, or an X object.

I don't know; this idea may already have been conceived by others and already be in use. That is why I offer it here.
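One way the AIO idea might look in practice, sketched in Python. The header layout is an assumption (the comment does not prescribe a format), and `average_daily_sales` is a hypothetical function. The assumptions that are cheap to check are also enforced, so a contradicting change fails loudly.

```python
def average_daily_sales(totals):
    """Average of daily sales totals.

    Assumptions:
      - A "daily total" is a non-negative number in a single currency.
      - The caller has already removed days on which the store was closed.
    Input:
      - totals: a non-empty sequence of daily totals.
    Output:
      - The arithmetic mean of the totals, as a float.
    """
    if not totals:
        raise ValueError("assumption violated: totals must be non-empty")
    if any(t < 0 for t in totals):
        raise ValueError("assumption violated: daily totals must be non-negative")
    return sum(totals) / len(totals)


if __name__ == "__main__":
    print(average_daily_sales([100, 200, 300]))
```

The second assumption (closed days already removed) cannot be checked from inside the function, which is the kind of thing the written header is for.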

Justin James

Something that I've found to be really important is to use better tools. For example, if you don't have a good tool to handle documentation (and no, I don't mean simple "//" comments, I mean full "///" comments at the very least), code rot speeds up. If you don't use a *good* version control system, code rot speeds up. If you write dense, unreadable code, code rot speeds up dramatically. If your code is written with a pile of hardcoded values, code rot gets VERY quick as the underlying rationale for the selection of the values is forgotten over time, and people just slap on more and more layers of junk to override logic in new cases. And on and on and on. For me, I've found that the non-mainstream systems (and there are lots of good ones out there; I'm in love with Agile Platform, but Django and Ruby on Rails are two other examples) really address these issues well. As your code becomes more easily maintainable because it's obvious where to make changes, and it is easier to make changes with metadata rather than hard values or conditional statements, code rot slows down quite a bit. I've seen tons of .NET projects that had significant amounts of code rot even before the initial release; it's a lot harder for things to get that bad in the more modern, less tradition-bound systems. J.Ja
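A small illustration of the hardcoded-value point, in Python. The discount rule, the numbers, and every name here are hypothetical; the sketch only contrasts bare literals with named values that carry their rationale.

```python
# Before: the rationale for 7 and 0.15 lives only in someone's memory.
def discount_fragile(days_as_customer):
    return 0.15 if days_as_customer > 7 else 0.0


# After: the values are named, explained, and kept in one place.
TRIAL_PERIOD_DAYS = 7      # assumed rule: trial customers get no discount
LOYALTY_DISCOUNT = 0.15    # assumed rule: flat rate once the trial ends


def discount(days_as_customer):
    if days_as_customer > TRIAL_PERIOD_DAYS:
        return LOYALTY_DISCOUNT
    return 0.0
```

Both functions behave the same today; the difference shows up later, when someone has to decide whether changing the 7 is safe.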

oldbaritone

One of the "housekeeping tricks" learned the hard way: never assume. Use the "else" at the end of a logic chain as a catch-all for anything unexpected. Call it "Code Rot" because a dependency was overlooked, but these things do happen. For example, if a function has an argument that should be passed with an integer value in the range 1-4, don't code it as "case 1", "case 2", "case 3", and "else". Instead, code "case 1", "case 2", "case 3", "case 4" and "else" -- and "else" raises an error about unexpected input. In IF-THEN-ELSEIF logic do the same: leave the unspecified "ELSE" as "There should be no way to get here" and raise an error. Then, after a missed project meeting in which someone defined a new option and forgot to tell you, you will find out right away, rather than trying to figure out what you weren't told. Or many years later, when a routine that has "worked perfectly for years" gets something new, it tells you right away. Been there. Done that. Got a tee-shirt. Don't need another tee-shirt. ;-)
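A minimal Python sketch of the catch-all. The option values and `shipping_days` are hypothetical; the point is that the final `else` raises instead of silently treating unknown input as the last known case.

```python
def shipping_days(option):
    # Every expected value gets its own branch, including the last one.
    if option == 1:
        return 1
    elif option == 2:
        return 3
    elif option == 3:
        return 5
    elif option == 4:
        return 10
    else:
        # "There should be no way to get here."
        raise ValueError(f"unexpected shipping option: {option!r}")
```

Unlike an `assert`, the explicit `raise` still fires when Python runs with `-O`, so a newly defined option 5 is reported the first time it shows up.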

minstrelmike

Code rot is the converse of the truth that software does not evolve the way many say it does. Any particular piece of software merely accretes, like a coral reef (in the same way, no individual evolves; only species evolve).

pschulz

My take on this is this: A key thing to do is to never rely on undocumented behaviour. Never. These undocumented features are undocumented for a reason, and the least of it is that they can change "unexpectedly". But you have no excuse: undocumented features are prone to change, they are almost meant to change, so never rely on them. Make your program "less clever", and rely on a few clearly documented, mainline features. That includes avoiding the "special" and "advanced" features. Just stay mainline; it will enhance code clarity and maintenance drastically. The other thing to avoid is so-called "3rd party" components. In Windows development, avoid reliance on a variety of 3rd party controls and components. Who knows what assumptions they were built on (and which undocumented features THEY used)? Done this way, it is possible to define a clear baseline which makes a program work when those baseline requirements are met. If those requirements are few (and are sufficiently mainline and thus likely to be maintained in the future by vendors and OS developers), then you have the key to future workability.

Slayer_

In our global code, we try to never change the inputs/outputs of a function once it is created. If it needs to be changed, we write a new function instead and leave the old one intact, allowing the legacy code to keep using it properly. The only problem with this approach is that it leads to developer confusion when trying to select the correct function. Example: do you want GetString, GetVarString, or GetVarString32??? We once thought that putting a serial date in the function name might help, but it made the code hard to read. Many of those functions just call their newer versions with the parameters reorganized and adjusted. Others are simply blanked out when they are no longer needed. So far, the best we have is a description in each function saying whether it is deprecated and what replaces it. It's no wonder software projects inevitably get slow and bloated. Our code is about 10 years old now and still has very little rot. Thankfully most of those global functions were well thought out when they were written, requiring very little maintenance.
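A hedged Python sketch of that approach. The names `GetString` and `GetVarString` come from the comment; their parameters and behavior here are invented for illustration. The old function keeps its signature, forwards to the newer one with the arguments adjusted, and announces its deprecation through the standard `warnings` module.

```python
import warnings


def GetVarString(source, key, default=""):
    # Newer function (hypothetical signature): look up key, with a default.
    return str(source.get(key, default))


def GetString(key, source):
    """Deprecated: use GetVarString(source, key) instead.

    Signature left untouched so legacy callers keep working.
    """
    warnings.warn(
        "GetString is deprecated; use GetVarString(source, key)",
        DeprecationWarning,
        stacklevel=2,
    )
    # Forward to the newer version with the parameters reordered.
    return GetVarString(source, key)


if __name__ == "__main__":
    settings = {"name": "demo"}
    print(GetString("name", settings))
```

The warning gives developers the "which one do I pick?" answer at the point of use, in addition to the note in the docstring.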

pschulz

This article nicely lays out one of the key problems. Let's face it: deep down, all of us really do want to write the perfect code which always works and will always work. One time I made a little utility program for someone; it was done in maybe 2 hours. I did not consider this important at the time, until I got a call 12 years later, when the company moved. They had lost the program and wanted it back, as they had been using it for the last 12 years and still needed it. I still had my archive copy around and, again, it has been in use for a few years since. I wondered at the time why the small projects seem to be more successful in terms of longevity than the large, complex ones. Code rot is definitely one way to explain that.

jkameleon

"Can we" is not a question. Of course we can. But, making "durable" software requires additional effort, better planning, and better understanding of the problem domain. It all depends on requirements.

Tony Hopkinson

kiss all that goodbye. Tools, techniques, and styles are all great; the biggest roadblock is, was, and always will be cajoling resources to address technical debt out of management, who misunderstand what "broke" means.

apotheon

Regression testing is a bulwark against code rot.
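For instance, a regression test can pin down behavior that callers depend on, so that a later change breaks the test instead of production. A minimal sketch with Python's `unittest`; `normalize_code` and its rules are hypothetical.

```python
import unittest


def normalize_code(raw):
    # Hypothetical routine: trim whitespace and upper-case a product code.
    return raw.strip().upper()


class NormalizeCodeRegression(unittest.TestCase):
    def test_trims_and_uppercases(self):
        self.assertEqual(normalize_code("  ab-12 "), "AB-12")

    def test_empty_input_stays_empty(self):
        # Pins an edge case that was never written down anywhere else.
        self.assertEqual(normalize_code("   "), "")
```

Saved in a module, these tests can be run with `python -m unittest`. The second test is the interesting kind: it turns an unstated expectation into something a future change has to confront.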

Tony Hopkinson

Ain't nobody ever asked me if I want to reuse someone else's code, or take on this entire third party component suite, or not implement this behaviour in this undocumented funny. How quickly I can make it work again like it used to in the previous version, now that I get asked a lot...

apotheon

Rather than rot, that's more like cancer. You need a process in place for deprecation of old code. Otherwise, you end up with something "designed" like PHP, which nobody really wants. Your code comment deprecation labels are a step in that direction, and that's good; if you can come up with something better that works within your organizational culture, do so. If possible, you might consider establishing a registry of all new functions and things that call them. When a new version is created, deprecate the old version, and start encouraging people to move to using the new version. When the registry drops to zero uses for a given old function, delete it.
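A rough Python sketch of such a registry, assuming a decorator-based approach that the comment does not specify. It records which callers still use each deprecated function at run time, so it only sees callers that actually execute; a static search of the codebase would be needed for full coverage. `deprecated`, `USAGE`, and the sample functions are hypothetical.

```python
import functools
import inspect
from collections import defaultdict

# Registry: deprecated function name -> set of caller names seen so far.
USAGE = defaultdict(set)


def deprecated(replacement):
    """Mark a function as deprecated and record who still calls it."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            caller = inspect.stack()[1].function  # name of the calling function
            USAGE[func.__name__].add(caller)
            return func(*args, **kwargs)
        wrapper.replacement = replacement
        return wrapper
    return decorate


def search_v2(items, target):
    # Newer version with the parameters in a different order.
    return target in items


@deprecated(replacement="search_v2")
def search(target, items):
    return search_v2(items, target)


def legacy_report():
    # A caller that has not migrated yet.
    return search(3, [1, 2, 3])
```

After a representative run, `USAGE["search"]` lists the callers still to migrate; when it stays empty, the old function is a candidate for deletion.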

Sterling chip Camden

I wrote a utility for the university where I worked back in the late 70s. A friend of mine told me they're still using it. I didn't think much of it at the time. It's amazing to me that it even still compiles.

kreniska

It is not correct to say "it ALL depends on requirements". Obviously requirements are important. But so are design, code logic, and testing. Even if you get the requirements right, and even if you meet the requirements, and even if there are no dependencies on external libraries, the code can still fail in the future: 1. if error conditions are encountered that were not anticipated by the programmer; 2. if a code path is hit that has never been executed and was not tested.

doug.cronshaw@baesystems

You never get enough to fully test all the individual code modules, never mind doing the comprehensive sub-system and major system testing that you know should be done to protect against some sources of code rot. You are therefore forced to program defensively (such as ensuring that you always add the catch-all final case or error-trapping ELSE in a sequence of structures, see above comments) to protect against that lack of testing resource. [We used to work with the intention of devoting at least 40% of the manpower effort just to testing. I don't think I ever managed to spend that much on testing, after all the other preliminaries such as design and coding had eaten a significantly larger amount of the available resource than had been intended!]

Sterling chip Camden

Especially with complex UI components, roll your own just isn't always practical. Yet problems with third-party components are a big debugging time sink.

Sterling chip Camden

... of a routine name you still see all over the place in old DIBOL code: SRCH2. It's almost a good name, because it does a binary search. But it got its name because it was an improvement on SERCH. Because it had different parameters, the author didn't want to modify SERCH, so they left it in the codebase. Eventually, nothing called SERCH any more and it was retired, but then they couldn't rename SRCH2 without breaking a lot of code (this was long before compile-time macros made it into any languages other than Lisp).

Slayer_

So hopefully next time around, we do a better job. Of course, maybe next time around the language chosen will support overloaded functions.

jkameleon

"Project vision" is probably better. It's about what developers should focus on: performance, deadlines, or readability/maintainability. If software is expected to be in use for a long time (OS, business core app, etc.), it's important for it to be durable and maintainable. Programs for one-time use (data migration, for example) don't need to be maintainable. Code rot is not the problem here. Sometimes, if you need planned obsolescence, code rot is even desirable. Customers might not like it, though.

Tony Hopkinson

the other is same as above, with code rot chucked in. It could all depend on requirements. You could have one that there are no bits of code that never execute. That the code only works if all the anticipated conditions are met. It's more likely that you'll wake up in Hugh Hefner's bed, with all his ladies convinced the new anti-aging treatment was very effective and a large box of viagra, but life's usually not that much fun and then you become a developer... Code rot depends on one thing and one thing alone: change. All else is a BS illusion of control.

AnsuGisalas

We've decided that idle developers are writing the bugs, so from now on we'll cut out the Spec and Test steps, and have all developers go straight to Implementation.... all the man-hours we save on this will go towards pushing out more product... or on downsizing, we haven't decided yet... What are you all waiting for? Chop chop!

Tony Hopkinson

Everybody knows, if you just tried a bit harder to get things right first time, you wouldn't be making all these errors. Stop writing bugs, you damn programmers! :( :( :(

apotheon

> You never get enough to fully test all the individual code modules, never mind doing the comprehensive sub-system and major system testing that you know should be done to protect against some sources of code rot.

In many contexts, this is definitely the case. That doesn't mean we shouldn't. In fact, once you get past the initial time sink hump of getting used to a testing process, test-driven development can suit your testing needs by ensuring the tests are in place before the code is written, with only about a 10% increase in initial development time at most -- sometimes, 10% or 20% less time, depending on the otherwise pathological character of a project. The fact that those tests become the gift that keeps on giving during maintenance and upgrades only improves the benefits gained.

Tony Hopkinson

Some of the ones I've been forced to use have been some of the most poorly designed badly implemented rubbish I've seen in my career, and I've seen a lot, and that's not even counting my own code :p .

Sterling chip Camden

... have you thrown up your arms and said "I wish I'd just written this myself!"

Tony Hopkinson

either. It's another vector for change, so it will only accelerate it. And UI component suites are hideously invasive, they also tend to be very OS sensitive. There are some good reasons for taking advantage of them, but for the less than alert, they are nowhere near as advantageous as the bloke who sold you them said...
