Five debugging tips for solving software problems

Do you ever have trouble getting information out of your clients? Even when they could give you that information with just a little effort? Here's how to drill down to get to the bottom of a situation.

"It's broken," said the client.

"Broken?" I responded. "In what way?"

"It's not doing what it's supposed to do."

"Could you describe for me what it's supposed to do that it isn't?"

"You know -- it compiles fine, but when you run it, it just dies."

"Dies how?"

"It gets an error and quits."

"What's the error message?"

"I didn't write it down."

Naturally, the problem doesn't reproduce on my test system. So I have to keep the client involved at least enough to give me access to their system. The client, however, doesn't want to be involved. They just want it fixed. Without saying so, they're probably thinking "Doesn't this software quack ever test this stuff? We're paying him the big bucks just to have this blow up in our users' faces, and then paying more to have him fix it!" They're not feeling at all like helping me, but I really need some information if I'm going to help them, because when I try it on their system it doesn't reproduce there, either.

"I need to know the exact steps you followed leading up to the error message."

To them, it sounds like I'm evading responsibility, because they have no idea what steps preceded this disaster.

Most of my clients are software developers, and it puzzles me how frequently they plop huge haystacks of code in my lap and ask me to find the needle. I often try to teach them good problem-solving techniques -- you know, teach a man to fish -- but I'm amazed at the resistance I sometimes encounter. Oh well, more billable hours for Yours Truly.

Here are some general problem-solving techniques I use. Many of these apply to all sorts of problems, not just software bugs.

  1. Make it smaller. Distill the problem down to the minimum amount of code required to reproduce it. Eliminate anything extraneous. Why go to all that trouble? One of the easiest ways to find the needle in the haystack is to get rid of most of the haystack. Sometimes, the problem goes away when you cut something out, which should give you an idea where it's hiding. Besides, if you get into an iterative debugging cycle, you'll be able to cycle much faster with less code and fewer steps.
  2. Question your assumptions. I had a client call me the other day to report a problem in which static data was supposedly being modified by a return statement. Ah, I know what you're thinking, but there were no objects involved in this code, so no destructors were being called. To believe him would mean that there was a horrible bug in the runtime environment for the language he was using, which didn't seem at all likely to me. So I asked him how he examined the data before and after the return. The data was stored in a library module for which he did not have debug symbols, so he added calls in his code to a standard routine to query the data. But he didn't realize that this routine (in the way it was called) also had the unfortunate side-effect of modifying the data, creating an instant Heisenbug. He had been moving these calls around, trying to isolate the statement that changed the data, when the debugging statement itself was doing just as much damage.
  3. Beware of false causation. How many times have you heard, "The only thing that changed is X, so the problem must be related to X"? No, no, no, no, no, no, NO! More than half the time, it's something else that changed that they forgot about -- or didn't even know about. When the "obvious" cause turns out to be a red herring, or even before, I always echo the famous words of Sgt. Schultz: "I know nothing... NOTHING!" OK, so it's intended more in the spirit of Socrates. But no cause should be assumed until proven. That doesn't mean that you shouldn't check out your hunches first, though. We have intuition for a reason.
  4. Start at the result and work backwards. I often see programmers start a debug session, then step through routine after routine, examining variables along the way, hoping to stumble across the moment when things go wrong. Usually that doesn't work at all, because problems have a knack for resulting from seemingly innocuous beginnings. It may seem counter-productive because code doesn't execute backwards, but it's more efficient to start at the moment the failure shows itself (an error message, for instance) and examine what's wrong at that instant. Then go back to the code that led up to that point to see where it went wrong, backing up routine by routine until you find the culprit. Ideally, debuggers would be able to step backwards, but even if you have to restart your debug session a hundred times, you'll save time over trying to perceive the cause from the top.
  5. Refactor. Sometimes the complexity of the situation is part of the problem. Maybe the definition of what the "plugh" function does is not entirely consistent, and that's what's leading to a failure. By simplifying and clarifying the design, those inconsistencies often reveal themselves. But use this judiciously -- once you get started down that road, you might not be able to stop for a long, long time.
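
Tip 1 is the easiest of these to automate, so here is a minimal sketch of "get rid of most of the haystack" in Python. Everything in it is an assumption made for illustration: the `shrink` helper, the `fails` check, and the single-letter "steps" are hypothetical stand-ins for whatever your real input and your real reproduction test happen to be. It repeatedly tries throwing away chunks of the input and keeps any smaller version that still fails.

```python
def shrink(items, fails):
    """Greedily drop chunks of `items` while `fails(items)` stays True.

    `items` is a list of steps (or lines, or records) that reproduces
    the problem; `fails` is a caller-supplied check that returns True
    when the problem still shows up.
    """
    if not fails(items):
        raise ValueError("start from an input that reproduces the problem")

    chunk = len(items) // 2
    while chunk >= 1:
        i = 0
        while i < len(items):
            # Try the input with one chunk cut out.
            candidate = items[:i] + items[i + chunk:]
            if fails(candidate):
                items = candidate   # still broken: keep the smaller input
            else:
                i += chunk          # this chunk matters: leave it in
        chunk //= 2                 # retry with finer cuts
    return items


# Hypothetical example: the "bug" needs both step "b" and step "e".
def fails(candidate):
    return "b" in candidate and "e" in candidate


steps = list("abcdefgh")
minimal = shrink(steps, fails)
print(minimal)
```

With the made-up check above, the eight steps shrink to just "b" and "e". In real life the check is usually "run it and see whether the error message appears", which is slower, but the cycle is the same, and it gets faster every time the input gets smaller.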