As I settled into my chair and opened my inbox, I immediately noticed a message from one of my clients with the subject "HELP!!!!". Just as I was about to open it, the phone rang. I knew who it would be before I even looked at the caller ID. So I picked up the handset and announced myself.
"CHIP!" came the distraught voice on the other end of the line. "We're down! That last update you sent us is preventing us from getting into the Zaphod system at all! It won't even start! And I've got to demo this tomorrow!"
"Hold on, hold on. Let's examine a few things and see if we can figure out what's going on."
"It has to be the update! That's the only thing we've changed!"
Have you ever heard that line before? I have, many times. When the client goes into towel-waving mode on you, you're tempted to look first at what they're blaming: your work. Perhaps you go back and walk through the changes you made, trying to determine how any of them could be causing the symptom the user is reporting. Often, though, that's not the only thing that changed, and accepting your client's ad hoc diagnosis at face value will only leave you spinning your wheels.
Someone in their organization may have installed new server-side software that "couldn't possibly be related," only for you to discover that it has an unexpected interaction with the system in question. Maybe they made modifications to other code that have a knock-on effect in this area. Or maybe they rebuilt something, and the old copy hadn't been as up to date as they thought, introducing yet another variable.
You could start by trying to eliminate your update, if possible. Remove it, and see if the problem goes away. If it doesn't, then your contribution would appear to have no part in it. On the other hand, even if the problem does go away, your update's only role may be to trigger an unforeseen interaction between components that were never part of your specification and testing, producing the infamous "It works on my machine."
Furthermore, just because two phenomena occur together doesn't mean that one causes the other. This is known as the fallacy of Questionable Cause. For example, I've often seen cases where adding code to a program subtly changes its memory footprint, which reveals a completely unrelated bug such as a bad pointer that previously had been firing harmless shots in the air but now finds a target and brings the whole program down.
Then there's the infamous Schroedinbug, which is an existing bug that should have always failed but hasn't... until now. I've run across a few of these in code that had been used successfully by thousands of users for more than a decade. You think to yourself, "That could never have worked. But somebody would have reported it, so something else must be making it work." Then the next day, you start getting error reports from users by the dozen. The truth probably is that something else was making it work — something that appeared to be so unrelated that it was removed in a code "optimization".
In all these cases, you can waste a lot of time trying to track down the chain of causation. I find it's better to ignore all causal theories at first, and simply diagnose the current symptoms using proven, scientific methods as if you had no prior exposure to this system at all. What effects are you observing? Have you asked every possible stupid question about assumptions? How can the problem be better isolated and simplified? Once you've eliminated as many variables as possible, then your intuitions (and those of your client) can be applied to discovering the solution. Maybe this will reveal a cause, and maybe it won't, but the solution is what's important.
Your client may express frustration with this approach — perhaps they think you're wasting time being too systematic. Sometimes an initial intuition will be right, but if the light bulb over your head doesn't illuminate immediately, you're better off looking for a different light bulb. In my experience, even a highly knowledgeable client will guess the right cause only about 30% of the time, if they're lucky. But hypotheses often come with emotions attached and, if you just ignore them, your client can feel snubbed. My wife is worst of all — when she tells me about a computer-related problem, and I go into my systematic approach to diagnosis and start asking basic questions, she'll say something like "Do you think I'm stupid? Why won't you trust what I'm saying?" and we end up converting a technical problem into a marital one. Fortunately, my clients have their own debugging experiences, so they can usually appreciate my need to nail everything down if I explain it appropriately.
To finish my story: It turned out that the client had installed a new version of the compiler for their language of choice several weeks earlier. The new compiler included a subtly different interpretation of a highly questionable piece of the client's code. To take advantage of my update, the client had to recompile their code, revealing the problem. Solution: rephrase that code in a more common, supported idiom and institute a regular cycle of full rebuild and retest.
Chip Camden has been programming since 1978, and he's still not done. An independent consultant since 1991, Chip specializes in software development tools, languages, and migration to new technology. Besides writing for TechRepublic's IT Consultant blog, he also contributes to [Geeks Are Sexy] Technology News and his two personal blogs, Chip's Quips and Chip's Tips for Developers.