
The art of the small test program

When your application fails, Chip Camden suggests creating a Small Test Program because it will help vendors and developers help you. Here's what is involved in creating such programs.

It's happened again. No matter how carefully you've tested each capability of the language, the library, or the framework you use. No matter how religiously you've built unit tests for each component. When you finally bring it all together into an application masterpiece, you get a failure you don't understand.

You try every debugging technique you know; you rewrite and simplify the most suspect passages; you stub out or eliminate entire components. Perhaps this helps you narrow the failure down to a particular region, but you still have no idea what's going wrong or why. If you have the sources to the language or the library, you may get a lot further than if they're proprietary, but perhaps you still lack the knowledge or the documentation to be able to make enough sense of the failure to solve the problem.

It's time to get some help. You post questions on the fora, or you contact the author/vendor directly, but they can't reproduce the problem. (Somehow, you knew that would happen.) They want you to send them a reproducing case. You direct them to your entire application, and the problem never gets resolved, because it's just too much trouble. The end.

Okay, we don't like that ending. How can we rewrite it? In the case of paid support we can stomp, yell, and escalate to force the vendor to spend time on the problem; but if it turns out to be too difficult to get the entire app running and debuggable, then they can still plead "unreproducible." There is only so much that a vendor can do. Even if they stay on the problem, it could take a very long time to get to the bottom of it. Fortunately, there's something we can do to help the vendor help us: It's called the Small Test Program (STP).

"Whoa! Wait a minute! We already removed everything extraneous when we were attempting to debug this!" I hear you cry.

That may be true, but our goal then was to rule out other causes. You can almost always do more by shifting the goal to reducing the footprint of the test case. The two goals sound almost the same, and they overlap a lot, but they don't cover entirely the same ground. In the first case, we were trying to do everything we could to help ourselves solve the problem. In the second, we want to do everything we can to help the developer solve the problem. That means we need to take the following steps:

  • Remove reliance on a specific configuration. No doubt you've customized your development environment with all sorts of shortcuts and conventions to save yourself time; every one of those costs time, though, for someone who isn't familiar with them. You either need to remove those dependencies and create a more vanilla example, or provide an instant setup that won't be invasive. For instance, if you need the user to set certain environment variables, provide a script that does that and then launches the app (see the launcher sketch after this list). Preferably, eliminate the dependency on environment variables altogether -- they can add to the confusion by being set in more than one place, or by not getting exported properly.
  • Eliminate all custom or third-party components that you can. You should have already done this, but it becomes even more important when submitting a failure. External components attract the finger of blame -- as they should, because they often cause unforeseen problems. Rule them out. Furthermore, if the external components require installation and setup, that delays the developer from being able to look at the problem. Developers often have trouble getting these components to work on their system, which is all wasted time if they didn't really need them to begin with.
  • Reduce the number of user steps required. If you think that one or two runs through the test case will reveal the problem, then your name must be Pollyanna. If they have to run your test a thousand times, every minute of elapsed execution time costs two work days. It's actually more than that because people are human -- every time the developers have to restart a long, arduous set of steps, they need a pause to sigh and wonder where they went wrong in life.
  • Clearly document the steps required. I don't know how many times I've received something claiming to be the steps to reproduce a problem that reads "Run the app." Unless the app is so simple that it requires no setup or interaction, and the failure is so obvious that not even [insert archetypal clueless person here] could miss it, this instruction will fail to reproduce. No matter how apparent it may seem, include every step -- every setup command, the command to launch the app, and every input required. If you followed the previous steps, this shouldn't be much.
  • Reduce the number of lines of code executed as much as possible. Maybe the entire program runs in two seconds, but if it executes 30,000 lines of code, then that's at least 30,000 possible causes that the developer may have to rule out. Furthermore, it complicates debugging. If you can get the entire program down to "step, step, kaboom!" then you're gold.
  • Include clear indications of failure. Don't presume that the developer will recognize immediately that your Weenie Widget is 10 pixels too short -- tell them so in the steps. Ideally, the application should scream out "Here's where I'm failing!" when it's run. Use assertions, or at least a printf or message box (the STP skeleton after this list illustrates this point and the next).
  • Include clear indications of success. How many times have I solved a problem presented by a test program, only to run into another failure immediately afterward? Did I fix a problem that they weren't reporting, and now I'm seeing the one they meant? Usually, they know about the second one, but they just didn't bother to prevent it since they had reproduced a failure with the first one. This is bad form. Ideally, you want your test program to be tailor-made for inclusion in a test suite so the same problem doesn't get reintroduced. For that to happen, it needs to cross the finish line with flying colors. Let there be no doubt that it was successful.
  • Test your test. Run through the test as if you were the developer assigned to work on it to make sure you didn't forget anything. Don't run it on your development system, because your environment might be set up in a way that the developer's isn't. Use a virtual machine with a vanilla configuration to run the test and make sure it fails in exactly the way you intended. It could save you a few email round trips and avoid giving the impression that you don't know what you're doing.
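
For the first point, a one-shot launcher can stand in for a page of setup instructions. Here's a minimal sketch in Python; the APP_CONFIG variable, its value, and the repro.py entry point are all hypothetical stand-ins for whatever your test actually needs:

    # run_repro.py -- hypothetical one-command launcher for the test case.
    # It sets up the environment the STP needs and then runs it, so the
    # developer types one command instead of reconstructing your setup.
    import os
    import subprocess
    import sys

    env = os.environ.copy()
    env["APP_CONFIG"] = "repro.cfg"  # assumed setting; yours will differ

    # Launch the test program with the prepared environment, and pass its
    # exit status back to the caller.
    result = subprocess.run([sys.executable, "repro.py"], env=env)
    sys.exit(result.returncode)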
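
And for the failure and success indications, here's a sketch of the shape an STP can take. The widget and its 10-pixel shortfall are invented for illustration; the pattern -- assert the failure loudly, announce success explicitly -- is the part that carries over:

    # stp.py -- minimal sketch of a Small Test Program.
    # make_widget() is a stand-in for whatever call actually fails;
    # the point is the shape: setup, one action, loud verdict.

    def make_widget(width):
        # Stand-in for the failing call; simulates a 10-pixel shortfall.
        return width - 10

    expected = 100
    actual = make_widget(expected)

    # Clear indication of failure: say exactly what went wrong, and where.
    assert actual == expected, (
        f"FAIL: widget is {expected - actual} pixels too short "
        f"(expected {expected}, got {actual})"
    )

    # Clear indication of success, so the STP can later join a test suite.
    print("SUCCESS: widget rendered at the expected size")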

Why you should create an STP

Why should you put the extra effort into creating an STP? It's their bug, after all. Let them find it and fix it.

Most of my clients are software developers, so I've looked at this issue from both sides. I've been the recipient of hundreds (perhaps thousands) of failures to solve over the last 20 years, and I've had to submit my share of them to numerous software providers. I can tell you from my experiences that more than anything else -- more than whether you pay the vendor to support the product or how much, more than all the screaming and yelling you can muster, more than all the flattery you can lay on them, more than any reputation they may have for responding in a timely manner -- the single most influential factor in determining how quickly the developers will resolve your problem is how clearly and concisely you've demonstrated the failure.

So, the next time you need to submit a problem report, remember the immortal words of Steve Martin: "Let's get small."

Note: This TechRepublic post was originally published on January 10, 2011.


About

Chip Camden has been programming since 1978, and he's still not done. An independent consultant since 1991, Chip specializes in software development tools, languages, and migration to new technology. Besides writing for TechRepublic's IT Consultant b...

31 comments
phscnp

These techniques are wonderful but I find commercial frameworks defy the "small example" approach. They usually suffer from code bloat (to process all the exception cases) and have surprising metadata and data dependencies. Frameworks with lots of "configuration by schema manipulation" can be especially painful because the problem could be shaky code reacting to years of accumulated schema changes followed by introduction of that one "special" record among five million. Are there special patterns for developing small examples for COTS frameworks?

minstrelmike

In my experience, if I cannot duplicate the error, then I cannot fix it. Duh! Even if I did fix the problem, how would I know unless I could duplicate it? That's always been the way I analyze issues: get them down to the fundamentals where they are still issues. If a reinstall on a clean machine works, then the old machine _is_ the issue -- don't even try debugging. What the article says is: if you're a developer trying to convince some other developers there's a problem, you need to do the same thing Joel Spolsky says about reporting a problem: (1) state where you are in the software before the problem occurs; (2) give the exact steps to get to the error you found; (3) state what you expected to see instead (sometimes an error isn't a software error). Oftentimes (as others have already posted), merely describing the problem to co-workers will give you an "aha" moment yourself. Or one of them will ask the obvious -- is it plugged in? Is the variable really initialized? Or my favorite error of all: are you testing on the same server you're coding on?

JohnOfStony

How refreshing to encounter an excellent article with valid comments - no bitching at other commenters, no political rantings, just good logical valid comments. Thanks everyone. Maybe it says something about the type of person involved in software development?

davidibaldwin

Maybe it's because I'm a self taught programmer (and mostly PHP web stuff) but I build all my applications from small pieces that I test and make work. And as much as possible, I try to keep the parts as independent from each other as I can. That makes them easier to test when you find you have missed something. I wouldn't (can't?) build a larger program without little tested pieces.

mikifinaz1

First, some context: I started testing back in the day, and one of the guys who taught me was reputed to have been one of the IBM "black gang" of UNIX testers. I have worked as everything from a black box tester to test manager and then consultant. The scenario you use is good and bad. I think you would need a whole book to cover the gamut of topics most testers face, and the one you picked is not a common situation. It fit for the purposes of the article; however, most testers don't see alpha code. I was an alpha tester once, and at this point it is rare that you encounter this as a final-cut problem. You may see it at the kickoff of beta testing, and often you either get a call to fix it or junk it. The guiding principle is money and resources: how badly do they want it, and how much will they pay or be willing to risk?

I once saw a bug in financial software that didn't allow you to remove signatory-level users. Once they were entered, they were there forever. Imagine the risk of a bad guy who had once been a top-level user gaining access again to wreak havoc. This showstopper was in a final release to the customer, and the company still decided to ship the product. That was a rare event; in decades of testing I only saw it done twice. The vast majority of testing finds the mundane, i.e., dialog box X doesn't work, or users with browser Y won't be able to see your site, etc.

The key to this seems to be documentation and notification. CYA is vital. Once you get to this juncture, the tester/manager/consultant needs to stop, get the major contributors in a room eyeball to eyeball, and speak slowly so that everyone is on the same page and the choices are clear. Are we in or out? Will we fix it or not? Will we accept the risk of NOT fixing it and hope for the best, and who will sign off on the decision? I have seen bug reports that were flawlessly accurate, but the tester didn't convey the importance of the issue, and it slipped by until the customer found it (oops!). The scenario you present shows the mechanics and dilemmas of dealing with this sort of issue, but doesn't inform the reader that these situations often turn on the "people business" of business and can make or break reputations. More than just being accurate in issue reporting and dealing with the problems, you have to let people know when something is a showstopper and a major problem that requires a major decision from the stakeholders.

robin

The person detecting a defect (in testing or production) has the obligation to (1) confirm the defect is in the code and not in the test/usage and (2) provide sufficient information to enable the developer to reproduce the defect. Information needed to reproduce a defect includes at a minimum the steps taken and related data values plus the status of relevant environmental variables. Defect isolation also can include the valuable identification of conditions which do or do not seem to affect the defect. Chip and some of the comments have elaborated quite nicely on many of the mainly environmental conditions that commonly reveal defects. Please note that what is commonly called a user error often actually represents a design/code defect that allows the user to make such an error.

oldbaritone

When I can't find a similar problem online in blogs, etc., I usually try to duplicate the error on a "clean" machine. I have a couple of test boxes, and I can reinstall an out-of-the-box factory image in a few minutes, then try to cause the failure on the new install. If the failure can be duplicated, then I can call back and say "on a new system, I do X, Y, and Z and the failure occurs." As Chip mentioned, development systems tend to accumulate a lot of "old stuff" over time, so the clean-machine method eliminates those possible causes. If I cannot duplicate the fault, often I can identify the source of the issue based on "what's different on my machine, compared to the 'clean' one?" Then the tough part is being able to talk with someone who can understand the description.

As a developer, I've learned to be VERY careful with "case else". Instead of using else, I have the logic enumerate all of the (expected) possibilities and leave "else" as "this shouldn't happen", raising an error. Most times, with just a little investigation, we find that some set of conditions arose that we didn't cover in the logic. A "this shouldn't ever happen" error immediately helps us localize the area of the faulty logic. That shortens the back-track time considerably.
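
A minimal sketch of that pattern in Python -- the status codes here are invented, but the shape is what matters:

    # Enumerate every expected case explicitly; reserve the fallback for
    # "this shouldn't happen" so faulty logic announces itself instead of
    # being silently lumped in with a known case.
    def describe(status_code):
        if status_code == 0:
            return "success"
        elif status_code == 1:
            return "warning"
        elif status_code == 2:
            return "error"
        else:
            # Not a case we designed for: raise, don't guess.
            raise ValueError(f"unexpected status code: {status_code!r}")

    print(describe(0))  # prints "success"
    print(describe(7))  # raises ValueError, localizing the faulty logic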

russ.welsh

A problem well stated is a problem half-solved.

Tony Hopkinson

I use them for refactoring legacy code -- sort of a unit test, but with its own little UI, and without the major rework required to get unit testing in, which my bosses are permanently temporarily reluctant to countenance at this time. The most recent one was to investigate a faukt with a third-party component that happened in some scenarios when running as a 32-bit app on a 64-bit system. It works, and I recommend it too. It proved the fault couldn't be fixed and that the vendor was talking bollocks, so we got rid of it altogether and refactored, fixing a couple of other issues in the process. Win, win, win time.

Sterling chip Camden

It'll get you to the finish line quicker. Do you have any LTP (or VLTP) horror stories?

Tony Hopkinson

More of a doh than a duh. While I'm not particularly comfortable with the, erm, success, my bosses have been delirious (probably why it happened in the first place :( ). The method Chip outlined has helped me do it.

Sterling chip Camden

"Please note that what is commonly called a user error often actually represents a design/code defect that allows the user to make such an error." Agree as far as practicable, but in some cases this "design defect" would be the inclusion of a keyboard and mouse.

Tony Hopkinson

The minimum set of reproduction steps is a lovely thing to pass along. A mini test app, in the context we are discussing, would be to investigate an issue where, despite being able to reproduce it, the poor dev can't say what's wrong... Something like "NumberOfThingumyBobs cannot be < 0" is the error, and there's code all over the place that 'counts' them -- and the counts run in different orders, possibly no defined order at all in some cases. With a mini test app, you start with all the code, and then you chop lumps out until you are only left with the code with the error.

jhoward

A "clean machine"/VM usually helps me figure out external dependencies and out of date library versions very quickly. Seems pretty simple but very useful when your dev machine has 4 beta versions of the same library installed and you can't seem to remember which stable version everyone else has installed.

Sterling chip Camden

Nice tip -- yep, when you use "else" or "default" to handle a known case, then you are assuming that there can be no other cases. And we all know about "assume".

Justin James

One of the best debugging techniques I've found is merely telling someone else about the issue, describing the symptoms, environment, etc. in detail. Many times, it's that simple mental walkthrough that leads me to the solution. J.Ja

Ian Thurston

I'm guessing that "faukt" in your post was a typo ... but I've been "faukt" by faults so many, many times. I think that should be added to the lexicon! Anyhow, I've been running little test stubs since, oh, 1969. Helps, and sometimes the process leads me to ask "why do we need the more complicated version at all?". The less the guts, the less the bellyache.

Justin James

... this has ALWAYS been one of my preferred testing techniques. I wasn't aware that there were people who don't do it, but I suppose that with the unit testing infatuation combined with the automation of testing, it's not surprising that a lot of people don't do this. J.Ja

Sterling chip Camden

Right. I should have mentioned that a binary search approach can often be helpful. If you have little or no indication of what part of the code causes the error, cut out as close to half as you can and see if it goes away. Proceed by halves until you isolate the offending bit. Then isolate that part to an STP.
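
In code, that halving can be driven mechanically when the run is expressible as a list of steps plus a check for whether a given prefix still fails. A sketch -- all names hypothetical, and assuming the failure is deterministic:

    # Binary search over a sequence of steps to find the first one whose
    # inclusion makes the run fail. Assumes the failure is deterministic
    # and monotone: once the offending step is in the prefix, the run
    # keeps failing; the empty prefix passes and the full run fails.
    def first_failing_step(steps, prefix_fails):
        lo, hi = 0, len(steps)  # invariant: steps[:lo] passes, steps[:hi] fails
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if prefix_fails(steps[:mid]):
                hi = mid
            else:
                lo = mid
        return hi - 1           # index of the offending step

    # Toy usage: step 5 is the culprit.
    steps = list(range(10))
    culprit = first_failing_step(steps, lambda prefix: 5 in prefix)
    print(f"offending step: {steps[culprit]}")  # -> offending step: 5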

phscnp

I had a friend who swore by explaining the problem to his dog first. The dog never interrupted or jumped to conclusions, and the act of explaining often crystallized the issue.

JohnOfStony

Justin James has hit the nail on the head. Explaining to someone else what you're doing in the suspect section of code often solves the problem. You could even explain it to a cardboard cut-out, although I've never tried that. On the subject of words (e.g., faukt), I must say to Chip that I've never before encountered your plural of forum, "fora". In true Latin tradition you're right, but everyone else seems to use "forums". I'll bet you've studied Latin at some time!

Sterling chip Camden

... known as "getting a fresh perspective." Sleeping on it, taking a walk, working on something else for a while, etc. are other flavors of the same phenomenon -- they're all ways to get your inner conversation about the problem out of a failed loop, although talking it over with someone else is a bit more active approach than the others.

Sterling chip Camden

Unfortunately, a lot of people still don't test in any methodical fashion at all. It's one of my sore points.

minstrelmike

forum : fora, aquarium : aquaria, cactus : cacti. I often use the manufactured plural "doofi" for large groups of certain types of people.

apotheon

That just faux everything up.

santeewelding

You conspire to confound the filters. That would make you, "crackers".

apotheon

> There's no a in fukt... There is in "faucked", though -- right?

Tony Hopkinson

get to know each other a LOT better, before I let you hold my bollocks anywhere...

techrepublic

I'm adding faukt to my lexicon. As an aside, I enjoy reading the articles and comments of Chip Camden, Justin James, and Tony Hopkinson. Although I suspect Tony is a Brit, since he uses words like "bollocks", I won't hold it against him. grin