It is difficult to begin solving a computer problem unless you can reproduce it. Some issues appear only when the machine is working hard, so here is an easy way to throw a CPU into overdrive.
Some computer problems are trickier than others to track down, so techs need several strategies at their disposal for ferreting out system faults. Last week, I discussed how we could use known-good software to determine whether a problem is hardware-related. That’s a fine strategy when a failure is immediately reproducible, but what if the customer’s problem is intermittent?
I think every support tech has had this experience: You receive a call from a customer in a panic because his machine has crashed. Your client is stressed because he is under a tight deadline and can’t afford any more problems right now. You respond with a visit to his work site, and ask him to demonstrate his problem. He tries to…but the machine won’t repeat its earlier antics. “I swear it happened!” exclaims your client. “Of course, the machine works now that you’re here!”
While it would be nice if we could fix machines merely by standing near them, it’s important that techs reassure their customers that some problems are complicated and not easily reproducible. No one thinks the customer is imagining things; some faults simply appear only under very specific circumstances.
For instance, some problems appear only when a machine is under heavy computing load. A machine that has just rebooted or is sitting idle can behave quite differently from one that is working hard, when marginal cooling or a weak power supply is more likely to show itself. Stress-testing a machine can reveal issues that a cursory examination will miss.
My favorite way to put a machine under stress is quick and easy. Unix-like operating systems include a command called “yes,” which simply prints the letter “y” over and over until it is stopped. It was originally designed as a way of automating responses to interactive command-line programs. An interesting, if unintended, side effect is that running “yes” with no arguments will keep one CPU core at 100 percent. Redirecting its output with “yes > /dev/null” spares the terminal from displaying the flood of text, and Ctrl-C stops it. To stress multiple processors, run one copy per core in additional terminals.
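On a machine with many cores, opening one terminal per core gets tedious. Here is a minimal sketch of one way to automate it, assuming Python 3 and a “yes” binary on the PATH. The function name stress_with_yes and its parameters are hypothetical, made up for illustration rather than taken from any standard tool. It starts one copy of “yes” per logical CPU reported by os.cpu_count(), discards the output, and terminates every copy after a fixed number of seconds.

```python
import os
import shutil
import subprocess
import time
from typing import Optional


def stress_with_yes(duration_s: float = 60.0, workers: Optional[int] = None) -> int:
    """Hypothetical helper: run one `yes` per logical CPU for duration_s seconds.

    Returns the number of `yes` processes that were started.
    """
    if shutil.which("yes") is None:
        raise RuntimeError("the 'yes' command was not found on PATH")
    # Default to one copy per logical CPU; os.cpu_count() can return None.
    count = workers if workers is not None else (os.cpu_count() or 1)
    procs = []
    try:
        for _ in range(count):
            # Discard output so each copy is bounded by the CPU, not a terminal.
            procs.append(subprocess.Popen(["yes"], stdout=subprocess.DEVNULL))
        time.sleep(duration_s)
    finally:
        # Stop every copy, even if the sleep is interrupted with Ctrl-C.
        for p in procs:
            p.terminate()
        for p in procs:
            p.wait()
    return len(procs)
```

For example, stress_with_yes(300) would keep every logical CPU busy for five minutes while you watch the machine with top. Because os.cpu_count() counts logical CPUs, each hyperthread gets its own copy.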
There are more advanced benchmarking and system testing applications available, to be sure, but I can find “yes” on every Unix-like system I support. I’ve also found it handy to run the command when booted from a Linux Live CD or DVD to verify that a machine will fail under stress even when running known-good software.
I like using “yes” for simple stress-testing because it’s handy and pretty much idiot-proof. If you have another tool you like to use to put a machine through its paces, let us know about it in the comments.