
Improve your troubleshooting efficiency by using known-good software

William Jones uses the "binary search" methodology to quickly deduce the cause of technical issues. One sure-fire way of eliminating the hardware vs. software question when troubleshooting a problem is to use known-good software for testing purposes.

The most important step in resolving computer problems is isolating a root cause of the issue. One way to get to the heart of the matter is by attempting to replicate the problem when booted from software you can trust.

-------------------------------------------------------------------------------------

I first heard the term binary search when I was taking a computer programming course in college. Essentially, binary search is a tactic used to find a specific value by evaluating subsets of all the possible values. If you can keep eliminating half the possible solutions from consideration, you quickly arrive at the answer you seek.
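
As a minimal sketch of the idea, here is a textbook binary search in Python. It assumes the values are already sorted, and the function name `binary_search` is chosen purely for illustration. Each pass discards the half of the remaining range that cannot contain the target.

```python
def binary_search(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if it is absent.

    Assumes sorted_values is sorted in ascending order.
    """
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            return mid
        if sorted_values[mid] < target:
            # Target can only be in the upper half; drop the lower half.
            low = mid + 1
        else:
            # Target can only be in the lower half; drop the upper half.
            high = mid - 1
    return -1
```

Because the candidate range halves on every pass, even a list of a million values needs only about twenty comparisons.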

When answering a support call, the first question I ask myself kicks off a binary search of sorts: Is this a hardware problem or a software problem? The sooner I can remove a class of failure from consideration, the fewer possible solutions I have to contend with.

Sometimes the answer to my initial question is simple. If a monitor is smoking... well, there you go: that's likely a hardware problem. But most computer issues are not so obvious. Since our computers use software to interact with the various internal and external components, what appears to be a hardware problem could be attributable to corrupted device drivers or operating system components.

So, once I've replicated my client's problem in their software, I like to use software of my own to see if the issue persists. I carry two types of bootable media: a CD/DVD and a USB hard drive. The USB drive lets me carry a complete OS installation with a full load of device drivers and diagnostic utilities. The optical disc is usually a "live" Linux distribution that can boot many machines using generic drivers.

The key to using known-good software for troubleshooting is making sure that the software is trustworthy. I reimage my USB drive frequently and make sure my test software is patched. If I'm concerned about the health of the network, I usually use my optical boot disc. It's read-only, so there's no chance I'll carry away something unsavory from an infected machine.

If a customer's problem persists when booted from my software, I begin to presume that a hardware failure is involved. If using my boot disk makes the problem disappear, then I know to take a closer look at the health of my client's software installation.
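
The decision rule described here can be sketched as a small function. This is only an illustration: the function name, parameters, and returned labels are hypothetical, and a real diagnosis would treat the result as a starting presumption rather than a verdict.

```python
def classify_fault(fails_on_client_software, fails_on_known_good_software):
    """Suggest where to look next, based on two reproduction attempts.

    Both arguments are booleans recorded by the technician:
    - fails_on_client_software: problem reproduced in the client's own install
    - fails_on_known_good_software: problem reproduced from trusted boot media
    """
    if not fails_on_client_software:
        # Nothing to narrow down until the problem has been replicated.
        return "not reproduced"
    if fails_on_known_good_software:
        # The problem follows the machine, not the software.
        return "suspect hardware"
    # The problem disappears on trusted software.
    return "suspect client software"
```

For example, `classify_fault(True, False)` points toward the client's software installation, which matches the case where booting from the technician's disk makes the problem disappear.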

Testing with a known-good component is vital to isolating hardware issues. By using the same principle to test your clients' software, you can make sure you are getting to the heart of the issue in the most efficient way possible.

16 comments
Jackmagurn

What's the most reliable bootable USB image tool you use?

richardstevenhack

I pretty much do the same thing. I carry several bootable CDs with Ultimate Boot CD for Windows, Knoppix, Ubuntu System Rescue, Windows Bootable Recovery Console, TuffTest Pro, hard drive manufacturer diagnostics, etc., as well as a recently added 8GB USB flash drive with a full install of openSUSE, another 2GB flash drive with Ultimate Boot CD for Windows installed on it, two 4GB flash drives each identically loaded with nearly 250 utilities, and another 8GB drive running the BackTrack 3 security Linux distro (which I also have on CD). There's not much I can't do with that stuff! This past week one of my client's iSCSI boxes running OpenFiler went down with a kernel panic. Bad news - kernel panics are a nightmare to debug. Tried booting Ubuntu System Rescue, no go. UBCD4Win, no go. Then I noticed it was a different kernel panic each time the system failed to boot. That and the fact that nothing would boot indicated hardware issues to me, probably memory. Ran the Microsoft Memory Diagnostic bootable CD - bingo! Bad memory. Got two new sticks, machine's fine now. As for the fellow having problems above, I had a client machine that would shut down randomly. I tested that it was receiving power, BUT the power was not persistently good, which my detector did not spot. Since he switched power supplies, that probably isn't it in his case. But keep in mind you need to test the QUALITY of the power, not just that it's getting power. A multimeter is important.

hidayat.syah

Good article. Experts sometimes forget this basic step and circle around until they finally decide to start all over with this '101' technique. I use a 15-minute rule. If I cannot figure out where to go in 15 minutes, I start using this technique to reduce the problem variables.

mackbolan777

I recently built a machine using a Biostar gf8100 m2+te, and here's the issue. No overclocking, tried different power supplies. Booted with Windows Vista 32 originally and it was fine, but intermittently it will shut down. All hardware seems fine. Used TuffTest Pro to check the hard drive, tried reformatting and installed Vista Ultimate 64-bit: same issue. Cut the power switch off and tried running it by touching the wires together; it boots fine, but there's still this problem of impromptu shutdowns. Could be writing an email, surfing, gaming, and boom, shut down. All temps are normal when this happens. Tried several drivers to try to resolve the issue, no dice. I'm at a loss here. Thinking it's a bad motherboard. Oh, and tried 2 different power supplies. Is there a way to find out for certain it's a motherboard issue? Shooting from the hip is not working. It never shuts down when running the DOS-based TuffTest program, only during a Windows session. Got ideas? I'm going to try the Biostar site support for answers as well. Thanks, Pete

Triathlete1981

Good post for a newbie I guess. But this seems like Troubleshooting 101.

oneoar51

In my experience Vista has a lot of ACPI issues. After I exhaust my other options, I set the machine to NEVER sleep in the power options. I believe the "spend all my money on power" setting is the one you want. Although the monitor goes down, the trouble with Vista interacting on a low level with hardware goes away.

ideason88

...if you have one handy to make sure it's not a power supply (as in electric power from the socket) problem. Also check the power switch on the case - I had a bad switch once that would stick and cause problems. If not that then MB failure would be my next guess - unless the MB is being shorted by incorrect mounting.

jjcanaday

I had a similar problem on a new machine. I had purchased it with case, mobo, CPU, and RAM installed, so I sent it back under warranty. They reported back that it was loose screws holding the mobo to the case. I had heard of that being a problem before (if metal screws into metal posts). They replaced all screws w/ nylon and the computer has worked fine for months now. YMMV, of course.

anderiousb

Hi, since you've tried fresh installs of two different versions of Vista and changed the power supplies to no avail, I would take a look at the memory modules that you have in your system. Swap the memory modules, change slots if possible, and see if that makes any difference. You can also download the memtest program for free from http://www.memtest86.com/ and burn an ISO image, then boot the system with it to test your system memory. If Memtest doesn't reveal any problems with the memory modules and you still suspect it's a motherboard problem, then you might be able to pin down the problem by using a PCI hardware diagnostic card, also known as a PC POST card. Have a look at this site http://www.uxd.com/hardware-diagnostics.shtml. I hope my information helps. Good luck.

Tony Hopkinson

I had one that did this turned out to be a bad ACPM driver type thing, turned up after a winders update.

bkdirks

You tried running it by "touching the wires together"??? Wow. Maybe you should be in the explosives business rather than in PC support.

Tony Hopkinson

fault finding ability is wild guess, or this wild guess worked last time. Debugging especially, drives me buggers.

santeewelding

To get to 201, 301, and 401, how do you manage to do so without direct reference to 101? Or, for that matter, pulling your pants up before you appear in public.

iconoclastic

Yes, the concept is basic. I believe the author is highlighting the method he uses. There are certainly many ways to implement the concept; having multiple options increases our toolbox inventory, so to speak.

Nunob

I would also try to upgrade the BIOS for your system board from their website. Also I would pull the heatsink and clean off all the thermal paste carefully, then apply a new coat to the CPU. But what it really sounds like to me is a bad stick of RAM, since it runs fine in DOS mode, which uses a lower RAM space than the GUI. I would start by pulling one stick of RAM at a time and running it like normal to see what happens. I have used several of the RAM testers out there over the years, but I have had mixed results with them. I did once see a PC shop that had an actual hardware RAM tester that did an awesome job of testing, but using software ones to test the RAM in a system that is already unstable is a bit of a crapshoot to me. Hope that gives you a direction to go. Remember to test one thing at a time and undo your changes if they don't fix your problem. :)

Tony Hopkinson

is a classic troubleshooting failure. It's generally less likely with hardware, but in programming and configuring it's a total killer.
