Part one of this series discussed troubleshooting basics. Part two outlined the practice of troubleshooting from the general to the specific. Part three addressed the methodologies used to narrow the scope of your analysis. In this article, the final part of the super geek series, I’m heading into the deep end of the troubleshooting pool. I’m going to share my secrets for troubleshooting the really tough problems, the ones that leave mere mortal geeks scratching their heads and wondering what they’ve missed. Learn how you can use the technique of half splitting to save your sanity when troubleshooting a difficult problem.
Start with documentation
When you are dealing with a relatively complex system, it helps to have a system diagram that shows the system components and how they relate to one another. That’s right, I’m talking about documentation! It doesn’t have to be anything complex; a simple drawing like the one shown in Figure A will usually work just fine. The important part is that it identifies all of the system’s components.
Figure A: This took me about three minutes to create using Microsoft Visio 2000.
The half-splitting method
With your documentation in hand and a tough problem on the table, it’s time to start isolating the problem. The most effective method of isolating a problem is called half splitting. Here’s how it works.
Using the system diagram in Figure A, let’s assume that Desktop 2 is not able to print to the printer. You’ve already gathered the symptoms and checked for high-probability causes (correct driver, printer online, network connectivity, and so on). It’s time to start eliminating components as possible causes. The list of possible causes for our hypothetical problem consists of six components:
- Desktop 2
- LAN 1
- Router
- LAN 2
- Server
- Printer
Create a system flow diagram
First, label the components A through F, in the order listed. Then create a system flow diagram that shows how output flows from one component to the next (see Figure B).
Figure B: This system flow diagram indicates that the output of A feeds the input of B, which then feeds the input to C and so on.
Start in the middle and work your way out
Next, look at the output of C, the midpoint of the chain, and determine whether it is good or bad. Let’s assume that the output of component C is good. Because C can only produce good output if A and B are feeding it correctly, you can scratch A, B, and C off your list of possible causes. Now, split the remaining list again. Look at the output of E. If it’s bad, you know that either D or E is the problem, and checking the output of D tells you which one. If it’s good, you know that F is the problem.
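For readers who think in code, here is a minimal sketch of the same walk through the A-to-F chain. It assumes a single fault in a strictly linear flow, and the names `half_split` and `simulated_check` are made up for this illustration. In real life, the check is whatever measurement tells you a component's output is good.

```python
from typing import Callable, List, Sequence, Tuple


def half_split(
    components: Sequence[str],
    output_is_good: Callable[[str], bool],
) -> Tuple[str, List[str]]:
    """Find the faulty component in a linear chain by repeated halving.

    Assumes exactly one fault, so every component upstream of it produces
    good output and the faulty component and everything downstream do not.
    Returns the faulty component and the list of outputs that were checked.
    """
    if not components:
        raise ValueError("need at least one component")
    low, high = 0, len(components) - 1
    checked: List[str] = []
    while low < high:
        mid = (low + high) // 2
        checked.append(components[mid])
        if output_is_good(components[mid]):
            low = mid + 1  # mid and everything upstream are cleared
        else:
            high = mid  # the fault is at mid or upstream of it
    return components[low], checked


chain = ["A", "B", "C", "D", "E", "F"]
faulty = "E"  # hypothetical fault, chosen only for this demonstration


def simulated_check(component: str) -> bool:
    # Stand-in for a real measurement: output is good only upstream of the fault.
    return chain.index(component) < chain.index(faulty)


culprit, checked = half_split(chain, simulated_check)
print(culprit, checked)  # E ['C', 'E', 'D']
```

With six components, the sketch checks C first and E second, the same order used in the walkthrough above, and never needs more than three checks.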
While this may not seem like much of a time saver for this particular problem, imagine troubleshooting a system with 50 or 100 different components. When working on a system this large, the half-splitting method can dramatically reduce downtime. Table 1 may help to illustrate the point:
In programming, this is called a binary search, and the formula looks something like this: S = log₂(N), rounded up to the next whole number, where S is the maximum number of steps needed to find the problem among N components.
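As a rough illustration of that formula (my own arithmetic, not a reproduction of Table 1), the sketch below compares the worst case for checking a chain one component at a time with the worst case for half splitting. The function name `max_checks` is hypothetical, and the one-by-one figure assumes a single fault, so the last component never needs its own check.

```python
import math


def max_checks(n_components: int) -> int:
    """Worst-case number of half-split checks: log2(N) rounded up."""
    if n_components < 1:
        raise ValueError("need at least one component")
    return math.ceil(math.log2(n_components))


for n in (6, 50, 100):
    one_by_one = n - 1  # the last component is implicated by elimination
    print(f"{n:>3} components: up to {one_by_one} checks one by one, "
          f"up to {max_checks(n)} by half splitting")
```

For 6, 50, and 100 components, the half-splitting worst case works out to 3, 6, and 7 checks respectively.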
Continuing with the half-splitting method on our hypothetical problem, we now need to devise a test that proves that half of the components are either working or not working. One common test is to try printing from Desktop 4. If it can print, we know that the Server, Printer, and LAN 2 are all good. That leaves the Router, LAN 1, and Desktop 2 as possible causes. The next step is to see whether Desktop 1 can print. If it can, we have eliminated the Router and LAN 1 from the equation. That leaves Desktop 2 as the culprit; however, we’re not done yet.
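The elimination above can be sketched as simple set arithmetic. This is only an illustration under a single-fault assumption: the `eliminate` helper is hypothetical, and the sets of components each test exercises are my reading of the example, not something taken from the diagram.

```python
from typing import Set


def eliminate(suspects: Set[str], exercised: Set[str], passed: bool) -> Set[str]:
    """Narrow the suspect list using the result of one test.

    A passing test clears every component it exercised. A failing test
    (assuming a single fault) means the fault is among the exercised ones.
    """
    return suspects - exercised if passed else suspects & exercised


suspects = {"Desktop 2", "LAN 1", "Router", "LAN 2", "Server", "Printer"}

# Test 1: Desktop 4 prints, which exercises LAN 2, the Server, and the Printer.
suspects = eliminate(suspects, {"LAN 2", "Server", "Printer"}, passed=True)

# Test 2: Desktop 1 prints, which also exercises LAN 1 and the Router.
suspects = eliminate(
    suspects, {"LAN 1", "Router", "LAN 2", "Server", "Printer"}, passed=True
)

print(sorted(suspects))  # ['Desktop 2']
```

Had the first test failed instead, the same helper would have kept only LAN 2, the Server, and the Printer on the list.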
Examine each layer of a system
Remember back in the second part of this series when I discussed the layered nature of a system? Now it’s time to peel away the top layer and start the process over again with layer 2. The components of layer 2 (Desktop 2 in this case) that impact printing are depicted in Figure C.
Figure C: Authentication, permissions, and the local spooler are also possible causes of the problem.
Next, we devise a test to half split these layer 2 components. The easiest test would be to have a different user log on to the computer and try to print to the same printer. If you are able to print using different credentials, the spooler is working. That leaves us with two possible causes: authentication and permissions. The next test is to have the original user log back on to Desktop 2, first confirming that they authenticate correctly and then confirming that their account has the necessary permissions on the printer.
The half-splitting process continues until you have identified the cause of the problem. The keys to successful half splitting are to stay on a single layer at a time and to devise tests that prove definitively whether each component is good or bad.
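One way to think about devising a test is to ask how evenly it divides the remaining suspects. The sketch below is a hypothetical helper, not part of the method as described. It picks the candidate test whose coverage comes closest to half of the suspect list and skips tests that cover all or none of the suspects, since those cannot narrow the list. The test names and coverage sets are assumptions based on the layer 2 example.

```python
from typing import Dict, Optional, Set


def best_split(suspects: Set[str], tests: Dict[str, Set[str]]) -> Optional[str]:
    """Return the name of the test that most evenly splits the suspects."""
    best_name: Optional[str] = None
    best_gap: Optional[float] = None
    for name, exercised in tests.items():
        hit = len(suspects & exercised)
        if hit == 0 or hit == len(suspects):
            continue  # covers none or all of the suspects, so it narrows nothing
        gap = abs(len(suspects) / 2 - hit)
        if best_gap is None or gap < best_gap:
            best_name, best_gap = name, gap
    return best_name


layer2 = {"Authentication", "Permissions", "Local spooler"}
candidate_tests = {
    # Assumed coverage: a different user's print job still goes through the
    # local spooler, but not through the original user's logon or permissions.
    "different user prints": {"Local spooler"},
    # Repeating the original failing print job exercises everything.
    "original user prints": {"Authentication", "Permissions", "Local spooler"},
}

print(best_split(layer2, candidate_tests))  # different user prints
```

With an odd number of suspects, no test splits the list exactly in half, so the closest available split is the best you can do.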
Remember that forgotten component
Half splitting can help you get to the bottom of a difficult problem, but sometimes this technique is not enough. How many times have you started troubleshooting a problem, only to prove that all of the components are working properly? You tell yourself, “I’ve tried A, B, C, and D, and they all work.” It happens, even to the best of super geeks. In almost every case, the cause is E, the forgotten component.
When you just can’t figure out what is wrong, start by taking a break. Walk away from the problem for a few minutes, clear your head, and start over from the beginning. You may have missed something along the way. Most of the time, a fresh pass will help you find the missing piece of the puzzle.
The end of the line
I hope you’ve enjoyed my secrets of a super geek series and will find the tips useful. I have been using these techniques for many years and have found them to be some of the most productive methods for troubleshooting computer problems. With a little practice, I know they can work just as well for you.
What do you think of this article and the “Secrets of a Super Geek” series? Have you been able to use Mike Sullivan’s methods when troubleshooting complex computer problems? Do you have a great tip for solving difficult problems? Share your comments and suggestions with your fellow TechRepublic members. Post a comment below or write to Mike Sullivan.