
Picking Up the Pieces: Diagnose the Blue Screen of Death with Alexander SPK

Learn about a utility that will help you decipher the Blue Screen Of Death.


Imagine that you are working on your system without a care in the world and then, out of nowhere, Windows crashes and you get the infamous blue screen of death (BSOD). What do you do? Your first choice is to try to figure out on your own what the BSOD means by deciphering everything it's displaying. Your other option is to get your hands on a copy of Alexander SPK for Windows.

What's Alexander SPK?
Alexander SPK for Windows is a product from Alexander LAN that is designed for the sole purpose of helping you diagnose the BSOD. There is also a NetWare version of the product that helps you diagnose abend information on a NetWare server.

Deciphering a blue screen on your own is not impossible. However, doing so takes a lot of time, patience, and knowledge of the internal workings of Windows.

Alexander SPK, on the other hand, not only gives you all of the detailed technical information that you could ever want, but also translates that information into clear English so that just about anyone with minimal computer skills can understand the cause of the crash.

Acquiring Alexander SPK for Windows
There are actually two different versions of Alexander SPK for Windows. There is a Standard Version, which is designed for a single PC or server and costs $39.

The other version is the Enterprise Edition. The Enterprise Edition ships with a management console that you can use to diagnose remote machines. The base price for the Enterprise Edition is $499. This price includes the console, one server agent, and five workstation agents. Additional server agents cost $149, and additional workstation agents cost $20. The company also offers site and enterprise licensing. Both versions work only with Windows 2000 and XP.
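To make the Enterprise Edition pricing concrete, here is a minimal sketch of the arithmetic using only the list prices quoted above. The function name and constants are my own for illustration, and the sketch ignores site and enterprise licensing, for which no figures are given.

```python
# List prices quoted in the article; site and enterprise licensing are not modeled.
BASE_PRICE = 499               # console + 1 server agent + 5 workstation agents
EXTRA_SERVER_AGENT = 149
EXTRA_WORKSTATION_AGENT = 20
INCLUDED_SERVERS = 1
INCLUDED_WORKSTATIONS = 5


def enterprise_cost(servers: int, workstations: int) -> int:
    """Estimate the Enterprise Edition list price for a given agent count."""
    if servers < 0 or workstations < 0:
        raise ValueError("agent counts cannot be negative")
    # Only agents beyond those bundled with the base package cost extra.
    extra_servers = max(0, servers - INCLUDED_SERVERS)
    extra_workstations = max(0, workstations - INCLUDED_WORKSTATIONS)
    return (BASE_PRICE
            + extra_servers * EXTRA_SERVER_AGENT
            + extra_workstations * EXTRA_WORKSTATION_AGENT)


if __name__ == "__main__":
    # 3 servers and 25 workstations: 499 + 2 * 149 + 20 * 20
    print(enterprise_cost(3, 25))
```

For example, a shop with three servers and 25 workstations would pay for two extra server agents and 20 extra workstation agents on top of the base price.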

Anatomy of a BSOD
Before I discuss the ways Alexander SPK for Windows helps you diagnose BSOD errors, I would like to take a moment and show you what a blue screen looks like and what it tells you. Space doesn't permit me to go into all of the intricacies of diagnosing a blue screen error, but, then again, that isn't really the point. Instead, I want to show you the types of information that are presented on the blue screen and then show you how Alexander SPK makes the information easier to understand.

A blue screen error is made up of as many as four different sections. Figure A shows a typical BSOD. As you look at the error, the first thing you will notice is the bug check section.

Figure A
This is a typical blue screen error.


The bug check section is the portion of the blue screen that contains the actual error message. The bug check section looks something like the code shown below:
*** Stop: 0x0000001E (0xF24A447A, 0X00000001, 0X0000000)
KMODE_EXCEPTION_NOT_HANDLED
*** Address F24A447A base at f24A0000, DateStamp 35825ef8d - wdmaud.sys

The main things that you need to be aware of within the bug check section are the error code and the error symbol. The error code is the hexadecimal number that immediately follows the word Stop. This number may be followed by up to four other numbers. The error symbol is the word that follows the error code. In the error that I've listed above, the error symbol is:
KMODE_EXCEPTION_NOT_HANDLED.

In some blue screen error messages, the error symbol is followed by a memory location and a file name. This information identifies the memory address and the file in which the error occurred. Whether you'll see this information depends on what type of Stop error has occurred. In some cases, you may only see the first line of the Stop error. This usually indicates a problem with the video services.
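If you want to pull these pieces apart programmatically, the following is a rough sketch of how the sample bug check text could be parsed. It is not how Alexander SPK works internally; the function name, the regular expressions, and the result keys are my own assumptions, and the patterns are tuned only to the three-line sample shown earlier. The offset is simply the crash address minus the module's base address.

```python
import re

# The sample bug check section from the article, copied as-is.
SAMPLE = """*** Stop: 0x0000001E (0xF24A447A, 0X00000001, 0X0000000)
KMODE_EXCEPTION_NOT_HANDLED
*** Address F24A447A base at f24A0000, DateStamp 35825ef8d - wdmaud.sys"""

# Hypothetical patterns written for this sample layout only.
STOP_RE = re.compile(r"\*\*\*\s*stop:\s*(0x[0-9a-f]+)\s*\(([^)]*)\)", re.IGNORECASE)
ADDRESS_RE = re.compile(
    r"\*\*\*\s*address\s+([0-9a-f]+)\s+base at\s+([0-9a-f]+),"
    r"\s*datestamp\s+([0-9a-f]+)\s*-\s*(\S+)",
    re.IGNORECASE,
)
SYMBOL_RE = re.compile(r"^[A-Z][A-Z0-9_]+$")


def parse_bug_check(text):
    """Extract the error code, parameters, symbol, and module from a bug check."""
    result = {"code": None, "parameters": [], "symbol": None,
              "module": None, "offset": None}
    for line in text.splitlines():
        line = line.strip()
        stop = STOP_RE.match(line)
        if stop:
            # The error code is the hex number right after the word Stop.
            result["code"] = int(stop.group(1), 16)
            result["parameters"] = [
                int(p.strip(), 16) for p in stop.group(2).split(",") if p.strip()
            ]
            continue
        addr = ADDRESS_RE.match(line)
        if addr:
            address = int(addr.group(1), 16)
            base = int(addr.group(2), 16)
            result["module"] = addr.group(4)
            # Distance of the faulting address from the start of the module.
            result["offset"] = address - base
            continue
        if SYMBOL_RE.match(line):
            result["symbol"] = line
    return result


if __name__ == "__main__":
    info = parse_bug_check(SAMPLE)
    print(hex(info["code"]), info["symbol"], info["module"], hex(info["offset"]))
```

A Stop error that shows only its first line would simply leave the symbol, module, and offset entries empty.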

The next part of the BSOD is the recommended action section. This section basically just tells the user to reboot and try it again and, if the problem still exists, to find some updated drivers.

Admittedly, this is some pretty generic information, but it does occasionally work. Sometimes solving a blue screen error is as simple as rebooting the system or freeing up some disk space. However, it usually isn't that simple. Most of the time, solving a blue screen error involves working through the stack listing.

The stack listing isn't shown in Figure A, but you can see an example of a different type of blue screen error in Figure B. In this figure, the stack is the large section at the middle of the screen that shows what drivers were running at the time of the crash. Normally, you would use the order of the items on the stack to help determine the cause of the problem.

Figure B
This is a different type of blue screen error.


The last section in Figure A is the debug port information. This information isn't really pertinent to discovering the cause of the crash; rather, it is used for extracting crash information.

Using Alexander SPK
I walked through the major parts of a blue screen error so that you could compare them to Alexander SPK. Alexander SPK has four main sections, each accessible via a corresponding button at the bottom of the Alexander SPK console. These sections include:
  • Crash report
  • Drivers list
  • Stack list
  • Analysis report

Crash report
By default, when you open Alexander SPK, the crash report is displayed. You can also access the crash report by clicking the Crash Report button. You can see an example of the crash report in Figure C.

Figure C
The crash report gives you some basic information about both the computer and the crash.


As you can see in the figure, Alexander SPK gives you some basic information about both the machine and the crash, as well as contact information for the support vendor. More importantly, though, you will notice that toward the middle portion of the screen in the left-hand column, Alexander SPK identifies the driver that caused the crash.

Technically, this is the driver that caused the crash, but it may not necessarily be the faulty driver. The reason for this is that often drivers have dependencies on lower-level drivers. One of these lower-level drivers could be to blame. It is also possible that some other driver or module has corrupted the system's memory, and that the driver that crashed the system just happened to reference the corrupted memory area. There are many reasons why the crashed driver may not be the driver that actually has the problem. Even so, as Alexander SPK states in the Report Summary section, this driver should be considered the primary suspect.

As you scroll down through the Report Summary, you will see the actual Windows stop message. In the case of Figure D, this message is PAGE_FAULT_IN_NON_PAGED_AREA. You will also notice that, unlike Windows, Alexander SPK goes on to explain what this error message means.

Figure D
Alexander SPK explains what the Windows stop message means.


The next section of the report summary tells you how long the machine was down and what version of the Windows kernel was running at the time of the crash. As you can see in Figure E, Alexander SPK goes on to explain that other modules could have been responsible for the crash. Although it is not displayed in the figure, Alexander SPK lists the modules that were active near the time of the crash. While the module listed in the Crashed Driver section should be your primary suspect, any of the modules listed in this section could have caused the crash.

Figure E
Alexander SPK details what other modules could have caused the crash.


Drivers
If you trace a problem to a specific application, then the information on the Drivers screen, shown in Figure F, will be extremely valuable to you. You can access this screen by clicking the Drivers button.

Figure F
This is a list of all the drivers that were loaded into memory at the time of the crash.


Most of the time, you won't want to use the drivers list as your initial diagnostic tool. There are usually just too many drivers loaded into the system for one specific driver to stand out as the cause of the problem. However, if you have already determined that a specific application is causing the problem, you can go through the list and locate drivers that may be associated with that application.

The drivers list gives you the file name, size, date, and time stamp. By having this information on hand, it will be easier for you to work with the offending product's technical support department to determine whether you have the correct file versions.
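The version check described above amounts to comparing each loaded driver's size and time stamp against the values the vendor says you should have. The sketch below illustrates that comparison. The DriverInfo record, the function name, and every file name and value in the example are hypothetical; Alexander SPK's actual export format is not described here, so you would need to enter or convert the data yourself.

```python
from typing import List, NamedTuple


class DriverInfo(NamedTuple):
    """One row of a drivers list: file name, size in bytes, date/time stamp."""
    name: str
    size: int
    timestamp: str


def find_mismatches(loaded: List[DriverInfo],
                    expected: List[DriverInfo]) -> List[str]:
    """Return names of loaded drivers whose size or time stamp differs
    from the vendor's expected values. Drivers the vendor did not list
    are skipped rather than flagged."""
    expected_by_name = {d.name.lower(): d for d in expected}
    mismatches = []
    for driver in loaded:
        reference = expected_by_name.get(driver.name.lower())
        if reference is None:
            continue
        if (driver.size, driver.timestamp) != (reference.size, reference.timestamp):
            mismatches.append(driver.name)
    return mismatches


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    loaded = [
        DriverInfo("example1.sys", 1000, "2002-01-01 10:00"),
        DriverInfo("example2.sys", 2000, "2002-02-01 10:00"),
    ]
    expected = [
        DriverInfo("EXAMPLE1.SYS", 1000, "2002-01-01 10:00"),
        DriverInfo("example2.sys", 2048, "2002-03-01 10:00"),
    ]
    print(find_mismatches(loaded, expected))
```

File names are compared without regard to case, since Windows file names are not case sensitive.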

Stack
One of the key things to check in your crash report is the stack information. You can access the stack information by clicking the Stack button. When you do, you will see all of the drivers that were on the system's stack at the time of the crash.

If you look at Figure G, you will notice that the modules Driver2, Driver1, and Driver7 were all loaded onto the stack. With the way Alexander SPK displays stack information, a module's position on the stack indicates the order in which it was initiated. For example, in this case, Driver2 would have been placed on the stack first, followed by Driver1 and then Driver7.

Figure G
These are the modules and drivers loaded on the stack at the time of the crash.


It might stand to reason that, in this example, Driver7 is the module causing the crash. While this may very well be the case, it is important to look at the stack as a whole. The information shown in the figure came from an actual crash diagnosis, but the module names have been changed for legal reasons. In the actual crash, though, Driver1, Driver2, and Driver7 all belonged to the same application. Therefore, although Driver7 might have triggered the crash, the entire stack may have been unstable because of other bad modules associated with the application.
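The reasoning above can be sketched in a few lines: take the stack in the order the modules were placed on it, note which module is the most recent, and then check how many of the modules belong to the same application. The owner mapping and the application name below are hypothetical stand-ins, since the real module names were changed.

```python
from collections import Counter

# Modules in the order they were placed on the stack (oldest first),
# using the placeholder names from the article.
stack_in_push_order = ["Driver2", "Driver1", "Driver7"]

# Hypothetical mapping of module to owning application.
module_owner = {
    "Driver1": "ExampleApp",
    "Driver2": "ExampleApp",
    "Driver7": "ExampleApp",
}


def summarize_stack(stack, owners):
    """Return the most recently placed module and a per-application tally."""
    most_recent = stack[-1] if stack else None
    tally = Counter(owners.get(module, "unknown") for module in stack)
    return most_recent, tally.most_common()


if __name__ == "__main__":
    suspect, by_application = summarize_stack(stack_in_push_order, module_owner)
    print("Most recent module:", suspect)
    print("Modules per application:", by_application)
```

When one application accounts for most of the stack, as it does here, it makes sense to treat the whole application as suspect rather than only the last module.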

Analysis report
The Analysis Report screen, shown in Figure H, is a sort of second summary screen. While the main crash report focuses on detailing the crash in plain English, the Analysis screen displays raw debugger information, similar to what is presented on the actual BSOD, with very little explanation inserted. You can access this screen by clicking the Analysis button.

Figure H
The Analysis Report screen is a reprint of all of the crash debug information.


Putting Alexander SPK to good use
As you can see, Alexander SPK makes it easy to see what was in memory at the time of the crash. Occasionally, Alexander SPK will point to a Windows module as the cause of the crash. Assuming that you are running Windows XP or Windows 2000 with the latest service pack, the chances of a Windows module actually being to blame are slim. Most of the time, if a Windows module is reported to be the problem, one of two things has happened.

One possibility is that a poorly written driver has taken down Windows by affecting the module that is reported to be the cause of the failure. The other possibility is that your system has a hardware problem. When hardware is to blame for a BSOD, memory is usually the offending component. Memory errors are fairly easy to spot because the machine will often behave erratically. When memory is to blame, the blue screen errors themselves are often inconsistent from one crash to the next.

If you find that an application or a bad driver is causing your problems, then you may find yourself caught in the blame game. This is where the application's manufacturer blames Microsoft or your PC's configuration for the problem and Microsoft blames the application's manufacturer.

However, my favorite thing about Alexander SPK is that it can put a stop to the blame game. Alexander SPK collects so much forensic evidence about the crash that it would be difficult for a company to argue with you once you have found a problem with its software.
