When the network crashes, the first thing most administrators want to do is panic. You don’t have to. On Sept. 19, Mark Russinovich, chief software architect for Winternals Software, discussed survival strategies for when the unthinkable happens. If you couldn’t join us then, enjoy the transcript, and we hope to see you at our next live Guild Meeting. You can find a schedule of Guild Meetings in your weekly TechProGuild Notes TechMail or on the Guild Meeting calendar.
Note: TechProGuild edits Guild Meeting transcripts for clarity.
Welcome to tonight’s discussion on WinNT/2000 system recovery
MODERATOR: Welcome to the Guild Meeting! Let me do a little housekeeping, and then we’ll get started. I want to remind everyone that we will be picking a winner tonight—that is, the person who asks the most questions and the most interesting questions will win the meeting. At the end of the month the person who has won the most meetings walks away with a motherboard and CPU courtesy of pogolinux.com.
Now, tonight’s speaker is Mark Russinovich from Winternals. Mark, before we start, could you tell us a little about yourself and what you do?
MARK RUSSINOVICH: Sure. I’m chief software architect (and cofounder) of Winternals Software, a company that specializes in Windows NT system software (www.winternals.com). I also run the Sysinternals Web site (www.sysinternals.com), am a contributing editor for Windows 2000 Magazine, and am coauthor of the just-released Inside Windows 2000, 3rd Edition from Microsoft Press.
MODERATOR: And here’s a shameless plug for Mark’s book. You can find it at http://www1.fatbrain.com/asp/bookinfo/bookinfo.asp?theisbn=0735610215.
Win2k Recovery Console
MARK RUSSINOVICH: Has anyone had a blue screen of death (BSOD) crash within the last week that they’d like to share? Or has anyone had the (unfortunate) opportunity to use the Win2k Recovery Console (RC) yet?
DANIEL.FORTIN: Nope.
JEFFREY.FEENEY: No, sir.
MHLEWISJR: Not yet. Win2k works fine, so far!
SPIDER1: What type of things would the Recovery Console be used for? I’ve had blue screens where I’ve run the CD and used the second repair option.
MARK RUSSINOVICH: The Recovery Console is essentially a command-line interface that you can boot into from the Win2k CD so that you can perform low-level repair, like enabling and disabling services, fixing MBRs, and updating files.
In fact, I’d like to take credit for the Recovery Console. In 1998 we (Winternals) released “ERD Commander,” a utility that lets you create a modified set of NT 4 boot floppies so that you can boot into a command-line environment like the Recovery Console. Microsoft liked the idea so much that they “borrowed” it. The RC works by booting a bare-bones NT kernel on top of which a command prompt runs. Thus, all the drives of the system are accessible with mostly familiar commands.
DANIEL.FORTIN: So how does it work? What tools are available inside that shell?
MARK RUSSINOVICH: The RC includes the basic file system command set you’d expect (CHDIR, RMDIR, DEL, COPY, etc.). It includes a few other commands like FIXMBR for repairing MBRs and DISABLE for disabling a driver or service that might be preventing the system from booting.
NT BSODs
MARK RUSSINOVICH: Any recent NT 4 scenarios where you ended up with an unbootable system? (If no one in the room answers positively to this, then you’re all very lucky!)
MHLEWISJR: Sure…a user could not get Real Player loaded correctly.
JEFFREY.FEENEY: Not yet, but we have recovery disks just in case.
AJFRANCIS35: I had a customer with a blue screen right after NTLDR. I don’t remember the exact error. (It was two months ago.) I had to install NT into a separate folder and then run ERD from the C:\Winnt\Repair folder to recover his registry.
MHLEWISJR: I have only two machines on an NT server.
Win2k environments
MARK RUSSINOVICH: Who hasn’t moved to Win2k in their environment?
JEFFREY.FEENEY: We plan to make the move in the first quarter of 2001.
MARTINCHURCH: I have yet to make the move to Win2k. Currently I have a mixed bag of Win95, Win3.1x, OS/2, and Win NT 4.0.
Verifier to the rescue
MODERATOR: Mark, I had one TechProGuild member who couldn’t make it to the meeting but wanted to know what kinds of situations might cause the BSOD. In other words, are there bugs to watch for that affect recovery?
MARTINCHURCH: Most of my BSODs come from print drivers and Iomega Zip drives.
MARK RUSSINOVICH: The most common cause of a blue screen is a third-party driver that suddenly exhibits a latent bug that crashes the system.
JEFFREY.FEENEY: Wouldn’t there be a pilot environment to catch that?
MARK RUSSINOVICH: One of the problems of the NT/Win2k device driver model is that it is very complex. Tiny bugs can slip through stress testing and debugging into the field pretty easily…that is, until the Win2k Driver Verifier came along. The Verifier is a tool that intercepts driver functions and validates them against common programming mistakes.
DANIEL.FORTIN: Where do we access this Driver Verifier?
MARK RUSSINOVICH: The Verifier is installed by default. Just type ‘verifier’ at a command prompt or in the Run dialog box. The Verifier is mostly useful for developers, but if you suspect a driver of causing your system to crash you can selectively verify it. When the Verifier catches a driver performing an illegal operation, the BSOD will flag that driver as being buggy.
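As a rough illustration of selectively verifying one suspect driver, the Python sketch below builds a `verifier` command line. The `/flags` and `/driver` switches and the option bit values (special pool 0x01, forced IRQL checking 0x02, low-resources simulation 0x04, pool tracking 0x08, I/O verification 0x10) follow Microsoft’s Driver Verifier documentation, but treat them as assumptions for any particular build and check `verifier /?` on your own system first. The driver name `suspect.sys` is hypothetical, and the sketch only assembles the arguments; it does not run anything.

```python
# Driver Verifier option bits (assumed from Microsoft's documentation; confirm with `verifier /?`).
VERIFIER_FLAGS = {
    "special_pool": 0x01,
    "irql_checking": 0x02,
    "low_resources": 0x04,
    "pool_tracking": 0x08,
    "io_verification": 0x10,
}


def verifier_args(driver: str, options: list[str]) -> list[str]:
    """Build a verifier.exe argument list that verifies a single driver with the given options."""
    mask = 0
    for name in options:
        mask |= VERIFIER_FLAGS[name]  # raises KeyError for an unknown option name
    return ["verifier", "/flags", hex(mask), "/driver", driver]


if __name__ == "__main__":
    # Hypothetical suspect driver; enabling special pool raises its memory use.
    print(" ".join(verifier_args("suspect.sys", ["special_pool", "irql_checking"])))
```

Verifying only the suspect driver, rather than every driver, keeps the overhead of checks such as special pool confined to the component you actually distrust.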
Because of the Verifier, Win2k drivers are much more reliable than NT 4 drivers. In fact, to get a driver “signed” by Microsoft’s Windows Hardware Quality Labs (WHQL), a driver must pass Verifier testing.
Other reliability benefits of Win2k (not to sound like a Win2k commercial) include the RC and “Safe Boot” (Safe Mode), which boots the system with only a minimal set of drivers.
MHLEWISJR: Is that why drivers from my older machines are giving me fits with Win2k… Verifier is not letting them load?
MARK RUSSINOVICH: No, Verifier is not enabled unless you enable it. If by “fits” you mean that the system crashes, that’s just a bug in a driver rearing its ugly head.
MHLEWISJR: Understood!
SPIDER1: Turning on Verifier uses a fair amount of drive space, does it not?
MARK RUSSINOVICH: No, Verifier doesn’t use any disk space. If you enable “special pool checking,” then the memory requirements of the drivers you verify will go up.
Recovery Techniques
SPIDER1: My company uses image CDs on several thousand workstations. They’re mostly NT, but we’re gradually moving to Win2k. Are there any decent recovery techniques when the install CDs are not onsite? Presently we rely on fast reimages, but we frequently lose some data even though we teach users to save to the server.
MARK RUSSINOVICH: You can install the RC onto the hard drive. If you’re in an NT 4 environment or if you want something a little more powerful than the RC, then our (Winternals’) ERD Commander 2000 fits the bill.
SPIDER1: I assume you need to be able to get to the boot options? Which services can’t you disable and still boot, and where would we find a list of services that can be disabled?
MARK RUSSINOVICH: There’s no definitive list of which drivers are critical to booting. A basic guideline is that drivers marked as “boot start” (they have a Start value of 0 in their Registry key) are needed to boot. However, some drivers marked as “system start” (with a Start value of 1) might also be needed.
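The Start-value guideline can be written down as a small Python sketch. The mapping of values 0 through 4 to boot, system, automatic, manual (demand), and disabled is the standard one for driver and service keys under HKLM\SYSTEM\CurrentControlSet\Services; the sample entries are hypothetical, and, as noted above, a Start value of 1 only marks a driver as possibly needed.

```python
# Standard meanings of the Start value under HKLM\SYSTEM\CurrentControlSet\Services\<name>
START_TYPES = {
    0: "boot",       # loaded by the boot loader (NTLDR)
    1: "system",     # loaded during kernel initialization
    2: "automatic",  # started by the Service Control Manager at startup
    3: "manual",     # demand start
    4: "disabled",
}


def classify_for_boot(services: dict[str, int]) -> dict[str, list[str]]:
    """Split drivers/services into needed, maybe-needed, and other, based only on Start value."""
    result = {"needed": [], "maybe_needed": [], "other": []}
    for name, start in sorted(services.items()):
        if start == 0:
            result["needed"].append(name)        # boot start: required to boot
        elif start == 1:
            result["maybe_needed"].append(name)  # system start: may be required
        else:
            result["other"].append(name)
    return result


if __name__ == "__main__":
    # Hypothetical snapshot of a few Services subkeys and their Start values
    sample = {"atapi": 0, "Disk": 0, "Cdrom": 1, "Parport": 3, "ZipDrv": 2}
    for group, names in classify_for_boot(sample).items():
        print(group, names)
```

This is only a first cut: it tells you which drivers are risky to disable, not which one is causing a crash.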
MARK RUSSINOVICH: The safe boot I mentioned includes driver “groups” and drivers by name that Win2k considers critical to booting. You can see that list on a Win2k system under HKLM\SYSTEM\CurrentControlSet\Control\SafeBoot.
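A simplified model of how that list is consulted: each subkey under SafeBoot\Minimal (or SafeBoot\Network, for Safe Mode with Networking) names a driver group, a driver, or a service, and a driver loads in safe mode if its own name or its load-order group is listed. This Python sketch assumes exactly that rule and uses a short, hypothetical excerpt of the key (full path HKLM\SYSTEM\CurrentControlSet\Control\SafeBoot); the real list is much longer, and the loader’s actual matching may involve details not modeled here.

```python
from typing import Optional

# Hypothetical excerpt of HKLM\SYSTEM\CurrentControlSet\Control\SafeBoot\Minimal:
# subkey name -> the subkey's default value ("Driver Group", "Driver", or "Service").
MINIMAL = {
    "Boot Bus Extender": "Driver Group",
    "Primary disk": "Driver Group",
    "SCSI miniport": "Driver Group",
    "vga.sys": "Driver",
    "EventLog": "Service",
}


def loads_in_safe_mode(name: str, group: Optional[str], safeboot: dict[str, str]) -> bool:
    """Simplified rule: a driver loads if its name (with or without .sys) or its group is listed."""
    groups = {key.lower() for key, kind in safeboot.items() if kind == "Driver Group"}
    names = {key.lower() for key, kind in safeboot.items() if kind != "Driver Group"}
    lowered = name.lower()
    if lowered in names or f"{lowered}.sys" in names:
        return True
    return group is not None and group.lower() in groups


if __name__ == "__main__":
    print(loads_in_safe_mode("atapi", "SCSI miniport", MINIMAL))  # listed by group
    print(loads_in_safe_mode("VGA", None, MINIMAL))               # listed as vga.sys
    print(loads_in_safe_mode("ZipDrv", None, MINIMAL))            # not listed
```

One practical consequence of a rule like this: a buggy third-party driver that belongs to a listed group can still load, and crash, in safe mode, which is when the RC becomes the next step.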
SPIDER1: Thanks. I’ll look at that key.
FIXMBR
MARTINCHURCH: I am interested in the FIXMBR command. Is this available in the NT 4 environment?
MARK RUSSINOVICH: No, FIXMBR is not available for NT 4. In any case, it is somewhat limited in that it won’t repair a partition table. It just fixes the MBR boot code if it’s damaged.
Disabling Services
DANIEL.FORTIN: Suppose that we don’t have a driver problem. What is the process to turn off services?
MARK RUSSINOVICH: The RC and ERD Commander 2000 have commands for disabling services that are similar to the ones for disabling drivers. If the system boots, however, you can just use the Service Control Manager to disable a service.
DANIEL.FORTIN: If the system crashes every time, what can we do?
MARK RUSSINOVICH: Unfortunately, there are no automated tools for determining what causes a crash. One approach is to disable services or drivers you are suspicious of in turn.
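Disabling suspects in turn is a simple linear search. In this Python sketch, `boots_with` is a hypothetical stand-in for the manual step of disabling one driver or service from the RC or ERD Commander, rebooting, and noting whether the crash recurs. Because every call is a reboot, it pays to order the suspects by likelihood, for example the most recently installed driver first.

```python
from typing import Callable, Optional


def find_culprit(suspects: list[str], boots_with: Callable[[str], bool]) -> Optional[str]:
    """Disable each suspect on its own, one at a time; return the first whose removal lets the system boot."""
    for name in suspects:
        # Each call stands for: disable `name`, reboot, check for a BSOD,
        # then re-enable `name` before trying the next suspect.
        if boots_with(name):
            return name
    return None  # no single suspect explains the crash; widen the list


if __name__ == "__main__":
    # Simulated system where the hypothetical "zipdrv" driver is the culprit
    print(find_culprit(["realaudio", "zipdrv", "printdrv"], lambda n: n == "zipdrv"))
```

Re-enabling each suspect before moving on keeps the test honest: only one component is ever missing, so a successful boot points at that component alone.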
DANIEL.FORTIN: What does the command look like?
MARK RUSSINOVICH: I don’t remember the exact command in the RC, but it’s similar to ERD Commander 2000’s: “service <badservice> disable” or “driver <baddriver> disable”.
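In the RC, DISABLE works by setting the driver’s or service’s Start value to 4 (disabled) and printing the old start type so that you can restore it later with ENABLE. The Python sketch below models that bookkeeping with a plain dictionary as a hypothetical stand-in for the Services key; the function names are illustrative and are not the RC’s or ERD Commander’s actual command syntax.

```python
SERVICE_DISABLED = 4  # Start value meaning "do not load"


def disable(services: dict[str, int], name: str) -> int:
    """Set a driver/service's Start value to 4 and return the old value for later restoration."""
    if name not in services:
        raise KeyError(f"no such service or driver: {name}")
    old = services[name]
    services[name] = SERVICE_DISABLED
    return old


def enable(services: dict[str, int], name: str, start: int) -> None:
    """Restore a previously recorded Start value (e.g., after installing a fixed driver)."""
    services[name] = start


if __name__ == "__main__":
    services = {"ZipDrv": 1, "Disk": 0}  # hypothetical Start values
    old = disable(services, "ZipDrv")
    print(old, services["ZipDrv"])
    enable(services, "ZipDrv", old)
```

Writing down the old value matters: re-enabling a boot- or system-start driver as, say, manual start could leave the system just as unbootable as the crash did.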
DANIEL.FORTIN: Thanks.
Dumpexam
MARK RUSSINOVICH: Has anyone tried using Dumpexam to get information from a crash dump?
MHLEWISJR: I thought Dumpexam was just like NT’s crash dump file? Isn’t it all gobbledygook?
MARK RUSSINOVICH: Dumpexam basically runs a bunch of kernel debugger commands against the memory image, including ones that dump the list of active processes, memory usage, the stack trace of the current thread at the time of the dump, the list of loaded drivers, and the active process at the time of the dump. Often, there are clues to be found in that information.
MARTINCHURCH: How do we do that?
MARK RUSSINOVICH: You first configure your system to produce a crash dump when it BSODs (obviously). Install the symbols for the installation (specific to the Service Pack if appropriate), and then run Dumpexam, pointing it at the symbols and the crash dump file. Dumpexam is on the NT 4 CD under the support tools directory, and it’s on the Win2k support tools CD.
MARTINCHURCH: I’ll have to examine these files in more detail. What is the best procedure for using them?
MARK RUSSINOVICH: Do just what I said earlier: 1. Enable crash dumps (on Win2k you want kernel-memory dumps or full dumps). 2. Install the symbols for the installation. 3. Run “dumpexam -Z dumpfile -y <path to symbols>”. The output goes to the screen, so you typically redirect it to a file for later examination.
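Step 3 can be scripted. This Python sketch passes the switches exactly as quoted in the transcript (`-Z` for the dump file, `-y` for the symbol path); they are not verified here, so confirm them against Dumpexam’s own usage message. The paths are hypothetical. Rather than going through the shell, the sketch sends Dumpexam’s output straight into a text file for later review.

```python
import subprocess
from pathlib import Path


def dumpexam_args(dump_file: str, symbol_path: str) -> list[str]:
    """Command line as quoted in the transcript: dumpexam -Z <dumpfile> -y <symbols>."""
    return ["dumpexam", "-Z", dump_file, "-y", symbol_path]


def run_dumpexam(dump_file: str, symbol_path: str, report: str) -> int:
    """Run dumpexam, saving everything it prints (stdout and stderr) to `report`; return its exit code."""
    with Path(report).open("w") as out:
        result = subprocess.run(
            dumpexam_args(dump_file, symbol_path),
            stdout=out,
            stderr=subprocess.STDOUT,
        )
    return result.returncode


if __name__ == "__main__":
    # Hypothetical locations of the dump file and a local symbol tree
    print(dumpexam_args(r"C:\WINNT\MEMORY.DMP", r"C:\Symbols"))
```

Keeping the report in a file also makes it easy to compare the loaded-driver lists from several crashes side by side.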
MARK RUSSINOVICH: Another trick you might find useful is to download the OEM Support Tools from Microsoft’s Web site. They have a tool there called the “crash analyzer.” It goes one step further than Dumpexam to use heuristics when examining the dump file and point out suspicious drivers that might have performed illegal operations before the crash.
MHLEWISJR: That’s a great tip. I will use it tomorrow!
Avoiding a reimage using RC
SPIDER1: Recently I installed an Iomega Zip drive on a Win2k system, and it gave me the BSOD. I was unable to boot into any mode. My answer was to grab the image CD and reimage the system. I then researched the driver I was supplied and found a newer one, and installing that one worked. What could I have done to avoid the reimage using this Recovery Console?
MARK RUSSINOVICH: If you knew the name of the driver or if it had included a description of itself in its Registry information, then you could have used the RC to disable it, get the system to boot, and then install a newer version.
SPIDER1: So a little research on the filenames of drivers would be a good thing to do before I install them I guess.
MARK RUSSINOVICH: Most drivers these days include a description that the RC and ERD Commander 2000 print out when you list the drivers installed on the system, e.g. “ATAPI—The IDE hard disk port/miniport driver.”
MARTINCHURCH: I found with Iomega you could get around the problem by turning off the device then booting up.
MARK RUSSINOVICH: Ahh, trick the driver!
SPIDER1: Thanks. I really need to “bone” up on this. You’ve been a big help. I’m getting tired of grabbing image CDs to resolve every problem with BSODs.
MARK RUSSINOVICH: Glad you found the information useful. Of course, things you should try before falling back on the RC are: 1. Try booting to “last known good.” 2. Try booting into Safe mode. 3. Go to the RC.
Installing RC in a Pinch
DANIEL.FORTIN: If the RC is not installed on a crashed system, is there a way to install it to recover?
MARK RUSSINOVICH: You can boot the RC directly from the Win2k CD. Alternatively, you can create a bootable ERD Commander CD-ROM (which gives you the opportunity to include installation or driver files specific to your environment). Of course, everyone should keep up-to-date backups with a backup program that supports disaster recovery (restoring to a system that won’t boot), just in case a crash causes data loss.
Another tip (and self-plug): if you have an unbootable system that you have important data on and you want to salvage it to another system before a reinstall, check out Winternals NTRecover (salvage across a serial line) and Remote Recover (salvage across a TCP/IP network).
Resources
MARK RUSSINOVICH: Closing pointers: check out my past Windows 2000 Magazine series on Win2K reliability enhancements, and look for an upcoming article on NT/Win2K crash-dump analysis.
SPIDER1: I’ll be checking that out. It’s a great magazine. I also have the archive CD.
MARTINCHURCH: When will these articles be coming out?
MARK RUSSINOVICH: The reliability articles came out about a year ago. The crash-dump article will be out in the December issue.
MARTINCHURCH: I will look forward to it.
SPIDER1: I’ll put in an unsolicited plug for the Win2k magazine and the Archive CD: it is a great combination and an excellent resource.
DANIEL.FORTIN: Is there a place where we can find reports on such problems with drivers and get solutions or advice?
MARK RUSSINOVICH: Whenever you get a crash, do a search in the Microsoft KB on the crash code you see. It often has articles reporting buggy drivers with pointers to vendor updates.
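For that KB search, the useful pieces of a blue screen are the stop code and its symbolic name. This Python sketch pulls them out of a transcribed STOP line; the table holds only a few well-known Windows bug check codes, and the sample screen text is made up for illustration.

```python
import re
from typing import Optional

# A few well-known bug check codes and their symbolic names
BUGCHECK_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x1E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x7B: "INACCESSIBLE_BOOT_DEVICE",
    0xC4: "DRIVER_VERIFIER_DETECTED_VIOLATION",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}

STOP_RE = re.compile(r"STOP:\s*0x([0-9A-Fa-f]{1,8})")


def kb_search_terms(screen_text: str) -> Optional[str]:
    """Return a search string like 'stop 0x0000000A IRQL_NOT_LESS_OR_EQUAL', or None if no STOP code is found."""
    match = STOP_RE.search(screen_text)
    if match is None:
        return None
    code = int(match.group(1), 16)
    terms = f"stop 0x{code:08X}"  # normalize to the eight-digit form shown on the blue screen
    name = BUGCHECK_NAMES.get(code)
    return f"{terms} {name}" if name else terms


if __name__ == "__main__":
    screen = "*** STOP: 0x0000000A (0x00000000,0x00000002,0x00000000,0x80123456)"
    print(kb_search_terms(screen))
```

Code 0xC4 is the one to watch for after enabling Driver Verifier, since it means the Verifier itself caught the driver misbehaving.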
TechProGuild Information
MODERATOR: I’m afraid we have to wrap things up now. I’d like to thank our speaker, Mark Russinovich. I’m going to give Mark another shameless author plug: You can get his book Windows NT Internals Revealed at http://www1.fatbrain.com/asp/bookinfo/bookinfo.asp?theisbn=0764580329
And tonight’s winner is…It was close; we had so many new faces. But, Spider1, you’re tonight’s winner! That means if you win the most meetings this month you’ll walk away with that grand prize motherboard and CPU courtesy of pogolinux.com. Be sure to send your contact info (name, address, and phone number) to jharvey@techrepublic.com.
DANIEL.FORTIN: Where can we find the stats about current month finalists and scores?
MODERATOR: Sign up for the Guild Note. It comes out on Tuesdays and Thursdays with updates on articles and Guild Meetings. Check under my account, daniel.fortin. I think you can sign up there.
SPIDER1: Is there a way to save the transcript of these sessions?
MODERATOR: Yes, Spider1. We post the transcripts, edited down of course, about a week after each meeting.
SPIDER1: Cool. This is my first trip in here; I just signed up today.
MODERATOR: Welcome. If you send me your e-mail address, I’ll notify you when the transcript goes up. Send it to jharvey@techrepublic.com.
SPIDER1: Thanks.
MARK RUSSINOVICH: Thanks, everyone. Congratulations, Spider1!
SPIDER1: Cool. Thanks a lot!
HEADRAT: Thanks for all the great tips.
MODERATOR: Mark, I can’t thank you enough for speaking tonight. I hope you’ll join us again soon.
MARK RUSSINOVICH: My pleasure.
MODERATOR: Thanks everyone. Good night.