When a computer crashes, it's rare that you have to make only a simple fix to get things up and running again. Here's an anecdote from Brien Posey about the trials and tribulations he went through when recovering from a particularly nasty Windows failure.
While it's true that many companies have very intricate disaster recovery plans, I've seen more that are not adequately prepared for disaster. Many organizations using RAID arrays and redundant hardware have a false sense of security. These types of hardware work great for keeping the server online if a hard drive goes out. But if a disk controller fails, every drive in the array could become corrupted.
If that happened, you could always restore from a backup, but there's a problem with that too. Far too many administrators back up a server's data but not the Windows operating system. The idea is that if a failure occurs, the administrator can just reload Windows, restore the data, and be back in business. As with most things in the IT world, it's not always that easy.
Why can't I just reinstall Windows and then restore?
The main problem with this technique is the amount of time it takes to reload Windows and reconfigure the server. Restoring a full system backup is much faster than reloading Windows, reinstalling the antivirus software, reconfiguring the network connections, reapplying the service packs and hotfixes, and reloading the various server applications.
A real-life tale
A couple of weeks ago, I got a frantic phone call from a friend who owns a small construction company. Her hard drive had crashed so badly that Windows wouldn't boot even in safe mode. She was calling to see if I could help her get the server back online. I told her that I would come right over. The last thing she said before we hung up was to "not worry" because she had a backup.
When I arrived at her office, I found three things that made what should have been an easy fix fairly difficult. First, she had backups only of the data, not of the operating system. Second, she didn't have a clue where her Windows installation CD was located, not to mention any of the other application installation CDs. Third, and this is the biggie, she was using the server as her personal workstation.
Any competent IT professional knows that you should never use a server as a personal workstation. However, many small business owners aren't IT professionals, and they run with very tight budgets. Sometimes it's just more economically feasible to use a server console as a workstation than to drop big bucks on another computer. I myself have done this, and I've visited dozens of other companies where this is common practice. However, I certainly don't condone using a production server as your own personal workstation.
The more I spoke to my friend, the more I realized how minimal her backups actually were. Her backups basically included the data files from Quicken and Turbo Tax, a few Word documents, and Excel spreadsheets. That was the extent of the backup, though. Since she was using the server as her personal workstation, her e-mail, personal documents, and contacts list were all on the server and had not been backed up.
If this had been some huge company, losing one person's e-mail, documents, and contacts probably wouldn't be a major deal. Losing the documents could even be blamed on the user for not saving them to a location that gets backed up each night.
The problem here was that my friend and her husband actually owned the business and had only a couple of employees. Losing her data would have been catastrophic to the business. The sad part was that she had paid some "IT professional" to set everything up and show her how everything worked. The guy never told her anything about making normal backups. He just told her to make sure that she saved a copy of any work onto a Zip disk.
I felt really bad for her, and I knew that I had to do everything in my power to try to salvage as much of her data as possible. I had hoped that I could fix the problem by reinstalling Windows. Sure, my friend couldn't find her Windows CD, but her product key was on a sticker on the side of the server. I had a Windows CD of my own, so it seemed like the thing to do.
When I attempted to reinstall Windows, however, the Setup program didn't even recognize the existing version of Windows as a valid installation that could be repaired. A simple reinstallation was going to be out of the question. In retrospect, even trying to reinstall Windows was probably a bad idea since I risked further corrupting the hard drive by writing the Windows installation files to it.
Because I couldn't reinstall Windows, I decided to go to the computer store, get a new hard drive, and try to transfer her files to it. When corruption of that extent occurs, I always feel it's better to spend a little cash on a new drive than to risk reloading everything onto a potentially unreliable drive.
On the way back from the computer store, we stopped by my house and picked up one of my computers to assist us in the recovery. I removed the corrupted hard drive from my friend's computer and set it up as a slave drive in my computer. I had intended to use NTBackup to back up all of the files on her hard drive, then save the backup to the known good hard drive in my machine. The problem was that NTBackup couldn't successfully create a backup because the volume I was backing up was corrupted.
As I was trying to decide what to do next, I noticed that the new hard drive came with its own backup program. I installed the backup program onto my server and tried the backup again. The program skipped any invalid files rather than crashing; this enabled me to back up the corrupted hard drive. Sure, some files were skipped, but I knew that these files were so badly damaged that I didn't have any hope of salvaging them anyway.
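The skip-and-continue behavior that made the second backup program useful can be sketched in a few lines of Python. This is only an illustration of the idea, not how that vendor's tool worked; the function name and the two directory arguments are assumptions made for the example.

```python
import os
import shutil


def copy_tree_skipping_errors(src_root, dst_root):
    """Copy every readable file under src_root; return (copied, skipped) lists."""
    copied, skipped = [], []

    def note_dir_error(err):
        # os.walk calls this when a directory itself cannot be listed.
        skipped.append((err.filename, str(err)))

    for folder, _subdirs, files in os.walk(src_root, onerror=note_dir_error):
        rel = os.path.relpath(folder, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(folder, name)
            dst = os.path.join(target_dir, name)
            try:
                shutil.copy2(src, dst)  # copies the data plus timestamps
                copied.append(src)
            except OSError as err:
                # A damaged file raises an I/O error; record it and keep going.
                skipped.append((src, str(err)))
    return copied, skipped
```

The skipped list doubles as a report of what could not be salvaged, which is worth keeping so the owner knows which files are gone.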
At that point, I was still hoping that I could somehow repair Windows, and that perhaps only a few key system files were damaged. I didn't trust the boot sector and MBR from the old hard drive, so I wanted to refresh them. To accomplish this, I installed the new hard drive into my friend's server and installed a clean copy of Windows onto it. When I confirmed that the new copy of Windows was functional, I made a copy of all the Windows system files. I then restored the backup that I had made earlier to the new hard drive. Finally, I overwrote the Windows system files with the known good files.
Since I started with a fresh copy of Windows, I knew that the system would be bootable. I restored my friend's copy of Windows over the top of the known good copy so that I could restore the registry and any system-specific INI files or DLL files. Since I had selectively copied most of the Windows system files earlier, I used these known good copies to overwrite the existing copies in hopes of replacing some of the damaged files and being able to boot Windows. In this particular case, the technique didn't work, but it has worked for me a few times in the past.
By this point, I knew there was no way I was going to be able to salvage Windows as a whole. I was going to have to try salvaging individual files instead. I reformatted the new hard drive and installed a fresh copy of Windows. I then created a folder that I called old_server and restored the backup of the corrupted hard drive to it. Now that I had a functional copy of Windows, the trick was to save anything that I could from the old_server folder and make it work with the new Windows installation.
One of the things that made this particular reload interesting was that the machine I was reloading Windows onto was a domain controller. Normally, this would not be a big deal because you could just install Windows and then use DCPROMO to promote the server into the existing domain. In this case, that simply wasn't possible because the server was the network's only domain controller and the only DNS server. This meant that I had to install Windows as the first domain controller in a brand new domain. I set up the domain using the same name as it had previously used. Even so, Windows considered this a completely different domain. The domain's SIDs were different from the originals.
This meant that all user accounts had to be set back up, and all of the workstations had to be rejoined to the domain. The problem is that the workstations already thought they belonged to a domain with that same name. I therefore had to move each workstation into a workgroup and then rejoin it to the domain.
Keep in mind that the domain now had a different SID, as did the newly created user accounts. Although a user entered the same username to log in to a domain with the same name as the previous domain, Windows saw the login as a completely different user account.
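A toy model may make the same-name, different-SID problem easier to picture. The sketch below is not how Windows generates identifiers internally; the Domain class, the EXAMPLE domain name, the user name pat, and the starting RID of 1000 are all made up for illustration. The point it shows is that identity follows the SID, which is generated fresh and never derived from the name.

```python
import secrets


def new_domain_sid():
    """Return a made-up domain SID: S-1-5-21 plus three random 32-bit values."""
    parts = [secrets.randbits(32) for _ in range(3)]
    return "S-1-5-21-" + "-".join(str(p) for p in parts)


class Domain:
    def __init__(self, name):
        self.name = name
        self.sid = new_domain_sid()  # generated fresh, never derived from the name
        self._next_rid = 1000        # illustrative starting RID for new accounts
        self.accounts = {}

    def add_user(self, username):
        # An account SID is the domain SID plus a per-account relative ID.
        sid = f"{self.sid}-{self._next_rid}"
        self._next_rid += 1
        self.accounts[username] = sid
        return sid


old = Domain("EXAMPLE")   # the domain that was lost
new = Domain("EXAMPLE")   # the rebuilt domain with the same name
old_user = old.add_user("pat")
new_user = new.add_user("pat")

# Permissions on the old profile folder list the old account's SID only.
profile_acl = {old_user}
print(old.name == new.name)      # same name
print(new_user in profile_acl)   # but the new account is not on the list
```

Because the old profile folder's permissions refer to the old SID, the new account with the same user name is treated as a stranger.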
Windows thought this was the first time the user had logged in and therefore provided the default desktop. Users no longer had access to their e-mail, desktop, icons, favorites, etc., even though they were using their own computer.
Fortunately, the users' documents and settings had not been lost. They were simply in a directory that users didn't have permission to access. To get around this problem, I had to log in to each machine using the local Administrator account. Each user's profile and the settings that it contained were stored in a folder named \Documents and Settings\Username. Normally, a user's folder would be named username.domain. Because Windows had detected two different user accounts with the same name, however, it created a new profile directory for the user named username.domain.000.
It's possible to just delete the username.domain.000 folder and then rename the username.domain folder to username.domain.000, but doing so sometimes causes problems. A better solution is to grant the user access to the username.domain folder and then edit the Windows registry to point the user's profile to that folder.
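As a rough sketch of the registry step: on Windows versions of this era, each account's profile path is normally held in a ProfileImagePath value under the ProfileList key, in a subkey named after the account's SID. The article doesn't say exactly which edit was made, so treat the helper below as one plausible way to script it. The SID and folder path are placeholders, and the function only builds a reg.exe command line; it doesn't run it.

```python
PROFILE_LIST = r"HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList"


def profile_repoint_command(user_sid, profile_dir):
    """Build a reg.exe command that sets ProfileImagePath for one account SID."""
    key = f"{PROFILE_LIST}\\{user_sid}"
    return (
        f'reg add "{key}" /v ProfileImagePath '
        f'/t REG_EXPAND_SZ /d "{profile_dir}" /f'
    )


# Placeholder SID and path; look up the real SID on the workstation first.
print(profile_repoint_command(
    "S-1-5-21-1111111111-2222222222-3333333333-1105",
    r"C:\Documents and Settings\pat.EXAMPLE",
))
```

The user still needs permissions on the old folder before the change takes effect at the next logon, and exporting the key beforehand gives you a way back if the edit goes wrong.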
Back on the server, I had to do something similar for my friend's profile. Luckily, her machine had a new Windows installation, so no \Documents and Settings\Username.domain folder existed. Therefore, Windows Setup created the username.domain folder and linked her profile to it. All I had to do was copy the contents of her old username.domain folder (found in the backup directory) to her new username.domain folder. Although a few of her files were corrupt, the copying process went pretty smoothly.
In case you're wondering, relinking profiles took care of getting everyone their e-mail back. In that particular office, everyone was using Outlook Express. By default, Outlook Express messages are stored in DBX files beneath the \Documents and Settings\Username.domain\Local Settings\Application Data folder. Contacts are stored in a WAB file elsewhere in the same profile. If the office had been using Outlook rather than Outlook Express, I could have recovered the mail in a similar manner. The difference is that Outlook stores messages, and by default contacts, in a PST file; older setups may also keep a Personal Address Book in a PAB file.
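When a profile has to be searched by hand, a short script can list the mail-related files so that none are missed. This is a minimal sketch that assumes matching on the file extensions named above is good enough; the function name is made up for the example.

```python
import os

MAIL_EXTENSIONS = {".dbx", ".wab", ".pst", ".pab"}


def find_mail_files(profile_root):
    """Return sorted paths under profile_root whose extension marks mail data."""
    found = []
    for folder, _subdirs, files in os.walk(profile_root):
        for name in files:
            # Compare in lowercase because Windows file names are case-insensitive.
            if os.path.splitext(name)[1].lower() in MAIL_EXTENSIONS:
                found.append(os.path.join(folder, name))
    return sorted(found)
```

Pointing it at a restored profile folder, such as the copy under old_server, gives a checklist of mail stores to copy across.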
Although my friend's Microsoft Office documents and her e-mail and contacts all existed within her Documents And Settings folder, she had other applications running on the server that stored data to other locations. I knew that I couldn't just restore the backup I had made because doing so would trash Windows. Instead, I chose to go through the backup and restore everything but the Windows folder and the Documents And Settings folder. This meant that I was restoring her data directories and her Program Files folder.
While restoring the Program Files folder, I decided not to overwrite anything that currently existed. I knew there was a good chance that some of the files in the folder could be corrupt, and I didn't want to replace a good file with a bad one. My goal was to replace files belonging to her various applications.
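The selective restore described here comes down to two rules: leave out the Windows and Documents And Settings folders, and never overwrite a file that already exists. The sketch below expresses those rules in Python. It isn't the backup program's own restore logic, and the function and variable names are assumptions made for the example.

```python
import os
import shutil

EXCLUDED_TOP_LEVEL = {"windows", "documents and settings"}


def restore_missing_files(backup_root, target_root, excluded=EXCLUDED_TOP_LEVEL):
    """Copy files from backup_root into target_root without overwriting anything."""
    top = os.fspath(backup_root)
    restored, kept = [], []
    for folder, subdirs, files in os.walk(top):
        if folder == top:
            # Prune excluded top-level folders so os.walk never descends into them.
            subdirs[:] = [d for d in subdirs if d.lower() not in excluded]
        rel = os.path.relpath(folder, top)
        dest_dir = os.path.normpath(os.path.join(target_root, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in files:
            dest = os.path.join(dest_dir, name)
            if os.path.exists(dest):
                kept.append(dest)  # the file from the fresh install wins
                continue
            shutil.copy2(os.path.join(folder, name), dest)
            restored.append(dest)
    return restored, kept
```

Returning the kept list makes it easy to see which files from the fresh installation took priority over their possibly damaged counterparts in the backup.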
Simply restoring application files to the Program Files folder won't make an application work, because applications almost always write data to the Windows registry. However, I knew that some of her applications were storing configuration data in INI files. I hoped that if I restored her Program Files folder, some of her application-related configuration data might be retained when the programs were reloaded. As luck would have it, the configuration files were retained and the applications were properly configured upon reinstallation.
In the end, I was probably able to save about 75 percent of her data. She had to restore the rest from backup, and all of her applications had to be reinstalled. I knew that the problem could eventually happen again if a good backup plan wasn't put into place. I configured the NTBackup program that comes with Windows Server 2003 so that her entire system would be backed up each night to removable media. As I was showing her how everything worked, she asked me if the new backup would take care of the other three computers in the office. It wouldn't, because the job covered only the server.
To prevent a future disaster, I came back the next day and configured Windows so that everyone's profile was being loaded from the server. While I was at it, I created a group policy that would redirect everyone's My Documents folder to the server. This way, I could make sure that all documents and settings were being backed up nightly.
Learning from others' mistakes
Performing full system backups of your servers is crucial, especially if you're in a small company. If you ever have a massive server crash, and no backup exists or the existing backup is minimal, try some of the tactics above to rebuild the server and salvage your data.