Time may heal all wounds, but corrupted Microsoft Exchange databases require time, a mirrored server, and a persistent network administrator. That’s what we found out when we revisited Peter, an Exchange administrator who recently suffered through a power outage and multiple system failures that crashed his Exchange server and corrupted his databases. Peter’s travails are documented in the article “Corrupted Exchange databases play havoc with admin.”
From that experience, Peter learned that two successive power outages can drain an uninterruptible power supply (UPS) to a point where it fails to give an Exchange server time to properly shut down. He also discovered that his backup program wasn’t giving him continuous transaction logs during a full backup. As a result, when a tape with a full backup on it proved faulty, the break in transaction logs meant that he couldn't get a full restore.
Now Peter does a full backup of his Exchange databases each night and is looking for a longer-lasting UPS. He is also developing a script that will work with the UPS during electrical failures to begin a proper shutdown of his Exchange server.
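Peter's script isn't shown here, so what follows is only a minimal sketch, in Python, of the decision such a script has to make: begin the shutdown while the battery still has more runtime left than the shutdown needs. The function name, the safety margin, and the sample numbers are illustrative assumptions, not values from Peter's environment or from any UPS vendor's software.

```python
def should_start_shutdown(on_battery: bool,
                          runtime_left_s: float,
                          shutdown_needs_s: float,
                          safety_margin_s: float = 120.0) -> bool:
    """Return True when a clean shutdown should begin now.

    All parameters are hypothetical inputs that a real script would
    have to read from the UPS monitoring software.
    """
    if not on_battery:
        return False  # mains power is present; nothing to do
    # Begin while the battery can still cover the shutdown plus a margin.
    return runtime_left_s <= shutdown_needs_s + safety_margin_s


# A second outage hits before the battery has recharged: six minutes of
# runtime remain, and stopping the services is assumed to take five.
print(should_start_shutdown(True, 360, 300))   # True
print(should_start_shutdown(True, 1800, 300))  # False
```

Comparing against remaining runtime rather than elapsed time on battery is what covers the back-to-back outage case, where the UPS starts the second outage already partly drained.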
Meanwhile, the issue with the corrupted databases does have a happy ending. In this week’s From the Trenches, we'll see how Peter was able to recover all the lost data by:
- Creating an identical Exchange environment on a new server.
- Running a destructive data-recovery utility and then Ex-Merge.
- Transferring the recovered database files back to his live Exchange environment and allowing users to request lost messages.
Reconstructing the scene
When initially looking for a solution for a corrupted database problem, Peter went to Microsoft's Knowledge Base and found the article "XADM: How to Recover from Information Store Corruption." This article describes in detail what a network administrator should do when faced with corrupted Exchange databases.
Several Exchange utilities can invoke the recovery process from the command line in Windows. But if those utilities don't fix the databases, the article cautions that the next step involves destructive techniques that shouldn't be tried on a live Exchange server. So when Peter got to this point in his original effort to restore the databases, he called Microsoft and was told that he should attempt to restore his databases from his backup tapes.
Eventually, Peter found out that he didn’t have that option because one of his full backups had failed. This left him with a chunk of restored data and then a gap of 10 days between the point when the tape failed and when he repaired his Exchange server. The Exchange server crashed on a Friday morning and his full backups were done on Sundays.
The recovery begins
To recover the lost data, Peter needed to come up with an offline disaster recovery server. Essentially, he had to create an Exchange server that was identical to his production server at the time the power went out. Fortunately, he had another server, meant to be used for something else, that he could use for his disaster recovery server. He built a Windows NT 4.0 server as a backup domain controller with this new machine and gave it a different name from his live production Exchange server.
The next step was to remove the new server from the network, promote it to a primary domain controller, and change its name to the name of the Exchange server that failed.
Peter then installed Exchange 5.5 on the server and applied all the service packs up to SP4, so that he had the basis of an Exchange server just like the one that crashed. Before removing the new server from the network, he had copied the corrupted databases to a holding directory on the new machine. To complete the reconstruction, he now copied them from there into the same directory they had occupied on the production Exchange server. He was then ready to continue with the steps outlined in the Knowledge Base (Q) article.
Going where no Exchange admin should have to go
According to the article, the next steps Peter was to perform on the corrupted Exchange databases would require a “hard or forcible state recovery command” that should be used only as a last resort. The procedure involved two commands run in sequence on each of the two main Exchange databases: Pub.edb and Priv.edb.
Peter ran the first command and then the second on each database. The first command forces the recovery. The second forces a set of 23 tests to be performed on each of the databases. If the tests fail, both commands are repeated in sequence.
The first two times Peter ran the commands, the tests failed after about the ninth test. He would get a JET error that, according to another Q article, indicated a database inconsistency.
On the third try, the databases made it through all 23 tests, but there were numerous error messages displayed at the end.
The recovery procedure calls for the administrator to run both commands three times if errors are reported. If the error list remains the same on all three tries, he or she can proceed to the next step. The Microsoft documentation says that such errors may mean an attachment is corrupted, but that they will not prevent the information store from starting. That is what happened to Peter.
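The repeat-until-stable rule described above can be sketched in a few lines of Python. This is only an illustration of the decision logic as the article describes it. The `run_pass` callable is a hypothetical stand-in for running both commands on one database and collecting the reported errors. Treating an error-free pass as permission to proceed is also an assumption, since the article only covers the case where errors are reported.

```python
from typing import Callable, List


def ok_to_proceed(run_pass: Callable[[], List[str]], tries: int = 3) -> bool:
    """Apply the three-tries rule to one database.

    `run_pass` (hypothetical) runs the recovery command and then the
    test command, and returns the error messages that were reported.
    """
    seen: List[List[str]] = []
    for _ in range(tries):
        errors = sorted(run_pass())
        if not errors:
            return True  # assumption: a clean pass needs no repeat
        seen.append(errors)
    # Proceed only if every try reported the same list of errors.
    return all(errors == seen[0] for errors in seen)


# Simulated passes: the same error shows up on all three tries.
stable = iter([["attachment error"]] * 3)
print(ok_to_proceed(lambda: next(stable)))  # True

# Simulated passes: the error list is still changing between tries.
changing = iter([["a", "b"], ["a"], ["a"]])
print(ok_to_proceed(lambda: next(changing)))  # False
```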
Because he hadn't copied the directory store (Dir.edb) from the old Exchange server when he built the new server, his next step was to run the Exchange Server Consistency Adjuster. The Consistency Adjuster updates the new directory store with everything it can recover from the databases that have just undergone testing. At this point, Peter had recovered everything that could be recovered.
Connecting users with their recovered data
Peter now had all of the files he could recover on his new Exchange server. While this was satisfying on an admin level, users still didn't have the mail they had lost. So he turned to Microsoft's well-documented Ex-Merge utility to get the lost e-mail back to the end users. The Q article Peter had been following offered a link to Ex-Merge and explained that the download is named Iloveyouhlpi.zip because the utility was originally used in response to the Love Bug virus.
The Ex-Merge utility exports the data from the individual mailboxes on the new Exchange server into .pst files, which can then be imported into the live production server.
“You can’t put your [recovered] server back onto the network because of its name and because it’s a PDC,” Peter said.
The easiest way to get around the problem is to burn the .pst files with the recovered mailboxes onto CDs or save them to a backup tape. Another possibility is to swap hard drives in a RAID array, but this could involve forcing a reboot of the Exchange server.
Peter connected both his Exchange servers, running on Compaq 380s, to a Compaq 4100 through fiber disk array cards. The 4100 had 12 hard drives that he split evenly between the two machines, giving each machine rights to see only six of the drives.
Next, he exported the .pst files to the new server’s set of drives on the 4100 and then removed the new server from the 4100. Finally, he gave the production Exchange server rights to the drives containing the .pst files.
“It’s the fastest way we could have done it,” he said.
Peter then sent a message out to all of his end users on the network telling them they could request to have their messages for those 10 lost days restored to their mailboxes. In the first week, only about 25 percent of the users requested the restoration. Nevertheless, the restore was ultimately successful.
Have you had to recover lost Exchange databases?
If you've ever had to recover a corrupted Exchange database, were you successful? If it didn’t work for you, were you able to determine the reason? Is there an easier process than that described by Microsoft? Share your experiences in the discussion below.