
Using Volume Shadow Copy to quickly recover Active Directory

Recovering a corrupt Active Directory can be a nightmare. Rather than fighting with potentially corrupt backup tapes, use Volume Shadow Copy, a potential time- and career-saving option in Windows Server 2003.

Have you ever thought about what you would do if the data on one of your servers became corrupted? If you're like most people, your first response to that question would be that you would restore a backup.

But what if the corruption occurred on a domain controller? Technically speaking, you could still fix the problem by restoring a backup. As someone who has lived through the nightmare of a corrupt Active Directory, let me tell you that restoring a domain controller from backup is no fun at all. Rather than fighting with potentially corrupt tapes, try using a potential time- and career-saving option in Windows Server 2003: Volume Shadow Copy.

What happens during backups

When you back up a domain controller, the Active Directory database gets backed up as part of the system state. The problem is that most of the organizations I've seen back up the system state only as part of a full backup, and they tend to perform a full backup just once a week. That means that if they ever had to perform an Active Directory restoration, they could be restoring a copy of the Active Directory that is up to a week old.

Now, before you send me an e-mail, let me just say that, yes, I do know that this is where the other domain controllers come into play. Normally, when you restore the Active Directory database onto a domain controller, the newly restored domain controller will begin synchronizing itself with the other domain controllers. This allows Active Directory entries that have been created or modified since the time the recently restored backup was made to be copied to the domain controller.

The resynchronization process works great if the domain controller in question has failed due to a hardware problem. The resynchronization process tends not to work so well, however, if the failure was due to corruption within the Active Directory database. Think about it for a moment. The Active Directory is designed so that if any changes occur within the database, the changes will be automatically copied to the other domain controllers. This means that if certain types of corruption occur within an Active Directory database, the corrupted information could potentially be replicated to the other domain controllers, corrupting them as well.

This brings us back to the question of how you would fix the problem. If your most recent backup of the system state is a week old, you could restore it, but the newly restored database would soon be overwritten by the corrupted databases found on other domain controllers. Windows sees the corrupted objects as more current than the restored ones, so it treats the restored objects as out of date and replaces them.

You could get around this problem by performing an authoritative restore. Doing so tells Windows that the copy of the Active Directory you are restoring should be considered the most current, and that copies of the Active Directory on all other domain controllers should be replaced by this version. Performing an authoritative restore would get rid of the corruption. The problem is that if the backup you're restoring is a week old, you are going to lose a week's worth of changes to the Active Directory.
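The difference between the two kinds of restore can be illustrated with a toy model. The sketch below assumes that every object carries a version number and that replication keeps the copy with the higher version; this is a simplification of how Active Directory actually resolves conflicts. The function names, the sample objects, and the size of the version bump are all hypothetical and chosen only for illustration.

```python
# Toy model (an assumption, not the real replication algorithm): each object
# is stored as name -> (version, value), and the higher version always wins.

def replicate(local, partner):
    """Merge the partner's objects into a copy of local; higher version wins."""
    merged = dict(local)
    for name, (version, value) in partner.items():
        if name not in merged or version > merged[name][0]:
            merged[name] = (version, value)
    return merged


def mark_authoritative(db, bump=100_000):
    """Raise every version so the restored copy wins replication.

    The bump is an arbitrary large number picked for this sketch.
    """
    return {name: (version + bump, value) for name, (version, value) in db.items()}


# A partner domain controller that has already received the corruption.
partner = {"alice": (7, "CORRUPT"), "bob": (3, "ok")}
# The week-old backup restored onto the failed domain controller.
restored = {"alice": (5, "ok"), "bob": (3, "ok")}

# Normal restore: the newer, corrupt copy overwrites the restored object.
normal = replicate(restored, partner)

# Authoritative restore: the restored copy now overwrites the partner's copy.
authoritative = mark_authoritative(restored)
partner_after = replicate(partner, authoritative)

print(normal["alice"])         # (7, 'CORRUPT')
print(partner_after["alice"])  # (100005, 'ok')
```

The model also shows the cost: the authoritative copy wins for every object, so any legitimate change made after the backup is lost along with the corruption.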

There are other problems with using this recovery method as well. Imagine that a corrupted Active Directory database has managed to propagate to the other domain controllers in your organization. If this has happened, it's likely that no one (including you) will be able to log in to the network. This being the case, you can be sure that your phone will be ringing off the hook because of all of the people who want to know when the network will be back up. You can also be sure that you'll be getting a lot of pressure from your boss to get the problem fixed quickly because he's getting a lot of pressure from his boss.

The problem is that a full-blown authoritative restore can take a long time to complete, especially if the backup is stored on tape. Even after the restore completes, the resynchronization requires additional time, and even then things may not work 100 percent correctly because a week's worth of updates is gone. The point is that a corrupt Active Directory is a major problem that takes a long time to fix using traditional methods.

Windows Server 2003 to the rescue

Hopefully by now you're starting to understand what a nightmare Active Directory corruption can be. If the above situation happened in real life, you would probably look pretty bad for having the network down all afternoon. What if you could perform a full restoration within just a few minutes, with few, if any, lost Active Directory updates? Windows Server 2003 makes this possible.

The backup program that comes with Windows Server 2003 really isn't that different from the one that comes with Windows 2000, and if you tried to perform a traditional authoritative restore, you would get the results that I discussed earlier. What if you didn't have to rely on the backup program though?

Nearly instantaneous Active Directory restorations are possible because you can exploit two of the services that come with Windows Server 2003 and use them for a purpose for which they were never intended. The two services are the Volume Shadow Copy Service (VSS) and the Virtual Disk Service (VDS).

The Volume Shadow Copy Service


Just in case you aren't familiar with these two services, let me give you a little background information. The VSS was originally designed to let users restore files without having to call you. The idea is that once VSS is enabled, snapshots of users' files are made at various times during the day.

If users need to revert to a previous version of a file, they can easily restore one of these snapshots without calling you. The snapshots are normally stored on the same volume as the original files, so they don't offer protection against volume-wide corruption or against a hardware failure, but they do make an administrator's life easier by allowing users to restore their own files if a critical failure has not occurred.

VSS has one other characteristic that comes in very handy for restoring Active Directory. VSS can make snapshots of open files. Normally, when Windows Backup encounters an open file, it will just skip the file, but VSS will actually back the file up. This is very important since the Active Directory is almost constantly open.

The Virtual Disk Service

The other service that plays a part in this functionality is the VDS. The VDS is an interface for managing virtual storage. Basically, this service allows you to control various storage systems in a common manner regardless of the underlying hardware. For example, Windows could be configured to treat a drive on a NAS (Network Attached Storage) Server in the same way that it would treat a local drive.

A crash course in Storage Area Networks

So far, I've given you a brief description of the VSS and VDS services. But there's one more element that must be in place before you can make the fast Active Directory recovery work. You must have a Storage Area Network (SAN) in place. In case you aren't familiar with SANs, the basic idea is that storage devices do not have to be associated with a specific server. Instead, it's possible to have a server-independent array of disks that can be accessed by multiple servers.

Like all storage devices, however, a SAN array does have some limitations. Even though the SAN array is shared, it doesn't mean that multiple servers can write to the array simultaneously. Doing so would corrupt the data stored within the array. To prevent this problem, the array is segmented into various LUNs. A LUN is a logical unit of storage. Each LUN is then associated with a specific server.

One other concept that you need to understand is that this assignment is logical, not physical. This simple fact means that a LUN's assignment can be changed. It's possible for a server to write data to a LUN, and for an administrator to then associate the LUN with a different server so that the data that was written by one server now appears to reside on a different server. There is no direct transport mechanism for moving data from one server to another through a LUN, so reassignment is the workaround. The technique is cumbersome, but it does get the job done.
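To make the idea of logical assignment concrete, here is a minimal sketch of a SAN array in which each LUN is visible to one server at a time. The `SanArray` class, the LUN and server names, and the sample record are all hypothetical; real arrays expose this through vendor management tools, not through code like this.

```python
class SanArray:
    """Toy SAN array: each LUN is assigned to at most one server at a time."""

    def __init__(self):
        self.owner = {}  # LUN name -> server name, or None when unassigned
        self.data = {}   # LUN name -> list of records written to it

    def create_lun(self, lun):
        self.owner[lun] = None
        self.data[lun] = []

    def assign(self, lun, server):
        # Reassignment is purely logical: only the mapping changes,
        # the stored data is never copied or moved.
        self.owner[lun] = server

    def write(self, lun, server, record):
        if self.owner[lun] != server:
            raise PermissionError(f"{server} is not assigned to {lun}")
        self.data[lun].append(record)

    def read(self, lun, server):
        if self.owner[lun] != server:
            raise PermissionError(f"{server} is not assigned to {lun}")
        return list(self.data[lun])


san = SanArray()
san.create_lun("lun0")
san.assign("lun0", "server-a")
san.write("lun0", "server-a", "payroll.db")

# The administrator re-associates the LUN with a different server.
san.assign("lun0", "server-b")
print(san.read("lun0", "server-b"))  # ['payroll.db']
```

After the reassignment, the first server can no longer read or write the LUN in this model, which mirrors the rule that only one server writes to a given LUN.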

Putting it all together

Now let's take a look at how all of the various pieces that I've discussed can be put together for the sake of a fast Active Directory recovery. The first step in the process is to use the VSS to create a snapshot of the system state information (which includes the Active Directory). In this process, VSS acts as a coordinator and notifies each of the system state components to prepare for shadow copy creation.

Each component of the system state has its own writer, which is responsible for getting that component's data into a consistent state for backup. Once its preparations are complete, each writer notifies VSS that its component is ready to be backed up. Since VSS is working as a coordinator, it then notifies the backup application (the requester) that the system state is ready. Write activity to the Active Directory is then frozen for a few seconds, which gives VSS just enough time to make a quick snapshot of the Active Directory databases.

Once the snapshot is complete, the Active Directory service is resumed and everything continues to function normally. As you can see, Shadow Copy now has a snapshot of the Active Directory, but notice that it only took a few seconds to make the snapshot backup, as opposed to the long duration required for backing up the Active Directory to tape. Also notice that the snapshot could have easily been created during the middle of the day since creating a snapshot is nondisruptive and requires a minimal amount of server resources.
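The sequence described above (prepare, pause, snapshot, resume) can be sketched as a small simulation. This is not the VSS API; the `Writer` class, `create_shadow_copy` function, and component names are hypothetical stand-ins that only show the order of events and why the snapshot stays consistent.

```python
class Writer:
    """Hypothetical stand-in for a system state component's writer."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.frozen = False

    def prepare(self):
        # A real writer would flush pending changes; the toy one is always ready.
        return True

    def freeze(self):
        self.frozen = True

    def thaw(self):
        self.frozen = False

    def write(self, key, value):
        if self.frozen:
            raise RuntimeError(f"{self.name} is frozen")
        self.data[key] = value


def create_shadow_copy(writers):
    """Coordinate the writers and return (snapshot, log of steps)."""
    log = []
    for w in writers:
        if not w.prepare():
            raise RuntimeError(f"{w.name} could not prepare")
        log.append(f"prepare {w.name}")
    for w in writers:
        w.freeze()
        log.append(f"freeze {w.name}")
    try:
        # Copy the data while nothing can change it.
        snapshot = {w.name: dict(w.data) for w in writers}
        log.append("snapshot")
    finally:
        # Always resume the writers, even if the copy fails.
        for w in writers:
            w.thaw()
            log.append(f"thaw {w.name}")
    return snapshot, log


ad = Writer("active-directory", {"alice": "ok"})
registry = Writer("registry", {"hostname": "dc01"})
snapshot, log = create_shadow_copy([ad, registry])

# Changes made after the snapshot do not affect it.
ad.write("alice", "changed later")
print(log)
print(snapshot["active-directory"])  # {'alice': 'ok'}
```

The freeze window in the sketch covers only the copy itself, which is the reason the real operation takes seconds and not the hours a tape backup can require.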

Now here's where things get interesting. Normally, when you create a shadow copy backup, the data is backed up to a special folder residing on the same volume as the original data. That's only the default location, though; you aren't locked into using it. By using third-party software, it's possible to create a shadow copy on a separate volume. A good example of such an application is CommVault Shadow Explorer.

By using an application such as Shadow Explorer, you can configure Windows to place the shadow copy of the Active Directory on a LUN rather than on a local drive. Once the shadow copy has been created, you can break the connection between the server and the shadow copy. In doing so, the shadow copy becomes an isolated, read-only backup of the system state.

Now, suppose a critical Active Directory failure occurs, and it becomes necessary to restore the shadow copy data. To get the server back online, you simply dismount the failed volume, mount the LUN containing the shadow copy backup, unmask the shadow copy backup, switch the data from read-only to read/write, and reboot the server.
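The recovery steps can be written down as an ordered procedure. The sketch below is only a model of that order; the `Lun` and `Server` classes, the `recover` function, and the LUN names are hypothetical, and in practice each step would be carried out with your SAN vendor's tools and the third-party shadow copy software.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Lun:
    name: str
    masked: bool = False
    read_only: bool = False


@dataclass
class Server:
    mounted: Optional[str] = None
    reboots: int = 0
    steps: List[str] = field(default_factory=list)


def recover(server, failed, shadow):
    """Swap the failed LUN for the shadow copy LUN, in the order given above."""
    if server.mounted != failed.name:
        raise ValueError("the failed LUN is not the one currently mounted")
    server.mounted = None
    server.steps.append(f"dismount {failed.name}")
    server.mounted = shadow.name
    server.steps.append(f"mount {shadow.name}")
    shadow.masked = False
    server.steps.append(f"unmask {shadow.name}")
    shadow.read_only = False
    server.steps.append(f"set {shadow.name} read/write")
    server.reboots += 1
    server.steps.append("reboot")
    return server.steps


primary = Lun("lun-primary")
backup = Lun("lun-shadow", masked=True, read_only=True)
dc = Server(mounted="lun-primary")

for step in recover(dc, primary, backup):
    print(step)
```

Nothing in the procedure copies data, which is why the whole recovery can finish in minutes: every step changes either an assignment or a flag.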

The hardware

In my explanation, I've talked a lot about making a shadow copy backup to a LUN that is based on a SAN. You may be wondering how it's even possible to boot a server properly if the Active Directory information exists on a LUN.

As I explained earlier, you're not technically backing up the Active Directory. You're backing up the system state, of which the Active Directory is one component. Other components include things like the system registry and any registered COM+ objects. Such objects are required for booting the system.

Normally, Windows boots off of a local hard drive. However, the only way to make the concepts that I've discussed in this article work properly is to set Windows up so that it boots off of a LUN rather than off of a local drive. When you make a shadow copy, the shadow copy is being written to a separate LUN within the same SAN. The idea is that if a failure were to occur within your primary LUN, you can just take it offline, point the server to the backup LUN, and be back online very quickly.

If a LUN contains a backup of an Active Directory and the LUN can be assigned to any available server, then it might seem as though you could use a single shadow copy to restore any failed domain controller. But this isn't the case.

While it's true that each domain controller contains nearly identical copies of the Active Directory database, you must remember that the Active Directory can't be restored as an individual component. You can only restore the system state as a whole. Therefore, if you were to try to restore a domain controller with system state information taken from a different domain controller, you would be restoring the registry, COM+ objects, etc., from the wrong server.

Even though you can't use LUN transport to restore an alternate domain controller, it does have its place. If an Active Directory failure was hardware-related, the LUN transport method that I've described could be used to disassociate the LUN from the failed hardware. You could then replace the failed server with an identical replacement, associate the LUN with it, and be quickly back online.
