Have you ever thought about what you would do if the data on
one of your servers became corrupted? If you’re like most people, your first
response to that question would be that you would restore a backup.

But what if the corruption occurred on a domain controller? Technically
speaking, you could still fix the problem by restoring a backup. As someone
who has lived through the nightmare of a corrupt Active Directory, let me tell
you that restoring a domain controller from backup is no fun at all. Rather than
fighting with potentially corrupt tapes, consider a time- and career-saving
option in Windows Server 2003: the Volume Shadow Copy Service.

What happens during backups

When you back up a domain controller, the Active Directory
database gets backed up as part of the system state. The problem is that
most of the organizations I’ve seen back up the system state only as part
of a full backup, and they tend to perform a full backup just once a week.
That means that if they ever had to perform an Active
Directory restoration, they could be restoring a copy of the Active Directory
that is up to a week old.

Now, before you send me an e-mail, let me just say that, yes,
I do know that this is where the other domain controllers come into play.
Normally, when you restore the Active Directory database onto a domain
controller, the newly restored domain controller will begin synchronizing
itself with the other domain controllers. This allows Active Directory entries
that have been created or modified since the time the recently restored
backup was made to be copied to the domain controller.

The resynchronization process works great if the domain
controller in question has failed due to a hardware problem. The
resynchronization process tends not to work so well, however, if the failure was
due to corruption within the Active Directory database. Think about it for a
moment. The Active Directory is designed so that if any changes occur within
the database, the changes will be automatically copied to the other domain
controllers. This means that if certain types of corruption occur within an
Active Directory database, the corrupted information could potentially be
replicated to the other domain controllers, corrupting them as well.

This brings us back to the question of how you would fix the
problem. If your most recent backup of the system state is a week old, you
could restore it, but the newly restored database would soon be overwritten by
the corrupted databases found on other domain controllers. Windows sees the
corrupted objects as being more current and therefore replaces the newly
restored database objects with the corrupted ones, because it believes that the
objects being overwritten are out of date.

You could get around this problem by performing an
authoritative restore. Doing so tells Windows that the copy of the Active
Directory you are restoring should be considered the most current, and that
copies of the Active Directory on all other domain controllers should be replaced
by this version. Performing an authoritative restore would get rid of the
corruption. The problem is that if the backup you’re restoring is a week old,
you are going to lose a week’s worth of changes to the Active Directory.
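
The difference between the two kinds of restore can be illustrated with a toy model. The Python sketch below is not how Active Directory is implemented. It assumes a simplified rule in which every object carries a version number and the higher version wins during replication. The names `DirObject`, `replicate`, and `restore`, and the version bump of 100000, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class DirObject:
    value: str
    version: int


def replicate(local: dict, partner: dict) -> None:
    """Two-way sync: for each object, the copy with the higher version wins."""
    for name in set(local) | set(partner):
        a, b = local.get(name), partner.get(name)
        if a is None or (b is not None and b.version > a.version):
            local[name] = DirObject(b.value, b.version)
        elif b is None or a.version > b.version:
            partner[name] = DirObject(a.value, a.version)


def restore(backup: dict, authoritative: bool = False, bump: int = 100000) -> dict:
    """Return a restored replica; an authoritative restore raises every version."""
    extra = bump if authoritative else 0
    return {n: DirObject(o.value, o.version + extra) for n, o in backup.items()}


# The week-old backup holds a good copy; the partner holds a newer, corrupted one.
backup = {"user:alice": DirObject("good", 3)}

# Ordinary restore: the partner's higher version overwrites the restored object.
partner = {"user:alice": DirObject("CORRUPT", 7)}
dc = restore(backup)
replicate(dc, partner)
normal_result = dc["user:alice"].value

# Authoritative restore: the bumped version pushes the good copy to the partner.
partner = {"user:alice": DirObject("CORRUPT", 7)}
dc = restore(backup, authoritative=True)
replicate(dc, partner)
auth_result = partner["user:alice"].value
```

In this model the ordinary restore ends with the corrupted value back on the restored domain controller, while the authoritative restore ends with the good value on both replicas. The good value is still the week-old one, which is exactly the cost described above.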

There are other problems with using this recovery method as
well. Imagine that a corrupted Active Directory database has managed to propagate
to the other domain controllers in your organization. If this has happened,
it’s likely that no one (including you) will be able to log in to the network.
This being the case, you can be sure that your phone will be ringing off the
hook because of all of the people who want to know when the network will be
back up. You can also be sure that you’ll be getting a lot of pressure from
your boss to get the problem fixed quickly because he’s getting a lot of
pressure from his boss.

The problem is that a full-blown authoritative restore can
take a long time to complete, especially if the backup is stored on tape. Even
after the restore completes, the resynchronization is going to require
additional time to complete, and even then things may not work 100 percent
correctly because a week’s worth of updates are gone. The point is that a
corrupt Active Directory is a major problem that takes a long time to fix using
traditional methods.

Windows Server 2003 to the rescue

Hopefully by now you’re starting to understand what a
nightmare Active Directory corruption can be. If the above situation happened
in real life, you would probably look pretty bad for having the network down
all afternoon. What if you could perform a full restoration within just a few
minutes and lose few, if any, Active Directory updates? Windows
Server 2003 makes this possible.

The backup program that comes with Windows Server 2003
really isn’t that different from the one that comes with Windows 2000, and if
you tried to perform a traditional authoritative restore with it, you would get
the results that I discussed earlier. But what if you didn’t have to rely on the
backup program?

Nearly instantaneous Active Directory restorations are possible
because you can exploit two of the services that come with Windows Server 2003
and use them for a purpose for which they were never intended. The two
services are the Volume Shadow Copy Service (VSS) and the Virtual Disk
Service (VDS).

The Volume Shadow Copy Service

Just in case you aren’t familiar with these two services,
let me give you a little background information. VSS was originally
designed to let users restore files without having to call you. The
idea is that once VSS is enabled, snapshots of users’ files are made at
various times during the day.

If users need to revert to a previous version of a file, they can easily restore one of these snapshots without calling you. The snapshots
are normally stored on the same volume as the original files, so they don’t offer
protection against volume-wide corruption or against a hardware failure, but
they do make an administrator’s life easier by allowing users to restore their
own files if a critical failure has not occurred.

VSS has one other characteristic that comes in very handy
for restoring Active Directory. VSS can make snapshots of open files. Normally,
when Windows Backup encounters an open file, it will just skip the file, but
VSS will actually back the file up. This is very important since the Active
Directory is almost constantly open.

The Virtual Disk Service

The other service that plays a part in this functionality is
VDS. VDS is an interface for managing virtual storage. Basically, this
service allows you to control various storage systems in a common manner
regardless of the underlying hardware. For example, Windows could be configured
to treat a drive on a network-attached storage (NAS) server in the same way
that it would treat a local drive.

A crash course in Storage Area Networks

So far, I’ve given you a brief description of the VSS and
VDS services. But there’s one more element that must be in place before you can
make the fast Active Directory recovery work. You must have a Storage
Area Network (SAN) in place. In case you aren’t familiar with SANs, the basic
idea is that storage devices do not have to be associated with a specific
server. Instead, it’s possible to have a server-independent array of disks
that can be accessed by multiple servers.

Like all storage devices, however, a SAN array does have some
limitations. Even though the SAN array is shared, it doesn’t mean that multiple
servers can write to the array simultaneously. Doing so would corrupt the data
stored within the array. To prevent this problem, the array is segmented into
various LUNs. A LUN is a logical unit of storage. Each LUN is then associated
with a specific server.

One other concept that you need to understand is that
this assignment is logical, not physical. This simple fact means that a LUN’s
assignment can be changed. It’s possible for a server to write data to a LUN,
and for an administrator to then associate the LUN with a different server so
that the data that was written by one server now appears to reside on a
different server. No direct transport mechanism exists for moving data from
one server to another through a LUN, so reassigning the LUN is the workaround.
The technique is cumbersome, but it does get the job done.
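
To make the idea of logical assignment concrete, here is a minimal Python sketch of a SAN array in which each LUN is visible to only one server at a time. It is a toy model, not a vendor API: the class `SanArray` and its methods `create_lun`, `write`, `read`, and `reassign` are hypothetical names chosen for illustration.

```python
class SanArray:
    """Toy SAN: each LUN is assigned to at most one server at a time."""

    def __init__(self):
        self._owner = {}  # LUN name -> server name
        self._data = {}   # LUN name -> stored contents

    def create_lun(self, lun, server):
        self._owner[lun] = server
        self._data[lun] = {}

    def _check(self, server, lun):
        # Only the server the LUN is currently assigned to may touch it.
        if self._owner.get(lun) != server:
            raise PermissionError(f"{lun} is not assigned to {server}")

    def write(self, server, lun, key, value):
        self._check(server, lun)
        self._data[lun][key] = value

    def read(self, server, lun, key):
        self._check(server, lun)
        return self._data[lun][key]

    def reassign(self, lun, new_server):
        # The data never moves; only the logical assignment changes.
        self._owner[lun] = new_server


san = SanArray()
san.create_lun("LUN1", "DC1")
san.write("DC1", "LUN1", "ntds.dit", "directory database")

san.reassign("LUN1", "DC2")
seen_by_dc2 = san.read("DC2", "LUN1", "ntds.dit")

try:
    san.write("DC1", "LUN1", "ntds.dit", "late write")
    dc1_blocked = False
except PermissionError:
    dc1_blocked = True
```

After the reassignment, the second server reads what the first one wrote, and the first server can no longer write to the LUN. No data was copied at any point, which is why the technique works even though it is not a true transport mechanism.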

Putting it all together

Now let’s take a look at how all of the various pieces that
I’ve discussed can be put together for the sake of a fast Active Directory
recovery. The first step in the process is to use the VSS to create a snapshot
of the system state information (which includes the Active Directory). In this
process, VSS acts as a coordinator and notifies each of the system state
components to prepare for shadow copy creation.

Each component of the system state has its own writer, which
is responsible for writing system state information to a backup. In this
particular case, each individual system state component prepares itself to be
backed up. Once the preparations are complete, each component’s writer will
notify VSS that it is ready to be backed up. Since VSS is working as a
coordinator, it then notifies the backup application that the system state is
ready to be backed up. The backup application (the requestor, in VSS terms)
then halts the Active Directory for a few seconds. This gives VSS just enough
time to make a quick snapshot of the Active Directory databases.

Once the snapshot is complete, the Active Directory service
is resumed and everything continues to function normally. As you can see,
Shadow Copy now has a snapshot of the Active Directory, but notice that it only
took a few seconds to make the snapshot backup, as opposed to the long duration
required for backing up the Active Directory to tape. Also notice that the snapshot
could have easily been created during the middle of the day since creating a
snapshot is nondisruptive and requires a minimal amount of server resources.
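
The sequence just described (prepare, pause, snapshot, resume) can be sketched in a few lines of Python. This is a simplified simulation of the coordination pattern, not the real VSS programming interface. The names `Writer`, `prepare`, `freeze`, `thaw`, and `create_shadow_copy` are assumptions made for the example.

```python
class Writer:
    """Stands in for one system state component, such as Active Directory."""

    def __init__(self, name, state):
        self.name = name
        self.state = dict(state)
        self.frozen = False

    def prepare(self):
        # A real component would flush pending changes here.
        return True

    def freeze(self):
        self.frozen = True

    def thaw(self):
        self.frozen = False

    def update(self, key, value):
        if self.frozen:
            raise RuntimeError(f"{self.name} is paused for a snapshot")
        self.state[key] = value


def create_shadow_copy(writers, log):
    """Coordinate the writers, copy their state, and always resume them."""
    for w in writers:
        if not w.prepare():
            raise RuntimeError(f"{w.name} could not prepare")
        log.append(f"{w.name} ready")
    for w in writers:
        w.freeze()
    log.append("writers paused")
    try:
        snapshot = {w.name: dict(w.state) for w in writers}
        log.append("snapshot taken")
    finally:
        for w in writers:
            w.thaw()
        log.append("writers resumed")
    return snapshot


log = []
ad = Writer("Active Directory", {"users": 120})
registry = Writer("Registry", {"keys": 4000})
snapshot = create_shadow_copy([ad, registry], log)
ad.update("users", 121)  # a change made after the snapshot
```

The pause lasts only as long as the copy step, and the `finally` clause guarantees that the components resume even if the copy fails. The change made after the snapshot does not alter the snapshot itself.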

Now here’s where things get interesting. Normally, when you
create a shadow copy backup, the data is backed up to a special folder residing
on the same volume as the original data. Just because that’s the default
location, though, you aren’t locked into using it. By using third-party
software, it’s possible to create a shadow copy on a separate volume. A good
example of such an application is CommVault Shadow Explorer.

By using an application such as Shadow Explorer, you can
configure Windows to place the shadow copy of the Active Directory on a LUN
rather than on a local drive. Once the shadow copy has been created, you can
break the connection between the server and the shadow copy. In doing so, the
shadow copy becomes an isolated, read-only backup of the system state.

Now, suppose a critical Active Directory failure occurs,
and it becomes necessary to restore the shadow copy data. To get the server
back online, you simply dismount the failed volume, mount the LUN
containing the shadow copy backup, unmask the shadow copy backup, switch the
data from read-only to read/write, and reboot the server.
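
The recovery steps can be written down as an ordered procedure. The Python sketch below only models the state changes, in the order listed above. It does not talk to any real SAN or Windows tooling, and the names `Volume`, `Server`, and `recover_from_shadow_copy` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    mounted: bool = False
    masked: bool = True      # hidden from the server until unmasked
    read_only: bool = True


@dataclass
class Server:
    name: str
    boot_volume: Volume
    reboots: int = 0


def recover_from_shadow_copy(server: Server, shadow: Volume) -> list:
    """Apply the five recovery steps in order and return a log of them."""
    done = []
    server.boot_volume.mounted = False
    done.append(f"dismounted {server.boot_volume.name}")
    shadow.mounted = True
    done.append(f"mounted {shadow.name}")
    shadow.masked = False
    done.append(f"unmasked {shadow.name}")
    shadow.read_only = False
    done.append(f"switched {shadow.name} to read/write")
    server.boot_volume = shadow
    server.reboots += 1
    done.append(f"rebooted {server.name} from {shadow.name}")
    return done


failed = Volume("LUN0", mounted=True, masked=False, read_only=False)
shadow = Volume("LUN1-shadow")
dc1 = Server("DC1", boot_volume=failed)
steps = recover_from_shadow_copy(dc1, shadow)
```

Nothing in this procedure copies data from tape or across the network, which is why the whole recovery can take minutes rather than hours.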

The hardware

In my explanation, I’ve talked a lot about making a shadow
copy backup to a LUN that is based on a SAN. You may be wondering how it’s
even possible to boot a server properly if the Active Directory information
exists on a LUN.

As I explained earlier, you’re not technically backing up
the Active Directory. You’re backing up the system state, of which the Active
Directory is one component. Other components include things like the system
registry and any registered COM+ objects. Such objects are required for booting
the system.

Normally, Windows boots from a local hard drive. However,
the only way to make the concepts that I’ve discussed in this article work
properly is to set Windows up so that it boots from a LUN rather than from
a local drive. When you make a shadow copy, the shadow copy is written to
a separate LUN within the same SAN. The idea is that if a failure were to occur
within your primary LUN, you can just take it offline, point the server to
the backup LUN, and be back online very quickly.

If a LUN contains a backup of an Active Directory and the
LUN can be assigned to any available server, then it might seem as though you
could use a single shadow copy to restore any failed domain controller. But this
isn’t the case.

While it’s true that each domain controller contains nearly
identical copies of the Active Directory database, you must remember that the
Active Directory can’t be restored as an individual component. You can only
restore the system state as a whole. Therefore, if you were to try to restore a
domain controller with system state information taken from a different domain
controller, you would be restoring the registry, COM+ objects, etc., from the
wrong server.

Even though you can’t use LUN transport to restore an
alternate domain controller, it does have its place. If an Active Directory
failure was hardware-related, the LUN transport method that I’ve described
could be used to disassociate the LUN from the failed hardware. You could then
replace the failed server with an identical replacement, associate the LUN with
it, and be quickly back online.