SolutionBase: Preparing for Microsoft's Data Protection Manager 2006

Recovering from a disaster can be a nightmare, especially when you discover that the backup tapes you were relying on are bad. Here's a heads-up on an upcoming product from Microsoft that may help alleviate those headaches in the future: Data Protection Manager 2006.

A few months ago, I wrote a TechRepublic article that discussed a forthcoming Microsoft product called Data Protection Manager 2006. At the time, the product was not even in beta testing yet, and most of the article's content was based on information that I had received from Microsoft. A few weeks after writing that article, however, I received a beta version of Data Protection Manager. I have experimented with the software extensively since that time, and I want to share my experiences with you. In this article, I will explain what Data Protection Manager is and what the various requirements are for running it in your organization.

Before I begin

Before I get started, I want to point out that Data Protection Manager 2006 (DPM) is currently in Beta One. As such, there is always the chance that changes could be made to the product before it is eventually released next year. The information in this article is based on my own personal experiences with the beta software.

Choosing your server hardware

Depending on how much data you are backing up, DPM 2006 can be a pretty high-demand application. In my own organization, I initially decided to use DPM 2006 to back up a network volume that contained approximately 20GB worth of data. This network volume contains data that is primarily static in nature. For example, it contains everything that I have ever written, financial information, digital photos, etc. For the most part, the data is not being updated very often, but I do add additional data to the volume each day.

Given the nature of my data and the fact that my wife Talainia and I are the only users on the network, I decided to use a high-end PC as my DPM 2006 server. The machine that I am using has a 3.2 GHz processor, 1GB of RAM, a gigabit network connection, an 80GB hard disk, and two 250GB SATA hard drives.

Initially, I configured the server to back up my 20GB of data so that replicas are synchronized hourly and backups are made three times a day (8:00 AM, 12:00 PM, and 6:00 PM). I found that my server offered excellent performance under this workload. That being the case, I decided to back up a few other network volumes containing less important data. The server continued to perform well, but disk space has become an issue.

Disk space has become a bit of an issue for me because of the way that DPM 2006 uses it. Before I go into that though, let me just say that DPM 2006 goes to great lengths to conserve disk space. As you might know, one of the nice things about DPM is that it is designed to store multiple versions of a file. If, for example, I made five different changes to a file over the course of a week, I could revert to any of those particular versions by simply telling DPM to restore the file from the appropriate time period.

Of course, having five different versions of the same file online can consume a lot of disk space, but as I said earlier, DPM 2006 does a lot to conserve disk space. One of the ways that it does so is through incremental file updates.

Suppose for a moment that I have a 100 MB file and that I make a 1 byte change to the file. Most backup applications would notice that the file has changed and back up the entire file during the next backup cycle. However, assuming that DPM already has the original file backed up, it would only back up the byte that has changed, not the entire file. In doing so, DPM 2006 is able to use 1 byte of disk space to store the new version of the file rather than consuming 100 MB of extra space.
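To make the idea concrete, here is a minimal Python sketch of an incremental update. It is not how DPM 2006 is implemented internally (I have no visibility into that); the fixed 4,096-byte block size and the helper names compute_delta and apply_delta are my own assumptions, chosen only to show why storing the changed portion of a file is so much cheaper than storing the whole file again.

```python
def compute_delta(old: bytes, new: bytes, block_size: int = 4096):
    """Return (offset, data) pairs for each block of `new` that differs from `old`.

    Hypothetical helper: the block size is an assumption, not a DPM setting.
    """
    delta = []
    for offset in range(0, len(new), block_size):
        new_block = new[offset:offset + block_size]
        if old[offset:offset + block_size] != new_block:
            delta.append((offset, new_block))
    return delta


def apply_delta(old: bytes, delta, new_length: int) -> bytes:
    """Rebuild the new version from the old version plus the stored delta."""
    # Start from the old content, truncated or zero-padded to the new length.
    buf = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for offset, data in delta:
        buf[offset:offset + len(data)] = data
    return bytes(buf)


# Demo: a 1 MB file (kept small for speed) with a single byte changed.
original = bytes(1024 * 1024)
changed = bytearray(original)
changed[500_000] = 0xFF
modified = bytes(changed)

delta = compute_delta(original, modified)
stored_bytes = sum(len(data) for _, data in delta)
print(f"Stored {stored_bytes} bytes instead of {len(modified)}")
```

In this sketch a single changed byte costs one block of storage rather than a full copy of the file. A finer granularity would store even less, at the price of more bookkeeping.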

Since DPM 2006 uses incremental updates, you are probably wondering how in the world my server is running short on disk space. It has to do with the way that DPM allocates space. As I mentioned earlier, my server has an 80GB hard drive and two 250GB SATA drives. The 80GB hard drive (drive 0) contains the Windows operating system, the DPM 2006 application, and all of the other essentials (I will talk more about this later on). These files are consuming 8GB of space on this drive. Obviously, this means that there is a lot of free space on the system drive. Technically, I probably could put some of that space to good use, but it is generally considered to be bad practice to store data on a server's operating system partition.

The server's second drive is used primarily to store the DPM 2006 index files. The index files are made up of a database and a log file. Together these files eat up about a gigabyte of disk space. Obviously, this leaves a massive amount of free space on the drive, but one of DPM 2006's restrictions is that the index files must be placed on a separate volume from the actual data.

Obviously, I don't want to waste a 250GB drive on 1GB worth of data, so I use the drive for something else. DPM 2006 is not Exchange aware. Since that is the case, I use the NTBACKUP program to back up my Exchange Server to the empty space on Drive 1. I still back up my Exchange Server to tape, but I keep a week's worth of backups stored on this drive. That way I've got two backups. If my house burns down, I would dig the tape out of the fireproof vault and use it to restore my Exchange Server. If my server just crashes though, it would be a lot quicker and easier to restore from the backup stored on my DPM server.

As you've probably already figured out, Disk 2 is where all of the data is located. DPM creates special volumes on the disk for data storage. The way that DPM works, there are three types of data that must be stored for each protected volume: replicas, shadow copies, and logs. Replicas are nothing more than copies of the data on the volume that you are protecting. You can synchronize replicas with your data either just before a shadow copy is taken or hourly. On my server, replicas are synchronized hourly. This means that if my file server crashed, I would never lose more than an hour's worth of data.

Shadow copies are taken of the replicas on a periodic basis. As you may recall, earlier I mentioned that my DPM 2006 server creates three backups a day (8:00 AM, 12:00 PM, and 6:00 PM). These three backups are actually shadow copy operations. Each shadow copy takes a snapshot of the current replica and makes the data contained within the snapshot eligible for restoration.
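As a rough illustration of what this schedule means at restore time, the following Python sketch works out which shadow copy would be the most recent one available at a given moment. The function name latest_shadow_copy and the example dates are my own inventions for illustration; only the three daily times come from my actual configuration.

```python
from datetime import datetime, time, timedelta

# The three daily shadow copy times used on my server.
SHADOW_COPY_TIMES = [time(8, 0), time(12, 0), time(18, 0)]


def latest_shadow_copy(now: datetime) -> datetime:
    """Return the most recent scheduled shadow copy at or before `now`.

    Hypothetical helper: it only models the schedule, not DPM itself.
    """
    today = [datetime.combine(now.date(), t) for t in SHADOW_COPY_TIMES]
    taken = [stamp for stamp in today if stamp <= now]
    if taken:
        return max(taken)
    # Before the first copy of the day, fall back to yesterday's last copy.
    yesterday = now.date() - timedelta(days=1)
    return datetime.combine(yesterday, max(SHADOW_COPY_TIMES))


afternoon = datetime(2005, 6, 1, 14, 30)   # example date, chosen arbitrarily
early_morning = datetime(2005, 6, 1, 7, 0)
print(latest_shadow_copy(afternoon))
print(latest_shadow_copy(early_morning))
```

A file deleted at 2:30 PM could therefore be restored as it existed at noon, while the hourly replica synchronization limits how much data a crash of the file server itself can cost.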

The logs are simply a mechanism that DPM 2006 uses to keep track of what has and hasn't been backed up. The logs are stored separately from all of the other data. Logs are actually stored on the volume that is being protected rather than on the DPM 2006 server's disks.

DPM 2006 creates two separate partitions on its data drive for each volume that is being protected. One partition is used for replicas and one is used for shadow copies. If you look at Disk 2 in Figure A, you can see that DPM 2006 has created eight partitions on my data disk: two for each of the four volumes that I am protecting.

Figure A

DPM 2006 creates two volumes for every volume that is being protected.

If you look at the top portion of Figure A, you will notice that some of my disk volumes don't have much free space left. So much disk space has been consumed because of the number of shadow copies that DPM 2006 keeps on file. I mentioned that my DPM 2006 server was creating three shadow copies a day, but the server is also retaining shadow copies from multiple days. If you look at Figure B, you will see that the selected volume has 21 days' worth of shadow copies available. There are actually 64 different shadow copies for that volume that I could restore data from.

This is great, because the volume that's selected in Figure B is the volume where most of my data resides. I love knowing that I can go back in time as far as 21 days if I need to restore a file. However, keep in mind that I also mentioned that I was adding data to this volume on a daily basis. As a matter of fact, guess which volume this article and the corresponding screen shots are being saved to.

Figure B

There are 21 days' worth of shadow copies available for the selected volume.

The reason why I mention this is to point out the fact that the amount of data stored on the protected volume is constantly growing and there is a finite amount of disk space allocated for protecting the volume. Something's eventually got to give.

In case you are wondering, the server has not actually reached a point at which disk space is a problem yet, even though available disk space is starting to run low. DPM 2006 has a limit of 64 shadow copies per volume (I think that Microsoft may have borrowed code from SharePoint, which has the same limit). The fact that my server actually has 64 shadow copies for this volume means that disk space hasn't yet become a problem for the volume. If disk space were to run low, DPM 2006 would make fewer shadow copies available for restoration. In doing so, it would free up disk space that it could use to accommodate the extra data.
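The relationship between the 64-copy limit, my three copies a day, and the 21 days shown in Figure B is simple arithmetic. The Python sketch below works it out and also models the idea of discarding the oldest copies first. The function names are hypothetical, and the assumption that the oldest copies are the ones dropped is mine; I have not confirmed exactly how DPM 2006 chooses which copies to give up.

```python
MAX_SHADOW_COPIES = 64  # per-volume limit described above


def retention_days(copies_per_day: int, max_copies: int = MAX_SHADOW_COPIES) -> int:
    """Number of full days of history that fit under the copy limit."""
    if copies_per_day <= 0:
        raise ValueError("copies_per_day must be positive")
    return max_copies // copies_per_day


def prune_oldest(copies: list, max_copies: int = MAX_SHADOW_COPIES) -> list:
    """Keep only the newest `max_copies` entries of a list ordered oldest first.

    Assumption: the oldest copies are the ones that are discarded.
    """
    if max_copies <= 0:
        return []
    return copies[-max_copies:]


full_days = retention_days(3)
print(f"{full_days} full days of history at three copies a day")
```

With three copies a day, 64 copies cover 21 full days with one copy to spare, which matches what Figure B shows. Calling prune_oldest with a smaller limit models what would happen if disk space ran low: the history gets shorter, but the most recent copies remain available.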

Just because your protected data is growing doesn't necessarily mean that you have to give up having 64 shadow copies though. If you look at Figure A, you will notice that there are some unallocated areas on my hard disk. That being the case, I could use the Modify Disk Allocation option, shown in Figure C, to allocate more space to the protected volume.

Figure C

You can allocate more space to a protected volume if necessary.

This brings up an interesting point though. If you look at Figure C, you will notice that the selected data size for the G:\ volume on Tazmania is 24.43GB. At the time that I set up DPM 2006, I had about 20GB of data on this volume. I told DPM 2006 that I had 24GB of data and it allocated 67.86GB of disk space for protecting the volume. Since that time though, the data on the protected volume has grown to about 50GB. Therefore, having only 67GB allocated to the volume is pretty ridiculous. It is only working without reducing the number of shadow copies available to me because of the static nature of my data.
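To put those numbers in perspective, here is a quick back-of-the-envelope calculation in Python. It treats the 67.86GB allocation as a single pool, which is a simplification on my part since DPM 2006 actually splits the space between a replica partition and a shadow copy partition, and I do not know the formula that DPM uses to size the allocation. The 50GB figure is my own approximation of the current data size.

```python
SELECTED_DATA_GB = 24.43   # data size reported in Figure C
ALLOCATED_GB = 67.86       # space DPM 2006 allocated at setup
CURRENT_DATA_GB = 50.0     # approximate size of the data today

# How generous the original allocation was relative to the data size.
allocation_ratio = ALLOCATED_GB / SELECTED_DATA_GB

# What is left over once a full copy of today's data is accounted for.
headroom_gb = ALLOCATED_GB - CURRENT_DATA_GB
headroom_pct = headroom_gb / ALLOCATED_GB * 100

print(f"Original allocation: {allocation_ratio:.2f}x the selected data size")
print(f"Remaining headroom: {headroom_gb:.2f}GB ({headroom_pct:.0f}% of the allocation)")
```

The allocation started out at nearly three times the size of the data, but now only about a quarter of it is left over for everything other than a single full copy.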

I can allocate more space to the volume, but doing so isn't as simple as the initial setup. If you look at Figure D, you will notice that you have to tell DPM 2006 how much space to allocate for replicas and how much space to allocate for shadow copies. The current values shown in the figure don't really make sense considering that I have a 50GB replica, but this is beta software.

Figure D

When allocating additional space, you must specify how much space is used for replicas and how much space is used for shadow copies.

For the most part, my hardware configuration works fine for my environment, but it would be completely impractical for a company of any size at all. If you are thinking about deploying DPM 2006 when it comes out, I strongly recommend springing for a RAID array with plenty of space. Rumor has it that you can also use a Storage Area Network to store replicas and shadow copies.

Software requirements

Before I conclude this article, I want to take a moment and talk about DPM 2006's software requirements. On the surface, the software requirements seem pretty standard (Windows, SQL Server, etc.). However, there are a lot of hidden prerequisites that must be installed prior to installing DPM 2006. In fact, this was by far the most difficult server application that I have ever had to install. Some of that difficulty was caused by the need for a lot of obscure patches though, and those patches will not be required (individually) when the software ships.

According to the Microsoft TechNet Web site, these are the prerequisites for installing DPM 2006:

  • Windows Server 2003 (Standard or Enterprise Edition), with Service Pack 1 or higher
  • Internet Information Services (IIS) 6.0
  • SQL Server 2000 (Standard or Enterprise) with Service Pack 4 or higher
  • SQL Server 2000 Reporting Services (Standard or Enterprise) with Service Pack 2 or higher

At first glance, this seems like a pretty reasonable list. Sure, you might have to download a few service packs, but the list itself doesn't seem too bad. There are some hidden requirements though. For example, SQL Server 2000 Reporting Services has some prerequisites of its own. One such prerequisite is Visual Studio .NET. Furthermore, DPM 2006 connects to SQL Server 2000 Reporting Services through a Web interface. One of the requirements for this is that IIS must support SSL encryption. This means that you will have to have an enterprise certificate authority that can issue the certificate to IIS.

I am telling you all this so that you can budget for DPM 2006 appropriately. In addition to the hardware cost and the cost of the DPM 2006 license (yet to be determined), you will need licenses for Windows Server 2003, SQL Server 2000, SQL Server 2000 Reporting Services, and Visual Studio .NET. You may also need another box and another Windows Server 2003 license if you don't currently have a certificate authority installed on your network.