When I was 13, I mowed lawns for an entire summer just to scrape together enough money to buy my first computer. At the time, floppy disk drives cost about $500, which was way out of my budget, so I decided to use a cassette tape player for data storage instead. It didn't take me long to realize that my tape drive was slow and unreliable. File read errors were so common that if I wrote a program I really wanted to keep, I'd have to make multiple backups on about a half dozen tapes.
If you think about it, though, the tape drives most companies use for backing up data aren't really all that different from what I was using almost 20 years ago. Sure, today's tape drives are digital instead of analog, and they're a whole lot faster than the tape drive I used back then, but the basic concept remains the same and is subject to the same problems I used to have.
I've often wondered if anyone else thinks that backing up data to tape is archaic and outdated, but tape backups seem like they're here to stay. Recently, however, Microsoft announced that it's attempting to modernize backups by creating a new product called Data Protection Server. This product is designed to build on top of existing Microsoft technologies, such as Volume Shadow Copy and the SharePoint services, while also addressing all of the deficiencies that exist in current backup techniques. Here's how it works.
What's Data Protection Server?
With Data Protection Server, data will be backed up to a dedicated server rather than to tape. This in itself addresses many of the faults related to tape backups. New technologies such as iSCSI (SCSI over IP or Internet SCSI) make it possible to save data to a remote server just as though it were local. Even so, bandwidth limitations could be a serious issue for companies attempting to back up large quantities of data. Of course, all of this is speculation. Microsoft has yet to release details about whether Data Protection Server will support long-distance backups.
There are a lot of details about Data Protection Server that Microsoft has yet to release. Among those details are the licensing fee and the official release date. So far, the only date I've been able to get is a statement from Microsoft that the open evaluation will begin in early 2005. What Microsoft is presently lacking in release details, however, it has made up for in technical details. As I explained earlier, Data Protection Server is designed to address the shortcomings of traditional tape backup. In the sections below, I'll explain each of the various shortcomings and how Data Protection Server will help to eliminate them.
Shrinking backup windows
Several years ago, I was a network manager for a large insurance company. Normally, the company had the usual 9 to 5 business hours (which were usually more like 7 to 7). At any rate, the fact that the building would be virtually empty by 8 P.M. meant that I had plenty of time to perform each night's backups.
During the open enrollment season, the company turned into a 24-hour-a-day operation. Sure, most of the employees went home at 5, but there were hundreds of temps doing data entry all night long. The only way we were able to do the nightly backup was to schedule the open enrollment database backup to run during the temps' lunch hour, and then back everything else up later.
Although this was a good plan, it wasn't perfect. The backup would fail if someone accidentally stayed logged in to the database during lunch, or if the temps were late leaving for lunch. The problem was also compounded by an upper management staff that understood the need for a backup, but was reluctant to lose an hour of productivity each night.
You have to remember that this occurred quite a few years ago. Back then, we were backing up only a couple of GB of data each night. Today, storage space is cheaper and data files tend to be much more bloated. Because of this and government regulations such as Sarbanes-Oxley, the volume of information being backed up each night is increasing exponentially.
Think about it for a minute. There used to be a time when it was common practice to weed out old files in order to conserve storage space on the servers. Today, however, hard drives are so cheap and so large that almost no one weeds out old files anymore. After all, from the company's perspective, it's usually cheaper to add more disk space to a server than to have employees spend time figuring out which files they may never need again. Besides, employees could potentially delete something useful or something that is required by federal law to be retained, so it's best not to weed out files.
The point is, each week you're probably backing up more data. The current solution is to use incremental and differential backups during the week to decrease the amount of data that's being backed up. The problem is that incremental and differential backups don't do you much good if a tape in the series is lost or damaged. Even though incremental or differential backups save time, saving time isn't much help if management's philosophy is that there's no good time for you to run a backup.
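To see why a lost tape hurts, it helps to look at which tapes a restore actually needs. The Python sketch below is a simplified model, not a description of any particular backup product. The function name `tapes_needed`, the day numbering, and the one-tape-per-day schedule are all assumptions made for illustration.

```python
def tapes_needed(schedule, restore_day, strategy):
    """Return the tapes (by day index) required to restore to restore_day.

    schedule: list of "full" or "partial" entries, one tape per day.
    strategy: "incremental" (each partial holds changes since the previous
              backup of any kind) or "differential" (each partial holds
              changes since the last full backup).
    """
    if strategy not in ("incremental", "differential"):
        raise ValueError("unknown strategy: " + strategy)
    # Find the most recent full backup on or before restore_day.
    last_full = None
    for day in range(restore_day, -1, -1):
        if schedule[day] == "full":
            last_full = day
            break
    if last_full is None:
        raise ValueError("no full backup available before this day")
    if restore_day == last_full:
        return [last_full]
    if strategy == "incremental":
        # Every partial since the full backup is required, in order.
        return list(range(last_full, restore_day + 1))
    # Differential: the full backup plus only the latest partial.
    return [last_full, restore_day]


# Hypothetical week: full backup on day 0, partial backups on days 1 to 4.
week = ["full", "partial", "partial", "partial", "partial"]
print(tapes_needed(week, 4, "incremental"))   # [0, 1, 2, 3, 4]
print(tapes_needed(week, 4, "differential"))  # [0, 4]
```

In this model, an incremental restore depends on every tape in the chain, so one bad tape breaks it. A differential restore needs fewer tapes, but it still fails if the full backup tape is the one that's missing.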
The open file issue
As you probably know, the reason you can't just run a backup anytime you feel like it is that if a file or a database happens to be open at the time of the backup, it will generally be skipped. If you have people working all night long or if you have people who stay logged in all night with files open, it means that your nightly backups are probably incomplete.
Microsoft has actually made great strides over the last few years in helping administrators back up open files. For example, Exchange Server uses transaction logs in a way that allows the information store to be backed up even if the server is active. More recently, Microsoft has incorporated the Volume Shadow Copy Service into Windows. Shadow copies of a file can be backed up even if the file itself is in use.
Microsoft has based Data Protection Server on similar technologies. This means that you can create a backup anytime you want without having to worry about files and databases being open. You can now create backups without working around everyone's production schedule. No more having to deal with a specific backup window.
Frequency of backups
The frequency at which your company backs up data is usually directly related to the backup windows and to the volume of information being backed up. What if you could eliminate those two restrictions? Would you back up data more often than once a night?
Suppose your backup runs between midnight and 3 A.M. daily. One day, you come into the office and create a new file at 9 A.M. You spend the entire day working on the file, but at 4:30, the server crashes. You're out of luck because the file hasn't been backed up yet. Unless you can use Microsoft Office's Automatic Document Recovery feature, your file is gone.
That's one of the biggest problems with the typical backup plan. Files do not get backed up until many hours after they're created. A disk crash could end up costing your company a full day's worth of data.
Data Protection Server allows you to create backups much more frequently. For example, you could theoretically run a backup operation every hour or every two hours. That would accomplish two things. First, if you're backing up data every hour, your company would lose at most an hour's worth of data in the event of a critical failure, rather than a whole day's worth.
Second, frequent backups allow you to maintain multiple versions of a file. For example, suppose you've been working on a spreadsheet all afternoon, but you find out that someone has given you some bad data, and everything you've entered in the last two hours is wrong. Rather than having to trash the file or revert to last night's backup, you can access the backup made two hours ago and revert to the file as it existed just prior to your entering all that bad data.
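The first of those two benefits is easy to put numbers on. The Python sketch below reruns the earlier scenario (a file created at 9 A.M. and a crash at 4:30 P.M.) against a nightly and an hourly schedule. The calendar date is arbitrary, the helper name `last_backup_before` is made up for this example, and backups are assumed to complete instantly at their scheduled time, which real backups don't.

```python
from datetime import datetime, timedelta


def last_backup_before(crash, first_backup, interval):
    """Return the time of the most recent scheduled backup at or before crash."""
    if crash < first_backup:
        raise ValueError("no backup has run yet")
    completed = (crash - first_backup) // interval   # whole intervals elapsed
    return first_backup + completed * interval


created = datetime(2005, 1, 10, 9, 0)        # file created at 9 A.M.
crash = datetime(2005, 1, 10, 16, 30)        # server fails at 4:30 P.M.
midnight = datetime(2005, 1, 10, 0, 0)       # assumed start of the schedule

nightly = last_backup_before(crash, midnight, timedelta(days=1))
hourly = last_backup_before(crash, midnight, timedelta(hours=1))

# Work is lost from the later of (file creation, last backup) until the crash.
print(crash - max(nightly, created))   # 7:30:00
print(crash - max(hourly, created))    # 0:30:00
```

Under these assumptions, the nightly schedule loses the whole day's work on the file, while the hourly schedule loses only the half hour since the 4 P.M. backup.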
The first time I heard someone suggest hourly backups using Data Protection Server, my immediate thoughts were that there was no way to ever do hourly backups because there's too much data to back up. The amount of data being backed up would rob the network of all its bandwidth and would quickly fill up the backup server. However, frequent backups are what make Data Protection Server really shine.
Aside from the open file issue, the main reason you can't perform frequent backups today is the inefficient manner in which Windows backs up files. If you changed one single byte of data in a 100-MB file, Windows would have to back up the entire file.
Data Protection Server is designed differently, though. It starts out by making a normal backup of each file. It can then store up to 64 different versions of the file (just like SharePoint). Rather than storing complete copies of the file, only the bits of data that have changed since the last backup are backed up. This saves an astronomical amount of disk space and greatly reduces the time that it takes to run the backup.
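Microsoft hasn't published how Data Protection Server detects changed data, so the following Python sketch shows only the general idea of storing changed blocks rather than whole files. The fixed 4-KB block size, the use of SHA-256 hashes to compare blocks, and the function names are all assumptions chosen for illustration. The demo uses a 1-MB file rather than a 100-MB one to keep it quick.

```python
import hashlib

BLOCK_SIZE = 4096   # assumed block size, chosen only for illustration


def split_blocks(data):
    """Split bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def changed_blocks(old, new):
    """Return {block_index: block_bytes} for blocks of new that differ from old."""
    old_hashes = [hashlib.sha256(b).digest() for b in split_blocks(old)]
    delta = {}
    for index, block in enumerate(split_blocks(new)):
        is_new = index >= len(old_hashes)
        if is_new or hashlib.sha256(block).digest() != old_hashes[index]:
            delta[index] = block
    return delta


def apply_delta(old, delta, new_length):
    """Rebuild the newer file from the older file plus the stored delta."""
    blocks = split_blocks(old)
    for index, block in sorted(delta.items()):
        if index < len(blocks):
            blocks[index] = block
        else:
            blocks.append(block)
    # Truncate in case the newer file is shorter than the older one.
    return b"".join(blocks)[:new_length]


original = bytes(1024 * 1024)            # 1 MB of zero bytes
edited = bytearray(original)
edited[500_000] = 1                      # change a single byte
edited = bytes(edited)

delta = changed_blocks(original, edited)
print(len(delta), "block(s) stored instead of", len(split_blocks(edited)))
restored = apply_delta(original, delta, len(edited))
print(restored == edited)
```

A one-byte change costs one stored block here instead of a full copy of the file. A real product would also have to handle inserted or shifted data, which this fixed-block comparison does not.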
Even though only small fragments of changed files are being backed up in most cases, it's still possible that large organizations could be backing up a lot of data each hour. Fortunately, Data Protection Server contains a bandwidth limitation feature that prevents it from consuming all of the network's bandwidth. Administrators can control how much network bandwidth Data Protection Server is allowed to use, thus ensuring that plenty of bandwidth is available for everyone else.
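Microsoft hasn't said how this bandwidth cap is implemented. One common way to cap a transfer's average throughput is to pause whenever the sender gets ahead of the allowed rate, and the Python sketch below shows that approach. The `Throttle` class, its parameters, and the 1-MB-per-second cap are hypothetical. The demo swaps in a fake clock so it runs instantly.

```python
import time


class Throttle:
    """Limit average throughput to max_bytes_per_second (hypothetical helper)."""

    def __init__(self, max_bytes_per_second, clock=time.monotonic, sleep=time.sleep):
        self.rate = max_bytes_per_second
        self.clock = clock
        self.sleep = sleep
        self.start = clock()
        self.sent = 0

    def wait_for(self, nbytes):
        """Record nbytes as sent, pausing if we are ahead of the allowed rate."""
        self.sent += nbytes
        earliest_finish = self.start + self.sent / self.rate
        delay = earliest_finish - self.clock()
        if delay > 0:
            self.sleep(delay)
        return max(delay, 0.0)


# Demo with a fake clock so the example does not actually sleep.
now = [0.0]


def fake_clock():
    return now[0]


def fake_sleep(seconds):
    now[0] += seconds


throttle = Throttle(1_000_000, clock=fake_clock, sleep=fake_sleep)  # 1 MB/s cap
for _ in range(5):
    throttle.wait_for(500_000)   # "send" five 500-KB chunks
print(now[0])                    # simulated seconds elapsed
```

With the default arguments, the same class would use the real clock and really sleep between chunks, which is how a backup job would leave the rest of the link free for everyone else.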
Limitations of backup media
Another way that Data Protection Server trumps normal backups is that it doesn't use tapes. Tapes are expensive (especially if you retain lots of archives). Tapes also tend to be less than reliable. A March 2004 study by the Yankee Group indicated that 42 percent of those surveyed had failed to restore data due to a bad tape.
Still another problem with backup tapes is that they tend to never be around when you need them. Everywhere I've ever worked, the disaster recovery policy dictated that the most recent backup be stored off-site. This meant that if someone needed to restore a file from the most recent backup, the restore couldn't be initiated until the tape was retrieved. In some cases, this required driving to a satellite office or driving home to get the tape. In other cases, it required paying an expensive courier service to deliver the tape. In any case, waiting on a tape is very inconvenient and often expensive.
The advantage of Data Protection Server is that there generally are no tapes. Up to 30 days' worth of backups can be stored on the server's hard drives (although information can be committed to tape for archival purposes). Since servers usually contain fault-tolerant hard drives, you don't have to worry about having a bad tape or having to go get a tape that is stored off-site. Saving backups to a server's hard disks also means that you're no longer limited to the capacity of a data tape. Backups that used to span multiple tapes can now be performed without your having to switch tapes.
Duration of restoration
Another area in which traditional backups have deficiencies is in the restore operation. What happens if a user asks you to restore some fairly insignificant file from last Tuesday? First, you have to use complex backup software to figure out whether the file was actually backed up on Tuesday. Next, you have to schedule the restore operation and retrieve the necessary tape. If you're using incremental or differential backups, you may have to retrieve multiple tapes. Once you've located the file and tapes, the restore operation can begin.
This is where the next problem comes in. Data tapes use what's known as sequential access. This means that files are saved to the tape in sequence. There might be 500,000 files saved on the tape prior to the file you're trying to restore. With a tape, there's no way of going directly to the necessary file. The tape has to scroll past the other files until it finally reaches the file you need. This is a huge waste of time, especially if you're trying to restore an incremental or a differential backup and have to swap tapes a few times.
With Data Protection Server, the restore issue is greatly simplified and made more efficient. First, you don't have to look for backup tapes because all previous backups are stored in a central location. Second, there's no waiting for a tape to scroll to the correct file because Data Protection Server is disk-based. Disks use random access, which means that you can directly access any file on the volume without having to read other files first. Third (and this is the biggie), users don't have to come to you if they want to restore a file. They can very easily restore files themselves, directly from their workstation. As you can see, Data Protection Server takes virtually all of the time and effort out of file restorations.
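The gap between sequential and random access can be sketched with a toy model in Python. A list stands in for the tape and a dictionary stands in for a disk's file index. The file names and the "reads" counted here are illustrative assumptions, not measurements of real hardware.

```python
def tape_reads(tape, wanted):
    """Sequential access: count the files that pass the head until wanted is found."""
    for position, name in enumerate(tape, start=1):
        if name == wanted:
            return position
    raise KeyError(wanted)


def disk_reads(index, wanted):
    """Random access: one index lookup leads straight to the file."""
    index[wanted]          # raises KeyError if the file is missing
    return 1


# 500,000 files sit on the tape ahead of the one we want.
files = ["file%06d" % n for n in range(500_001)]
index = {name: offset for offset, name in enumerate(files)}

print(tape_reads(files, "file500000"))  # 500001
print(disk_reads(index, "file500000"))  # 1
```

The tape model has to pass every earlier file before it reaches the target, while the disk model goes straight to it no matter where the file sits on the volume.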
Data protection done right
Data Protection Server will revolutionize the way backups are performed, but I don't think it will completely replace tapes. The product can store only 30 days' worth of archives. Organizations requiring longer retention periods will likely find themselves backing up Data Protection Server itself to tape. I'm concerned about what would happen to a company's data if Data Protection Server doesn't end up supporting off-site storage. Since the product hasn't entered public beta testing yet, we'll just have to wait and see what options it actually allows.