SolutionBase: Make image-based backups of your servers

You can use these tricks to supplement your existing backup mechanisms.

Having a strong, multidimensional approach to backing up and restoring your servers is an important part of any disaster recovery strategy. Other TechRepublic articles have touched on some of the topics I mention here. In this article, I go into greater detail on some of the technical topics and provide some unique tips. Basically, I'm going to provide a collection of powerful, low-cost server backup and restore practices that can augment your current strategy.

Word of warning
Be aware that some of these solutions are not supported by the corresponding hardware or software vendors, and they can destroy data. I strongly recommend using these suggestions only in environments where your existing disaster recovery mechanism is already in place and trusted. Further, do not use these tricks until you have become very comfortable with them on test equipment and test data. I have used all of the examples on HP ProLiant equipment with good results.

Imaging a server
Depending on your drive configuration, you may be able to image your server using popular imaging software like Symantec Ghost. If your server uses a RAID array, the drive configuration makes a big difference in how you can use Ghost. Suppose a single large array holds both your operating system and your server data. In that case, Ghost may not be a good idea, because it takes an image of the entire system's contents and places it in one file. On systems with extremely large amounts of data, this may not be feasible.

Ghost does do a good job of compressing certain file types, like databases. The bigger win comes from splitting your storage. Suppose your operating system sits on one array (a 5-GB drive) and a large amount of data, or your server application, sits on another array (533 GB). You can then image just the operating system and get a much more manageable image file that fits on a backup tape or a network share. Also keep in mind that Ghost compresses native Windows file systems best, so images of dual-boot or Linux systems may not compress as well.
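Some rough arithmetic shows why imaging only the operating system array matters. The sketch below treats each array as completely full (the worst case) and divides by a compression ratio. The 2:1 ratio and the 40-GB target capacity are hypothetical values chosen for illustration. They are not Ghost figures, and actual compression varies widely with file types, so measure it on your own data.

```python
def estimated_image_gb(used_gb: float, compression_ratio: float) -> float:
    """Rough image size: space in use divided by an assumed compression ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return used_gb / compression_ratio


def fits_target(used_gb: float, compression_ratio: float, capacity_gb: float) -> bool:
    """True if the estimated image fits on the backup tape or network share."""
    return estimated_image_gb(used_gb, compression_ratio) <= capacity_gb


OS_ARRAY_GB = 5        # operating system array from the example above
DATA_ARRAY_GB = 533    # data/application array from the example above
ASSUMED_RATIO = 2.0    # hypothetical compression ratio; measure your own
TARGET_GB = 40         # hypothetical tape or share capacity

if __name__ == "__main__":
    # Worst case: treat each array as completely full.
    for label, used in [("OS array only", OS_ARRAY_GB),
                        ("whole system", OS_ARRAY_GB + DATA_ARRAY_GB)]:
        size = estimated_image_gb(used, ASSUMED_RATIO)
        fits = fits_target(used, ASSUMED_RATIO, TARGET_GB)
        print(f"{label}: ~{size:.1f} GB image, fits: {fits}")
```

Under these assumptions, the OS-only image comes to about 2.5 GB. Imaging the whole system produces about 269 GB, which no longer fits the hypothetical target.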

The current version of Ghost lets you back up and restore individual partitions, as long as the partitioning was done within the operating system. For example, say you have a single 18-GB RAID 1 array and configured Windows 2000 with a C:\ drive for the operating system and a D:\ drive for data. You can now use Ghost to back up just C:\ or just D:\. This capability is new in Ghost.

It is important to note that Symantec clearly states that Ghost is not supported on RAID array configurations. Even so, I have never had an issue using Ghost to back up RAID 0, RAID 1, RAID 0+1, or RAID 5 arrays. Those arrays held Windows (NT, 2000, Server 2003), NetWare (3 and 4), and Linux (Mandrake) file systems.

Using Ghost (the corporate version for servers) is generally pretty easy and somewhat similar to using the popular desktop version. I have run it in DOS from a bootable floppy disk or CD, and plenty of resources on the Internet explain how to build a good bootable CD or floppy for your equipment. When building a DOS disk, the major differences you will encounter are in the drivers:

- tape drive: an ASPI driver, such as ASPIxxx.SYS
- network: the MSLANMAN client
- CD-ROM drive: MSCDEX
- array controller: some controllers, such as IBM eServer ServeRAID, need a DOS driver loaded

Once you have a bootable disk and run Ghost locally or over the network, you can image your server to a file on the network or to tape, or burn the image to CD or DVD (with Ghost 8.0).

I have had similar success with PowerQuest DriveImage, which is now part of Symantec.

Combination of traditional mechanisms
Many organizations use online backup tools like Backup Exec or BrightStor for day-to-day backup and restore operations in Windows environments. You may already use one of these products to back up your data to tape daily or periodically. If so, I have had good results supplementing that operation with a periodic, automated NTBackup script (NTBackup is built into Windows).

NTBackup does a good job of copying nontransactional data to tape or to a file. That includes user directories, and databases whose engine has been stopped. This helps if your tape drive has failed, your primary mechanism is not restoring correctly, or the media has failed and you need to restore data. With a supplemental periodic NTBackup, you have one more level to fall back on and can retrieve your data more quickly. The data may be older than what your primary backup holds, but it may be just what you need to resolve the situation. If you do this, be sure to use a different storage target. For example, if your Backup Exec job writes to tape, send your NTBackup script's output to a file on a different system.
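Here is a minimal sketch of such a supplemental job, written in Python and suitable for scheduling with the Windows Task Scheduler. It stops a database service for a cold copy and runs NTBackup to a .bkf file. It then restarts the service even if the backup fails. It also refuses to write into the primary job's disk target. The service name, job name, and paths are hypothetical placeholders. The ntbackup switches used here are /J (job name), /F (back up to file), /M normal (full backup), and /V:yes (verify). They come from the Windows 2000/2003 command-line syntax, but confirm them with `ntbackup /?` on your own servers.

```python
import ntpath
import subprocess
from typing import Dict, List, Optional

Plan = Dict[str, Optional[List[str]]]


def build_ntbackup_plan(source: str, job_name: str, bkf_path: str,
                        primary_target: Optional[str] = None,
                        db_service: Optional[str] = None) -> Plan:
    """Build the commands for one supplemental NTBackup run.

    primary_target: where the primary job writes, if it writes to disk.
    db_service: hypothetical name of a database service to stop first.
    """
    if primary_target:
        # Compare case-insensitively, as Windows paths are.
        dest = ntpath.normcase(bkf_path)
        primary = ntpath.normcase(primary_target).rstrip("\\")
        if dest == primary or dest.startswith(primary + "\\"):
            raise ValueError("supplemental backup must use a different storage target")
    stop = ["net", "stop", db_service] if db_service else None
    start = ["net", "start", db_service] if db_service else None
    backup = ["ntbackup", "backup", source, "/J", job_name, "/F", bkf_path,
              "/M", "normal", "/V:yes"]
    return {"stop": stop, "backup": backup, "start": start}


def run_plan(plan: Plan) -> None:
    """Run the plan on a Windows host; restart the service even if backup fails."""
    if plan["stop"]:
        subprocess.run(plan["stop"], check=True)
    try:
        subprocess.run(plan["backup"], check=True)
    finally:
        if plan["start"]:
            subprocess.run(plan["start"], check=True)


if __name__ == "__main__":
    # Print the commands only; call run_plan() on the server itself.
    demo = build_ntbackup_plan(r"D:\Data", "Supplemental", r"\\backup2\bkf\data.bkf",
                               primary_target=r"\\backup1\primary",
                               db_service="ExampleDbService")
    for step in ("stop", "backup", "start"):
        print(step, demo[step])
```

Returning the commands rather than running them immediately makes it easy to review the exact command lines before the first scheduled run.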

The downside of a supplemental backup is that you have more data to manage and greater storage requirements. If resources are a concern, you could use the supplemental mechanism only for the most critical data. Of the options I cover, this method is also the safest and easiest to implement. Plus, it is a supported operation, since NTBackup is a native Windows tool.

"RAID backup"
If you are using certain RAID configurations on your servers, you may be able to do a "RAID backup." This is by far the most dangerous technique I describe. Before you touch anything, make sure you pay the utmost attention to detail and have an absolute understanding of your configuration. The technique is sometimes called "pulling a drive," because it involves taking one drive out of an array and putting it aside as a backup.

I would only recommend doing this on a RAID 0+1 or RAID 1 configuration with two physical drives, and only with servers that have exactly the same configuration. This configuration works because the two drives mirror each other, so either drive can fail and the system keeps running. In fact, the failure is usually transparent to the operating system, although performance degrades while the array rebuilds. These two drives also contain the basic RAID configuration data for the array.

Pull the drive only when the server is powered off or, in a hot-swap scenario, in a zero-transaction state. A good example of a zero-transaction state is the Windows startup menu you reach by pressing F8 at boot time. Until you make a selection at this menu, very little is being written to disk. If you pull a drive from your mirror while the operating system is loaded, you are essentially backing up the disk as it would look after an abnormal system termination or crash.
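A toy model helps show why a drive pulled from a running system is only crash-consistent. The `ToyMirror` class below is hypothetical and does not model any real controller. It writes each block to both mirror members in turn, and it simulates a two-block update, such as a record and its index, that is interrupted when the drive is pulled. On a real server, it is worse still, because data sitting in the operating system's cache never reaches either drive.

```python
from typing import List


class ToyMirror:
    """Hypothetical RAID 1 model: each block write lands on both drives in turn."""

    def __init__(self, blocks: int) -> None:
        self.drives: List[List[str]] = [["old"] * blocks, ["old"] * blocks]
        self.present = [True, True]

    def write_block(self, index: int, value: str) -> None:
        for slot, drive in enumerate(self.drives):
            if self.present[slot]:
                drive[index] = value

    def pull(self, slot: int) -> List[str]:
        """Remove a drive and return a copy of what it holds at that instant."""
        self.present[slot] = False
        return list(self.drives[slot])


def pull_when_idle() -> List[str]:
    # Nothing in flight, like sitting at the F8 startup menu.
    return ToyMirror(2).pull(1)


def pull_mid_update() -> List[str]:
    # A two-block update is half written when the drive is pulled.
    mirror = ToyMirror(2)
    mirror.write_block(0, "new")
    pulled = mirror.pull(1)
    mirror.write_block(1, "new")  # the second half reaches only the remaining drive
    return pulled


if __name__ == "__main__":
    print(pull_when_idle())   # ['old', 'old']: consistent
    print(pull_mid_update())  # ['new', 'old']: half-applied, like a crash
```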

Once you pull a drive from the RAID pair, you can either hold on to it as a backup of the system or install it in another system with exactly the same hardware configuration to serve as a backup machine. To rebuild the now-degraded array on the first machine, you can insert another drive into the vacated slot, but only after the array has been initialized. Do not let the controller initialize an array that has one drive missing while a foreign drive is present, because the results are unpredictable. Initialize the array with one drive missing, then insert the new disk into the empty slot, and the array will rebuild onto that disk.

It is also important to maintain drive slot assignments. For example, suppose your RAID 1 array occupies drive positions 0 and 1, and you move the drive from position 0 to the backup machine, which has the same configuration and an empty drive cage. Install the drive in position 0 of that machine, power it up, initialize the array with one failed drive, and continue.
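The order of operations above can be sketched as a small state model. `ToyRaid1` is a hypothetical class written only to illustrate the ordering rules. It does not reproduce the behavior of HP Smart Array, Dell PERC, or any other real controller. In this model, drives are pulled only while the machine is powered off. Initializing with a mismatched (foreign) drive present is refused. A drive inserted after a degraded array comes online is rebuilt from the surviving member.

```python
from typing import Dict, List, Optional


class ToyRaid1:
    """Hypothetical two-slot RAID 1; illustrates ordering, not a real controller."""

    def __init__(self, contents: Optional[List[str]] = None) -> None:
        # Both slots hold the same mirrored contents, or the cage starts empty.
        self.slots: Dict[int, Optional[List[str]]] = {
            0: list(contents) if contents is not None else None,
            1: list(contents) if contents is not None else None,
        }
        self.online = False  # starts powered off

    def pull(self, slot: int) -> List[str]:
        """Remove a drive; in this toy, only while powered off."""
        if self.online:
            raise RuntimeError("pull drives only while powered off in this toy")
        drive = self.slots[slot]
        if drive is None:
            raise ValueError(f"slot {slot} is empty")
        self.slots[slot] = None
        return drive

    def power_on(self) -> None:
        """Initialize the array; a degraded (one-drive) array is allowed."""
        drives = [d for d in self.slots.values() if d is not None]
        if not drives:
            raise RuntimeError("no drives present")
        if len(drives) == 2 and drives[0] != drives[1]:
            raise RuntimeError("foreign drive present at initialization")
        self.online = True

    def insert(self, slot: int, drive: List[str]) -> None:
        """Insert a drive; if the array is already online, rebuild onto it."""
        if self.slots[slot] is not None:
            raise ValueError(f"slot {slot} is occupied")
        self.slots[slot] = drive
        survivor = self.slots[1 - slot]
        if self.online and survivor is not None:
            drive[:] = survivor  # rebuild: copy the surviving member's contents


if __name__ == "__main__":
    primary = ToyRaid1(["os", "data"])
    shelf_copy = primary.pull(0)    # take the slot-0 drive while powered off
    primary.power_on()              # initialize with one drive missing
    replacement: List[str] = []
    primary.insert(0, replacement)  # insert the new disk after initialization
    print(replacement)              # ['os', 'data'] after the rebuild

    backup = ToyRaid1()             # identical hardware, empty drive cage
    backup.insert(0, shelf_copy)    # keep the slot assignment: position 0
    backup.power_on()               # initialize with one failed drive
```

Reversing the order in the model, by inserting a blank drive and then powering on, makes `power_on()` raise an error. That stands in for the unpredictable results a real controller may produce.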

Different array controllers report status back to the user in the BIOS in different ways. For example, HP Smart Array controllers ask whether you want to enable “Interim Recovery Mode” to continue with a drive missing from the array. Before you attempt a RAID backup, be 100 percent familiar with your array controller’s behavior. HP Smart Array controllers behave consistently, and I have become accustomed to their conventions. For example, the status light on a hot-spare drive is off, and a rebuilding drive’s light blinks green. Dell PERC controllers also allow a RAID backup, but moving the array to a new system requires entering the controller BIOS.

Some administrators like to keep a hard drive "on the shelf" as a complete copy of the RAID array. If a major failure occurs on that system, they can install the drive in hardware with the same configuration for a recovery operation.

A problem scenario
It is important that I describe a configuration I have had problems with, and I would advise against doing a RAID backup on any similar system. The configuration involved HP equipment with two or more separate RAID 1 arrays, each presented as its own logical drive, all on the same physical channel or drive cage. Do not attempt a RAID backup in this configuration, because RAID integrity issues will likely arise. Although the logical drives are separate, they are "aware" of each other’s presence through the array configuration information stored on every drive. (This can be explained more precisely at the Smart Array controller level, but that is beyond the scope of this article.)

Linux bootable CD
Booting a Linux CD distribution to reach the file system of a locked-up machine is a common way to repair that file system. Knoppix is among the most robust and popular CD-based distros. If you are familiar with Linux or UNIX, you can use dd or tar to copy your data to another network resource or even to a local tape drive. This method takes a little more finesse. It differs from the other options I have presented, and Windows administrators who don’t already know Linux or UNIX will face a steeper learning curve.
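On Knoppix, you would normally use dd itself. As an illustration of what a dd-style copy does, here is a Python sketch. It assumes a Python 3 interpreter is available on the boot CD, which you should check. The sketch reads the source in fixed-size blocks, as `dd bs=1M` would, and writes them to a gzip-compressed image. It records a SHA-256 checksum so you can verify the image later. This is a swapped-in illustration, not the author's procedure.

```python
import gzip
import hashlib


def image_device(source: str, dest_gz: str, block_size: int = 1024 * 1024) -> str:
    """Copy source (an unmounted partition or a file) into a gzip image,
    one block at a time, and return the SHA-256 of the uncompressed data."""
    digest = hashlib.sha256()
    with open(source, "rb") as src, gzip.open(dest_gz, "wb") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:  # end of device or file
                break
            digest.update(chunk)
            dst.write(chunk)
    return digest.hexdigest()


def verify_image(dest_gz: str, expected_sha256: str,
                 block_size: int = 1024 * 1024) -> bool:
    """Decompress the image and compare it with the recorded checksum."""
    digest = hashlib.sha256()
    with gzip.open(dest_gz, "rb") as img:
        for chunk in iter(lambda: img.read(block_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```

A call might look like `image_device("/dev/sda1", "/mnt/backup/sda1.img.gz")`, run as root. Both paths are hypothetical. Because the server is booted from the CD, the source partition is not in use, which keeps the image consistent.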

With Knoppix, you can boot from the CD and get the full complement of Linux file system tools, along with possibly the best "plug-and-play on first boot" configuration of any CD-based distro. However, depending on your server hardware, not every device may be available from the Knoppix CD. A common arrangement with two servers on the network is for each server to back up part of its data to the other over the network. Again, this would supplement your primary backup and restore mechanism.
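For the two-server arrangement, here is a sketch of the tar side, written with Python's standard `tarfile` module. It writes a timestamped, gzip-compressed archive of a data directory into a directory on the peer server. The sketch assumes the peer's share is already mounted (over NFS or SMB, for example). The mount point and host label are hypothetical. As a safety check, it refuses to write the archive inside the directory being archived.

```python
import os
import tarfile
from datetime import datetime


def archive_to_peer(data_dir: str, peer_mount: str, host_label: str) -> str:
    """Write a .tar.gz of data_dir into peer_mount (assumed to be a mounted
    share on the other server) and return the archive's path."""
    data_dir = os.path.abspath(data_dir)
    peer_mount = os.path.abspath(peer_mount)
    # Writing the archive inside the tree being archived would recurse.
    if os.path.commonpath([data_dir, peer_mount]) == data_dir:
        raise ValueError("destination is inside the directory being archived")
    if not os.path.isdir(peer_mount):
        raise FileNotFoundError(f"{peer_mount} is not an available directory")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(peer_mount, f"{host_label}-{stamp}.tar.gz")
    with tarfile.open(dest, "w:gz") as tar:
        tar.add(data_dir, arcname=os.path.basename(data_dir))
    return dest
```

Each server would run the same function with the roles reversed. For example, server A could call `archive_to_peer("/srv/data", "/mnt/serverb", "server-a")`, again with hypothetical paths.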

Closing thoughts
Eventually, some of these tricks may cease to be viable. For example, my Ghost procedure may not stay viable for long, because I am not sure how much longer DOS drivers and DOS compatibility will be available for server-class equipment. Remember that it is critically important to fully understand your current configuration before trying any of these techniques, which can compromise your system configuration. From a support point of view, most of these are unsupported and dangerous operations.

It is also important to keep in mind that these mechanisms are meant to supplement your primary backup and recovery strategies. In other words, they provide additional insurance and protection, not a replacement. Still, that additional protection could mean the difference between being the hero and being the goat for some administrators.