
Step-By-Step: Using rsync to bolster redundancy and off-site backup storage

How to put the open source tool rsync in place


Most backups use some form of storage medium: tape, CD-R, RAID arrays, or other computers. Soon, DVD writers will reach a more attractive price range, and another form of large storage will be available. You can even use smaller media like Zip, Jaz, or ORB removable disks. There's nothing wrong with any of these backup solutions, and, in most cases, relying on just one is sufficient for a home user. For a company or enterprise, though, relying on a single backup solution isn't ideal. For example, a RAID array doesn't do much good if both drives die, and a CD-R drive alone is no help if there's physical damage (fire, flood, etc.) or theft.

The best solution is to implement a number of backup solutions. Off-site backups are even better. Here, I'll look at an alternative to "conventional" backups to implement a little redundancy and off-site backup storage.

I’ll explain the open source tool rsync and use it to perform a multilevel backup operation. In this scenario, we’ll have a Web and mail server and we’ll use rsync to perform a daily backup to another system on the internal LAN. We’ll look at a quick-and-easy script to perform a weekly backup to a CD, as well. This gives us a multilevel backup solution: daily on a remote system, weekly onto a CD-R. The combinations are endless when you use a tool like rsync; you can have a computer half the world away store your backups by doing encrypted logins and rsync backups over the Internet.

Making rsync work for you
The first step is to have rsync installed on both computers. Most Linux distributions come with rsync and install it by default. You can determine the presence of rsync on any Linux system by using the following command:
# which rsync
/usr/bin/rsync


The which command will return the full path to the rsync binary if it’s on the system and in your PATH. The following script, which would be run from cron nightly, will provide a good starting point for your backups:
#!/bin/sh
# Pull files from the remote host, preserving permissions, ownership,
# symlinks, and timestamps, and delete local files that no longer
# exist on the remote. Use ssh as the transport.
RSYNC="/usr/bin/rsync -ar --delete -l -t -e ssh"

# Make sure the local destination tree exists.
mkdir -p /backup/server/var/lib

cd /backup/server
$RSYNC bkuser@server:/etc .
$RSYNC bkuser@server:/var/lib/mysql var/lib
$RSYNC bkuser@server:/var/lib/rpm var/lib
$RSYNC bkuser@server:/var/qmail var
$RSYNC bkuser@server:/var/djbdns var
$RSYNC --exclude=ftp --exclude=logs bkuser@server:/home .


This script simply sets the $RSYNC variable to our rsync command string, which tells rsync to transfer files from the remote machine to the local one, preserving file timestamps, attributes, and ownership, and to delete any file on the local machine that's no longer present on the remote. We also tell rsync to use ssh as the transport so that the transfer is encrypted. Rsync has a wide range of options; type man rsync on the command line to see the full list. Modify the rsync command line in the script to suit your needs.

In this instance, we’re using an unprivileged user, bkuser, on the remote server (identified by the hostname server—you’ll need to edit this name to fit your implementation). To get a complete backup, you may want to use root, but this isn't as secure. By using bkuser, only those files that are readable by bkuser will be transferred.

Finally, we store the files in the /backup/server directory. Here, we’re copying the remote /etc, /var/lib/mysql, /var/lib/rpm, /var/qmail, /var/djbdns, and /home directories. When copying /home, we’re excluding two directories: /home/ftp and /home/logs. This will create in /backup/server an identical copy of what is on the remote server. Every night when the script is run, rsync will download, update, and remove files to ensure that it’s identical to the remote server at the time it was run.

To add it to cron, you can use the following in your bkuser's crontab file, where /home/bkuser/remote-rsync.sh is our rsync script:
30 3 * * * /home/bkuser/remote-rsync.sh

In this case, we’re assuming that we have a user named bkuser who will perform the rsync, so be sure that /backup/server is writable by the bkuser user. This cron operation will perform the backup every night at 3:30 A.M.
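If the /backup/server directory doesn't exist yet, you can create it and hand ownership to bkuser as root; the paths here are simply the ones assumed throughout this example:

# Create the local backup tree and make bkuser its owner
mkdir -p /backup/server
chown -R bkuser /backup/server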

Securing and automating the login
Now we want to make the login automatic. To do this, we'll use ssh keys: since bkuser on the local machine needs to log in to bkuser on the remote machine without a password, we first generate an ssh key for bkuser on the local machine. As bkuser, execute the following:
ssh-keygen -t dsa

When you’re generating the key for bkuser, press [Enter] when asked for a passphrase. This will allow bkuser to use the ssh key without being prompted for a passphrase, which is essential if our rsync script is to execute without intervention. This will create two files in /home/bkuser/.ssh: id_dsa and id_dsa.pub.

Copy the id_dsa.pub file, which is bkuser's public keyfile, to the remote host, like this:
scp ~bkuser/.ssh/id_dsa.pub bkuser@server:~/.ssh/authorized_keys

This will place the local bkuser's public DSA key into the file ~/.ssh/authorized_keys for the remote bkuser. Note that this overwrites any existing authorized_keys file for that user, and the ~/.ssh directory must already exist on the remote side. Now, if you were to su to bkuser on the local machine and ssh into the bkuser account on the server, you would obtain a shell without being asked for a password or passphrase.
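As a quick test (using the same bkuser account and hostname assumed in the examples above), you can try it by hand:

su - bkuser
ssh bkuser@server

If the keys are set up correctly, the second command should land you in a shell on the remote machine without any prompt.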

Finally, to make this as secure as possible, you should restrict logins to the bkuser accounts on both machines by disabling their passwords. On both the local and remote machines, edit the /etc/shadow file if you're using shadow passwords, or /etc/passwd if you're not. The file lists every user on the system, one per line; the first field is the username and the second is the password. The fields beyond those two aren't our concern at the moment.

Change the second string, which will look like random garbage, to this string:
!!
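
For illustration only (the fields after the password here are made up), a locked bkuser entry in /etc/shadow would look something like this:

bkuser:!!:12116:0:99999:7:::

On most Linux systems, running passwd -l bkuser as root achieves much the same result; it locks the password by prefixing it with an exclamation mark rather than replacing it outright.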

Locking the passwords this way prevents anyone from logging in to either account with a password; in our case, that means no one can log in to the bkuser account on the local machine or the remote machine by supplying a password. This is ideal because, after this, the only ways to log in to the bkuser account would be:
  • root, via su
  • bkuser (on the local host) logging in to the remote server over ssh with the key
  • bkuser on the local machine, reached as root using su

This means that no one will be able to compromise bkuser on the remote server by brute-forcing their way into the bkuser account on the local system.

At this point, we have a secure and reliable means for backing up the remote server. The only fallible part of this entire scheme is the question of what happens if the remote server somehow loses all the information on the drives we're mirroring. We hope that someone would notice this before our backup runs, but to add another level of redundancy, we can keep two copies of the remote server on our local machine. Because rsync will delete files locally that are missing on the remote, we could end up losing important files locally if, for instance, the MySQL files disappeared on the remote server. If you have the space, making a second copy is definitely a good idea. If you don't, you may want to remove the --delete option from the rsync command line to prevent it from deleting files locally that are no longer on the remote system. If you do this, you'll want to periodically (and perhaps manually) run the script with the --delete option so that your backups are true to the remote system.

To make a somewhat protected copy of your archive, you'll first need to run the script once to determine the size of the archive. To do this, execute the following:
cd /backup/server
du -sh


This will return the size of the archive. For the sake of illustration, let's say it returns a value of 750 MB. Chances are that number will grow over time rather than shrink, but you want to leave a little room for removed files and the like that might reduce it. Pick a threshold you're comfortable with; we'll say 600 MB here. We're going to tell our script that if the directory tree is 600 MB or larger, we'll make the copy. If it's smaller, we won't, because that may indicate something is wrong.

To accomplish this, append a size check like the one sketched below to the end of your rsync script. The check makes sure that $CURSIZE, which is obtained by running du and telling it to summarize and output bytes, is greater than $MINSIZE, an arbitrary value we define (in this case, 600 MB, or 600,000,000 bytes). If the value du returns, in bytes, is greater than our threshold, we recursively delete the directory /backup/second/server (our failsafe backup) and then copy the contents of /backup/server to /backup/second/server, so we have two identical copies.

If du returns a total size of less than 600 MB, we leave the backup as is and simply echo to the screen the number of bytes that our primary backup directory contains. Since this will be run by cron, the message will show up in our e-mail inbox letting us know what's going on. You'll also notice that we pipe du's output as input to cut so that we can properly compare the size.
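
A minimal sketch of that check, using the same directory names and the 600,000,000-byte threshold described above, might look like this:

# Compare the size of the primary backup against a minimum threshold
MINSIZE=600000000
CURSIZE=`du -sb /backup/server | cut -f1`

if [ "$CURSIZE" -gt "$MINSIZE" ]; then
    # Looks sane; refresh the failsafe copy
    rm -rf /backup/second/server
    mkdir -p /backup/second
    cp -a /backup/server /backup/second/server
else
    # Suspiciously small; leave the failsafe copy alone and report
    echo "Primary backup is only $CURSIZE bytes; failsafe copy not updated."
fi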

The final backup
The final step in our backup solution is to write to a CD-R, tape, or any other backup medium that’s available. If you have no alternate backup medium available, you should be okay with this implementation; however, if something happens to the remote and local machines simultaneously, you’ll be left without a backup. Since this isn't very appealing, we'll look at backing up to a CD-R. Because CD-RW drives are so inexpensive these days, chances are you already have one or can obtain one.

The main drawback of CD-R backups is that they have to be performed manually. Unlike high-capacity tape drives, you're limited to about 650 MB or 700 MB per CD. This means that if your backup is larger than 650 MB, you'll need to do some "staging" work before creating the images to burn. Likewise, you'll want to check the size of your ISOs before burning in case they have grown beyond the capacity of the CD. This isn't difficult, but it does take some planning. If you're following the backup strategy so far, I recommend burning a copy of the backup to CD at least once a week. Any less often isn't very effective; any more often, and it will start to feel like too much work, and you may be tempted to skip it altogether.

We can have cron run another script to automatically make the ISOs for us, leaving them for us to burn manually. Your script would be relatively small and look something like this:
#!/bin/sh
# Build a dated ISO image of the backup tree, ready to burn to CD-R
DATE=`date +%b-%d-%Y`
rm -f /backup/server.iso
mkisofs -V server-$DATE -r -J -o /backup/server.iso /backup/server


This simple script will make an ISO labeled with the date it was executed (e.g., "server-Jan-16-2002"). It takes the contents of /backup/server, our backup directory, and creates an ISO image at /backup/server.iso. This is the image you would burn to a CD-R, with a command like this:
cdrecord -v -speed=2 -dev=0,0,0 -data /backup/server.iso

This command tells cdrecord to burn at a speed of 2 on the device "0,0,0", using /backup/server.iso as the image. To obtain the device number for your CD-RW drive, use the following:
cdrecord -scanbus

Look at the output and identify your CD-RW drive. The first string on that line is the device string to use; in this example, our CD-RW drive is identified as device 0,0,0.

If your ISO ends up being larger than 650 MB, you may have to create a staging area and split up the contents of your directory. For example, if you find that the var/lib/mysql directory in your backup is the largest, you might use something like the script below instead of the previously noted script that uses mkisofs to build the image.
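Here is one way that staging script could look; it reuses the /backup/srv1 and /backup/srv2 staging directories described in the next paragraph, and the two output ISO names (server1.iso and server2.iso) are only examples:

#!/bin/sh
DATE=`date +%b-%d-%Y`

# Stage everything except the MySQL data in one directory...
rm -rf /backup/srv1 /backup/srv2
mkdir -p /backup/srv1 /backup/srv2
cp -a /backup/server/. /backup/srv1
rm -rf /backup/srv1/var/lib/mysql

# ...and stage the MySQL data by itself in a second directory
cp -a /backup/server/var/lib/mysql /backup/srv2

# Build one ISO image from each staging directory
mkisofs -V server1-$DATE -r -J -o /backup/server1.iso /backup/srv1
mkisofs -V server2-$DATE -r -J -o /backup/server2.iso /backup/srv2

# Remove the temporary staging directories
rm -rf /backup/srv1 /backup/srv2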

Of course, this implies that you’ll always have roughly 1.5 times the size of your backup available as free space on your hard drive. Here, we basically make two directories: /backup/srv1 and /backup/srv2. In /backup/srv1, we copy everything and then remove the var/lib/mysql directory. In /backup/srv2, we only copy the contents of var/lib/mysql. Then we create two ISOs: one that contains the contents of /backup/srv1 and the other that contains the contents of /backup/srv2. We then remove our temporary staging directories. The end result is two ISOs to burn, both of which should be small enough to fit on a single CD. Apply this logic to your situation (perhaps your Web directories contain most of the data, not your MySQL databases), and you should still be able to find an acceptable and almost entirely automated backup process.

Conclusion
Redundant backups may seem somewhat paranoid, but they make the pain of data loss much easier to deal with. With the scenario outlined in this Daily Drill Down, you’ll always have a 24-hour backup and, in a worst-case situation, a one-week-old backup. Consider the value of your data and then determine if a redundant backup implementation is important enough to spend the time and effort to configure.

About Vincent Danen

Vincent Danen works on the Red Hat Security Response Team and lives in Canada. He has been writing about and developing on Linux for over 10 years and is a veteran Mac user.
