SolutionBase: Backup Linux reliably with rsync and tar

If you want to have a successful disaster recovery plan in place, you've got to start with backups. There are several utilities you can use to back up Linux servers, but one of the most basic ways is by using some common built-in utilities such as rsync and tar. Jack Wallen shows how to use them together to safely back up your Linux server.

This article is also available as a TechRepublic download.

Every IT administrator knows a good backup plan is an absolute necessity for successful disaster recovery. For many, the backup plan seems to always begin with the purchase of proprietary systems that are often costly, unreliable, and not terribly adept. Fortunately, Linux has a solution that is — like the operating system itself — cost-effective, reliable, and flexible.

The beauty of this system is that nearly all modern Linux distributions already have all the tools you need to set it up. The only things not included are a bit of imagination, a piece of hardware to send the backup files to, and some time to write the scripts. Before we get into the setup, let's take a look at the tools that will be employed for this process.

rsync

The command line utility rsync is used to synchronize files and/or directories between file systems on two computers over a network. The rsync tool was written as a replacement for rcp, but with many new features. One feature makes rsync ideal for backups: its delta-transfer algorithm sends only the files, and only the parts of those files, that have changed since the last run.

The standard usage for rsync:

rsync [OPTION]... SRC [SRC]... DEST

rsync [OPTION]... SRC [SRC]... [USER@]HOST:DEST

rsync [OPTION]... SRC [SRC]... [USER@]HOST::DEST

rsync [OPTION]... SRC [SRC]... rsync://[USER@]HOST[:PORT]/DEST

rsync [OPTION]... SRC

rsync [OPTION]... [USER@]HOST:SRC [DEST]

rsync [OPTION]... [USER@]HOST::SRC [DEST]

rsync [OPTION]... rsync://[USER@]HOST[:PORT]/SRC [DEST]

One of the main reasons I like rsync is that it can be used in conjunction with secure shell (SSH): the rsync stream is passed through SSH for encryption. The rsync tool also offers the following features:

  • Support for copying links, devices, owners, groups, and permissions
  • Exclude and exclude-from options similar to GNU tar
  • A CVS exclude mode for ignoring the same files that CVS would ignore
  • Does not require root privileges
  • Pipelining of file transfers to minimize latency costs
  • Support for anonymous or authenticated rsync servers (ideal for mirroring)
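To make the SSH transport and the exclude option a little more concrete, here is a minimal sketch. The function name mirror_dir and the RSYNC override variable are my own inventions for illustration, not part of rsync; the flags themselves (-a, -e, --delete, --exclude) are standard rsync options.

```shell
#!/bin/sh
# Hypothetical helper: mirror a directory to a backup host over SSH while
# skipping files that match an exclude pattern.
# RSYNC may be overridden (illustrative name) to substitute another command.
RSYNC=${RSYNC:-rsync}

mirror_dir() {
    src=$1       # local directory, e.g. /home/user/test/
    dest=$2      # remote target, e.g. user@server.ip:/home/user/test/
    pattern=$3   # exclude pattern, e.g. '*.tmp'

    # -a        archive mode (keeps links, owners, groups, permissions, times)
    # -e ssh    send the rsync stream through SSH
    # --delete  remove files on the destination that no longer exist locally
    "$RSYNC" -a -e ssh --delete --exclude="$pattern" "$src" "$dest"
}
```

Calling mirror_dir /home/user/test/ user@server.ip:/home/user/test/ '*.tmp' would then copy everything except temporary files. Adding --dry-run to the rsync line is a safe way to preview what would be transferred.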

SSH

SSH will be used to authenticate between the machines and to encrypt the network traffic. SSH is a secure replacement for the "r" programs (rlogin, rsh, rcp, rexec). SSH gains its security simply because it uses encryption; clear text is never sent over a network. SSH uses RSA keys to authenticate the user to the server as well as RSA keys to authenticate the server to the user.

Typical SSH usage looks like:

ssh [-l login_name] hostname | user@hostname [command]

ssh [-afgknqstvxACNTX1246] [-b bind_address] [-c cipher_spec]
    [-e escape_char] [-i identity_file] [-l login_name] [-m mac_spec]
    [-o option] [-p port] [-F configfile] [-L port:host:hostport]
    [-R port:host:hostport] [-D port] hostname | user@hostname [command]

Of course, many of you already know how powerful and useful SSH is (as well as how to use it), so this won't be a crash course in how to use SSH.

tar

Also in play will be tar, the GNU version of the classic archiving tool, which works similarly to zip. It is a very powerful and versatile tool; its standard usage is:

tar [ - ] A --catenate --concatenate | c --create | d --diff --compare | r --append | t --list | u --update | x --extract --get [ --atime-preserve ] [ -b, --block-size N ] [ -B, --read-full-blocks ] [ -C, --directory DIR ] [ --checkpoint ] [ -f, --file [HOSTNAME:]F ] [ --force-local ] [ -F, --info-script F --new-volume-script F ] [ -G, --incremental ] [ -g, --listed-incremental F ] [ -h, --dereference ] [ -i, --ignore-zeros ] [ -j, -I, --bzip ] [ --ignore-failed-read ] [ -k, --keep-old-files ] [ -K, --starting-file F ] [ -l, --one-file-system ] [ -L, --tape-length N ] [ -m, --modification-time ] [ -M, --multi-volume ] [ -N, --after-date DATE, --newer DATE ] [ -o, --old-archive, --portability ] [ -O, --to-stdout ] [ -p, --same-permissions, --preserve-permissions ] [ -P, --absolute-paths ] [ --preserve ] [ -R, --record-number ] [ --remove-files ] [ -s, --same-order, --preserve-order ] [ --same-owner ] [ -S, --sparse ] [ -T, --files-from=F ] [ --null ] [ --totals ] [ -v, --verbose ] [ -V, --label NAME ] [ --version ] [ -w, --interactive, --confirmation ] [ -W, --verify ] [ --exclude FILE ] [ -X, --exclude-from FILE ] [ -Z, --compress, --uncompress ] [ -z, --gzip, --ungzip ] [ --use-compress-program PROG ] [ --block-compress ] [ -[0-7][lmh] ] filename1 [ filename2, ... filenameN ] directory1 [ directory2, ...directoryN ]

cron

Finally, we'll add the cron tool (a UNIX daemon tool used to execute scheduled commands), so our backup can be automatic.

Setup

This backup scheme is going to work like this:

  1. Set up the server to accept rsync and SSH calls.
  2. Create the scripts to do the backups.
  3. Create public and private SSH keys and place them on the proper machines.
  4. Create a crontab entry to execute the scripts.

It's very simple.

Making them work together

Once you know the tools you're going to use, all you have to do is make them work together. For the purposes of this article, we are going to assume the following:

  • Client: machine to be backed up
  • Server: machine where backup will be stored
  • User: the account (the same on each machine) that will be used for the backups. Wherever you see /home/user, replace user with the actual name of the backup account.

The first thing to do is make sure that SSH is installed on the machines. You shouldn't have to worry about rsync, cron, and tar, because they are installed on all modern Linux distributions by default. So hop onto one machine and run the command rpm -q openssh, assuming you are running an rpm-based system; if it's installed, great.

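If OpenSSH is not installed, you'll have to install it. Again assuming an rpm-based distribution, run the following command as root:

yum install openssh-server openssh-clients

On Red Hat-style distributions the sshd daemon is packaged as openssh-server and the ssh client as openssh-clients; both pull in the base openssh package.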

To start the secure shell daemon (sshd), run the command:

/etc/rc.d/init.d/sshd start

and sshd should start up without any problem.

SSH keys

In order to make this backup system run unattended (without user input), you need to generate a public/private key pair for SSH to use. If you skip this step, SSH will require the manual entry of a password each time the script is run. To do this, follow these steps:

On the Client (the machine you want to back up) as the user that will be holding the backup files:

mkdir ~/.ssh

chmod 700 ~/.ssh

ssh-keygen -q -f ~/.ssh/id_rsa -t rsa

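You will then be asked to enter (and re-enter) a passphrase. For security purposes, you will want to use a secure passphrase. Do not use the user password or an empty passphrase, as this will render the backup vulnerable. Keep in mind that a passphrase-protected key has to be unlocked before cron can use it, typically by loading it into ssh-agent; otherwise SSH will stop and prompt for the passphrase.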

The next step is to lock down the file permissions of the new key files created. So run the commands:

chmod go-w ~/

chmod 700 ~/.ssh

chmod go-rwx ~/.ssh/*

Now you have to copy the public key (file id_rsa.pub) to the server. This file could be placed onto a CD or copied by whatever means you wish. With the key on the server, execute the following commands to ensure the public key information is in the correct file. These commands need to be run under the same user.

Again we'll assume the user is "user" and, in that account, run the commands:

mkdir ~/.ssh

chmod 700 ~/.ssh

cat ~/id_rsa.pub >> ~/.ssh/authorized_keys

chmod 600 ~/.ssh/authorized_keys

rm ~/id_rsa.pub
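If you expect to repeat the server-side steps (for example, when adding more clients later), they can be wrapped in a small function that avoids appending the same key twice. This is only a sketch; the name install_pubkey and its two arguments are hypothetical, chosen for this example.

```shell
#!/bin/sh
# Hypothetical helper: add a public key to an authorized_keys file only if
# that exact line is not already there, then tighten the permissions.
install_pubkey() {
    pubkey_file=$1   # e.g. ~/id_rsa.pub
    auth_file=$2     # e.g. ~/.ssh/authorized_keys
    auth_dir=$(dirname "$auth_file")

    [ -r "$pubkey_file" ] || { echo "cannot read $pubkey_file" >&2; return 1; }

    mkdir -p "$auth_dir" || return 1
    chmod 700 "$auth_dir" || return 1
    touch "$auth_file" || return 1

    # -F fixed string, -x whole-line match, -f read the pattern from the key file
    if ! grep -qxFf "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file" || return 1
    fi
    chmod 600 "$auth_file"
}
```

Run as the backup user, install_pubkey ~/id_rsa.pub ~/.ssh/authorized_keys has the same effect as the commands above, and running it a second time leaves the file unchanged.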

With these in place, you can test to see if the keys are working by running the following command from the client:

ssh -o PreferredAuthentications=publickey server.ip.address

You should get zero errors.
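Because the real test is whether cron can log in with nobody at the keyboard, it can help to repeat the check with prompts disabled. The sketch below uses OpenSSH's BatchMode and ConnectTimeout options; the function name can_login_unattended and the SSH override variable are assumptions made for this example.

```shell
#!/bin/sh
# Sketch: succeed only if key-based login works without any prompt.
# SSH may be overridden (illustrative name) to substitute another command.
SSH=${SSH:-ssh}

can_login_unattended() {
    host=$1   # e.g. user@server.ip
    # BatchMode=yes makes ssh fail rather than ask for a password or passphrase;
    # the remote command "true" does nothing and simply returns success.
    "$SSH" -o BatchMode=yes -o PreferredAuthentications=publickey \
        -o ConnectTimeout=10 "$host" true
}
```

A line such as can_login_unattended user@server.ip && echo "key login OK" then tells you whether the backup script will be able to connect on its own.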

Now that all the tools are installed, give rsync a test run.

Your first test

Log on to the user account on the client machine. Create a test directory with a few test files inside (we'll call this directory test.) Now, run the command:

rsync -e ssh -a --delete /home/user/test/ user@server.ip:/home/user/test/

Now check the server /home/user directory to see if the new test directory has been uploaded. If so, you're in luck.

Now, I'm going to describe one caveat to this system. Since you will most likely be backing up a server, a lot of the necessary directories will be housed in a directory tree that the standard user cannot gain access to. If this is the case, you may have to set this system up using the root user.

However, do not do this if you think you have any weaknesses in your system. Setting the root user up to have secure shell access via public keys could leave your system vulnerable. You may have to add a few extra steps to your scripting and your crontab as a protective measure.

The scripts

You now need to create a shell script that can be run by cron. This script can be very simple. The first script backs up the necessary directory, places the archive in the /home/user directory, and sets its permissions so that user can access it.

This script could look something like:

#!/bin/sh

tar -czf /home/user/html_backup.tgz /var/www/html

chmod 644 /home/user/html_backup.tgz

Name this file html_backup, give it executable permissions (with the command chmod ugo+x html_backup), and move it to /usr/bin. Now you'll need to decide when you want this script to be executed. It must run before the rsync script so that a fresh archive is in place. So let's set up cron to run html_backup at 11:50 p.m. every night. Open up the /etc/crontab file and append the following:

50 23 * * * root /usr/bin/html_backup
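If you would like the archive step to be a little more defensive, one possible variation is sketched below. It is not the script used in this article; the function name make_archive and the temporary-file approach are my own assumptions. The idea is to write the new archive under a temporary name, confirm that tar can read it back, and only then replace the previous archive.

```shell
#!/bin/sh
# Sketch: archive a directory and verify the result before replacing the
# previous archive. Paths are arguments so it can be tried on scratch data.
make_archive() {
    src_dir=$1    # directory to back up, e.g. /var/www/html
    archive=$2    # output file, e.g. /home/user/html_backup.tgz
    tmp="$archive.tmp"

    [ -d "$src_dir" ] || { echo "no such directory: $src_dir" >&2; return 1; }

    # -C switches to the parent directory so the archive stores relative paths
    tar -czf "$tmp" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1

    # Listing the archive makes tar read it from start to finish
    tar -tzf "$tmp" >/dev/null || { rm -f "$tmp"; return 1; }

    # Swap in the new archive only after it has been verified
    mv "$tmp" "$archive" && chmod 644 "$archive"
}
```

A call such as make_archive /var/www/html /home/user/html_backup.tgz could take the place of the tar and chmod lines in html_backup.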

Now we'll write the rsync script. This script will be:

#!/bin/sh

rsync -e ssh -a --delete /home/user/html_backup.tgz user@server.ip:/home/user/html_backup.tgz

Note: The rsync command must be entered on a single line.

Save this file with the name html_rsync, give it executable permissions (with the command chmod ugo+x html_rsync), and move it to /usr/bin.

Now open up the crontab file again and append the following:

0 0 * * * user /usr/bin/html_rsync

Save the file.

Your system is now set up to archive the /var/www/html directory at 11:50 p.m., and to then send that file to the backup server at midnight.
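The two cron jobs rely on the archive being finished within ten minutes. If that assumption worries you, one alternative is to chain the steps so the transfer starts only when the archive step has succeeded. The sketch below assumes a single account that can both read the source directory and use the SSH key, which is not how the two jobs above are set up; the function name and the TAR and RSYNC override variables are hypothetical.

```shell
#!/bin/sh
# Sketch: archive first, and transfer only if the archive step succeeded.
# TAR and RSYNC may be overridden (illustrative names) for a trial run.
TAR=${TAR:-tar}
RSYNC=${RSYNC:-rsync}

backup_and_send() {
    src_dir=$1   # e.g. /var/www/html
    archive=$2   # e.g. /home/user/html_backup.tgz
    remote=$3    # e.g. user@server.ip:/home/user/

    if ! "$TAR" -czf "$archive" "$src_dir"; then
        echo "archive step failed; nothing sent" >&2
        return 1
    fi
    "$RSYNC" -e ssh -a "$archive" "$remote"
}
```

With this approach a single crontab entry would call the combined script, and a failed tar run never pushes a stale or partial archive to the server.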

Final thoughts

Of course, this system is only a basic system. You will probably need to tweak the scripts to your needs. Consider this article your springboard to get everything up and running. For an added level of safety, it might not be a bad idea to set up a cron job on the server to do a disk backup of the /home/user/html_backup.tgz file (or whatever files/directories you back up) nightly or weekly.
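As a rough idea of what such a server-side job might run, the sketch below keeps dated copies of the uploaded archive and prunes the oldest ones. Everything here (the function name snapshot_backup, the html_backup-DATE.tgz naming, and the retention count) is an assumption made for illustration.

```shell
#!/bin/sh
# Sketch for the backup server: keep dated copies of the uploaded archive
# and delete the oldest copies beyond a retention count.
snapshot_backup() {
    archive=$1    # e.g. /home/user/html_backup.tgz
    dest_dir=$2   # e.g. a directory on a second disk
    keep=$3       # number of dated copies to retain
    stamp=${4:-$(date +%Y-%m-%d)}   # optional date override, handy for trials

    [ -f "$archive" ] || { echo "missing $archive" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    cp -p "$archive" "$dest_dir/html_backup-$stamp.tgz" || return 1

    # Year-month-day names sort chronologically, so the oldest come first
    total=$(ls "$dest_dir"/html_backup-*.tgz | wc -l)
    excess=$((total - keep))
    if [ "$excess" -gt 0 ]; then
        ls "$dest_dir"/html_backup-*.tgz | head -n "$excess" |
        while IFS= read -r old; do
            rm -f "$old"
        done
    fi
}
```

Called nightly as, say, snapshot_backup /home/user/html_backup.tgz /backup/snapshots 7 (the /backup/snapshots path is made up), it would keep one week of history.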

About

Jack Wallen is an award-winning writer for TechRepublic and Linux.com. He’s an avid promoter of open source and the voice of The Android Expert. For more news about Jack Wallen, visit his website getjackd.net.
