Backup essentials: Safeguarding user data with cron and tar

Users know that they're supposed to back up data, but most of them never do it. That's why Bryan Pfaffenberger suggests that you consider using the tar and cron utilities. They offer a simple, automated solution to the backup dilemma.

New Linux users who are familiar with Mac and Windows workstations often look in vain for full-featured, user-friendly backup software. Of course, GNOME and KDE come with archiving utilities, but nothing in Linux comes close to the user-friendly backup utilities of commercial desktop systems. Many Linux backup programs are designed for industrial-strength applications; they back up user directories on huge, multiuser systems. Faced with no obvious alternatives, some users may decide not to back up their data at all. That decision can mean disaster. Wise Linux administrators will take proactive measures to ensure that user data is backed up properly.

Is there really a lack of good backup software? Not at all! Two essential utilities already exist on every Linux user's system: cron, a daemon that launches commands at times and intervals that you specify, and tar, a classic tape-archiving utility that also backs up data over networks. Cron and tar provide a simple, useful solution to any user's backup needs. Furthermore, you can implement this solution within minutes. Once implemented, it works in the background and requires no user action or intervention. In this Daily Drill Down, I'll explain how you can implement this simple solution.

Before I begin, you ought to know that tar isn't the safest or most reliable solution for mission-critical purposes. The solution that I propose is workable for small networks of up to ten or so workstations. For larger networks, you'll want a backup software package that’s specifically designed for your needs. I mention one such network backup package at the end of this Daily Drill Down.

A short philosophical note
Most computer users know that they should back up their data, and they understand how important backups are. They promise to perform backups regularly, but they just won’t do it. Therefore, I favor backup tools that run automatically. That rules out tape drives, removable hard drives, CD-ROM discs, and any other medium that requires users to do something, even something as simple as sticking a tape or disk into the computer.

This solution uses Ethernet connections to route backup data to a second system (preferably a system that's reserved for nothing but backup duties). The transfer is done automatically when network traffic is low. Users don't have to do anything. They don't even have to know that it's happening.

Backing up with tar and cron: An overview
  • Tar: You'll use the tar (tape archiver) utility to back up your users’ home directories. This procedure involves making an initial full backup of the user's home directory, followed by daily incremental backups of any new or altered files. Full backups occur each week; then, a new backup cycle begins.
  • Cron: The cron daemon launches the needed backups (both the weekly full backups and the daily incremental backups). You can use the backup schedule that I describe, or you can devise your own schedule and back up at more or less frequent intervals.

Developing a backup strategy with tar
You’re probably familiar with the basics of tar, the standard Unix and Linux archiving utility. Unlike WinZip in the Windows environment, tar is just an archiving utility. It combines files into a single archive file without compressing them. Most versions of tar on Linux systems (such as GNU tar) can work with compression utilities, but you'll want to forget about compression when you use tar to back up your data. Why? Tar won't perform incremental backups on compressed archives. (An incremental backup backs up only those files that are new or that have been modified since the last backup.) Many system administrators also avoid compressing backup data because damage is harder to contain: a single corrupted spot in a compressed archive can make everything after it unreadable, while damage to an uncompressed tar archive usually affects only the files it touches. If you suffer a drive failure and lose your data, you’ll want your data back intact.
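To see the incremental-backup limitation for yourself, the following sketch builds a throwaway directory, then tries -u on a plain archive and on a gzip-compressed one. It assumes GNU tar and gzip are installed; the directory and file names are invented for the demonstration.

```shell
#!/bin/sh
# Sketch: GNU tar can update a plain archive but refuses to update a
# compressed one. Assumes GNU tar and gzip; works in a temporary directory.
work=$(mktemp -d)
mkdir "$work/home"
echo "draft" > "$work/home/notes.txt"

# Plain archive: create (-c), then update (-u). This succeeds.
tar -cf "$work/home.tar" -C "$work" home
tar -uf "$work/home.tar" -C "$work" home
plain_status=$?

# Compressed archive: create with -z, then try the same update.
tar -czf "$work/home.tar.gz" -C "$work" home
tar -uzf "$work/home.tar.gz" -C "$work" home 2>"$work/update-error.txt"
gzip_status=$?

echo "plain update exit status: $plain_status"
echo "gzip update exit status:  $gzip_status"
cat "$work/update-error.txt"    # tar's explanation of the failure
```

With GNU tar, the second update stops with a usage error (it reports that it cannot update compressed archives), which is why the backup commands in this article leave out -z.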

Required options for backup actions
Every tar command needs an operation option. Table A lists the three operations that matter for backups.

Table A
| Short form | Long form | Description |
|---|---|---|
| -c | --create | Creates a new archive with the name given by the -f option. Use this option to create a full backup. |
| -u | --update | Appends the specified files to the existing archive named by the -f option, but only if they are newer than the copies already in the archive (or aren't in it yet). Use this option to create an incremental backup. |
| -x | --extract | Extracts files from the archive named by the -f option. |
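Here is a minimal sketch of the create and extract operations from Table A, run on scratch data rather than a real home directory. It assumes GNU tar. The -t (--list) option isn't in Table A, but it's a convenient way to check what a backup contains, and -C (--directory) tells tar which directory to work from.

```shell
#!/bin/sh
# Sketch of Table A's create (-c) and extract (-x) operations on scratch data.
# Assumes GNU tar; $work/home stands in for a user's home directory.
work=$(mktemp -d)
mkdir -p "$work/home/docs"
echo "quarterly report" > "$work/home/docs/report.txt"

# -c: create a new archive (a full backup). -C makes the stored names
# relative to $work, so the archive holds home/, home/docs/, and so on.
tar -cf "$work/backup.tar" -C "$work" home

# -t (--list), not in Table A, shows what the archive contains.
tar -tf "$work/backup.tar"

# -x: extract into a separate directory so the original stays untouched.
mkdir "$work/restore"
tar -xf "$work/backup.tar" -C "$work/restore"
cat "$work/restore/home/docs/report.txt"
```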

Additional options for backup actions
Once you’ve specified one of the required options (-c, -u, or -x), you'll need to add one or more of the options shown in Table B.

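Table B
| Short form | Long form | Description |
|---|---|---|
| -f filename | --file=filename | Uses the specified archive file or device name. To use an archive on another computer, give it in the form hostname:filename. |
| -h | --dereference | Doesn't archive symbolic links; archives the files that the links point to instead. |
| -W | --verify | Verifies the archive after creating it. |
| -p | --same-permissions, --preserve-permissions | Preserves file permissions. (Tar records permissions when it creates an archive; -p makes it restore them exactly, ignoring your umask, when it extracts.) |
| -P | --absolute-names | Keeps the leading slash (/) on file names, so each file keeps its full path. |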
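The effect of -P is easiest to see by listing the stored member names. This sketch (GNU tar assumed; the temporary paths are invented for the demonstration) archives the same file with and without -P.

```shell
#!/bin/sh
# Sketch: how -P (--absolute-names) changes the names stored in an archive.
# Assumes GNU tar; the temporary paths are invented for the demonstration.
work=$(mktemp -d)
mkdir "$work/home"
echo "hello" > "$work/home/a.txt"

# Without -P, GNU tar strips the leading "/" (and prints a warning,
# discarded here), so -x would later restore relative to the current directory.
tar -cf "$work/relative.tar" "$work/home/a.txt" 2>/dev/null

# With -P, the leading "/" is kept, so -x restores to the original path.
tar -cPf "$work/absolute.tar" "$work/home/a.txt"

tar -tf "$work/relative.tar"     # path printed without its leading /
tar -tPf "$work/absolute.tar"    # full absolute path, starting with /
```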

Performing a full backup
To perform a full backup, use a command form that’s similar to tar -cvf filename, followed by the name of the user's home directory. Here's an example:
tar -cWPpf lothlorien:suzanne.tar /home/suzanne

This command backs up all the files in /home/suzanne. When you specify a directory name, as this command does, tar works recursively: it backs up every file in /home/suzanne and in all of its subdirectories. Because the archive name takes the form hostname:filename, tar writes the archive to the user's directory on a server named lothlorien. (This form doesn't use NFS; tar reaches the remote host through a remote shell, such as rsh, and the rmt program on that host.) The options tell tar to preserve the user's permissions (-p) and to store the full path information for each file (-P). You’ll need this information in order to restore the filesystem if the user's disk fails. Once the backup archive has been written, tar verifies it against the original files (-W). Of course, this command won’t work unless lothlorien is running, accepts the remote-shell connection, and gives Suzanne permission to write to the appropriate directory.

Since my network included some Windows boxes, I used Samba instead. To mount Suzanne's directory on lothlorien on Suzanne's computer, the following command works wonders:
smbmount //lothlorien/suzanne lothlorien -N

This command creates a Samba connection to Suzanne's directory on lothlorien, mounts the remote directory on a local directory named lothlorien, and suppresses the password query (-N). Because the mount point is a relative path, run the command from /home/suzanne so that the share lands on /home/suzanne/lothlorien, and create that directory first (mkdir lothlorien). Of course, this command only works if Suzanne's username and password are the same on both systems. You'll also need to turn on smbmount's set-user-ID bit so that an ordinary user can run it. (As root, type chmod u+s /sbin/smbmount.) While you're at it, do the same for smbumount. (Type chmod u+s /sbin/smbumount.) Here's the Samba version of the full backup:
smbmount //lothlorien/suzanne lothlorien -N
tar -cWPpf lothlorien/suzanne.tar /home/suzanne
smbumount lothlorien
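One risk with the Samba sequence is that, if smbmount fails, tar still writes suzanne.tar into the empty local lothlorien directory, which sits on the very disk the backup is meant to protect. Here is a hedged sketch of a guard, assuming the mountpoint utility from util-linux is available; backup_if_mounted is a hypothetical helper name, not part of Samba or tar.

```shell
#!/bin/sh
# Hypothetical helper: write the archive only if the destination directory
# is really a mount point, so a failed smbmount can't send the backup
# to the local disk. Assumes mountpoint(1) from util-linux and GNU tar.
backup_if_mounted() {
    dest=$1       # mount point, e.g. /home/suzanne/lothlorien
    src=$2        # directory to back up, e.g. /home/suzanne
    archive=$3    # archive file name, e.g. suzanne.tar
    if ! mountpoint -q "$dest"; then
        echo "backup skipped: $dest is not a mount point" >&2
        return 1
    fi
    tar -cWPpf "$dest/$archive" "$src"
}
```

In the Samba sequence, backup_if_mounted /home/suzanne/lothlorien /home/suzanne suzanne.tar would take the place of the tar line, between smbmount and smbumount.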

Performing an incremental backup
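To perform an incremental backup, use a command form that’s similar to tar -uvf filename, followed by the name of the user's home directory (where filename is the name of the archive file or the name of the device where backups occur). Tar needs the directory name so that it knows which files to compare against the archive. Here's the remote-host version:
tar -uWPpf lothlorien:suzanne.tar /home/suzanne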

And here's the Samba version:
smbmount //lothlorien/suzanne lothlorien -N
tar -uWPpf lothlorien/suzanne.tar /home/suzanne
smbumount lothlorien

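Both commands append to the archive only those files that are newer than their copies in the archive or that have been created since the last update. Tar doesn't replace the older copies; it adds the new versions at the end of the archive. When you extract, tar restores each copy in turn, so the most recent version is the one that survives.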
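The following sketch (GNU tar and GNU touch assumed; file names invented) makes that append-only behavior visible: after an edit, -u adds a second copy of the changed file, leaves the unchanged file alone, and extraction ends with the newer copy in place.

```shell
#!/bin/sh
# Sketch: -u appends new versions instead of replacing old ones.
# Assumes GNU tar and GNU touch (-d); file names are invented.
work=$(mktemp -d)
mkdir "$work/home"
echo "version 1" > "$work/home/todo.txt"
echo "unchanged" > "$work/home/keep.txt"
# Backdate both files so the later edit is clearly newer than the archive copy.
touch -d '2000-01-01 00:00' "$work/home/todo.txt" "$work/home/keep.txt"

tar -cf "$work/home.tar" -C "$work" home     # full backup

echo "version 2" > "$work/home/todo.txt"     # edit gives todo.txt a new mtime
tar -uf "$work/home.tar" -C "$work" home     # incremental backup

# todo.txt is now listed twice; keep.txt, which didn't change, only once.
tar -tf "$work/home.tar"

# When extracting, the later copy overwrites the earlier one.
mkdir "$work/restore"
tar -xf "$work/home.tar" -C "$work/restore"
cat "$work/restore/home/todo.txt"
```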

Repeating the full backup
Periodically, you'll want to repeat the full backup so that tar starts a fresh archive instead of piling more incremental updates onto the old one. When you perform incremental backups, tar adds new or altered files to the archive, but it doesn't remove any files that have been deleted since the last update occurred. As a result, tar archives tend to grow in ways that make the vines and weeds in your garden look orderly. Moreover, archives may contain copies of sensitive or confidential files that users tried to delete, so tar can compromise confidentiality. When you repeat the full backup with -c and use the same archive name, tar overwrites the previous archive with a new one. The new archive won’t contain any files that the user deleted since the last full backup.
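A short sketch of that cleanup effect, again on throwaway files with GNU tar assumed: a file deleted after the full backup survives the incremental update but disappears once the full backup is repeated with -c.

```shell
#!/bin/sh
# Sketch: repeating the full backup with -c and the same archive name
# drops files the user deleted. Assumes GNU tar; file names are invented.
work=$(mktemp -d)
mkdir "$work/home"
echo "public" > "$work/home/plan.txt"
echo "confidential" > "$work/home/salaries.txt"

tar -cf "$work/home.tar" -C "$work" home     # last week's full backup
rm "$work/home/salaries.txt"                 # the user deletes a file
tar -uf "$work/home.tar" -C "$work" home     # incremental backup
tar -tf "$work/home.tar" | grep salaries     # the deleted file is still there

tar -cf "$work/home.tar" -C "$work" home     # this week's full backup
tar -tf "$work/home.tar"                     # salaries.txt is gone
```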

Scheduling the backups with cron
The cron daemon, which is installed by default on most Linux distributions, wakes up once per minute to see whether any jobs are scheduled to run. It checks both the system-wide schedule (such as /etc/crontab) and each user's personal crontab. Now, let's take a look at the process by which users can set up a cron schedule. (If you're setting up a user's system for backup purposes, switch to the user's account before you run crontab, the utility that creates the cron configuration file. Alternatively, as root, you can type crontab -u username -e.)

Learning the essentials of cron syntax
To use crontab, you need to learn the proper syntax for specifying job schedules and commands with cron. Each job is specified in a one-line statement with six fields. From left to right, these fields include:
  • Minute: Possible values are 0 to 59.
  • Hour: Possible values are 0 to 23.
  • Day of the month: Possible values are 1 to 31.
  • Month: Possible values are 1 to 12. (You also can type the first three letters of the month's name.)
  • Day of week: Possible values are 0 to 7; both 0 and 7 mean Sunday. (You also can type the first three letters of the day's name.)
  • Command: This is the same command that you'd type at a shell prompt. Cron runs it with /bin/sh unless the crontab sets SHELL, and a percent sign (%) has a special meaning in this field, so escape it as \%.

To match every possible value in a field, use an asterisk (*). Here's an example of a cron statement:
0 23 * * fri tar -cPpf lothlorien/suzanne.tar /home/suzanne

This statement performs a full backup of Suzanne's home directory at exactly 11:00 P.M. every Friday. It writes the archive file (suzanne.tar) to the remote directory that’s currently mounted on /home/suzanne/lothlorien. (Cron runs each job from the user's home directory, so the relative path lothlorien/suzanne.tar points into that mount.)

The following statements specify a complete backup regimen. They begin with a full backup every Friday and continue with incremental backups on Monday through Thursday.
0 23 * * fri tar -cWPpf lothlorien/suzanne.tar /home/suzanne
0 23 * * mon tar -uWPpf lothlorien/suzanne.tar /home/suzanne
0 23 * * tue tar -uWPpf lothlorien/suzanne.tar /home/suzanne
0 23 * * wed tar -uWPpf lothlorien/suzanne.tar /home/suzanne
0 23 * * thu tar -uWPpf lothlorien/suzanne.tar /home/suzanne

These statements assume that the remote system (lothlorien) is online, that its share is mounted on /home/suzanne/lothlorien, and that this user (Suzanne) has permission to write to it. To make sure the share is mounted, schedule the mounting command a few minutes before each backup, as shown in the following example (which uses Samba):
55 22 * * mon smbmount //lothlorien/suzanne lothlorien -N
0 23 * * mon tar -uWPpf lothlorien/suzanne.tar /home/suzanne
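Instead of five separate cron statements, you could call one wrapper that picks -c or -u from the day of the week. This is a sketch of an alternative, not the method described above: run_backup is a hypothetical function name, -W is left out to keep it short, and the day numbers come from date +%u (1 = Monday through 7 = Sunday). It also falls back to a full backup when no archive exists yet.

```shell
#!/bin/sh
# Hypothetical wrapper: full backup (-c) on Fridays, incremental (-u) otherwise.
# Arguments: source directory, archive path, and an optional day number
# (1 = Monday ... 7 = Sunday) that defaults to today; handy for testing.
run_backup() {
    src=$1
    archive=$2
    day=${3:-$(date +%u)}
    if [ "$day" -eq 5 ] || [ ! -f "$archive" ]; then
        mode=c      # Friday, or no archive yet: start a fresh full backup
    else
        mode=u      # other days: append new or changed files
    fi
    tar -"$mode"Ppf "$archive" "$src"
}
```

A script that defines run_backup and ends with run_backup /home/suzanne lothlorien/suzanne.tar could then be scheduled with a single statement such as 0 23 * * 1-5 /home/suzanne/bin/nightly-backup.sh (the script path is hypothetical). The sketch uses numeric day ranges because many cron implementations don't accept ranges of day names.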

Using crontab
To use crontab to create the backup schedule, follow these steps:
  1. Log on as the user whose files you want to back up.
  2. In a terminal window (or at the console), type crontab -e and press [Enter]. Crontab opens your schedule in the editor named by the VISUAL or EDITOR environment variable, which on many systems defaults to vi. If you're using vi, type i in order to start inserting text.
  3. Type the cron commands, one on each line. Don’t forget to press [Enter] at the end of each line.
  4. Exit the editor. With vi, press [Esc] to exit insert mode. Then, type :x and press [Enter] to save the file and exit the editor. When you've exited the editor, crontab installs the new schedule and cron picks it up automatically; you don't need to restart the daemon.

Restoring files
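If one of your users suffers a hard disk failure, reinstall Linux on the new disk and recreate the user's account. Then mount the backup share on /home/suzanne/lothlorien again and, from /home/suzanne, use a command like the following to restore the user's home directory: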
tar -xvPpf lothlorien/suzanne.tar

Tar automatically creates any missing directories and restores the files to their original locations, because -P kept the full path names when the archive was created.
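Before a real disk fails, it's worth rehearsing the restore. This sketch runs the whole cycle on throwaway data (GNU tar assumed; the temporary directory stands in for /home/suzanne and the backup share), including a look at the archive with -t (--list) before extracting.

```shell
#!/bin/sh
# Sketch of a restore rehearsal on throwaway data. Assumes GNU tar;
# $home stands in for /home/suzanne and $work for the backup share.
work=$(mktemp -d)
home="$work/home/suzanne"
mkdir -p "$home/letters"
echo "Dear Bryan" > "$home/letters/draft.txt"

tar -cPpf "$work/suzanne.tar" "$home"    # full backup with absolute names

rm -rf "$work/home"                      # simulate the failed disk

tar -tvPf "$work/suzanne.tar"            # inspect the archive first
tar -xvPpf "$work/suzanne.tar"           # recreate directories and files

# To recover one file instead, name it exactly as tar -tP lists it, e.g.:
#   tar -xPpf "$work/suzanne.tar" "$home/letters/draft.txt"
cat "$home/letters/draft.txt"
```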

This simple automatic backup strategy is appropriate for small networks with only a few users. There are better solutions for larger networks. If you're administering more than a few Linux workstations, take a look at Amanda, which is specifically designed to enable administrators to back up many workstations to a single, high-capacity backup server. The latest version can use Samba to incorporate Windows 95/98/NT hosts.

Bryan Pfaffenberger, a UNIX user since 1985, is a University of Virginia professor, an author, and a passionate advocate of Linux and open source software. He is a Linux Journal columnist, and his recent Linux-related books include Linux Clearly Explained (Morgan Kaufmann) and Mastering Gnome (Sybex; in press). His hobbies include messing around with his home LAN and sailing the southern Chesapeake Bay. He lives in Charlottesville, VA.

The authors and editors have taken care in preparation of the content contained herein, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for any damages. Always have a verified backup before making any changes.
