Backing up important data is obviously something we should all do. Unfortunately, it is not always easy to make it happen. We get lazy; we do not have the additional hardware for a backup server; it takes a long time and a lot of CDs to back up to optical media; we do not trust online backup services; backup schemes are difficult to set up and use — any of dozens of reasons can stand in our way. Still, we know we should be backing up our important data.

Modern open source Unix-like operating systems, however, offer plenty of options for simple, effective backup schemes. If the obstacle is figuring out how to set one up, a simple rsync solution may be exactly what you need.

The rsync utility synchronizes files between two locations, such as two systems on a network. It does so by way of incremental copies, transferring from the source to the destination only what is new or has changed, which saves time, network bandwidth, and system resources. This makes it well suited to maintaining up-to-date backups of large collections of data, on whatever schedule suits the user. When copying to a remote host with the user@host:path syntax, current versions of rsync also use SSH as the default transport, encrypting the transfer and protecting your data from eavesdroppers as it crosses the network.

The home directory on a Unix-like system can vary in the importance of what it stores, of course. It depends on the computer use habits of the individual. Some may have nothing of particular interest there, effectively using the home directory only as a place to store configuration files for the applications they use, generated automatically when those applications are installed. Others may store hundreds of gigabytes of family photos, videos, or music. Still others, likely the most technically proficient, may have customized configuration settings, archives of source code, and useful scripts that automate common tasks. Then, of course, there is the case of writers, who may have huge archives of stories, articles, documentation, and correspondence that they have created and exchanged with others.

The importance of what a person wishes to keep stored and accessible, its “irreplaceability,” its extent (that is, its storage “size”), and its variability over time together define the conditions that a backup scheme should satisfy. Where none of these criteria applies, as in the case of the person whose entire computing life revolves around the Web browser without any bookmarked URLs of particular import, it is possible that no backups are needed at all, but such a state of affairs must be rare indeed. Most people would at least suffer some consternation at the loss of browser bookmarks.

A simple script can save you from checking the manpage every time you want to back up your home directory with rsync, reducing the process to typing a single short word. Save the following as the contents of a file:

#!/bin/sh

rsync -av --delete /home/user user@host:/home/user/backup/arch/

Command options and arguments

The a option is rsync’s “archive” option. It is actually syntactic sugar, a shortcut of sorts, that is equivalent to this much lengthier option string:

-Dgloprt
  • The D itself is syntactic sugar for two other options: --devices and --specials. The devices option preserves device files, which is probably irrelevant if you are only backing up your home directory, while the specials option preserves “special” files such as named sockets and FIFOs. Symlinks are not covered by D; they are handled by the l option.
  • The g stands for “group”, and preserves group ownership.
  • The l option copies symlinks as symlinks, rather than as files.
  • The o stands for “owner”, and preserves user account ownership.
  • The p preserves file permissions.
  • The r stands for “recursive”, and tells rsync to descend into all directories within the source directory, all directories within those subdirectories, and so on.
  • The t preserves modification times.

The v option tells rsync to be verbose. Use this if you want it to output what it is doing as it does it.

This rsync command assumes you do not want to keep storing files that you have deleted: the --delete option tells rsync to remove any files from the backup archive that no longer exist on the filesystem you are backing up. If you would rather keep copies of deleted files, in case you later realize you did not mean to delete one, omit that option.
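Because --delete removes files from the archive, it can be worth previewing a run before trusting it. The following is a minimal sketch, not part of the original script: it adds rsync's n (--dry-run) option, and the function name preview_backup and the "run" argument are made up for illustration. The paths are the same placeholders used above.

```shell
#!/bin/sh
# Preview what a --delete run would do, without changing anything.
# SRC and DEST are the article's placeholder paths; substitute your own.
SRC="/home/user"
DEST="user@host:/home/user/backup/arch/"

# preview_backup is a hypothetical helper name, not an rsync feature.
preview_backup() {
    # -n (--dry-run) makes rsync report the transfers and deletions it
    # would perform, but perform none of them.
    rsync -avn --delete "$SRC" "$DEST"
}

# Only contact the backup host when explicitly asked: sh preview.sh run
if [ "${1:-}" = "run" ]; then
    preview_backup
fi
```

With the v option in place, files that would be removed from the archive appear in the output on lines beginning with "deleting".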

The first path in this rsync command, /home/user, is the path to your user account’s home directory. On BSD Unix systems, this may take the form /usr/home/user, though they typically offer a symlink so that /home works as well as /usr/home, while Linux-based systems usually only use /home. Replace the user part of that with your account’s username.

The second path tells rsync where you want to store your backup archive.

  • user is your account username on the host system where you will be storing your backups — perhaps a fileserver.
  • host is the hostname or IP address of the host system where you will be storing your backups.

The example path assumes you are storing the backup within the home directory of your user account on the target system, within a subdirectory of a directory named “backup”. Thus, the arch part of the path should be changed to whatever name you want to use to refer to the specific archive of your desktop. For instance, if you are backing up your home directory from a laptop whose hostname is “pocket”, you might want to trade arch for pocket in the example path.
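If you back up several machines, the archive name can be derived from each machine's hostname rather than typed by hand. This is only a sketch under assumptions: the variable names are invented for illustration, BACKUP_USER and BACKUP_HOST stand in for your own account and server, and the block prints the resulting destination instead of running rsync.

```shell
#!/bin/sh
# Build the destination path from this machine's hostname.
# BACKUP_USER and BACKUP_HOST are placeholders; substitute your own values.
BACKUP_USER="user"
BACKUP_HOST="host"

# uname -n prints the network node name; cut drops any domain part,
# so "pocket.example.org" becomes "pocket".
ARCHIVE_NAME=$(uname -n | cut -d. -f1)

DEST="${BACKUP_USER}@${BACKUP_HOST}:/home/${BACKUP_USER}/backup/${ARCHIVE_NAME}/"

# Show the destination that would be handed to rsync.
echo "$DEST"
```

The printed value could then take the place of the second path in the rsync command.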

A number of other options might be of some interest. Check the rsync manpage for the H, R, S, e, x, and z options, as well as the long options --exclude, --delete-after, and --delete-during for more information.
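As one illustration of those options, the sketch below adds z, which compresses data during the transfer, and two --exclude patterns. The patterns and the function name backup_with_excludes are examples chosen here, not recommendations from the original script; check the manpage before relying on them.

```shell
#!/bin/sh
# Variant of the backup command with compression and exclusions.
# SRC and DEST are the article's placeholder paths; substitute your own.
SRC="/home/user"
DEST="user@host:/home/user/backup/arch/"

# backup_with_excludes is a hypothetical helper name.
backup_with_excludes() {
    # -z compresses file data in transit.
    # Each --exclude skips paths matching the pattern; these two patterns
    # (.cache directories and *.tmp files) are examples only.
    rsync -avz --delete \
        --exclude='.cache/' \
        --exclude='*.tmp' \
        "$SRC" "$DEST"
}

# Only contact the backup host when explicitly asked: sh backup.sh run
if [ "${1:-}" = "run" ]; then
    backup_with_excludes
fi
```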

Using the script

If you name the file bupper, you can make the file executable with this command:

chmod 700 bupper

If you name it something else, substitute your filename for “bupper” in that command. Place the file in your execution path; ~/bin/ is usually a good place for personal administrative scripts. Make sure you add that directory to your execution path in the configuration script for the shell you use at the command line. Thereafter, all you will need to do to back up your home directory to the system you are using to store your backups is enter the command bupper, or whatever else you choose to name the file. You could as easily set the rsync command with all options as a shell alias; the syntax for this will vary, depending on which shell you use: bash, csh, sh, tcsh, zsh, or any of dozens of other options.
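For Bourne-style shells such as sh, bash, and zsh, the path and alias setup might look like the sketch below. Which startup file to put it in (for example ~/.profile or ~/.bashrc) depends on your shell and system, and csh and tcsh use a different syntax, so treat this as an assumption-laden example rather than a universal recipe.

```shell
# Lines for a Bourne-style shell startup file (sh, bash, zsh).

# Add ~/bin to the search path, unless it is already there.
case ":$PATH:" in
    *":$HOME/bin:"*) ;;
    *) PATH="$HOME/bin:$PATH"; export PATH ;;
esac

# Alternative to the script: the same rsync command as an alias.
# The paths are the article's placeholders; substitute your own.
alias bupper='rsync -av --delete /home/user user@host:/home/user/backup/arch/'
```

You would normally choose either the script or the alias, not both, since they share the name bupper.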

As long as you run the backup regularly, you should feel relatively secure about the safety of important files in case you ever need to wipe your hard drive, replace it because of a head crash, or otherwise recover from some mishap or emergency that results in loss of data.

One reason to use a script instead of an alias is flexibility. A script is easier to use in scheduled operations, as in the case of a cron job that executes your bupper command once a day.
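A daily schedule could be expressed as a crontab entry like the one this sketch prints. The time of day (02:30) and the ~/bin location are assumptions made for illustration. Note also that an unattended run cannot answer a password prompt, so it would need key-based SSH authentication to the backup host.

```shell
#!/bin/sh
# Print a crontab entry that runs the backup script once a day.
# Fields: minute hour day-of-month month day-of-week command.
# 02:30 every day is an arbitrary example time.
CRON_LINE="30 2 * * * $HOME/bin/bupper"

echo "$CRON_LINE"

# To install it, run "crontab -e" and paste the printed line.
```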

Even with a scheduled job, you may want to execute the backup script by issuing the bupper command at the shell from time to time. Cases where this might be advisable include instances of having just made some change to what you have stored on your computer that is too important to wait several hours to back up, wanting to make sure everything is up to date just before making major changes to the system, and anticipating some risk to the system you are backing up such as electrical storms or taking a laptop out of your network on a trip.
