Disaster Recovery

A simple rsync script to back up your home directory

Backing up your files is a very important and very often neglected measure to save yourself the frustration of lost data. Overcome that neglectful tendency, and protect your data from accidental loss with a simple rsync script.

Backing up important data is obviously something we should all do. Unfortunately, it is not always easy to make it happen. We get lazy; we do not have the additional hardware for a backup server; it takes a long time and a lot of CDs to back up to optical media; we do not trust online backup services; backup schemes are difficult to set up and use -- any of dozens of reasons can stand in our way. Still, we know we should be backing up our important data.

Modern open source Unix-like operating systems offer a plethora of options for incredibly simple, effective backup schemes, however. If the problem is figuring out how to set one up, a simple rsync solution may be exactly what you need.

The rsync utility is used to synchronize files between two systems. It does so by way of incremental copies, only copying from the source to the destination what has not already been copied there, saving time, network bandwidth, and system resources. This makes it well suited to the task of maintaining up-to-date backups of large collections of data, on whatever schedule suits the user. It also offers the benefit of using the SSH protocol by default to encrypt the file transfer, protecting your data from eavesdroppers as it is copied across the network.

The home directory on a Unix-like system can vary in the importance of what it stores, of course. It depends on the computer use habits of the individual. Some may have nothing of particular interest there, effectively using the home directory only as a place to store configuration files for the applications they use, generated automatically when those applications are installed. Others may store hundreds of gigabytes of family photos, videos, or music. Still others, likely the most technically proficient, may have customized configuration settings, archives of source code, and useful scripts that automate common tasks. Then, of course, there is the case of writers, who may have huge archives of stories, articles, documentation, and correspondence that they have created and exchanged with others.

The combination of the importance of what the person wishes to keep stored and accessible, its "irreplaceability," its extent (that is, the storage "size" of it), and its variability over time adds up to a set of conditions that a backup scheme should satisfy. Where none of these criteria apply -- as in the case of the person whose entire computing life revolves around the Web browser, without any bookmarked URLs of particular import -- it is possible that no backups are needed at all, but such a state of affairs must be rare indeed. Most people would at least suffer some consternation at the loss of browser bookmarks.

A simple script that can save you having to check the manpage every time you want to back up your home directory with rsync can reduce the process to typing a single short word. Save the following as the contents of a file:

#!/bin/sh

rsync -av --delete /home/user user@host:/home/user/backup/arch/

Command options and arguments

The a option is rsync's "archive" option. It is actually syntactic sugar, a shortcut of sorts, that is equivalent to this much lengthier option string:

-Dgloprt
  • The D is itself syntactic sugar for two other options: --devices and --specials. The devices option preserves device files, which is probably irrelevant if you are only backing up your home directory, while the specials option preserves "special" files such as named pipes and sockets. (Symlinks are handled by the separate l option, below.)
  • The g stands for "group", and preserves group ownership.
  • The l option copies symlinks as symlinks, rather than as files.
  • The o stands for "owner", and preserves user account ownership.
  • The p preserves file permissions.
  • The r stands for "recursive", and tells rsync to read all directories within the source directory, all directories within those subdirectories, and so on.
  • The t preserves modification times.

The v option tells rsync to be verbose. Use this if you want it to output what it is doing as it does it.

This rsync command assumes you do not want to continue storing files that you have deleted: the --delete option tells rsync to delete any files in the backup archive that have been deleted on the filesystem you are backing up. If you want to maintain copies of deleted files, you can omit that option, ensuring that files from the last backup remain available in case you later realize you did not want to delete something.
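If you want something in between -- deletions reflected in the main archive, but the deleted files still retrievable -- rsync's --backup and --backup-dir options move them aside instead of discarding them. A sketch, using the same placeholder paths as the script above; note that a relative --backup-dir is interpreted relative to the destination directory:

```shell
#!/bin/sh
# Instead of discarding files deleted from the source, move them into a
# dated "attic" directory on the backup host. The paths and host are
# placeholders; adjust them to match your own setup.
rsync -av --delete --backup \
    --backup-dir="deleted-$(date +%Y%m%d)" \
    /home/user user@host:/home/user/backup/arch/
```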

The first path in this rsync command, /home/user, is the path to your user account's home directory. On BSD Unix systems, this may take the form /usr/home/user, though they typically offer a symlink so that /home works as well as /usr/home, while Linux-based systems usually only use /home. Replace the user part of that with your account's username.

The second path tells rsync where you want to store your backup archive.

  • user is your account username on the host system where you will be storing your backups -- perhaps a fileserver.
  • host is the hostname or IP address of the host system where you will be storing your backups.

The example path assumes you are storing the backup within the home directory of your user account on the target system, within a subdirectory of a directory named "backup". Thus, the arch part of the path should be changed to whatever name you want to use to refer to the specific archive of your desktop. For instance, if you are backing up your home directory from a laptop whose hostname is "pocket", you might want to trade arch for pocket in the example path.
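For that hypothetical laptop, the command in the script would then read (user and host still placeholders):

```shell
rsync -av --delete /home/user user@host:/home/user/backup/pocket/
```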

A number of other options might be of some interest. Check the rsync manpage for the H, R, S, e, x, and z options, as well as the long options --exclude, --delete-after, and --delete-during for more information.
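As a hedged illustration of two of those, --exclude can skip throwaway files and z compresses data in transit. The exclude patterns here are only examples; adjust them and the placeholder paths for your own setup:

```shell
#!/bin/sh
# Skip cache directories and temporary files, and compress data on the
# wire with -z. Patterns, paths, and host are placeholders.
rsync -avz --delete \
    --exclude='.cache/' \
    --exclude='*.tmp' \
    /home/user user@host:/home/user/backup/arch/
```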

Using the script

If you name the file bupper, you can make the file executable with this command:

chmod 700 bupper

If you name it something else, substitute your filename for "bupper" in that command. Place the file in your execution path; ~/bin/ is usually a good place for personal administrative scripts. Make sure you add that directory to your execution path in the configuration script for the shell you use at the command line. Thereafter, all you need to do to back up your home directory to the system you are using to store your backups is enter the command bupper, or whatever else you chose to name the file. You could as easily set the rsync command, with all its options, as a shell alias; the syntax for this varies depending on which shell you use -- bash, csh, sh, tcsh, zsh, or any of dozens of other options.
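As a sketch, assuming bash or a similar shell, the PATH addition and the alias alternative might look like this in ~/.bashrc (the alias name and paths are placeholders):

```shell
# Add ~/bin to the execution path so scripts there can be run by name.
export PATH="$HOME/bin:$PATH"

# Or skip the script entirely and define an alias with the same command.
alias bupper='rsync -av --delete /home/user user@host:/home/user/backup/arch/'
```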

As long as you keep this updated regularly, you should feel relatively secure about the safety of important files in case you ever need to wipe your hard drive, replace it because of a head crash, or otherwise recover from some mishap or emergency that results in loss of data.

One reason to use a script instead of an alias is flexibility. A script is easier to utilize in scheduled operations, as in the case of a cronjob that executes your bupper command once a day.
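A sketch of such a cron job, assuming the script was saved as ~/bin/bupper: run crontab -e and add a line like the following.

```shell
# Run the backup script every day at 2:00 a.m.
0 2 * * * $HOME/bin/bupper
```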

Even with a scheduled job, you may want to execute the backup script by issuing the bupper command at the shell from time to time. Cases where this might be advisable include instances of having just made some change to what you have stored on your computer that is too important to wait several hours to back up, wanting to make sure everything is up to date just before making major changes to the system, and anticipating some risk to the system you are backing up such as electrical storms or taking a laptop out of your network on a trip.

About

Chad Perrin is an IT consultant, developer, and freelance professional writer. He holds both Microsoft and CompTIA certifications and is a graduate of two IT industry trade schools.

21 comments
Sterling chip Camden

What would be the most effective way to maintain incremental backups? I'd like to be able to keep a monthly backup for a year, weekly backups for a quarter, and incremental daily backups from the prior weekly backup for a month. But I don't have enough space for all those daily backups in full. Great article, BTW.

Neon Samurai

Rsync is fantastic and only gets better when you add in ssh. Transfer between osX, Debian, the NAS box, my remote servers.. it "just works" (tm). The primary reason I continue to mention the lack of native SSH service in Windows is because of how slick SSH and rsync over SSH are.

The only thing I miss with rsync is synchronization. It will go from A to B. It'll delete things at B which do not exist at A. In a single processing step, it won't copy in both directions between A and B to balance out what has changed in both locations. I've even put time into trying to figure out ways to sync using an intermediary location, but no such luck. Am I missing something? Can rsync do a true sync between two locations rather than a one-way sync with "--delete"? For now, Unison provides a cross-platform tool with synchronization between both sides, but it means having the GUI layer too.

But.. for one-way A to B transfers, rsync is fantastic. I use it to sync media to my palmtop and even to sync PIM data from the palmtop to desktop and notebook (GPE). I'm not yet to the point where rsync is my choice for copying around on the same machine, but it's used where possible between machines.

apotheon

If you have enough room for three full backups plus the incremental diffs, you can just maintain three rsync backup lineages -- one of them updated monthly, another weekly, and another daily.

Neon Samurai

With my scripted backup, I squash directories into tarballs, including the date as part of the filename.tar.bz2. For April, cleanup will involve removing March: "rm *-201003*.tar.bz2". Now, the part I haven't automated is the actual file removal; my backup script does not yet remove the outdated copies. Alternatively, I know some of the rsync-based backup options use links to save on space; if a file hasn't changed, simply include a link from the current backup to that older file:
  • backup 1 = file1.ext
  • backup 2 = link-file1.ext, file2.ext
  • backup 3 = link-file1.ext, file3.ext

pgit

"The primary reason I continue to mention the lack of native SSH service in windows is primarily because of how slick SSH and Rsync over SSH are." I have said exactly this a bazillion times. I'm pretty sure Microsoft is actively interested in preventing such incredibly useful, free, cross-platform functionality as ssh and rsync. Windows backup solutions suck, unless you want to fork over to a third-party vendor. If ssh were easier to implement on Windows, this would be a much easier occupation... I tried unison across differing OSs, but never got over what appears to be very strict versioning. I couldn't find the same version # for both Linux and Windows, and any attempt to use it threw an error to the effect that the versions must be the same.

apotheon

Unfortunately (for you), rsync is a backup tool -- which assumes a one-way relationship. If you want to be able to synchronize between multiple locations, especially without ugly kludges and an intermediary server, you need something like Unison or a DVCS such as Mercurial.

lastchip

But from reading the rsnapshot link, it has a problem that is all too familiar to me. Let me explain. I have a web server that obviously is facing the world. I've made every attempt to secure it with my fairly limited knowledge, and so far have been successful. Part of the securing process has been to adopt very long, random passwords that would hopefully take considerable effort to break. I need to back up that server on a regular basis to another computer on my network. At the moment, I do it manually, using ssh, which requires me to enter my password at the appropriate time. And yes, I fall into all the pitfalls that Chad has outlined, including not doing it as regularly as I should. Now, it seems to me that not only rsnapshot, but all the automatic backups I've researched so far, need key-based logins without a passphrase or password, and I feel really uncomfortable with that. Am I missing something or misunderstanding something, and becoming paranoid about a problem that perhaps only exists in my mind?

apotheon

. . . but haven't gotten around to it.

Neon Samurai

Unison should just be taking two directories and keeping them in sync. I've not relied on its rsync, just its sync between two locations. With Windows, I use portable Unison; it keeps its profiles clean and in one place, and I normally have it on the flashdrive it's syncing folders to/from. With *nix, I install Unison then create its relevant profiles. I've never had a conflict over versions, but I'm not sharing the profiles config between versions.

Neon Samurai

I've been considering Subversion, but that means keeping a central server rather than more arbitrary relationships. Unison is my tool of choice. When I was last comparing, it had the best management of sync and conflict resolution. It's also available across *nix and Windows, with a portableapps version. I was hoping I'd missed an rsync command switch, but so be it. Rsync is great for what it does.

Neon Samurai

I think it's looking for certificate login over ssh. In general, certificate login is better; the password check happens on your local machine, and the private certificate does not get used unless the password is correct. With automated stuff, the issue is the password. Normally programs want a password-less certificate rather than having to use the cert and store the password for it. In that case, the challenge becomes protecting your private certificate.

The central server gets a dedicated "remote connecting" user with ssh certs. Both public and private certs can be set read-only by owner ("- r-- --- ---" or "chmod 400"). I go so far as to then remove the password from the connecting user; one must log in as a regular user, su root, su connecting user (then all they get is that user's certificates, since it hasn't rights to do much of anything on the system).

On each remote system, create a dedicated "incoming connections" user. I don't believe this user needs ssh certificates, since it is only for receiving connections. ssh-copy-id the connecting user's certificate into each remote system's incoming user. Once the certificate is in place, delete the incoming user's password and confirm that you can still connect by certificate. Again: one must log in as a regular user, su root, su incoming user, since this user should never have need to log in by password.

Each remote system has a dedicated incoming user and the central user's public certificate. The central server has the connecting user's public and private certificate. On both central and remote systems, the dedicated user has no password; one must su to the user. The only account that can log into the remote users is the central user, with its private certificate.

It sounds like your setup would have two machines: central server and webserver. Your central server would have the public and private keys. You want it to reach out to the webserver to harvest backups, not wait for the webserver to reach in and feed it backups. On your webserver, you have the "backup connection user" account and the central server's public certificate. If your webserver is broken into, they may get your central server's public certificate. They can't reverse the private cert out of it or reverse out a usable password. They can't initiate connections from the public cert side. You just need to keep your central server protected, since it contains the private cert which would let one initiate connections with other related systems.

In the end, you do have to decide if you trust unprotected certificates to your chosen OS's file permissions. Can you limit the valuable cert to one machine, and can you get at that certificate without being root or the valid certificate owner?
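In terms of concrete commands, the setup described above boils down to standard OpenSSH tooling. The user name "backupuser" and host "webserver" are placeholders for illustration:

```shell
# On the central server, as the dedicated connecting user: generate a
# key pair with no passphrase (the automation trade-off discussed above).
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Restrict the private key to read-only by its owner.
chmod 400 ~/.ssh/id_rsa

# Install the public key for the dedicated incoming user on the webserver.
ssh-copy-id -i ~/.ssh/id_rsa.pub backupuser@webserver

# Confirm that key-based login now works.
ssh -i ~/.ssh/id_rsa backupuser@webserver
```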

apotheon

Using public key authentication for SSH basically authenticates one machine with another, because (generally speaking) SSH keys are generated on a given machine, rather than (presumably) carried around with you between machines over time like OpenPGP keys. The public keys for SSH are generally meant to authenticate a machine rather than a person, in other words, which makes them ideal for purposes like backups and less ideal for purposes like secure communication between people.

SSH establishes an encrypted connection with a given remote system before exchanging authentication data, and the key itself is not actually exchanged; rather, the private key operates on some data that is sent to the remote machine, and if the remote machine possesses the corresponding public key it can then reverse the operation performed by the private key to authenticate the machine that possesses the private key. Does that make sense?

Do you have concerns that are not addressed like this -- such as the ability for a local user to use a machine's private key to access the remote machine via SSH? If you are running automated backups, the remote machine must by definition "trust" the machine connecting to it. Otherwise, either the backup will not happen, or you will have to be on hand to authenticate as a user every time a backup runs.

pgit

I've been using a cp argument in the scripts to move the last daily sync to a backup that rsync won't touch first, before doing the sync. So there's the 'live' copy the sync will update, and a rolling one-day-old copy. The -f switch overwrites the day-old copy without prompting (e.g. automated with cron). I do the same with a weekly, monthly, or any other requirement: just run the "monthly" script once a month, which cp's the current copy to a separate "month" copy. Arguments in the cp can point the given archival copy anywhere: off site, on a removable device, etc.

Now that you all got me thinking, I should just use a separate rsync for weekly, monthly, etc. as needed, rather than copying to an archive. It'll use a lot less overhead, and I can dump the 'wait' I have to put between the cp and rsync to ensure the cp completes before the rsync changes the source. Duh, duh, and duh. The only reason to use a cp would be to place the archive on a Windows system, where rsync and ssh aren't welcome. Thankfully all the backup solutions I've deployed are on Linux. Funny how the obvious can sit there unseen, right in front of you, basically indefinitely, until Chad writes an article about it. =D Problem is, the cp also just plain works(tm). A heck of a lot more inefficient, though...

pgit

Since I'd given up on unison I was unaware of 'portable unison.' Looks like the solution to those pesky version errors. Thanks for the tip... as usual. ;)

lastchip

Thank you so much for this very comprehensive reply regarding certificate protection. The idea of creating a dedicated backup user is superb and something I hadn't considered. That, along with severely restricting permissions, is a path well worth taking and something I will build into the overall process as I progress. In essence, although I remain sceptical about all security issues, both are Debian computers, which remain (in my view) really solid machines that I don't have any major concerns about. Thanks again for taking the time to reply. It's most appreciated.

apotheon

I'm quite pleased that you found my response so helpful. A grateful reply like yours makes my day.

lastchip

Thank you so much for that personalised reply. I think the second sentence in your first paragraph addressed the concern I had, which it seems was unfounded. My concern was that somehow a malicious user could use the automated process to gain access to the server; something that I've jealously guarded against. In terms of a security risk from within, there is none, as it's a family operation and no one from outside has access. As I think I may have said before, you have a wonderful ability to take complex subjects and break them down to an understandable level (even for me :-)). And yes, your post made perfect sense. Many thanks indeed. I shall progress along this path further.
