Rsync is a tool that many administrators use to back up local or remote files. It is efficient because it transfers only what has changed: if a directory holds 3GB of data but only 20MB of it differs from the destination, only that 20MB is transferred. This saves space, time, and bandwidth.
Backups using rsync can be taken a step further with a tool like rsnapshot, which lets you retain multiple point-in-time backups that rotate automatically. For instance, with rsnapshot you can keep a week's worth of individual backups of the same data with very little overhead in disk usage. Rsnapshot accomplishes this with hard links: rather than copying an unchanged file, it creates a second file name that points to the same stored data on disk. A short example illustrates the concept:
$ echo test >1
$ cp 1 2
$ ln 1 3
$ ls -li 1 2 3
44630027 -rw-r--r-- 2 user user 5 Mar 7 16:31 1
44630134 -rw-r--r-- 1 user user 5 Mar 7 16:31 2
44630027 -rw-r--r-- 2 user user 5 Mar 7 16:31 3
The above ls command lists each file's inode number in addition to the regular ls output. Without getting overly technical, an inode is a pointer to a block of data on a disk, and each file has one. In the example above, file 1 has inode 44630027. The file is then copied to 2, which gets its own inode, 44630134, and occupies its own portion of the disk; 1 and 2 are distinct blocks of data. When the ln tool is used to create a hard link (symlinks are created with ln -s), file 3 becomes a second name for the same inode as file 1, which is why 1 and 3 show the same inode number and a link count of 2 in the third column. If you were then to change file 3 by adding new data to it, file 1 would change as well; despite having two separate names, they are the same file.
Because of this, the three 5-byte files do not occupy 15 bytes of space, but 10: files 1 and 3 share a single copy of the data.
Rsnapshot does the same thing. If you have a directory containing 3GB of data and create a second directory whose files are all hard links to the files in the first, you appear to have two directories containing a total of 6GB of data, but only 3GB of space is being consumed on-disk.
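The shared-inode behavior described above can be checked directly. The following is a minimal sketch, assuming a POSIX shell and a scratch directory created with mktemp; the file names a, b, and c are placeholders standing in for 1, 2, and 3 in the example.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

echo test > a      # original file
cp a b             # independent copy: new inode, new data
ln a c             # hard link: a second name for a's inode

# Inode numbers: a and c should match, b should differ.
inode_a=$(ls -i a | awk '{print $1}')
inode_b=$(ls -i b | awk '{print $1}')
inode_c=$(ls -i c | awk '{print $1}')

# Appending through one name is visible through the other.
echo more >> c
content_a=$(cat a)

# Removing one name leaves the data reachable through the other.
rm a
content_c=$(cat c)

echo "a=$inode_a b=$inode_b c=$inode_c"
```

The last step is the property that makes rotation safe: deleting an old snapshot removes only names, and the data stays on disk for as long as any other snapshot still links to it.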
Using this principle, rsnapshot can have multiple snapshots of backup data over various periods of time with very little extra space cost, due to the use of hard links. The only increase in size is due to the files that differ.
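A rough sketch of that rotation follows, assuming GNU cp (for the -l option) and throwaway directories from mktemp. The names src, snap, stable.txt, and notes.txt are illustrative only; this imitates the idea behind rsnapshot and is not its actual implementation.

```shell
#!/bin/sh
work=$(mktemp -d)
src="$work/src"
snap="$work/snap"
mkdir -p "$src" "$snap/hourly.0"

# First "backup": copy the source into the newest snapshot.
echo "unchanged" > "$src/stable.txt"
echo "version 1" > "$src/notes.txt"
cp -a "$src/." "$snap/hourly.0/"

# Rotate: hourly.1 becomes a tree of hard links to hourly.0's files.
cp -al "$snap/hourly.0" "$snap/hourly.1"

# Second "backup": only notes.txt changed. Replace it with a new file
# (rather than overwriting in place) so hourly.1 keeps the old contents.
echo "version 2" > "$src/notes.txt"
rm "$snap/hourly.0/notes.txt"
cp -a "$src/notes.txt" "$snap/hourly.0/notes.txt"

# Unchanged file: one inode, two names. Changed file: two inodes.
stable_new=$(ls -i "$snap/hourly.0/stable.txt" | awk '{print $1}')
stable_old=$(ls -i "$snap/hourly.1/stable.txt" | awk '{print $1}')
notes_new=$(ls -i "$snap/hourly.0/notes.txt" | awk '{print $1}')
notes_old=$(ls -i "$snap/hourly.1/notes.txt" | awk '{print $1}')
old_notes=$(cat "$snap/hourly.1/notes.txt")

echo "stable.txt: $stable_new $stable_old"
echo "notes.txt:  $notes_new $notes_old"
```

The changed file is removed and re-created rather than edited in place because writing through a hard link would alter the older snapshot too. Rsync, unless told otherwise with --inplace, writes an updated file under a temporary name and renames it into place, which has the same effect.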
Rsnapshot is a Perl tool that can be downloaded from http://rsnapshot.org/. Installation is easy: you can download an RPM from the Web site, or download the tarball and use the traditional ./configure; make method of installation. Some distributions also package rsnapshot for easier install.
The primary configuration file is /etc/rsnapshot.conf, and it is heavily commented. Note that rsnapshot requires the fields on each configuration line to be separated by tabs, not spaces. Rsnapshot needs a root directory in which all snapshots are stored; the default is /.snapshots/. Depending on space availability and filesystem layout, this may not be the best place to store snapshots. It can be changed using the snapshot_root directive in the configuration file. For example:
snapshot_root	/srv/rsnapshot/
Like rsync, rsnapshot can use SSH for remote backups. By default, this is not enabled, so you will want to uncomment the cmd_ssh directive if you wish to use it.
Backup intervals are defined in the configuration file with the "interval" directive (newer rsnapshot releases call it "retain" and accept "interval" as an older synonym). Rsnapshot requires that the smallest interval be listed first, so to keep six hourly backups (one day's worth at a four-hour interval), seven daily backups (one week), and four weekly backups (one month), specify:
interval hourly 6
interval daily 7
interval weekly 4
If the version of rsync installed on your system is 2.5.7 or later, you will want to enable the link_dest directive. This lets rsync create the hard links itself, which means you can back up every single file on your system in one pass:
link_dest	1
At the bottom of the file are the backup definitions. For a very basic local backup, you might use:
backup /home/ localhost/
backup /etc/ localhost/
This will back up the /home and /etc directories on the local system and save them in /srv/rsnapshot/localhost/ (or wherever snapshot_root is pointing to).
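Because rsnapshot expects tabs rather than spaces between the fields of each configuration line, a directive pasted with spaces is an easy mistake to make. The sketch below is a hypothetical pre-check, assuming a POSIX shell and awk; it writes a small sample file to a temporary path rather than reading the real /etc/rsnapshot.conf, and it only looks for directive lines with no tab at all.

```shell
#!/bin/sh
# Hypothetical sample config, written to a temp file for illustration.
conf=$(mktemp)
printf 'snapshot_root\t/srv/rsnapshot/\n'  > "$conf"
printf 'interval\thourly\t6\n'            >> "$conf"
printf 'backup /home/ localhost/\n'       >> "$conf"   # spaces: wrong
printf 'backup\t/etc/\tlocalhost/\n'      >> "$conf"

# Skip blank lines and comments, then print the line number of every
# remaining line that contains no tab character.
bad_lines=$(awk '!/^[ \t]*(#|$)/ && !/\t/ { print NR }' "$conf")
echo "lines without tabs: $bad_lines"
```

For a fuller syntax check of the real file, rsnapshot provides its own configtest subcommand (rsnapshot configtest).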
Finally, edit /etc/crontab and add:
0 */4 * * * root /usr/bin/rsnapshot hourly
50 23 * * * root /usr/bin/rsnapshot daily
40 23 * * 6 root /usr/bin/rsnapshot weekly
30 23 1 * * root /usr/bin/rsnapshot monthly
This will call rsnapshot in hourly mode every four hours: at midnight, 4am, 8am, noon, 4pm, and 8pm. It will run the daily backup every day at 11:50pm. The weekly backup will be run every Saturday at 11:40pm. The monthly backup will be run at 11:30pm on the first day of each month, but only if a monthly interval is defined in the configuration file; the example configuration above defines none, so either add one or leave out the last cron line.
With all of this in place, rsnapshot will start backing up the specified directories. To give it a try without waiting for the next cron run, use:
# rsnapshot -v hourly
To see how much storage space is being used by the snapshots, use:
# rsnapshot du
In the above example, the latest hourly backup of the local /etc directory would be stored in /srv/rsnapshot/hourly.0/localhost/etc/.
Rsnapshot is a great tool. It is easy to configure, and it makes retaining easy-to-access point-in-time backups simple, without extra complexity or the overhead of wasting space by storing the same (unchanged) files over and over again. If you currently use rsync to back up files, you owe it to yourself to give rsnapshot a try.
Vincent Danen works on the Red Hat Security Response Team and lives in Canada. He has been writing about and developing on Linux for over 10 years and is a veteran Mac user.