Saturday, June 21, 2008


I was recently exploring alternative disk-to-disk backup strategies when I ran across rdiff-backup. I liked what I saw, so I gave it a try. Quite frankly, it was awesome. It worked out of the box (it even ships with openSUSE, which made it easy for me to install) and there were no surprises.

rdiff-backup makes good use of librsync to transfer only the changed portions of files. It stores the most current backup tree otherwise unchanged, and stores previous backups as reverse deltas against the current tree. This has several advantages for me:

  1. Huge data sets which change infrequently or in only minor ways can be backed up much more quickly.
  2. The resulting tree is easy to search, restore from, etc., provided you are looking for the "most recent" data. You get to use all of the tools that you use all day, every day. You don't have to learn a new tool to get at your data if the most recent backup will suffice.
  3. Even getting at older data is not much harder, but you do have to use rdiff-backup. Unfortunately, neither rdiff-backup -h nor --help works, but the manpage is pretty good, and the "how do I restore stuff" sections are straightforward and don't seem to contain any gremlins. A simple rdiff-backup --restore-as-of 10D /backups/home/user/data /home/user/data.10daysago will suffice.
  4. Storing reverse deltas is a huge bonus, as most often I'm interested in the most recent data, and my interest drops off sharply the further back I go. This has the added advantage (versus forward deltas) that you never need to perform more than one full backup, and each backup is essentially only what has changed since the last backup, making the backups FAST and very space efficient.
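
To make the reverse-delta idea concrete, here is a minimal Python sketch. It is not how rdiff-backup is implemented: it works on whole files held in a dictionary rather than on librsync block-level diffs, and the names Repository, reverse_delta and apply_delta are made up for illustration. It only shows why the newest tree stays directly readable while older trees are rebuilt by walking backwards.

```python
from typing import Dict, List, Optional

Tree = Dict[str, str]              # path -> file content (hypothetical in-memory "tree")
Delta = Dict[str, Optional[str]]   # path -> older content; None means the path did not exist


def reverse_delta(old: Tree, new: Tree) -> Delta:
    """Record what has to be undone in `new` to get `old` back."""
    delta: Delta = {}
    for path in old.keys() | new.keys():
        if old.get(path) != new.get(path):
            delta[path] = old.get(path)
    return delta


def apply_delta(tree: Tree, delta: Delta) -> Tree:
    """Return a copy of `tree` rolled back by one reverse delta."""
    result = dict(tree)
    for path, content in delta.items():
        if content is None:
            result.pop(path, None)   # the file did not exist in the older tree
        else:
            result[path] = content   # put the older content back
    return result


class Repository:
    """Keeps the latest tree in full, plus one reverse delta per earlier backup."""

    def __init__(self) -> None:
        self.mirror: Tree = {}
        self.deltas: List[Delta] = []   # oldest first
        self.has_backup = False

    def backup(self, source: Tree) -> None:
        # Only the first run is a full backup; later runs store just the changes.
        if self.has_backup:
            self.deltas.append(reverse_delta(self.mirror, source))
        self.mirror = dict(source)
        self.has_backup = True

    def restore(self, steps_back: int = 0) -> Tree:
        if steps_back < 0 or steps_back > len(self.deltas):
            raise ValueError("no backup that far back")
        tree = dict(self.mirror)
        # Walk from the newest delta towards the oldest one requested.
        for delta in reversed(self.deltas[len(self.deltas) - steps_back:]):
            tree = apply_delta(tree, delta)
        return tree
```

Restoring the latest backup (steps_back=0) touches no deltas at all, which mirrors the point above: the common case is the cheap one, and the cost grows only as you reach further back.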

The only real difference that I can see between the tree that got backed up and the resulting tree is the addition of an 'rdiff-backup-data' directory, which stores a bunch of rdiff-backup metadata and the reverse deltas (called 'increments') relating to previous backups.
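
If you want to check that claim on your own backups, something like the following Python sketch would do it. It uses the standard library's filecmp.dircmp and skips any directory named 'rdiff-backup-data'; the function name mirror_differences is my own invention, and the comparison is the module's default shallow one (file type, size and modification time first, contents only when those differ), so treat it as a quick sanity check rather than a full verification.

```python
import filecmp
from typing import List


def mirror_differences(source: str, mirror: str) -> List[str]:
    """List the ways the backup tree differs from the source tree.

    Directories named 'rdiff-backup-data' are ignored, since that is
    where rdiff-backup keeps its own bookkeeping.
    """
    problems: List[str] = []

    def walk(cmp: filecmp.dircmp, prefix: str) -> None:
        for name in cmp.left_only:
            problems.append(f"only in source: {prefix}{name}")
        for name in cmp.right_only:
            problems.append(f"only in backup: {prefix}{name}")
        for name in cmp.diff_files:
            problems.append(f"differs: {prefix}{name}")
        for name in cmp.funny_files:
            problems.append(f"could not compare: {prefix}{name}")
        # Recurse into directories that exist on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, f"{prefix}{name}/")

    walk(filecmp.dircmp(source, mirror, ignore=["rdiff-backup-data"]), "")
    return sorted(problems)
```

An empty list right after a backup run means the two trees match apart from the 'rdiff-backup-data' directory.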

I managed to get rdiff-backup up and running on 5 machines in 5-10 minutes each. As far as a productivity win goes, this is a big one. I've never had anything work so well out of the box and with so little fuss.
