Monday, December 8, 2008

Network Block Device + MD RAID1 = Fun

For the last few years I've been making use of drbd to provide a sort of semi-connected network raid1 as part of my overall backup and disaster recovery system. Recently, I've been experimenting with using nbd (network block device) and Linux MD raid1 (with bitmaps) to provide similar functionality, and have some interesting findings as a result.

Essentially, drbd takes some sort of storage (any seekable file) on this machine and mirrors it over the network to another machine. drbd is primarily found in HA (high-availability) environments.

I've been using drbd like this: I took a pair of 80G drives and placed them in two machines. The first machine is my local fileserver, and the second is a workstation that is booted occasionally (about once a day). I configured drbd to use each of these drives as a mirror of the other, and set the server as primary. Then I formatted the newly-available block device (/dev/drbd0 in my case) with ext3, mounted it, and used it for rsync+hardlink-style backups (now I'm using rdiff-backup).

Whenever the workstation came up, drbd would take note and synchronize only the blocks of the underlying storage device that had changed. This was very fast, easily saturating my 100 Mbit network, and over non-jumbo-framed (standard 1500 byte) gig-e it would sustain 15-20 MiB/s or more. Not bad.

However, I had a few problems with drbd:
  1. Performance: The overhead for keeping track of what has changed and what hasn't was not designed for this usage scenario and is, apparently, hugely expensive. I do not have any hard numbers (which is unlike me) but I'd eyeball it in the 20-30% range. That's quite a bit. Sometimes it felt much worse than that!
  2. Reliability: drbd is designed for use in an HA environment. However, I encountered numerous kernel crashes and other weirdnesses when it was put under heavy I/O. At one point, I had to *reboot* my file server 3 times in one day; normally the only time I reboot it is for a new kernel.
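
Both drbd and MD's bitmaps rely on some record of which regions changed while the mirror was away, so that only those regions are copied later. The following is a toy Python sketch of that idea only. The names (`DirtyMap`, `write`, `resync`) and the 4-byte chunk size are made up for illustration, and this is not how drbd or MD actually implement it.

```python
CHUNK = 4  # bytes per tracked chunk (illustrative; real chunks are far larger)


class DirtyMap:
    """Toy change tracker: one flag per fixed-size chunk of the device."""

    def __init__(self, size, chunk=CHUNK):
        self.chunk = chunk
        # Round up so a partial final chunk still gets a flag.
        self.dirty = [False] * ((size + chunk - 1) // chunk)

    def mark(self, offset, length):
        """Flag every chunk touched by a write of `length` bytes at `offset`."""
        first = offset // self.chunk
        last = (offset + length - 1) // self.chunk
        for i in range(first, last + 1):
            self.dirty[i] = True


def write(primary, dirty, offset, data):
    """Write to the primary copy and remember which chunks changed.

    Assumes the write stays within the bounds of `primary`.
    """
    primary[offset:offset + len(data)] = data
    dirty.mark(offset, len(data))


def resync(primary, mirror, dirty):
    """Copy only the flagged chunks to the mirror; return bytes copied."""
    copied = 0
    for i, flag in enumerate(dirty.dirty):
        if flag:
            start = i * dirty.chunk
            end = min(start + dirty.chunk, len(primary))
            mirror[start:end] = primary[start:end]
            dirty.dirty[i] = False
            copied += end - start
    return copied
```

Here `primary` and `mirror` would be two `bytearray` objects of the same size. The trade-off is visible even in the toy: every write pays for a little bookkeeping, and in exchange a resync copies whole chunks that contain changes instead of the entire device.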
So I sat back and thought, "Why not use AoE or NBD (network block device) and combine it with raid1 and bitmaps?" For starters, drbd does much more than just mirroring. It has an idea as to which mirror is the master, it can switch back and forth, it has automatic reconnect, rebuild, and so on. More than a little bit of "glue" would have to be written for me to replicate even most of the functionality that drbd provides. But I tried anyway, and largely succeeded for my needs.

I use freedt, a GPL'd daemontools replacement, so I wrote a run file which uses nbd-client to check whether the server is up. If it is, it connects and enters a loop within which it performs a number of health checks (imperfectly), and if it detects that the server has gone away, it shuts the block device back down. It also has hooks for up, pre_down and down which I use to interface with Linux MD.

What I have is a largely autonomous system which, typically within 5 seconds, will note that the server is up, connect to it, add the block device to the correct raid array (if any), and take it back out and disconnect should the server disappear. The raid array is built using raid1 and internal bitmaps, which means it takes under a minute for me to synchronize 400-600 MB worth of changes.

As I refine the scripts, I may post them here, if anybody has any interest. I'm not looking to replace drbd. Quite honestly, it worked great for me for a long time, but this works and it was fun.
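
The scripts themselves are not reproduced here. As a rough illustration of the loop described above, here is a minimal Python sketch (the original was a freedt run file, not Python). Everything in it is an assumption made for illustration: the `step` and `server_is_up` names, the `cfg` keys, the plain TCP probe used as a stand-in health check, the old-style `nbd-client host port device` invocation, and the choice of `mdadm --re-add` for the up hook.

```python
import socket
import subprocess


def server_is_up(host, port, timeout=2.0):
    """Stand-in health check: can we open a TCP connection to the server?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def step(connected, up, cfg, run=subprocess.call):
    """Advance the monitor by one tick and return the new connected state.

    `run` takes a command list and returns its exit status; it is a
    parameter so the logic can be exercised without touching real devices.
    """
    if up and not connected:
        # "up" hook: attach the NBD device, then hand it to the MD array.
        if run(["nbd-client", cfg["host"], str(cfg["port"]), cfg["nbd"]]) != 0:
            return False
        run(["mdadm", cfg["md"], "--re-add", cfg["nbd"]])
        return True
    if connected and not up:
        # "pre_down" hook: take the device out of the array first.
        run(["mdadm", cfg["md"], "--fail", cfg["nbd"]])
        run(["mdadm", cfg["md"], "--remove", cfg["nbd"]])
        # "down" hook: release the NBD device.
        run(["nbd-client", "-d", cfg["nbd"]])
        return False
    # No change in server state: nothing to do this tick.
    return connected
```

A supervisor would call `step(connected, server_is_up(host, port), cfg)` every few seconds and carry the returned state into the next call. With an internal bitmap on the array, re-adding a member that was previously part of it should only need to resync the regions marked as changed.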


Mark said...

are you still using NBD+MD instead of DRBD? Would you share your scripts?

Mark said...

are you still using NBD+MD instead of DRBD? did you find problems with NBD deadlock?

Jon said...

I'm not doing it like that any more. In fact, I didn't do it like that for very long at all. The scripts have long since disappeared, but they didn't do much in any case. These days, I'd be much more inclined to give DRBD another try.