Saturday, December 13, 2008

NFSv4 on openSUSE 11.0

For the past decade or so I've been using NFS inside my home network. NFS is one of those things that has a low-enough barrier to entry that it's really easy to get going, but it's opaque and esoteric enough that when things go wrong it can be a profoundly unreal experience. NFS has many weaknesses, however: security (there isn't any), performance (it's not as good as it should be, even if it's not BAD), complexity (lots of ports, portmapper, choice of UDP/TCP, and so on), and something called "cache coherency", which is essentially a tradeoff between performance and correct behavior. NFSv4 goes a long, long way toward fixing most of these problems. However, it brings some new behaviors and requirements which cause some headaches.

First, the good: NFSv3 operates over what seems like a half-dozen or more ports. Those ports are frequently dynamically assigned, which means that you get to involve portmapper. Portmapper's job is basically to say "nfs is on X, lockd is on Y, and statd is on Z" and so on. All of this complicates firewall management, and can cause all manner of headaches. NFSv4 operates over ONE PORT - 2049. Excepting gss (security: authentication and authorization), which has its own ports, and id mapping, which may operate over LDAP or NIS or whatever, the core NFSv4 protocol operates entirely over one port and uses TCP. That's awesome!

More good: NFSv4 has better cache coherency, locking, a better transport, and so on.

The bad: The exports file format has changed very slightly, and *how* filesystems are exported has changed a bunch. NFSv4 exports a single "root" filesystem under which all others may or may not show up. This root filesystem is identified in /etc/exports with "fsid=0" (or fsid=root). To save time, let's assume that you are going to place this root on your server at /exports. On your server, "mkdir /exports". The documentation suggests that symlinks can be used to expose filesystems that are not rooted in /exports, but that's not true. On Linux, you really only have three choices:

  1. Move/Copy the contents to the exported directory
  2. Use bind mounts (mkdir -p /exports/pictures-of-cheese && mount -o bind /pictures-of-cheese /exports/pictures-of-cheese)
  3. Mount the filesystem directly under the exported directory: mount /dev/sdb1 /exports/the-hoff
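
A bind mount made by hand (option 2) is gone after the next reboot unless it is also recorded in /etc/fstab. Here is a minimal sketch of what that line could look like, assuming /exports is the NFSv4 root as above. bind_fstab_line is a hypothetical helper name made up for this post, not part of any tool:

```shell
# Print the /etc/fstab line that bind-mounts a directory under the NFSv4 root.
# bind_fstab_line is a hypothetical helper; /exports is the root assumed in this post.
bind_fstab_line() {
    src=$1
    root=${2:-/exports}
    printf '%s %s/%s none bind 0 0\n' "$src" "$root" "$(basename "$src")"
}

# Review the printed line, then append it to /etc/fstab by hand (as root).
bind_fstab_line /pictures-of-cheese
```

The helper only prints the line, so nothing on the system changes until you paste it into /etc/fstab yourself.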

Once you've done this, you can fiddle your /etc/exports file. Mine looks like this:

/exports 192.168.100.0/24(fsid=0,insecure,no_subtree_check)
/exports/the-hoff 192.168.100.0/24(rw,nohide,insecure,no_subtree_check,root_squash) 

OK?

Set:

NFS4_SUPPORT="yes" in /etc/sysconfig/nfs

and restart portmapper and NFS.
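
Before restarting anything, it can be worth confirming that the setting actually took. A small sketch, assuming the variable sits on its own line in /etc/sysconfig/nfs. nfs4_enabled is a hypothetical helper name:

```shell
# Succeed if NFSv4 support is switched on in the given sysconfig file.
# nfs4_enabled is a hypothetical helper; the file defaults to /etc/sysconfig/nfs.
nfs4_enabled() {
    grep -Eq '^[[:space:]]*NFS4_SUPPORT="?yes"?[[:space:]]*$' "${1:-/etc/sysconfig/nfs}" 2>/dev/null
}

if nfs4_enabled; then
    echo "NFSv4 support is enabled"
else
    echo "NFS4_SUPPORT is not set to yes"
fi
```

The same check works on the client, since it uses the same file.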

Setting up your client to use NFSv4 is done the same way: set NFS4_SUPPORT="yes" in /etc/sysconfig/nfs on the client, then restart portmapper and nfs.

Mount the filesystem on the client:

mkdir /nfs
mount -t nfs4 192.168.100.1:/ /nfs

If that doesn't work, I can't help you. Note that unlike NFSv3 we are mounting the *root* filesystem. If you are used to mounting /the-hoff (hah), just use a symlink to point into /nfs/the-hoff. Make sure it shows up before you continue.
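
One way to make sure it shows up, and that it really is an NFSv4 mount rather than a silent fallback, is to look at the kernel's mount table. A minimal sketch, assuming the /nfs mount point used above. is_nfs4_mount is a hypothetical helper name:

```shell
# Succeed if the given mount point appears with type nfs4 in a mounts table.
# is_nfs4_mount is a hypothetical helper; the table defaults to /proc/mounts.
is_nfs4_mount() {
    awk -v mp="$1" '$2 == mp && $3 == "nfs4" { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}" 2>/dev/null
}

if is_nfs4_mount /nfs; then
    echo "/nfs is mounted over NFSv4"
else
    echo "/nfs is not an NFSv4 mount"
fi
```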

Now the fun begins: id mapping. NFSv3 using AUTH_SYS (what almost everybody was using) worked like this: numeric ids were used to identify users and groups. That's basically it. A user "bob" with id 1000 on the client would map 100% to user "sally" on the server if sally had id 1000. Names were irrelevant.

With NFSv4 that's no longer the case. Names are important. And to make things awesome, you can't use the old behavior. If a name doesn't match, it's automatically mapped to the "guest" user or group, the value of which is set in /etc/idmapd.conf (the Nobody-User and Nobody-Group settings). Let me tell you, on machines which have slightly different /etc/passwd files that's a fun one. It's an argument for LDAP or some other centralized directory, but it's still a pain.
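
To find the accounts that will end up as the "guest" user, you can compare the two passwd files by name. This is a rough sketch, assuming you have copied the server's /etc/passwd somewhere on the client. unmapped_names is a hypothetical helper name, and it only compares user names (it ignores groups and the idmapd domain):

```shell
# List client accounts whose name has no counterpart on the server.
# Both arguments are passwd-format files (name:x:uid:...).
# unmapped_names is a hypothetical helper name.
unmapped_names() {
    server=$1
    client=$2
    awk -F: '
        FILENAME == ARGV[1] { known[$1] = 1; next }   # first file: remember server names
        !($1 in known)      { print $1 }              # second file: report unknown names
    ' "$server" "$client"
}

# Example (the path of the copied file is an assumption):
#   unmapped_names /tmp/server-passwd /etc/passwd
```

In the bob/sally case above, this prints "bob": the ids match, but under NFSv4 the name is what counts.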

So now I've got NFSv4 up and running. Why doesn't anything work? The default Verbosity setting in /etc/idmapd.conf is 0. I set it to 999 and got just a bit more noise in the logs, which helped me figure all of this out.
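
If you'd rather script the change than open an editor, something like the following should do it. A minimal sketch, assuming the file already contains a "Verbosity = ..." line (it only rewrites an existing line and does not add one). set_idmapd_verbosity is a hypothetical helper name:

```shell
# Set the Verbosity key in an idmapd.conf-style file.
# set_idmapd_verbosity is a hypothetical helper; the file defaults to /etc/idmapd.conf.
set_idmapd_verbosity() {
    level=$1
    conf=${2:-/etc/idmapd.conf}
    tmp=$(mktemp) || return 1
    # Rewrite only the existing "Verbosity = ..." line; everything else is copied unchanged.
    if sed -E "s/^([[:space:]]*Verbosity[[:space:]]*=).*/\1 $level/" "$conf" > "$tmp"; then
        cat "$tmp" > "$conf"
    fi
    rm -f "$tmp"
}

# Example, as root:
#   set_idmapd_verbosity 999
```

The NFS services need a restart afterwards for the new level to take effect.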

Otherwise it seems to be working OK but not great. It's working better than CIFS on 2.6.25.18 - while testing rdiff-backup (to a CIFS mount) last night I got the kernel to oops 4 times - 4 reboots in 15 minutes does not make a stable filesystem. I tried 2.6.27.something and it worked much better, but given the long-standing locking issues with CIFS I'm not about to switch to it. Don't believe me? Google for 'cifs' and 'sqlite'. Remember, everybody and their brother is now using sqlite, Firefox and XBMC being two examples I can think of right away.

And now the post is done.
