Wednesday, January 7, 2009

openSUSE 11.0 -> 11.1 Upgrade Issues

I recently upgraded a number of machines from openSUSE 11.0 to openSUSE 11.1. By and large the upgrade went perfectly; however, a number of issues cropped up almost immediately. The first was that I could not log in.

I use an encrypted home directory via pam_mount. The pam options for various programs got good and hosed by the upgrade, and openSUSE's response to the bug I (and others) filed was "you should always check your .rpmsave files after an upgrade". Lame. I might be experienced enough to check /etc/pam.d/* for changes but the overwhelming majority of folks aren't. If I had used the (easy) system tools (YaST) to create an encrypted home directory in 10.3 or 11.0 and then had it stop working in 11.1, and been unable to fix it, I'd have been royally cranked. My various attempts at fixing the problem also resulted in a number of bugs filed.

The next problem, and a far more serious one, was repeated filesystem corruption on the same volume. Coincidence? I think not. I use ext3, typically with data=journal, and haven't had filesystem corruption of any kind since about 2000, and it was a rarity even then. Further investigation showed me that while my home directory was being mounted correctly, it wasn't being *un*mounted correctly. Furthermore, subsequent logins re-did the loopback and dm-crypt setup, so I had 3 or 4 or 5 mappings, only the last of which was any good. This resulted in some lost work for me, an hour or two above and beyond what I had lost as part of the investigation. I filed another bug on that and it's yet to receive any attention after almost two weeks. Look, I know the developers are busy, but releasing a new operating system around the Holidays, when lots of folks are going to be on, or taking, or recovering from, vacation doesn't strike me as a great idea. Still, anybody with encrypted home directories mounted via pam_mount might be experiencing the same problems and THAT doesn't make for good PR - "openSUSE 11.1 hosed my home directory" is not a headline I would like to read, if I worked for Novell.
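If you suspect the same pile-up of stale loopback mappings, something like the following rough sketch can spot it. It assumes you paste in the output of `losetup -a` and that each line looks like `/dev/loopN: [dev]:inode (backing file)`; that format differs between util-linux versions, so treat the regular expression as a guess to adjust. The function name and the sample paths are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed line shape: "/dev/loop0: [0803]:131074 (/home/alice.img)"
LINE = re.compile(r"^(/dev/loop\d+):.*\((.+)\)\s*$")


def duplicate_backing_files(losetup_output):
    """Return {backing_file: [loop devices]} for files attached more than once."""
    by_file = defaultdict(list)
    for line in losetup_output.splitlines():
        match = LINE.match(line.strip())
        if match:
            device, backing = match.groups()
            by_file[backing].append(device)
    return {f: devs for f, devs in by_file.items() if len(devs) > 1}


# Hypothetical output after two logins without a clean unmount in between.
SAMPLE = """\
/dev/loop0: [0803]:131074 (/home/alice.img)
/dev/loop1: [0803]:131074 (/home/alice.img)
/dev/loop2: [0803]:262200 (/srv/other.img)
"""

if __name__ == "__main__":
    for backing, devices in duplicate_backing_files(SAMPLE).items():
        print(f"{backing} is attached {len(devices)} times: {', '.join(devices)}")
```

More than one loop device on the same image file is the symptom described above; it says nothing about which mapping is the live one.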

So I set about to determine if a newer version of pam_mount could help solve my problem, since rolling back to the older version didn't. I built newer versions of libHX and pam_mount (which has a thoroughly scary changelog, although one must remember the great complexity it must deal with) using the openSUSE Build Service (which really is a truly awesome and wonderful thing!). That didn't work (no amount of futzing could cajole it to mount my home directory), so I went ahead with the latest versions of each and sat down for a longer debugging session. Armed with diff and lots of other stuff, I basically determined that the latest version will work just fine - it correctly avoids duplicate mounts and deals better with unmounting and so on and so forth - but the trick is two-fold:

  1. You have to basically remove almost everything from /etc/security/pam_mount.conf.xml except for the <volume> elements and, of course, the outer element.
  2. You also have to fill out options which didn't need to be specified with 0.35 (or 0.47) because the values were the same as the defaults, but the defaults have changed. Those options are:
    • fskeycipher - use aes-256-cbc here as it's the (former?) default
    • fskeyhash - use md5 here as it is the former default
Again, make *sure* to remove the other elements as the defaults seem to work and the ones that are present (again, from an older version) do not.
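The two steps above can be checked mechanically. Here is a small sketch, not an official tool, that reads the text of a pam_mount.conf.xml and reports (a) any top-level elements other than <volume>, which are candidates for removal, and (b) any <volume> that does not spell out fskeycipher and fskeyhash. The function name, the sample user and the sample paths are invented for illustration; the two option values are the ones that worked for me.

```python
import xml.etree.ElementTree as ET

# Values that used to be implicit defaults and now need to be written out.
REQUIRED = {"fskeycipher": "aes-256-cbc", "fskeyhash": "md5"}


def audit(xml_text):
    """Return (extra_top_level_tags, {volume_label: [missing attributes]})."""
    root = ET.fromstring(xml_text)
    # Anything directly under the outer element that is not a <volume>.
    extras = [child.tag for child in root if child.tag != "volume"]
    missing = {}
    for index, volume in enumerate(root.iter("volume")):
        absent = [name for name in REQUIRED if name not in volume.attrib]
        if absent:
            label = volume.get("mountpoint", f"volume #{index}")
            missing[label] = absent
    return extras, missing


# Hypothetical config carried over from an older pam_mount version.
SAMPLE = """
<pam_mount>
  <debug enable="0" />
  <volume user="alice" fstype="crypt" path="/home/alice.img"
          mountpoint="/home/alice" fskeypath="/home/alice.key" />
</pam_mount>
"""

if __name__ == "__main__":
    extras, missing = audit(SAMPLE)
    print("leftover elements to consider removing:", extras)
    for label, names in missing.items():
        for name in names:
            print(f'{label}: add {name}="{REQUIRED[name]}"')
```

To run it against a real file, replace SAMPLE with the contents of /etc/security/pam_mount.conf.xml (and keep a backup before editing anything by hand).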

I hope this post serves three purposes:

  1. To help some other poor schmuck out of the near-disaster that was *my* upgrade experience
  2. Hopefully to cajole Novell into giving a *bit* more thought to how others might react to losing access to their $HOME
  3. To rant (this is the biggest one).

Some updates:

The bugs I filed did get some attention! The verdict: on a clean 11.0 upgraded to 11.1, the bugs do not appear. I went so far as to reinstall the stock pam_mount version, try stuff, and go back and forth. No such luck. However, since my workstation is now working again, at least I'm not so cranky. Someday if I get some ambition I'll try to find out what is going wrong here, but I got some help from Novell and that's more than I deserved!

1 comment:

Anonymous said...

In such cases, it is best to talk to the maintainers. Unless you are fond of hunting bugs yourself (there is a masochist in every one of us!).