Sunday, July 19, 2020

The ABI stability matryoshka

At the C++ on Sea conference last week, Herb Sutter gave a talk about replacing an established thingy with a new version. Obviously the question of ABI stability came up, and he answered with the following (the video is not available, so this quote is only approximate, though an earlier version of the talk is viewable here):
Backwards compatibility is important so that old code can keep working. When upgrading to a new system, it would be great if you could voluntarily opt into using the old ABI. So far no-one has managed to do this, but if we could crack this particular nut, making major ABI changes would become a lot easier.
Let's try to do exactly that. We'll start with a second (unattributed) quote that often gets thrown around in ABI stability discussions:
Programming language specifications do not even acknowledge the existence of an ABI. It is wholly a distro/tool vendor problem and they should be the ones to solve it.
Starting from this, we can identify the actual underlying problem: running programs built for two different ABI versions at the same time on the same OS. The simple solution of rebuilding the world from scratch does not work. It could be done for the base platform but, due to business and other reasons, you can't enforce a rebuild of all user applications (and those users, lest we forget, pay a very hefty amount of money to OS vendors for the platform their apps run on). Mixing new and old ABI apps is fragile and might fail for the weirdest of reasons no matter how careful you are. The problem is even more difficult for "rolling release" distributions such as Debian unstable, where you can't easily rebuild the entire world in one go, but we'll ignore that case for now.
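To make the fragility concrete, here is a minimal C++ sketch, assuming a hypothetical library whose `Settings` struct gained a field between two ABI versions. The names `v1`, `v2`, `Settings`, `old_read_port` and `new_library_fill` are illustrative only, and the two versions live in separate namespaces purely so they can coexist in one file. Code compiled against the old header keeps using the old field offsets, so when it receives an object laid out by the new library it silently reads the wrong member instead of failing loudly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical library header, old ABI: `port` lives at offset 0.
namespace v1 {
struct Settings {
    std::int32_t port;
    std::int32_t timeout;
};
}  // namespace v1

// Same library, new ABI: a `flags` field was inserted at the front,
// shifting every later member. The field names (the API) are unchanged.
namespace v2 {
struct Settings {
    std::int32_t flags;
    std::int32_t port;
    std::int32_t timeout;
};
}  // namespace v2

static_assert(offsetof(v1::Settings, port) != offsetof(v2::Settings, port),
              "the ABI break: `port` moved");

// An application built against v1. The offset of `port` was baked in at
// compile time, so it reads whatever sits at the old position.
std::int32_t old_read_port(const unsigned char* bytes) {
    std::int32_t port;
    std::memcpy(&port, bytes + offsetof(v1::Settings, port), sizeof port);
    return port;
}

// The upgraded library hands out an object laid out according to v2.
void new_library_fill(unsigned char* out, std::int32_t port) {
    v2::Settings s{/*flags=*/7, port, /*timeout=*/30};
    std::memcpy(out, &s, sizeof s);
}
```

If the new library fills a buffer with port 8080, the old application reads back 7 (the value of `flags`). Nothing crashes at that point, which is exactly why such mismatches are so hard to track down.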

It turns out that a solution for doing exactly this already exists: Flatpak. Its entire reason for existence is to run binaries with a different ABI (and even API) on a given Linux platform while making them appear as if they were running on the actual host. Other tools, such as Docker or systemd-nspawn, can achieve something similar, but they aim to isolate the two environments from each other rather than unify them. Thus a potential solution is that whenever an OS breaks ABI compatibility in a major way (which should be rare, say once every few years), it should provide the old ABI version of itself as a Flatpak and run legacy applications that way. As a box diagram, it would look like this:


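The routing the host has to do can be modelled in a few lines of C++. This is a sketch under stated assumptions, not how Flatpak itself is implemented: the `Runtime` and `Launcher` types, the integer ABI version numbers and runtime names such as `os-abi-1` are all hypothetical. The host starts applications built for its own ABI directly and sends each older application into the frozen old-ABI runtime that matches it.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical description of where an application should be started.
struct Runtime {
    std::string name;  // "host" or the id of an old-ABI container runtime
    bool sandboxed;    // true if the app runs inside the old-ABI container
};

class Launcher {
public:
    explicit Launcher(int host_abi) : host_abi_(host_abi) {}

    // Register a frozen copy of an older OS release, shipped as a container.
    void add_legacy_runtime(int abi, std::string name) {
        assert(abi < host_abi_ && "legacy runtimes must be older than the host");
        legacy_[abi] = std::move(name);
    }

    // Decide where an app built against `app_abi` should run.
    std::optional<Runtime> runtime_for(int app_abi) const {
        if (app_abi == host_abi_) {
            return Runtime{"host", false};
        }
        auto it = legacy_.find(app_abi);
        if (it == legacy_.end()) {
            return std::nullopt;  // no compatible environment is installed
        }
        return Runtime{it->second, true};
    }

private:
    int host_abi_;
    std::map<int, std::string> legacy_;
};
```

With a host at ABI version 2 and one legacy runtime registered for version 1, new apps run natively, old apps run in the container, and an app for an ABI nobody provides is rejected rather than started in a broken environment.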
The main downside of this is that the OS vendor's QA department has twice as much work, since it needs to validate both ABI versions of the product. There is also probably a fair bit of work needed to make the two versions work together seamlessly, but once you have that you can do all sorts of cool things, such as building the outer version with libstdc++'s debug mode enabled. Normally you can't do that easily because it massively breaks ABI, but with this setup it becomes easy. You can also build the host with AddressSanitizer or MemorySanitizer enabled for extra security (or just for debugging).
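Why does debug mode break ABI? Checked library builds typically attach bookkeeping members to containers, so the same type has a different size and layout depending on a compile-time switch. The sketch below mimics that with a hypothetical `Buffer` template whose `Debug` parameter stands in for a build macro such as libstdc++'s `_GLIBCXX_DEBUG`; it is an illustration, not libstdc++'s actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Bookkeeping a checked ("debug") build might attach to every container.
struct DebugState {
    std::size_t modification_count = 0;  // bumped on every mutation
    const void* owner_tag = nullptr;     // could identify the owning container
};

// Empty placeholder used in release builds.
struct NoDebugState {};

// Hypothetical container whose layout depends on a build-time switch.
template <bool Debug>
class Buffer {
public:
    void push(int v) {
        assert(size_ < kCapacity);
        data_[size_++] = v;
        if constexpr (Debug) {
            ++this->debug_.modification_count;  // only compiled in debug builds
        }
    }
    std::size_t size() const { return size_; }

private:
    static constexpr std::size_t kCapacity = 8;
    int data_[kCapacity] = {};
    std::size_t size_ = 0;
    std::conditional_t<Debug, DebugState, NoDebugState> debug_;
};

using ReleaseBuffer = Buffer<false>;
using DebugBuffer = Buffer<true>;

// The same source code yields two incompatible object layouts.
static_assert(sizeof(DebugBuffer) > sizeof(ReleaseBuffer),
              "debug bookkeeping changes the object size");
```

Code compiled one way cannot safely hand a `Buffer` to code compiled the other way, because each side assumes a different size and member layout. Keeping each build inside its own self-consistent environment sidesteps the problem.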

Add something like btrfs subvolumes and snapshotting to the mix and you can do even more. Suppose you have a very simple system with a web server and a custom backend application that you want to upgrade to the new ABI version. It could go something like this:

  1. Create a new btrfs subvolume, install the new version into it and set up the current install as the inner "Flatpak" host.
  2. Copy all core system settings to the outer install.
  3. Switch the main subvolume to the new install and reboot.
  4. Now the new ABI environment is running and usable, but all apps still run inside the old version.
  5. Copy the web server configuration to the outer OS and disable the inner copy. This is easy because all system software has the exact same version in both OS installs. Reboot.
  6. Port the business app to run on the new ABI version. Move the stored data and configuration to the outer version. The easiest way to do this is to keep all this data on its own btrfs subvolume, which is easy to switch over.
  7. Reboot. Done. Now your app has been migrated incrementally to the new ABI without intermediate breakage (modulo bugs).
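The steps above can be modelled as a tiny state machine. This is a hypothetical sketch, not a real tool: the `Env` and `Machine` names and the idea of tracking each service's location in a map are assumptions made for illustration. What it captures is the property the procedure relies on: each service is enabled in exactly one environment at any time, services move over one at a time, and stopping partway is a perfectly valid end state.

```cpp
#include <cassert>
#include <map>
#include <string>

// Which OS install a service is enabled in.
enum class Env {
    Inner,  // old ABI install, running nested inside the new one
    Outer   // new ABI host
};

// Hypothetical model of the machine during the upgrade. Each service is
// stored exactly once, so it is always enabled in exactly one environment.
class Machine {
public:
    // After step 4, every existing service still runs in the inner install.
    void add_service(const std::string& name) { location_[name] = Env::Inner; }

    // Steps 5 and 6: move a service's configuration (and data subvolume)
    // to the outer OS and disable the inner copy, one service at a time.
    void migrate(const std::string& name) {
        auto it = location_.find(name);
        assert(it != location_.end() && "unknown service");
        it->second = Env::Outer;
    }

    Env where(const std::string& name) const { return location_.at(name); }

    // True once nothing depends on the old ABI environment any more.
    bool inner_can_be_retired() const {
        for (const auto& entry : location_) {
            if (entry.second == Env::Inner) {
                return false;
            }
        }
        return true;
    }

private:
    std::map<std::string, Env> location_;
};
```

After migrating only the web server, the business app still lives in the inner install and the old environment cannot be retired yet; that intermediate state corresponds to stopping at step #5.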
The best part is that if you won't or can't upgrade your app to the new ABI, you can stop at step #5 and keep running the old ABI code until the whole OS goes out of support. The earlier ABI install remains as is and can still be updated with new RPMs and so on. Crucially, this does not block others from switching to the new ABI at their leisure, which is exactly what everyone wanted to achieve in the first place.
