Tuesday, May 4, 2021

"Should we break the ABI" is the wrong question

The ongoing battle on breaking C++'s ABI seems to be gathering steam again. In a nutshell there are two sets of people in this debate. The first set wants to break ABI to gain performance and get rid of bugs, whereas the second set of people want to preserve the ABI to keep old programs working. Both sides have dug their heels in the ground and refuse to budge.

However debating whether the ABI should be broken or not is not really the issue. A more productive question would be "if we do the break, how do we keep both the old and new systems working at the same time during a transition period". That is the real issue. If you can create a good solution to this problem, then the original problem goes away because both sides get what they want. In other words, what you want to achieve is to be able to run a command like this:

prog_using_old_abi | prog_using_new_abi

and have it work transparently and reliably. It turns out that this is already possible. In fact many (most?) people are reading this blog post on a computer that already does exactly that.

Supporting multiple ABIs at the same time

On Debian-based systems this is called multi-arch support. It allows you to, for example, run 32 and 64 bit apps transparently on the same machine at the same time. IIRC it was even possible to upgrade a 32 bit OS install to 64 bits by first enabling the 64 bit arch and then disabling the 32 bit one. The basic gist of multiarch is that rather than installing libraries to /usr/lib, they get installed to something like /usr/lib/x86_64. The kernel and userspace tools know how to handle these two different binary types based on the metadata in ELF executables.

Given this we could define an entirely new processor type, let's call it x86_65, and add that as a new architecture. Since there is no existing code we can do arbitrary changes to the ABI and nothing breaks. Once that is done we can create the full toolchain, rebuild all OS packages with the new toolchain and install them. The old libraries remain and can be used to run all the old programs that could not be recompiled (for whatever reason). 

Eventually the old version can be thrown away. Things like old Steam games could still be run via something like Flatpak. Major corporations that have old programs they don't want to touch are the outstanding problem case. This just means that Red Hat and Suse can sell them extra-long term support for the old ABI userland + toolchain for an extra expensive price. This way those organizations who don't want to get with the times are the ones who end up paying for the stability and in return distro vendors get more money. This is good.

Simpler ABI tagging

Defining a whole virtual CPU just for this seems a bit heavy handed and would probably encounter a lot of resistance. It would be a lot smoother if there were a simpler way to version this ABI change. It turns out that there is. If you read the ELF specification, you will find that it has two fields for ABI versioning (and Wikipedia claims that the first one is usually left at zero). Using these fields the multiarch specification could be expanded to be a "multi-abi" spec. It would require a fair bit of work in various system components like the linker, RPM, Apt and the like to ensure the two different ABIs are never loaded in the same process. As an bonus you could do ABI breaking changes to libc at the same time (such as redefining intmax_t) There does not seem to be any immediate showstoppers, though, and the basic feasibility has already been proven by multiarch.

Obligatory Rust comparison

Rust does not have a stable ABI, in fact it is the exact opposite. Any compiler version is allowed to break the ABI in any way it wants to. This has been a constant source of friction for a long time and many people have tried to make Rust commit to some sort of a stable ABI. No-one has been successful. It is unlikely they will commit to distro-level ABI stability in the foreseeable future, possibly ever.

However if we could agree to steady and continuous ABI transitions like these every few years then that is something that they might agree to. If this happens then the end result would be beneficial to everyone involved. Paradoxically it would also mean that by having a well established incremental way to break ABI would lead to more ABI stability overall.

1 comment:

  1. Three things to add:
    1. Debian also has an interesting hybrid architecture called x32, which combines amd64 instructions and registers with 32-bit integers and pointers.
    2. Debian multi-arch even allows you to transparently run Linux processes in a completely different architecture in Qemu, e.g. x86 on ARM.
    3. Rust is a bad example because it has the declared goal to not support shared libraries at all. Same for Go.