Tuesday, November 19, 2019

Some intricacies of ABI stability

There is a big discussion ongoing in the C++ world about ABI stability. People want to make a release of the standard that does a big ABI break, so a lot of old cruft can be removed and made better. This is a big and arduous task, which has a lot of "fun" and interesting edge, corner and hypercorner cases. It might be interesting to look at some of the lesser known ones (this post is not exhaustive, not by a long shot). All information here is specific to Linux, but other OSs should be roughly similar.

The first surprising thing to note is that nobody really cares about ABI stability. Even the people who defend stable ABIs in the committee do not care about ABI stability as such. What they do care about is that existing programs keep on working. A stable ABI is just a tool in making that happen. For many problems it is seemingly the only tool. Nevertheless, ABI stability is not the end goal. If the same outcome can be achieved via some other mechanism, then it can be used instead. Thinking about this for a while leads us to the following idea:
Since "C++ness" is just linking against libstdc++.so, could we not create a new one, say libstdc++2.so, that has a completely different ABI (and even API), build new apps against that and keep the old one around for running old apps?
The answer to this questions turns out to be yes. Even better, you can already do this today on any recent Debian based distribution (and probably most other distros too, but I have not tested). By default the Clang C++ compiler shipped by the distros uses the GNU C++ standard library. However you can install the libc++ stdlib via system packages and use it with the -stdlib=libc++ command line argument. If you go even deeper, you find that the GNU standard library's name is libstdc++.so.6, meaning that it has already had five ABI breaking updates.

So … problem solved then? No, not really.

Problem #1: the ABI boundary

Suppose you have a shared library built against the old ABI that exports a function that looks like this:

void do_something(const std::unordered_map<int, int> &m);

If you build code with the new ABI and call this function, the bit representation of the unordered map causes problems. The caller has a pointer to a bunch of bits in the new representation whereas the callee expects bits in the old representation. This code compiles and links but will invoke UB at runtime when called and, at best, crash your app.

Problem #2: the hidden symbols

This one is a bit complicated and needs some background information. Suppose we have a shared library foo that is implemented in C++ but exposes a plain C API. Internally it makes calls to the C++ standard library. We also have a main program that uses said library. So far, so good, everything works.

Let's add a second shared library called bar that also implemented in C++ and exposes a C API. We can link the main app against both these libraries and call them and everything works.

Now comes the twist. Let's compile the bar library against a new C++ ABI. The result looks like this:


A project mimicing this setup can be obtained from this Github repo. In it the abi1 and abi2 libraries both export a function with the same name that returns an int that is either 1 or 2. Libraries foo and bar check the return value and print a message saying whether they got the value they were expecting. It should be reiterated that the use of the abi libraries is fully internal. Nothing about them leaks to the exposed interface. But when we compile and run the program that calls both libraries, we get the following output snippet:

Foo invoked the correct ABI function.
Bar invoked the wrong ABI function.

What has happened is that both libraries invoked the function from abi1, which means that in the real world bar would have crashed in the same way as in problem #1. Alternatively both libraries could have called abi2, which would have broken foo. Determining when this happens is left as an exercise to the reader.

The reason this happens is that the functions in abi1 and abi2 have the same mangled name and the fact that symbol lookup is global to a process. Once any given name is determined, all usages anywhere in the same process will point to the same entity. This will happen even for non-weak symbols.

Can this be solved?

As far as I know, there is no known real-world solution to this problem that would scale to a full operating system (i.e. all of Debian, FreeBSD or the like). If there are any university professors reading this needing problems for your grad students, this could be one of them. The problem itself is fairly simple to formulate: make it possible to run two different, ABI incompatible C++ standard libraries within one process. The solution will probably require changes in the compiler, linker and runtime loader. For example, you might extend symbol resolution rules so that they are not global, but instead symbols from, say library bar would first be looked up in its direct descendents (in this case only abi2) and only after that in other parts of the tree.

To get you started, here is one potential solution I came up with while writing this post. I have no idea if it actually works, but I could not come up with an obvious thing that would break. I sadly don't have the time or know-how to implement this, but hopefully someone else has.

Let's start by defining that the new ABI is tied to C++23 for simplicity. That is, code compiled with -std=c++23 uses the new ABI and links against libstdc++.so.7, whereas older standard versions use the old ABI. Then we take the Itanium ABI specification and change it so that all mangled names start with, say, _^ rather than _Z as currently. Now we are done. The different ABIs mangle to different names and thus can coexist inside the same process without problems. One would probably need to do some magic inside the standard library implementations so they don't trample on each other.

The only problem this does not solve is calling a shared library with a different ABI. This can be worked around by writing small wrapper functions that expose an internal "C-like" interface and can call external functions directly. These can be linked inside the same library without problems because the two standard libraries can be linked in the same shared library just fine. There is a bit of a performance and maintenance penalty during the transition, but it will go away once all code is rebuilt with the new ABI.

Even with this, the transition is not a light weight operation. But if you plan properly ahead and do the switch, say, once every two standard releases (six years), it should be doable.

1 comment:

  1. > Once any given name is determined, all usages anywhere in the same process will point to the same entity.

    Actually, isn't this true only if your ABIs are exposed as shared libraries? I assume you are talking about the global offset table and procedure linker table here. My vague understanding is that red hat's software collections libraries' g++ distribution actually statically links libstdc++ symbols that come from newer versions of libstdc++ than the version that RHEL7 or (centos 7) comes with.

    ReplyDelete