Nibble Stew: November 2019

Monday, November 25, 2019

Process invocation will forever be broken

Invoking new processes is, at its core, a straightforward operation. Pretty much everything you need to know to understand it can be seen in the main declaration of the helloworld program:

#include<stdio.h>

int main(int argc, char **argv) {
    printf("Hello, world.\n");
    return 0;
}

The only (direct) information passed to the program is an array of strings containing its (command line) arguments. Thus it seems like an obvious conclusion that there is a corresponding function that takes an executable to run and an array of strings with the arguments. This turns out to be the case, and it is what the exec family of functions does. An example would be execve.
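As a rough illustration of array-based invocation, here is a minimal Python sketch. It assumes that passing a list to subprocess.run hands the arguments to the child as separate strings, with no shell in between. The argument values and the tiny child script are made up for this example.

```python
import subprocess
import sys

# Hypothetical arguments with spaces, quotes and shell metacharacters.
args = ["hello world", "$HOME", 'a"b', ""]

# The child simply prints the argument array it received.
child_code = "import sys; print(repr(sys.argv[1:]))"


def run_child(arguments):
    """Start a child Python process with the given argument array."""
    completed = subprocess.run(
        [sys.executable, "-c", child_code, *arguments],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


received = run_child(args)
print(received)
```

Every argument arrives exactly as it was written, including the empty one, and no quoting was needed at any point.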

This function only exists on posixy operating systems; it is not available on Windows. The native way to start processes on Windows is the CreateProcess function. It does not take an array of strings. Instead it takes a single string:

BOOL CreateProcessA(
    LPCSTR lpApplicationName,
    LPSTR lpCommandLine,
    ...

The string is then split into individual components on the receiving side (typically by the C runtime of the launched program) using an algorithm that is not at all simple or understandable and whose details most people don't even know.

Side note: why does Windows behave this way?

I don't know for sure. But we can formulate a reasonable theory by looking at the past. Before Windows existed there was DOS, and it also had a way of invoking processes. This was done by using interrupts, in this case function 4bh in interrupt 21h. Browsing through online documentation we can find a relevant snippet:

Action: Loads a program for execution under the control of an existing program. By means of altering the INT 22h to 24h vectors, the calling prograrn [sic] can ensure that, on termination of the called program, control returns to itself.

On entry: AH = 4Bh

AL = 0: Load and execute a program

AL = 3: Load an overlay

DS.DX = segment:offset of the ASCIIZ pathname

ES:BX = Segment:offset of the parameter block

Parameter block bytes:

0-1: Segment pointer to envimmnemnt [sic] block

2-3: Offset of command tail

4-5: Segment of command tail

Here we see that the command is split in the same way as in the corresponding Win32 API call, into a command to execute and a single string that contains the arguments (the command tail, though some sources say that this should be the full command line). This interrupt handler could take an array of strings instead, but does not. Most likely this is because it was the easiest thing to implement in real mode x86 assembly.

When Windows 1.0 appeared, its coders probably either used the DOS calls directly or copied the same code inside Windows' code base for simplicity and backwards compatibility. When the Win32 API was created they probably did the exact same thing. After all, you need the single string version for backwards compatibility anyway, so just copying the old behaviour is the fast and simple thing to do.

Why is this behaviour bad?

There are two main use cases for invoking processes: human invocations and programmatic invocations. The former happens when human beings type shell commands and pipelines interactively. The latter happens when programs invoke other programs. For the former case a string is the natural representation for the command, but this is not the case for the latter. The native representation there is an array of strings, especially for cross platform code because string splitting rules are different on different platforms. Implementing shell-based process invocation on top of an interface that takes an array of strings is straightforward, but the opposite is not.

Often command lines are not invoked directly but are instead passed from one program to another, stored to files, passed over networks and so on. It is not uncommon to pass a full command line as a command line argument to a different "wrapper" command and so on. An array of strings is trivial to pass through arbitrarily deep and nested scenarios without data loss. Plain strings not so much. Many, many, many programs do command string splitting completely wrong. They might split it on spaces because it worksforme on this machine and implementing a full string splitter is a lot of work (thousands of lines of very tricky C at the very least). Some programs don't quote their outputs properly. Some don't unquote their inputs properly. Some do quoting unreliably. Sometimes you need to know in advance how many layers of unquoting your string will go through so that you can prequote it sufficiently (because you can't fix any of the intermediate blobs). Basically every time you pass commands as strings between systems, you get a parsing/quoting problem and a possibility for shell code injection. At the very least the string should carry with it information on whether it is a unix shell command line or a cmd.exe command line. But it doesn't, and can't.
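To make the "split on spaces" failure concrete, here is a small Python sketch comparing a naive splitter with the standard library's shlex.split, which follows POSIX shell rules. The command string is invented for the example.

```python
import shlex

# A made-up command line with quoted arguments that contain spaces.
command = 'gcc -o "my program" "source file.c" -DNAME=\'John Doe\''

# Naive approach: split on spaces and hope for the best.
naive = command.split(" ")

# POSIX-style splitting that understands quotes.
proper = shlex.split(command)

print(len(naive), naive)
print(len(proper), proper)
```

The naive version yields eight fragments with stray quote characters, whereas the quote-aware splitter yields the five intended arguments. Note that shlex.split only knows unix shell rules; the same string meant for cmd.exe would need a different parser, and nothing in the string says which one applies.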

Because of this almost all applications that deal with command invocation kick the can down the road and use strings rather than arrays, even though the latter is the "correct" solution. For example this is what the Ninja build system does. If you go through the rationale for this it is actually understandable and makes sense. The sad downside is that everyone using Ninja (or any such tool) has to do command quoting and parsing manually and then ninja-quote their quoted command lines.
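The double quoting can be sketched in a few lines of Python. This is a simplified illustration, not Ninja's full escaping rules: it assumes that the only character needing Ninja-level care inside a command string is the dollar sign, which Ninja writes as $$. The helper name ninja_escape is made up here.

```python
import shlex


def ninja_escape(text):
    """Simplified Ninja escaping: only doubles '$' (an assumption of this sketch)."""
    return text.replace("$", "$$")


# The command the tool actually wants to run, as an array.
args = ["echo", "cost: $5"]

# Layer 1: quote for the shell that will eventually run the command.
shell_quoted = shlex.join(args)

# Layer 2: quote the already-quoted string for the build file.
ninja_quoted = ninja_escape(shell_quoted)

print(shell_quoted)
print(ninja_quoted)
```

Each layer has to be undone in exactly the reverse order on the way back, and every intermediate tool has to agree on which layers exist.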

This is the crux of the problem. Because process invocation is broken on Windows, every single program that deals with cross platform command invocation has to deal with commands as strings rather than an array of strings. This leads to every program using commands as strings, because that is the easy and compatible thing to do (not to mention it gives you the opportunity to close bugs with "your quoting is wrong, wontfix"). This leads to a weird kind of quantum entanglement where having things broken on one platform breaks things on a completely unrelated platform.

Can this be fixed?

Conceptually the fix is simple: add a new function, say, CreateProcessCmdArray to the Win32 API. It is identical to plain CreateProcess except that it takes an array of strings rather than a shell command string. The latter can be implemented by running Windows' internal string splitter algorithm and calling the former. Seems doable, and with perfect backwards compatibility even? Sadly, there is a hitch.
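The layering described above could look roughly like the following Python sketch. Everything in it is hypothetical: create_process_cmd_array is a stub that only records what it was asked to launch, and shlex.split (POSIX rules) merely stands in for Windows' own splitting algorithm, whose details differ.

```python
import shlex

launched = []  # records every argv handed to the array-based primitive


def create_process_cmd_array(argv):
    """Hypothetical array-based primitive; this stub only records the call."""
    launched.append(list(argv))


def create_process(command_line):
    """String-based entry point layered on top of the array-based one."""
    # Stand-in splitter: real Windows splitting rules are different.
    create_process_cmd_array(shlex.split(command_line))


# Old-style callers keep passing a single string...
create_process('notepad "my notes.txt"')
# ...while new callers can skip quoting entirely.
create_process_cmd_array(["notepad", "my notes.txt"])

print(launched)
```

Both calls end up at the same primitive with the same argument array, which is why the string API would remain fully backwards compatible.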

It has been brought to my attention via unofficial channels [1] that this will never happen. The people at Microsoft who manage the Win32 API have decreed this part of the API frozen. No new functionality will ever be added to it. The future of Windows is WinRT or UWP or whatever it is called this week.

UWP is conceptually similar to Apple's iOS application bundles. There is only one process which is fully isolated from the rest of the system. Any functionality that needs process isolation (and not just threads) must be put in its own "service" that the app can then communicate with using RPC. This turned out to be a stupid limitation for a desktop OS with hundreds of thousands of preexisting apps, because it would require every Win32 app using multiple processes to be rewritten to fit this new model. Eventually Microsoft caved under app vendor pressure and added the functionality to invoke processes into UWP (with limitations though). At this point they had a chance to do a proper from-scratch redesign for process invocation with the full wealth of knowledge we have obtained since the original design was written around 1982 or so. So can you guess whether they:

Created a proper process invocation function that takes an array of strings?
Exposed CreateProcess unaltered to UWP apps?

You guessed correctly.

Bonus chapter: msvcrt's execve functions

Some of you might have thought waitaminute, the Visual Studio C runtime does ship with functions that take string arrays so this entire blog post is pointless whining. This is true, it does provide said functions. Here is a pseudo-Python implementation for one of them. It is left as an exercise to the reader to determine why it does not help with this particular problem:

def spawn(cmd_array):
    cmd_string = ' '.join(cmd_array)
    CreateProcess(..., cmd_string, ...)
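One way to approach the exercise is to check what a plain join does to argument boundaries. The Python sketch below uses invented file names. For contrast it also calls subprocess.list2cmdline, an undocumented helper in CPython's subprocess module that applies Microsoft C runtime style quoting; treat its use here as an illustration rather than an endorsed API.

```python
import subprocess


def naive_join(cmd_array):
    """Mimics the pseudo-code above: join with spaces, no quoting."""
    return " ".join(cmd_array)


# Two different argument arrays (file names are made up).
intended = ["copy", "my file.txt", "backup"]
different = ["copy", "my", "file.txt", "backup"]

# Both collapse to the same string, so the receiver cannot tell them apart.
collapsed = naive_join(intended)
print(collapsed == naive_join(different))

# A quoting-aware conversion keeps the boundary visible.
quoted = subprocess.list2cmdline(intended)
print(quoted)
```

Once the array has been flattened without quoting, the original argument boundaries are gone for good.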

[1] That is to say, everything from here on may be completely wrong. Caveat lector. Do not quote me on this.

Tuesday, November 19, 2019

Some intricacies of ABI stability

There is a big discussion ongoing in the C++ world about ABI stability. People want to make a release of the standard that does a big ABI break, so a lot of old cruft can be removed and made better. This is a big and arduous task, which has a lot of "fun" and interesting edge, corner and hypercorner cases. It might be interesting to look at some of the lesser known ones (this post is not exhaustive, not by a long shot). All information here is specific to Linux, but other OSs should be roughly similar.

The first surprising thing to note is that nobody really cares about ABI stability. Even the people who defend stable ABIs in the committee do not care about ABI stability as such. What they do care about is that existing programs keep on working. A stable ABI is just a tool in making that happen. For many problems it is seemingly the only tool. Nevertheless, ABI stability is not the end goal. If the same outcome can be achieved via some other mechanism, then it can be used instead. Thinking about this for a while leads us to the following idea:

Since "C++ness" is just linking against libstdc++.so, could we not create a new one, say libstdc++2.so, that has a completely different ABI (and even API), build new apps against that and keep the old one around for running old apps?

The answer to this question turns out to be yes. Even better, you can already do this today on any recent Debian-based distribution (and probably most other distros too, but I have not tested). By default the Clang C++ compiler shipped by the distros uses the GNU C++ standard library. However you can install the libc++ stdlib via system packages and use it with the -stdlib=libc++ command line argument. If you go even deeper, you find that the GNU standard library's name is libstdc++.so.6, meaning that it has already had five ABI breaking updates.

So … problem solved then? No, not really.

Problem #1: the ABI boundary

Suppose you have a shared library built against the old ABI that exports a function that looks like this:

void do_something(const std::unordered_map<int, int> &m);

If you build code with the new ABI and call this function, the bit representation of the unordered map causes problems. The caller has a pointer to a bunch of bits in the new representation whereas the callee expects bits in the old representation. This code compiles and links but will invoke UB at runtime when called and, at best, crash your app.
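The mismatch can be imitated in Python with ctypes. The two layouts below are invented for illustration and have nothing to do with the real memory layout of std::unordered_map in any standard library; the point is only that the same bytes mean different things under different layouts.

```python
import ctypes


class OldLayout(ctypes.LittleEndianStructure):
    """Hypothetical layout the callee (old ABI) was compiled against."""
    _fields_ = [("bucket_count", ctypes.c_uint32), ("size", ctypes.c_uint32)]


class NewLayout(ctypes.LittleEndianStructure):
    """Hypothetical layout the caller (new ABI) was compiled against."""
    _fields_ = [("size", ctypes.c_uint32), ("bucket_count", ctypes.c_uint32)]


# The caller builds the object using the new representation.
caller_object = NewLayout(size=3, bucket_count=16)

# The callee receives the same bytes but interprets them the old way.
raw_bytes = bytes(caller_object)
callee_view = OldLayout.from_buffer_copy(raw_bytes)

print("caller:", caller_object.size, caller_object.bucket_count)
print("callee:", callee_view.size, callee_view.bucket_count)
```

Here the callee merely reads swapped numbers. With real containers the misread fields include pointers, which is where the crashes come from.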

Problem #2: the hidden symbols

This one is a bit complicated and needs some background information. Suppose we have a shared library foo that is implemented in C++ but exposes a plain C API. Internally it makes calls to the C++ standard library. We also have a main program that uses said library. So far, so good, everything works.

Let's add a second shared library called bar that is also implemented in C++ and exposes a C API. We can link the main app against both these libraries and call them and everything works.

Now comes the twist. Let's compile the bar library against a new C++ ABI. The main program now depends on two libraries that internally use two different, incompatible C++ ABIs.

A project mimicking this setup can be obtained from this GitHub repo. In it the abi1 and abi2 libraries both export a function with the same name that returns an int that is either 1 or 2. Libraries foo and bar check the return value and print a message saying whether they got the value they were expecting. It should be reiterated that the use of the abi libraries is fully internal. Nothing about them leaks to the exposed interface. But when we compile and run the program that calls both libraries, we get the following output snippet:

Foo invoked the correct ABI function.
Bar invoked the wrong ABI function.

What has happened is that both libraries invoked the function from abi1, which means that in the real world bar would have crashed in the same way as in problem #1. Alternatively both libraries could have called abi2, which would have broken foo. Determining when this happens is left as an exercise to the reader.

This happens for two reasons: the functions in abi1 and abi2 have the same mangled name, and symbol lookup is global to a process. Once any given name is resolved, all usages anywhere in the same process will point to the same entity. This will happen even for non-weak symbols.
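A toy model of process-wide lookup, written in Python, reproduces the output above. It assumes a simplified rule where the first library in load order that defines a name wins for everyone. The symbol name abi_version and the load order are made up, and a real dynamic loader is considerably more involved.

```python
def build_symbol_table(load_order, exports):
    """Process-wide table: the first definition of a name wins."""
    table = {}
    for library in load_order:
        for symbol, value in exports.get(library, {}).items():
            table.setdefault(symbol, (library, value))
    return table


# Both ABI libraries export the same (hypothetical) symbol name.
exports = {"abi1": {"abi_version": 1}, "abi2": {"abi_version": 2}}

# What each user library expects to get back.
expected = {"foo": 1, "bar": 2}

table = build_symbol_table(["abi1", "abi2"], exports)
provider, value = table["abi_version"]

# Every library in the process sees the same single definition.
results = {user: value == want for user, want in expected.items()}
for user, ok in results.items():
    print(f"{user.capitalize()} invoked the {'correct' if ok else 'wrong'} ABI function.")
```

In this model, reversing the load order makes abi2 the provider and breaks foo instead of bar.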

Can this be solved?

As far as I know, there is no known real-world solution to this problem that would scale to a full operating system (i.e. all of Debian, FreeBSD or the like). If there are any university professors reading this needing problems for your grad students, this could be one of them. The problem itself is fairly simple to formulate: make it possible to run two different, ABI incompatible C++ standard libraries within one process. The solution will probably require changes in the compiler, linker and runtime loader. For example, you might extend symbol resolution rules so that they are not global, but instead symbols from, say, library bar would first be looked up in its direct descendants (in this case only abi2) and only after that in other parts of the tree.
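The dependency-first lookup idea can be modelled in a few lines of Python. This is only a thought experiment under invented names (the symbol abi_version, the dependency graph and the resolve function are all hypothetical); it says nothing about how hard the change would be in a real loader.

```python
def resolve(symbol, library, dependencies, exports, global_order):
    """Look in the library's direct dependencies first, then everywhere else."""
    for dep in dependencies.get(library, []):
        if symbol in exports.get(dep, {}):
            return dep
    for candidate in global_order:
        if symbol in exports.get(candidate, {}):
            return candidate
    return None


# Hypothetical dependency graph matching the foo/bar example.
dependencies = {"main": ["foo", "bar"], "foo": ["abi1"], "bar": ["abi2"]}
exports = {"abi1": {"abi_version": 1}, "abi2": {"abi_version": 2}}
global_order = ["main", "foo", "bar", "abi1", "abi2"]

foo_provider = resolve("abi_version", "foo", dependencies, exports, global_order)
bar_provider = resolve("abi_version", "bar", dependencies, exports, global_order)
main_provider = resolve("abi_version", "main", dependencies, exports, global_order)

print(foo_provider, bar_provider, main_provider)
```

With scoped lookup, foo and bar each find their own ABI library, while code with no direct dependency on either falls back to the global order.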

To get you started, here is one potential solution I came up with while writing this post. I have no idea if it actually works, but I could not come up with an obvious thing that would break. I sadly don't have the time or know-how to implement this, but hopefully someone else has.

Let's start by defining that the new ABI is tied to C++23 for simplicity. That is, code compiled with -std=c++23 uses the new ABI and links against libstdc++.so.7, whereas older standard versions use the old ABI. Then we take the Itanium ABI specification and change it so that all mangled names start with, say, _^ rather than _Z as currently. Now we are done. The different ABIs mangle to different names and thus can coexist inside the same process without problems. One would probably need to do some magic inside the standard library implementations so they don't trample on each other.
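To see why a different prefix is enough to keep the two ABIs apart, here is a deliberately tiny Python mangler. It only handles free functions that take no arguments, where the Itanium scheme produces the prefix, the length of the name, the name itself and a trailing v. The _^ prefix is the hypothetical one from the paragraph above.

```python
def mangle(name, prefix="_Z"):
    """Mangle a free function with no parameters, e.g. void name()."""
    return f"{prefix}{len(name)}{name}v"


old_symbol = mangle("do_something")               # current scheme
new_symbol = mangle("do_something", prefix="_^")  # hypothetical new-ABI scheme

# A single process-wide table can now hold both definitions.
symbol_table = {old_symbol: "old ABI implementation",
                new_symbol: "new ABI implementation"}

print(old_symbol)
print(new_symbol)
print(len(symbol_table))
```

Because the names differ, global symbol lookup no longer merges the two implementations into one.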

The only problem this does not solve is calling a shared library with a different ABI. This can be worked around by writing small wrapper functions that expose an internal "C-like" interface and can call external functions directly. These can be linked inside the same library without problems because the two standard libraries can be linked in the same shared library just fine. There is a bit of a performance and maintenance penalty during the transition, but it will go away once all code is rebuilt with the new ABI.

Even with this, the transition is not a lightweight operation. But if you plan properly ahead and do the switch, say, once every two standard releases (six years), it should be doable.

Monday, November 18, 2019

What is -pipe and should you use it?

Every now and then you see people using the -pipe compiler argument. This is particularly common in vintage handwritten makefiles. Meson even uses the argument by default, but what does it actually do? The GCC man page says the following:

-pipe
Use pipes rather than temporary files for communication
between the various stages of compilation. This fails
to work on some systems where the assembler is unable to
read from a pipe; but the GNU assembler has no trouble.

So presumably this is a compile speed optimization. What sort of an improvement does it actually provide? I tested this by compiling LLVM on my desktop machine both with and without the -pipe command line argument. Without it I got the following time:

ninja 14770,75s user 799,50s system 575% cpu 45:04,98 total

Adding the argument produced the following timing:

ninja 14874,41s user 791,95s system 584% cpu 44:41,22 total

This is an improvement of less than one percent. Given that I was using my machine for other things at the same time, the difference is most likely statistically insignificant.
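The percentage can be checked with a few lines of Python. The only inputs are the two wall-clock totals from the ninja runs above; the helper name wall_seconds is made up for this sketch.

```python
def wall_seconds(stamp):
    """Convert 'MM:SS,ss' (comma as decimal separator) to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds.replace(",", "."))


without_pipe = wall_seconds("45:04,98")
with_pipe = wall_seconds("44:41,22")

saved = without_pipe - with_pipe
percent = 100 * saved / without_pipe

print(f"{saved:.2f} s saved, {percent:.2f}% of the original wall-clock time")
```

That is roughly 24 seconds out of 45 minutes. Note also that the user CPU time actually went up slightly in the -pipe run, which fits the reading that the difference is noise.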

This argument may have been needed in ye olden times of supporting tens of broken commercial unixes. Nowadays the only platform where this might make a difference is Windows, given that its file system is a lot slower than Linux's. But is its pipe implementation any faster? I don't know, and I'll let other people measure that.

The "hindsight is perfect" design lesson to be learned

Looking at this now, it is fairly easy to see that this command line option should not exist. Punting the responsibility of knowing whether files or pipes are faster (or even work) on any given platform to the user is poor usability. Most people don't know that, and the performance characteristics of operating systems change over time. Instead this should be handled inside the compiler with logic roughly like the following:

if(assembler_supports_pipes(...) &&
   pipes_are_faster_on_this_platform(...)) {
    communicate_with_pipes();
} else {
    communicate_with_files();
}