Sunday, December 31, 2017

These three things could improve the Linux development experience dramatically, #2 will surprise you

The development experience on a modern Linux system is fairly good. However, there are several oddities, mostly legacy behavior that is no longer relevant, that cause weird bugs, hassles and other problems. Here are three suggestions for improvement:

1. Get rid of global state

There is a surprisingly large amount of global (mutable) state everywhere. There are also many places where said global state is altered in secret. As an example let's look at pkg-config files. If you have installed some package in a temporary location and request its linker flags with pkg-config --libs foo, you get out something like this:

-L/opt/lib -lfoo

The semantic meaning of these flags is "link against the libfoo that is in /opt/lib". But that is not what these flags do. What they actually mean is "add /opt/lib to the global link library search path, then search for foo in all search paths". This has two problems. First of all, the linker might, or might not, use the library file in /opt/lib. Depending on other linker flags, it might find it somewhere else. But the bigger problem is that the -L option remains in effect after this. Any library search later might pick up libraries in /opt/lib that it should not have. Most of the time things work. Every now and then they break. This is what happens when you fiddle with global state.

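The fix to this is fairly simple and requires only changing the pkg-config file generator so that, for --libs foo, it outputs the full path to the library file in /opt/lib instead of a -L/-l pair. A full path names exactly one file and leaves no search path behind to affect later lookups.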
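As a rough sketch of the idea (not how pkg-config itself works), the following post-processes a list of flags and replaces each -l flag with a full path when the library is found in one of the -L directories given alongside it. The function name resolve_libs, the exists parameter and the choice to try .so before .a are all assumptions made for illustration.

```python
import os


def resolve_libs(flags, exists=os.path.exists):
    """Replace '-L<dir> -l<name>' pairs with full library paths where possible.

    Hypothetical helper: 'exists' is injectable so the lookup can be tested
    without touching the real filesystem.
    """
    # Directories named by -L flags, in the order they were given.
    dirs = [flag[2:] for flag in flags if flag.startswith("-L")]
    out = []
    for flag in flags:
        if flag.startswith("-L"):
            # Drop the search path so it cannot leak into later lookups.
            continue
        if flag.startswith("-l"):
            name = flag[2:]
            # Assumption: prefer a shared library over a static one.
            candidates = [
                f"{d}/lib{name}{ext}" for d in dirs for ext in (".so", ".a")
            ]
            found = next((c for c in candidates if exists(c)), None)
            # Leave the flag alone if the library is not in any -L directory
            # (for example a system library).
            out.append(found if found else flag)
        else:
            out.append(flag)
    return out
```

If /opt/lib/libfoo.so is the only file that exists, resolve_libs(["-L/opt/lib", "-lfoo"]) returns ["/opt/lib/libfoo.so"].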


2. Get rid of -lm, -pthread et al

Back when C was first created, libc had very little functionality in it. For various reasons, when new functionality was added it went into its own library that you could then enable with a linker flag. Examples include -lm to add the math library and -ldl to get dlopen and friends. Similarly when threads appeared, each compiler had its own way of enabling them, and eventually any compiler not using -pthread died out.

If you look at the compiler flags in most projects there is a ton of gymnastics for adding all these flags not only to compiler flags but also to things like .pc files. And then there is code to take these flags out again when e.g. compiling with Visual Studio. And don't even get me started on related things like ltdl.

All of this is just pointless busywork. There is no reason all of these could not be in libc proper, always available and always used. It is unlikely that math libraries or threads are going to go away any time soon. In fact this has already been done in pretty much any libc that is not glibc. VS has these by default, as do OSX, the BSDs and even alternative Linux libcs. The good thing is that the glibc maintainers are already in the process of doing this transition. Soon all of this pointless flag juggling will go away.

3. Get rid of 70s memory optimizations

Let's assume you are building an executable and that your project has two internal helper libraries. First you do this:

gcc -o myexe myexe.o lib1.a lib2.a

This gives you a linker error due to lib2 missing some symbols that are in lib1. To fix this you try:

gcc -o myexe myexe.o lib2.a lib1.a

But now you get missing symbols in lib1. The helper libraries have a circular dependency so you need to do this:

gcc -o myexe myexe.o lib1.a lib2.a lib1.a

Yes, you do need to list lib1 twice. The reason for this lies in the fact that in the 70s memory was limited. The linker goes through the libraries one by one. When it processes a static library, it copies all symbols that are listed as missing and then throws away the rest. Thus if lib2 requires any symbol from lib1 that myexe.o did not refer to, tough luck, all those symbols are gone. The only way to access them is to add lib1 to the linker line and have it processed in full for a second time.
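To make the ordering problem concrete, here is a toy simulation of a single-pass static link. It is a simplified model, not the real linker: each archive is a dict of hypothetical object files, each with a set of symbols it defines and a set it needs, and the symbol names a, b and c are made up.

```python
def link(exe_needs, archives):
    """Simulate a single-pass static link and return the unresolved symbols.

    'archives' is a list of (archive_name, members) in command-line order,
    where members maps an object name to (defined_symbols, needed_symbols).
    """
    defined = set()
    undefined = set(exe_needs)
    loaded = set()
    for name, members in archives:
        # Rescan this one archive until it stops contributing objects,
        # then move on and never look at it again.
        progress = True
        while progress:
            progress = False
            for member, (defines, needs) in members.items():
                if (name, member) in loaded or not (defines & undefined):
                    continue  # already pulled in, or nobody asked for it yet
                loaded.add((name, member))
                defined |= defines
                undefined = (undefined | needs) - defined
                progress = True
    return undefined


# Hypothetical layout: lib1 needs b from lib2, and lib2 needs c from lib1.
lib1 = ("lib1.a", {"a.o": ({"a"}, {"b"}), "c.o": ({"c"}, set())})
lib2 = ("lib2.a", {"b.o": ({"b"}, {"c"})})

print(link({"a"}, [lib1, lib2]))        # {'c'}: c.o was skipped on the first pass
print(link({"a"}, [lib2, lib1]))        # {'b'}: lib2 was processed before b was needed
print(link({"a"}, [lib1, lib2, lib1]))  # set(): the second pass over lib1 finds c.o
```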

This simple issue can be fixed by hand but things get more complicated if the libraries come from external dependencies. The correct fix for this would be to change the linker to behave roughly like this:
  • Go through the entire linker line and find all libraries.
  • Check which of them point to the same physical files and deduplicate them
  • Wrap all of these in a single -Wl,--start-group -Wl,--end-group
  • Do symbol lookup once in a global context
This is a fair bit of work and may cause some breakage. On the other hand we do know that this works because many linkers already do this, for example Visual Studio and LLVM's new lld linker.
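The first three steps in the list above can be approximated today by rewriting the command line before it reaches the linker. The sketch below is only that, a command-line rewriter with a hypothetical name (regroup), not a change to the linker. It assumes libraries are given as file paths ending in .a or .so and ignores -l flags entirely.

```python
import os


def regroup(args):
    """Deduplicate library arguments by physical file and wrap them in one group.

    Hypothetical helper: only arguments ending in .a or .so count as libraries.
    """
    others, libs, seen = [], [], set()
    for arg in args:
        if arg.endswith((".a", ".so")):
            # realpath resolves symlinks and relative paths, so two spellings
            # of the same file collapse into one entry.
            real = os.path.realpath(arg)
            if real not in seen:
                seen.add(real)
                libs.append(arg)
        else:
            others.append(arg)
    if not libs:
        return others
    # Inside a group the linker rescans the archives until no new symbols
    # resolve, so their relative order no longer matters.
    return others + ["-Wl,--start-group"] + libs + ["-Wl,--end-group"]
```

For the earlier example, regroup(["-o", "myexe", "myexe.o", "lib1.a", "lib2.a", "lib1.a"]) returns ["-o", "myexe", "myexe.o", "-Wl,--start-group", "lib1.a", "lib2.a", "-Wl,--end-group"]. Moving every library to the end is itself a simplification that could change behavior on command lines that depend on the original ordering.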