Thursday, March 16, 2017

Is static linking the solution to all of our problems?

Almost all programming languages designed in the last couple of years have a strong emphasis on static linking. Their approach to dependencies is to have them all in source which is compiled for each project separately. This provides many benefits, such as binaries that can be deployed everywhere and not needing to have or maintain a stable ABI in the language. Since everything is always recompiled and linked from scratch (apart from the standard library), ABI is not an issue.

Proponents of static linking often claim that shared libraries are unnecessary. Recompiling is fast and disks are big, so it makes more sense to link statically than to define and maintain an ABI for shared libraries, which is a lot of hard and thankless work.

To see whether this is the case, let's run a rough experiment.

Enter the Dost!

Let's assume a new programming language called Dost. This language is special in that it produces code that is just as performant as the equivalent C code and takes the same amount of space (which is no small feat). It has every feature anyone would ever need, does not require a garbage collector, and has a syntax that is loved by all. The only thing it does not do is dynamic linking. Let us further imagine that, by magic, all open source projects in the world get rewritten in Dost overnight. How would this affect a typical Linux distro?

Take for example the executables in /usr/bin. They are all implemented in Dost, and thus are linked statically. They are probably a bit larger than their original C versions which were linked dynamically. But by how much? How would we find out?

Science to the rescue

Getting a rough ballpark estimate is simple. Running ldd /usr/bin/executable gives a list of all libraries the given executable links against. If it were linked statically, the executable would contain its own copy of all these libraries. Put another way, each executable grows by the size of its dependencies. Then it is a matter of writing a script that goes through all the executables, looks up their dependencies, removes language standard libraries (libc, libstdc++, a few others) and adds up how much extra space these duplicated libraries would take.

The script to do this can be downloaded from this GitHub repo. Feel free to run it on your own machines to verify the results.
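The steps above can be sketched roughly as follows. This is not the script from the repository, just a minimal illustration of the same idea. The function names and the list of excluded standard libraries are assumptions made for this example.

```python
import os
import re
import subprocess

# Assumption: these name prefixes stand in for "language standard libraries".
# The exact exclusion list used by the original script is not shown here.
STDLIB_PREFIXES = ("libc.so", "libstdc++.so", "libm.so", "libdl.so",
                   "libpthread.so", "libgcc_s.so")

# Matches lines like "libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x...)".
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(/\S+)")


def parse_ldd_output(text):
    """Return resolved library paths from ldd output, minus standard libraries."""
    paths = []
    for line in text.splitlines():
        match = LDD_LINE.match(line)
        if match is None:
            continue  # vdso, the dynamic loader, unresolved libraries, ...
        name, path = match.groups()
        if not name.startswith(STDLIB_PREFIXES):
            paths.append(path)
    return paths


def extra_bytes_if_static(executable):
    """Sum the sizes of the shared libraries one executable depends on."""
    result = subprocess.run(["ldd", executable], capture_output=True, text=True)
    if result.returncode != 0:
        return 0  # scripts and static binaries have no dynamic dependencies
    return sum(os.path.getsize(path)
               for path in parse_ldd_output(result.stdout)
               if os.path.exists(path))


def main(bindir="/usr/bin"):
    """Print the estimated extra space for every executable in bindir."""
    total = 0
    for entry in sorted(os.listdir(bindir)):
        path = os.path.join(bindir, entry)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            total += extra_bytes_if_static(path)
    print(f"Extra space if statically linked: {total / 2**30:.2f} GiB")
```

Calling main() walks /usr/bin and prints the total. Like the estimate described above, it counts every dependency at full size for every executable, so it is an upper bound rather than an exact figure.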

Measurement results

Running that script on a Raspberry Pi with Raspbian, used for running an IRC client and random compile tests, shows that statically linked binaries would take an extra 4 gigabytes of space.

Yes, really.

Four gigabytes is more space than many people have on their Raspi SD card. Wasting all that on duplicates of the exact same data does not seem like the best use of those bits. The original shared libraries take only about 5% of this, so static linking expands them 20-fold. Running the measurement script on a VirtualBox Ubuntu install says that on that machine the duplicates would take over 10 gigabytes. You can fit an entire Ubuntu install in that space. Twice. Even if this were not an issue for disk space, it would be catastrophic for instruction caches.

A counterargument people often make is that static linking is more efficient than dynamic linking because the linker can throw away those parts of dependencies that are not used. Even if we assume that the linker did this perfectly, executables would on average need to use at most 5% of the code in their dependencies for static linking to take less space than dynamic linking. This seems unlikely to be the case in practice.
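The break-even figure follows directly from the measured numbers. A small sketch of the arithmetic, using the Raspberry Pi figures quoted above (the function name is made up for this example):

```python
def break_even_usage(duplicated_gb, shared_gb):
    """Fraction of each dependency an executable may keep, on average,
    before static linking takes more space than the shared libraries do."""
    return shared_gb / duplicated_gb


duplicated_gb = 4.0                  # extra space if everything were static
shared_gb = duplicated_gb * 0.05     # shared libraries are about 5% of that

expansion = duplicated_gb / shared_gb
usage = break_even_usage(duplicated_gb, shared_gb)

print(f"expansion factor: {expansion:.0f}x")
print(f"break-even usage: {usage:.0%}")
```

The two numbers are reciprocals of each other: a 20-fold expansion means dead code elimination has to remove 95% of every dependency just to break even.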

In conclusion

Static linking is great for many use cases. These include embedded software, firmware and end user applications. If your use case is running a single application in a container or VM, static linking is a great solution that simplifies deployment and increases performance.

On the other hand, claiming that a systems programming language that does not provide a stable ABI and shared libraries can be used to build the entire userland of a Linux distribution is delusional.

Monday, March 13, 2017

Dependencies and unity builds with Meson

Prebuilt dependencies provided by the distro are awesome. You just install them and start working on your own project. Unfortunately there are many cases where distro packages are either not available or too old. This is especially common when compiling on non-Linux platforms such as Windows but happens on Linux as well when using jhbuild, Flatpak, Snappy or one of the many other aggregators. Dependencies obtained via Meson subprojects also fall into this category.

Unity builds

There is a surprisingly simple way of compiling projects faster: unity builds. The basic principle is that if your target has files foo.cpp, bar.cpp and baz.cpp, you don't compile them individually. Instead you generate a file called target-unity.cpp, whose contents are this:

#include<foo.cpp>
#include<bar.cpp>
#include<baz.cpp>

Then you compile just this file. This makes the compilation go faster. A lot faster. As an example, converting Qt Creator to compile as a unity build made it compile 90% faster. Counterintuitively, it is even faster to compile a unity file with one core than to use four cores to build the files individually. If this is the first time you have encountered unity builds, this probably feels like a sham, something that just can't be possible. Not only is it possible, but unity builds are used in production in many places, especially in game development. As a different kind of example, SQLite ships as a single "amalgamation file", which is the same thing as a unity build. Unity builds also act as a sort of poor man's link time optimization, which works even on compilers that do not support LTO natively.
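Generating the unity file is trivial, which is part of the appeal. A minimal sketch of the generation step (the function name is an assumption for this example, and this is not Meson's actual implementation):

```python
def unity_source(sources):
    """Return the text of a unity file that includes every source in order."""
    return "".join(f"#include<{src}>\n" for src in sources)


def write_unity_file(target, sources):
    """Write <target>-unity.cpp and return its file name."""
    filename = f"{target}-unity.cpp"
    with open(filename, "w", encoding="utf-8") as outfile:
        outfile.write(unity_source(sources))
    return filename


print(unity_source(["foo.cpp", "bar.cpp", "baz.cpp"]), end="")
```

The build system then compiles only the generated file instead of the three original ones.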

Unity builds have their own limitations and problems. Some of them are discussed at the end of this article.

Dependencies

Meson has had unity build support available for a long time. The unfortunate downside of unity builds is that incremental builds take a lot more time, because changing any single source file means recompiling the whole unity file. This makes the edit-compile-debug cycle slower, which is annoying. There are now two merge requests outstanding (number one, number two) that aim to make the user experience better.

With these changes you can tell Meson to unity build only subprojects, not the main project. This means that all your deps build really fast but the master project is built incrementally. In most use cases subprojects are read only and only the master project is edited. For most people dependencies are slower to build than the projects using them, so this gives a nice productivity boost. The other merge request enables finer-grained control by allowing the user to override unityness for each target separately.
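The resulting decision for each target can be pictured as a small function. This is only a sketch of the behaviour described above. The mode names and the per-target override parameter are hypothetical and are not taken from the merge requests.

```python
def use_unity(unity_mode, is_subproject, target_override=None):
    """Decide whether one build target is compiled as a unity build.

    unity_mode: hypothetical global setting, "on", "off" or "subprojects".
    is_subproject: True if the target belongs to a subproject (a dependency).
    target_override: optional per-target True/False that wins over the mode.
    """
    if target_override is not None:
        return target_override
    if unity_mode == "on":
        return True
    if unity_mode == "subprojects":
        return is_subproject
    if unity_mode == "off":
        return False
    raise ValueError(f"unknown unity mode: {unity_mode!r}")


# Dependencies are unity built, the master project stays incremental.
print(use_unity("subprojects", is_subproject=True))
print(use_unity("subprojects", is_subproject=False))
```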

Please try out the branches and write your feedback and comments to the merge requests.

Problems of unity builds

Not all projects can be built as unity builds out of the box. The most common problem is having static variables and functions with the same name in different source files. In a unity build these end up in the same translation unit, so they clash and the build fails. There are other problems of a similar nature, especially for projects that do crazy things with the preprocessor. These problems are fixable but take some effort, and most open source projects probably have at least a few of them.
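A crude way to get a feeling for how many name clashes a project has is to scan its sources for file-scope static declarations that share a name. The sketch below is a heuristic only: it uses a regular expression instead of a real C++ parser, so it will miss some declarations and misread others. All names in it are made up for this example.

```python
import re
from collections import defaultdict

# Heuristic: a line starting with "static", then the last identifier before
# the first ';', '=', '(' or '['. This is not a real C++ parser.
STATIC_DECL = re.compile(r"^static\s+[^;=(]*?\b(\w+)\s*[;=(\[]", re.MULTILINE)


def find_static_clashes(sources):
    """Map each static name declared in more than one file to those files.

    sources: dict of file name -> file contents.
    """
    owners = defaultdict(set)
    for filename, text in sources.items():
        for name in STATIC_DECL.findall(text):
            owners[name].add(filename)
    return {name: sorted(files)
            for name, files in owners.items() if len(files) > 1}


example = {
    "foo.cpp": "static int counter = 0;\nstatic void helper() {}\n",
    "bar.cpp": "static int counter = 5;\nint bar() { return counter; }\n",
}
print(find_static_clashes(example))
```

Here both files define a static counter, which is fine when they are compiled separately but becomes a redefinition error once both are included into one unity file.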

Any project that wants to provide for a unity build must, in practice, have gating CI that compiles the source as a unity build. Otherwise it will break every now and then.