Thursday, March 16, 2017

Is static linking the solution to all of our problems?

Almost all programming languages designed in the last couple of years place a strong emphasis on static linking. Their approach to dependencies is to have them all in source form and to compile them separately for each project. This provides many benefits, such as binaries that can be deployed anywhere and freedom from defining and maintaining a stable ABI for the language. Since everything is always recompiled and linked from scratch (apart from the standard library), ABI is not an issue.

The proponents of static linking often claim that shared libraries are unnecessary: recompiling is fast and disks are big, so it makes more sense to link statically than to define and maintain an ABI for shared libraries, which is a whole lot of hard and thankless work.

To see if this is the case, let's do an approximation experiment.

Enter the Dost!

Let's assume a new programming language called Dost. This language is special in that its code is just as performant as the equivalent C code and takes the same amount of space (which is no small feat). It has every feature anyone could ever need, requires no garbage collector, and has a syntax that everyone loves. The only thing it does not do is dynamic linking. Let us further imagine that, by magic, all open source projects in the world get rewritten in Dost overnight. How would this affect a typical Linux distro?

Take, for example, the executables in /usr/bin. They are all implemented in Dost and thus linked statically. They are probably a bit larger than their original C versions, which were linked dynamically. But by how much? How would we find out?

Science to the rescue

Getting a rough ballpark estimate is simple. Running ldd /usr/bin/executable prints a list of all the libraries the given executable links against. If it were linked statically, the executable would contain a duplicate copy of all those libraries. Put another way, each executable grows by the combined size of its dependencies. Then it is a matter of writing a script that goes through all the executables, looks up their dependencies, removes language standard libraries (libc, libstdc++, a few others) and adds up how much extra space these duplicated libraries would take.

The script to do this can be downloaded from this GitHub repo. Feel free to run it on your own machines to verify the results.
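
For the impatient, the core of the measurement looks roughly like the Python sketch below. The list of excluded standard libraries and other details here are illustrative assumptions on my part; the script in the repo is the authoritative version.

    #!/usr/bin/env python3
    # Rough sketch of the measurement. The set of excluded standard
    # libraries below is an illustrative guess, not the script's exact list.
    import os
    import subprocess

    STDLIBS = ('libc.so', 'libm.so', 'libstdc++.so', 'libgcc_s.so', 'ld-linux')

    def dependencies(exe):
        # ldd prints lines such as "libfoo.so.1 => /usr/lib/libfoo.so.1 (0x...)".
        try:
            out = subprocess.check_output(['ldd', exe],
                                          stderr=subprocess.DEVNULL,
                                          universal_newlines=True)
        except subprocess.CalledProcessError:
            return []  # not a dynamic executable
        deps = []
        for line in out.splitlines():
            parts = line.split()
            if '=>' in parts:
                path = parts[parts.index('=>') + 1]
                if os.path.isabs(path) and not any(s in path for s in STDLIBS):
                    deps.append(path)
        return deps

    # Each executable would grow by the size of every library it links
    # against, so count each dependency once per executable.
    total = 0
    for name in os.listdir('/usr/bin'):
        exe = os.path.join('/usr/bin', name)
        if os.path.isfile(exe) and os.access(exe, os.X_OK):
            total += sum(os.path.getsize(dep) for dep in dependencies(exe))

    print('Extra space taken by duplicated libraries: %.1f GB' % (total / 1024**3))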

Measurement results

Running that script on a Raspberry Pi with Raspbian, used for running an IRC client and random compile tests, shows that statically linked binaries would take an extra 4 gigabytes of space.

Yes, really.

Four gigabytes is more space than many people have on their Raspi SD card. Wasting all that on duplicates of the exact same data does not seem like the best use of those bits. The original shared libraries take only about 5% of this; static linking expands them twentyfold. Running the measurement script on a VirtualBox Ubuntu install shows that on that machine the duplicates would take over 10 gigabytes. You can fit an entire Ubuntu install in that space. Twice. Even if this were not an issue for disk space, it would be catastrophic for instruction caches.

A counterargument people often make is that static linking is more efficient than dynamic linking because the linker can throw away the parts of dependencies that are not used. Even if we assume that the linker did this perfectly, executables would on average need to use no more than 5% of the code in their dependencies for static linking to take less space than dynamic linking. This seems unlikely to be the case in practice.
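
To make the break-even arithmetic explicit, here it is with this post's approximate numbers plugged in:

    # Break-even arithmetic with the approximate numbers measured above.
    shared_total = 0.2   # GB the shared libraries take on disk (about 5% of 4 GB)
    duplicated   = 4.0   # GB their duplicated copies would take if linked statically

    # For static linking to use less space than dynamic linking, each
    # executable may keep, on average, at most this fraction of the code
    # in its dependencies:
    break_even = shared_total / duplicated
    print('break-even utilisation: %.0f%%' % (break_even * 100))  # prints: 5%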

In conclusion

Static linking is great for many use cases. These include embedded software, firmware and end user applications. If your use case is running a single application in a container or VM, static linking is a great solution that simplifies deployment and increases performance.

On the other hand, claiming that a systems programming language that provides neither a stable ABI nor shared libraries can be used to build the entire userland of a Linux distribution is delusional.

8 comments:

  1. There's a flaw in your methodology: statically linking a library drops any object that isn't actually used. For a simple example, consider a program that links against libc.so.6 and uses only, say, gettimeofday(); now link it against the static libc and it will not grow by the (massive) size of the whole libc, but by the size of the .o that contains the gettimeofday function (plus the sizes of the .o files containing the functions it uses). In the case of gettimeofday, that would be gettimeofday.o, localtime.o, tzset.o and lowlevellock.o, if I'm not mistaken.
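
     One can see this at small scale by linking a trivial program both ways and comparing sizes. The sketch below is illustrative and assumes cc and a static libc are installed:

        # Compare the size of a trivial program linked dynamically and
        # statically. Assumes 'cc' and a static libc (e.g. libc.a) exist.
        import os, subprocess, tempfile

        with tempfile.TemporaryDirectory() as d:
            src = os.path.join(d, 'trivial.c')
            with open(src, 'w') as f:
                f.write('int main(void) { return 0; }\n')
            for label, flags in (('dynamic', []), ('static', ['-static'])):
                exe = os.path.join(d, label)
                subprocess.check_call(['cc', src, '-o', exe] + flags)
                # The static binary comes out much bigger than the dynamic
                # one, yet far smaller than the whole libc, because the
                # linker pulls in only the needed objects.
                print(label, os.path.getsize(exe), 'bytes')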

    1. Quoting myself from the article above in case you did not read all of it before commenting:

      A counterargument people often make is that static linking is more efficient than dynamic linking because the linker can throw away the parts of dependencies that are not used. Even if we assume that the linker did this perfectly, executables would on average need to use no more than 5% of the code in their dependencies for static linking to take less space than dynamic linking. This seems unlikely to be the case in practice.

    2. I was wondering about this, too. I tried to compile a couple of random programs from /usr/bin statically, and it turned out to be far more difficult than expected. Either dh_configure eats the static flag, there are missing dependencies that are normally handled by the dynamic linker, or something else goes wrong... and it looks like it's different for each package.

    3. Unfortunately this blog post is just making assumptions without any empirical evidence. Glibc does not work well with static linking. A better starting point would be a distro that uses 'Musl' by default, for instance 'Alpine Linux'. You still need to recompile stuff, but it's easier with the native libc.

      I agree that having everything linked statically system wide is a bad idea, but it might be worth the trouble for some small set of applications. For example my 'busybox' and utility initramfs image is 95% smaller when compiled and linked statically against musl (compared to glibc). Haskell and Rust programs apparently become significantly smaller when (statically) linked with musl. I benchmarked hundreds of concurrent shell+screen+irssi sessions on a VPS environment, simulating an IRC shell server. The memory footprint of the statically linked software (musl+screen+dash+irssi) was 50% lower, 8 vs 16 GB (vs glibc+screen+bash+irssi). Overall, the size difference between dynamic/static libc is much smaller when using musl, which appears to be designed for static linking, unlike glibc.

  2. Don't forget the bandwidth difference when an openssl or libc upgrade means you have to redownload every binary that ever linked against such a core library... Static linking is great for Google's internal systems, with several tens of thousands of machines dedicated just to recompiling code and 10 Gbps Ethernet to every box... not so much for normal humans!

  3. You say your scenario does not allow dynamic linking, but then you subtract the size of language stdlibs. But why is that possible, if they're not dynamically linked? What other mechanism does the language use to make those symbols available in this scenario?

    1. In many languages the standard library is provided as a shared library but everything else is statically linked. I removed stdlibs in this measurement to mimic that behaviour.
