Thursday, March 16, 2017

Is static linking the solution to all of our problems?

Almost all programming languages designed in the last few years place a strong emphasis on static linking. Their approach to dependencies is to have them all in source form, compiled separately for each project. This provides many benefits, such as binaries that can be deployed anywhere and no need to define or maintain a stable ABI in the language. Since everything is always recompiled and linked from scratch (apart from the standard library), ABI is not an issue.

The proponents of static linking often claim that shared libraries are unnecessary. Recompiling is fast and disks are big, so it makes more sense to link statically than to define and maintain an ABI for shared libraries, which is a lot of thankless, difficult work.

To see if this is the case, let's do an approximation experiment.

Enter the Dost!

Let's assume a new programming language called Dost. This language is special in that the code it produces is just as performant as the equivalent C code and takes the same amount of space (no small feat). It has every feature anyone could ever need, does not require a garbage collector, and has a syntax loved by all. The only thing it does not do is dynamic linking. Let us further imagine that, by magic, all open source projects in the world get rewritten in Dost overnight. How would this affect a typical Linux distro?

Take for example the executables in /usr/bin. They are all implemented in Dost, and thus are linked statically. They are probably a bit larger than their original C versions which were linked dynamically. But by how much? How would we find out?

Science to the rescue

Getting a rough ballpark estimate is simple. Running ldd /usr/bin/executable gives a list of all the libraries the given executable links against. If it were linked statically, the executable would contain a duplicate copy of all those libraries. Put another way, each executable grows by the size of its dependencies. Then it is a matter of writing a script that goes through all the executables, looks up their dependencies, removes language standard libraries (libc, libstdc++, a few others) and adds up how much extra space these duplicated libraries would take.

The script to do this can be downloaded from this GitHub repo. Feel free to run it on your own machine to verify the results.
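The core of such a measurement can be sketched in a few lines of Python. This is a hypothetical approximation, not the actual script from the repo; the list of excluded standard libraries is an assumption.

```python
import os
import re
import subprocess

# Language standard libraries excluded from the measurement (assumed set).
EXCLUDED = re.compile(r"libc\.so|libstdc\+\+|libgcc|ld-linux|libm\.so|libpthread")

def parse_ldd(output):
    """Extract resolved shared-library paths from ldd output,
    skipping language standard libraries."""
    deps = []
    for line in output.splitlines():
        m = re.search(r"=> (\S+) \(", line)
        if m and not EXCLUDED.search(m.group(1)):
            deps.append(m.group(1))
    return deps

def duplicated_bytes(bindir="/usr/bin"):
    """Sum the sizes of every dependency of every executable in bindir:
    roughly how much bigger the statically linked versions would be."""
    total = 0
    for name in os.listdir(bindir):
        path = os.path.join(bindir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue
        try:
            out = subprocess.run(["ldd", path], capture_output=True,
                                 text=True, timeout=10)
        except (OSError, subprocess.TimeoutExpired):
            continue
        for dep in parse_ldd(out.stdout):
            if os.path.exists(dep):
                total += os.path.getsize(dep)
    return total

if __name__ == "__main__":
    print(f"Extra space from static linking: {duplicated_bytes() / 2**30:.1f} GiB")
```

Note that this counts the whole library once per dependent executable, which is exactly the worst-case duplication the article is estimating.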

Measurement results

Running that script on a Raspberry Pi with Raspbian, used for running an IRC client and random compile tests, shows that statically linked binaries would take an extra 4 gigabytes of space.

Yes, really.

Four gigabytes is more space than many people have on their Raspi SD card. Wasting all that on duplicates of the exact same data does not seem like the best use of those bits. The original shared libraries take only about 5% of this space; static linking expands them twentyfold. Running the measurement script on a VirtualBox Ubuntu install shows that on that machine the duplicates would take over 10 gigabytes. You can fit an entire Ubuntu install in that space. Twice. Even if this were not an issue for disk space, it would be catastrophic for instruction caches.

A counterargument people often make is that static linking is more efficient than dynamic linking because the linker can throw away those parts of dependencies that are not used. If we assume that the linker did this perfectly, executables would need to use only 5% of the code in their dependencies for static linking to take less space than dynamic linking. This seems unlikely to be the case in practice.
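Where that 5% break-even figure comes from can be shown with back-of-the-envelope arithmetic, using the approximate Raspberry Pi numbers from above:

```python
# Approximate figures from the Raspberry Pi measurement above.
duplicated_gib = 4.0    # dependency code duplicated into static executables
shared_libs_gib = 0.2   # the original shared libraries: ~5% of the above

# Static linking uses less space only if the linker keeps, on average,
# at most this fraction of each dependency's code per executable:
break_even = shared_libs_gib / duplicated_gib
print(f"Break-even retained fraction: {break_even:.0%}")  # → 5%
```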

In conclusion

Static linking is great for many use cases. These include embedded software, firmware and end-user applications. If your use case is running a single application in a container or VM, static linking is a great solution that simplifies deployment and increases performance.

On the other hand, claiming that a systems programming language that provides neither a stable ABI nor shared libraries can be used to build the entire userland of a Linux distribution is delusional.

8 comments:

  1. There's a flaw in your methodology: static linking drops any object that isn't actually used. For a simple example, consider a program linked against libc.so.6 that uses only, say, gettimeofday(); now link it against the static libc, and it will not grow by the (massive) size of libc, but by the size of the .o that contains the gettimeofday function (plus the sizes of the .o files containing the functions it uses). In the case of gettimeofday, that would be gettimeofday.o, localtime.o, tzset.o and lowlevellock.o, if I'm not mistaken.

    Replies
    1. Quoting myself from the article above in case you did not read all of it before commenting:

      A counterargument people often make is that static linking is more efficient than dynamic linking because the linker can throw away those parts of dependencies that are not used. If we assume that the linker did this perfectly, executables would need to use only 5% of the code in their dependencies for static linking to take less space than dynamic linking. This seems unlikely to be the case in practice.

    2. I was wondering about this, too. Tried to compile a couple of random programs from /usr/bin as static, and it turned out to be far more difficult than expected. Either dh_configure eats the static flag, there are missing dependencies that are handled by the dynamic linker, or something else... and it looks like it's different for each package.

    3. Unfortunately this blog post is just making assumptions without any empirical evidence. Glibc does not work well with static linking. A better starting point would be a distro that uses 'Musl' by default, for instance 'Alpine Linux'. You still need to recompile stuff, but it's easier with the native libc.

      I agree that having everything linked statically system-wide is a bad idea, but it might be worth the trouble for some small set of applications. For example, my 'busybox' & utility initramfs image is 95% smaller when compiled and linked statically against musl (compared to glibc). Haskell and Rust programs apparently become significantly smaller when statically linked with musl. I benchmarked hundreds of concurrent shell+screen+irssi sessions on a VPS environment, simulating an IRC shell server. The memory footprint of the statically linked software (musl+screen+dash+irssi) was 50% lower, 8 vs 16 GB (vs glibc+screen+bash+irssi). Overall, the size difference between dynamic and static libc is much smaller when using musl, which appears to be designed for static linking, unlike glibc.

  2. Don't forget the bandwidth cost when an openssl or libc upgrade means you have to redownload every binary that ever linked against such a core library... Static linking is great for Google's internal systems, with several tens of thousands of machines dedicated solely to recompiling code and 10Gbps ethernet to every box... not so much for normal humans!

  3. You say your scenario does not allow dynamic linking, but then you subtract the size of language stdlibs. But why is that possible, if they're not dynamically linked? What other mechanism does the language use to make those symbols available in this scenario?

    Replies
    1. In many languages the standard library is provided as a shared library but everything else is statically linked. I removed stdlibs in this measurement to mimic that behaviour.
