Sunday, August 12, 2018

Implementing a distributed compilation cluster

Slow compilation times are a perennial problem. There have been many attempts at caching and distributing the problem such as distcc and Icecream. The main bottleneck on both of these is that some work must be done on the "user's desktop" machine which is then transferred over the network. Depending on the implementation this may include things such as fully preprocessing the source file and then sending the result over the net (so it can be compiled on the worker machine without needing any system headers).

This means that the user machine can easily become the bottleneck. In order to remove this slowdown all the work would need to be done on worker machines. Thus the architecture we need would be something like this:


In this configuration the entire source tree is on a shared network drive (such as NFS). It is mounted in the same path on all build workers as well as the user's desktop machine. All workers and the desktop machine also must have an identical setup, that is, same compilers and installed dependencies. This is fairly easy to achieve with Docker or any similar container technology.

The main change needed to distribute the work is to create a compiler wrapper script, much like distcc or icecc, that sends the compilation request to the work distributor. It consists only of a command line to execute and the path to run it in. The distributor looks up the machine with the smallest load, sends the command, waits for the result and then returns the result to the developer machine.

Note that the input or output files do not need to be transferred between the developer machine and the workers. It is taken care of automatically by NFS. This includes any changes made by the user on their local checkout which are not in revision control. The code that implements all of this (in an extremely simple, quick, dirty and unreliable way) can be found in this Github repo. The implementation is under 300 lines of Python.

Experimental results

Since I don't have a data center to spare I tested this on a single 8 core i7 computer. The "native OS" ran the NFS server and work distributor. The workers were two cloned Virtualbox images each having 2 cores. For testing I compiled LLVM, which is a fairly big C++ code base.

Using the wrapper is straightforward and consists of setting up the original build directory with this:

FORCE_INLINE=1 CXX='/path/to/wrapper workserver_address g++' cmake <options>

Force inline is needed so configuration tests are run on the local machine. They write to /tmp, which is not shared and the executables might be run on a different machine than where they are compiled leading to failures. This could also be solved by having a shared temporary folder but that would increase the complexity of this simple experiment.

Compiling the source just over NFS in a single machine using 2 cores took about an hour. Compiling it with two workers took about 47 minutes. This is not particularly close to the optimal time of 30 minutes so there is a fair bit of overhead in the implementation. Most of this is probably due to NFS and the fact that absolutely everything ran on the same physical machine. NFS also had coherency problems. Sometimes some process invocations could not see files created by their dependency tasks. The most common case was linker invocations, which were missing one or more object files. Restarting the build always made it pass. I tried to add sync commands as necessary but could not make it 100% reliable.

Miscellaneous things of note

In this test only compilation was parallelised. However this same approach works with every executable that is standalone, that is, it does not need to talk to any other ongoing process via IPC. Every build system that supports setting the compiler manually can be used with this scheme. It also works for parallelising tests for build systems that support invoking tests with an arbitrary runner. For example, in Meson you could do this:

meson test --wrapper='/path/to/wrapper workserver_address'

The system also works (in theory) identically on other operating systems such as macOS and Windows. Setting up the environment is even easier because most projects do not use "system dependencies" on those platforms, only the compiler. Thus on Windows you could mount a smb drive with the code on, say, D:\code on all machines and, assuming they have the same version of Visual Studio, it should just work (not actually tested).

Adding caching support is fairly easy. All machines need to have a common directory mounted, point CCACHE_DIR to that and set the wrapper command on the desktop machine to:

CXX='/path/to/wrapper workserver_address ccache g++'

No comments:

Post a Comment