Tuesday, July 17, 2018

How expensive is globbing for sources in large projects

A common holy war in build systems is whether you should explicitly list all sources that make up a target or use a globbing pattern. There are both technical and non-technical arguments on both sides. The latter mostly deal with reliability and flexibility vs convenience. In this post we are going to ignore them completely and instead focus on the technical parts, specifically the overhead of globbing. The measurement script used can be downloaded from this repo.

In this test we used the full checkout of Chromium source code. The tests were run under Windows, since it is noticeably slower than Linux on both file operations and process invocations. The task simulation consists of roughly three parts:

  1. Scan the source tree for all directories that contain sources
  2. Generate glob patterns for detected directories (corresponding roughly to "one target for all sources in one directory")
  3. Run the globs
This ignores a bunch of steps, such as serialising the glob results to files and calculating the delta between two glob sets. These are probably fairly fast compared to file access operations, though.

Scanning the source tree and generating the globs

There is no direct correlation between this step and a regular build system. It is mostly interesting as a comparison between file operations between a hot and a cold cache. Running the scan on a cold cache takes 2 minutes but for a warm cache about 6 seconds.

Since this step is always run first, the following tests are all operating with a hot cache.

The actual globbing

Running all globs on the Chromium source tree takes between 2 and 6 seconds. This is the absolute lowest time that can be obtained for a no-op build without daemons because all globs must be re-evaluated every time.

The rule of thumb for UI design is that everything under one second is perceived as instantaneous. This means that for these sizes globbing causes a noticeable delay. Whether this is seen as insignificant or aggravating depends on each user.

Extra bonus: C++ modules

Since we have the measurement script, let's use it for something more interesting. Modules are an upcoming C++ feature to increase build times and a ton of other coolness depending on who you ask. The current specification works by having a kind of "module export declaration" at the beginning of source files. The idea is that you first compile those to generate a sort of a module declaration file and then you can start the actual compilation that uses said files.

If you thought "waitaminute, that sounds exactly like how FORTRAN is compiled", you are correct. Because of this it has the same problem that you can't compile source files in an arbitrary order, but instead you must first somehow scan them to find out the interdependencies between source (not header) files. In practice what this means is that instead of single-phase compilation all files must be processed twice. All scan operations must be done before any compilation jobs can start because otherwise you might start to compile a file before its dependencies are fully processed.

The scanning can be done in one of two ways. Either the build system scans the sources meaning it needs to understand the syntax of source files or the compiler can be invoked in a special preprocessing mode. Note that build systems such as Ninja do not do any such operations by themselves but instead always invoke external processes to do their work.

Testing the performance impact of these two is straightforward. The first one can be done by reading the first ten lines of each source file and then throwing them away. Measuring this time gives a fairly good estimate of the file processing overhead. The second way can be measured by doing the exact same thing but also invoking the compiler with no-op command line arguments to get the process invocation overhead.

Scanning the files directly takes roughly 120 seconds. For an 8 core machine this means a 15 second delay (at minimum) before any compilation tasks can begin. This is not great but for a full build it should be tolerable.

When spawning a compiler process the same operation takes 69 minutes. This is intolerably slow and would require an order of magnitude speedup in compilation times to be worthwhile. Unlike regular compilations, dependency scanning can not be sped up with unity builds because the specification requires that the module declaration must be at the very beginning of source files (and presumably there can not be more than one in a single TU).

No comments:

Post a Comment