Friday, February 16, 2018

Automatically finding slow headers in C++ projects

A common problem in older C++ codebases is that sources compile slowly due to massive header includes. Headers include other headers, which include even more headers and then, somewhere in the guts of the system, someone includes a header that is very slow to parse. Now things are slow and nobody really knows why.

Trawling through the header soup manually is not feasible. Even if you were to manually inspect the headers, it is difficult to know which are the slow ones. Educated guesses can be made, such as anything having the word "boost" in its name is slow, but this only gets you so far. Fortunately it turns out that it is fairly straightforward to write a tool to find the slow ones automatically.

We need two things to be able to reliably measure the inclusion time breakdown of the headers of any source file.

  1. The transitive list of all header files it includes.
  2. The exact compiler flags used to compile the source.
The former can be obtained from a dependency file that the compiler can be told to generate during compilation (and which almost all modern build systems use by default). The latter can be obtained from the compilation_commands database which is also generated by most build tools today. The actual algorithm is simple: for each dependency header, create a dummy cpp file that just #includes that header, compile the source and measure the time it took.

I created a repo with the measurement script and a sample project to test it on. It has one source file and a few internal headers that include external headers. Here's the top part of its output:

0.5875 ../h1.h
0.5254 /usr/include/c++/7/regex
0.2779 /usr/include/c++/7/shared_mutex
0.2747 /usr/include/c++/7/condition_variable
0.2685 ../h2.h
0.2563 /usr/include/c++/7/locale
0.2445 /usr/include/c++/7/sstream
0.2337 ../h3.h
0.2330 /usr/include/c++/7/iostream
0.2329 /usr/include/c++/7/istream

Iostream has been traditionally considered to be big, bloated and slow to compile. However in this simple example we find that shared_mutex is even slower.

There are, of course, many caveats with this method. The main one being that this does not measure the code generation time, only parsing time. These two are usually highly correlated, though.

No comments:

Post a Comment