I submitted talk proposals about Pystd, the from-scratch written standard library for C++ (custom design, not a implementation of the ISO specification) to a bunch of conferences. Unfortunately all of them were rejected, so it's blog posting time.
A controversial opinion
Pretty much everybody agrees that compiling C++ is slow, but I personally feel that it is intolerably slow. Other people might disagree, and that is fine, but let's look at some numbers.
Compiling a helloworld executable in C takes approximately 0.02 seconds. Compiling a C++ exe that does the same thing using #include<print> takes a second if optimizations are disabled and up to 2.3 seconds with them enabled. This is approximately a 100x slowdown. I'm using a Ryzen 7 3700X processor, so not the latest and greatest but not too shabby either. I have talked about this slowdown with some people in person and weirdly often their answers have been "that's not a problem, two seconds is insignificant". Even if you accepted this (which personally I don't) the big problem comes from scaling because the slowdown factor is approximately linear. Let's assume that in a less extreme case the slowdown was only 20x instead of 100x. In this case a program that should be done in 0.1 seconds takes 2 seconds, and therefore a program that should compile in one minute would take on the order of 20 minutes to compile.
Why is compilation so slow?
C++ is actually very fast to compile, the slowdowns come mostly from the way the standard library is implemented. This is actually fairly easy to test yourself by running the following shell snippet.
echo '#include<vector>' | g++ -x c++ -E - -std=c++23 | wc
The -E flag tells the compiler to stop after preprocessing. The output is the source code that is fed to the compiler proper. Instead we pass it to wc and find out that merely including vector expands out to 29 000 lines of code. The number is not directly comparable to "human written code", but still, almost 30k lines of code just to get a growable array of elements seems a bit much. And vector is actually one of the lighter headers. Memory is 55 thousand lines (especially bad, since 99% of the time people just want unique_ptr), print is 65 thousand lines and filesystem is 80 thousand lines.
The unfortunate reality is that if you include even a single standard library header, your compile times tank and there's nothing you can do about it.
Just say no
Pystd was originally just a project for me to implement low level primitives (hash maps etc) for scratch for the fun of it. Pretty quickly I came to the three design priorities:
- Compilation time
- Simplicity of implementation
- Performance
I'm not aware of an existing standard library where build time minimization would have been a design priority. Those that are fast, like the standard libraries of C and Go, seem to mostly follow from the simplicity of their respective languages.
At the time of writing building Pystd and all tests from scratch using a single core takes 4 seconds. This consists of 45 individual processes (mostly compiles, a few links). Enabling optimizations balloons the build time to 9 seconds. Using all 16 cores brings it down to 1.9 seconds.
What we have thus far
If we ignore test code, Pystd has 6500 lines of headers and 5600 lines of source in total. Even adding these two together yields a line count of roughly one third of std::vector's (preprocessed) implementation. Functionality provided by Pystd includes:
- vector, string, validating u8string, string views, spans
- Hash map, ordered map (using a B-tree)
- sort (approximately as fast as stdlibc++), stable_sort (faster than stdlibc++)
- Random selection of things in the ISO algorithm header
- Optional, expected, variant, unique_ptr
- Functionality roughly equivalent to Python modules argparse, pathlib (including the ** operator), regular expressions (using pcre) and tempfile
Note that all of these are "complete" as it were. Typically they only contain the most commonly used subset of functionality. That might be a fairly small.
Performance
There is an earlier blog post about the performance. The numbers for converting the CapyPDF library are as follows:
- Compile times dropped ~80%
- Unstripped binary size reduced by ~75%
- Stripped binary size reduced by ~30%
- Runtime performance became ~25% faster (yes, faster, not slower)
Regression can be prevented
Two typical issues people raise when hearing something needs to be "fast to compile" are the following:
- What even is "fast"? It is highly subjective thing that depends on each person and the computer they are using.
- Even if something is fast now, it can not remain fast. New functionality gets added all the time, so the code is destined to become ever slower and eventually it is just as slow as the default standard library.
The boring solution to both of these issues is the same: a predefined time budget. Pystd has a requirement that compiling a source file that includes any single public header must take at most 0.15 seconds. This limit was originally 0.1 seconds and it worked perfectly with GCC, but Clang's process startup time is longer than that. The test script that validates the performance is here. It must pass even on a Raspberry Pi.
Interestingly the tester script was not originally single-threaded. I parallelised it only because it was the single slowest part of Pystd's compile-test cycle taking over a second.
This is the requirement that all new functionality in Pystd must meet. If the code you want to add compiles too slow then either you rewrite the whole package to compile faster or you submit a patch to the upstream compiler to make it run faster.
Try it yourself
The code for Pystd is available in the usual places. Beginners might want to try using the sample project instead.
The code works on Linux and macOS. It does not support MSVC, because the implementation uses pack indexing, which MSVC has not implemented yet.
No comments:
Post a Comment