Wednesday, July 23, 2025

Comparing a red-black tree to a B-tree

In an earlier blog post we found that optimizing the memory layout of a red-black tree does not seem to work. A different way of implementing an ordered container is to use a B-tree. It was originally designed for on-disk data: the design principle was that memory access is "instant" while disk access is slow. Nowadays the same applies within memory itself, as cache hits are "instant" and uncached memory accesses are slow.

I implemented a B-tree in Pystd. Here is how the various containers compare. As test data I used the numbers from zero to one million in random order.
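To make the cache-friendliness argument concrete, here is a minimal sketch of what a B-tree node and lookup might look like. The names and layout are illustrative assumptions, not Pystd's actual implementation; the point is that all keys of a node sit in one contiguous array, so a search touches far fewer cache lines than chasing red-black tree pointers.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Hypothetical node layout; Pystd's real one may differ.
constexpr std::size_t SPREAD = 5; // spread factor used in the benchmark above

struct BTreeNode {
    std::array<int, SPREAD - 1> keys{};        // sorted; first num_keys are valid
    std::array<BTreeNode*, SPREAD> children{}; // all null in a leaf
    std::size_t num_keys = 0;

    // Index of the first key >= value; also the child slot to descend into.
    std::size_t lower_bound(int value) const {
        return static_cast<std::size_t>(
            std::lower_bound(keys.begin(), keys.begin() + num_keys, value)
            - keys.begin());
    }
};

bool contains(const BTreeNode* node, int value) {
    while (node) {
        const std::size_t i = node->lower_bound(value);
        if (i < node->num_keys && node->keys[i] == value)
            return true;
        node = node->children[i]; // null at a leaf ends the search
    }
    return false;
}
```

Each step down the tree is one binary search inside a small contiguous array, which is the property the benchmark is exercising.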


As we can see, an unordered map is massively faster than any ordered container. If your data does not need to be ordered, that is the one you should use. For ordered data, the B-tree is clearly faster than either red-black tree implementation.

Tuning the B-tree

B-trees have one main tunable parameter, namely the spread factor of the nodes, i.e. how many children each node can have. In the test above it was five, but for on-disk purposes the recommended value is "in the thousands". Here's how altering the value affects performance.


The sweet spot seems to be in the 256-512 range, where the operations are 60% faster than with the standard set. As the spread factor grows towards infinity, the B-tree degenerates into storing all data in a single sorted array. Inserting N elements into that one by one is an O(N^2) algorithm, as can be seen here.
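The degenerate "infinite spread factor" case is easy to sketch. Each insert does a cheap binary search, but then has to shift, on average, half of the existing elements, so N inserts cost O(N^2) element moves in total:

```cpp
#include <algorithm>
#include <vector>

// Insertion into a single sorted array: the limit case of a B-tree
// whose spread factor exceeds the element count.
void insert_sorted(std::vector<int>& v, int value) {
    // O(log N) to find the slot, O(N) to shift everything after it.
    v.insert(std::lower_bound(v.begin(), v.end(), value), value);
}
```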

Getting weird

The B-tree implementation has many assert calls that verify its internal structures. We can compile the code with -DNDEBUG to make all those asserts disappear. Removing redundant checking code should make things faster, so let's try it.
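As a sketch of the kind of check involved (the function names are illustrative, not Pystd's), an invariant verifier wrapped in assert compiles to nothing under -DNDEBUG:

```cpp
#include <cassert>

// Internal consistency check: keys within a node must stay sorted.
bool keys_are_sorted(const int* keys, int num_keys) {
    for (int i = 1; i < num_keys; ++i)
        if (keys[i - 1] > keys[i])
            return false;
    return true;
}

int find_slot(const int* keys, int num_keys, int value) {
    // This whole call vanishes when compiled with -DNDEBUG.
    assert(keys_are_sorted(keys, num_keys));
    int i = 0;
    while (i < num_keys && keys[i] < value)
        ++i;
    return i;
}
```

Building with `g++ -O2` keeps the checks; `g++ -O2 -DNDEBUG` removes them, which is the comparison the measurements below are about.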

There are 13 measurements in total, and disabling asserts (i.e. enabling NDEBUG) makes the code run slower in 8 of them. Let's state that again, because it is very unexpected: in this particular measurement the code with assertions enabled runs faster than the same code without them. This should not be happening. What could be causing it?

I don't know for sure, so here is some speculation instead.

First of all, a result of 8 out of 13 is probably not statistically significant enough to claim that enabling assertions makes things faster. On the other hand it does mean that enabling them does not make the code noticeably slower. So I guess we can say that both ways of building the code are approximately as fast.

As to why that is, things get trickier. Maybe GCC's optimizer is just really good at removing unnecessary checks. It might even be that the assertions give the compiler more information, so it can skip generating code for things that can never happen. I'm not a compiler engineer, so I'll refrain from speculating further; it would probably be wrong in any case.
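Purely to illustrate the speculation above (whether GCC actually does this for Pystd's code is unknown): because a failed assert aborts the program, the compiler knows the asserted condition holds on every path past it, which can make later defensive checks provably dead.

```cpp
#include <cassert>
#include <cstddef>

int get_checked(const int* data, std::size_t len, std::size_t idx) {
    assert(idx < len);
    // With the assert active, an optimizer may prove this branch dead
    // and delete it; with -DNDEBUG the check must stay in the binary.
    if (idx >= len)
        return -1;
    return data[idx];
}
```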
