Wednesday, August 6, 2025

Let's properly analyze an AI article for once

Recently the CEO of GitHub wrote a blog post called Developers reinvented. It was reposted with various clickbait headings like GitHub CEO Thomas Dohmke Warns Developers: "Either Embrace AI or Get Out of This Career" (that one feels like an LLM generated summary of the actual post, which would be ironic if it wasn't awful). To my great misfortune I read both of these. Even if we ignore whether AI is useful or not, the writings contain some of the absolute worst reasoning and most stretched logical leaps I have seen in years, maybe decades. If you are ever in need of finding out how not to write a "scientific" text on any given subject, this is the disaster area for you.

But before we begin, a detour to the east.

Statistics and the Soviet Union

One of the great wonders of statistical science of the previous century was without a doubt the Soviet Union. They managed to invent and perfect dozens of ways to turn data to your liking, no matter the reality. Almost every official statistic issued by the USSR was a lie. Most people know this. But even most of those who do fail to grasp just how much the stats differed from reality. I sure didn't until I read this book. Let's look at some examples.

Only ever report percentages

The USSR's glorious statistics tended to be of the type "manufacturing of shoes grew over 600% this five-year period". That certainly sounds a lot better than "In the last five years our factory made 700 pairs of shoes as opposed to 100" or even "7 shoes instead of 1". If you are really forward thinking, you can even cut down shoe production in those five-year periods when you are not being measured. It makes the stats even more impressive, even though in reality many people have no shoes at all.

The USSR classified the real numbers as state secrets because the truth would have made them look bad. If a corporation only gives you percentages, they may be doing the same thing. Apply skepticism as needed.

Creative comparisons

The previous section said the manufacturing of shoes had grown. Can you tell what it is not saying? That's right: growth over what? It is implied that the comparison is to the previous five-year plan. But it is not. Apparently a common comparison in these cases was the production amounts of the year 1913. This "best practice" was not only used in the early part of the 1900s, it was used far into the 1980s.

Some of you might wonder why 1913 and not 1916, which was the last year before the Bolsheviks took over. Simply because that was the century's worst year for Russia as a whole. So if you encounter a claim that "car manufacturing was up 3700%" in some year in the 1980s Soviet Union, now you know what that actually meant.

"Better" measurements

According to official propaganda, the USSR was the world's leading country in wheat production. In this case they even listed out the production in absolute tonnes. In reality it was all fake. The established way of measuring wheat yields is to measure the "dry weight", that is, the mass of final processed grains. When it became apparent that the USSR could not compete with imperial scum, they changed their measurements to "wet weight". This included the mass of everything that came out from the nozzle of a harvester, such as stalks, rats, mud, rain water, dissidents and so on.

Some people outside the iron curtain even believed those numbers. Add your own analogy between those people and modern VC investors here.

To business then

The actual blog post starts with this thing that can be considered a picture.

What message would this choice of image tell about the person using it in their blog post?

  1. Said person does not have sufficient technical understanding to grasp the fact that children's toy blocks should, in fact, be affected by gravity (or that perspective is a thing, but we'll let that pass).
  2. Said person does not give a shit about whether things are correct or could even work, as long as they look "somewhat plausible".
Are these the sort of traits a person in charge of the largest software development platform on Earth should have? No, they are not.

To add insult to injury, the image seems to have been created with the Studio Ghibli image generator, which Hayao Miyazaki described as an abomination on art itself. Cultural misappropriation is high on the list of core values at GitHub HQ, it seems.

With that let's move on to the actual content, which is this post from Twitter (to quote Matthew Garrett, I will respect their name change once Elon Musk starts respecting his child's).

Oh, wow! A field study. That makes things clear. With evidence and all! How can we possibly argue against that?

Easily. As with a child.

Let's look at this "study" (and I'm using the word in its loosest possible sense here) and its details with an actual critical eye. The first thing is statistical representativeness. The sample size is 22. According to this sample size calculator I found, the required sample size for a population of just one thousand people would be 278, but, you know, one order of magnitude one way or another, who cares about those? Certainly not business big shot movers and shakers. Like Stockton Rush for example.
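
For reference, that 278 figure presumably comes from the usual finite-population sample size formula, assuming the calculator's common defaults of a 95% confidence level (z = 1.96), a ±5% margin of error and p = 0.5:

\[
n_0 = \frac{z^2\,p(1-p)}{e^2} = \frac{1.96^2 \times 0.25}{0.05^2} \approx 384,
\qquad
n = \frac{n_0}{1 + \frac{n_0 - 1}{N}} = \frac{384}{1 + \frac{383}{1000}} \approx 278.
\]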

The math above assumes an unbiased sampling. The post does not even attempt to answer whether that is the case. It would mean getting answers to questions like:

  • How were the 22 people chosen?
  • How many different companies, skill levels, nationalities, genders, age groups etc were represented?
  • Did they have any personal financial incentive on making their new AI tools look good?
  • Were they under any sort of duress to produce the "correct" answers?
  • What was/were the exact phrase(s) asked?
  • Were they the same for all participants?
  • Was the test run multiple times until it produced the desired result?
The last one is an age-old trick where you run a test with random results over and over on small groups. Eventually you will get a run that points the way you want. Then you drop the earlier measurements and publish the last one. In "the circles" this is known as data set selection.

Just to be sure, I'm not saying that is what they did. But if someone drove a dump truck full of money to my house and asked me to create a "study" that produced these results, that is exactly how I would do it. (I would not actually do it because I have a spine.)

Moving on. The main headline grabber is "Either you embrace AI or get out of this career". If you actually read the post (I know), what you find is that this is actually a quote from one of the participants. It's a bit difficult to decipher from the phrasing, but my reading is that this is not a grandstanding hurrah for all things AI, but more of an "I guess this is something I'll have to get used to" kind of submission. That is not evidence, certainly not of the clear type. It is an opinion.

The post then goes on a buzzword-salad tour of statements that range from the incomprehensible to the puzzling. Perhaps the weirdest is this nugget on education:

Teaching [programming] in a way that evaluates rote syntax or memorization of APIs is becoming obsolete.

It is not "becoming obsolete". It has been considered the wrong thing to do for as long as computer science has existed. Learning the syntax of most programming languages takes a few lessons, the rest of the semester is spent on actually using the language to solve problems. Any curriculum not doing that is just plain bad. Even worse than CS education in Russia in 1913.

You might also ponder: if the author is this out of touch with reality on such a simple issue, how completely off base might the rest of his statements be? In fact the statement is so wrong at such a fundamental level that it has probably been generated with an LLM.

A magician's shuffle

As nonsensical as the Twitter post is, we have not yet even mentioned the biggest misdirection in it. You might not even have noticed it yet. I certainly did not until I read the actual post. See if you can spot it.

Ready? Let's go.

The actual fruit of this "study" boils down to this snippet.

Developers rarely mentioned “time saved” as the core benefit of working in this new way with agents. They were all about increasing ambition.

Let that sink in. For the last several years the main supposed advantage of AI tools has been the fact that they save massive amounts of developer time. This has led to the "fire all your developers and replace them with AI bots" trend sweeping the nation. Now even this AI advertisement of a "study" cannot find any such advantages and starts backpedaling into something completely different. Just like we have always been at war with Eastasia, AI has never been about "productivity". No. No. It is all about "increased ambition", whatever that is. The post then carries on with this even more baffling statement.

When you move from thinking about reducing effort to expanding scope, only the most advanced agentic capabilities will do.

Really? Only the most advanced agentics, you say? That is a bold statement to make given that the leading reason for software project failure is scope creep. This is the one area where human beings have a decades-long track record of beating any artificial system. Even if machines were able to do it better, "Make your project failures more probable! Faster! Spectacularer!" is a tough rallying cry to sell.

 To conclude, the actual findings of this "study" seem to be that:

  1. AI does not improve developer productivity or skills
  2. AI does increase developer ambition
This is strictly worse than the current state of affairs.

Wednesday, July 23, 2025

Comparing a red-black tree to a B-tree

 In an earlier blog post we found that optimizing the memory layout of a red-black tree does not seem to work. A different way of implementing an ordered container is to use a B-tree. It was originally designed to be used for on-disk data. The design principle was that memory access is "instant" while disk access is slow. Nowadays this applies to memory access as well, as cache hits are "instant" and uncached memory is slow.
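
To make the cache argument concrete, here is a rough sketch of what a node in such a tree might look like (an illustration with a made-up layout, not the actual Pystd code). Each node packs several keys into one contiguous block, so a single cache miss pays for many comparisons:

#include <cstddef>

// Illustrative only: a B-tree node with spread factor B keeps up to B keys
// and B + 1 children in contiguous arrays, so a lookup touches one
// cache-friendly block per tree level instead of one node per comparison.
template<typename Key, std::size_t B>
struct BTreeNode {
    Key keys[B];                  // sorted keys stored inline
    BTreeNode *children[B + 1];   // child pointers, all null in leaf nodes
    std::size_t num_keys = 0;     // how many key slots are currently in use
    bool is_leaf = true;
};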

I implemented a B-tree in Pystd. Here is how all the various containers compare. For test data we used numbers from zero to one million in a random order.


As we can see, an unordered map is massively faster than any ordered container. If your data does not need to be ordered, that is the one you should use. For ordered data, the B-tree is clearly faster than either red-black tree implementation.

Tuning the B-tree

B-trees have one main tunable parameter, namely the spread factor of the nodes. In the test above it was five, but for on disk purposes the recommended value is "in the thousands". Here's how altering the value affects performance.


The sweet spot seems to be in the 256-512 range, where the operations are 60% faster than the standard set. As the spread factor grows towards infinity, the B-tree reduces to just storing all data in a single sorted array. Each insertion into that is O(N), so inserting all the elements is an O(N^2) operation, as can be seen here.

Getting weird

The B-tree implementation has many assert calls to verify the internal structures. We can compile the code with -DNDEBUG to make all those asserts disappear. Removing redundant code should make things faster, so let's try it.
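
As a reminder of what the flag does, here is a hypothetical example (not the actual Pystd code): when NDEBUG is defined, the assert macro expands to a no-op and the whole check disappears from the generated code.

#include <cassert>
#include <cstddef>

// Hypothetical invariant check: compiled normally the comparison runs on
// every call, compiled with -DNDEBUG the assert expands to ((void)0).
void verify_node_invariant([[maybe_unused]] std::size_t num_keys,
                           [[maybe_unused]] std::size_t max_keys) {
    assert(num_keys <= max_keys && "B-tree node holds too many keys");
}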

There are 13 measurements in total and disabling asserts (i.e. enabling NDEBUG) makes the code run slower in 8 of those cases. Let's state that again, since it is very unexpected: in this particular measurement code with assertions enabled runs faster than the same code without them. This should not be happening. What could be causing it?

I don't know for sure, so here is some speculation instead.

First of all, a result of 8 out of 13 is probably not statistically significant enough to say that enabling assertions makes things faster. OTOH it does mean that enabling them does not make the code run noticeably slower. So I guess we can say that both ways of building the code are approximately as fast.

As to why that is, things get trickier. Maybe GCC's optimizer is just really good at removing unnecessary checks. It might even be that the assertions give the compiler more information so it can skip generating code for things that can never happen. I'm not a compiler engineer, so I'll refrain from speculating further; it would probably be wrong in any case.

Saturday, July 5, 2025

Deoptimizing a red-black tree

An ordered map is typically slower than a hash map, but it is needed every now and then. Thus I implemented one in Pystd. This implementation does not use individually allocated nodes, but instead stores all data in a single contiguous array.

Implementing the basics was not particularly difficult. Debugging it to actually work took ages of staring at the debugger, drawing trees by hand on paper, printfing things out in Graphviz format and copypasting the output to a visualiser. But eventually I got it working. Performance measurements showed that my implementation is faster than std::map but slower than std::unordered_map.

So far so good.

The test application creates a tree with a million random integers. This means that the nodes are most likely in a random order in the backing store and searching through them causes a lot of cache misses. Having all nodes in an array means we can rearrange them for better memory access patterns.

I wrote some code to reorganize the nodes so that the root is at the first spot and the remaining nodes are stored layer by layer. In this way the query always processes memory "left to right" making things easier for the branch predictor.
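
A sketch of the kind of reordering described above, using a hypothetical array-backed node type (not the actual Pystd implementation): walk the tree breadth first, copy the nodes into a new array in visit order, then patch up the child indices.

#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical node type: children are indices into the backing array.
struct RbNode {
    int key;
    int32_t left = -1;   // index of left child, -1 if none
    int32_t right = -1;  // index of right child, -1 if none
};

// Returns a copy of the tree laid out level by level, root first, so a
// top-down query walks the array roughly left to right.
std::vector<RbNode> layout_breadth_first(const std::vector<RbNode> &nodes, int32_t root) {
    std::vector<RbNode> result;
    std::vector<int32_t> new_index(nodes.size(), -1);
    std::queue<int32_t> todo;
    todo.push(root);
    while (!todo.empty()) {              // first pass: assign new positions
        const int32_t old = todo.front();
        todo.pop();
        if (old < 0)
            continue;
        new_index[old] = static_cast<int32_t>(result.size());
        result.push_back(nodes[old]);
        todo.push(nodes[old].left);
        todo.push(nodes[old].right);
    }
    for (auto &n : result) {             // second pass: remap child indices
        if (n.left >= 0)
            n.left = new_index[n.left];
        if (n.right >= 0)
            n.right = new_index[n.right];
    }
    return result;
}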

Or that's the theory anyway. In practice this made things slower. And not even a bit slower, the "optimized version" was more than ten percent slower. Why? I don't know. Back to the drawing board.

Maybe interleaving the left and right child nodes next to each other is the problem? That places two mutually exclusive pieces of data on the same cache line. An alternative would be to place the entire left subtree in one memory area and the right one in a separate one. After thinking about this for a while, it turns out this can be accomplished by storing the nodes in in-order traversal order, i.e. in numerical order.

I did that. It was also slower than a random layout. Why? Again: no idea.

Time to focus on something else, I guess.

Friday, June 20, 2025

Book creation using almost entirely open source tools

Some years ago I wrote a book. Now I have written a second one, but because no publisher wanted to publish it I chose to self-publish a small print run and hand it out to friends (and whoever actually wants one) as a very-much-not-for-profit art project.

This meant that I had to create every PDF used in the printing myself. I received the shipment from the printing house yesterday. The end result turned out very nice indeed.

The red parts in this dust jacket are not ink but instead are done with foil stamping (it has a metallic reflective surface). This was done with Scribus. The cover image was painted by hand using watercolor paints. The illustrator used a proprietary image processing app to create the final TIFF version used here. She has since told me that she wants to eventually shift to an open source app due to ongoing actions of the company making the proprietary app.

The cover itself is cloth with a debossed emblem. The figure was drawn in Inkscape and then copypasted to Scribus.

Every fantasy book needs to have a map. This has two and they are printed in the end papers. The original picture was drawn with a nib pen and black ink and processed with Gimp. The printed version is brownish to give it that "old timey" look. Despite its apparent simplicity this PDF was the most problematic. The image itself is monochrome and printed with a Pantone spot ink. Trying to print this with CMYK inks would just not have worked. Because the PDF drawing model for spot inks in images behaves, let's say, in an unexpected way, I had to write a custom script to create the PDF with CapyPDF. As far as I know no other open source tool can do this correctly, not even Scribus. The relevant bug can be found here. It was somewhat nerve-wracking to send this out to the print shop with zero practical experience and a theoretical basis of "according to my interpretation of the PDF spec, this should be correct". As this is the first ever commercial print job using CapyPDF, it's quite fortunate that it succeeded pretty much perfectly.

The inner pages were created with the same Chapterizer tool as the previous book. It uses Pango and Cairo to generate PDFs. Illustrations in the text were drawn with Krita. As Cairo only produces RGB PDFs, as the last step it had to be converted to grayscale using Ghostscript.

Monday, June 16, 2025

A custom C++ standard library part 4: using it for real

Writing your own standard library is all fun and games until someone (which is to say yourself) asks the important question: could this be actually used for real? Theories and opinions can be thrown about the issue pretty much forever, but the only way to actually know for sure is to do it.

Thus I converted CapyPDF, a fairly compact codebase of about 15k lines, from the C++ standard library to Pystd, which itself is about 4k lines. All functionality is still the same, which is to say that the test suite passes; there are most likely new bugs that the tests do not catch. For those wanting to replicate the results themselves, clone the CapyPDF repo, switch to the pystdport branch and start building. Meson will automatically download and set up Pystd as a subproject. The code is fairly bleeding edge and only works on Linux with GCC 15.1.

Build times

One of the original reasons for starting Pystd was being annoyed at STL compile times. Let's see if we succeeded in improving on them. Build times when using only one core in debug look like this.

When optimizations are enabled the results look like this:

In both cases the Pystd version compiles in about a quarter of the time.

Binary size

C++ gets a lot of valid criticism for creating bloated code. How much of that is due to the language as opposed to the library?

That's quite unexpected. The debug info for STL types seems to take an extra 20 megabytes. But how about the executable code itself?

STL is still 200 kB bigger. Based on observations most of this seems to come from libstdc++'s implementation of variant. Note that if you try this yourself the Pystd version is probably 100 kB bigger, because by default the build setup links against libsupc++, which adds 100+ kB to binary sizes, whereas linking against the main C++ runtime library does not.

Performance

Ok, fine, so we can implement basic code to build faster and take less space. Fine. But what about performance? That is the main thing that matters after all, right? CapyPDF ships with a simple benchmark program. Let's look at its memory usage first.

Apologies for the Y-axis not starting at zero. I tried my best to make it happen, but LibreOffice Calc said no. In any case the outcome itself is expected. Pystd has not seen any performance optimization work, so its requiring 10% more memory is tolerable. But what about the actual runtime itself?

This is unexpected to say the least. A reasonable result would have been to be only 2x slower than the standard library, but the code ended up being almost 25% faster. This is even stranger considering that Pystd's containers do bounds checks on all accesses, the UTF-8 parsing code sometimes validates its input twice, the hashing algorithm is a simple multiply-and-xor and so on. Pystd should be slower, and yet, in this case at least, it is not.
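
For the curious, a multiply-and-xor hash is the sort of thing shown below; this is an illustrative sketch in the spirit of FNV-1a, and the constants and structure are mine rather than necessarily what Pystd uses.

#include <cstddef>
#include <cstdint>

// Illustrative xor-and-multiply loop: fold each byte into the state,
// then multiply to spread the bits around.
inline std::uint64_t hash_bytes(const unsigned char *data, std::size_t len) {
    std::uint64_t h = 0xcbf29ce484222325ull;   // FNV-1a offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 0x100000001b3ull;                 // FNV prime
    }
    return h;
}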

I have no explanation for this. It is expected that Pystd will start performing (much) worse as the data set size grows but that has not been tested.

Friday, June 6, 2025

Custom C++ stdlib part 3: The bleedingest edge variant

Implementing a variant type in C++ is challenging to say the least. I tried looking into the libstdc++ implementation and could not even decipher where the actual data is stored. There is a lot of inheritance going on and helper classes that seem to be doing custom vtable construction and other metaprogramming stuff. The only thing I could truly grasp was a comment saying // "These go to eleven". Sadly there was not a comment // Smell my glove! which would seem more suitable for this occasion.

A modern stdlib does need a variant, though, so I had to implement one. To make it feasible I made the following simplifying assumptions.

  1. All types handled must be noexcept default constructible, move constructible and movable (i.e. the WellBehaved concept; see the sketch after this list).
  2. If you have a properly allocated and aligned piece of memory, placement new'ing into it works (there may be UB-shenanigans here due to the memory model)
  3. The number of different types that a variant can hold has a predefined static maximum value.
  4. You don't need to support any C++ version older than C++26.
The last one of these is the biggest hurdle, as C++26 will not be released for at least a year. GCC 15 does have support for it, though, so all code below only works with that.
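
A sketch of what a WellBehaved concept along the lines of assumption 1 might look like (the name comes from the post, the exact definition here is my guess):

#include <type_traits>

// All operations a Variant needs from its member types must be noexcept.
template<typename T>
concept WellBehaved =
    std::is_nothrow_default_constructible_v<T> &&
    std::is_nothrow_move_constructible_v<T> &&
    std::is_nothrow_move_assignable_v<T>;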

The implementation

At its core, a Pystd variant is nothing more than a byte buffer and an index specifying which type it holds:

template<typename... T>
class Variant {
    // <other stuff>
    alignas(compute_alignment<0, T...>()) char buf[compute_size<0, T...>()];
    int8_t type_id;
};
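
The compute_size and compute_alignment helpers referenced above could be written roughly like this (a sketch, not necessarily the actual Pystd code): take the maximum over the pack, with the first template argument acting as the starting value.

#include <algorithm>
#include <cstddef>

template<std::size_t cur, typename... T>
consteval std::size_t compute_size() {
    // Largest sizeof of any type in the pack (at least cur).
    return std::max({cur, sizeof(T)...});
}

template<std::size_t cur, typename... T>
consteval std::size_t compute_alignment() {
    // Strictest alignment of any type in the pack (at least cur).
    return std::max({cur, alignof(T)...});
}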

The functions to compute max size and alignment requirements for types are simple to implement. The main problem lies elsewhere, specifically: going from a type to the corresponding index, going from a compile time index value to a type and going from a runtime index to the corresponding type.

The middle one is the simplest of these. As of C++26 you can directly index the argument pack like so:

    using nth_type = T...[compile_time_constant];

Going from type to an index is only slightly more difficult:
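
One way to do it is a recursive consteval helper along these lines (names and details are mine, the exact Pystd version may differ):

#include <cstdint>
#include <type_traits>

// Count how many pack members are skipped before the requested type.
template<typename Desired, typename Head, typename... Rest>
consteval int8_t type_index_of() {
    if constexpr (std::is_same_v<Desired, Head>) {
        return 0;
    } else if constexpr (sizeof...(Rest) > 0) {
        return 1 + type_index_of<Desired, Rest...>();
    } else {
        static_assert(sizeof...(Rest) > 0, "type is not a member of this variant");
        return -1; // unreachable, keeps the compiler happy
    }
}

// For example, type_index_of<int, char, int, double>() evaluates to 1.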

Going from a runtime value to a type is the difficult one. I don't know how to do it "correctly", i.e. the way a proper stdlib implementation does it. However, since the number of possible types is limited at compile time, we can cheat (currently only 5 types are supported):
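
A self-contained sketch of that kind of hand-unrolled dispatch (again, an illustration rather than the actual Pystd code): the runtime index is switched on, and each case turns the index back into a type via pack indexing.

#include <algorithm>
#include <cstddef>
#include <cstdint>

template<typename... T>
struct MiniVariant {
    alignas(T...) char buf[std::max({sizeof(T)...})];
    int8_t type_id = -1;

    // Turn a compile time index back into a type and destroy the object.
    template<std::size_t I>
    void destroy_as() {
        if constexpr (I < sizeof...(T)) {
            using U = T...[I]; // C++26 pack indexing
            reinterpret_cast<U *>(buf)->~U();
        }
    }

    // The runtime-to-type jump: one case per supported slot, five in total.
    void destroy_held() {
        switch (type_id) {
        case 0: destroy_as<0>(); break;
        case 1: destroy_as<1>(); break;
        case 2: destroy_as<2>(); break;
        case 3: destroy_as<3>(); break;
        case 4: destroy_as<4>(); break;
        default: break; // empty variant, nothing to do
        }
        type_id = -1;
    }
};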

Unrolling (type) loops like it's 1988! This means you can't have variants with hundreds of different types, but in that case you probably need an architectural redesign rather than a more capable variant.

With these primitives implementing public class methods is fairly simple.

The end result

The variant implementation in Pystd and its helper code took approximately 200 lines of code. It handles all the basic stuff in addition to being exception safe for copy operations (implemented as copy to a local variable + move). Compile times remain in fractions of a second per file even though Pystd only has a single public header.
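
The copy-to-a-local-plus-move trick mentioned above looks roughly like this in isolation (a minimal illustration, not the Pystd code): the copy is the only step that can throw, and if it does, the object being assigned to has not been touched yet.

#include <utility>

struct Holder {
    int value = 0;

    Holder() = default;
    Holder(const Holder &) = default;
    Holder(Holder &&) noexcept = default;
    Holder &operator=(Holder &&) noexcept = default;

    // Copy assignment built from copy construction plus noexcept move.
    Holder &operator=(const Holder &other) {
        Holder tmp(other);       // may throw, *this still untouched
        *this = std::move(tmp);  // noexcept, cannot fail halfway
        return *this;
    }
};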

It works in the sense that you can put different types into it, switch between them and so on without any compiler warnings, sanitizer issues or Valgrind complaints. So be careful with the code: I have only tested it, not proven it correct.

No performance optimization or even measurements have been made.