Monday, June 15, 2026

Beware of Star Trek managers, especially when bearing MBAs

Almost exactly three years ago the Oceangate submarine implosion happened. The disaster came about when a billionaire called Stockton Rush created his own unclassified submarine to go sightseeing on the Titanic. Ignoring all advice from experts he created a "macgyveresque death trap" that eventually killed him and sadly also 4 innocent people. The whole thing was a massive display of stupidity and arrogance with unfortunate outcomes. We are not going to go into the actual event any deeper, but those interested can find lots of material online.

Instead we are going to look more deeply into one often overlooked points of Stockton Rush's character. Apparently he felt like he was something of a "new James T. Kirk" (link1 paywalled, link2). Liking Star Trek is not that unusual. I'm guessing that more than 99% of the readers of this blog are fellow Star Trek fans. The problem lies elsewhere, but to understand it we first have travel back in time.

A brief overview of the British navy during the Napoleonic wars (by a non-historian, so probably inaccurate)

The original concept for Star Trek was, approximately, The Adventures of Horatio Hornblower in Space! The Enterprise is basically a British warship sailing through the vast ocean of outer space. The command structure mirrors this, where you have a captain, navigator, ship's doctor and so on. The Next Generation leaned into this even further by having a first officer and so on. The original Star Trek never went into detail on how the main cast got to their current positions, just that there was an Starfleet Academy they went to.

In the Napoleonic era of Hornblower things were quite different. Anyone who wanted to become a captain pretty much had to be from the upper classes. They had to obtain a letter of recommendation so that they could join a vessel as a midshipman at the age of 13 or so. They were expected to be able seamen by this time and then spent the next six to seven years working on the ship rigging sails and doing all manner of random jobs. This went on for six to nine years depending on circumstances, after which the person could take a formal examination to become a lieutenant. The test was not trivial, many people could not pass even after trying multiple times.

A lieutenant then had to work successfully for several years before obtaining the rank of captain. Even that did not guarantee a commission. Some captains never commanded a ship simply because there were not enough of them to go around. All in all becoming a ship's captain was a long and difficult journey. In a surprisingly non-British turn of events it was not possible for aristocrats to sneak past the gates. Getting a midshipman position was obviously easier with connections, but the lieutenant's test was something they had to pass on their own.

All of this is to say that every captain of the time was an expert with decades of working experience on many different positions aboard the ship.

What does a captain actually do?

[Note: I have not fact checked this portion at all. Feel free to consider it fanfiction.]

The year is 1808 and we are aboard a British warship about to leave for a mission of great importance. The captain gives the order to set sail. Whistles are blown, bells are rung and sailors springs into action. Every single man, with one exception, is either doing manual labour or directly supervising their underlings. That exception is the captain, who seemingly stands around doing nothing (at least if you ask the crew). This is not so.

What he is doing is crucial. He is observing the state of the ship and her crew. This includes things like overall crew morale, any aberrations from normal operations that could cause problems, thinking of workflow improvements and so on. In a sense he has to sense the ship itself. This only works because of two things. First of all he has personal experience doing the exact work he is observing. If you have not personally "been there", you can't really know if a crew is working well or not. You need a "gut feeling" to be able to sense this. Secondly the captain does not have any manual labour so he can focus all of his mental energy on observing the ship's state. He is preparing for all the unexpected things that may occur in the future. This can only happen if your brain is free from menial tasks.

This is exactly what most books on business and project management advocate. It is a time tested way of improving your chances of success. A highly skilled commander can take an average team of people and lead them to victory. It is the basic plot of most military and sports movies.

Getting back to the present

Now take a typical modern day billionaire-via-inheritance and show them Star Trek at an impressionable age. Do they see the advantages of education, hard work and ethics? The foundation upon which Gene Roddenberry carefully built the show? Hell no! What they see is this (TNG screenshot used because TOS did not have a suitable maritime episode).

And then they think: "Wow! I want to be exactly like that! Parading around in a funny hat while everyone obeys my orders without question is my life's mission from now on. And I get to have sexy space sex with hot sexy space ladies of sex whenever I want. This appeals to me even more profoundly than Atlas Shrugged." Some of them might go on to watch Master and Commander and shed tears upon realizing that they cant publicly flog employees for failing to salute their superiors. Yet. Expect this to be made legal in Silicon Valley any day now.

Liking Kirk is not in any way a bad thing. Wanting to "be just like Kirk" is, because in the real world running a business like Kirk runs the Enterprise is a terrible way to do things. An example will illustrate this nicely. Let's imagine a random episode where the Enterprise has gotten into trouble. Eventually Kirk will call for Scotty and tell him: "You need to <babble> <babble> <babble>." Scotty will then reply with a varying level of scottish accent something like: "I cannae change the laws of physics, captain". Kirk will then say the same thing again, just more aggressively and in a close up shot. Scotty replies with "Well in that case I can get it done in sixty minutes." Kirk counters with: "You have five." And thus the problem is solved. Kirk gets a commendation for incredible valour under stress while Scotty, who did all of the actual work, is never mentioned.

In "Kirk style" management the Big Boss tells his underlings what to do. If they try to give any sort of feedback, the Boss ignores everything and just repeats his original orders again and again until the other party yields. The only reason underlings ever resist The Vision is that they are lazy and it is the job of the manager to put them in their place. This seems like a bit from a comedy show, but I have unfortunately worked under bosses like this. Either you try to talk at least some sense into them, fail, get labeled as a "not team player", watch the project crash and get blamed for the failure, or you try to do the impossible task given to you, fail watch the project crash and get blamed for the failure.

Another major problem with Kirk is that due to the way tv shows and movies need to be structured, he is actually an obsessive gambler. The stakes must always get higher and the ways to get out of trouble must become ever crazier. Kirk will break any and all laws and regulations he sees fit and then, once he has succeeded, no disciplinary action is taken. The ends justify the means. Idolising this sort of behaviour leads to thinking that wild one-in-a-million gambits will succeed at least 99 times out of a hundred. And even if it fails, you can get out of it by betting everything on an even bigger gamble. The real world does not work like that. Reality is not a story and you are not its hero. It does not owe you eternal, or even eventual, success. Had Stockton Rush survived his death trap, he would most likely have faced criminal charges and, if convicted, gone to jail.

The myth of the existence of the professional manager

Let's make one last detour in the 1800s and assume that the 7th Earl of Sidcup or some such really wants to get his idiot son instated as a captain. He and contacts the appropriate naval officers.

"My offspring needs to become a captain of a ship post haste!"

"Well first he has to become a midshipman and ..."

"Phah! None of that nonsense. It's way too slow and not becoming of my statute. Also my son is 35 years old so the post of a midshipman would be beneath him."

"I see. Well what sort of prior naval experience does he have?"

"None."

"Has he even ever been out to sea?"

"Not to my knowledge. But that does not matter. He is highly skilled in using the abacus excelius to compute annual budgets."

"By himself?"

"Of course not. That is what secretaries are for. He just gives them orders. That he can do. And that is all that matters. Same as in sailing."

This person is unlikely to get his wish with this line of reasoning. On the other hand in modern business life this is common. For example when startups get VC money, a common requirement is that they need to get a "proper manager" as a CEO. Typically this means the investor's friend, and more often than not an MBA.

Contrary to common belief, having an MBA does not make you incompetent at managing a technical company (though there is a strong positive correlation). It is entirely possible to be a good manager on a field you have no personal experience in. You just have to have a lot of humility, listen (actually, properly listen) to your employees and let the people with hands-on experience make the most technical, product and development decisions. In other words you have to be the enabler, not the maverick decision maker. People with these sorts of personality traits are rare and typically their career choices steer as far away from getting an MBA as possible.

The absolute worst thing happens if the CEO in question combines the (lack of) skills of an MBA with the attitude of Kirk. That leads to incompetent decisions based on willful ignorance, executed with the fury of an egomaniac who refuses to even entertain the notion that they might be wrong. Further, any person inside the organisation who dares to point out potential flaws in the plan will soon find themselves outside said organisation. Disagreement is treason. Treason shall not go unpunished.

In the 1800s the British navy could be said to be the best in the world. It seems plausible that one component of this success was the requirement that the officers running their ships had to have actual experience operating the ship. Not looking at other people operating it. Not pretending to read about operating it for a test. Actually doing it. If we look around how MBA wielding sociopath CEOs are enshittifying absolutely everything about the tech industry, bringing this requirement back into active use starts to feel awfully tempting.

Epilogue: Why doesn't everything immediately explode?

A reasonable counterpoint to everything written above would be that if managers truly are that bad, shouldn't all those companies be bankrupt by now? In an ideal world they would be, but there are opposing forces that keep them going.

The first one is that all corporations have inertia. If you took an established major company and actively started to mismanage it to death, it would still take years for things to eventually collapse. 

The second one is a dirty little secret. Many employees care more about the product they work on than "corporate visions" that seem to stem from overuse of peyote. They don't blindly obey idiotic commands but instead try to make things silently work within the system. Basically this means that corporations thrive despite their mad kings, not because of them. I know several people who have worked in these kinds of organizations and this is not as rare of an occurrence as one might imagine.

I have also experienced it personally. Years ago I was at a company, whose CEO (who, to the best of my knowledge, did not have an MBA) wanted to change the company's product so that it would do a specific thing X. Everybody thought this was a horrible idea and tried to reason with him using solid business and technical reasons (which turned out to be 100% correct). That failed. Spectacularly. This lead to an eternal series of secret meetings. The participants were main developers and all managers except the CEO. The only item on the agenda was "How can we make the CEO think that we did what he ordered while doing the exact opposite".

Eventually we did succeed, but boy was that a surreal couple of weeks.

Monday, June 8, 2026

Faking keyword arguments to functions in C++

One of the many nice language features in Python are keyword arguments. They make some types of APIs  concise and readable. Like so:

Unfortunately C does not have keyword arguments and, by extension, neither does C++. Adding them as a language feature would take 15-20 years of effort, most of which would consist of trying to convince people via email that such a feature is important and should be added.

There have been attempts to implement this via macros and template magic (link), but they have not seen widespread usage probably because they are using macros and template magic. However it turns out that with modern language features you can fake keyword arguments fairly convincingly. Like so:

The add_argument method takes a single argument which is a struct. The extra curly braces inside the parentheses boil down to "whatever the underlying argument is, construct it in place with these parameters". The dotted names are designated initializers, so those fields get the specified value whereas other fields get their default values.

And there you go, keyword arguments in C++. You just have to squint a bit and pretend not to see the extra curly braces.

Tuesday, April 21, 2026

CapyPDF is approaching feature sufficiency

In the past I have written many blog posts on implementing various PDF features in CapyPDF. Typically they explain the feature being implemented, how confusing the documentation is, what perverse undocumented quirks one has to work around to get things working and so on. To save the effort of me writing and you reading yet another post of the same type, let me just say that you can now use CapyPDF to generate PDF forms that have widgets like text fields and radio buttons.

What makes this post special is that forms and widget annotations were pretty much the last major missing PDF feature Does that mean that it supports everything? No. Of course not. There is a whole bunch of subtlety to consider. Let's start with the fact that the PDF spec is massive, close to 1000 pages. Among its pages are features that are either not used or have been replaced by other features and deprecated.

The implementation principle of CapyPDF thus far has been "implement everything that needs special tracking, but only to the minimal level needed". This seems complicated but is in fact quite simple. As an example the PDF spec defines over 20 different kinds of annotations. Specifying them requires tracking each one and writing out appropriate entries in the document metadata structures. However once you have implemented that for one annotation type, the same code will work for all annotation types. Thus CapyPDF has only implemented a few of the most common annotations and the rest can be added later when someone actually needs them.

Many objects have lots of configuration options which are defined by adding keys and values to existing dictionaries. Again, only the most common ones are implemented, the rest are mostly a matter of adding functions to set those keys. There is no cross-referencing code that needs to be updated or so on. If nobody ever needs to specify the color with which a trim box should be drawn in a prepress preview application, there's no point in spending effort to make it happen.

The API should be mostly done, especially for drawing operations. The API for widgets probably needs to change. Especially since form submission actions are not done. I don't know if anything actually uses those, though. That work can be done based on user feedback.

Friday, April 17, 2026

Multi merge sort, or when optimizations aren't

In our previous episode we wrote a merge sort implementation that runs a bit faster than the one in stdlibc++. The question then becomes, could it be made even faster. If you go through the relevant literature one potential improvement is to do a multiway merge. That is, instead of merging two arrays into one, you merge four into one using, for example, a priority queue.

This seems like a slam dunk for performance.

  • Doubling the number of arrays to merge at a time halves the number of total passes needed
  • The priority queue has a known static maximum size, so it can be put on the stack, which is guaranteed to be in the cache all the time
  • Processing an element takes only log(#lists) comparisons
Implementing multimerge was conceptually straightforward but getting all the gritty details right took a fair bit of time. Once I got it working the end result was slower. And not by a little, either, but more than 30% slower. Trying some optimizations made it a bit faster but not noticeably so.

Why is this so? Maybe there are bugs that cause it to do extra work? Assuming that is not the case, what actually is? Measuring seems to indicate that a notable fraction of the runtime is spent in the priority queue code. Beyond that I got very little to nothing.

The best hypotheses I could come up with has to with the number of comparisons made. A classical merge sort does two if statements per output elements. One to determine which of the two lists has a smaller element at the front and one to see whether removing the element exhausted the list. The former is basically random and the latter is always false except when the last element is processed. This amounts to 0.5 mispredicted branches per element per round.

A priority queue has to do a bunch more work to preserve the heap property. The first iteration needs to check the root and its two children. That's three comparisons for value and two checks whether the children actually exist. Those are much less predictable than the comparisons in merge sort. Computers are really efficient at doing simple things, so it may be that the additional bookkeeping is so expensive that it negates the advantage of fewer rounds.

Or maybe it's something else. Who's to say? Certainly not me. If someone wants to play with the code, the implementation is here. I'll probably delete it at some point as it does not have really any advantage over the regular merge sort.

Monday, April 6, 2026

Sorting performance rabbit hole

In an earlier blog post we found out that Pystd's simple sorting algorithm implementations were 5-10% slower than their stdlibc++ counterparts. The obvious follow up nerd snipe is to ask "can we make the Pystd implementation faster than stdlibc++?"

For all tests below the data set used was 10 million consecutive 64 bit integers shuffled in a random order. The order was the same for all algorithms.

Stable sort

It turns out that the answer for stable sorting is "yes, surprisingly easily". I made a few obvious tweaks (whose details I don't even remember any more) and got the runtime down to 0.86 seconds. This is approximately 5% faster than std::stable_sort. Done. Onwards to unstable sort.

Unstable sort

This one was not, as they say, a picnic. I suspect that stdlib developers have spent more time optimizing std::sort than std::stable_sort simply because it is used a lot more.

After all the improvements I could think of were done, Pystd's implementation was consistently 5-10% percent slower. At this point I started cheating and examined how stdlibc++'s implementation worked to see if there are any optimization ideas to steal. Indeed there were, but they did not help.

Pystd's insertion sort moves elements by pairwise swaps. Stdlibc++ does it by moving the last item to a temporary, shifting the array elements onwards and then moving the stored item to its final location. I implemented that. It made things slower.

Stdlibc++'s moves use memmove instead of copying (at least according to code comments).  I implemented that. It made things slower.

Then I implemented shell sort to see if it made things faster. It didn't. It made them a lot slower. So did radix sort.

Then I reworked the way pivot selection is done and realized that if you do it in a specific way, some elements move to their correct partitions as a side effect of median selection. I implemented that and it did not make things faster. It did not make them slower, either, but the end result should be more resistant against bad pivot selection so I left it in.

At some point the implementation grew a bug which only appeared with very large data sets. For debugging purposes I reduce the limit where introsort switches from qsort to insertion sort from 16 to 8. I got the bug fixed but the change made sorting a lot slower. As it should.

But this raises a question, namely would increasing the limit from 16 to 32 make things faster? It turns out that it did. A lot. Out of all perf improvements I implemented, this was the one that yielded the biggest improvement by a fairly wide margin. Going to 64 elements made it even faster, but that made other algorithms using insertion sort slower, so 32 it is. For now at least.

After a few final tweaks I managed to finally beat stdlibc++. By how much you ask? Pystd's best observed time was 0.754 seconds while stdlibc++'s was 0.755 seconds. And it happened only once. But that's enough for me.

Monday, March 23, 2026

Everything old is new again: memory optimization

At this point in history, AI sociopaths have purchased all the world's RAM in order to run their copyright infringement factories at full blast. Thus the amount of memory in consumer computers and phones seems to be going down. After decades of not having to care about memory usage, reducing it has very much become a thing.

Relevant questions to this state of things include a) is it really worth it and b) what sort of improvements are even possible. The answers to these depend on the task and data set at hand. Let's examine one such case. It might be a bit contrived, unrepresentative and unfair, but on the other hand it's the one I already had available.

Suppose you have to write script that opens a text file, parses it as UTF-8, splits it into words according to white space, counts the number of time each word appears and prints the words and counts in decreasing order (most common first).

The Python baseline

This sounds like a job for Python. Indeed, an implementation takes fewer than 30 lines of code. Its memory consumption on a small text file [update: repo's readme, which is 1.3k] looks like this.

Peak memory consumption is 1.3 MB. At this point you might want to stop reading and make a guess on how much memory a native code version of the same functionality would use.

The native version

A fully native C++ version using Pystd requires 60 lines of code to implement the same thing. If you ignore the boilerplate, the core functionality fits in 20 lines. The steps needed are straightforward:

  1. Mmap the input file to memory.
  2. Validate that it is utf-8
  3. Convert raw data into a utf-8 view
  4. Split the view into words lazily
  5. Compute the result into a hash table whose keys are string views, not strings

The main advantage of this is that there are no string objects. The only dynamic memory allocations are for the hash table and the final vector used for sorting and printing. All text operations use string views , which are basically just a pointer + size.

In code this looks like the following:

Its memory usage looks like this.

Peak consumption is ~100 kB in this implementation. It uses only 7.7% of the amount of memory required by the Python version.

Isn't this a bit unfair towards Python?

In a way it is. The Python runtime has a hefty startup cost but in return you get a lot of functionality for free. But if you don't need said functionality, things start looking very different.

But we can make this comparison even more unfair towards Python. If you look at the memory consumption graph you'll quite easily see that 70 kB is used by the C++ runtime. It reserves a bunch of memory up front so that it can do stack unwinding and exception handling even when the process is out of memory. It should be possible to build this code without exception support in which case the total memory usage would be a mere 21 kB. Such version would yield a 98.4% reduction in memory usage.

Thursday, March 19, 2026

Simple sort implementations vs production quality ones

One of the most optimized algorithms in any standard library is sorting. It is used everywhere so it must be fast. Thousands upon thousands of developer hours have been sunk into inventing new algorithms and making sort implementations faster. Pystd has a different design philosophy where fast compilation times and readability of the implementation have higher priority than absolute performance. Perf still very much matters, it has to be fast, but not at the cost of 10x compilation time.

This leads to the natural question of how much slower such an implementation would be compared to a production quality one. Could it even be faster? (Spoilers: no) The only way to find out is to run performance benchmarks on actual code.

To keep things simple there is only one test set, sorting 10'000'000 consecutive 64 bit integers that have been shuffled to a random order which is the same for all algorithms. This is not an exhaustive test by any means but you have to start somewhere. All tests used GCC 15.2 using -O2 optimization. Pystd code was not thoroughly hand optimized, I only fixed (some of the) obvious hotspots.

Stable sort

Pystd uses mergesort for stable sorting. The way the C++ standard specifies stable sort means that most implementations probably use it as well. I did not dive in the code to find out. Pystd's merge sort implementation consists of ~220 lines of code. It can be read on this page.

Stdlibc++ can do the sort in 0.9 seconds whereas Pystd takes .94 seconds. Getting to within 5% with such a simple implementation is actually quite astonishing. Even when considering all the usual caveats where it might completely fall over with a different input data distribution and all that.

Regular sort

Both stdlibc++ and Pystd use introsort. Pystd's implementation has ~150 lines of code but it also uses heapsort, which has a further 100 lines of code). Code for introsort is here, and heapsort is here.

Stdlibc++ gets the sort done in 0.76 seconds whereas Pystd takes 0.82 seconds. This makes it approximately 8% slower. It's not great, but getting within 10% with a few evening's work is still a pretty good result. Especially since, and I'm speculating here, std::sort has seen a lot more optimization work than std::stable_sort because it is used more.

For heavy duty number crunching this would be way too slow. But for moderate data set sizes the performance difference might be insignificant for many use cases.

Note that all of these are faster (note: did not measure) than libc's qsort because it requires an indirect function call on every comparison i.e. the comparison method can not be inlined.

Compiling the stdlib version takes 0.68 seconds while the Pystd version takes 0.33 seconds. Without optimizations the times are 0.55 and 0.21 seconds, respectively.

Where does the time go?

Valgrind will tell you that quite easily.

This picture shows quite clearly why big O notation can be misleading. Both quicksort (the inner loop of introsort) and heapsort have "the same" average time complexity but every call to heapsort takes approximately 4.5 times as long.