Saturday, February 6, 2021

Why most programming language performance comparisons are most likely wrong

For as long as programming languages have existed, people have fought over which one of them is the fastest. These debates have ranged from serious scientific research to many a heated late night bar discussion. Rather than getting into this argument, let's look at the problem at a higher level, namely how would you compare the performance of two different programming languages. The only really meaningful approach is to do it empirically, that is, implementing a bunch of test programs in both programming languages, benchmarking them and then declaring the winner.

This is hard. Really hard. Insanely hard in some cases and very laborious in any case. Even though the problem seems straightforward, there are a ton of error sources that can trip up the unaware (and even many very-much-aware) performance tester.

Equivalent implementations?

In order to make the two implementations comparable they should be "of equal quality". That is, they should have been implemented by people with roughly the same amount of domain knowledge as well as programming skills in their chosen language. This is difficult to organise. If the implementations are written by different people, they may approach the problem with different algorithms making the relative performance not a question of programming languages per se, but of the programming approaches chosen by each programmer.

Even if both implementations are written by the same person using the same algorithm, there are still problems. Typically people are better at some programming languages than others. Thus they tend to provide faster implementations in their favourite language. This causes bias, because the performance is not a measure of the programming languages themselves, but rather the individual programmer. These sorts of tests can be useful in finding usability and productivity differences, but not so much for performance.

Thus you might want to evaluate existing programs written by many expert programmers. This is a good approach, but sometimes even seasoned researchers get it wrong. There is a paper that tries to compare different programming languages for performance and power efficiency using this approach. In their test results one particular program's C implementation was 30% faster than the same program written in C++. This single measurement casts a big shadow over the entire paper. If we took the original C source, changed every source file's extension from .c to .cpp and recompiled, the end result should have the same performance within a few percentage points. Thus we have to conclude that one of the following is happening (in decreasing order of probability):
  1. The C++ version is suboptimally coded.
  2. The testing methodology has a noticeable flaw.
  3. The compiler used has a major performance regression for C++ as opposed to C.
Or, in other words, the performance difference comes from somewhere other than the choice of programming language.

The difficulty of measurement

A big question is how does one actually measure the performance of any given program. A common approach is to run the test multiple times in a row and then do something like the following:
  • Handle outliers by dropping the points at extreme ends (that is, the slowest and fastest measurements)
  • Calculate the mean and/or median for the remaining data points
  • Compare the result between different programs, the one with the fastest time wins
Those who remember their high school statistics lessons might calculate standard deviation as well. This approach seems sound and rigorous, but it contains several sources of systematic error. The first of these is quite surprising and has to do with noise in measurements.

Most basic statistical tools assume that the error is normally distributed with an average value of zero. If you are measuring something like temperature or speed this is a reasonable assumption. For this case it is not. A program's measured time consists of the "true" time spent solving the problem and overhead that comes from things like OS interruptions, disk accesses and so on. If we assume that the noise is gaussian with a zero average then what it means is that the physical machine has random processes that make the program run faster than it would if the machine was completely noise free. This is, of course, impossible. The noise is strongly non-gaussian simply because it can never have a negative value.

In fact, the measurement that is the closest to the platonic ideal answer is the fastest one. It has the least amount of noise interference from the system. That is the very same measurement that was discarded in the first step when outliers were cleaned out. Sometimes doing established and reasonable things makes things worse.
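To make the argument concrete, here is a minimal Python sketch. It is a simulation, not a real benchmark: the "true" run time of 10 seconds and the exponentially distributed noise are assumptions picked only because such noise can never be negative. It compares the "drop the extremes, then average" recipe with simply taking the fastest run.

```python
import random
import statistics

def simulate_runs(true_time, n, rng):
    """Return n simulated timings: the true time plus non-negative noise."""
    # Exponential noise (mean 0.5 s) is an assumption; it is never negative.
    return [true_time + rng.expovariate(1 / 0.5) for _ in range(n)]

def trimmed_mean(samples):
    """Drop the fastest and the slowest sample, then average the rest."""
    trimmed = sorted(samples)[1:-1]
    return statistics.mean(trimmed)

rng = random.Random(42)
TRUE_TIME = 10.0  # hypothetical noise-free run time in seconds
runs = simulate_runs(TRUE_TIME, 20, rng)

print(f"trimmed mean: {trimmed_mean(runs):.3f}")
print(f"minimum:      {min(runs):.3f}")
```

Because every sample is the true time plus something non-negative, the minimum can never be further from the true time than the trimmed mean is, no matter which seed is used.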

Statistics even harder

Putting that aside, let's assume we have measurements for the two programs, which do look "sufficiently gaussian". Numerical analysis shows that language #1 takes 10 seconds to run whereas language #2 takes 9 seconds. A 10% difference is notable and thus we can conclude that language #2 is faster. Right?

Well, no. Suppose the actual measurement data look like this:


Is the one on the right faster or not? Maybe? Probably? Could be? Answering this question properly requires going all the way to university level statistics. First one formulates a null hypothesis, that is, that the two programs have no performance difference. Then one calculates the probability of seeing a difference at least this large if both sets of measurements had come from the same probability distribution. If this probability is small (typically below 5%), then the null hypothesis is rejected and we can conclude with reasonable confidence that one program is indeed faster than the other. This method is known as Student's t-test and it is used commonly in heavy duty statistics. Note that some implementations of the test assume gaussian data and if your data has some other shape, the results you get might not be reliable.
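As a rough illustration of the null hypothesis idea, here is a small Python sketch. Note that it is not Student's t-test but a permutation test, swapped in because it needs only the standard library and does not assume gaussian data. The timing numbers are made up and the function name is my own.

```python
import random
import statistics

def permutation_test(a, b, rounds=10_000, seed=0):
    """Estimate the p-value for 'a and b come from the same distribution'.

    The test statistic is the absolute difference of the sample means.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(rounds):
        # Under the null hypothesis the group labels are interchangeable.
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.mean(left) - statistics.mean(right)) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / rounds

# Made-up timings in seconds, for illustration only.
lang1 = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
lang2 = [9.1, 8.9, 9.2, 9.0, 8.8, 9.3]
p = permutation_test(lang1, lang2)
print("reject null hypothesis" if p < 0.05 else "no significant difference")
```

The returned value is the fraction of random relabelings that produce a difference at least as large as the measured one. With overlapping, noisy measurements that fraction grows quickly, which is exactly the "maybe, probably, could be" situation described above.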

This works for one program, but a rigorous test has many different programs. There are statistical methods for evaluating those, but they get even more complicated. Looking up how they work is left as an exercise to the reader.

All computers' alignment is Chaotic Neutral

Statistics are hard, but fortunately computers are simple because they are deterministic, reliable and logical. For example if you have a program and you modify it by adding a single NOP instruction somewhere in the stream, the biggest impact it could possibly have is one extra instruction cycle, which is so vanishingly small as to be unmeasurable. If you do go out and measure it, though, the results might confuse and befuddle you. Not only can this one minor change make the program 10% slower (or possibly even more), it can even make it 10% faster. Yes. Doing pointless extra work can make your program faster.

If this is the first time you encounter this issue you probably won't believe it. Some fraction might already have gone to Twitter to post their opinion about this "stupid and wrong" article written by someone who is "clearly incompetent and an idiot". That's ok, I forgive you. Human nature and all that. You'll grow out of it eventually. The phenomenon is actually real and can be verified with measurements. How is it possible that having the CPU do extra work could make the program faster?

The answer is that it doesn't. The actual instruction is irrelevant. What actually matters is code alignment. As code shifts around in memory, its performance characteristics change. If a hot loop gets split by a cache line boundary it slows down. Unsplitting it speeds it up. The NOP does not need to be inside the loop for this to happen, simply moving the entire code block up or down can cause this difference. Suppose you measure two programs in the most rigorous statistical way possible. If the performance delta between the two is under something like 10%, you can not reasonably say one is faster than the other unless you use a measurement method that eliminates alignment effects.
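The splitting effect is easy to see with a bit of arithmetic. The Python sketch below only models the cache line case mentioned above. The 64 byte line size, the addresses and the 48 byte loop are all assumptions chosen for illustration.

```python
CACHE_LINE = 64  # bytes; common on current x86-64 CPUs, but an assumption here

def lines_touched(start, size, line=CACHE_LINE):
    """Number of cache lines covered by `size` bytes of code starting at `start`."""
    first = start // line
    last = (start + size - 1) // line
    return last - first + 1

# A hypothetical 48 byte hot loop fits in one line when it starts at 0x1000 ...
print(lines_touched(0x1000, 48))  # prints 1
# ... but shifting it by 32 bytes makes it straddle two lines.
print(lines_touched(0x1020, 48))  # prints 2
```

Nothing about the loop itself changed between the two calls, only where it happens to sit in memory.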

It's about the machine, not the language

As programs get faster and faster optimisation undergoes an interesting phase transition. Once performance gets to a certain level the system is no longer about what the compiler and CPU can do to run the developer's program as fast as possible. Instead it becomes about how the programmer can utilize the CPU's functionality as efficiently as possible. This includes things like arranging your data into a layout that the processor can crunch with minimal effort and so on. In effect this means replacing language based primitives with hardware based primitives. In some circles optimization works weirdly backwards in that the programmer knows exactly what SIMD instructions they want a given loop to be optimized into and then fiddles around with the code until it does. At this point the functionality of the programming language itself is immaterial.

This is the main reason why languages like C and Fortran are still at the top of many performance benchmarks, but the techniques are not limited to those languages. Years ago I worked on a fairly large Java application that had been very thoroughly optimized. Its internals consisted of integer arrays. There were no classes or even Integer objects in the fast path, it was basically a recreation of C inside Java. It could have been implemented in pretty much any programming language. The performance differences between them would have mostly come down to each compiler's optimizer. They produce wildly different performance outcomes even when using the same programming language, let alone different ones. Once this happens it's not really reasonable to claim that any one programming language is clearly faster than others since they all get reduced to glorified inline assemblers.

References

Most of the points discussed here have been scraped from other sources and presentations, which the interested reader is encouraged to look up. These include the many optimization talks by Andrei Alexandrescu as well as the highly informational CppCon keynote by Emery Berger. Despite its venue the latter is fully programming language agnostic so you can watch it even if you are the sort of person who breaks out in hives whenever C++ is mentioned.

Monday, February 1, 2021

Using a gamepad to control a painting application

One of the hardest things in drawing and painting is controlling the individual strokes. Not only do you have to control the location but also the pressure, tilt and rotation of the pen or brush. This means mastering five or six degrees of freedom at the same time with extreme precision. Doing it well requires years of practice. Modern painting applications and tools like drawing tablets emulate this experience quite well, but the beauty of computers is that we can do even more.

As an experiment I wrote a test application that separates tilt and pressure from drawing. In this approach one hand draws the shape as before, but tilt and pressure are controlled with the other hand by using a regular gamepad controller. Here's what it looks like (in case your aggregator strips embedded YouTube players, here is the direct link).

The idea itself is not new, there are discussions about it in e.g. Krita's web forum. Nonetheless it was a fun weekend hack (creating the video actually took longer than writing the app). After playing around with the app for a while this seems like a useful feature for an actual painting application. It is not super ergonomic though, but that may just be an issue with the Logitech gamepad I had. Something like the Wii remote would probably feel smoother, but I don't have one to test.

The code is available here for those who want to try it out.

Wednesday, January 6, 2021

Quick review of Lenovo Yoga 9i laptop

Some time ago I pondered on getting a new laptop. Eventually I bought a Lenovo Yoga 9i, which ticked pretty much all the boxes. I also considered a Dell 9310 but chose against it for two reasons. Firstly, several reviews say that the keyboard feels bad with too shallow a movement. Secondly, Dell's web site for Finland does not actually sell computers to individuals, only corporations, and their retailers did not have any of the new models available.

The hardware

It's really nice. Almost everything you need is there, such as USB A and C, touch screen, pen, 16GB of RAM, Tiger Lake CPU, Xe graphics and so on. The only real missing things are a microSD card slot and an HDMI port. The trackpad is nice, with multitouch working flawlessly in e.g. Firefox. You can only do right click by clicking on the right edge rather than clicking with two fingers, but that's probably a software limitation (of Windows?). The all glass trackpad surface is a bit of a fingerprint magnet, though.

There are two choices for the screen, either FullHD or 4k. I took the latter because once you have experienced retina, you'll never go back. This reduces battery life, but even the 4k version gets 4-8 hours of battery life, which is more than I need. The screen itself is really, really nice apart from the fact that it is extremely glossy, almost like a mirror. Colors are very vibrant (to the point of being almost too saturated in some videos) and bright. Merely looking at the OS desktop background and app icons feels nice because the image is so sharp and calm. As a negative point just looking at Youtube videos makes the fan spin up. 

The touchscreen and pen work as expected, though pen input is broken in Windows Krita by default. You need to change the input protocol from the default to the other option (whose actual name I don't remember).

When it comes to laptop keyboards, I'm very picky. I really like the 2015-era MBPro and Thinkpad keyboards. This keyboard is not either of those two but it is very good. The key travel is slightly shallower and the resistance is crisper. It feels pleasant to type on.

Linux support

This is ... not good. Fedora live USBs do not even boot, and an Ubuntu 20.10 live USB has a lot of broken stuff, but surprisingly wifi works nicely. Things that are broken include:
  • Touchscreen
  • 3D acceleration (it uses LLVM softpipe instead)
  • Trackpad
  • Pen
The trackpad bug is strange. Clicking works, but motion does not unless you push it at a very, very, very specific amount of pressure that is incredibly close to the strength needed to activate the click. Once click activates, motion breaks again. In practice it is unusable.

All of these are probably due to the bleeding-edgeness of the hardware and will probably be fixed in the future. For the time being, though, it is not really usable as a Linux laptop.

In conclusion

This is the best laptop I have ever owned. It may even be the best one I have ever used.

Monday, December 28, 2020

Some things a potential Git replacement probably needs to provide

Recently there has been renewed interest in revision control systems. This is great as improvements to tools are always welcome. Git is, sadly, extremely entrenched and trying to replace it will be an uphill battle. This is not due to technical but social issues. What this means is that approaches like "basically Git, but with a mathematically proven model for X" are not going to fly. While having this extra feature is great in theory, in practice it is not sufficient. The sheer amount of work needed to switch a revision control system and the ongoing burden of using a niche, nonstandard system is just too much. People will keep using their existing system.

What would it take, then, to create a system that is compelling enough to make the change? In cases like these you typically need a "big design thing" that makes the new system 10× better in some way and which the old system can not do. Alternatively the new system needs to have many small things that are better but then the total improvement needs to be something like 20× because the human brain perceives things nonlinearly. I have no idea what this "major feature" would be, but below is a list of random things that a potential replacement system should probably handle.

Better server integration

One of Git's design principles was that everyone should have all the history all the time so that every checkout is fully independent. This is a good feature to have and one that should be supported by any replacement system. However it is not how revision control systems are commonly used. 99% of the time developers are working on some sort of a centralised server, be it Gitlab, Github or a corporation's internal revision control server. The user interface should be designed so that this common case is as smooth as possible.

As an example let's look at keeping a feature branch up to date. In Git you have to rebase your branch and then force push it. If your branch had any changes you don't have in your current checkout (because they were done on a different OS, for example), they are now gone. In practice you can't have more than one person working on a feature branch because of this (unless you use merges, which you should not do). This should be more reliable. The system should store, somehow, that a rebase has happened and offer to fix out-of-date checkouts automatically. Once the feature branch gets to trunk, it is ok to throw this information away. But not before that.

Another thing one could do is that repository maintainers could mandate things like "pull requests must not contain merges from trunk to the feature branch" and the system would then automatically prohibit these. Telling people to remove merges from their pull requests and to use rebase instead is something I have to do over and over again. It would be nice to be able to prohibit the creation of said merges rather than manually detecting and fixing things afterwards.

Keep rebasing as a first class feature

One of the reasons Git won was that it embraced rebasing. Competing systems like Bzr and Mercurial did not and advocated merges instead. It turns out that people really want their linear history and that rebasing is a great way to achieve that. It also helps code review as fixes can be done in the original commits rather than new commits afterwards. The counterargument to this is that rebasing loses history. This is true, but on the other hand not rebasing means that your commit history gets littered with messages like "Some more typo fixes #3, lol." In practice people seem to strongly prefer the former to the latter.

Make it scalable

Git does not scale. The fact that Git-LFS exists is proof enough. Git only scales in the original, narrow design spec of "must be scalable for a process that only deals in plain text source files where the main collaboration method is sending patches over email" and even then it does not do it particularly well. If you try to do anything else, Git just falls over. This is one of the main reasons why game developers and the like use other revision control systems. The final art assets for a single level in a modern game can be many, many times bigger than the entire development history of the Linux kernel.

A replacement system should handle huge repos like these effortlessly. By default a checkout should only download those files that are needed, not the entire development history. If you need to do something like bisection, then files missing from your local cache (and only those) should be downloaded transparently during checkout operations. There should be a command to download the entire history, of course, but it should not be done by default.
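The "download only what is missing" behaviour could look roughly like the following Python sketch. Everything in it is hypothetical: the class name, the fetch callback and the fake server are stand-ins of my own invention and not any existing system's API.

```python
class LazyCheckout:
    """Toy model: fetch file contents from the server only on first use."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable: (revision, path) -> bytes
        self._cache = {}      # local cache keyed by (revision, path)
        self.downloads = 0    # how many times the server was contacted

    def read(self, revision, path):
        key = (revision, path)
        if key not in self._cache:
            # Only files missing from the local cache hit the network.
            self._cache[key] = self._fetch(revision, path)
            self.downloads += 1
        return self._cache[key]

def fake_server(revision, path):
    """Stand-in for a network call to a hypothetical server."""
    return f"{path}@{revision}".encode()

checkout = LazyCheckout(fake_server)
checkout.read("r1", "src/main.c")
checkout.read("r1", "src/main.c")  # served from the local cache
checkout.read("r2", "src/main.c")  # e.g. a bisection step, fetched on demand
print(checkout.downloads)          # prints 2
```

A "download the entire history" command would then just be a loop that calls read for every revision and path up front.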

Further, it should be possible to do only partial checkouts. People working on low level code should be able to get just their bits and not have to download hundreds of gigs of textures and videos they don't need to do their work.

Support file locking

This is the one feature all coders hate: the ability to lock a file in trunk so that no-one else can edit it. It is disruptive, annoying and just plain wrong. It is also necessary. Practice has shown that artists at large either can not or will not use revision control systems. There are many studios where the revision control system for artists is a shared network drive, with file names like character_model_v3_final_realfinal_approved.mdl. It "works for them" and trying to mandate a more process heavy revision control system can easily lead to an open revolt.

Converting these people means providing them with a better work flow. Something like this:
  1. They open their proprietary tool, be it Photoshop, Final Cut Pro or whatever.
  2. Click on GUI item to open a new resource.
  3. A window pops up where they can browse the files directly from the server as if they were local.
  4. They open a file.
  5. They edit it.
  6. They save it. Changes go directly in trunk.
  7. They close the file.
There might be a review step as well, but it should be automatic. Merge requests should be filed and kept up to date without the need to create a branch or to even know that such a thing exists. Anything else will not work. Specifically doing any sort of conflict resolution does not work, even if it were the "right" thing to do. The only way around this (that we know of) is to provide file locking. Obviously it should be possible to limit this to binary files only.
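On the server side, file locking limited to binary files might be as simple as the following Python sketch. The class, its methods and the file suffixes are all hypothetical and only illustrate the idea of one holder per file.

```python
class LockTable:
    """Toy server-side lock registry: at most one holder per path."""

    def __init__(self, lockable_suffixes):
        # Only these (binary) file types may be locked at all.
        self._suffixes = tuple(lockable_suffixes)
        self._holders = {}

    def acquire(self, path, user):
        """Return True if `user` holds the lock on `path` after the call."""
        if not path.endswith(self._suffixes):
            raise ValueError(f"{path} is not a lockable file type")
        holder = self._holders.setdefault(path, user)
        return holder == user

    def release(self, path, user):
        """Release the lock, but only if `user` is the current holder."""
        if self._holders.get(path) == user:
            del self._holders[path]

locks = LockTable([".psd", ".mdl"])
print(locks.acquire("hero.mdl", "alice"))  # prints True
print(locks.acquire("hero.mdl", "bob"))    # prints False
locks.release("hero.mdl", "alice")
print(locks.acquire("hero.mdl", "bob"))    # prints True
```

The artist-facing tool would call acquire when a file is opened and release when it is closed, so the artist never has to think about locks at all.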

Provide all functionality via a C API

The above means that you need to be able to deeply integrate the revision control system with existing artist tools. This means plugins written in native code using a stable plain C API. The system can still be implemented in whatever SuperDuperLanguage you want, but its one true entry point must be a C API. It should be full-featured enough that the official command line client should be implementable using only functions in the public C API.

Provide transparent Git support

Even if a project would want to move to something else, the sad truth is that for the time being the majority of contributors only know Git. They don't want to learn a whole new tool just to contribute to the project. Thus the server should serve its data in two different formats: once in its native format and once as a regular Git endpoint. Anyone with a Git client should be able to check out the code and not even know that the actual backend is not Git. They should be able to even submit merge requests, though they might need to jump through some minor hoops for that. This allows you to do incremental upgrades, which is the only feasible way to get changes like these done.

Friday, November 27, 2020

How Apple might completely take over end users' computers

Many people are concerned about Apple's ongoing attempts to take more and more control of end user machines from their users. Some go so far as to say that Apple won't be happy until they have absolute and total control over all programs running on end user devices, presumably so that they can enforce their 30% tax on every piece of software. Whether this is true or not we don't really know.

What we can do instead is a thought experiment. If that was their end goal, how would they achieve it? What steps would they take to obtain this absolute control? Let's speculate.

Web apps

A common plan against tightening app store requirements is to provide a web app instead. You can do a lot of cool things with WebAssembly and its state is continuously improving. Thus it must be blocked. This is trivial: require that web browsers may only run WASM programs that are notarized by Apple. This is an easy change to sell, all it needs is a single tear jerking please-think-of-the-children presentation about the horrible dangers of online predators, Bitcoin miners and the like. Notarization adds security and who wouldn't want to have more of that?

There is still the problem that you can run an alternative browser like Chrome or Firefox. This can be solved simply by adding a requirement that third party browsers can only get notarized if they block all non-notarized web apps. On iOS this is of course already handled by mandating that all browsers must use the system's browser engine. At some point this might get brought over to macOS as well. For security.

Javascript

Blocking WASM still leaves the problem of Javascript. There is a lot of it and even Apple can not completely block non-notarized JS from running. Here you have to run the long game. An important step is, surprisingly, to drive the adoption of WebAssembly. There are many ways of doing this, the simplest is to stop adding any new JS functionality and APIs that can instead be done in WebAssembly. This forces app developers to either drop Apple support or switch to WASM. This transition can be boosted by stopping all development and maintenance on the existing JS engine and letting it bitrot. Old web pages will get worse and worse over time and since Apple won't fix their browser, site operators will be forced to upgrade to technologies like WASM that come with mandatory notarization. For security.

Scripting languages

Scripting languages such as Perl and Python can be used to run arbitrary programs so they must go. First they are removed from the core install so people have to download and install them separately. That is only an inconvenience, though. To achieve total control notarization requirements must again be updated. Any program that loads "external code" must add a check that the code it is running is notarized by Apple. At first you will of course allow locally written script files to be run, as long as you first hunt through system security settings to add run permissions to the script file. This must be done with physical human interaction like a mouse or touchpad. It must not be automatable to prevent exploits. The obtained permissions are of course revoked every time the file is edited. For security.

Compilers

There is still a major hole in this scheme: native compilers. It might be tedious, but it is possible to compile even something as big as Firefox from scratch and run the result. Therefore this must be blocked, and notarization is again the key. This can be done by requiring all binaries, even self-built ones, to be notarized. This is again easy to sell, because it blocks a certain class of malware injection attacks. Following iOS's lead you have to get a developer certificate from Apple to sign your own code to run on your own machine.

Once the basic scheme is in place you have to tighten security and block the signing from any compiler except the Apple provided system one. This has to be done for security, because existing third party compilers may have bugs and features that could be used to circumvent notarization requirements somehow. Only Apple can get this right as the system provider. There must be one, and only one, way of going from source code to executable binaries and that path must be fully controlled by Apple. For security.

Total separation of development and use

Even with all this you can still compile and run your own code, meaning people will find ways of getting around these requirements and doing what they want to do rather than what Apple permits them to do. This means that even tighter reins are required. The logical end result is to split the macOS platform into two separate entities. The first one is the "end user" system that can only run Apple-notarized apps and nothing else. The second is the "dev platform" that runs only Xcode, Safari (in some sort of a restricted mode) and one other program that has to have been fully compiled on the current physical machine. Remote compilation servers are forbidden as they are a security risk. This is roughly how iOS development and things like game console dev kits already work. The precedent is there, waiting for the final logical step to be taken.

This has the side effect that every developer who wants to support Apple platforms now has to buy two different Apple laptops, one for development and one for testing. But let us be absolutely clear about one thing. This is not done to increase sales and thus profits. No. Not under any circumstances! It is for a higher purpose: for security.

Saturday, November 21, 2020

Adding (very) preliminary support for C++ modules in Meson

One of the most common questions people ask about Meson is why does it not yet have support for building C++ modules. Up until now the answer has been simple: no compiler really supports it yet. However Visual Studio has added sufficient functionality in their latest 2019 developer preview that an implementation in Meson has become feasible. The actual code can be found in this merge request for those brave enough to try it out.

The basic problem with C++ modules is the same as with Fortran modules: you can no longer build source files in an arbitrary order. Instead you have to scan the contents of files, see what modules each source file generates and consumes and orchestrate the build so that all source files that produce modules are built before any source files that consume them. This requires dynamic dependency generation that has been added to Ninja only fairly recently.
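The ordering problem itself is a plain topological sort. Here is a minimal Python sketch of that one step. The file names and the two tables are hypothetical scan results, and this is not how Meson or Ninja actually implement it.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical scan results: the module each file provides and the ones it imports.
provides = {"math.ixx": "math", "io.ixx": "io"}
imports = {"main.cpp": ["math", "io"], "io.ixx": ["math"], "math.ixx": []}

def build_order(provides, imports):
    """Order sources so that every module is built before its consumers."""
    provider_of = {module: src for src, module in provides.items()}
    # Map each source file to the set of source files it must wait for.
    graph = {
        src: {provider_of[m] for m in mods if m in provider_of}
        for src, mods in imports.items()
    }
    return list(TopologicalSorter(graph).static_order())

print(build_order(provides, imports))
# prints ['math.ixx', 'io.ixx', 'main.cpp']
```

If two modules import each other, static_order raises graphlib.CycleError, which is the point where a real build system would have to report an error to the user.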

The original idea was that compiler toolchain vendors would provide scanner binaries because parsing C++ code is highly unreliable due to C preprocessor macro shenanigans. It turns out that a "toolchain provided" dependency scanner can not obtain all necessary data reliably, because it requires higher level knowledge about the project setup. This can only be done reliably by the build system. An alternative would be to pass all this information to the compiler/scanner via compiler flags but that turns out to be a terrible thing to define and maintain stable over changes. It also has the downside that you'd need to spawn a single process for every file, which is fairly slow on Windows. Meson's approach is to write a custom dependency scanner. Yes, it is based on regexes, so it is not 100% reliable but on the other hand you only need to spawn one process per build target (exe, shared lib, static lib) as opposed to one per source file.
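To give an idea of what a regex based scanner looks like, here is a deliberately naive Python sketch. It is not Meson's actual scanner. The patterns and function name are my own, and like the real thing it ignores module partitions and can be fooled by comments, strings and preprocessor macros.

```python
import re

# Deliberately naive patterns; comments, macros and strings will fool them.
EXPORT_RE = re.compile(r"^\s*export\s+module\s+([\w.]+)\s*;", re.MULTILINE)
IMPORT_RE = re.compile(r"^\s*(?:export\s+)?import\s+([\w.]+)\s*;", re.MULTILINE)

def scan(source_text):
    """Return (modules provided, modules imported) for one source file."""
    return EXPORT_RE.findall(source_text), IMPORT_RE.findall(source_text)

example = """
export module io;
import math;
export void print_sum(int a, int b);
"""
print(scan(example))  # prints (['io'], ['math'])
```

Since this is just text processing, one process can scan every source file of a build target in a single go.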

Still, the end result does work for simple projects. It does not handle things like module partitions but those can be added later. Even this simple project and test has brought about several notes and questions:

  • Where should the generated module files be put? In the target private dir? In a global dir? If the latter, what happens if two unrelated parts in the code base specify the same module?
  • Microsoft has not documented the module compiler flags and cl /? does not even list them. Because of this all module files get dumped to the build directory root.
  • Only ixx files are supported. VS does not enforce file name extensions. I would really want to enforce module file name extensions to only one. We can't change legacy code and force everyone to use a single extension for C++ source, but we totally should do that for new ones. Having to support many file name extensions for the same thing is madness.
Sadly I don't have any numbers on how much modules improve compilation speed. Feel free to try it out yourself, though. Bug reports and especially fixes are welcome.

Monday, November 16, 2020

The Nine Phases of an Open Source Project Maintainer

There is more to running an open source project than writing code. In fact most of the work has to do with something else. This places additional requirements on project maintainers that are often not talked about. In this post we'll briefly go over nine distinct phases each with a different hat one might have to wear. These can be split into two stages based on the lifetime and popularity of the project.

Stage One: The Project Is Mostly for Yourself

Almost all projects start either with just one person or a small team of just a few people. At the start doing things is easy. Breaking changes can be made on a whim. Programming languages and frameworks can be changed. It is even possible to pivot to something completely different without a care in the world. As there are only a few stakeholders and they typically have similar ideologies, it is easy to get consensus. It is even possible to ignore consensus altogether and "just do it".

Phase One: The Inventor

Everything starts with an idea: how something could be done differently or in a more efficient way. This is the part that tends to get fetishised by journalists and even some inventors themselves. The typical narrative explains how a single genius managed to create a revolutionary new thing all on their own in a basement somewhere. The reality is not quite as glamorous, as almost all ideas are ones that many, many other people have already come up with. Some people go as far as to say that ideas are worthless, only execution matters. This is a bit extreme but nevertheless coming up with ideas is an important skill.

Phase Two: The MVP Implementer

Once an idea is chosen, some sort of a prototype needs to be written. This is the most fun part of coding. There are vast green fields where you can do whatever, design the architecture as you want and get to solve interesting problems that form the core of the eventual product. This phase is the main reason why people become programmers. Getting to create something completely new is a joyful experience. Still, not everything is wine and roses, as it is important to focus enough to get the first version finished rather than going off on all sorts of tangents and eventually losing interest.

Phase Three: The Ditch Digger

Once the first version exists and is found usable, the next step is to make it production ready. This is where the nature of project work takes a very sharp turn. Whereas the previous stage could be described as fun, this phase is tedious. It consists of making the end product reliable and smooth in the face of real world input and usage. This typically exposes bugs and design flaws in the original implementation that need to be fixed or rewritten. It is easy to get discouraged in this phase because the outcome of days of coding might be "the same as before, but also handles this one rare corner case".

The work profile is similar to digging a ditch with a shovel. It's dirty, heavy and taxing work and there are not that many rewards to be had. After all, a half dug ditch is about as useless as a completely undug ditch. It's only when you reach the end and water starts flowing that you get any benefits. The difference between physical ditches and software is that there is no reliable way of estimating how much more you still have to dig. This is a very risky phase of any project as it carries the potential for burnout.

Phase Four: The Documentation Writer

Every new project needs documentation, but some projects need it more than others. Programmers are typically not very eager to write documentation or to keep it up to date. Telling users to "read the source" to find out how to do things is not enough, because people don't want to have to learn about implementation details of your project, they just want to use it. Sometimes it is possible to get other people to write documentation, but usually that only happens after the project has "made it big".

One way of looking at documentation is that it is a competitive advantage. If there are multiple competing projects for the same thing and one of them has better documentation, it has a higher chance of winning (all other things being equal). Writing end user documentation requires a completely different approach and skill set than writing code. This is especially true for things like tutorials as opposed to reference documentation.

Phase Five: The Marketer

Build a better mousetrap and the world will ignore you, tell you that their mouse trap situation is perfectly fine thankyouverymuch and why don't you get a real job rather than wasting your time on this whateveritis, as it will never work. If you want to make them change their mind, you need marketing. Lots of it.

There are many different ways of making your project more known: writing blog posts, presenting at conferences, general online advocacy and so on. This requires, again, a new set of skills, such as talking to a large group of people in public. This is especially true for programmers, who are mostly introverted, but sadly the meek don't inherit the earth. It tends to go to those who can make the most noise about their solution.

Stage Two: The Project Is Mostly for Other People

As the project becomes bigger and more used, eventually another tipping point is reached. Here the program is no longer catering to the needs of the original creator but to the community at large. The rate of change reduces dramatically. Breaking changes can no longer be made at a quick pace or possibly at all. It is also possible that the landscape has changed and the project is now being used in a different way or for different ends than was originally planned. All of this means that the project runner needs to spend more and more time solving issues that do not directly benefit them. This may cause friction if, for example, the project leader works for a company that has other priorities and does not want the person to spend time on things that don't benefit the company directly.

Phase Six: The Recruiter

A project that does not keep refreshing and growing its developer base is a dead one. Typically a project needs to have a sizable chunk of users before other people start contributing to it in a major way. Sometimes people become involved voluntarily, but it's even better if you can somehow actively encourage them to contribute. That is only part of the story, though, since they need to be trained and taught the processes and so on. Paradoxically getting new contributors slows down development at first, but eventually makes things faster as the workload can be split among multiple people.

Phase Seven: The Culture Cultivator

Every project has its own set of unspoken guidelines. These get established quite early on and include things like requiring tests for every new feature, using coding patterns H, J and K instead of X, Y or Z and so on. People are generally quite good at detecting these and doing the same thing as everyone else. As the pool of contributors grows, this becomes less and less common. Contributions tend to become more lax. This is not due to malice, but simply because people are not aware of the requirements.

It is very easy to slip on these requirements little by little. It is the job of the project leader to make sure this does not happen. This requires both leading by example and pointing out these issues in code review and other discussions.

Phase Eight: The Overseer

This phase begins when the project maintainer realizes that they are no longer the person who knows most about the code base. Other people have done most of the coding work for so long that they are the actual experts on it. This causes yet another change in the type of work one needs to do. Up until now the work has been about solving problems and making decisions on things you are intimately familiar with. As an overseer you need to make decisions on things you don't really know about. Earlier decisions were based on code and implementation details, but now decisions are based mostly on what other people say in their merge requests and design discussions.

This is something nobody really prepares you for. Making big decisions based on imperfect information can be really difficult for someone who has gotten used to going through every detail. Once a project gets over a certain size this is just not possible as the human brain is incapable of holding that many details in active memory at the same time. Even if it could, having a single person review everything would be a huge bottleneck. It is (more than) a full time job, and getting someone to pay for a full time maintainer review job is very rare.

Even if this were possible, reviewing is a very tiring job that very few people can keep on doing as their only task for very long. Eventually the mind will start screaming for something else, if only for a while. Finally, even if someone could do that, contributors would eventually get very annoyed by getting micromanaged to death and just leave.

Phase Nine: The Emeritus

All good things eventually come to an end and so will open source project maintainership. Eventually the project will either become irrelevant or the torch will be passed to someone else. This is, in a way, the greatest thing a project maintainer could hope for: being able to create a new entity that will keep on being used even after you have stopped working on it.

Open source maintainership is a relatively young field and most projects at the end of their life cycle either become unmaintained zombies or get replaced by a new project written from scratch. We don't have that much experience with what emerituses do. Based on other fields, this may range from "nothing" to giving conference talks and advising current maintainers on thorny issues.