Saturday, November 21, 2020

Adding (very) preliminary support for C++ modules in Meson

One of the most common questions people ask about Meson is why does it not yet have support for building C++ modules. Up until now the answer has been simple: no compiler really supports it yet. However Visual Studio has added sufficient functionality in their latest 2019 developer preview that an implementation in Meson has become feasible. The actual code can be found in this merge request for those brave enough to try it out.

The basic problem with C++ modules is the same as with Fortran modules: you can no longer build source files in an arbitrary order. Instead you have to scan the contents of files, see what modules each source file generates and consumes and orchestrate the build so that all source files that produce modules are built before any source files that consume them. This requires dynamic dependency generation that has been added to Ninja only fairly recently.

The original idea was that compiler toolchain vendors would provide scanner binaries because parsing C++ code is highly unreliable due to C preprocessor macro shenanigans. It turns out that a "toolchain provided" dependency scanner can not obtain all necessary data reliably, because it requires higher level knowledge about the project setup. This can only be done reliably by the build system. An alternative would be to pass all this information to the compiler/scanner via compiler flags but that turns out to be a terrible thing to define and maintain stable over changes. It also has the downside that you'd need to spawn a single process for every file, which is fairly slow on Windows. Meson's approach is to write a custom dependency scanner. Yes, it is based on regexes, so it is not 100% reliable but on the other hand you only need to spawn one process per build target (exe, shared lib, static lib) as opposed to one per source file.

Still, the end result does work for simple projects. It does not handle things like module partitions but those can be added later. Even this simple project and test has brought about several notes and questions:

  • Where should the generated module files be put? In the target private dir? In a global dir? If the latter, what happens if two unrelated parts in the code base specify the same module?
  • Microsoft has not documented the module compiler flags and cl /? does not even list them. Because of this all module files get dumped to the build directory root.
  • Only ixx files are supported. VS does not enforce file name extensions. I would really want to enforce module file name extensions to only one. We can't change change legacy code and force everyone to use a single extension for C++ source, but we totally should do that for new ones. Having to support many file name extensions for the same thing is madness.
Sadly I don't have any numbers on how much modules improve compilation speed. Feel free to try it out yourself, though. Bug reports and especially fixes are welcome.

Monday, November 16, 2020

The Nine Phases of an Open Source Project Maintainer

There is more to running an open source project than writing code. In fact most of all work has to do with something else. This places additional requirements to project maintainers that are often not talked about. In this post we'll briefly go over nine distinct phases each with a different hat one might have to wear. These can be split into two stages based on the lifetime and popularity of the project.

Stage One: The Project Is Mostly for Yourself

Almost all projects start either with just one person or a small team of just a few people. At the start doing things is easy. Breaking changes can be made on a whim. Programming languages and frameworks can be changed. It is even possible to pivot to something completely different without a care in the world. As there are only a few stakeholders and they typically have similar ideologies and thus it is easy to get consensus. It is even possible to ignore consensus altogether and "just do it".

Phase One: The Inventor

Everything starts with an idea: how something could be done differently or in a more efficient way. This is the part that tends to get fetishised by journalists and even some inventors themselves. The typical narrative explains how a single genius managed to create a revolutionary new thing all on their own in a basement somewhere. The reality is not quite as glamorous, as almost all ideas are ones that many, many other people have already come up with. Some people go as far as to say that ideas are worthless, only execution matters. This is a bit extreme but nevertheless coming up with ideas is an important skill.

Phase Two: The MVP Implementer

Once an idea is chosen, some sort of a prototype needs to be written. This is the most fun part of coding. There are vast green fields where you can do whatever, design the architecture as you want and get to solve interesting problems that form the core of the eventual proudct. This phase is the main reason why people become programmers. Getting to create something completely new is a joyful experience. Still, not everything is wine and roses, as it is important to focus enough to get the first version finished rather than going off on all sorts of tangents and eventually losing interest.

Phase Three: The Ditch Digger

Once the first version exists and is found usable, the next step is to make it production ready. This is where the nature of project work takes a very sharp turn. Whereas the previous stage could be described as fun, this phase is tedious. It consists of making the end product reliable and smooth in the face of real world input and usage. This typically exposes bugs and design flaws in the original implementation that need to be fixed or rewritten. It is easy to get discouraged in this phase because the outcome of days of coding might be "the same as before, but also handles this one rare corner case".

The work profile is similar to digging a ditch with a shovel. It's dirty, heavy and taxing work and there are not that many rewards to be had. After all, a half dug ditch is about as useless as a completely undigged ditch. It's only when you reach the end and water starts flowing that you get any benefits. The difference between physical ditches and sotware is that there is no reliable way of estimating how much more you still have to dig. This is a very risky phase of any project as it carries the potential for burnout.

Phase Four: The Documentation Writer

Every new project needs documentation, but some projects need it more than others. Programmers are typically not very eager to write documentation or to keep it up to date. Telling users to "read the source" to find out how to do things is not enough, because people don't want to have to learn about implementation details of your project, they just want to use it. Sometimes it is possible to get other people to write documentation, but usually that only happens after the project has "made it big".

One way of looking at documentation is that it is a competitive advantage. If there are multiple competing projects for the same thing and one of them has better documentation, it has a higher chance of winning (all other things being equal). Writing end user documentation requires a completely different approach and skill set than writing code. This is especially true for things like tutorials as opposed to reference documentation.

Phase Five: The Marketer

Build a better mousetrap and the world will ignore you, tell you that their mouse trap situation is perfectly fine thankyouverymuch and why don't you get a real job rather than wasting your time on this whateveritis, as it will never work. If you want to make them change their mind, you need marketing. Lots of it.

There are many different ways of making your project more known: writing blog posts, presenting at conferences, general online advocacy and so on. This requires, again, a new set of skills, such as talking to a large group of people in public. This is especially true for programmers who are mostly introverted, but sadly the meek don't inherit the earth. It tends to go to those who can make the most noise about their solution.

Stage Two: The Project Is Mostly for Other People

As the project becomes bigger and more used, eventually another tipping point is reached. Here the program is no longer catering to the needs of the original creator but to the community at large. The rate of change reduces dramatically. Breaking changes can no longer be made at a quick pace or possibly at all. It is also possible that the landscape has changed and the project is now being used in a different way or for different ends than was originally planned. All of this means that the project runner needs to spend more and more time solving issues that does not directly benefit themselves. This may cause friction if, for example, the project leader works for a company that has other priorities and does not want the person to spend time on things that don't benefit them directly.

Phase Six: The Recruiter

A project that does not keep refreshing and growing its developer base is a dead one. Typically a project needs to have a sizable chunk of users before other people start contributing to it in a major way. Sometimes people become involved voluntarily, but it's even better if you can somehow actively encourage them to contribute. That is only part of the story, though, since they need to be trained and taught the processes and so on. Paradoxically getting new contributors slows down development at first, but eventually makes things faster as the workload can be split among multiple people.

Phase Seven: The Culture Cultivator

Every project has its own set of unspoken guidelines. These get established quite early on and include things like requiring tests for every new feature, not using coding patterns X, Y or Z but use H, J and K instead and so on. People are generally quite good at detecting these and doing the same thing as everyone else. As the pool of contributors grows, this becomes less and less common. Contributions tend to become more lax. This is not due to malice, but simply because people are not aware of the requirements.

It is very easy to slip on these requirements little by little. It is the job of the project leader to make sure this does not happen. This requires both leading by example and also by noting out these issues in code review and other discussions. 

Phase Eight: The Overseer

This phase begins when the project maintainer realizes that they are no longer the person who knows most about the code base. Other people have done most of the coding work for so long that they are the actual experts on it. This causes yet another change in the type of work one needs to do. Up until now the work has been about solving problems and making decisions on things you are intimately familiar with. As an overseer you need to make decisions on things you don't really know about. Earlier decisions were based on code and implementation details, but now decisions are based mostly on what other people say in their merge requests and design discussions.

This is something nobody really prepares you for. Making big decisions based on imperfect information can be really difficult for someone who has gotten used to going through every detail. Once a project gets over a certain size this is just not possible as the human brain is incapable of holding that many details in active memory at the same time. Even if it could, having a single person review everything would be a huge bottleneck. It is (more than) a full time job, and getting someone to pay for a full time maintainer review job is very rare.

Finally, even if this were possible, reviewing is a very tiring job that very few people can keep on doing as their only task for very long. Eventually the mind will start screaming for something else, even for a while. Finally even if someone could do that, contributors would eventually get very annoyed by getting micromanaged to death and just leave.

Phase Nine: The Emeritus

All good things eventually come to an end and so will open source project maintainership. Eventually the project will either become irrelevant or the torch will be passed to someone else. This is, in a way, the greatest thing a project maintainer could hope for: being able to create a new entity that will keep on being used even after you have stopped working on it.

Open source maintainership is a relatively young field and most projects at the end of their life cycle either become unmaintained zombies or get replaced by a new project written from scratch. Ee don't have that much experience on what emerituses do. Based on other fields these may range from "nothing" to doing conference talks, advising current maintainers on thorny issues.

Saturday, November 7, 2020

Proposal for target-private directories for compilers

One of the greatest strengths of the classical C compiler model is that all compile jobs are fully isolated. This means that they can be run perfectly in parallel. This same feature is also one of its greatest weaknesses. There are no ways for individual compile jobs to communicate with each other even if they wanted to. This could be useful for things like caching. As an example a compiler might transparently create "precompiled headers" of sorts during one compilation and use them in other compilations if needed. This might also be useful for languages that require scanning steps before building such as Fortran and C++ using modules.

This is not idle speculation. Clang's thinLTO does use caching to speed up incremental builds. Since there is no existing standard for this, they did the usual thing and created a new compiler flag for specifying the location of the cache directory. Or, to be precise, they created four of them:

  • gold (as of LLVM 4.0): -Wl,-plugin-opt,cache-dir=/path/to/cache
  • ld64 (support in clang 3.9 and Xcode 8): -Wl,-cache_path_lto,/path/to/cache
  • ELF lld (as of LLVM 5.0): -Wl,--thinlto-cache-dir=/path/to/cache
  • COFF lld-link (as of LLVM 6.0): /lldltocache:/path/to/cache

For one option this is tedious but for many it becomes completely unworkable. Clearly something better is needed.

The basic idea: each build target gets its own private directory

Basically what one should be able to do is this:

gcc -c -o foo/bar.o bar.c -fprivate-dir=some_path

The private directory would have the following properties:

  • The build system guarantees that it is set to the same value for all compilations of a single target (executable, shared library, static library, etc)
  • Every build target gets its own unique private directory
  • The contents of the directory may persist over successive invocations (i.e. its contents may be deleted at any time, but most of the time won't be)
  • The compiler can create anything it wants in the private dir but should also tolerate other usages (typically you'd also want to put the target's object files in this dir)
  • The contents of the private dir are transitory, they have no backwards or forwards compatibility guarantees. Any compiler update would invalidate all files.
If, for example, compilers wanted to create pipes or Unix domain sockets in the private dir for communicating between compiler instances, they could do that behind the scenes.

Work needed for tooling

Meson and CMake already to pretty much exactly this as they store object files in special directories. I don't know enough about Autotools to know how much work it would be, though it does have the concept of higher level build targts. Handwritten Makefiles would need to be tweaked by hand as with every change. Visual Studio solutions are already split up to per-target project files so adding new flags there should be fairly simple.

The best part is that this change would be fully backwards compatible. If the private dir argument is not used, the compilers would behave in exactly the same way they do now.

Monday, November 2, 2020

You wanted Boost via Meson subprojects? You got it! (sorta)

In the previous blog post we saw a way to build SDL transparently as a Meson subproject. In the discussion that followed I got a question on whether you could consume Boost in the same way. This is an interesting question, because Boost is a, let's say, challenging dependency. It is very big and set up in an unusual way. As an example I would estimate that the single fact that they don't ship Pkg-Config files has cost Meson developers tens of hours of unnecessary troubleshooting. Having something simpler and more reliable would be welcome.

To test this out I created an MVP program that uses Boost's flat map from the container library and then added dependencies until it worked. The actual code can be downloaded here (tested on Linux, VS and Mac). The main program's basic build definition is as simple as the SDL program's was:

boost_container_dep = dependency('boost-container')
executable('boosttest', 'boosttest.cpp',
           dependencies: [boost_container_dep])

The Boost container dep is specified in the container library's build file:

boost_container_dep = declare_dependency(
  include_directories: 'include',
  dependencies: [...])

As this is a header-only library, the only thing it needs to do is to expose the header dir. The dependencies keyword argument lists all the other dependencies that are needed to build code that uses the container library. These are move, assert, static_assert, intrusive, core and config. Their build files are almost identical to this one. No code changes were needed. The total LoC of files for this entire setup is 42. Which is apt.

Making it better

The main reason for this low line count is the fact that the Meson build definition do a lot less than the original ones. They do a lot are highly configurable, which might also explain why Boost's conversion to CMake has taken several years and is still ongoing. A lot of that effort is taken by things like documentation, but the actual build is also more complicated as it provides for more stuff. Here are two examples and an outline of how they would be implemented in Meson.

Built libraries

Some Boost libraries are header-only, others require some code to be built and linked against. Suppose we have a header-only dependency. Its dependency object would be defined like this:

foo_dep = declare_dependency(include_directories: 'include')

Converting that to contain a library component would be done like this:

foo_lib = library(...)
foo_dep = declare_dependency(include_directories: 'include',
                             link_with: foo_lib)

Basically you build the library and then tell users to link to that. This is pretty much what the SDL build definitions did. The library can be shared or static, they both work the same way.

Compiler flags needed for using the dependency

Some of the libraries seem to require that users of the library must specify some compiler command line arguments. These might or might not be the same ones that are used to build the library components themselves. This is natively supported.

foo_dep = declare_dependency(include_directories: 'include',
                             compile_args: ['-DUSING_FOO'])

How much work would it be to convert all of Boost?

It depends, but quite a lot in any case. Boost is very big. If one attempts to reach feature parity with the current build system it would be a very, very big effort. I'm told that there are parts of Boost that have circular dependencies between projects and Meson does not support those (as in: they are inexpressible). Meson's HP-UX support is also still a work in progress (or, to be more exact, a work not in progress, at least as far as I'm aware of).

Doing a simple conversion that only needs to deal with the code on common platforms, on the other hand, would be doable. It would require a small team of dedicated people, because trying to do it alone would just lead to a massive burnout, but it could be done.