Saturday, November 21, 2020

Adding (very) preliminary support for C++ modules in Meson

One of the most common questions people ask about Meson is why it does not yet have support for building C++ modules. Up until now the answer has been simple: no compiler really supports them yet. However, Visual Studio has added sufficient functionality in its latest 2019 developer preview that an implementation in Meson has become feasible. The actual code can be found in this merge request for those brave enough to try it out.

The basic problem with C++ modules is the same as with Fortran modules: you can no longer build source files in an arbitrary order. Instead you have to scan the contents of files, see what modules each source file generates and consumes and orchestrate the build so that all source files that produce modules are built before any source files that consume them. This requires dynamic dependency generation that has been added to Ninja only fairly recently.
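As a minimal illustration (file names hypothetical), consider a module interface file and a consumer. The consumer can only be compiled after the interface file, because the import needs the compiled module artifact:

// speaker.ixx, produces module "speaker", must be built first
export module speaker;
export int answer() { return 42; }

// main.cpp, consumes the module, so it can only be built after
// the module artifact for "speaker" has been generated
import speaker;
int main() { return answer(); }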

The original idea was that compiler toolchain vendors would provide scanner binaries, because parsing C++ code is highly unreliable due to C preprocessor macro shenanigans. It turns out that a "toolchain provided" dependency scanner cannot obtain all the necessary data reliably, because doing so requires higher level knowledge about the project setup. Only the build system has that knowledge. An alternative would be to pass all this information to the compiler/scanner via compiler flags, but that turns out to be a terrible interface to define and keep stable over time. It also has the downside that you'd need to spawn a single process for every file, which is fairly slow on Windows. Meson's approach is to write a custom dependency scanner. Yes, it is based on regexes, so it is not 100% reliable, but on the other hand you only need to spawn one process per build target (exe, shared lib, static lib) as opposed to one per source file.
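To give an idea of what such a scanner boils down to, here is an illustrative sketch (not Meson's actual implementation, and it deliberately ignores preprocessor tricks):

// Print what modules a source file provides and consumes.
#include <fstream>
#include <iostream>
#include <regex>
#include <string>

int main(int argc, char **argv) {
    if(argc < 2)
        return 1;
    const std::regex provides(R"(^\s*export\s+module\s+([A-Za-z0-9_.:]+)\s*;)");
    const std::regex consumes(R"(^\s*import\s+([A-Za-z0-9_.:]+)\s*;)");
    std::ifstream input(argv[1]);
    std::string line;
    std::smatch match;
    while(std::getline(input, line)) {
        if(std::regex_search(line, match, provides))
            std::cout << argv[1] << " provides " << match[1] << '\n';
        else if(std::regex_search(line, match, consumes))
            std::cout << argv[1] << " consumes " << match[1] << '\n';
    }
    return 0;
}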

Still, the end result does work for simple projects. It does not handle things like module partitions, but those can be added later. Even this simple test project has brought up several notes and questions:

  • Where should the generated module files be put? In the target private dir? In a global dir? If the latter, what happens if two unrelated parts in the code base specify the same module?
  • Microsoft has not documented the module compiler flags and cl /? does not even list them. Because of this all module files get dumped to the build directory root.
  • Only ixx files are supported. VS does not enforce file name extensions. I would really want to enforce a single module file name extension. We can't change legacy code and force everyone to use a single extension for C++ sources, but we totally should do that for new ones. Having to support many file name extensions for the same thing is madness.
Sadly I don't have any numbers on how much modules improve compilation speed. Feel free to try it out yourself, though. Bug reports and especially fixes are welcome.

Monday, November 16, 2020

The Nine Phases of an Open Source Project Maintainer

There is more to running an open source project than writing code. In fact, most of the work has to do with something else. This places additional requirements on project maintainers that are often not talked about. In this post we'll briefly go over nine distinct phases, each with a different hat the maintainer might have to wear. These can be split into two stages based on the lifetime and popularity of the project.

Stage One: The Project Is Mostly for Yourself

Almost all projects start either with just one person or a small team of just a few people. At the start doing things is easy. Breaking changes can be made on a whim. Programming languages and frameworks can be changed. It is even possible to pivot to something completely different without a care in the world. As there are only a few stakeholders, who typically have similar ideologies, it is easy to get consensus. It is even possible to ignore consensus altogether and "just do it".

Phase One: The Inventor

Everything starts with an idea: how something could be done differently or in a more efficient way. This is the part that tends to get fetishised by journalists and even some inventors themselves. The typical narrative explains how a single genius managed to create a revolutionary new thing all on their own in a basement somewhere. The reality is not quite as glamorous, as almost all ideas are ones that many, many other people have already come up with. Some people go as far as to say that ideas are worthless, only execution matters. This is a bit extreme but nevertheless coming up with ideas is an important skill.

Phase Two: The MVP Implementer

Once an idea is chosen, some sort of a prototype needs to be written. This is the most fun part of coding. There are vast green fields where you can do whatever you want, design the architecture as you see fit and solve the interesting problems that form the core of the eventual product. This phase is the main reason why people become programmers. Getting to create something completely new is a joyful experience. Still, not everything is wine and roses, as it is important to focus enough to get the first version finished rather than going off on all sorts of tangents and eventually losing interest.

Phase Three: The Ditch Digger

Once the first version exists and is found usable, the next step is to make it production ready. This is where the nature of project work takes a very sharp turn. Whereas the previous stage could be described as fun, this phase is tedious. It consists of making the end product reliable and smooth in the face of real world input and usage. This typically exposes bugs and design flaws in the original implementation that need to be fixed or rewritten. It is easy to get discouraged in this phase because the outcome of days of coding might be "the same as before, but also handles this one rare corner case".

The work profile is similar to digging a ditch with a shovel. It's dirty, heavy and taxing work and there are not that many rewards to be had. After all, a half-dug ditch is about as useless as a completely undug one. It's only when you reach the end and water starts flowing that you get any benefits. The difference between physical ditches and software is that there is no reliable way of estimating how much more you still have to dig. This is a very risky phase of any project, as it carries the potential for burnout.

Phase Four: The Documentation Writer

Every new project needs documentation, but some projects need it more than others. Programmers are typically not very eager to write documentation or to keep it up to date. Telling users to "read the source" to find out how to do things is not enough, because people don't want to learn about the implementation details of your project; they just want to use it. Sometimes it is possible to get other people to write documentation, but usually that only happens after the project has "made it big".

One way of looking at documentation is that it is a competitive advantage. If there are multiple competing projects for the same thing and one of them has better documentation, it has a higher chance of winning (all other things being equal). Writing end user documentation requires a completely different approach and skill set than writing code. This is especially true for things like tutorials as opposed to reference documentation.

Phase Five: The Marketer

Build a better mousetrap and the world will ignore you, tell you that their mouse trap situation is perfectly fine thankyouverymuch and why don't you get a real job rather than wasting your time on this whateveritis, as it will never work. If you want to make them change their mind, you need marketing. Lots of it.

There are many different ways of making your project more known: writing blog posts, presenting at conferences, general online advocacy and so on. This requires, again, a new set of skills, such as speaking to a large group of people in public. That can be especially hard for programmers, who are mostly introverted, but sadly the meek don't inherit the earth. It tends to go to those who can make the most noise about their solution.

Stage Two: The Project Is Mostly for Other People

As the project becomes bigger and more used, eventually another tipping point is reached. Here the program is no longer catering to the needs of the original creator but to the community at large. The rate of change reduces dramatically. Breaking changes can no longer be made at a quick pace, or possibly at all. It is also possible that the landscape has changed and the project is now being used in a different way or for different ends than was originally planned. All of this means that the project runner needs to spend more and more time solving issues that do not directly benefit themselves. This may cause friction if, for example, the project leader works for a company that has other priorities and does not want the person to spend time on things that don't benefit them directly.

Phase Six: The Recruiter

A project that does not keep refreshing and growing its developer base is a dead one. Typically a project needs to have a sizable chunk of users before other people start contributing to it in a major way. Sometimes people become involved voluntarily, but it's even better if you can somehow actively encourage them to contribute. That is only part of the story, though, since they need to be trained and taught the processes and so on. Paradoxically getting new contributors slows down development at first, but eventually makes things faster as the workload can be split among multiple people.

Phase Seven: The Culture Cultivator

Every project has its own set of unspoken guidelines. These get established quite early on and include things like requiring tests for every new feature, using coding patterns H, J and K instead of X, Y or Z, and so on. People are generally quite good at detecting these and doing the same thing as everyone else. As the pool of contributors grows, this becomes less and less common and contributions tend to become more lax. This is not due to malice, but simply because people are not aware of the requirements.

It is very easy to slip on these requirements little by little. It is the job of the project leader to make sure this does not happen. This requires both leading by example and pointing out these issues in code review and other discussions.

Phase Eight: The Overseer

This phase begins when the project maintainer realizes that they are no longer the person who knows most about the code base. Other people have done most of the coding work for so long that they are the actual experts on it. This causes yet another change in the type of work one needs to do. Up until now the work has been about solving problems and making decisions on things you are intimately familiar with. As an overseer you need to make decisions on things you don't really know about. Earlier decisions were based on code and implementation details, but now decisions are based mostly on what other people say in their merge requests and design discussions.

This is something nobody really prepares you for. Making big decisions based on imperfect information can be really difficult for someone who has gotten used to going through every detail. Once a project gets over a certain size this is just not possible as the human brain is incapable of holding that many details in active memory at the same time. Even if it could, having a single person review everything would be a huge bottleneck. It is (more than) a full time job, and getting someone to pay for a full time maintainer review job is very rare.

Finally, even if this were possible, reviewing is a very tiring job that very few people can keep doing as their only task for very long. Eventually the mind will start screaming for something else, if only for a while. And even if someone could do that, contributors would eventually get very annoyed at being micromanaged to death and just leave.

Phase Nine: The Emeritus

All good things eventually come to an end and so will open source project maintainership. Eventually the project will either become irrelevant or the torch will be passed to someone else. This is, in a way, the greatest thing a project maintainer could hope for: being able to create a new entity that will keep on being used even after you have stopped working on it.

Open source maintainership is a relatively young field and most projects at the end of their life cycle either become unmaintained zombies or get replaced by a new project written from scratch. We don't have that much experience on what emeriti do. Based on other fields, this may range from doing nothing at all to giving conference talks and advising current maintainers on thorny issues.

Saturday, November 7, 2020

Proposal for target-private directories for compilers

One of the greatest strengths of the classical C compiler model is that all compile jobs are fully isolated. This means that they can be run perfectly in parallel. This same feature is also one of its greatest weaknesses: there is no way for individual compile jobs to communicate with each other, even if they wanted to. Such communication could be useful for things like caching. As an example, a compiler might transparently create "precompiled headers" of sorts during one compilation and use them in other compilations if needed. This might also be useful for languages that require scanning steps before building, such as Fortran and C++ using modules.

This is not idle speculation. Clang's ThinLTO already uses caching to speed up incremental builds. Since there is no existing standard for this, they did the usual thing and created a new compiler flag for specifying the location of the cache directory. Or, to be precise, they created four of them:

  • gold (as of LLVM 4.0): -Wl,-plugin-opt,cache-dir=/path/to/cache
  • ld64 (support in clang 3.9 and Xcode 8): -Wl,-cache_path_lto,/path/to/cache
  • ELF lld (as of LLVM 5.0): -Wl,--thinlto-cache-dir=/path/to/cache
  • COFF lld-link (as of LLVM 6.0): /lldltocache:/path/to/cache

For one option this is tedious but for many it becomes completely unworkable. Clearly something better is needed.

The basic idea: each build target gets its own private directory

Basically what one should be able to do is this:

gcc -c -o foo/bar.o bar.c -fprivate-dir=some_path

The private directory would have the following properties:

  • The build system guarantees that it is set to the same value for all compilations of a single target (executable, shared library, static library, etc)
  • Every build target gets its own unique private directory
  • The contents of the directory may persist over successive invocations (i.e. its contents may be deleted at any time, but most of the time won't be)
  • The compiler can create anything it wants in the private dir but should also tolerate other usages (typically you'd also want to put the target's object files in this dir)
  • The contents of the private dir are transitory, they have no backwards or forwards compatibility guarantees. Any compiler update would invalidate all files.
If, for example, compilers wanted to create pipes or Unix domain sockets in the private dir for communicating between compiler instances, they could do that behind the scenes.
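For example, a build system could then emit compile commands along these lines (paths hypothetical), reusing the per-target object directory it already maintains:

# both compilations of target "mylib" share one private dir...
gcc -c -o build/mylib.p/a.o src/a.c -fprivate-dir=build/mylib.p
gcc -c -o build/mylib.p/b.o src/b.c -fprivate-dir=build/mylib.p
# ...while target "myexe" gets its own
gcc -c -o build/myexe.p/main.o src/main.c -fprivate-dir=build/myexe.p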

Work needed for tooling

Meson and CMake already do pretty much exactly this, as they store object files in special per-target directories. I don't know enough about Autotools to know how much work it would be, though it does have the concept of higher level build targets. Handwritten Makefiles would need to be tweaked by hand, as with every change. Visual Studio solutions are already split into per-target project files, so adding new flags there should be fairly simple.

The best part is that this change would be fully backwards compatible. If the private dir argument is not used, the compilers would behave in exactly the same way they do now.

Monday, November 2, 2020

You wanted Boost via Meson subprojects? You got it! (sorta)

In the previous blog post we saw a way to build SDL transparently as a Meson subproject. In the discussion that followed I got a question on whether you could consume Boost in the same way. This is an interesting question, because Boost is a, let's say, challenging dependency. It is very big and set up in an unusual way. As an example I would estimate that the single fact that they don't ship Pkg-Config files has cost Meson developers tens of hours of unnecessary troubleshooting. Having something simpler and more reliable would be welcome.

To test this out I created an MVP program that uses Boost's flat map from the container library and then added dependencies until it worked. The actual code can be downloaded here (tested on Linux, VS and Mac). The main program's basic build definition is as simple as the SDL program's was:

boost_container_dep = dependency('boost-container')
executable('boosttest', 'boosttest.cpp',
           dependencies: [boost_container_dep])

The Boost container dep is specified in the container library's build file:

boost_container_dep = declare_dependency(
  include_directories: 'include',
  dependencies: [...])

As this is a header-only library, the only thing it needs to do is to expose the header dir. The dependencies keyword argument lists all the other dependencies that are needed to build code that uses the container library. These are move, assert, static_assert, intrusive, core and config. Their build files are almost identical to this one. No code changes were needed. The total LoC of meson.build files for this entire setup is 42. Which is apt.

Making it better

The main reason for this low line count is that the Meson build definitions do a lot less than the original ones. Those do a lot and are highly configurable, which might also explain why Boost's conversion to CMake has taken several years and is still ongoing. A lot of that effort goes into things like documentation, but the actual build is also more complicated as it provides for more stuff. Here are two examples and an outline of how they would be implemented in Meson.

Built libraries

Some Boost libraries are header-only, others require some code to be built and linked against. Suppose we have a header-only dependency. Its dependency object would be defined like this:

foo_dep = declare_dependency(include_directories: 'include')

Converting that to contain a library component would be done like this:

foo_lib = library(...)
foo_dep = declare_dependency(include_directories: 'include',
                             link_with: foo_lib)

Basically you build the library and then tell users to link to that. This is pretty much what the SDL build definitions did. The library can be shared or static, they both work the same way.

Compiler flags needed for using the dependency

Some of the libraries seem to require that their users specify certain compiler command line arguments. These may or may not be the same arguments that are used to build the library components themselves. This is natively supported:

foo_dep = declare_dependency(include_directories: 'include',
                             compile_args: ['-DUSING_FOO'])

How much work would it be to convert all of Boost?

It depends, but quite a lot in any case. Boost is very big. Attempting to reach feature parity with the current build system would be a very, very big effort. I'm told that there are parts of Boost that have circular dependencies between projects, and Meson does not support those (as in: they are inexpressible). Meson's HP-UX support is also still a work in progress (or, to be more exact, a work not in progress, at least as far as I know).

Doing a simple conversion that only needs to deal with the code on common platforms, on the other hand, would be doable. It would require a small team of dedicated people, because trying to do it alone would just lead to a massive burnout, but it could be done.

Friday, October 30, 2020

How to build dependencies as Meson subprojects using SDL as an example

Today we released version 0.56.0 of the Meson build system. This is an especially important release, as it marks the 10 000th commit since the start of the project. A huge thank you to everyone who has contributed their time and effort; this project would not exist without all of you. However, in this post we are not going to talk about that; those interested can find further details in the release notes. Instead we are going to talk about how to build your dependencies from source on every platform without needing anything other than Meson.

Last month I gave a lightning talk at CppCon about this way of managing dependencies.

Since then there have been many improvements to the workflow for a smoother experience. To demonstrate them I upgraded the sample program to use SDL Mixer and SDL Image instead of relying on plain SDL. The code is available in this Github repo (only tested on Windows, because I ran out of time to do proper multiplatform testing). The core of the build definition is this:

sdl2_dep = dependency('sdl2')
sdl2_image_dep = dependency('sdl2_image')
sdl2_mixer_dep = dependency('sdl2_mixer')
executable('sdltestapp', 'main.cpp',
  dependencies : [sdl2_image_dep, sdl2_mixer_dep, sdl2_dep],
  win_subsystem: 'windows')

This has always worked on Linux and other platforms that provide system dependencies via Pkg-Config. As of the latest release of Meson and newest code from WrapDB, this works transparently on all platforms. Basically for every dependency you want to build yourself, you need a wrap file, like this:
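The following is a representative sketch; the version number, URLs and hashes are placeholders, the real values come from the WrapDB entry:

[wrap-file]
directory = SDL2-2.x.y
source_url = https://libsdl.org/release/SDL2-2.x.y.tar.gz
source_filename = SDL2-2.x.y.tar.gz
source_hash = <sha256 of the release tarball>
patch_url = https://wrapdb.mesonbuild.com/v1/projects/sdl2/2.x.y/1/get_zip
patch_filename = sdl2-2.x.y-1-wrap.zip
patch_hash = <sha256 of the build definition archive>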

The contents consist mostly of download links, hashes and build meta info. Upstream sources are not checked into your repo (unless you want them to be), so it remains small. The actual files are in the repository linked above. When you start building and the project needs some dependency, Meson uses the info in the wrap files to download, patch and build the dependencies as subprojects as needed. This is what it looks like when configuring on Windows using MSVC without any external dependency providers.

Then you can build it. Here I'm using Visual Studio Code because why not.

The end result runs directly as a native Windows application (text annotations not part of the original program) using SDL's Direct3D accelerated rendering.

There are three different image files loaded using SDL Image: a png file, a jpg file and an LZW compressed tif file. The app also plays sound effects in Ogg Vorbis and wav formats using SDL Mixer. Dependencies of dependencies work automatically, as one would expect. All dependency libraries are statically linked, as that is the easiest way to get things working under Windows (at least until your project gets too big).

If you want to try it yourself, here's what you need to do:

  1. Install Visual Studio, Meson and Ninja (not strictly required, you can use the VS backend instead if you wish)
  2. Open the VS x64 dev tools shell.
  3. Clone the repo and cd inside it.
  4. Run meson build
  5. Run ninja -C build
The executable should now be at the root of the build dir ready to run.

Contributing to the WrapDB

The biggest problem facing WrapDB at the moment is that its selection of available libraries is fairly small. Fortunately it is easy to contribute new dependencies. There are two main cases.

Submitting projects that already build with Meson

  1. Make sure your project gets all its dependencies via the dependency function rather than invoking subproject directly, and provide a suitable dependency object via declare_dependency (there is a sketch of this pattern after this list).
  2. Create an upstream.wrap file with the necessary info. See the documentation or, if you prefer examples, how it is done in libtiff.
  3. Request that your project be added to WrapDB as described in the documentation linked above.
That's pretty much it. Once your project is in the WrapDB it can be immediately used by any other Meson project.
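As a rough sketch of step 1 (all names hypothetical), the pattern looks like this in the submitted project's meson.build:

# obtain dependencies via dependency() so the resolver can fall
# back to WrapDB subprojects when they are not found on the system
zlib_dep = dependency('zlib')
foo_lib = library('foo', 'foo.c',
                  include_directories: 'include',
                  dependencies: [zlib_dep])
# expose a dependency object for projects that consume this one
foo_dep = declare_dependency(link_with: foo_lib,
                             include_directories: 'include')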

Contributing projects that do not build with Meson

The basic outline is the same as above, except that first you need to rewrite the project's build system in Meson. I'm not going to lie: sometimes this is a fair bit of work. Sometimes it is not the most pleasant thing in the world. No pain, no gain and all that.

Given that there are thousands upon thousands of projects available, which ones should be converted first? The most obvious answer is the ones that you personally need, because then you are doing work that directly benefits yourself. After that it gets trickier. One good approach is to look at established cross platform open source projects. Many of them have a directory where they store copies of their third party dependencies; any libraries found there are most likely needed by other programs as well.

Monday, October 26, 2020

The Meson Manual: Good News, Bad News and Good News

Starting with the good news: the Meson Manual has been updated to a third edition. In addition to the usual set of typo fixes, there is an entirely new chapter on converting projects from an existing build system to Meson. Not only are there tips and tricks for each part of the conversion, there is even guidance on how to get it done on projects that are too big to be converted in one go.

Unfortunately there is also bad news, which boils down to this graph.

The graph shows monthly sales (in euros) since the beginning of this year. As you can tell, it follows a typical exponential decay: a peak at the beginning followed by a steady decline. Sales for this month are expected to be noticeably smaller than last month's. Keeping online sales for a product like this going requires both time and money (to pay for the various services needed), and the numbers show that it is not financially worthwhile.

Thus I must inform everyone that the manual will be discontinued. You can still buy it and the download links will stay valid until the end of this year. Thank you to everyone who bought the book and especially to those who sent me feedback and typo fixes.

To end on a positive note, there has been a final price reduction and the manual can now be bought for just €19.95.

FAQ: Will the manual be available under a free license after sales end?

No. At least not in the foreseeable future.

Tuesday, October 20, 2020

Cargo-style dependency management for C, C++ and other languages with Meson

My previous blog post about modern C++ got a surprising amount of feedback. Some people even reimplemented the program in other languages, including one in Go, two different ones in Rust and even this slightly brain-bending C++ reimplementation as a declarative-style pipeline. It also got talked about on Reddit and Hacker News. Two major comments kept popping up:

  1. There are potential bugs if the program is ever extended to support non-ASCII text
  2. It is hard to use external dependencies in C++ (and by extension C)
Let's solve both of these problems at the same time. There are many "lite" solutions for doing Unicode aware text processing, but we're going to use the 10 000 kilogram hammer: International Components for Unicode. The actual code changes are not that big, interested parties can go look up the details in this Github repo. The thing that is relevant for this blog post is the build and dependency management. The Meson build definition is quite short:

project('cppunicode', 'cpp',
        default_options: ['cpp_std=c++17',
                          'default_library=static'])
             
icu_dep = dependency('icu-i18n')
thread_dep = dependency('threads')
executable('cppunicode', 'cppunicode.cpp',
           dependencies: [icu_dep, thread_dep])

The threads dependency is for the multithreaded parts (see the end of this post). I developed this on Linux and used the convenient system-provided ICU. Windows and macOS do not provide system libs, so we need to build ICU from scratch on those platforms. This is achieved by running the following command in your project's source root:

$ meson wrap install icu
Installed icu branch 67.1 revision 1

This contacts Meson's WrapDB server and downloads build definition files for ICU. That is all you need to do. The build files do not need any changes. When you start building the project, Meson will automatically download and build the dependency. Here is a screenshot of the download step:

Here it is building in Visual Studio:

And here is the final built result running on macOS:

One notable downside of this approach is that WrapDB does not have all that many packages yet. However I have been told that given the next Meson release (in a few weeks) and some upstream patches, it is possible to build the entire GTK widget toolkit as a subproject, even on Windows. 

If anyone wants to contribute to the project, contributions are most welcome. You can, for example, convert existing projects and submit them to WrapDB, or become a reviewer. The Meson web site has the relevant documentation.

Appendix: making it parallel

Several people pointed out that while the original program worked fine, it only uses one thread; this may be a bottleneck, and "in C++ it is hard to execute work in parallel". This is again one of those things that has gotten a lot better in the last few years. The "correct" solution would be to use the parallel version of transform_reduce. Unfortunately most standard library implementations of the parallel algorithms are still a work in progress, so we can't use them in multiplatform code. We can, however, roll our own fairly easily, without needing to create or lock a single mutex by hand. The code has the actual details, but the (slightly edited) gist of it is this:

for(const auto &e:
    std::filesystem::recursive_directory_iterator(".")) {
    // Launch one asynchronous task per file, keeping at most
    // num_threads tasks in flight at any one time.
    futures.emplace_back(std::async(std::launch::async,
                                    count_word_files,
                                    e));
    if(futures.size() > num_threads) {
        pop_future(word_counts, futures);
    }
}
// Drain the remaining tasks.
while(!futures.empty()) {
    pop_future(word_counts, futures);
}

Here the count_word_files function calculates the number of words in a single file and the pop_future function merges individual results into the final result. By using a share-nothing architecture, pure functions and value types, all business logic can be written as if it were single threaded, and the details of thread and mutex management can be left to library code. Haskell fans would be proud (or possibly horrified, not really sure).
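For illustration, a pop_future along these lines would do the job. This is a minimal sketch; the container and count types are my assumptions, the real definitions are in the repo:

#include <cstddef>
#include <deque>
#include <future>
#include <string>
#include <unordered_map>

using WordCounts = std::unordered_map<std::string, std::size_t>;

// Wait for the oldest task to finish and merge its per-file
// word counts into the running total.
void pop_future(WordCounts &word_counts,
                std::deque<std::future<WordCounts>> &futures) {
    WordCounts partial = futures.front().get();
    futures.pop_front();
    for(const auto &[word, count]: partial)
        word_counts[word] += count;
}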