Friday, November 27, 2020

How Apple might completely take over end users' computers

Many people are concerned about Apple's ongoing attempts to take more and more control of end user machines from their users. Some go so far as to say that Apple won't be happy until they have absolute and total control over all programs running on end user devices, presumably so that they can enforce their 30% tax on every piece of software. Whether this is true or not we don't really know.

What we can do instead is a thought experiment. If that was their end goal, how would they achieve it? What steps would they take to obtain this absolute control? Let's speculate.

Web apps

A common plan against tightening app store requirements is to provide a web app instead. You can do a lot of cool things with WebAssembly, and its state is continuously improving. Thus it must be blocked. This is trivial: require that web browsers may only run WASM programs that are notarized by Apple. This is an easy change to sell; all it needs is a single tear-jerking please-think-of-the-children presentation about the horrible dangers of online predators, Bitcoin miners and the like. Notarization adds security, and who wouldn't want more of that?

There is still the problem that you can run an alternative browser like Chrome or Firefox. This can be solved simply by adding a requirement that third party browsers can only get notarized if they block all non-notarized web apps. On iOS this is of course already handled by mandating that all browsers must use the system's browser engine. At some point this might get brought over to macOS as well. For security.

Javascript

Blocking WASM still leaves the problem of Javascript. There is a lot of it, and even Apple cannot completely block non-notarized JS from running. Here you have to play the long game. An important step is, surprisingly, to drive the adoption of WebAssembly. There are many ways of doing this; the simplest is to stop adding any new JS functionality and APIs that could instead be done in WebAssembly. This forces app developers to either drop Apple support or switch to WASM. This transition can be boosted by stopping all development and maintenance on the existing JS engine and letting it bitrot. Old web pages will get worse and worse over time and, since Apple won't fix their browser, site operators will be forced to upgrade to technologies like WASM that come with mandatory notarization. For security.

Scripting languages

Scripting languages such as Perl and Python can be used to run arbitrary programs so they must go. First they are removed from the core install so people have to download and install them separately. That is only an inconvenience, though. To achieve total control notarization requirements must again be updated. Any program that loads "external code" must add a check that the code it is running is notarized by Apple. At first you will of course allow locally written script files to be run, as long as you first hunt through system security settings to add run permissions to the script file. This must be done with physical human interaction like a mouse or touchpad. It must not be automatable to prevent exploits. The obtained permissions are of course revoked every time the file is edited. For security.

Compilers

There is still a major hole in this scheme: native compilers. It might be tedious, but it is possible to compile even something as big as Firefox from scratch and run the result. Therefore this must be blocked, and notarization is again the key. This can be done by requiring all binaries, even self-built ones, to be notarized. This is again easy to sell, because it blocks a certain class of malware injection attacks. Following iOS's lead you have to get a developer certificate from Apple to sign your own code to run on your own machine.

Once the basic scheme is in place you have to tighten security and block signing by any compiler except the Apple-provided system one. This has to be done for security, because existing third party compilers may have bugs and features that could be used to circumvent notarization requirements somehow. Only Apple can get this right as the system provider. There must be one, and only one, way of going from source code to executable binaries and that path must be fully controlled by Apple. For security.

Total separation of development and use

Even with all this you can still compile and run your own code, meaning people will find ways of getting around these requirements and doing what they want to do rather than what Apple permits them to do. This means that even tighter reins are required. The logical end result is to split the macOS platform into two separate entities. The first one is the "end user" system that can only run Apple-notarized apps and nothing else. The second is the "dev platform" that runs only XCode, Safari (in some sort of a restricted mode) and one other program that has to have been fully compiled on the current physical machine. Remote compilation servers are forbidden as they are a security risk. This is roughly how iOS development and things like game console dev kits already work. The precedent is there, waiting for the final logical step to be taken.

This has the side effect that every developer who wants to support Apple platforms now has to buy two different Apple laptops, one for development and one for testing. But let us be absolutely clear about one thing. This is not done to increase sales and thus profits. No. Not under any circumstances! It is for a higher purpose: for security.

Saturday, November 21, 2020

Adding (very) preliminary support for C++ modules in Meson

One of the most common questions people ask about Meson is why it does not yet have support for building C++ modules. Up until now the answer has been simple: no compiler really supports them yet. However Visual Studio has added sufficient functionality in its latest 2019 developer preview that an implementation in Meson has become feasible. The actual code can be found in this merge request for those brave enough to try it out.

The basic problem with C++ modules is the same as with Fortran modules: you can no longer build source files in an arbitrary order. Instead you have to scan the contents of files, see what modules each source file generates and consumes and orchestrate the build so that all source files that produce modules are built before any source files that consume them. This requires dynamic dependency generation that has been added to Ninja only fairly recently.
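To make the ordering problem concrete, here is a minimal (and entirely invented) pair of source files. No matter in which order they are listed in the build definition, the interface file has to be compiled first because the other file imports it. The ixx extension matches what the Visual Studio toolchain expects, as discussed in the notes below.

// mathmod.ixx: produces the module "mathmod", so it must be built first
export module mathmod;

export int square(int x) {
    return x * x;
}

// main.cpp: consumes "mathmod", so it can only be built after mathmod.ixx
import mathmod;

int main() {
    return square(7);
}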

The original idea was that compiler toolchain vendors would provide scanner binaries, because reliably parsing C++ code with anything short of a real compiler frontend is next to impossible due to C preprocessor macro shenanigans. It turns out that a "toolchain provided" dependency scanner cannot obtain all the necessary data reliably, because doing so requires higher level knowledge about the project setup. This can only be done reliably by the build system. An alternative would be to pass all this information to the compiler/scanner via compiler flags, but that turns out to be a terrible interface to define and keep stable over time. It also has the downside that you'd need to spawn a separate process for every file, which is fairly slow on Windows. Meson's approach is to write a custom dependency scanner. Yes, it is based on regexes, so it is not 100% reliable, but on the other hand you only need to spawn one process per build target (exe, shared lib, static lib) as opposed to one per source file.

Still, the end result does work for simple projects. It does not handle things like module partitions, but those can be added later. Even this simple project and test have brought up several notes and questions:

  • Where should the generated module files be put? In the target private dir? In a global dir? If the latter, what happens if two unrelated parts in the code base specify the same module?
  • Microsoft has not documented the module compiler flags and cl /? does not even list them. Because of this all module files get dumped to the build directory root.
  • Only ixx files are supported. VS does not enforce file name extensions, but I would really want to enforce a single extension for module files. We can't change legacy code and force everyone to use a single extension for C++ sources, but we totally should do that for new ones. Having to support many file name extensions for the same thing is madness.
Sadly I don't have any numbers on how much modules improve compilation speed. Feel free to try it out yourself, though. Bug reports and especially fixes are welcome.

Monday, November 16, 2020

The Nine Phases of an Open Source Project Maintainer

There is more to running an open source project than writing code. In fact most of the work has to do with something else. This places additional requirements on project maintainers that are often not talked about. In this post we'll briefly go over nine distinct phases, each with a different hat one might have to wear. These can be split into two stages based on the lifetime and popularity of the project.

Stage One: The Project Is Mostly for Yourself

Almost all projects start with either just one person or a small team of just a few people. At the start doing things is easy. Breaking changes can be made on a whim. Programming languages and frameworks can be changed. It is even possible to pivot to something completely different without a care in the world. As there are only a few stakeholders, and they typically have similar ideologies, it is easy to reach consensus. It is even possible to ignore consensus altogether and "just do it".

Phase One: The Inventor

Everything starts with an idea: how something could be done differently or in a more efficient way. This is the part that tends to get fetishised by journalists and even some inventors themselves. The typical narrative explains how a single genius managed to create a revolutionary new thing all on their own in a basement somewhere. The reality is not quite as glamorous, as almost all ideas are ones that many, many other people have already come up with. Some people go as far as to say that ideas are worthless, only execution matters. This is a bit extreme but nevertheless coming up with ideas is an important skill.

Phase Two: The MVP Implementer

Once an idea is chosen, some sort of a prototype needs to be written. This is the most fun part of coding. There are vast green fields where you can do whatever, design the architecture as you want and get to solve interesting problems that form the core of the eventual product. This phase is the main reason why people become programmers. Getting to create something completely new is a joyful experience. Still, not everything is wine and roses, as it is important to focus enough to get the first version finished rather than going off on all sorts of tangents and eventually losing interest.

Phase Three: The Ditch Digger

Once the first version exists and is found usable, the next step is to make it production ready. This is where the nature of project work takes a very sharp turn. Whereas the previous stage could be described as fun, this phase is tedious. It consists of making the end product reliable and smooth in the face of real world input and usage. This typically exposes bugs and design flaws in the original implementation that need to be fixed or rewritten. It is easy to get discouraged in this phase because the outcome of days of coding might be "the same as before, but also handles this one rare corner case".

The work profile is similar to digging a ditch with a shovel. It's dirty, heavy and taxing work and there are not that many rewards to be had. After all, a half-dug ditch is about as useless as a completely undug ditch. It's only when you reach the end and water starts flowing that you get any benefits. The difference between physical ditches and software is that there is no reliable way of estimating how much more you still have to dig. This is a very risky phase of any project as it carries the potential for burnout.

Phase Four: The Documentation Writer

Every new project needs documentation, but some projects need it more than others. Programmers are typically not very eager to write documentation or to keep it up to date. Telling users to "read the source" to find out how to do things is not enough, because people don't want to have to learn about implementation details of your project, they just want to use it. Sometimes it is possible to get other people to write documentation, but usually that only happens after the project has "made it big".

One way of looking at documentation is that it is a competitive advantage. If there are multiple competing projects for the same thing and one of them has better documentation, it has a higher chance of winning (all other things being equal). Writing end user documentation requires a completely different approach and skill set than writing code. This is especially true for things like tutorials as opposed to reference documentation.

Phase Five: The Marketer

Build a better mousetrap and the world will ignore you, tell you that their mouse trap situation is perfectly fine thankyouverymuch and why don't you get a real job rather than wasting your time on this whateveritis, as it will never work. If you want to make them change their mind, you need marketing. Lots of it.

There are many different ways of making your project more known: writing blog posts, presenting at conferences, general online advocacy and so on. This requires, again, a new set of skills, such as talking to a large group of people in public. This is especially true for programmers who are mostly introverted, but sadly the meek don't inherit the earth. It tends to go to those who can make the most noise about their solution.

Stage Two: The Project Is Mostly for Other People

As the project becomes bigger and more used, eventually another tipping point is reached. Here the program is no longer catering to the needs of the original creator but to the community at large. The rate of change reduces dramatically. Breaking changes can no longer be made at a quick pace or possibly at all. It is also possible that the landscape has changed and the project is now being used in a different way or for different ends than was originally planned. All of this means that the project runner needs to spend more and more time solving issues that do not directly benefit themselves. This may cause friction if, for example, the project leader works for a company that has other priorities and does not want the person to spend time on things that don't benefit the company directly.

Phase Six: The Recruiter

A project that does not keep refreshing and growing its developer base is a dead one. Typically a project needs to have a sizable chunk of users before other people start contributing to it in a major way. Sometimes people become involved voluntarily, but it's even better if you can somehow actively encourage them to contribute. That is only part of the story, though, since they need to be trained and taught the processes and so on. Paradoxically getting new contributors slows down development at first, but eventually makes things faster as the workload can be split among multiple people.

Phase Seven: The Culture Cultivator

Every project has its own set of unspoken guidelines. These get established quite early on and include things like requiring tests for every new feature, or not using coding patterns X, Y or Z but using H, J and K instead, and so on. People are generally quite good at detecting these and doing the same thing as everyone else. As the pool of contributors grows, this becomes less and less common and contributions tend to become more lax. This is not due to malice, but simply because people are not aware of the requirements.

It is very easy to slip on these requirements little by little. It is the job of the project leader to make sure this does not happen. This requires both leading by example and pointing out these issues in code review and other discussions.

Phase Eight: The Overseer

This phase begins when the project maintainer realizes that they are no longer the person who knows most about the code base. Other people have done most of the coding work for so long that they are the actual experts on it. This causes yet another change in the type of work one needs to do. Up until now the work has been about solving problems and making decisions on things you are intimately familiar with. As an overseer you need to make decisions on things you don't really know about. Earlier decisions were based on code and implementation details, but now decisions are based mostly on what other people say in their merge requests and design discussions.

This is something nobody really prepares you for. Making big decisions based on imperfect information can be really difficult for someone who has gotten used to going through every detail. Once a project gets over a certain size this is just not possible as the human brain is incapable of holding that many details in active memory at the same time. Even if it could, having a single person review everything would be a huge bottleneck. It is (more than) a full time job, and getting someone to pay for a full time maintainer review job is very rare.

Finally, even if this were possible, reviewing is a very tiring job that very few people can keep doing as their only task for very long. Eventually the mind will start screaming for something else, if only for a while. And even if someone could do that, contributors would eventually get very annoyed at being micromanaged to death and just leave.

Phase Nine: The Emeritus

All good things eventually come to an end and so will open source project maintainership. Eventually the project will either become irrelevant or the torch will be passed to someone else. This is, in a way, the greatest thing a project maintainer could hope for: being able to create a new entity that will keep on being used even after you have stopped working on it.

Open source maintainership is a relatively young field, and most projects at the end of their life cycle either become unmaintained zombies or get replaced by a new project written from scratch, so we don't have that much experience of what emerituses do. Based on other fields, this may range from "nothing" to giving conference talks and advising current maintainers on thorny issues.

Saturday, November 7, 2020

Proposal for target-private directories for compilers

One of the greatest strengths of the classical C compiler model is that all compile jobs are fully isolated. This means that they can be run perfectly in parallel. This same feature is also one of its greatest weaknesses. There are no ways for individual compile jobs to communicate with each other even if they wanted to. This could be useful for things like caching. As an example a compiler might transparently create "precompiled headers" of sorts during one compilation and use them in other compilations if needed. This might also be useful for languages that require scanning steps before building such as Fortran and C++ using modules.

This is not idle speculation. Clang's ThinLTO does use caching to speed up incremental builds. Since there is no existing standard for this, they did the usual thing and created a new compiler flag for specifying the location of the cache directory. Or, to be precise, they created four of them:

  • gold (as of LLVM 4.0): -Wl,-plugin-opt,cache-dir=/path/to/cache
  • ld64 (support in clang 3.9 and Xcode 8): -Wl,-cache_path_lto,/path/to/cache
  • ELF lld (as of LLVM 5.0): -Wl,--thinlto-cache-dir=/path/to/cache
  • COFF lld-link (as of LLVM 6.0): /lldltocache:/path/to/cache

For one option this is tedious but for many it becomes completely unworkable. Clearly something better is needed.

The basic idea: each build target gets its own private directory

Basically what one should be able to do is this:

gcc -c -o foo/bar.o bar.c -fprivate-dir=some_path

The private directory would have the following properties:

  • The build system guarantees that it is set to the same value for all compilations of a single target (executable, shared library, static library, etc)
  • Every build target gets its own unique private directory
  • The contents of the directory may persist over successive invocations (i.e. its contents may be deleted at any time, but most of the time won't be)
  • The compiler can create anything it wants in the private dir but should also tolerate other usages (typically you'd also want to put the target's object files in this dir)
  • The contents of the private dir are transitory, they have no backwards or forwards compatibility guarantees. Any compiler update would invalidate all files.
If, for example, compilers wanted to create pipes or Unix domain sockets in the private dir for communicating between compiler instances, they could do that behind the scenes.
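To make the first two properties concrete, a build system would invoke the compiler roughly like this (the -fprivate-dir flag is the hypothetical one proposed above, and the directory layout is purely illustrative):

# both source files belong to target "foo", so they share one private dir
gcc -c -o build/foo.p/bar.o bar.c -fprivate-dir=build/foo.p
gcc -c -o build/foo.p/baz.o baz.c -fprivate-dir=build/foo.p
# a different target gets its own private dir
gcc -c -o build/util.p/main.o main.c -fprivate-dir=build/util.p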

Work needed for tooling

Meson and CMake already do pretty much exactly this, as they store object files in special directories. I don't know enough about Autotools to know how much work it would be, though it does have the concept of higher level build targets. Handwritten Makefiles would need to be tweaked by hand, as with every other change. Visual Studio solutions are already split up into per-target project files, so adding new flags there should be fairly simple.

The best part is that this change would be fully backwards compatible. If the private dir argument is not used, the compilers would behave in exactly the same way they do now.

Monday, November 2, 2020

You wanted Boost via Meson subprojects? You got it! (sorta)

In the previous blog post we saw a way to build SDL transparently as a Meson subproject. In the discussion that followed I got a question on whether you could consume Boost in the same way. This is an interesting question, because Boost is a, let's say, challenging dependency. It is very big and set up in an unusual way. As an example I would estimate that the single fact that they don't ship Pkg-Config files has cost Meson developers tens of hours of unnecessary troubleshooting. Having something simpler and more reliable would be welcome.

To test this out I created an MVP program that uses Boost's flat map from the container library and then added dependencies until it worked. The actual code can be downloaded here (tested on Linux, VS and Mac). The main program's basic build definition is as simple as the SDL program's was:

boost_container_dep = dependency('boost-container')
executable('boosttest', 'boosttest.cpp',
           dependencies: [boost_container_dep])
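For reference, the test program itself is only a handful of lines of flat map usage. The real boosttest.cpp is in the linked repository; a rough sketch of what such a program could look like:

#include <boost/container/flat_map.hpp>
#include <cstdio>

int main() {
    // The flat map comes straight from Boost's container library.
    boost::container::flat_map<int, const char*> m;
    m[1] = "one";
    m[2] = "two";
    for (const auto& entry : m) {
        std::printf("%d: %s\n", entry.first, entry.second);
    }
    return 0;
}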

The Boost container dep is specified in the container library's build file:

boost_container_dep = declare_dependency(
  include_directories: 'include',
  dependencies: [...])

As this is a header-only library, the only thing it needs to do is to expose the header dir. The dependencies keyword argument lists all the other dependencies that are needed to build code that uses the container library. These are move, assert, static_assert, intrusive, core and config. Their build files are almost identical to this one. No code changes were needed. The total LoC of meson.build files for this entire setup is 42. Which is apt.

Making it better

The main reason for this low line count is that the Meson build definitions do a lot less than the original ones. The originals do a lot and are highly configurable, which might also explain why Boost's conversion to CMake has taken several years and is still ongoing. A lot of that effort goes into things like documentation, but the actual build is also more complicated as it provides for more stuff. Here are two examples and an outline of how they would be implemented in Meson.

Built libraries

Some Boost libraries are header-only, others require some code to be built and linked against. Suppose we have a header-only dependency. Its dependency object would be defined like this:

foo_dep = declare_dependency(include_directories: 'include')

Converting that to contain a library component would be done like this:

foo_lib = library(...)
foo_dep = declare_dependency(include_directories: 'include',
                             link_with: foo_lib)

Basically you build the library and then tell users to link to that. This is pretty much what the SDL build definitions did. The library can be shared or static, they both work the same way.

Compiler flags needed for using the dependency

Some of the libraries seem to require that users of the library specify some compiler command line arguments. These might or might not be the same ones that are used to build the library components themselves. This is natively supported.

foo_dep = declare_dependency(include_directories: 'include',
                             compile_args: ['-DUSING_FOO'])

How much work would it be to convert all of Boost?

It depends, but quite a lot in any case. Boost is very big. If one attempts to reach feature parity with the current build system it would be a very, very big effort. I'm told that there are parts of Boost that have circular dependencies between projects, and Meson does not support those (as in: they are inexpressible). Meson's HP-UX support is also still a work in progress (or, to be more exact, a work not in progress, at least as far as I'm aware).

Doing a simple conversion that only needs to deal with the code on common platforms, on the other hand, would be doable. It would require a small team of dedicated people, because trying to do it alone would just lead to a massive burnout, but it could be done.

Friday, October 30, 2020

How to build dependencies as Meson subprojects using SDL as an example

Today we released version 0.56.0 of the Meson build system. This is an especially important release as it marks the 10 000th commit since the start of the project. A huge thank you to everyone who has contributed their time and effort; this project would not exist without all of you. However, in this post we are not going to talk about that; those interested can find further details in the release notes. Instead we are going to talk about how to build your dependencies from source on every platform without needing anything other than Meson.

Last month I had a lightning talk at CppCon about this way of managing dependencies:

Since then there have been many improvements to the workflow for a smoother experience. To demonstrate this I upgraded the sample program to use SDL Mixer and SDL Image instead of relying on plain SDL. The code is available in this Github repo (only tested on Windows because I ran out of time to do proper multiplatform testing). The core of the build definition is this:

sdl2_dep = dependency('sdl2')
sdl2_image_dep = dependency('sdl2_image')
sdl2_mixer_dep = dependency('sdl2_mixer')
executable('sdltestapp', 'main.cpp',
  dependencies : [sdl2_image_dep, sdl2_mixer_dep, sdl2_dep],
  win_subsystem: 'windows')

This has always worked on Linux and other platforms that provide system dependencies via Pkg-Config. As of the latest release of Meson and newest code from WrapDB, this works transparently on all platforms. Basically for every dependency you want to build yourself, you need a wrap file, like this:
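Purely as an illustration (the actual wrap files are in the repository linked above, and the version numbers, URLs and checksums below are placeholders rather than real values), a subprojects/sdl2.wrap looks roughly like this:

[wrap-file]
directory = SDL2-2.0.12
source_url = https://www.libsdl.org/release/SDL2-2.0.12.tar.gz
source_filename = SDL2-2.0.12.tar.gz
source_hash = <sha256 checksum of the source tarball>
patch_url = <WrapDB download URL for the Meson build files>
patch_filename = sdl2-2.0.12-wrap.zip
patch_hash = <sha256 checksum of the patch archive>

[provide]
sdl2 = sdl2_dep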

The contents consist mostly of download links, hashes and build meta info. Upstream sources are not checked into your repo (unless you want them to be), so it stays small. The actual files are in the repository linked above. When you start building and the project needs some dependency, Meson will use the info in the wrap files to download, patch and build the dependencies as subprojects as needed. This is what it looks like when configuring on Windows using MSVC without any external dependency providers.

Then you can build it. Here I'm using Visual Studio Code because why not.

The end result runs directly as a native Windows application (text annotations not part of the original program) using SDL's Direct3D accelerated rendering.

There are three different image files loaded using SDL Image: a png file, a jpg file and an LZW-compressed tif file. The app is also playing sound effects in Ogg Vorbis and wav formats using SDL Mixer. Dependencies of dependencies work automatically as one would expect. All dependency libraries are statically linked, as it is the easiest way to get things working under Windows (at least until your project gets too big).
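Static linking is not special to this setup; Meson's built-in default_library option controls it, so one way to request the same behaviour explicitly would be to configure with that option set to static (the sample repo may instead set this via default_options in its project() call):

meson build -Ddefault_library=static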

If you want to try it yourself, here's what you need to do:

  1. Install Visual Studio, Meson and Ninja (not strictly required, you can use the VS backend instead if you wish)
  2. Open the VS x64 dev tools shell.
  3. Clone the repo and cd inside it.
  4. Run meson build
  5. Run ninja -C build
The executable should now be at the root of the build dir ready to run.

Contributing to the WrapDB

The biggest problem WrapDB has at the moment is that its selection of available libraries is fairly small. Fortunately it is easy to contribute new dependencies. There are two main cases.

Submitting projects that already build with Meson

  1. Make sure your project gets all its dependencies via the dependency function rather than invoking subproject directly and provide a suitable dependency object via declare_dependency.
  2. Create an upstream.wrap file with the necessary info. See the documentation or, if you prefer examples, how it is done in libtiff.
  3. Request that your project be added to WrapDB as described in the documentation linked above.
That's pretty much it. Once your project is in the WrapDB it can be immediately used by any other Meson project.

Contributing projects that do not build with Meson

The basic outline is the same as above, except that first you need to rewrite the project's build system in Meson. I'm not going to lie: sometimes this is a fair bit of work. Sometimes it is not the most pleasant thing in the world. No pain, no gain and all that.

Given that there are thousands upon thousands of projects available, which ones should be converted first? The most obvious answer is the ones that you personally need, because then you are doing work that directly benefits yourself. After that it gets trickier. One good approach is to look at established cross platform open source projects. Many of them have a directory where they store copies of their third party dependencies. Any libraries there are most likely needed by other programs as well. Here are links to some of them:

Monday, October 26, 2020

The Meson Manual: Good News, Bad News and Good News

Starting with good news, the Meson Manual has been updated to a third edition. In addition to the usual set of typo fixes, there is an entirely new chapter on converting projects from an existing build system to Meson. Not only are there tips and tricks on each part of the conversion, there is even guidance on how to get it done on projects that are too big to be converted in one go.

Unfortunately there is also bad news, which boils down to this graph.

These are the monthly sales amounts (in euros) since the beginning of this year. As you can tell, they follow a typical exponential decay: a peak at the beginning and then a steady decline. Sales for this month are expected to be noticeably smaller than last month's. Keeping online sales for a product like this going requires both time and money (to pay for the various services needed), and unfortunately the numbers show that it is not financially worthwhile.

Thus I must inform everyone that the manual will be discontinued. You can still buy it and the download links will stay valid until the end of this year. Thank you to everyone who bought the book and especially to those who sent me feedback and typo fixes.

To end on a positive note, there has been a final price reduction and the manual can now be bought for just €19.95.

FAQ: Will the manual be available under a free license after sales end?

No. At least not in the foreseeable future.