Sunday, September 28, 2025

In C++ modules globally unique module names seem to be unavoidable, so let's use that fact for good instead of complexshittification

Writing out C++ module files and importing them is awfully complicated. The main cause for this complexity is that the C++ standard can not give requirements like "do not engage in Vogon-level stupidity, as that is not supported". As a result implementations have to support anything and everything under the sun. For module integration there are multiple different approaches ranging from custom on-the-fly generated JSON files (which neither Ninja nor Make can read so you need to spawn an extra process per file just to do the data conversion, but I digress) to custom on-the-fly spawned socket server daemons that do something. It's not really clear to me what.

Instead of diving into that hole, let's approach the problem from first principles, from the opposite side.

The common setup

A single project consists of a single source tree, which contains an executable E and a bunch of libraries, say L1 to L99. Some of those are internal to the project and some are external dependencies. For simplicity we assume that they are embedded as source within the parent project. All libraries are static and all of them are linked into the executable E.

With a non-module setup each library can have its own header/source pair with file names like utils.hpp and utils.cpp. All of those can be built and linked into the same executable and, assuming their symbol names don't clash, work just fine. This is not only supported, but in fact quite common.
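As a sketch of that setup (all file and symbol names here are illustrative, not from any real project), two dependencies might both ship a header called utils.hpp:

```cpp
// liba/utils.hpp -- one dependency's utility header
namespace liba {
    int clamp(int v, int lo, int hi);
}

// libb/utils.hpp -- a different dependency, same file name, different symbols
namespace libb {
    void log(const char* msg);
}
```

The file name utils.hpp carries no meaning for the linker; as long as the symbols differ, both libraries link into E without conflict.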

What people actually want going forward

The dream, then, is to convert everything to modules and have things work just as they used to.

If all libraries were internal, it could be possible to enforce that the different util libraries get different module names. If they are external, you clearly can't. The name is whatever upstream chooses it to be. There are now two modules called utils in the build and it is the responsibility of someone (typically the build system, because no-one else seems to want to touch this) to ensure that the two module files are exposed to the correct compilation commands in the correct order.
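Concretely, the clash looks like this (file layout and names are illustrative): each vendored library declares its module interface with whatever name its upstream chose, and nothing stops two upstreams from choosing the same one.

```cpp
// subprojects/liba/utils.cpp -- module interface of library A
export module utils;              // upstream A picked the name "utils"
export int clamp(int v, int lo, int hi);

// subprojects/libb/utils.cpp -- module interface of library B
export module utils;              // upstream B independently picked "utils" too
export void log(const char* msg);
```

Unlike a header file name, the module name is a program-wide identifier, so these two interfaces cannot coexist cleanly in one build.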

This is complex and difficult, but once you get it done, things should just work again. Right?

That is what I thought too, but it is actually not the case. This very common setup does not work, and can not be made to work. You don't have to take my word for it; here is a quote from the GCC bug tracker:

This is already IFNDR, and can cause standard ODR-like issues as the name of the module is used as the discriminator for module-linkage entities and the module initialiser function.  Of course that only applies if both these modules get linked into the same executable;

IFNDR (ill-formed, no diagnostic required) is a technical term for "if this happens to you, sucks to be you". The code is broken and the compiler is allowed to do whatever it wants with it.

What does it mean in practice?

According to my interpretation of this comment (which, granted, might be incorrect, as I am not a compiler implementer), if you have an executable and you link into it any code that has multiple modules with the same name, the end result is broken. It does not matter how the duplicate module names get in, the end result is broken. No matter how much you personally dislike this and think that it should not happen, it will happen and the end result is broken.

At a higher level this means that module names form a namespace. Not a C++ namespace, but a sort of virtual name space that contains all "generally available" code, which in practice means all open source library code. As that public code can be combined in arbitrary ways, module names must be globally unique within that set of code (and also within every final executable) if you want things to work. Any duplicates will break things in ways that can only be fixed by renaming all but one of the clashing modules.

Globally unique module names are thus not a "recommendation", a "nice to have" or a "best practice". They are a technical requirement that comes directly from the compiler and the definition of the standard.

The silver lining

If we accept this requirement and build things on top of it, things suddenly get a lot simpler. The build setup for modules reduces to the following for projects that build all of their own modules:

  • At the top of the build dir is a single directory for modules (GCC already does this, its directory is called gcm.cache)
  • All generated module files are written into that directory; as they all have unique names, they can not clash
  • All module imports are done from that directory
  • Module mappers and all related complexity can be dropped to the floor and ignored
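Under that scheme the whole build reduces to plain compile steps. A rough sketch with GCC-style flags (paths and file names are made up for illustration):

```shell
# Each module interface is compiled exactly once; GCC drops the
# result into the shared gcm.cache directory at the build root.
g++-15 -std=c++26 -fmodules -c subprojects/liba/utils_iface.cpp

# Any consumer compiled from the same directory resolves
# "import utils;" against that one shared cache.
g++-15 -std=c++26 -fmodules -c src/main.cpp
```

No mapper, no daemon, no per-file metadata: unique names make the cache directory itself the lookup table.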

Importing modules from the system might take some more work (maybe copy Fortran and add a -J flag for module search paths). However, at the time of writing, GCC and Clang module files are not stable and do not work between different compiler versions, or even when compiler flags differ between export and import. Thus prebuilt libraries can not be imported as modules from the system until that is fixed. AFAIK there is no timeline for when that will be implemented.

So now you have two choices:

  1. Accept reality and implement a system that is simple, reliable and working.
  2. Reject reality and implement a system that is complicated, unreliable and broken.

[Edit: fixed quote misattribution.]

Saturday, September 6, 2025

Trying out import std

Since C++ compilers are starting to support import std, I ran a few experiments to see what the status of that is. GCC 15 on latest Ubuntu was used for all of the following.

The goal

One of the main goals of a working module implementation is to be able to support the following workflow:

  • Suppose we have an executable E
  • It uses a library L
  • L and E are made by different people and have different Git repositories and all that
  • We want to take the unaltered source code of L, put it inside E and build the whole thing (in Meson parlance this is known as a subproject)
  • Build files do not need to be edited, i.e. the source of L is immutable
  • Make the build as fast as reasonably possible

The simple start

We'll start with a helloworld example.
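The source of standalone.cpp is not shown here, but a minimal import std hello world looks roughly like this (the printed text is illustrative):

```cpp
// standalone.cpp -- minimal "import std" program
import std;

int main() {
    std::println("Hello, modules!");
}
```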

This requires two compiler invocations.

g++-15 -std=c++26 -c -fmodules -fmodule-only -fsearch-include-path bits/std.cc

g++-15 -std=c++26 -fmodules standalone.cpp -o standalone

The first invocation compiles the std module and the second one uses it. There is already some wonkiness here. For example the documentation for -fmodule-only says that it only produces the module output, not an object file. However it also tries to link the result into an executable so you have to give it the -c argument to tell it to only create the object file, which the other flag then tells it not to create.

Building the std module takes 3 seconds and the program itself takes 0.65 seconds. Compiling without modules takes about a second, but only 0.2 seconds if you use iostream instead of println.

The module file itself goes to a directory called gcm.cache in the current working dir.

All in all this is fairly painless so far.

So ... ship it?

Not so fast. Let's see what happens if you build the module with a different language standard version than the consuming executable.
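A hypothetical reproduction, reusing the invocations from above but with mismatched -std values (the exact diagnostic varies by compiler version, so it is not shown):

```shell
# Build the std module under C++26...
g++-15 -std=c++26 -c -fmodules -fmodule-only -fsearch-include-path bits/std.cc

# ...then try to consume it under C++23. GCC rejects the
# incompatible module file instead of silently using it.
g++-15 -std=c++23 -fmodules standalone.cpp -o standalone
```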

It detects the mismatch and errors out. Which is good, but also raises questions. For example, what happens if you build the module without extra definitions but compile the consuming app with -DNDEBUG? In my testing it worked, but is that just a case of getting lucky with the UB slot machine? I don't know. What should happen? I don't know that either. Unfortunately there is an even bigger issue lurking about.
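The -DNDEBUG worry above is easy to see in miniature even without modules: assert() expands to nothing when NDEBUG is defined, so the same source produces different code under different flags, which is exactly the kind of divergence a module file built with one setting cannot account for. A small self-contained sketch (function names are made up):

```cpp
#include <cassert>
#include <cstdio>

// With -DNDEBUG this assert vanishes entirely, so the compiled
// function differs from a non-NDEBUG build of the same source.
int checked_div(int a, int b) {
    assert(b != 0);
    return a / b;
}

int main() {
    std::printf("%d\n", checked_div(10, 2)); // prints 5
}
```

If a module were compiled without NDEBUG but its importer with it (or vice versa), the two sides could disagree about what such a function does.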

Clash of File Name Clans

If you are compiling with Ninja (and you should) all compiler invocations are made from the same directory (the build tree root). GCC also does not seem to provide a compiler flag to change the location of the gcm.cache directory (or at least it is not in the docs). Thus if you have two targets that both use import std, their compiled modules get the same output file name. They would clobber each other, so Ninja will refuse to build them (Make probably ignores this, so the files end up clobbering each other and, if you are very lucky, that only causes a build error).

Assuming that you can detect this and deduplicate building the std module, the end result still has a major limitation. You can only ever have one standard library module across all of your build targets. Personally I would be all for forcing this over the entire build tree, but sadly it is a limitation that can't really be imposed on existing projects. Sadly I know this from experience. People are doing weird things out there and they want to keep on weirding on. Sometimes even for valid technical reasons.

Even if this issue were fixed, it would not really help. As you can probably tell, this clashing happens for all modules, not just std. So if your build ever has two modules called utils, no matter where they are or who wrote them, they will both try to write gcm.cache/utils.gcm and either fail to build, fail on import or invoke UB.

Having the build system work around this by changing the working directory to implicitly make the cache directory go elsewhere (and repoint all paths at the same time) is not an option. All process invocations must be doable from the top level directory. This is the hill I will die on if I must!

Instead what is needed is something like the target private directory proposal I made ages ago. With that you'd end up with command line arguments roughly like this:

g++-15 <other args> --target-private-dir=path/to/foo.priv --project-private-dir=toplevel.priv

The build system would guarantee that all compilations for a single target (library, exe, etc.) get the same target private directory and all compilations in the build tree get the same top level private directory. This allows the compiler to do some build optimizations behind the scenes. For example, if it needs to build a std module, it could copy it into the top level private directory, and other targets could then copy it from there instead of building it from scratch (assuming it is compatible and all that).