Writing out C++ module files and importing them is awfully complicated. The main cause of this complexity is that the C++ standard cannot impose requirements like "do not engage in Vogon-level stupidity, as that is not supported". As a result, implementations have to support anything and everything under the sun. For module integration there are several different approaches, ranging from custom on-the-fly generated JSON files (which neither Ninja nor Make can read, so you need to spawn an extra process per file just to do the data conversion, but I digress) to custom on-the-fly spawned socket server daemons that do something; it is not really clear to me what.
Instead of diving into that hole, let's approach the problem from the opposite side, starting from first principles.
The common setup
A single project consists of a single source tree containing one executable E and a bunch of libraries, say L1 to L99. Some of those are internal to the project and some are external dependencies. For simplicity we assume that the external ones are embedded as source within the parent project. All libraries are static and all of them are linked into the executable E.
With a non-module setup each library can have its own header/source pair with file names like utils.hpp and utils.cpp. All of those can be built and linked into the same executable and, assuming their symbol names don't clash, everything works just fine. This is not only supported, but in fact quite common.
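To make that concrete, here is a minimal sketch of the pre-modules situation; the file, namespace and function names are all invented for illustration:

```cpp
// liba/utils.hpp -- first library's header
#pragma once
namespace liba { int helper(); }

// liba/utils.cpp
#include "utils.hpp"
namespace liba { int helper() { return 1; } }

// libb/utils.hpp -- same file name in a different library
#pragma once
namespace libb { int helper(); }

// libb/utils.cpp
#include "utils.hpp"
namespace libb { int helper() { return 2; } }

// app/main.cpp -- both static libraries are linked into executable E
#include "liba/utils.hpp"
#include "libb/utils.hpp"
int main() { return liba::helper() + libb::helper(); }
```

The file names clash but the symbol names do not, so the compiler and linker are perfectly happy.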
What people actually want going forward
The dream, then, is to convert everything to modules and have things work just as they used to.
If all libraries were internal, it would be possible to enforce that the different util libraries get different module names. If they are external, you clearly can't: the name is whatever upstream chooses it to be. Now there are two modules called utils in the build, and it is the responsibility of someone (typically the build system, because no-one else seems to want to touch this) to ensure that the two module files are exposed to the correct compilation commands in the correct order.
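In module form the same two libraries might look roughly like this (a sketch only; the file names and exported functions are invented for illustration):

```cpp
// liba/utils.cppm -- interface unit of the first library
export module utils;
export int helper_a() { return 1; }

// libb/utils.cppm -- a different library, but the same module name
export module utils;
export int helper_b() { return 2; }

// app/main.cpp -- which compiled module file named "utils" should
// this import resolve to? The build system has to pick one per
// translation unit and keep the ordering straight.
import utils;
int main() { return helper_a(); }
```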
This is complex and difficult, but once you get it done, things should just work again. Right?
That is what I thought too, but that is actually not the case. This very common setup does not work, and cannot be made to work. You don't have to take my word for it; here is a quote from Jonathan Wakely:
This is already IFNDR, and can cause standard ODR-like issues as the name of the module is used as the discriminator for module-linkage entities and the module initialiser function. Of course that only applies if both these modules get linked into the same executable;
IFNDR (ill-formed, no diagnostic required) is a technical term for "if this happens to you, sucks to be you". The code is broken and the compiler is allowed to do whatever it wants with it.
What does it mean in practice?
According to my interpretation of Jonathan's comment (which, granted, might be incorrect as I am not a compiler implementer), if you have an executable and you link into it any code that has multiple modules with the same name, the end result is broken. It does not matter how the duplicate module names get in: the end result is broken. No matter how much you personally do not like this and think that it should not happen, it will happen, and the end result is broken.
At a higher level this means that module names form a namespace. Not a C++ namespace, but a sort of virtual namespace. It contains all "generally available" code, which in practice means all open source library code. As that public code can be combined in arbitrary ways, module names must be globally unique within that set of code (and within every final executable) if you want things to work. Any duplicates will break things in ways that can only be fixed by renaming all but one of the clashing modules.
Globally unique module names are thus not a "recommendation", a "nice to have" or a "best practice". They are a technical requirement that follows directly from the compilers and the standard's definition.
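In practice the way to satisfy this requirement is to claim a unique prefix, much as people already do with C++ namespaces; the dotted module names the grammar allows make this straightforward. The project name below is just an invented example:

```cpp
// utils.cppm -- interface unit, module name prefixed with the
// project's (globally unique) name instead of a bare "utils"
export module mylib.utils;

export namespace mylib::utils {
    int helper() { return 42; }
}

// consumer.cpp -- the prefixed name is what gets imported
import mylib.utils;
int answer() { return mylib::utils::helper(); }
```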
The silver lining
If we accept this requirement and build things on top of it, things suddenly get a lot simpler. The build setup for modules reduces to the following for projects that build all of their own modules:
- At the top of the build dir there is a single directory for module files (GCC already does this; its directory is called gcm.cache), as in the sketch after this list
- All generated module files are written into that directory; as they all have unique names, they cannot clash
- All module imports are done from that directory
- Module mappers and all related complexity can be dropped to the floor and ignored
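As a rough sketch of what this already looks like with GCC's -fmodules-ts (exact flags and default behaviour may differ between compiler versions):

```cpp
// utils.cc -- module interface unit, compiled first:
//   g++ -std=c++20 -fmodules-ts -c utils.cc
// GCC writes the compiled module to gcm.cache/utils.gcm under the
// current (build) directory, with no mapper configuration needed.
export module utils;
export int helper() { return 42; }

// main.cc -- compiled from the same build directory:
//   g++ -std=c++20 -fmodules-ts -c main.cc
//   g++ utils.o main.o -o app
// The import is resolved from that same gcm.cache directory, i.e. the
// single shared module directory at the top of the build tree.
import utils;
int main() { return helper(); }
```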
So now you have two choices:
- Accept reality and implement a system that is simple, reliable and working.
- Reject reality and implement a system that is complicated, unreliable and broken.
> module files are not stable and do not work between different compiler versions or even when compiler flags differ
> AFAIK there is no timeline for when that will be implemented.
This is not a goal. Module files are essentially a better PCH; they are not meant to be a stable artifact. Consumers compile the module files from the library's interface files as needed.
That depends on who you ask. With Visual Studio the module output files have a stable format. In any case many people think that module files will completely replace headers for prebuilt libs, which is why I wrote that piece to make it clear it will not be happening in the near future (if at all).