lauantai 8. helmikuuta 2020

Trying to build a slightly larger slice of Libreoffice with Meson

One of the few comments I got on my previous blog post was that building only the sal library is uninteresting because it is so small. So let's go deeper and build the base platform of LO, which is called Uno. Based on docs and slides from conference presentations, it is roughly the marked area in LO's full dependency graph.

The gloves come off

A large fraction of the code is generated from IDL files. So one might imagine that if one has a file like com/sun/foo/bar.idl then one would convert that into a header file com/sun/foo/bar.hpp using a program called idlc. One would be wrong. That is not what happens at all.

When trying to understand what humongous Make builds are doing, the best approach is not to even try. Do not look at the Makefiles, or the macro code they include. Instead run the build, capture all executed commands with Make's verbose mode and reverse engineer what the sysetm is doing based on them. Implementing the same functionality is a lot easier this way, because you know exactly what command lines you must end up with. In this particular case we find that the idl files are compiled to an intermediate binary blob using a program called unoidl-write.  After that the header files (yes, multiple per idl file) are created from this blob with a different program called cppumaker.

This has the unfortunate side effect that when you run the header generator you can not know in advance what files the program will generate and what sort of a directory hierarchy they are put in without a side channel. This makes it very difficult to integrate with build systems, because they need to know outputs exactly in order to make build steps reliable. Java has the exact same problem. Because of the way it handles inner classes, the output set of a Java compilation is unknowable beforehand. If you ever find yourself designing a code generator tool, please make sure you don't have this Sunism in your program, as it makes integration needlessly complicated and unreliable.

Even if you manage to recreate the command lines with a different system, things may still fail in interesting ways. In LO's case the internal SAL file system library has a policy that callers are not allowed to create temporary files in a directory with a relative pathname. This limitation is reported with the helpful error message of "could not open  for writing". (In case blog aggregators break the formatting, there are two spaces between the words "open" and "for".) It is also possible that relative paths appear to succeed but produce error messages such as "can not read file in legacy format" later in the process.

Other fun stuff

If one were to look at the command lines that eventually get invoked, they would look like this:

This may seem overwhelming, so let's add some markers.

An interesting question is how many processes does this spawn per source file? I don't know the correct answer, but a reasonable guess would be either 4 or 7 (5 or 8 if you count the outer shell).


The code can be obtained via Github. It's Linux only and some bits are fairly ugly. It does generate the UNO code and build all the libs, but unit tests, Java and other pieces are missing and dependency tracking does not work for code generators. The build definitions consist of ~550 lines of Meson. There are a few Python helper programs in addition to these. They have a lot of duplicated code, but for a prototype such as this one it's tolerable.

I tried to go a bit further and build basegfx, but then I got stuck. The generated code has #ifdef toggles that define whether some variables are defined as ints or enums. Other libraries fail to build either when the define is set or when it is unset. For some reason basegfx has multiple files that fail both when the define is set and when it is unset (with different error messages, though). Either the code generation step goes awry or there are even more magic defines that have to be set in order for things to work.

1 kommentti:

  1. Good for you! Come hang out on libreoffice-dev on IRC if you want to ask questions. Lots of magic there, resulting from 20+ years of development and lots of backward-compat stuff necessitating #ifdef magic. I believe I am responsible for some of the enum magic :-)