After building the basics of LO on Windows and macOS the obvious next step is to build all of it. This is just grunt work and quite boring at that actually. I almost got it done apart from the fact that at the end I got lazy and skipped those bits that require weird dependencies (of which more later).

The end result has approximately 7800 individual compilation and linking steps. It takes about 30 minutes on a 16 core Ryzen 3700X CPU using Windows and Visual Studio. On a 7 year old 4 core Macbook it takes around 3 hours.

What is still needed to make it work?

Quite a bit, actually. The most pertinent would be all the configuration files that get installed. There are a lot of them and they need to be exactly right, otherwise the end result fails to start with cryptic error messages, if any. Unlike code compilation there is no easy way to know in advance what should be done or how things should behave. Ideally you'd get help with people who know the innards of the program and all the configgen bits.

The next bit is Java. You can compile LO without Java at all and, from what I can tell, there are plans to replace all Java with native code. Unfortunately that will take some time so in the mean time something would need to be done with this. Meson does have native Java support but, as far as I know, it has never been tested with huge projects so there would probably be some teething problems.

There are also many parts of existing native code that has not been ported due to missing dependencies. There many plugins for different SQL server connectors. They all need the respective client libs. These have not been ported.

All of that would handle perhaps 80% of the work. The remaining 80% would be split evenly across a bunch of places from deployment scripts to CI config to workspace changes and so on. There is arguably a third 80% bit, namely convincing people to adopt the new system.

What horrors interesting features were found lurking in the dependencies?

Oh boy, were there some. In the interest of fairness their names have been anonymized for obvious reasons.

One of the dependencies has not had a release in six years. It builds only with Autotools, generates a config.h header and all that. During porting it seemed that sometimes changes to the configuration header are not picked up. After enough digging it turned out that the header is not included anywhere. It is not even force included with a compiler flag. It is 100% useless.

This is problematic because spawning external processes to do all SIZEOF_INT check and the like are by far the slowest part of Meson and not something that can be easily sped up. Even worse, 99% of the time those checks would still be useless today. If you need integers of a specific size, use stdint.h instead. If your library has its own named integer types, define those via entries provided by stdint.h instead of hacking things by hand. It is faster, more reliable and less effort. It even works on Visual Studio since several years ago. There are still reasons to do the manual thing, but they are extremely rare. So unless you are doing something very low level like GLib or have to support Ultrix from 1992 then there really is no reason not to use stdint.h.

Another dependency claims to support Windows but it does not have any symbol visibility exports but instead relies on GCC exporting all symbols by default so it can't be build as a shared library using Visual Studio, only MinGW.

While Autotools is peculiar in many ways, one of the nicest things about it is that the config.h header it creates has a very rigid form. Thus it is a lot easier to create a tool that reads that file and works out backwards what checks it has done. Meson has a tool for this and converting Autoconf checks with it is actually quite simple. Sadly sometimes people feel the need to reinvent their own. One of LO's dependencies does exactly that. Its config header template has the regular defines. It also has lines like this:

${FOOBAR__O_SOMETHINGSOMETHING}

After a lot of inscrutable macro code written in a Certain Make-based system splattered about the source tree, the end result eventually gets expanded to this:

#define internalname func_name_in_current_libc

It is unclear why this was done and what the advantage was, but the downside is clear. There is no automatic way to convert this, it can only be reverse engineered and reimplemented by a human being. This is slow.

In general there are a lot of weird checks in various projects all of which take time to do. For example there is a fairly modernish C++ dependency that checks for the sleep C function rather than using the builtin this_thread::sleep_for function that has existed for over 11 years now. There are also checks for whether the printf function is available (because, you know, some of your users might be living in the year 1974, one can never be sure about these things). There is even a case where this test fails on the Visual Studio compiler.

All of this is to say that I have now gone with this about as far as I'm willing to do on my own. If there are people who want to see this conversion properly happen, now would be the time to get involved.

Nibble Stew

Wednesday, February 2, 2022

Compiling LibreOffice with Meson even further

What is still needed to make it work?

What horrors interesting features were found lurking in the dependencies?

1 comment: