Tuesday, June 14, 2022

Attempting to create an aesthetic global line breaking algorithm

The Knuth-Plass line breaking algorithm is one of the cornerstones of TeX and why its output looks so pleasing to read (even to people who do not like the look of Computer Modern). While most text editors do line breaking with a quick & dirty algorithm that looks at each line in isolation, TeX does something fancier called minimum raggedness. The basic algorithm defines a global metric over the entire chapter and then chooses line breaks that minimize it. The basic function is the following:

For each line measure the difference between the desired width and the actual width and square the value. Then add these values together.

As you can easily tell, line breaks made at the beginning of the chapter affect the potential line breaks you can do later. Sometimes it is worth it to make a locally non-optimal choice at the beginning to get a better line break possibility much later. Evaluating a global metric like this can be potentially slow, which is why interactive programs like LibreOffice do not use this method.

The classical way of solving this problem is to use dynamic programming. It has the requirement that the problem must conform to a requirement called the Bellman optimality condition (or, if you are into rocketry, the Pontryagin maximum principle). This is perhaps best illustrated with an example: suppose you are in Paris and want to drive to Venice. This requires picking some path to drive that is "optimal" for your requirements. Now suppose we know that Zürich is along the path of this optimal route. The requirement basically says, then, that the optimal route you take from Paris to Zürich does not in any way affect the optimal route from Zürich to Venice. That is, the two paths can be routed independently of each other. This is true for the basic form of Knuth-Plass line breaking.

It is not true for line breaking in practice.

As an example there is an aesthetic requirement that there should not be three or more consecutive lines that end with a hyphen. Suppose you have split the problem in two and that in the top part the last two lines end with a dash and that the first line of the bottom part also ends with a dash. Each of the two parts is optimal in isolation but when combined they'd get the additional penalty of three consecutive hyphens and thus said solution might not be globally optimal.

So then what?

Computers today are a fair bit faster than in the late 70s/early 80s when TeX was developed. The problem size is also fairly small, the average text chapter only contains a few dozen lines (unless you are James Joyce). This leads to the obvious question of "couldn't you just work harder rather than smarter and try all the options?" Sadly the deities of combinatorics say you can't. There are just too many possibilities.

If you are a bit smarter about it, though, you can get most of the way there. For any given point in the raw text there are reasonably only a few places where you could place the optimal line break since every line must be "fairly smooth". The main split point is the one "closest" to the chapter width and then you can try one or two potential split points around it. These choices can be examined recursively fairly easily. So this is what I implemented as a test.

It even worked fairly well for a small sample text and created a good looking set of line breaks in a fraction of a second. Then I tried it with a different sample text that was about twice as long. The program then froze taking 100% CPU and producing no results. Foiled by algorithmic complexity once again!

After a bunch of optimizations what eventually ended up working was to store, for each split point, the N paths with the smallest penalties up to that point. Every time we enter that point the penalty of the current path is evaluated and compared to the list. If the penalty is larger than the worst option then search is abandoned. The resulting algorithm is surprisingly fast and could possibly even be used in real time.

The GUI app

Ideally you'd want to have tests for this functionality. This is tricky, since there is no golden correct answer, only what "looks good". Thus I wrote an application that can be used to examine the behaviour of the program with different texts, fonts and other parameters.

On the left you have the raw editable text, the middle shows how it would get rendered and on the right are the various statistics and parameters to twiddle. If we run the optimization on this piece of text the result looks like this:

For comparison here's what it looks like in LibreOffice:

And in Scribus:

No sample picture of TeX provided because I have neither the patience nor the skills to find out how to make it use Gentium.

While the parameters are not exactly the same in all three cases, we can clearly see that the new implementation produces more uniform results than the existing ones. One thing to note is that in some cases the new method creates lines that are a bit wider than the target box, where the other two never do. This causes the lines to be squished when justified and it looks really bad if done even a bit too much. The optimization function would probably need to be changed to penalize wide lines more than narrow ones.

The code

Get it here. It uses Gtk 4 and a bunch of related tech so getting it to work on anything else than Linux is probably a challenge.

There are a bunch of optimizations one could do, for example optical margin alignment or stretching individual letters on lines that fall far from the target width.

Thanks to our sponsor

This blog post was brought to you in part by two weeks of sick leave due to a dislocated shoulder. Special thanks to the paramedics on call and the fentanyl they administered to me.

Tuesday, June 7, 2022

Creating your own math-themed jigsaw puzzle from scratch

 Don't you just hate it when you get nerd sniped?

I don't either. It is usually quite fun. Case in point, some time ago I came upon this YouTube video:

It is about how a "500 piece puzzle" usually does not have 500 pieces, but instead slightly more to make manufacturing easier (see the video for the actual details, they are actually quite interesting). As I was watching the video I came up with an idea for my own math-themed jigsaw puzzle.

You can probably guess where this is going.

The idea would not leave me alone so I had to yield to temptation and get the damn thing implemented. This is where problems started. The puzzle required special handling and tighter tolerances than the average jigsaw puzzle made from a custom photo. As a taste of things to come, the final puzzle will only have two kind of pieces, namely these:

For those who already deciphered what the final result will look like: good job.

As you can probably tell, the printed pattern must be aligned very tightly to the cut lines. If it shifts by even a couple of millimeters, which is common in printing, then the whole thing breaks apart. Another requirement is that I must know the exact piece count beforehand so that I can generate an output image that matches the puzzle cut.

I approached several custom jigsaw puzzle manufacturers and they all told me that what I wanted was impossible and that their manufacturing processes are not capable of such precision. One went so far as to tell me that their print tolerances are corporate trade secrets and so is the cut used. Yes, the cut. Meaning the shapes of the resulting pieces. The one feature that is the same on every custom jigsaw puzzle and thus is known by anyone who has ever bought one of them. That is a trade secret. No, it makes no sense to me either.

Regardless it seemed like the puzzle could not be created. But, as the old saying goes, all problems are solvable with a sufficient application of public libraries and lasers.

This is a 50 Watt laser cutter and engraver that is freely usable in my local library. This nicely steps around the registration issues because printing and cutting are done at the same time and the machine is actually incredibly accurate (sub-millimeter). The downside is that you can't use color in the image. Color is created by burning so you can only create grayscale images and the shade is not particularly precise, though the shapes are very accurate.

After figuring this out the procedure got simple. All that was needed was some Python, Cairo and 3mm plywood. Here is the machine doing the engraving.

After the image had been burned, it was time to turn the laser to FULL POWER and cut the pieces. First sideways

then lengthwise.

And here is the final result all assembled up.

This is a 256 piece puzzle showing a Hilbert Curve. It is a space filling curve, that is, it travels through each "pixel" in the image exactly once in a continuous fashion and never intersects itself. As you can (hopefully) tell, there is also a gradient so that the further along the curve you get the lighter the printing gets. So in theory you could assemble this jigsaw puzzle by first ordering the pieces from darkest to lightest and then just joining the pieces one after the other.

The piece cut in this puzzle is custom. The "knob" shape is parameterized by a bunch of variables and each cut between two pieces has been generated by picking random values for said parameters. So in theory you could generate an arbitrarily large jigsaw puzzle with this method (it does need to be a square with the side length being a power of two, though).

Tuesday, May 24, 2022

A look in the game design of choose-your-own-adventure books

After a long covid pause I visited my parents' house. I took some things back with me, such as this:

This is a choose your own adventure book that originally belonged to my big sister. For those who don't speak Finnish, the book is called The Haunted Railway Book and it uses the characters from Enid Blyton's Famous Five, which is a well known series of adventure books for children (or maybe YA, don't know, I haven't actually read any of them).

The game itself is fairly involved. Instead of just doing random choices you have multiple items like a map and a compass that you can obtain and lose during the game and even hit points that are visualised as, obviously, food. I figured it might be interesting to work out how the game has been designed. Like all adventure game books the story has been split into different numbered chapters and you progress in the story by going from one chapter to another according to the rules.

In other words the game is a directed acyclic graph. Going through the entire book and writing it out as a Graphviz file, this is what the adventure looks like. The start point is at the top and to win you need to get to the second to last node without losing all your hit points and having the code book in your possession. The gv file is available in this repo for those who want to examine it themselves.

The red nodes are those where you lose one hit point. As you can tell the game really does not go easy on the player. The basic game design principle is simple. At first the decision tree fans out to give a sense of freedom and then chokes back down to a single node or few where important plot points are presented. This pattern then repeats until the story ends. In this way the game designers know that no matter how the player gets to the endgame, they have been told certain facts. Given that this book was designed in the late 70s or early 80s when computers were not that easily accessible this approach  makes designing the game a lot easier as each story segment is fully isolated from all others.

Looking closer we can see how the hit point mechanism works.

This is a repeating pattern. The game checks if you have a given item. If you don't have it you lose one hit point. In either case the plot goes on to the same point. Note how the child nodes have very different numbers. One assumes that if they were consecutive the player might read both branches and feel cheated that the event had no real influence on the story.

One thing that I wanted to find out was whether the game had any bugs or shortcuts you could take. There do not seem to be any. When I first wrote the graph down there was indeed one path that jumped past several plot nodes. I verified that it was not a transcription error and it did not seem to be. So either the original game had a bug or the Finnish translation had a typo.

Then I re-verified the transcription and found the issue.

In this chapter you need to use a code book to decipher a secret message, which seems to be the letters EJR that translate to "go to chapter 100". However the black triangly thing that is completely separate from the rest of the text (helpful arrow added by me) is actually part of the message and changes the content to "go to chapter 110". With this fixed the transition no longer jumps over plot nodes.  (The chapter text does say that the message is split over two separate bricks, but who reads instructions anyway.)

So what can we learn from this? Graphic design is important and too much whitespace can lead to smugglers escaping the long arm of the law and the short arm of child detectives.

Sunday, May 22, 2022

Sometimes the real world gives you unexpected comedy gold

(FTR: I did not write either one of those tweets, these are just two consecutive items that showed up on my timeline.)

Monday, April 25, 2022

Of snaps and stratagem

My desktop machine is running Kubuntu for a while now. As the new LTS came out I decided to update it. One major change in this release is that Firefox is no longer provided natively, instead it is a Snap package. If you have any experience with computers you might guess that this will cause issues. It surely did.

First the upgrade failed on Firefox and left me with broken packages. After fixing that by hand I did have Firefox but it could not be launched. The launcher app only shows a "Find Firefox" entry but not the browser itself. It can only be run from the command line. For comparison, all apps installed via Flatpak show up properly.

I could go on about this and Snap's other idiosyncrasies (like its insistence to pollute the output of mount with unnecessary garbage) but there is a wider, reoccurring issue here. Over the past years the following dance has taken place several times.

  1. A need for some sort of new functionality appears.
  2. A community project (or several) is started to work on it.
  3. Canonical starts their own project aiming to do the same.
  4. Canonical makes the project require a CLA or equivalent and make the license inconvenient (GPLv3 or stronger).
  5. Everyone else focuses on the shared project, which eventually gains critical mass and wins.
  6. Canonical adopts the community project and quietly drops their own.

This happened with Bzr (which is unfortunate since it was kind of awesome) vs Git. This happened with Mir vs Wayland. This happened with Upstart vs systemd. This happened with Unity vs Gnome/KDE. The details of how and why things went this way vary for each project but the "story arc" was approximately the same.

The interesting step is #4. We don't really know why they do that, presumably so that they can control the platform and/or license the code out under licenses for money. This seems to cause the community at large to decide that they really don't want Canonical to control such a major piece of core infrastructure and double down on step 5.

Will this also happen with Snaps? The future is uncertain, as always, but there are signs that this is already happening. For example the app delivery setup on the Steam Deck is done with Flatpaks. This makes business sense. If I was Valve I sure as hell would not want to license such core functionality from a third party company because it would mean a) recurring license fees and b) no direct control of the direction of the software stack and c) if things do go south you can't even fork the project and do your own thing because of the license.

If this ends up happening and Snaps eventually lose out to Flatpaks then I'd like to send a friendly suggestion to the people who make these sorts of decisions at Canonical. The approach you have used for the past ten plus years clearly does not seem to be working in the long term. Maybe you should consider a different tactic the next time this issue pops up. There's nothing wrong with starting your own new projects as such. That is, after all, how all innovation works, but if the outcomes have no chance of getting widely adopted because of issues like these then you are just wasting a lot of effort. There may also be unintended side effects. For example the whole Mir debacle probably delayed Wayland adoption by several years. 

A more condensed version would go something like this:

History shows us that those who oppose cooperation have eventually ended up losing.

Full disclosure bit: In the past I have worked for Canonical but I did not have anything to do with these decisions and I didn't even know where and how they were made. I still don't. Everything in this blog post is my own speculation based on public information.

Tuesday, April 12, 2022

Getting web proxys and certificates working on Linux or "if it's all the same to you, I'd rather take a thousand years of the Sarlacc pit, thankyouverymuch"

In my day job I'm a consultant. Every now and then my customer changes. This means setting up a new development environment and all that. Recently I started working for a Very Big customer who have a Very Corporative network setup. Basically:

  • All network traffic must go through a corporate HTTP proxy. This means absolutely everything. Not one bit gets routed outside it.
  • Said proxy has its own SSL certificate that all clients must trust and use for all traffic. Yes, this is a man-in-the-middle attack, but a friendly one at that, so it's fine.

This seems like a simple enough problem to solve. Add the proxy to the system settings, import the certificate to the global cert store and be done with it.

As you could probably guess by the title, this is not the case. At all. The journey to get this working (which I still have not been able to do, just so you know) is a horrible tale of never ending misery, pain and despair. Everything about this is so incredibly broken and terrible that it makes a duct taped Gentoo install from 2004 look like the highest peak of usability ever to have graced us mere mortals with its presence.

The underlying issue

When web proxies originally came to being, people added support for them in the least invasive and most terrible way possible: using environment variables. Enter http_proxy, https_proxy and their kind. Then the whole Internet security thing happened and people realised that this was far too a convenient way to steal all your traffic. So programs stopped using those envvars.

Oh, I'm sorry, that's not how it went at all.

Some programs stopped using those envvars whereas other did not. New programs were written and they, too, either used those envvars or didn't, basically at random. Those that eschewed envvars had a problem because proxy support is important, so they did the expected thing: everyone and their dog invented their own way of specifying a proxy. Maybe they created a configuration file, maybe they hid the option somewhere deep in the guts of their GUI configuration menus. Maybe they added their own envvars. Who's to say what is the correct way?

This was, obviously, seen as a bad state of things so modern distros have a centralised proxy setting in their GUI configurator and now everyone uses that.

Trololololololooooo! Of course they don't. I mean, some do, others don't. There is no logic which do or don't. For example you might thing that GUI apps would obey the GUI option whereas command line programs would not but in reality it's a complete crapshoot. There is no way to tell. There does not even seem to be any consensus on what the value of said option string should be (as we shall see later).

Since things were not broken enough already, the same thing happened with SSL certificates. Many popular applications will not use the system's cert store at all. Instead they prefer to provide their own artisanal hand-crafted certificates because the ones provided by the operating system have cooties. The official reason is probably "security" because as we all know if someone has taken over your computer to the extent that they can insert malicious security certificates into root-controlled locations, sticking to your own hand-curated certificate set is enough to counter any other attacks they could possibly do.

What does all of this cause then?


More specifically the kind of pain where you need to do the same change in a gazillion different places in different ways and hope you get it right. When you don't, anything can and will happen or not happen. By my rough estimate, in order to get a basic development environment running, I had to manually alter proxy and certificate settings in roughly ten different applications. Two of these were web browsers (actually six, because I tried regular, snap and flatpak versions of both) and let me tell you that googling how to add proxies and certificates to browsers so you could access the net is slightly complicated by the fact that until you get them absolutely correct you can't get to Google.

Apt obeys the proxy envvars but lately Canonical has started replacing application debs with Snaps and snapd obviously does not obey those envvars, because why would you. Ye olde google says that it should obey either /etc/environment or snap set system proxy.http=. Experimental results would seem to indicate that it does neither. Or maybe it does and there exists a second, even more secret set of config settings somewhere.

Adding a new certificate requires that it is in a specific DERP format as opposed to the R.E.M. format out there in the corner. Or maybe the other way around. In any case you have to a) know what format your cert blob is and b) manually convert it between the two using the openssl command line program. If you don't the importer script will just mock you for getting it wrong (and not telling you what the right thing would have been) instead of doing the conversion transparently (which it could do, since it is almost certainly using OpenSSL behind the scenes).

Even if every single option you can possibly imagine seems to be correct, 99% of the time Outlook webmail (as mandated by the customer) forces Firefox into an eternal login loop. The same settings work on a coworker's machine without issues.

This is the actual error message Outlook produces. I kid you not.

Flatpak applications do not seem to inherit any network configuration settings from the host system. Chromium does not have a setting page for proxies (Firefox does) but instead has a button to launch the system proxy setting app, which does not launch the system proxy setting app. Instead it shows a page saying that the Flatpak version obeys system settings while not actually obeying said settings. If you try to be clever and start the Flatpak with a custom command, set proxy envvars and then start Chromium manually, you find that it just ignores the system settings it said it would obey and thus you can't actually tell it to use a custom proxy.

Chromium does have a way to import new root signatures but it then marks them as untrusted and refuses to use them. I could not find a menu option to change their state. So it would seem the browser has implemented a fairly complex set of functionality that can't be used for the very purpose it was created.

The text format for the environment variables looks like https_proxy=http://my-proxy.corporation.com:80. You can also write this in the proxy configuration widget in system settings. This will cause some programs to completely fail. Some, like Chromium, fail silently whereas others, like fwupdmgr, fail with Could not determine address for server "http". If there is a correct format for this string, the entry widget does not validate it.

There were a bunch of other funnities like these but I have fortunately forgotten them. Some of the details above might also be slightly off because I have been battling with this thing for about a week already. Also, repeated bashes against the desk may have caused me head bairn damaeg.

How should things work instead?

There are two different kinds of programs. The first are those that only ever use their own certificates and do not provide any way to add new ones. These can keep on doing their own thing. For some use cases that is exactly what you want and doing anything else would be wrong. The second group does support new certificates. These should, in addition to their own way of adding new certificates, also use certificates that have been manually added to the system cert store as if they had been imported in the program itself.

There should be one, and only one, place for setting both certs and proxies. You should be able to open that widget, set the proxies and import your certificate and immediately after that every application should obey these new rules. If there is ever a case that an application does not use the new settings by default, it is always a bug in the application.

For certificates specifically the imported certificate should go to a separate group like "certificates manually added by the system administrator". In this way browsers and the like could use their own certificates and only bring in the ones manually added rather than the whole big ball of mud certificate clump from the system. There are valid reasons not to autoimport large amounts of certificates from the OS so any policy that would mandate that is DoA.

In this way the behaviour is the same, but the steps needed to make it happen are shorter, simpler, more usable, easier to document and there is only one of them. As an added bonus you can actually uninstall certificates and be fairly sure that copies of them don't linger in any of the tens of places they were shoved into.

Counters to potential arguments

In case this blog post gets linked to the usual discussion forums, there are a couple of kneejerk responses that people will post in the comments, typically without even reading the post. To save everyone time and effort, here are counterarguments to the most obvious issues raised.

"The problem is that the network architecture is wrong and stupid and needs to be changed"

Yes indeed. I will be more than happy to bring you in to meet the people in charge of this Very Big Corporation's IT systems so you can work full time on convincing them to change their entire network infrastructure. And after the 20+ years it'll take I shall be the first one in line to shake your hand and tell everyone that you were right.

"This guy is clearly an incompetent idiot."

Yes I am. I have gotten this comment every time my blog has been linked to news forums so let's just accept that as a given and move on.

"You have to do this because of security."

The most important thing in security is simplicity. Any security system that depends on human beings needing to perform needlessly complicated things is already broken. People are lazy, they will start working around things they consider wrong and stupid and in so doing undermine security way more than they would have done with a simpler system. In other words:

And finally

Don't even get me started on the VPN.

Sunday, April 3, 2022

Looking at building some parts of the Unreal engine with Meson

Previously we have lookedbuilding the O3DE and Godot game engines with Meson. To keep with the trend let's now look at building the Unreal engine. Unfortunately, as Unreal is not open source, I can't give out any actual code. The license permits sharing snippets, though, so we're going to have to make do with those.

This post is just a very shallow look in the engine. It does not even attempt to be comprehensive, it just has a bunch of things that I noted along the way. You should especially note that I don't make any claims of fitness or quality of the implementation. Any such implications are the result of your own imagination. I used the release branch, which seems to contain UE4.

Before we begin

Let's get started with a trivia question: What major game engine was first shipped a commercial game that was built with Meson? Surprisingly the answer is Unreal Engine. Some years ago at a conference I was told that a game company made a multiplayer game with a dedicated Linux server (I don't know if users could run it or whether they only ran it in their own data centers). For the latter the development team ported the engine to build with Meson. They did it because Unreal's build tooling was, to paraphrase their words, not good.

Sadly I don't remember what title of the game actually was. If one of the readers of this post worked on the game, please add a comment below.

The Unreal build tool

Like most big projects, Unreal has created its own build tooling from scratch. It is basically written in C# and build definitions are C# source files with some functions that get invoked to define build targets and dependencies. The latter are defined as strings and presumably the build tool will parse all them out, convert them to a DAG and then invoke the compilations. This aspect is roughly similar to how tools like Bazel work.

The downside is that trying to reimplement this is challenging because you can't easily get things like compiler flags and defines that are used for the final compiler invocations. Most build systems use a backend like Make which makes it easy to run the commands in verbose mode and swipe all flags for a given source file. UBT does not do that, it invokes the compiler directly. Thus to get the compiler invocations you might run the tool (it ships as C# blob directly inside the repo) with --help. If you do this you'll discover that UBT does not have command line help at all. Determining whether you can get the actual compiler invocations would require either diving in UBT's source code or fiddling with strace. I chose not to and instead just went through the build definition files.

When you set up a build, UBT also creates Makefiles and a CMake project for you to use. They are actually not useful for building. The Makefile just calls UBT and the CMake definitions have one target with all the sources and all the flags. Presumably this is so that you get code completion for IDEs that support the compilation database Don't try to invoke the build, though. the Ninja file it generates has 14319 compilation commands whose command strings are 390 kB long each and one linker command that is 1.6 MB long.

No love for GCC

The engine can only be compiled with MSVC and Clang. There are no #error directives that would give meaningful errors for unsupported compilers, it just fails with undecipherable error messages. Here is an example:

#if !defined(__clang__)
#       include <intrin.h>
#       if defined(_M_ARM)
#               include <armintr.h>
#       elif defined(_M_ARM64)
#               include <arm64intr.h>
#       endif

This behaviour is, roughly, "if you are not compiling with Clang, inlude the Visual Studio intrinsic headers". If your toolchain is neither, interesting things happen.

UBT will in fact download a full prebuilt Clang toolchain that it uses to do the build. It is up to your own level of paranoia how good of an idea you think this is. I used system Clang instead, it seemed to work fine and was also a few releases newer.


The code is most definitely not warning-clean. When I got the core building started, Qt Creator compiled around ten files and reported some 3000 warnings. A lot of them are things like inconsistent overrides which could be fixed automatically with clang-tidy. (it is unclear whether UBT can be made to generate a compile_commands.json so you could actually run it, though). Once you disable all the noisy warnings (Professional driver. Closed circuit. Do not attempt!) all sorts of interesting things start showing up. Such as:

../Engine/Source/Runtime/Core/Private/HAL/MallocBinnedGPU.cpp:563:30: warning: result of comparison of constant 256 with expression of type 'uint8' (aka 'unsigned char') is always true [-Wtautological-constant-out-of-range-compare]
        check(ArenaParams.PoolCount <= 256);

External dependencies

The engine has a lot of dependencies as could be expected. First I looked at how Zlib is built. Apparently with this shell script:


set -x
tar xzf zlib-1.2.5.tar.gz

cd zlib-1.2.5
CFLAGS=-fPIC ./configure
cp libz.a ../libz_fPIC.a

make distclean
cp libz.a ../

echo "Success!"

set +x

I chose not to examine how the remaining dependencies are built.

Internal dependencies and includes

The source code is divided up into separate logical subdirectories like Runtime, ThirdParty, Developer and Programs. This is a reasonable way of splitting up your code. The advantages are fairly obvious, but there are also downsides. There is code in the Runtime directory that depends on things in the Developer directory and vice versa. Similarly you need a lot of code to build things like UnrealHeaderTool in the Programs directory, but it is then used in Runtime directory for code generation.

This means that the dependencies between directories are circular and can go kinda wild. This is a common thing to happen in projects that use a string-based dependency matching. If you can use any dependency from anywhere then that is what people tend to do. For example the last time (which was, granted, years and years ago) I looked up how different directories depend on each other in Google's Abseil project, the end result looked a lot like the subway map of Tokio.

In Meson you can only refer to dependencies that have already been defined (as opposed to lazy evaluation that happens with strings) this issue does not arise but the downside is that you need to organize your source tree to make it possible.

Each "module" within those top level dirs has the same layout. There is a Public directory with headers, Private directory with the rest of the stuff and a build definition source. Thus they are isolated from each other, or at least most of the time. Typically you include things from public directories with something like #include<foo/foo.h>. This is not always the case, though. There are also includes like #include"FramePro.h" to include a file in Public/FramePro/Framepro.h, so just adding the Public dir is not enough. Sometimes developers have not even done that, but instead #include<Runtime/Launch/Resources/Version.h>. This means that in order to build you need to have the root directory of the entire engine's source tree in the header include path which means that any source file can include any header they want directly.


A big part of converting any project is getting all the #defines right. Unreal does not seem to generate a configuration header but will instead add all flags on the command line of the compiler. Unreal has a lot of defines including things like __UNREAL__, which is straight up undefined behaviour. All tokens starting with two underscores (or an underscore and a capital letter) are reserved for the toolchain. Developers are not allowed to use them.

Not having a configuration header and hiding the compiler command lines has its own set of problems. The code has proper visibility export macros so that, for example, all functions exported from the core library are tagged with CORE_API. If you try to grep for that token you'll find that it is not actually defined anywere. This leads to one of two possibilities, either the token is defined via magic macro expansion from a common definition "somewhere" or it is set on the command line. To get around this I added a -DCORE_API= argument to make it work. If that is how it is actually supposed to work then on Windows you'd need to set it to something like -DCORE_API=__declspec(dllexport). Just be sure to quote it properly when you do.

This is where my journey eventually ended. When building the header tool I got this error:

../Engine/Source/Programs/UnrealHeaderTool/Private/IScriptGeneratorPluginInterface.cpp:27:3: error: use of undeclared identifier 'FError'; did you mean 'Error'?
                FError::Throwf(TEXT("Unrecognized EBuildModuleType name: %s"), Value);

What this most likely means is that some define is either set when it should be unset, unset when it should be set or set to the wrong value.

Friday, April 1, 2022

The C++ best practices game jam sample project Meson conversion

Today is April Fool's day and also when the C++ best practices game jam begins. This is interesting because the Jam provides a modern multiplatform C++ starter project setup using multiple dependencies. I though it would be illuminating to convert it to build with Meson and compare the results with a modern build setup as opposed to one that has gathered unfortunate cruft for years. The original project can be found in this repo whereas my Meson conversion is here (I only tested it on Linux, though it "should work" on all platforms).

The original project builds with CMake and interestingly uses two different methods of obtaining external dependencies. Catch2, spdlog and docopt are obtained via Conan whereas FTXUI is downloaded with CMake's FetchContent. The relevant lines for the latter are the following:


  GIT_REPOSITORY https://github.com/ArthurSonzogni/ftxui
  GIT_TAG v2.0.0

  add_subdirectory(${ftxui_SOURCE_DIR} ${ftxui_BINARY_DIR} EXCLUDE_FROM_ALL)

The Meson version gets all the dependencies directly from WrapDB. The main advantage is that you don't need any third party dependency manager or provider, everything happens directly within Meson. The build definitions needed to get all these deps is just this:

docopt_dep = dependency('docopt')
ftxui_dep = dependency('ftxui-component')
spdlog_dep = dependency('spdlog')
catch2_dep = dependency('catch2')

All the nitty gritty details needed to download the dependencies are stored in the wrap files, which are created automatically with commands like following:

meson wrap install catch2

Thus for this project we find at least that Meson setup is both simpler and does not require external tooling.

Sunday, February 20, 2022

Please provide tarball releases of your projects

A recent trend in open source projects seems to be to avoid releasing proper release archives (whether signed with GPG or not). Instead people add Git tags for release commits and call it a day.

A long and arduous debate could be had whether this is "wrong", "right" and whether Git hashes are equivalent to proper tarballs or not, or if --depth=1 is a good thing or not. We're not going to get into that at all.

Instead I'd like to kindly ask that all projects that do releases of any kind to provide actual release tarballs for the following two reasons:

  1. It takes very little effort on your part.
  2. Proper release archives make things easier for many people consuming your project.
This makes sense just from a pure numbers game perspective: a little work on your part saves a lot of work for many other people. So please do it.

What about Github automatic archive generation?

Github does provide links to download any project commit (and thus release) as an archive. This is basically a good thing but it has two major issues.

  1. The filenames are of type v1.0.0.tar.gz. So from a file name you can't tell what it contains and further if you have two dependencies with the same version number, the archive files will have the same name and thus clash. Murphy's law says that this is inevitable.
  2. The archives that Github generates are not stable. Thus if you redownload them the files may have different checksums. This is bad because doing supply chain verification by comparing hashes will give you random failures.
The latter might get fixed if Github changes their policy to guaranteed reproducible downloads. The former problem still exists.

A simple webui-only way to do it

If you don't want to use git archive to generate your releases for whatever reason, there is a straightforward way of doing the release using only the web ui.

  1. Create your release by tagging as usual.
  2. Download the Github autogenerated tarball with a browser (it does not matter whether you choose zip or tar as the format, either one is fine).
  3. Rename the v1.0.0.tar.gz file to something like myproject-1.0.0.tar.gz.
  4. Go to the project tags page, click on "create a new release from tag".
  5. Upload the file from step #3 as a release file.

Saturday, February 12, 2022

Supporting external modules in Godot game engine with Meson

The disclaimer

None of this is in upstream Godot yet. It is only a proposal. The actual code can be obtained from the meson2 branch of this repository. Further discussion on the issue should be posted here.

The problem

Godot's code base is split into independent modules that can be enabled and disabled at will. However many games require custom native code and thus need to define their own modules. The simplest way to do this (which, I'm told, game developers quite often do) is to fork the upstream repo and put your your code in it. This works and is a good solution for one-off projects that write all extra code by themselves. This approach also has its downsides. The two major ones are that updating to a newer version of the engine can turn into a rebasing hell and that it is difficult to combine multiple third party modules.

Ideally what you'd want to do is to take upstream Godot and then take a third party module for, say, physics and a second module written by completely different people that does sound effects processing, combine all three and have things just work. Typically those modules are developed in their own repositories. Thus we'd end up with the following kind of a dependency graph.

This would indicate a circular dependency between the two repositories: in order to build the external module you need to depend on the Godot repo and in order to build the Godot you need to depend on the external repository. This is bad. If there's one thing you don't want in your source code it is circular dependencies.

Solving it using Meson's primitives

If you look at the picture in more detail you can tell that there is no circular dependencies between individual targets. Thus to solve the problem you need some way to tell the external dependency how to get the core libraries and headers it needs and conversely a way for the main build to extract from the external project what modules it has built and where they are. As Meson subprojects are built in isolation one can't just blindly poke the innards of other projects as one can do if everything is in a single megaproject.

The way Godot's current build is set up is that first it defines the core libraries, then all the modules and finally things on top of that (like the editor and main executable). We need to extend this so that external modules are set up at the same time as internal modules and then joined into one. Thus the final link won't even be able to tell the difference between external and internal modules.

First we need to set up the dependency info for the core libraries, which is done like this:

godotcore_dep = declare_dependency(include_directories: INCDIRS,
                                   compile_args: CPP_ARGS,
                                   link_args: LINK_ARGS)
meson.override_dependency('godotcore', godotcore_dep)

First we set up a dependency object that encapsulates everything needed to build extension modules and then specify that whenever a dependency called godotcore is looked up, Meson will return the newly defined object. This even works inside subprojects that are otherwise isolated from the master project.

Assuming we have a list of external module subprojects available, we can go through them one by one and build them.

foreach extmod : external_modules
    sp = subproject(extmod)
    MODULE_DEPENDENCIES += sp.get_variable('module_dep')
    MODULES_ENABLED += sp.get_variable('module_names')

The first line runs the subproject, the latter two are ignored for now, we'll come back to them. The subproject's meson.build file starts by getting the dependency info.

godotcore_dep = dependency('godotcore')

Then it does whatever it needs to build the extension module. Finally it defines the information that the main Godot application needs to use the module:

module_dep = declare_dependency(link_with: lib_module_tga,
  include_directories: '.')
module_names = ['tga']

In this case I have converted Godot's internal tga module to build as an external module hence the name. This concludes the subproject and execution resumes in the master project and the two remaining lines grab the module build information and name and append them to the list of modules to use.

This is basically it. There are obviously more details needed like integrating with Godot's documentation system for modules but the basic principle for those is the same. With this approach the integration of multiple external modules is simple: you need to place them in the main project's subprojects directory and add their names to the list of external module subprojects. All of this can be done with no code changes to the main Godot repo so updating it to the newest upstream version is just a matter of doing a git pull.

Thursday, February 10, 2022

Typesetting an Entire Book Part IV: The Content

In previous blog posts (such as seals this one) we looked into typesetting a book with various FOSS tools. Those have used existing content from Project Gutenberg. However it would be a whole lot nicer to do this with your own content, especially since a pandemic quarantine has traditionally been a fruitful time to write books. Thus for completeness I ventured out to write my own. After a fair bit of time typing, retyping, typesetting, imposing, printing, gluing, sandpapering and the like, here is the 244 page product that eventually emerged from the pipeline.

As you can probably tell, this was the first time i did gouache lettering with a brush. It behaved differently from what I expected and due to reasons I could only do this after the cover had been attached to the text block, so there was no going back or doing it over.

What's its name in English and what genre does it represent?

The first one of these is actually quite a difficult question. I spent a lot of time trying to come up with a working translation but failed. Not only is the title a pun, it contains an intentional spelling error. A literal translation would be something like The First Transmission Giltch, though that misses the fact that the original name contains the phrase First Contact. The working title for the book was Office Space in Space.

As you can probably tell, the book is a sci-fi satire about humanity's first contact with alien civilisations, but it can be seen as an allegory of a software startup company. ISO standardisation also plays a minor part, and so do giant doughnuts, unexpected gravities, a cosplay horse and even space sex (obviously).

At this point many of you have probably asked the obvious question, namely isn't this just a blatant Hitchhiker's Guide to the Galaxy ripoff?

Not really. Or at least I'd like to think that it isn't. This can be explained with an analogy: what Hitchhiker's is to Star Wars, this book is to Star Trek. More specifically it is not high fantasy, but more technical, down to earth and gritty, for lack of a better term.

Can I read it?

You almost certainly can not.

In fact, let's be scientific and estimate how unlikely it would be. The first hurdle is getting the book published. Statistics say that only one book out of a thousand offered to publishers actually gets published. Even if it did get published and you had a physical copy in your hands, you probably still could not read it, since it is written in Finnish, a language that is understood only by 0.1 percent of the planet's population. If we estimate how many people who could read it actually would read it then the chances are again roughly one of a thousand.

Putting all these together and assuming a planetary population of 7 billion we find that the book will only be read by around seven people. Thus far I have convinced five of my friends to read the preview version so there are only two slots available. Your chances of being one of the two are thus quite slim. On the other hand if you are an extreme hipster, the kind who likes to boast to their friends that they only read books in languages they don't speak and which require their readers to understand both French and Latin in order to get some of the jokes within, you may have found your Citizen Kane.

If you can find it, that is.

Friday, February 4, 2022

Converting Godot game engine to Meson, how you can help

There has long been interest in switching the Godot game engine from its current SCons build system to Meson. The actual details are here, but as a quick summary:

  • The original port was written by community members.
  • It got stale, so I have been keeping it rebased against current master
  • The original plan was to review and possibly merge it immediately after the 4.0 alpha release.
Unfortunately the people who should do the review are currently very busy with other things and can't spend much time on this. It would be important to get the change in before the 4.0 release as changing build systems mid-release is apparently a difficult task.

How can you participate?

As the actual devs are busy, what we can do is try to make things as easy for them as possible. In other words the port needs review and testing. The simplest way is to help is to check out the code from the meson branch of this repo, compiling it and running it. Then report your success (or failure) in this Github discussion thread. I will keep rebasing the branch against upstream master at regular intervals.

Note that if you encounter bugs, please do not file them against the upstream project unless you have verified that they also occur with a regular SCons build.

Currently it compiles and runs for me on Linux, Windows and macOS, but the more people can verify it the better.

It would be especially useful if people could test it on Android. FWICT the original uses Android Studio for the Java bits and somehow uses SCons to build a shared library that is used. The Meson build does build the shared lib, but it has not been tested. If someone with Android dev experience could set up and test the whole pipeline it would be great.

The same goes for iOS, though I know even less how it should be set up as I have not really done iOS development. It should build for iOS (there is a cross file in the repo) but FWICT it has never been tested beyond that.

The eventual goal

The aim is to make the Meson port as polished and tested as possible so that once upstream makes their final decision it should be as easy as possible for them (regardless of what they eventually end up choosing). The more people have tried it, the more confidence there is that the port can actually do all the things that are needed.

Wednesday, February 2, 2022

Compiling LibreOffice with Meson even further

After building the basics of LO on Windows and macOS the obvious next step is to build all of it. This is just grunt work and quite boring at that actually. I almost got it done apart from the fact that at the end I got lazy and skipped those bits that require weird dependencies (of which more later).

The end result has approximately 7800 individual compilation and linking steps. It takes about 30 minutes on a 16 core Ryzen 3700X CPU using Windows and Visual Studio. On a 7 year old 4 core Macbook it takes around 3 hours.

What is still needed to make it work?

Quite a bit, actually. The most pertinent would be all the configuration files that get installed. There are a lot of them and they need to be exactly right, otherwise the end result fails to start with cryptic error messages, if any. Unlike code compilation there is no easy way to know in advance what should be done or how things should behave. Ideally you'd get help with people who know the innards of the program and all the configgen bits.

The next bit is Java. You can compile LO without Java at all and, from what I can tell, there are plans to replace all Java with native code. Unfortunately that will take some time so in the mean time something would need to be done with this. Meson does have native Java support but, as far as I know, it has never been tested with huge projects so there would probably be some teething problems.

There are also many parts of existing native code that has not been ported due to missing dependencies. There many plugins for different SQL server connectors. They all need the respective client libs. These have not been ported.

All of that would handle perhaps 80% of the work. The remaining 80% would be split evenly across a bunch of places from deployment scripts to CI config to workspace changes and so on. There is arguably a third 80% bit, namely convincing people to adopt the new system.

What horrors interesting features were found lurking in the dependencies?

Oh boy, were there some. In the interest of fairness their names have been anonymized for obvious reasons.

One of the dependencies has not had a release in six years. It builds only with Autotools, generates a config.h header and all that. During porting it seemed that sometimes changes to the configuration header are not picked up. After enough digging it turned out that the header is not included anywhere. It is not even force included with a compiler flag. It is 100% useless.

This is problematic because spawning external processes to do all SIZEOF_INT check and the like are by far the slowest part of Meson and not something that can be easily sped up. Even worse, 99% of the time those checks would still be useless today. If you need integers of a specific size, use stdint.h instead. If your library has its own named integer types, define those via entries provided by stdint.h instead of hacking things by hand. It is faster, more reliable and less effort. It even works on Visual Studio since several years ago. There are still reasons to do the manual thing, but they are extremely rare. So unless you are doing something very low level like GLib or have to support Ultrix from 1992 then there really is no reason not to use stdint.h.

Another dependency claims to support Windows but it does not have any symbol visibility exports but instead relies on GCC exporting all symbols by default so it can't be build as a shared library using Visual Studio, only MinGW.

While Autotools is peculiar in many ways, one of the nicest things about it is that the config.h header it creates has a very rigid form. Thus it is a lot easier to create a tool that reads that file and works out backwards what checks it has done. Meson has a tool for this and converting Autoconf checks with it is actually quite simple. Sadly sometimes people feel the need to reinvent their own. One of LO's dependencies does exactly that. Its config header template has the regular defines. It also has lines like this:


After a lot of inscrutable macro code written in a Certain Make-based system splattered about the source tree, the end result eventually gets expanded to this:

#define internalname func_name_in_current_libc

It is unclear why this was done and what the advantage was, but the downside is clear. There is no automatic way to convert this, it can only be reverse engineered and reimplemented by a human being. This is slow.

In general there are a lot of weird checks in various projects all of which take time to do. For example there is a fairly modernish C++ dependency that checks for the sleep C function rather than using the builtin this_thread::sleep_for function that has existed for over 11 years now. There are also checks for whether the printf function is available (because, you know, some of your users might be living in the year 1974, one can never be sure about these things). There is even a case where this test fails on the Visual Studio compiler.

All of this is to say that I have now gone with this about as far as I'm willing to do on my own. If there are people who want to see this conversion properly happen, now would be the time to get involved.

Saturday, January 29, 2022

Porting the LO Windows build to macOS

In the last blog post we looked at compiling a subsection of LibreOffice on Windows using nothing but Meson and WrapDB. Now that it is working (somewhat) the obvious followup question is how much work would it be to make that build on macOS as well.

Not all that much, it turns out. The work consisted mostly of adding some defines, adding platform-specific source files and the like. It took me less than a day. A sizable fraction of that was spent waiting for my old 4-core laptop to finish compiling the code.

Thursday, January 27, 2022

Building a part of LibreOffice on Windows using only Meson and WrapDB

In earlier posts (starting from this one) I ported LibreOffice's build system to Meson. The aim has not been to be complete, but to compile and link the main executables. On Linux this is fairly easy as you can use the package manager to install all dependencies (and there are quite a few of them).

One of the goals of WrapDB has been to provide external dependencies automatically on platforms lacking system dependencies (and even on Linux if you need newer dependency versions than your distro provides). It is already being used by many people and from what I've been told it works fairly well. When it comes to bigger projects like LO, there have been two major opposing view:

  1. That just can't work.
  2. Even if it did work, converting all dependencies would be way too much work so it could never be done.
There is only one way to counter opinions such as these and that is to do the actual work. So I set out to build LO Writer and all its dependencies using nothing but Visual Studio, Meson and WrapDB. There is to be no MinGW, Msys, Cygwin or any other unix compatibility layer.

The porting work was straightforward. Start compilation, wait for an error, typically due to missing dependencies. Then port that to Meson, submit it to WrapDB and continue.

Did it succeed?

Yes. In a way. Here's a screen shot of the extracted subprojects that were downloaded via WrapDB.

Does it run?

Lol no. It did not even run on Linux properly, because LO requires a ton of configuration files to be installed "just right" in order to start and that part had never been compiled.

Does it compile?

It does on my machine. It probably won't do so on yours. Some of the deps I used could not be added to WrapDB yet or are missing review. If you want to try, the code is here.

The problematic (from a build system point of view) part of compiling an executable and then running it to generate source code for a different target works without problems. In theory you should be able to generate VS project files and build it with those, but I only used Ninja because it is much faster.

What was hard?

The nemesis of any porting effort of LO is the i18npool subdirectory. It builds programs that convert hyphenation rules from XML files to code. It uses the ICU library for that. The basic problem of Windows is that there is no concept of RPATH (unless you fake it in) so if your binaries use shared libraries then you can't just run them. Fortunately Meson handles this transparently by wrapping binary invocations and does all the needed PATH magling needed to make things work.

However ICU's hyphenation programs are special. They also need to access some data files. On a system-wide install they are read from the common directories, but they are not available when building yourself. There are command line options to point the programs to the proper place but at the time I got frustrated and just copied the pregenerated source file from a Linux build and called it a day.

I had to do the same thing for the outputs of Flex, Bison and gperf for similar reasons. These are all fixable, but some of the generator bits also use cthulhuan shell pipelines to do "stuff". These would need to be converted to Python for portability (and also readability).


LO uses a lot of Boost. I suspected this to be a problem but fortunately it did not. Most code uses the header-only parts so those all get set by a single declare_dependency. There were a couple of uses of libraries that require actual code. One of them was for Boost Filesystem. Assuming the code does not do anything weird, that could probably be fixed to use std::filesystem instead.

The Boost code is copied in the LO repo for now. It is not added to WrapDB yet as it is quite incomplete and only builds for this use case. Still, Boost is a popular dependency so maybe having it in WrapDB would be useful, even in an incomplete state.

Could a full port be made?

Let's say that thus far there has been nothing to indicate that it would not work. The downside is that it would be a fair amount of work and it is not the cool kind where you get to write new features but instead it is the equivalent of ditch digging. Even more problematically it probably could not be done by "one person on their own" but would instead require buy-in and cooperation from a large group of developers. As people are perennially busy, getting the necessary resources would probably be challenging.

All of that being said, there is a GSoC project for doing a porting experiment. So if you are the sort of person who won't shy away from a challenge, you might consider applying.

Bonus question

How many XML parsers does LO have?

The first one is libxml. The second one is Expat. The third one is Boost's Property tree, which has its own parser (according to the docs at least, dunno if it is used in this code). The fourth one is the bunch of Awk regexps that are used in the build scripts buried inside Makefiles.

There may be more.

Saturday, January 8, 2022

Portability is not sufficient for portability

A forenote

This blog post has some examples of questionable quality. This should not be meant as an attack on those projects. The issues listed here are fairly widespread, these are just the examples I ran into while doing other work.

What is meant by portability?

Before looking into portable software, let's first examine portability from a hardware perspective. When you ask most people what they consider a "portable computer", they'll probably think of laptops or possibly even a modern smartphone. But what about this:

This is most definitely a computer (the one I'm using to write this blog post, in fact), but not portable. It weighs something on the order of 10 kilos and it is too big to comfortably wrap your hands around.

And yet, I have carried this computer from one end of the Helsinki metropolitan region to another. It took over an hour on a train and a subway. When I finally got it home my arms were so exhausted that for a while I could not even lift them up and all muscles in them were sore for several days. I did use a helper carry strap, but it did not help much.

So in a way, yes, this computer is portable. It's not really designed for it, the actual transport process is a long painful slog and if you make an accidental misstep and bump it against a wall you run the risk of breaking everything inside. But it is a "portable computer" as a single person can carry it from one place to another using nothing but their own muscles.

Tying this to software portability

There is a lot of software out there that claims to be "portable" but can only be said to be that in the same way as the computer shown above is "portable". For the rest of the post we're only going to focus on portability to Windows.

Let's say a project has a parser that is built with Lex and Bison. Thus you need to have those programs during compilation. Building them from source is problematic on Windows (because of, among other things, Autotools) so it would be nice to get some prebuilt binaries for them. After a bit of googling you might find this page which provides Windows binaries. That has last been updated in 2004. So no.

You could also install Msys2 and get the binaries with Pacman. If you are using Visual Studio and just want to build the thing, installing a whole separate userspace system and package manager just to get two executables seems like bit of an overkill. Thinking about it further you might realize that you could install Msys on some other machine, get the executables, copy them and their direct dependency DLLs to your machine and put them in PATH. If you try this, the binaries segfault on run, probably because they can't access their localisation files that are "somewhere". 

Is this piece of software portable to Windows? Yes it is, in the "desktop PC is portable" sense, but definitely not in the "a laptop is portable" sense.

As an another example let's look at the ICU project. It claims to be highly portable, and it kind of is, here is a random snippet from their highly portable Makefile.

I don't know about you, but just looking at that … thing gives me a headache. If you ever need to do something the existing functionality does not provide, then trying to decipher what that thing is doing is an exercise in masochism. For Windows this is relevant, because Visual Studio only ships with nmake, which is not at all compatible with Make so you need to decrypt absolutely everything.

Again, this is portable to Windows, you just need to prebuild it with MinGW or using the provided VS solution files, copying the libraries from one place to another and using them. This is very much "desktop PC portable" again.

Sometimes you don't get even that. Take for example the liblangtag project. It is a fairly typical dependency library that provides a single shared library. It even hides its symbols and only exports those belonging to the public API. Sadly it does this using Libtool magic postprocessing. On Windows you have to annotate exported symbols with magic markers. Thus it is actually impossible to build a shared library properly on VS without making source code changes[1]. Thus you have to go the Mingw build route here. But that is again "portable" as in if you spend a ton of time and effort then you can sorta kinda make it work in a rubegoldbergesque way.

Being more specific

Due to various reason I have had to deal with the innards of libraries of different vintage. Fairly often the experience has been similar to dragging my desktop computer across town: arduous, miserable and exhausting. It is something I would wish upon my worst enemy and also upon most of my lesser enemies. It would serve them right.

In my personal opinion saying that some piece of code is portable should imply some basic ease of use. If you need to spend time fighting with it to make it work on an uncommon platform, toolchain or configuration then the thing is not really portable. It also blocks adoption, because if some library is a massive pain to use, people will prefer to reimplement the functionality or use some other library, just to get away from the pain.

Since changing the generally accepted meanings of words is unlikely to work, this won't happen. So in the mean time when you are talking about portability with someone else, do be sure to specify whether you mean "portable as in a desktop gaming PC" or "portable as in a laptop".

How this blog post came about

Some years ago I ported a sizable fraction of LibreOffice to build with Meson. It worked only on Linux as it used system dependencies. I rebased it to current trunk and tried to see if it could be built using nothing but Visual Studio by getting dependencies via the WrapDB. This repo contains the code, which now actually does build some code including dependencies like libxml, zlib and icu.

The code that is there is portable in the laptop sense. You only need to do a git checkout and start the build in a VS x64 dev tools prompt. It does cheat in some points, such as using pregenerated flex + bison sources, but it's not meant to be production quality, just an experiment.

[1] The project in question seems to have more preprocessor macro magic definitions than actual code, so it is possible there is some combination of defines that makes this work. If so, I did not manage to find it. This is typical in many old school C projects.