Monday, April 25, 2022

Of snaps and stratagem

My desktop machine has been running Kubuntu for a while now. When the new LTS came out I decided to upgrade it. One major change in this release is that Firefox is no longer provided natively; instead it is a Snap package. If you have any experience with computers you might guess that this would cause issues. It surely did.

First the upgrade failed on Firefox and left me with broken packages. After fixing that by hand I did have Firefox, but it could not be launched. The launcher app only shows a "Find Firefox" entry, not the browser itself, so it can only be run from the command line. For comparison, all apps installed via Flatpak show up properly.

I could go on about this and Snap's other idiosyncrasies (like its insistence on polluting the output of mount with unnecessary garbage) but there is a wider, recurring issue here. Over the past years the following dance has taken place several times.

  1. A need for some sort of new functionality appears.
  2. A community project (or several) is started to work on it.
  3. Canonical starts their own project aiming to do the same.
  4. Canonical makes the project require a CLA or equivalent and makes the license inconvenient (GPLv3 or stronger).
  5. Everyone else focuses on the shared project, which eventually gains critical mass and wins.
  6. Canonical adopts the community project and quietly drops their own.

This happened with Bzr (which is unfortunate since it was kind of awesome) vs Git. This happened with Mir vs Wayland. This happened with Upstart vs systemd. This happened with Unity vs Gnome/KDE. The details of how and why things went this way vary for each project but the "story arc" was approximately the same.

The interesting step is #4. We don't really know why they do that; presumably so that they can control the platform and/or license the code out commercially. This seems to cause the community at large to decide that they really don't want Canonical to control such a major piece of core infrastructure, and to double down on step 5.

Will this also happen with Snaps? The future is uncertain, as always, but there are signs that this is already happening. For example the app delivery setup on the Steam Deck is done with Flatpaks. This makes business sense. If I were Valve I sure as hell would not want to license such core functionality from a third party company, because it would mean a) recurring license fees, b) no direct control over the direction of the software stack and c) that if things do go south you can't even fork the project and do your own thing because of the license.

If this ends up happening and Snaps eventually lose out to Flatpaks, then I'd like to send a friendly suggestion to the people who make these sorts of decisions at Canonical. The approach you have used for the past ten plus years clearly is not working in the long term. Maybe you should consider a different tactic the next time this issue pops up. There's nothing wrong with starting your own new projects as such. That is, after all, how all innovation works, but if the outcomes have no chance of getting widely adopted because of issues like these then you are just wasting a lot of effort. There may also be unintended side effects. For example the whole Mir debacle probably delayed Wayland adoption by several years.

A more condensed version would go something like this:

History shows us that those who oppose cooperation have eventually ended up losing.

Full disclosure bit: In the past I have worked for Canonical but I did not have anything to do with these decisions and I didn't even know where and how they were made. I still don't. Everything in this blog post is my own speculation based on public information.

Tuesday, April 12, 2022

Getting web proxies and certificates working on Linux or "if it's all the same to you, I'd rather take a thousand years of the Sarlacc pit, thankyouverymuch"

In my day job I'm a consultant. Every now and then my customer changes. This means setting up a new development environment and all that. Recently I started working for a Very Big customer who have a Very Corporative network setup. Basically:

  • All network traffic must go through a corporate HTTP proxy. This means absolutely everything. Not one bit gets routed outside it.
  • Said proxy has its own SSL certificate that all clients must trust and use for all traffic. Yes, this is a man-in-the-middle attack, but a friendly one at that, so it's fine.

This seems like a simple enough problem to solve. Add the proxy to the system settings, import the certificate to the global cert store and be done with it.
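
In theory, on an Ubuntu-ish system, that would amount to roughly the following (the certificate file name and proxy address are made-up placeholders):

# import the MITM certificate into the system trust store
sudo cp corp-proxy-ca.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates

# point everything at the proxy, e.g. via /etc/environment
http_proxy=http://proxy.corp.example:8080
https_proxy=http://proxy.corp.example:8080
no_proxy=localhost,127.0.0.1,.corp.example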

As you could probably guess by the title, this is not the case. At all. The journey to get this working (which I still have not been able to do, just so you know) is a horrible tale of never ending misery, pain and despair. Everything about this is so incredibly broken and terrible that it makes a duct taped Gentoo install from 2004 look like the highest peak of usability ever to have graced us mere mortals with its presence.

The underlying issue

When web proxies originally came into being, people added support for them in the least invasive and most terrible way possible: using environment variables. Enter http_proxy, https_proxy and their kind. Then the whole Internet security thing happened and people realised that this was far too convenient a way to steal all your traffic. So programs stopped using those envvars.

Oh, I'm sorry, that's not how it went at all.

Some programs stopped using those envvars whereas others did not. New programs were written and they, too, either used those envvars or didn't, basically at random. Those that eschewed envvars had a problem because proxy support is important, so they did the expected thing: everyone and their dog invented their own way of specifying a proxy. Maybe they created a configuration file, maybe they hid the option somewhere deep in the guts of their GUI configuration menus. Maybe they added their own envvars. Who's to say what is the correct way?

This was, obviously, seen as a bad state of things so modern distros have a centralised proxy setting in their GUI configurator and now everyone uses that.

Trololololololooooo! Of course they don't. I mean, some do, others don't. There is no logic to which do or don't. For example you might think that GUI apps would obey the GUI option whereas command line programs would not, but in reality it's a complete crapshoot. There is no way to tell. There does not even seem to be any consensus on what the value of said option string should be (as we shall see later).

Since things were not broken enough already, the same thing happened with SSL certificates. Many popular applications will not use the system's cert store at all. Instead they prefer to provide their own artisanal hand-crafted certificates because the ones provided by the operating system have cooties. The official reason is probably "security" because as we all know if someone has taken over your computer to the extent that they can insert malicious security certificates into root-controlled locations, sticking to your own hand-curated certificate set is enough to counter any other attacks they could possibly do.

What does all of this cause then?

Pain.

More specifically the kind of pain where you need to make the same change in a gazillion different places in different ways and hope you get it right. When you don't, anything can and will happen or not happen. By my rough estimate, in order to get a basic development environment running, I had to manually alter proxy and certificate settings in roughly ten different applications. Two of these were web browsers (actually six, because I tried regular, Snap and Flatpak versions of both) and let me tell you that googling how to add proxies and certificates to browsers so you could access the net is slightly complicated by the fact that until you get them absolutely correct you can't get to Google.

Apt obeys the proxy envvars but lately Canonical has started replacing application debs with Snaps and snapd obviously does not obey those envvars, because why would you. Ye olde google says that it should obey either /etc/environment or snap set system proxy.http=. Experimental results would seem to indicate that it does neither. Or maybe it does and there exists a second, even more secret set of config settings somewhere.
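
For reference, these are the two commonly suggested incantations; in my testing neither had any visible effect (the proxy address is a placeholder):

# suggestion 1: system-wide envvars in /etc/environment
http_proxy=http://proxy.corp.example:8080
https_proxy=http://proxy.corp.example:8080

# suggestion 2: snapd's own configuration
sudo snap set system proxy.http="http://proxy.corp.example:8080"
sudo snap set system proxy.https="http://proxy.corp.example:8080"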

Adding a new certificate requires that it is in a specific DERP format as opposed to the R.E.M. format out there in the corner. Or maybe the other way around. In any case you have to a) know what format your cert blob is in and b) manually convert it from one to the other using the openssl command line program. If you don't, the importer script will just mock you for getting it wrong (without telling you what the right thing would have been) instead of doing the conversion transparently (which it could do, since it is almost certainly using OpenSSL behind the scenes).
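
For the record, the two formats being joked about are DER (binary) and PEM (base64 text), and the manual conversion looks something like this (file names are placeholders):

# a PEM certificate is readable text starting with "-----BEGIN CERTIFICATE-----"
openssl x509 -inform DER -in corp-ca.der -out corp-ca.pem    # DER -> PEM
openssl x509 -in corp-ca.pem -outform DER -out corp-ca.der   # PEM -> DER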

Even if every single option you can possibly imagine seems to be correct, 99% of the time Outlook webmail (as mandated by the customer) forces Firefox into an eternal login loop. The same settings work on a coworker's machine without issues.

This is the actual error message Outlook produces. I kid you not.

Flatpak applications do not seem to inherit any network configuration settings from the host system. Chromium does not have a setting page for proxies (Firefox does) but instead has a button to launch the system proxy setting app, which does not launch the system proxy setting app. Instead it shows a page saying that the Flatpak version obeys system settings while not actually obeying said settings. If you try to be clever and start the Flatpak with a custom command, set proxy envvars and then start Chromium manually, you find that it just ignores the system settings it said it would obey and thus you can't actually tell it to use a custom proxy.
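
For those who want to reproduce the "be clever" attempt, it can also be done with flatpak run's --env flag; it made no difference for me (the app id is the Flathub one, the proxy address is a placeholder):

flatpak run \
  --env=http_proxy=http://proxy.corp.example:8080 \
  --env=https_proxy=http://proxy.corp.example:8080 \
  org.chromium.Chromium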

Chromium does have a way to import new root certificates but it then marks them as untrusted and refuses to use them. I could not find a menu option to change their state. So it would seem the browser has implemented a fairly complex set of functionality that can't be used for the very purpose it was created for.


The text format for the environment variables looks like https_proxy=http://my-proxy.corporation.com:80. You can also write this in the proxy configuration widget in system settings. This will cause some programs to completely fail. Some, like Chromium, fail silently whereas others, like fwupdmgr, fail with Could not determine address for server "http". If there is a correct format for this string, the entry widget does not validate it.
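
To make the ambiguity concrete, these are the two spellings you end up experimenting with; my guess (unverified) is that some of the failing programs expect the bare host and port form:

# full URL, which the system settings widget happily accepts
export https_proxy=http://my-proxy.corporation.com:80
# bare host and port, which some programs appear to expect instead
export https_proxy=my-proxy.corporation.com:80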

There were a bunch of other funnities like these but I have fortunately forgotten them. Some of the details above might also be slightly off because I have been battling with this thing for about a week already. Also, repeated bashes against the desk may have caused me head bairn damaeg.

How should things work instead?

There are two different kinds of programs. The first are those that only ever use their own certificates and do not provide any way to add new ones. These can keep on doing their own thing. For some use cases that is exactly what you want and doing anything else would be wrong. The second group does support new certificates. These should, in addition to their own way of adding new certificates, also use certificates that have been manually added to the system cert store as if they had been imported in the program itself.

There should be one, and only one, place for setting both certs and proxies. You should be able to open that widget, set the proxies and import your certificate and immediately after that every application should obey these new rules. If there is ever a case that an application does not use the new settings by default, it is always a bug in the application.

For certificates specifically the imported certificate should go to a separate group like "certificates manually added by the system administrator". In this way browsers and the like could use their own certificates and only bring in the ones manually added rather than the whole big ball of mud certificate clump from the system. There are valid reasons not to autoimport large amounts of certificates from the OS so any policy that would mandate that is DoA.

In this way the behaviour is the same, but the steps needed to make it happen are shorter, simpler, more usable, easier to document and there is only one of them. As an added bonus you can actually uninstall certificates and be fairly sure that copies of them don't linger in any of the tens of places they were shoved into.

Counters to potential arguments

In case this blog post gets linked to the usual discussion forums, there are a couple of kneejerk responses that people will post in the comments, typically without even reading the post. To save everyone time and effort, here are counterarguments to the most obvious issues raised.

"The problem is that the network architecture is wrong and stupid and needs to be changed"

Yes indeed. I will be more than happy to bring you in to meet the people in charge of this Very Big Corporation's IT systems so you can work full time on convincing them to change their entire network infrastructure. And after the 20+ years it'll take I shall be the first one in line to shake your hand and tell everyone that you were right.

"This guy is clearly an incompetent idiot."

Yes I am. I have gotten this comment every time my blog has been linked to news forums so let's just accept that as a given and move on.

"You have to do this because of security."

The most important thing in security is simplicity. Any security system that depends on human beings needing to perform needlessly complicated things is already broken. People are lazy; they will start working around things they consider wrong and stupid, and in so doing undermine security way more than they would have done with a simpler system.

And finally

Don't even get me started on the VPN.

Sunday, April 3, 2022

Looking at building some parts of the Unreal engine with Meson

Previously we have looked at building the O3DE and Godot game engines with Meson. To keep with the trend let's now look at building the Unreal engine. Unfortunately, as Unreal is not open source, I can't give out any actual code. The license permits sharing snippets, though, so we're going to have to make do with those.

This post is just a very shallow look into the engine. It does not even attempt to be comprehensive; it just has a bunch of things that I noted along the way. You should especially note that I don't make any claims of fitness or quality of the implementation. Any such implications are the result of your own imagination. I used the release branch, which seems to contain UE4.

Before we begin

Let's get started with a trivia question: what major game engine was the first to ship a commercial game built with Meson? Surprisingly the answer is Unreal Engine. Some years ago at a conference I was told that a game company had made a multiplayer game with a dedicated Linux server (I don't know if users could run the server themselves or whether it only ran in the company's own data centers). For the server the development team ported the engine to build with Meson. They did it because Unreal's build tooling was, to paraphrase their words, not good.

Sadly I don't remember what the title of the game actually was. If one of the readers of this post worked on the game, please add a comment below.

The Unreal build tool

Like most big projects, Unreal has created its own build tooling from scratch. It is written in C# and the build definitions are C# source files with some functions that get invoked to define build targets and dependencies. The latter are defined as strings and presumably the build tool parses them all out, converts them into a DAG and then invokes the compilations. This aspect is roughly similar to how tools like Bazel work.

The downside is that trying to reimplement this is challenging, because you can't easily get things like the compiler flags and defines that are used for the final compiler invocations. Most build systems use a backend like Make, which makes it easy to run the commands in verbose mode and scrape all the flags for a given source file. UBT does not do that; it invokes the compiler directly. Thus to get the compiler invocations you might run the tool (it ships as a C# blob directly inside the repo) with --help. If you do this you'll discover that UBT does not have command line help at all. Determining whether you can get the actual compiler invocations would require either diving into UBT's source code or fiddling with strace. I chose not to and instead just went through the build definition files.
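
Had I gone the strace route, something like the following would probably capture the actual compiler command lines (the build script path and target are just examples of how UBT is typically driven on Linux):

# log every exec'd process and its full argument list during the build
strace -f -e trace=execve -s 100000 -o ubt.log \
  ./Engine/Build/BatchFiles/Linux/Build.sh UnrealHeaderTool Linux Development
grep 'clang++' ubt.log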

When you set up a build, UBT also creates Makefiles and a CMake project for you to use. They are not actually useful for building: the Makefile just calls UBT and the CMake definitions have one target with all the sources and all the flags. Presumably this is so that you get code completion in IDEs that support the compilation database. Don't try to invoke the build, though. The Ninja file it generates has 14319 compilation commands whose command strings are 390 kB long each, plus one linker command that is 1.6 MB long.

No love for GCC

The engine can only be compiled with MSVC and Clang. There are no #error directives that would give meaningful errors for unsupported compilers; it just fails with undecipherable error messages. Here is an example:

#if !defined(__clang__)
#       include <intrin.h>
#       if defined(_M_ARM)
#               include <armintr.h>
#       elif defined(_M_ARM64)
#               include <arm64intr.h>
#       endif
#endif

This behaviour is, roughly, "if you are not compiling with Clang, include the Visual Studio intrinsic headers". If your toolchain is neither, interesting things happen.

UBT will in fact download a full prebuilt Clang toolchain that it uses to do the build. It is up to your own level of paranoia how good of an idea you think this is. I used the system Clang instead; it seemed to work fine and was also a few releases newer.

Warnings

The code is most definitely not warning-clean. When I got the core building started, Qt Creator compiled around ten files and reported some 3000 warnings. A lot of them are things like inconsistent overrides, which could be fixed automatically with clang-tidy (it is unclear whether UBT can be made to generate a compile_commands.json so you could actually run it, though); a possible invocation is sketched after the warning below. Once you disable all the noisy warnings (Professional driver. Closed circuit. Do not attempt!) all sorts of interesting things start showing up. Such as:

../Engine/Source/Runtime/Core/Private/HAL/MallocBinnedGPU.cpp:563:30: warning: result of comparison of constant 256 with expression of type 'uint8' (aka 'unsigned char') is always true [-Wtautological-constant-out-of-range-compare]
        check(ArenaParams.PoolCount <= 256);
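
As for the override warnings mentioned above: assuming a compile_commands.json could somehow be produced, the bulk fix would look roughly like this (the build directory name is made up):

# rewrite the missing 'override' specifiers in place
run-clang-tidy -p builddir -checks='-*,modernize-use-override' -fix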

External dependencies

The engine has a lot of dependencies as could be expected. First I looked at how Zlib is built. Apparently with this shell script:

#!/bin/bash

set -x
tar xzf zlib-1.2.5.tar.gz

cd zlib-1.2.5
CFLAGS=-fPIC ./configure
make
cp libz.a ../libz_fPIC.a

make distclean
./configure
make
cp libz.a ../

echo "Success!"

set +x

I chose not to examine how the remaining dependencies are built.

Internal dependencies and includes

The source code is divided up into separate logical subdirectories like Runtime, ThirdParty, Developer and Programs. This is a reasonable way of splitting up your code. The advantages are fairly obvious, but there are also downsides. There is code in the Runtime directory that depends on things in the Developer directory and vice versa. Similarly you need a lot of code to build things like UnrealHeaderTool in the Programs directory, but it is then used in the Runtime directory for code generation.

This means that the dependencies between directories are circular and can go kinda wild. This is a common thing to happen in projects that use string-based dependency matching. If you can use any dependency from anywhere then that is what people tend to do. For example the last time I looked up how different directories depend on each other in Google's Abseil project (which was, granted, years and years ago), the end result looked a lot like the subway map of Tokyo.

Because in Meson you can only refer to dependencies that have already been defined (as opposed to the lazy evaluation that happens with strings), this issue does not arise, but the downside is that you need to organize your source tree to make it possible.

Each "module" within those top level dirs has the same layout. There is a Public directory with headers, Private directory with the rest of the stuff and a build definition source. Thus they are isolated from each other, or at least most of the time. Typically you include things from public directories with something like #include<foo/foo.h>. This is not always the case, though. There are also includes like #include"FramePro.h" to include a file in Public/FramePro/Framepro.h, so just adding the Public dir is not enough. Sometimes developers have not even done that, but instead #include<Runtime/Launch/Resources/Version.h>. This means that in order to build you need to have the root directory of the entire engine's source tree in the header include path which means that any source file can include any header they want directly.

Defines

A big part of converting any project is getting all the #defines right. Unreal does not seem to generate a configuration header but instead adds all flags on the command line of the compiler. Unreal has a lot of defines, including things like __UNREAL__, which is straight up undefined behaviour: all tokens starting with two underscores (or an underscore and a capital letter) are reserved for the toolchain, and developers are not allowed to use them.

Not having a configuration header and hiding the compiler command lines has its own set of problems. The code has proper visibility export macros so that, for example, all functions exported from the core library are tagged with CORE_API. If you try to grep for that token you'll find that it is not actually defined anywhere. This leads to one of two possibilities: either the token is defined via magic macro expansion from a common definition "somewhere" or it is set on the command line. To get around this I added a -DCORE_API= argument to make it work. If that is how it is actually supposed to work then on Windows you'd need to set it to something like -DCORE_API=__declspec(dllexport). Just be sure to quote it properly when you do.
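
To illustrate why the quoting matters, here is what the difference looks like in a Unix shell (the source file name is a placeholder):

clang++ -c -DCORE_API= somefile.cpp                          # Linux: the macro expands to nothing
clang++ -c '-DCORE_API=__declspec(dllexport)' somefile.cpp   # quotes keep the shell from parsing ( )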

This is where my journey eventually ended. When building the header tool I got this error:

../Engine/Source/Programs/UnrealHeaderTool/Private/IScriptGeneratorPluginInterface.cpp:27:3: error: use of undeclared identifier 'FError'; did you mean 'Error'?
                FError::Throwf(TEXT("Unrecognized EBuildModuleType name: %s"), Value);
                ^~~~~~
                Error

What this most likely means is that some define is either set when it should be unset, unset when it should be set or set to the wrong value.

Friday, April 1, 2022

The C++ best practices game jam sample project Meson conversion

Today is April Fool's day and also when the C++ best practices game jam begins. This is interesting because the jam provides a modern multiplatform C++ starter project setup using multiple dependencies. I thought it would be illuminating to convert it to build with Meson and compare the results against a modern build setup, as opposed to one that has gathered unfortunate cruft for years. The original project can be found in this repo whereas my Meson conversion is here (I only tested it on Linux, though it "should work" on all platforms).

The original project builds with CMake and interestingly uses two different methods of obtaining external dependencies. Catch2, spdlog and docopt are obtained via Conan whereas FTXUI is downloaded with CMake's FetchContent. The relevant lines for the latter are the following:

include(FetchContent)

set(FETCHCONTENT_UPDATES_DISCONNECTED TRUE)
FetchContent_Declare(ftxui
  GIT_REPOSITORY https://github.com/ArthurSonzogni/ftxui
  GIT_TAG v2.0.0
)

FetchContent_GetProperties(ftxui)
if(NOT ftxui_POPULATED)
  FetchContent_Populate(ftxui)
  add_subdirectory(${ftxui_SOURCE_DIR} ${ftxui_BINARY_DIR} EXCLUDE_FROM_ALL)
endif()

The Meson version gets all the dependencies directly from WrapDB. The main advantage is that you don't need any third party dependency manager or provider; everything happens directly within Meson. The build definitions needed to get all these deps are just this:

docopt_dep = dependency('docopt')
ftxui_dep = dependency('ftxui-component')
spdlog_dep = dependency('spdlog')
catch2_dep = dependency('catch2')

All the nitty gritty details needed to download the dependencies are stored in the wrap files, which are created automatically with commands like the following:

meson wrap install catch2

Thus, for this project at least, we find that the Meson setup is both simpler and does not require external tooling.
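
For completeness, the whole dependency setup plus build boils down to roughly these commands (assuming the WrapDB wrap names match the project names; the ftxui wrap is what provides the ftxui-component dependency used above):

meson wrap install docopt
meson wrap install ftxui
meson wrap install spdlog
meson wrap install catch2
meson setup builddir
meson compile -C builddir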