Sunday, 9 September 2018

The compiler as a shared library

Since times immemorial, compilers have been run as standalone batch processes. If you have 50 files to compile, then you invoke the compiler 50 times, once on each file. Since each compilation is independent of all others, the work can be parallelised perfectly. This seems like a simple and optimal solution.

But, as is commonly the case, this is not the whole truth. When compiling code, there are many subtasks that are common to each individual compilation and this causes a lot of duplication of effort. Perhaps the best known case of this is C++ templates. They are parsed and code generated for each file that uses them, yielding the same code in dozens of files. Then the linker comes along and throws all but one of them away. There are a bunch of other issues, which are discussed in this video from the LLVM developers' conference.

A problem of state preservation

One of the best known solutions to this problem is precompiled headers. They work roughly like this:
  1. Parse the contents of headers
  2. Dump compiler internal state to a file
  3. Load the file on each compiler invocation
This approach has two main problems. The first is that it requires someone to design and implement a full serialisation format for the compiler-internal data. That is a lot of tedious work that very few people will volunteer to do. The second is that the files need to be loaded explicitly from disk in every compilation process, which takes time, and that the build system needs to tell the compiler how to get this done. The granularity is also fairly coarse.

Ideally we would like to preserve as much data between two compiler invocations as possible without needing to serialise it to disk. As discussed in the above video, one solution is to have a "compiler plugin".

Almost every build system currently works roughly like this:
  1. Read build definition (such as a Ninja file)
  2. For each compilation, spawn a new compiler process and invoke the compiler executable
  3. Shutdown
The proposed new model would go like this (no build system currently supports this, but adding it to e.g. Ninja is not a massive undertaking):
  1. Read build definition
  2. dlopen the compiler shared library file
  3. For each compilation, create a new compiler object and invoke compilation using e.g. a thread pool
  4. Destroy compiler objects and dlclose the file
  5. Shutdown
In this model all compilation jobs live in the same process, thus they can coordinate work behind the scenes however they wish. This requires some tricky code with thread safe caches and the like but it is all internal to the compiler and never exposed. Even without caching this makes a difference on platforms such as Windows where process spawning is slow.
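To make the model a bit more concrete, here is a minimal Python sketch of what the build system side could look like. Everything in it is hypothetical: the CompilerService and Compiler classes, the get_or_compute method and the fake "parsing" merely stand in for a real compiler library, and the single lock is the simplest thread safe cache I could think of, not a recommendation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CompilerService:
    """Hypothetical per-process state shared by all compilation jobs."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cache = {}          # e.g. parsed headers or template instantiations
        self.expensive_calls = 0  # how many times real work was actually done

    def get_or_compute(self, key, compute):
        # Holding one lock while computing is the simplest correct choice;
        # a real compiler would want much finer-grained locking.
        with self._lock:
            if key not in self._cache:
                self.expensive_calls += 1
                self._cache[key] = compute()
            return self._cache[key]


class Compiler:
    """Hypothetical per-task compiler object."""

    def __init__(self, service):
        self.service = service

    def compile(self, source, headers):
        # Pretend that parsing a header is the expensive, shareable part.
        parsed = [self.service.get_or_compute(h, lambda h=h: f"parsed({h})")
                  for h in headers]
        return f"{source}.o <- {', '.join(parsed)}"


def build(sources, headers, jobs=4):
    service = CompilerService()  # stands in for dlopen plus initialisation
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = [pool.submit(Compiler(service).compile, s, headers)
                   for s in sources]
        results = [f.result() for f in futures]
    return results, service.expensive_calls
```

With 50 source files that all include the same two headers, the fake parse step runs twice in total rather than 100 times, which is the whole point of keeping the jobs in one process.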

The big question remaining here is the API to use. It should have the following requirements:
  1. Must be ABI stable in the C sense
  2. Must be supportable on all compilers for all languages
  3. Must expose the full functionality of the compiler
  4. Must support an arbitrary number of compiler tasks within a single process

An API proposal for compiler invocation

On the face of it this seems like an impossible task. The API surface of a compiler is enormous and differs from compiler to compiler. However all of them already expose a stable ABI: the command line argument arrays. Exploiting this allows us to create an API supporting all of the requirements above with only six functions.

First we initialise the library:

CompilerService* compiler_init_service();

Here CompilerService is an opaque struct that serves as the state object. There is one of these per process and it holds (internally) all the cached state and related things. Then we create a compiler object, one per compilation task:

Compiler* compiler_create_compiler(CompilerService *service);

Now we can invoke the compilation:

CompilationResult* compiler_compile(Compiler *c, int argc, const char **argv);

This invocation matches the signature of the main function. Since we are not going through the shell/kernel we can pass an arbitrary number of arguments without needing to use response files, quote shell characters or any other nastiness. The return value contains the return code and the strings for stdout and stderr. The standalone compiler executable such as cl.exe could (in theory ;-) be implemented by just calling these functions and returning the results to the calling process.

The last thing we need are the deallocation functions:

void compiler_free_compilation_result(CompilationResult *r);
void compiler_free_compiler(Compiler *c);
void compiler_free_service(CompilerService *s);

When will this be available in <my favorite compiler>?

Probably not soon, this is all slideware. There is no actual code to implement this (that I know of at least). The big problem here is that most compilers have not been written with this sort of usage in mind. They have global variables and other things hostile to usage as a shared library. Fixing all that to be thread safe and isolated is a lot of work. LLVM is probably the compiler that could most easily get this done since it has been designed to be used as a library from the beginning.

Saturday, 18 August 2018

Linker symbol lookup order does not work the way you think

A common problem in linking has to do with circular dependencies. Suppose you have a program that looks like this:



Here program calls into function one, which is in library A. That calls into function two, which is in library B. Finally that calls into function three, which is back in library A again.

Let's assume that we use the following linker line to build the final executable:

gcc -o program prog.o liba.a libb.a

Because linkers were originally designed in the 70s, they are optimized for minimal resource usage. In this particular case the linker will first process the object file and then library A. It will detect that function one is used so it will take that function's implementation and then throw the rest of library A away. It will then process library B, take function two in the final program and note that function three is also needed. Because library A was thrown away the linker can not find three and errors out. The fix to this is to specify A twice on the command line.

This is how everyone has been told things work and if you search the Internet you will find many pages explaining this and how to set up linker command lines correctly.

But is this what actually happens?

Let's start with Visual Studio

Suppose you were to do this in Visual Studio. What do you think would happen? There are four different possibilities:
  1. Linking fails with missing symbol three.
  2. Linking succeeds and program works.
  3. Either 1. or 2. happens, but there is not enough information to tell which.
  4. Linking succeeds but the final executable does not run.
The correct answer is 2. Visual Studio's linker is smart, keeps all specified libraries open and uses them to resolve symbols. This means that you don't have to add any library on the command line twice.

Onwards to macOS

Here we have the same question as above but using macOS's default LLD linker. The choices are also the same as above.

The correct answer is also 2. LLD keeps symbols around just like Visual Studio.

What about Linux?

What happens if you do the same thing on Linux using the default GNU linker? The choices are again the same as above.

Most people would probably guess that the correct answer here is 1. But it's not. What actually happens is 3. That is, the linking can either succeed or fail depending on external circumstances.

The difference here is whether functions one and three are defined in the same source file (and thus end up in the same object file) or not. If they are in the same source file, then linking will succeed and if they are in separate files, then it fails. This would indicate that the internal implementation of GNU ld does not work at the symbol level but instead just copies object files out from the AR archive wholesale if any of their symbols are used.
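The behaviour is easy to reproduce with a toy model. The following Python sketch is my own simplification, not GNU ld's actual algorithm: it processes the command line left to right, pulls whole archive members (object files) whenever they define a currently undefined symbol, and forgets each archive once it moves on. The link function and the data layout are made up for illustration.

```python
def link(items):
    """Simulate a single-pass, left-to-right link.

    items: list of ("obj", member) or ("lib", [member, ...]), where a
    member is a (defined_symbols, referenced_symbols) pair of sets.
    Returns the set of symbols still undefined at the end.
    """
    defined, undefined = set(), set()

    def pull(member):
        defs, refs = member
        defined.update(defs)
        undefined.difference_update(defs)
        undefined.update(set(refs) - defined)

    for kind, payload in items:
        if kind == "obj":
            pull(payload)
            continue
        # Archive: keep pulling whole members while they resolve something.
        remaining = list(payload)
        progress = True
        while progress:
            progress = False
            for member in list(remaining):
                if undefined & set(member[0]):
                    pull(member)
                    remaining.remove(member)
                    progress = True
        # Members that were not pulled are forgotten from here on.
    return undefined


prog = ({"main"}, {"one"})
libb = [({"two"}, {"three"})]
liba_split = [({"one"}, {"two"}), ({"three"}, set())]  # one and three in separate files
liba_same = [({"one", "three"}, {"two"})]              # one and three in the same file

print(link([("obj", prog), ("lib", liba_split), ("lib", libb)]))  # three is left unresolved
print(link([("obj", prog), ("lib", liba_same), ("lib", libb)]))   # nothing is left unresolved
```

In this model, listing library A a second time after library B also resolves the split case, which matches the traditional advice.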

What does this mean?

For example it means that if you build your targets with unity builds, their entire symbol resolution logic changes. This is probably quite rare but can be extremely confusing when it happens. You might also have a fully working build, which breaks if you move a function from one file to another. This is a thing that really should not happen but when it does things get very confusing.

The bigger issue here is that symbol resolution works differently on different platforms. Normally this should not be an issue because symbol names must be unique (or they must be weak symbols but let's not go there) or the behaviour is undefined. It does, however, place a big burden on cross platform projects and build systems because you need to have very complex logic in place if you wish to deduplicate linker flags. This is a fairly common occurrence even if you don't have circular dependencies. For example when building GStreamer with Meson some time ago the undeduplicated linker line contained hundreds of duplicated library entries (it still does but not nearly as many).

The best possible solution would be if GNU ld started behaving the same way as the VS linker and LLD. That way all major platforms would behave the same and things would get a lot simpler. In the meantime one should be able to simulate this with linker grouping flags:
  1. Go through all linker arguments and split them to libraries that use link_whole and those that don't. Throw away any existing linker grouping.
  2. Deduplicate and put the former at the beginning of the link line with the requisite link_whole arguments.
  3. Deduplicate all entries in the list of libraries that don't get linked fully.
  4. Put the result of 3 on the command line in a single linker group.
This should work and would match fairly accurately what VS and LLD already do, so at least all cross platform projects should work out of the box already.
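The four steps above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the input is a list of (library, link_whole) pairs that the build system already knows about, the output uses the GNU ld flags --whole-archive and --start-group passed through the compiler driver with -Wl, and dropping a library from the group when it is also linked whole is my own choice. The function name is made up.

```python
def dedup_link_args(libs):
    """libs: list of (library, link_whole) pairs in command line order.

    Returns GNU ld style arguments suitable for a gcc or clang link line.
    """
    whole, normal = [], []
    # Steps 1-3: split into two lists and deduplicate, keeping first-seen order.
    for lib, link_whole in libs:
        target = whole if link_whole else normal
        if lib not in target:
            target.append(lib)
    # Assumption: a library that is linked whole does not need to be
    # repeated inside the group.
    normal = [lib for lib in normal if lib not in whole]

    args = []
    if whole:
        args += ["-Wl,--whole-archive", *whole, "-Wl,--no-whole-archive"]
    if normal:
        # Step 4: everything else goes into a single linker group.
        args += ["-Wl,--start-group", *normal, "-Wl,--end-group"]
    return args


print(dedup_link_args([("liba.a", False), ("libb.a", False), ("liba.a", False)]))
```

Inside a group the linker rescans the archives until no new symbols get resolved, so the order and repetition of libraries stops mattering, at the cost of some link time.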

What about other platforms?

The code is here, feel free to try it out yourself.

Friday, 17 August 2018

The Internet of 200 Kilogram Things: Challenges of Managing a Fleet of Slot Machines

In a previous post we talked about Finland's Linux powered slot machines. It was mentioned that there are about 20 000 of these machines in total. It turns out that managing and maintaining all those machines is not as easy as it may first appear.

In the modern time of The Cloud, 20 thousand machines might not seem like much. Basic cloud management software such as Kubernetes scales to hundreds of thousands, even millions of machines without even breaking a sweat. Having "only" 20 thousand machines may seem like a small and simple thing that can be managed by one intern in their spare time. In reality things get difficult as there are many unique challenges to managing slot machines as opposed to regular servers.

The data center

Large scale computer fleets are housed in data centers. Slot machines are not. They are scattered across Finland in supermarkets and gas stations. This means that any management solution based on central control is useless. Another way of looking at this is that the data center housing the machines is around 337 thousand square kilometers in size. It is left as an exercise to the reader to calculate the average distance between two nearest machines assuming they are distributed evenly over the surface area.
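For readers who would rather not do the exercise by hand, here is one rough way to approach it in Python. It assumes the roughly 20 000 machines mentioned earlier sit on a perfectly regular square grid, which is obviously not how supermarkets and gas stations are placed, so treat the number as an order of magnitude only.

```python
import math

AREA_KM2 = 337_000  # size of the "data center", from the text
MACHINES = 20_000   # approximate fleet size, from the text

# On a square grid every machine "owns" an equal square of land,
# and the distance to its nearest neighbour is the side of that square.
area_per_machine = AREA_KM2 / MACHINES
grid_spacing_km = math.sqrt(area_per_machine)

print(f"{area_per_machine:.2f} km^2 per machine, about {grid_spacing_km:.1f} km apart")
```

That works out to roughly 17 square kilometers per machine and about 4 kilometers between neighbours.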

Every machine is needed

The mantra of current data center design is that every machine must be expendable. That is, any computer may break down at any time, but the end user does not notice this because all operations are hidden behind a reliable layer. Workloads can be transferred from one machine to another either in the same rack, or possibly even to the other side of the world without anyone noticing.

Slot machines have the exact opposite requirements. Every machine must keep working all the time. If any machine breaks down, money is lost. Transferring the work load from a broken machine in the countryside to Frankfurt or Washington is not feasible, because it would require also moving the players to the new location. This is not very profitable, as atoms are much more expensive and slow to transfer between continents than electrons.

The reliability requirements are further increased by the distributed locations of the machines. It is not uncommon that in the sparsely populated areas the closest maintenance person may be more than 400 km away.

The Internet connection

Data centers nowadays have 10 Gb Ethernet connections or something even faster. In contrast it is the responsibility of the machine operator to provide a net connection to a slot machine. This means that the connections vary quite a lot. At the lowest end are locations that get poor quality 3G reception some of the time.

Remote management is also an issue. Some machines are housed in corporate networks behind ten different firewalls all administered by different IT provider organisations, some of which may be outsourced. Others are slightly less well protected but flakier. Being able to directly access any machine is the norm in data centers. Devices housed in random networks do not have this luxury.

The money problem

Slot machines deal with physical money. That makes them a prime target for criminals. The devices also have no physical security: you must be able to physically touch them to be able to play them. This is a challenging and unusual combination from a security point of view. Most companies would not leave their production servers outside for people to fiddle around with, but for these devices it is a mandatory requirement.

The beer attack

Many machines are located in bars. That means that they need to withstand the forces of angry intoxicated players. And, as we all know, drunk people are surprisingly inventive. A few years ago some people noticed that the machines have ventilation holes. They then noticed that pouring a pint of beer in those holes would cause a short circuit inside the machine causing all the coins to be spit out.

This issue was fixed fairly quickly, because you really don't want to be in a situation where drunk people would have financial motivation to pour liquids on high voltage equipment in crowded rooms. This is not a problem one has to face in most data centers.

Update challenges

There are roughly two different ways of updating an operating system install: image based updates and package based updates. Neither of these works particularly well in slot machine usage. Games are big, so downloading full images is not feasible, especially for machines that have poor network connections. Package based updates have the major downside that they are not atomic. In desktop and server usage this is not really an issue because you can apply updates at a known good time. For remote devices this does not work because they can be powered off at any time without any warning. If this happens during an upgrade you have a broken machine requiring a physical visit from a maintenance person. As mentioned above this is slow and expensive.

Sunday, 12 August 2018

Implementing a distributed compilation cluster

Slow compilation times are a perennial problem. There have been many attempts at caching and distributing the work, such as distcc and Icecream. The main bottleneck in both of these is that some work must be done on the "user's desktop" machine which is then transferred over the network. Depending on the implementation this may include things such as fully preprocessing the source file and then sending the result over the net (so it can be compiled on the worker machine without needing any system headers).

This means that the user machine can easily become the bottleneck. In order to remove this slowdown all the work would need to be done on worker machines. Thus the architecture we need would be something like this:


In this configuration the entire source tree is on a shared network drive (such as NFS). It is mounted in the same path on all build workers as well as the user's desktop machine. All workers and the desktop machine also must have an identical setup, that is, same compilers and installed dependencies. This is fairly easy to achieve with Docker or any similar container technology.

The main change needed to distribute the work is to create a compiler wrapper script, much like distcc or icecc, that sends the compilation request to the work distributor. It consists only of a command line to execute and the path to run it in. The distributor looks up the machine with the smallest load, sends the command, waits for the result and then returns the result to the developer machine.
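As a rough illustration of how little data moves around, here is a Python sketch of the request and the worker selection. It is not the code from the repo: the JSON wire format, the function names and the "number of running jobs" load metric are all assumptions made for this example.

```python
import json
import os


def make_request(argv, cwd=None):
    """Wrapper side: encode the command line and the directory to run it in."""
    request = {"cmd": list(argv), "cwd": cwd or os.getcwd()}
    return json.dumps(request).encode("utf-8") + b"\n"


def parse_request(line):
    """Distributor or worker side: decode a request produced by make_request."""
    request = json.loads(line.decode("utf-8"))
    return request["cmd"], request["cwd"]


def pick_worker(loads):
    """loads: mapping of worker address -> number of currently running jobs."""
    return min(loads, key=loads.get)


wire = make_request(["g++", "-c", "foo.cpp", "-o", "foo.o"], "/shared/src/build")
print(parse_request(wire))
print(pick_worker({"worker1": 3, "worker2": 1}))
```

A real wrapper would additionally send this over a socket to the distributor and relay the exit code, stdout and stderr back to the build system.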

Note that the input or output files do not need to be transferred between the developer machine and the workers. That is taken care of automatically by NFS. This includes any changes made by the user on their local checkout which are not in revision control. The code that implements all of this (in an extremely simple, quick, dirty and unreliable way) can be found in this GitHub repo. The implementation is under 300 lines of Python.

Experimental results

Since I don't have a data center to spare I tested this on a single 8 core i7 computer. The "native OS" ran the NFS server and work distributor. The workers were two cloned Virtualbox images each having 2 cores. For testing I compiled LLVM, which is a fairly big C++ code base.

Using the wrapper is straightforward and consists of setting up the original build directory with this:

FORCE_INLINE=1 CXX='/path/to/wrapper workserver_address g++' cmake <options>

Force inline is needed so configuration tests are run on the local machine. They write to /tmp, which is not shared and the executables might be run on a different machine than where they are compiled leading to failures. This could also be solved by having a shared temporary folder but that would increase the complexity of this simple experiment.

Compiling the source just over NFS in a single machine using 2 cores took about an hour. Compiling it with two workers took about 47 minutes. This is not particularly close to the optimal time of 30 minutes so there is a fair bit of overhead in the implementation. Most of this is probably due to NFS and the fact that absolutely everything ran on the same physical machine. NFS also had coherency problems. Sometimes some process invocations could not see files created by their dependency tasks. The most common case was linker invocations, which were missing one or more object files. Restarting the build always made it pass. I tried to add sync commands as necessary but could not make it 100% reliable.

Miscellaneous things of note

In this test only compilation was parallelised. However this same approach works with every executable that is standalone, that is, it does not need to talk to any other ongoing process via IPC. Every build system that supports setting the compiler manually can be used with this scheme. It also works for parallelising tests for build systems that support invoking tests with an arbitrary runner. For example, in Meson you could do this:

meson test --wrapper='/path/to/wrapper workserver_address'

The system also works (in theory) identically on other operating systems such as macOS and Windows. Setting up the environment is even easier because most projects do not use "system dependencies" on those platforms, only the compiler. Thus on Windows you could mount a smb drive with the code on, say, D:\code on all machines and, assuming they have the same version of Visual Studio, it should just work (not actually tested).

Adding caching support is fairly easy. All machines need to have a common directory mounted, point CCACHE_DIR to that and set the wrapper command on the desktop machine to:

CXX='/path/to/wrapper workserver_address ccache g++'

Thursday, 26 July 2018

Building native multiplatform GUI apps with Meson

A recent trend in multiplatform GUI applications is to create the core business logic of the application in something like C++, have it (optionally) expose a plain C interface and then create a gui on top of that using the native widget set of each supported platform. This means that the application uses GTK on Linux and other unixes, Cocoa on macOS, win32 API on Windows, Java widgets on Android and so on. This makes the application fully native on all platforms. The tradeoff is having to write the gui multiple times against not having to wrangle a multiplatform widget toolkit as your dependency.

Regardless of how you build your guis you need to have a build system that can build the application under all these different environments from a single code base. To this end I created a sample application called Platypus, which can be downloaded from this Github repo.

The code and compilation

The application itself is extremely simple. It consists of one shared library that returns a random number between 0 and 100 when called. It is implemented using C++ 11's random number generator functionality to ensure each platform has a toolchain new enough to handle it. The GUI applications built on top of it have a text label and a button. Pressing the button updates the text label with a new random number. There is also a test program that verifies that the library is working.

The GTK version is a plain C application. The gui is defined using a Glade interface definition file rather than building it by hand.

The macOS version has a GUI written in Objective-C. The gui is defined as a XIB file created with Xcode. It is built into a standard app bundle.

The Windows application is written in C++ (though it does not really use any C++ features) and has a gui laid out by hand.

All these guis have full platform integration with icons, an Info.plist, .desktop files and so on.

The installers

The GTK version can be built as a Flatpak in the usual way. The build manifest can be found in the repository's root.

The macOS version builds a standard .dmg installer that can be directly shipped to end users.

The Windows version builds an .MSI installer providing complete install/uninstall integration.

How complicated is it?

The entire build definition consists of 107 lines of Meson.

Screenshots

Here is the plain GTK version running as a Flatpak application on Kubuntu. Window icons and desktop integration work as you would expect.


Here is the macOS version showing the drive image, the installer window and the application running with proper platform integration.


Finally here is the Windows application showing the installed path location under Program Files, the application itself and the automatic integration to Windows' application uninstaller system.


Future plans

It would be cool to add an Android application as well as an iOS application written in Swift in the code base. Patches are welcome as always.

Wednesday, 25 July 2018

Why Git is terrible in four pictures

I was asked to write a blog post on why I dislike Git in general and its UI in particular. Here is a representative sample in four images.

Recently a pull request was filed that looked like this:


As you can see there is an extra merge commit. As is customary we wanted to get rid of it to get a clean rebase based merge history. To do that you'd first get a checkout of the code and look at the log, which looks like the following.


So far, so good. Now let's do a rebase --interactive. It looks like this:


Suddenly Git has chosen to silently remove the merge commit from this list. Why? I have no idea. The commit had changes in it, so it was not pruned because it was empty. If you then exit the editor without any changes (which usually means "do not change anything") then the commit is deleted and any changes that were in it are gone:


If your later commits built on those changes, you get yummy merge conflicts for something that is conceptually a no-op.

This is the essence of working with Git. Most of the time it works sort of ok, but every now and then it will, without any warning or reason, completely screw you over, destroy your data and leave you stranded, forced to debug your way out of the resulting mess without any help.

"Of course it breaks, you should have used --do-not-do-the-idiotic-wrong-thing-which-for-some-reason-is-the-default command line option, everyone knows that, duh!"

A common kneejerk response to these kinds of problems is that it is somehow the user's own fault and that they should have memorized every quirk in the software in order to use it correctly (or at all). I'm certain some of you out there on the Internet had already started writing a strongly worded message to let me know that. Don't bother.

Whenever you have a piece of software that silently destroys user data, the fault always, always, ALWAYS lies with the program. Even if "it only happens rarely". Even if you think "it's the user's fault". Even if you personally know how the problem could have been avoided. The flaw is ABSOLUTELY ALWAYS in the software. Never in users. Ever.

Any attempt at shifting the cause to the user, for whatever reason, is victim blaming. Don't do it.

Monday, 16 July 2018

How expensive is globbing for sources in large projects

A common holy war in build systems is whether you should explicitly list all sources that make up a target or use a globbing pattern. There are both technical and non-technical arguments on both sides. The latter mostly deal with reliability and flexibility vs convenience. In this post we are going to ignore them completely and instead focus on the technical parts, specifically the overhead of globbing. The measurement script used can be downloaded from this repo.

In this test we used the full checkout of Chromium source code. The tests were run under Windows, since it is noticeably slower than Linux on both file operations and process invocations. The task simulation consists of roughly three parts:

  1. Scan the source tree for all directories that contain sources
  2. Generate glob patterns for detected directories (corresponding roughly to "one target for all sources in one directory")
  3. Run the globs
This ignores a bunch of steps, such as serialising the glob results to files and calculating the delta between two glob sets. These are probably fairly fast compared to file access operations, though.
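The three steps can be sketched in Python roughly as follows. This is not the measurement script from the repo, just a minimal reimplementation of the idea; the list of source suffixes and the function names are assumptions.

```python
import glob
import os
import time

SOURCE_SUFFIXES = (".c", ".cc", ".cpp", ".h")  # assumed set of source extensions


def find_source_dirs(root):
    """Step 1: directories that contain at least one source file."""
    return [d for d, _, files in os.walk(root)
            if any(f.endswith(SOURCE_SUFFIXES) for f in files)]


def make_globs(dirs):
    """Step 2: one glob pattern per directory and suffix."""
    return [os.path.join(glob.escape(d), "*" + suffix)
            for d in dirs for suffix in SOURCE_SUFFIXES]


def run_globs(patterns):
    """Step 3: evaluate every pattern and measure how long it takes."""
    start = time.perf_counter()
    matches = [m for p in patterns for m in glob.glob(p)]
    return matches, time.perf_counter() - start
```

Calling find_source_dirs on a checkout, feeding the result to make_globs and then timing run_globs gives numbers comparable in spirit to the ones below, though the absolute values depend heavily on the OS and the state of the file system cache.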

Scanning the source tree and generating the globs

There is no direct correlation between this step and a regular build system. It is mostly interesting as a comparison of file operations between a hot and a cold cache. Running the scan on a cold cache takes 2 minutes, but on a hot cache about 6 seconds.

Since this step is always run first, the following tests are all operating with a hot cache.

The actual globbing

Running all globs on the Chromium source tree takes between 2 and 6 seconds. This is the absolute lowest time that can be obtained for a no-op build without daemons because all globs must be re-evaluated every time.

The rule of thumb for UI design is that everything under one second is perceived as instantaneous. This means that for these sizes globbing causes a noticeable delay. Whether this is seen as insignificant or aggravating depends on each user.

Extra bonus: C++ modules

Since we have the measurement script, let's use it for something more interesting. Modules are an upcoming C++ feature to reduce build times and a ton of other coolness depending on who you ask. The current specification works by having a kind of "module export declaration" at the beginning of source files. The idea is that you first compile those to generate a sort of a module declaration file and then you can start the actual compilation that uses said files.

If you thought "waitaminute, that sounds exactly like how FORTRAN is compiled", you are correct. Because of this it has the same problem that you can't compile source files in an arbitrary order, but instead you must first somehow scan them to find out the interdependencies between source (not header) files. In practice what this means is that instead of single-phase compilation all files must be processed twice. All scan operations must be done before any compilation jobs can start because otherwise you might start to compile a file before its dependencies are fully processed.

The scanning can be done in one of two ways. Either the build system scans the sources meaning it needs to understand the syntax of source files or the compiler can be invoked in a special preprocessing mode. Note that build systems such as Ninja do not do any such operations by themselves but instead always invoke external processes to do their work.

Testing the performance impact of these two is straightforward. The first one can be done by reading the first ten lines of each source file and then throwing them away. Measuring this time gives a fairly good estimate of the file processing overhead. The second way can be measured by doing the exact same thing but also invoking the compiler with no-op command line arguments to get the process invocation overhead.
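Both measurements fit in a handful of lines of Python. The sketch below is an approximation of the idea rather than the actual script: the function names are made up, and noop_command is whatever cheap compiler invocation you choose to use (for example asking the compiler for its version), which is an assumption on my part.

```python
import itertools
import subprocess
import time


def read_head(path, lines=10):
    """Read the first few lines of a file and return them."""
    with open(path, "rb") as f:
        return list(itertools.islice(f, lines))


def time_scan(paths, noop_command=None):
    """Time reading the head of every file, optionally also spawning
    one no-op process per file to simulate invoking the compiler."""
    start = time.perf_counter()
    for path in paths:
        read_head(path)
        if noop_command is not None:
            subprocess.run(noop_command, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start
```

Running time_scan once without and once with a noop_command over the same file list separates the file access cost from the process spawning cost.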

Scanning the files directly takes roughly 120 seconds. For an 8 core machine this means a 15 second delay (at minimum) before any compilation tasks can begin. This is not great but for a full build it should be tolerable.

When spawning a compiler process the same operation takes 69 minutes. This is intolerably slow and would require an order of magnitude speedup in compilation times to be worthwhile. Unlike regular compilations, dependency scanning can not be sped up with unity builds because the specification requires that the module declaration must be at the very beginning of source files (and presumably there can not be more than one in a single TU).

Wednesday, 13 June 2018

Easy MSI installer creator

Shipping programs on Windows platforms becomes a lot simpler (especially in corporate environments) if you can create an MSI installer. The only Free software solution for that is the WiX installer toolkit. The fairly big downside to this is that it is very much tied to how Visual Studio does things with GUIDs and all that. The installer's contents and behavior are defined with an XML file whose format is both verbose and confusing.

Most Unix developers, once faced with this, will almost immediately blurt out something like "Why can't I just do DESTDIR=c:\some\path ninja install and have it make an installer out of the result?" So I created a script that does exactly that.

The basic usage is simple. First you do a staged install into some directory and create a JSON file describing the installation that would look like this:

{
    "update_guid": "YOUR-GUID-HERE",
    "version": "1.0.0",
    "product_name": "Product name here",
    "manufacturer": "Your organization's name here",
    "name": "Name of product here",
    "name_base": "myprog",
    "comments": "A comment describing the program",
    "installdir": "MyProg",
    "license_file": "License.rtf",
    "parts": [
        {"id": "MainProgram",
         "title": "Program name",
         "description": "The MyProg program",
         "absent": "disallow",
         "staged_dir": "staging"
        }
    ]
}

Running the script would then create a standalone MSI installer with the contents of the staging directory.
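If you do not want to write the description by hand, something like the following Python sketch can generate it. The keys are the ones from the example above; the helper function, its parameters and the placeholder strings are made up for illustration. It generates a fresh GUID when none is given; whether that GUID has to stay the same between releases depends on how the script uses it, so check the repository before regenerating it.

```python
import json
import uuid


def make_installer_description(name_base, version, staged_dir, update_guid=None):
    """Build a dictionary with the same keys as the example JSON file."""
    return {
        "update_guid": update_guid or str(uuid.uuid4()).upper(),
        "version": version,
        "product_name": "Product name here",
        "manufacturer": "Your organization's name here",
        "name": "Name of product here",
        "name_base": name_base,
        "comments": "A comment describing the program",
        "installdir": "MyProg",
        "license_file": "License.rtf",
        "parts": [
            {"id": "MainProgram",
             "title": "Program name",
             "description": "The MyProg program",
             "absent": "disallow",
             "staged_dir": staged_dir},
        ],
    }


description = make_installer_description("myprog", "1.0.0", "staging")
print(json.dumps(description, indent=4))
```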

Multiple components in one installer

Some programs ship with multiple parts, and the user can choose which of them to install. This is supported by the script. First you must split the files into multiple staging directories, one per component, and then add entries to the parts array. See the repository for an example.

Monday, 23 April 2018

Dependencies with code generators got a lot smoother with Meson 0.46.0

Most dependencies are libraries. Almost all build systems can find dependency libraries from the system using e.g. pkg-config. Some can build dependencies from source. Some, like Meson, can do both and toggle between them transparently. Library dependencies might not be a fully solved problem but we as a community have a fairly good grasp on how to make them work.

However there are some dependencies where this is not enough. A fairly common case is to have a dependency that has some sort of a source code generator. Examples of this include Protocol Buffers, Qt's moc and glib-mkenums and other tools that come with Glib. The common solution is to look up these binaries from PATH. This works for dependencies that are already installed on the system but fails quite badly when the dependencies are built as subprojects. Bootstrapping is also a bit trickier because you may need to write custom code in the project that provides the executables.

Version 0.46.0 which shipped yesterday has new functionality that makes this use case noticeably simpler. In Meson you find the code generator scripts to run with the find_program command like this:

mkenums_exe = find_program('glib-mkenums')

This will find the executable from the system. However if you have built Glib as a subproject, then it can issue the following statements (this is not in Glib master yet AFAIK so it does not work, this is more of an illustrative example):

internal_mkenums_exe = <command to generate the mkenum script>
meson.override_find_program('glib-mkenums', internal_mkenums_exe)

After this, issuing find_program('glib-mkenums') no longer goes to the system, but instead returns the internal program. Meson's internal helper modules have also been updated to always find the programs they use with find_program. This means that all projects using Glib functionality can be built without needing a system wide install of Glib. Even more importantly this requires zero changes in existing projects. It will just work out of the box. You can even use Glib helper code when building Glib itself.

This is especially convenient when you need a newer version of any dependency than your distro provides and especially on platforms such as Windows where "distro dependencies" do not exist.

As an example of what is possible, Nirbheek has managed to bootstrap GStreamer on Windows using nothing but Visual Studio, Python 3, Ninja and Meson. The main limitation currently is that the overriding executable may not be a build target (i.e. something you build from source with a compiler) because the result of find_program may be used during the configuration phase, before any source code compilation has taken place. We hope to remove this limitation in a future release.

Sunday 8 April 2018

Cookie purging the simple way

Getting rid of cookies (especially tracking and ad cookies) consistently is a good thing. However it turns out to be a bit tricky because you don't want to get rid of session cookies for sites you care about. Basically what you want to achieve is this:

  1. Store all cookies as normal
  2. Maintain a whitelist of servers that are allowed to store persistent cookies (usually for sites such as Github, Reddit, Twitter and the like)
  3. At regular intervals (preferably every time the browser is closed), delete all cookies not whitelisted.
There are browser extensions to do this but they are often bizarrely complex and even those that aren't are inconvenient to use as they require installing plugins, clicking through menus and so on. Firefox should have builtin functionality to do this also, but I read through instructions online on how to do it and could not understand how you should set it up to get it to work.

Thus as an experiment I wrote a Python script to do this. It is available in this Github repo. Using it is simple:

  1. Write a whitelist file consisting of one hostname per line. (all subdomains of the specified host are also permitted)
  2. Shut down Firefox.
  3. Run the script.
  4. Start Firefox.
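The core of such a purge can be sketched in a few lines. This is not the script from the repo, only an illustration under stated assumptions: that Firefox keeps its cookies in an SQLite file (cookies.sqlite in the profile directory) with a moz_cookies table that has a host column, and that Firefox is not running while the file is modified. The function names are hypothetical.

```python
import sqlite3


def read_whitelist(path):
    """Read one hostname per line, ignoring blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def is_whitelisted(host, whitelist):
    """True if host is a whitelisted name or a subdomain of one."""
    # Cookie hosts are often stored with a leading dot, e.g. ".github.com".
    host = host.lstrip(".").lower()
    return any(host == w or host.endswith("." + w) for w in whitelist)


def purge_cookies(db_path, whitelist):
    """Delete all cookies whose host is not whitelisted; return how many."""
    # ASSUMPTION: table "moz_cookies" with a "host" column.
    conn = sqlite3.connect(db_path)
    try:
        hosts = [r[0] for r in conn.execute("SELECT DISTINCT host FROM moz_cookies")]
        removed = 0
        for host in hosts:
            if not is_whitelisted(host, whitelist):
                cur = conn.execute("DELETE FROM moz_cookies WHERE host = ?", (host,))
                removed += cur.rowcount
        conn.commit()
        return removed
    finally:
        conn.close()
```

The suffix check requires a dot before the whitelisted name, so whitelisting github.com permits gist.github.com but not notgithub.com.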

Wednesday 4 April 2018

Comparing Meson with Bazel on a Raspberry Pi 3

In this experiment we compile Google's Abseil C++ libraries with Bazel and also with Meson as a simple comparison of how they behave.

Apple and orange warning!

Please do not use this text to exclaim that one of these build systems is better/more performant/etc than the other. That's not really what this is for. The two build systems build the code completely differently to different artifacts and with different targets. Consider this more of a rough outline.

The Meson conversion

The code was converted with a simple script and then manually fixing some dependency declarations to make everything work. Due to complicated reasons there's no Git repo. Instead you can download the whole shebang as a zip file from this location. It's not against current trunk, but instead a random commit from some time ago when I started.

The original Bazel build does a bunch of complicated things with shared libraries and the like. The Meson one simply builds a static library for each of the Abseil modules. This is not particularly efficient but I just wanted to get something out without spending days replicating the build setup.

The build setup is complete enough to build all unit tests apart from ones that seem to require magic compiler flags, because they give compilation errors about missing timespec definitions.

Memory usage

When doing the actual compilation, Meson is not resident in memory. Only Ninja is, and it takes 2-5 MB of memory in total.

The Bazel master process takes roughly 90 MB when compiling. That is almost 10% of total system memory.

CPU utilization

Based on top/htop eyeballing, Ninja keeps all cores pegged almost all of the time. The time command reported CPU usage of 328%. It was necessary to manually specify -j 4 to Ninja (the default value is 6) because otherwise the system would hard freeze under load, most likely due to memory running out.

Weirdly Bazel had a really hard time keeping cores running. It was common to have 1-3 cores idle (that is, not even waiting for IO) during the build. It is not known what causes this. Perhaps a lot of time is spent doing the file copies and symlinks that Bazel needs for its hermetic builds. But even then, maximal usage of resources is one of Bazel's claimed strong points but in this particular case that does not seem to be happening. It is possible that the behaviour has been tuned to data centers with tens of cores and fast SSDs and because of that does not scale down to ARM processors with an SD card for storage.

Total compile time

Meson used 6 minutes whereas Bazel used 17 minutes. But note the text above! Do not use this as any sort of "real" perf measurement because the setups were different. That being said, if you consider that Ninja gets up to 3x better CPU utilization, the numbers seem to be in the same rough neighborhood as far as total CPU usage is concerned.
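The "rough neighborhood" claim can be checked with back-of-the-envelope arithmetic. In the sketch below the wall times and the 328% figure come from the measurements above. Bazel's CPU utilization was not measured, so the value used for it is an assumption obtained by taking the "3x" estimate literally.

```python
def core_minutes(wall_minutes, cpu_percent):
    """Total CPU time consumed, in minutes of one fully busy core."""
    return wall_minutes * cpu_percent / 100.0


# Measured: 6 minutes of wall time at 328% CPU usage.
ninja_total = core_minutes(6, 328)

# ASSUMPTION: Bazel's utilization was not measured; one third of
# Ninja's figure is used here, following the "up to 3x" estimate.
bazel_assumed_percent = 328 / 3
bazel_total = core_minutes(17, bazel_assumed_percent)

print(f"Ninja: {ninja_total:.1f} core-minutes")
print(f"Bazel: {bazel_total:.1f} core-minutes (assumed utilization)")
```

Under that assumption both builds land a little under 20 core-minutes, which is all the comparison is meant to show.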

Ninja reports doing 190 build steps whereas Bazel reports a number on the order of 400-500. Many of these seem to deal with file copying and the like which the Meson setup does not do at all. Effort was not spent on examining what these steps do and how (or if) they could be replicated in Meson.

Which one is better/which one should I use/which subreddit should I post this to?

That's not what this post was about. Do your own tests and draw your own conclusions.

Thursday 29 March 2018

And now for something completely different

If anyone has been wondering why merge requests have not been reviewed fast enough recently, here is one of the reasons.

Wednesday 7 March 2018

Stop trying to guess display language based on keyboard layout

A very common setup among non-English speaking computer power users is to have display language in English but have a country specific keyboard layout. For example I'm using the Finnish keyboard because I need to be able to easily type our special letters ö and ä. If your computer comes from someone else (such as your employer) there is not even the possibility to have a non-Finnish keyboard layout.

All of this has worked for tens of years flawlessly. However recently I have noticed that many programs on multiple platforms seem to alter their display language based on keyboard layout, which is just plain wrong. Display language should be chosen based on the display language choice and nothing else.

I first noticed this in the output of ls, which one would imagine to have reached stability ages ago.


Here we see that ls has chosen to print months in Finnish. Why? I have no idea. This was weird on its own, but then it spread to other operating systems as well. For no reason at all the existing Gimp install switched its display language to Finnish.


Let me reiterate: no setting was changed and the version of Gimp was exactly the same. One day it just decided to change its language to Finnish.

Then the issue spread to Windows.


VLC on Windows has chosen on my behalf to show its menus in Finnish in a completely English Windows 7 install. The only things it could use for language detection are geolocation and keyboard settings and both of these are terrible ideas. The OS has a language. It is very clearly specified. All other applications obey it, VLC should too.

The real kicker here is that Gimp on Windows displays English text correctly, as does VLC on macOS.

The newest case is the new Gnome-ified Ubuntu, whose lock screen stubbornly displays dates in the wrong language. It also does not conjugate the words correctly and has that weird American month/day date ordering which is wrong for Finnish.


What is causing this?

I don't know. But whoever is behind this: please stop doing that.

Sunday 4 March 2018

Compiling Cargo crates natively with Meson

Recently we have been having discussions about how Rust and Meson should work together, especially for mixed language projects. One thing which multiple people have told me (over a time span of several years, actually) is that Rust is Special in that everyone uses crates for everything. Thus there is no point in having any sort of Rust support, the only true way is to blindly call Cargo and have it do everything exactly the way it wants to.

This seems like a reasonable recommendation so I did what every reasonable person would do and accepted this as is.

David Addison wearing cardboard x-ray goggles with caption "This is me being completely reasonable".

But then curiosity takes hold of you and you start to wonder. Is that really the case?

Converting Cargo manifests to Meson projects

The basic setup of a Cargo project is fairly straightforward. The file format is mostly declarative and most crates are simple libraries that have a few source files and a bunch of other crates they link against. Meson has the same primitives, so could they be automatically converted?

It turns out that for simple examples you can. Here is a sample repo that downloads the Itoa crate from github, converts the Cargo build definition into a Meson project and builds it as a subproject (with Meson, not Cargo) of the main project that uses it. This prototype turned out to require 71 lines of Python.

What about dependencies other than itoa?

The script currently only works for itoa, because crates.io does not seem to provide a web API for queries and the entire site is created with JavaScript so you can't even do web scraping easily. To get this working properly the only thing you'd need is a function to get from (crate name, version) to the git repo.

What about dependencies of dependencies?

They can be easily grabbed from the Cargo.toml file. Combined with the above they could be downloaded and converted in the same fashion.

What doesn't work?

A lot. Unit tests are not built nor run, but they could be added fairly easily. This would require adding compile options so the actual source could be built with the unittest flags. Meson already supports a similar feature for D, so adding it should not be a huge amount of work. Similarly docs are not generated.

What about build.rs?

Cargo provides a fairly simple project model and everything more complex should be handled by writing a build.rs program that does everything else necessary. This suffers from the same disadvantages as every Turing complete build system ever has, and these scripts are not in general possible to convert automatically.

However based on documentation the common case seems to be to call into external build tools to build dependency libraries in other languages. In a build system that builds both parts at the same time it would be possible to create a better UX for this (but again would obviously not be something you can convert automatically).

Could this actually work in practice with real world projects?

It might. It might not. Ignoring the previous segment no immediate showstopper has presented itself thus far. It might in the future. But you never know until you try.

Wednesday 28 February 2018

On the unoptimalities of language specific build systems

A fairly big recent trend has been the emergence of new programming languages that are meant to be compiled into machine code. The silent (and sometimes not so silent) goal of these languages has been to replace C and C++ as the dominant systems programming language.

All of these languages come with their own build system and dependency management optimised for that particular language. This makes sense as having a good developer experience is important and not having 20-30 years of legacy to carry with you means you can design and develop slick systems relatively easily. But, as always, there is a downside. Perhaps the main issue comes up pretty quickly when trying to combine said code with projects in other languages.

A common approach is for the programming language in question to bundle up all its dependencies as source in a big clump. Then the advocates will say that "it's simple, just call our build system from yours and it gets built". This seems simple but it uses the weaseliest of all weasel words: just. Whenever someone tells you to "just" do something, what they are almost always doing is trying to trivialise away the hardest part of the entire operation. So it is here as well.

When could it work?

There is one case where this approach works without problems. That is when the dependency builds into a single library with a C interface and it also ships the header and a pkg-config file to use it. This case is indistinguishable from a plain C library so it will work exactly the same. The dependency can be provided as a system package or built as a dependency in a Flatpak manifest or in any other similar way.

Unfortunately this system breaks down the second you want to do anything else. The most common requirement is to build all dependencies from source in a single build step. This is necessary on any platform that does not have a concept of "system" package manager. Many people also want to do this on Linux systems to, for example, build their project's trunk against their dependencies' trunks. This is where things fall down.

The myth of the build dir

Most people probably haven't thought about the build directory of their builds. The most common conception is that the build system just (there's that word again) compiles source code into object files and then targets and that the installation step merely copies the files out to the staging directory. This is not true in the slightest.

Build systems need to do a whole lot of stuff to make things workable directly from the build tree. Every build system does it slightly (and sometimes massively) differently. More importantly the way each build system does it is not stable. They are allowed to, and will, change the way the build tree is laid out at any time. Nothing inside the build tree is stable, not file formats, not directory layouts, nothing.

The problem with building source code with two different build systems in a single build is that eventually they need to work together. Libraries need to be linked. Sources need to be generated. Executables need to be run. That means joining two different completely unstable elements together. The simplest problem in this space is about file layouts. Every build system expects a certain layout for the files it manages. This is usually very different from other build systems. Thus in order to work, there would need to be a way for every build system to be told to adapt to a different system's file layout when run as its subtask.

This is a challenging thing to request, because it takes a lot of menial work that build systems have traditionally (ever, actually) been unwilling to do. Guessing the subtask's layout and hoping that it does not change might work for any amount of time and then break for the slightest of reasons. The problems only get harder from there.

N^2 manual work algorithms are awesome!

Even if this would work (and it does not) the next problem comes from scaling up. You can only "just call" from one build system to another if someone has taken the time to make one understand the other. This is simple for two build systems: you need to write two integrations, one in each direction. But suppose we live in a world where many of the common C libraries in use today have been replaced by implementations in other languages. If you were doing cross platform mobile development then you could have C, C++, Java, D, Rust, Go and Swift in the same project.

Seven languages means seven different build systems and possibly more since C and C++ commonly have more than one dominant build system. This means reading and understanding seven different build system syntaxes and mental models. If you want to combine those freely it means writing 7 x 7 = 49 different build system integrations which must, lest it be forgotten, combine the unstable innards of all of these. And then it gets worse.
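The quadratic growth is easy to verify by enumeration. The sketch below assumes exactly one build system per language and counts one integration per ordered pair of distinct build systems, as in the two build system case above. That gives n(n-1), which has the same growth rate as the N^2 figure, the difference being that N^2 also counts each system paired with itself.

```python
from itertools import permutations

LANGUAGES = ["C", "C++", "Java", "D", "Rust", "Go", "Swift"]


def integrations(build_systems):
    """All (caller, callee) pairs of distinct build systems."""
    return list(permutations(build_systems, 2))


def integration_counts(build_systems):
    """Return (ordered pairs of distinct systems, N^2 bound)."""
    n = len(build_systems)
    return len(integrations(build_systems)), n * n


if __name__ == "__main__":
    pairs, bound = integration_counts(LANGUAGES)
    print(f"{pairs} integrations between distinct systems, N^2 = {bound}")
```

For two build systems the function returns 2 integrations, matching the "one in each direction" case, and for seven it returns 42 against a bound of 49.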

Since every language has its own package manager and dependency downloader, you now have up to seven package managers in your project. Actually no, that is a lie.

The tangled web of lies and deceit dependencies

When talking about dependencies between projects in different languages, most people usually mean a dependency graph like this.

That is, there is one dependency of a single language and a second one of a different language that uses it. For this simple case most things are feasible. But let's see what happens when we add just one more project.

Here we have project 1 using language 1. It has a dependency to project 2 in language 2. However project 2 has an internal dependency on project 3 which is also written in language 1. The question now becomes: how should this be built?

Since languages 1 and 2 use their own build tool and package manager, the two edgemost projects don't know that they are being built as part of the same project. Language 2 completely hides its dependency, as it should. The two projects need to work independently. This means that each one of them must determine its dependencies in isolation. If they download their dependencies during configuration time then for each build setup you are accessing the dependency provider twice. Doing more dependency resolutions than you have languages in your project seems suboptimal.

The other approach to this is usually called vendoring. In this each project in a language is only used as a tarball and it embeds all its own dependencies as source code. This seems like a working solution but it's not really. Many modern languages go the NPM route where it is considered good practice to have many small dependencies. It is not uncommon for medium or even small projects to have 50+ dependencies. This leads to problems such as these:


Here project 1 depends on two different projects that are both implemented in language 1. Just like above these two projects don't know of each other because their dependency chain goes via language 2 that hides it. Both of these projects have their internal dependencies embedded so they can be built from scratch without problems.

The problem here is that due to basic popularity and probability theory, the embedded dependency sets of these two projects overlap heavily. The shared dependencies might have the same versions or they might differ. If they both end up in the toplevel executable you get, depending on your toolchain and the phase of the moon, either a working binary or the nastiest of linker bugs to fix.

Even if this yields a working program there is a big downside: compilation time takes up to twice as long because you have to compile the same dependencies twice in different but isolated parts of the build tree. As a rough approximation this means that adding a dependency to a dependency graph like this goes from being O(1) more work to being O(N) more work because dependency graphs can not be deduplicated if there is a dependency of a different language between the two. It is left as an exercise to the reader to visualize what this would look like on a huge project such as Chromium.
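The cost of the missing deduplication can be illustrated with a toy model. Everything in the sketch is hypothetical: the dependency names are made up and every dependency is assumed to cost the same to compile.

```python
def compiled_units(vendored_trees, deduplicate):
    """Count how many dependencies get compiled across all trees."""
    if deduplicate:
        # One shared dependency graph: each dependency is built once.
        return len(set().union(*vendored_trees))
    # Isolated vendored trees: every tree builds its own copy.
    return sum(len(tree) for tree in vendored_trees)


# Hypothetical vendored dependency sets of two projects in language 1.
project_a = {"log", "regex", "json", "http"}
project_b = {"log", "regex", "json", "tls"}

if __name__ == "__main__":
    print("isolated:", compiled_units([project_a, project_b], deduplicate=False))
    print("shared:  ", compiled_units([project_a, project_b], deduplicate=True))
```

With these sets the isolated build compiles 8 units where a shared graph would compile 5, and the gap widens with every additional project that vendors the same popular dependencies.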

The simple solution

There is a simple solution to this problem and it is very popular among language zealots: reducing the number of languages to one by claiming that in the future everything will be written in their own favourite language. It does not matter what the growth rate of complexity is if it will only be evaluated for the value 1.

The reduction of programming languages to one is expected to happen any minute now, immediately after Mr Godot brings us the news on Eastasia's surrender.

The real problem

All of this boils down to the fact that language specific build systems are two opposing things at the same time. They are both a very comfortable gilded cage and an extremely isolating silo. They fertilise and promote cooperation within their own group but make things a lot harder for cooperation between groups.

One of the things we learn from history is that people who have opposed cooperation have, ultimately, lost to those who have promoted it. Maybe we should heed the teachings of history and start working towards better, more encompassing dependency management.

Sunday 18 February 2018

Projects and features Meson could use help with

A question I was asked during my LCA2018 presentation was how people could help the Meson project. I could not come up with proper projects off the cuff, so here are a bunch of things that have come up since. Feel free to contact us via IRC, email or any other medium if you wish to contribute.

WrapDB wrangler

WrapDB provides a simple way to download source dependencies automatically. Basically it takes an upstream release tarball, adds Meson build files to it if needed and publishes the result on the web. The work consists mostly of reviewing and merging submissions from the community. Creating your own is also fine. This is a fairly lightweight task, only requiring actions every now and then (submissions come less than once a week, typically).

CI fixer upper

For CI we use the free tiers of Travis and AppVeyor. This works fairly well but it is very slow because our testing matrix is huge. Running the full test suite through AppVeyor takes about an hour. This slows us down a fair bit and in addition both CI providers have a nasty habit of breaking down fairly often. We also don't want to use paid tiers because they get ridiculously expensive for our usage pattern (as in, a few months of paid for macOS would cost more than a brand new Mac Mini).

We don't have any good ideas on how to make this better. If you do, let us know.

Large scale regression tester

Meson is being used by a fairly large number of projects. This makes fixing bugs and refactoring code challenging because there is the possibility of regressions. It would be nice if we could do something similar to Rust developers and rebuild all or a large fraction of projects using Meson with the trunk version every now and then.

Xcode backend improvements

The Xcode backend is currently a bit crappy. The main reason for this is that the Xcode project file format is awful in many ways. The two main problems are that it is completely undocumented and that it is not really a file format as such, but more of a memory dump of Xcode's internal data structures. But if you are the sort of person who enjoys the challenge of battling against windmills, this might be for you.

Meson build file rewriter

Integration with IDEs and the like is important and we want to provide tools for operations such as "add source file X to target Y" so everyone does not have to write their own implementations. There is actually code for this in trunk but it is quite limited and has bitrotted a fair bit. Resurrecting and making the code actually work would be very welcome.

Introspect improvements

This one also aims to improve the IDE integration features of Meson. As an example you can only get information about build targets one by one. This means that getting the information from a project that has thousands of targets takes forever. We really need a batch exporter so IDEs can grab all necessary project information in one go. There are probably a bunch of other things to improve as well.

Could these be done as part of gsoc/outreachy/other?

Possibly. Meson is not really an "entity" in the gsoc sense but we could potentially get something accepted under the Gnome umbrella. However anyone is welcome to submit patches, obviously, and several of the topics listed above are not nicely self-contained projects that would fit in the gsoc mold at all.

Friday 16 February 2018

Automatically finding slow headers in C++ projects

A common problem in older C++ codebases is that sources compile slowly due to massive header includes. Headers include other headers, which include even more headers and then, somewhere in the guts of the system, someone includes a header that is very slow to parse. Now things are slow and nobody really knows why.

Trawling through the header soup manually is not feasible. Even if you were to manually inspect the headers, it is difficult to know which are the slow ones. Educated guesses can be made, such as anything having the word "boost" in its name is slow, but this only gets you so far. Fortunately it turns out that it is fairly straightforward to write a tool to find the slow ones automatically.

We need two things to be able to reliably measure the inclusion time breakdown of the headers of any source file.

  1. The transitive list of all header files it includes.
  2. The exact compiler flags used to compile the source.
The former can be obtained from a dependency file that the compiler can be told to generate during compilation (and which almost all modern build systems use by default). The latter can be obtained from the compilation database (compile_commands.json), which is also generated by most build tools today. The actual algorithm is simple: for each dependency header, create a dummy cpp file that just #includes that header, compile it and measure the time it took.
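The algorithm can be sketched as follows. This is an illustration, not the script from the repo. It assumes a GCC or Clang style toolchain on a Unix-like system, a Makefile-style dependency file without spaces in paths, and compile_commands.json entries that carry a "command" string. All function names are hypothetical.

```python
import os
import shlex
import subprocess
import tempfile
import time


def parse_depfile(text):
    """Return the prerequisites listed in a Makefile-style depfile."""
    # Join continuation lines, then drop the "target:" part.
    text = text.replace("\\\n", " ")
    _, _, deps = text.partition(":")
    return deps.split()


def compiler_args(entry):
    """Keep the compiler and its flags from one compile_commands.json
    entry, dropping the source file and output/depfile options."""
    takes_value = {"-o", "-MF", "-MQ", "-MT"}
    drop = {"-c", "-MD", "-MMD", "-MP", entry["file"]}
    kept, skip_next = [], False
    for arg in shlex.split(entry["command"]):
        if skip_next:
            skip_next = False
        elif arg in takes_value:
            skip_next = True
        elif arg not in drop:
            kept.append(arg)
    return kept


def time_header(header, args, cwd):
    """Compile a dummy file that only includes header; return seconds."""
    with tempfile.TemporaryDirectory() as tmp:
        dummy = os.path.join(tmp, "dummy.cpp")
        with open(dummy, "w", encoding="utf-8") as f:
            # Depfile paths are relative to the build directory.
            f.write(f'#include "{os.path.join(cwd, header)}"\n')
        cmd = args + ["-c", dummy, "-o", os.path.join(tmp, "dummy.o")]
        start = time.perf_counter()
        subprocess.run(cmd, cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return time.perf_counter() - start


def rank_headers(entry, depfile_text):
    """Return (seconds, header) pairs, slowest first."""
    args = compiler_args(entry)
    headers = [d for d in parse_depfile(depfile_text) if d != entry["file"]]
    timings = [(time_header(h, args, entry["directory"]), h) for h in headers]
    return sorted(timings, reverse=True)
```

Here entry is one element of the list loaded from compile_commands.json with json.load, and depfile_text is the contents of the matching dependency file. Printing each pair of the result as f"{seconds:.4f} {header}" gives output in the same shape as the listing below.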

I created a repo with the measurement script and a sample project to test it on. It has one source file and a few internal headers that include external headers. Here's the top part of its output:

0.5875 ../h1.h
0.5254 /usr/include/c++/7/regex
0.2779 /usr/include/c++/7/shared_mutex
0.2747 /usr/include/c++/7/condition_variable
0.2685 ../h2.h
0.2563 /usr/include/c++/7/locale
0.2445 /usr/include/c++/7/sstream
0.2337 ../h3.h
0.2330 /usr/include/c++/7/iostream
0.2329 /usr/include/c++/7/istream

Iostream has been traditionally considered to be big, bloated and slow to compile. However in this simple example we find that shared_mutex is even slower.

There are, of course, many caveats with this method. The main one is that this does not measure the code generation time, only parsing time. These two are usually highly correlated, though.

Wednesday 14 February 2018

Meson's dependency manager in action building GTK

One of the greatest things about creating software is seeing other people pick it up and run with it. Here is a great example of GTK's new development experience using Meson subprojects to automatically obtain dependencies.

It is easy to see how this makes it easier for newcomers to participate. There are no longer pages upon pages of instructions on how to set up a build environment and so on. All that is required is to clone one Git repo and start building. The build system will take care of all the rest.

The eventual goal is to be able to build the entire stack fully from scratch on any platform, even Windows with the Visual Studio compiler. Unfortunately there are still a few missing features but we'll get them added at some point.

Friday 9 February 2018

Looking inside a Linux powered slot machine

In my day job I work as a consultant. This means that I get to see all kinds of interesting things. One of them is this piece of hardware here:


This is a slot machine as operated by Veikkaus, which is the state run corporation operating all gambling services in Finland. There are roughly 20 000 slot machines in use in Finland currently. This is interesting on its own, but things get really fun when you look on the inside.


A fair fraction of the insides is taken by machinery that deals with coins. When a coin is inserted in the machine it first goes in the coin acceptor, which is marked with a green box in the image. It detects the type of the coin. Each denomination has its own exit chute. Bad coins are rejected from the machine while sorted coins get passed into coin hoppers (marked in red).

A coin hopper is basically a bowl of coins and a mechanism that is capable of ejecting coins from it one by one. When you think of slot machines, you are probably thinking of the sound they make when they start spitting out tons of coins after a jackpot. Coin hoppers are what create that particular sound. I recommend looking up videos on Youtube if you are interested in mechanical engineering, because the way they work is kind of fascinating.

The slot machine also accepts notes and debit card payments but these are mechanically much simpler and don't take much space. The only thing remaining in the picture is the box marked in yellow. It contains the actual brains of the entire machine.

The contents of the brain

The main system is, much like everything these days, a regular computer. This specific one is a fairly average industrial PC that is running a custom version of Debian. At boot it starts up the game software that is based on a custom version of the Ogre 3D graphics engine. The computer also manages and controls all other hardware in the cabinet, such as the coin hoppers and note acceptor mentioned above, using a custom, self designed controller board. The cabinet housing the device is also custom designed and built.

Thus, surprisingly, at its core a slot machine is roughly the same as a desktop PC running desktop games with a few extra peripherals. This means that Linux desktop gaming has been mainstream among the general Finnish population for 15 years, which is roughly the amount of time these slot machines have been deployed.

In addition to the games themselves, the development environment is also 100% Linux. As a demonstration, here is a screen shot of a development version of the software running on a developer workstation.

What about the money?

Like all forms of gambling, slot machines make quite a lot of money. The profits, as of last count, were on the order of 500 million euros per year. As Veikkaus is a government run business, this money is given out to various charitable organisations as well as to the state. Given that Finland's yearly budget is on the order of 50 billion euros, this means that profits from Linux desktop gaming account for almost 1% of the entire budget of the state of Finland.

Acknowledgements

Thanks to Veikkaus for giving me permission to write this blog post. Extra special thanks for allowing me to show the picture of the insides of a slot machine, which has never before been shown in public.