Monday, December 21, 2015

Always use / for directory separation

It is common wisdom that '\' is the path separator on Windows and '/' is the separator on Unix-like operating systems. It is also common wisdom that you should always use the native path separator in your programs to minimise errors.

The first of these claims is flat out wrong and the second is questionable.

On Windows both '\' and '/' work as directory separators. You can even mix them within one path. The end result is that '/' is the truly universal directory separator, it works everywhere (except maybe classic Mac OS, which is irrelevant).

So the question then is whether you should use '\' or '/' when dealing with paths on Windows. I highly recommend always using '/'. The reason for this is a bit unexpected but makes perfect sense once you get it.

The difference between these two characters is that '\' is a quoting (escape) character whereas '/' is not. Your path strings will probably pass through many systems outside your control (batch files, configuration files persisted to disk, SQL, and so on) via various libraries and frameworks. These will most likely have bugs in the way they handle quotes. You may need to quote your quote characters to get them through, and you might need to quote them differently in different parts of your program. Quoting issues are the kind of bug that strikes you at random, usually in production, and they are really difficult to hunt down.

So always use '/' if possible. It will survive any serialisation/deserialisation combination without problems since '/' is not a special character.
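A tiny Python sketch (the path itself is made up) shows how a backslash path gets silently mangled by ordinary string escaping before any serialisation has even happened:

```python
# In this literal, \t and \n are escape sequences, not path characters.
mangled = "C:\temp\new_project"
safe = "C:/temp/new_project"

print("\t" in mangled)  # True: the backslash path now contains a tab character
print("\t" in safe)     # False: the forward-slash path is untouched
```

The same class of accident repeats at every layer that treats '\' specially, which is exactly why the forward slash survives round trips unharmed.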

Friday, November 27, 2015

Build speed comparison on the Raspberry Pi 2

A common misunderstanding is that all build systems are roughly equally fast. This is not the case; the differences can be dramatic, especially on low end hardware such as the Raspberry Pi. To demonstrate, I ran some benchmarks. The code can be found on GitHub.

The program is a network debugging tool, but what it actually does is not important here. It is used in this test because it is a nice standalone executable implemented in Qt5, so compiling it takes a fair bit of CPU and it should be a reasonably good stand-in for real-world applications.

The repository contains build definitions for Meson and CMake. Since the latter is currently more popular let's use it as a baseline. Compiling the application on the Raspberry Pi 2 with the Ninja backend takes 2 minutes 20 seconds.

Compiling the same project with Meson takes 1 minute and 36 seconds. That is over 30% faster. The reason for this is that Meson has native support for precompiled headers. CMake does not have this feature and, according to CMake's bugzilla, it will not have it in the future, either.

But let's not stop there. Meson also has native support for unity builds. The core principle is simple. Instead of compiling all files one by one, create a top level file that #includes all of them and compile only that. This can make things dramatically faster when you compile everything, but the downside is that it slows down incremental builds quite a lot.
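The principle can be sketched in a few lines of Python. The src/*.cpp layout and file names here are hypothetical, and this is only an illustration of the idea, not how Meson implements it:

```python
# Hypothetical sketch: gather all sources and emit one file that includes them.
import glob

sources = sorted(glob.glob("src/*.cpp"))  # hypothetical project layout
with open("unity.cpp", "w") as f:
    for src in sources:
        f.write('#include "{}"\n'.format(src))
# The build then compiles only unity.cpp instead of every file separately,
# so common headers are parsed once rather than once per source file.
```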

A Meson unity build of the test application takes only 39 seconds, which is over 70% faster than plain CMake.

With this simple experiment we find that a build system can have a massive impact on your project with zero code changes.

Sunday, November 15, 2015

Solving the C/C++ dependency problem through build system composability

The perennial problem in C and C++ development is dealing with dependencies. It is relatively straightforward on Linux where you have package managers (assuming your dependency is packaged in the distro you are using) but becomes a whole lot of work the second you step outside this zone. To develop native Windows applications, for example, the best solution people have come up with is copying the source code of dependencies to your own project and hoping for the best. Unfortunately wishful thinking is not a particularly efficient engineering method.

The main reason why dependency management is simple on Linux is a program called pkg-config. It is really simple, as most good solutions are, and basically just says "to use dependency foo, use these flags to compile and these other flags to link". This makes all libraries that provide pkg-config definitions composable. That is, you can use them in a very simple way without knowing anything about their internals, and you can mix and match them in any combination with any other library that provides a pkg-config definition.
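The model can be illustrated with a small Python sketch. The flag strings below are illustrative, not real system values:

```python
# pkg-config is conceptually a lookup from dependency name to flags.
registry = {
    "glib-2.0": ("-I/usr/include/glib-2.0", "-lglib-2.0"),
    "libpng": ("-I/usr/include/libpng16", "-lpng16"),
}

def flags_for(deps):
    # Composability: using several dependencies is just concatenating
    # their flags; no knowledge of any library's internals is needed.
    cflags = [registry[d][0] for d in deps]
    libs = [registry[d][1] for d in deps]
    return " ".join(cflags), " ".join(libs)

print(flags_for(["glib-2.0", "libpng"]))
```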

It would be really nice if you could use pkg-config everywhere, but unfortunately this is not possible. To do pkg-config really well, support for it must come from the basic platform, that is, from the Visual Studios and XCodes of the world. They do not provide such support at the moment, and it is unlikely to appear in the near future. There are also some technical hurdles, especially on Windows with its multitude of incompatible binary ABIs.

Since we can't have pkg-config proper, is it possible to achieve the same result with a different kind of mechanism? It turns out that this is indeed possible and has been under development for quite a while in the Meson build system and its Wrap dependency system. The implementation details are slightly involved so for the purposes of this article the approach taken has been reduced to two main points.

The first one is that you can take any project that uses the Meson build system and then run it inside another Meson project. The subproject runs in a sandbox but the master project can use artifacts from it as if they were a native part of the master project. And yes, this can even be done recursively so subprojects can also use subprojects, but it goes even deeper than that. If two projects use the same subproject, the system guarantees that they both use the same subproject instance so it is compiled only once.

The second point is what you might call an "internal pkg-config". It encapsulates the same information as pkg-config proper, that is, what compiler flags to use and what libraries to link against.
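As an illustration, in Meson this takes the shape of a dependency object that a subproject exports and the master project consumes. The names foo, foo_inc, foo_lib and foo_dep below are hypothetical; subproject() and declare_dependency() are Meson's own functions:

```meson
# In the subproject: export compile and link information.
foo_dep = declare_dependency(include_directories : foo_inc, link_with : foo_lib)

# In the master project: grab the object and use it like any other dependency.
foo_sp = subproject('foo')
executable('myapp', 'main.cpp', dependencies : foo_sp.get_variable('foo_dep'))
```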

With these two pieces we have achieved the same level of composability as pkg-config proper without using pkg-config itself. We can now take our dependency projects and use them inside any other project on any platform, because we do not depend on any piece of system infrastructure any more. We can choose to build and link all our dependencies statically for platforms such as iOS that require this. There is even a repository of Meson build definitions for open source libraries that you can easily use in your projects. There are not a whole lot of projects available yet, but anyone is free to submit more.

Talk is cheap, show me the code!

As a sample project we created a simple app using the SDL 2 library. It animates some images on the screen and plays sound effects when keys are pressed. It embeds all its images and sounds and statically links all its dependencies, producing a single standalone exe file. Here is a screenshot of it running on OSX.

This project is available on GitHub. The most interesting bit about it is probably the build definition file. The whole definition, which takes care of embedding SDL2, converting resources into C++ source, setting up all build flags for OSX, Windows, Linux etc, contains 33 lines of code, of which only three are dedicated to setting up the subproject (see lines 6-8).

If you were to build the same application with currently established methods, you would need to spend a fair bit of time downloading dependencies for each target platform, installing them, setting up build flags, debugging weird issues since you got something wrong and all that jazz. Using Meson you can just write a few declarations and let the build system deal with all the boring legwork including downloading and extracting all dependency projects.

That is the power of composability.

Wednesday, November 4, 2015

A difference in complexity

A common task in build systems is running a custom command. As an example, you might want to regenerate the build definition or run a tool such as clang-format. Different backends have different ways of specifying this.

Let's look at the difference between two such systems to see if we can learn something about design, simplification and other such things. First we examine the way you would define this in the Ninja build system. It looks roughly like this.

build mycommand: CUSTOM_COMMAND PHONY
 COMMAND = do_something.py
 description = 'Doing something'


All right. This seems reasonable. Now let us see how you would do the exact same thing in MSBuild.

<?xml version="1.0" encoding="utf-8"?>
<Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
 <ItemGroup Label="ProjectConfigurations">
   <ProjectConfiguration Include="Debug|Win32">
     <Configuration>debug</Configuration>
     <Platform>Win32</Platform>
   </ProjectConfiguration>
 </ItemGroup>
 <PropertyGroup Label="Globals">
   <ProjectGuid>{1C3873C1-1796-4A8A-A7B2-CA9FA6068A1A}</ProjectGuid>
   <Keyword>Win32Proj</Keyword>
   <Platform>Win32</Platform>
   <ProjectName>something</ProjectName>
 </PropertyGroup>
 <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
 <PropertyGroup Label="Configuration">
   <ConfigurationType />
   <CharacterSet>MultiByte</CharacterSet>
   <UseOfMfc>false</UseOfMfc>
 </PropertyGroup>
 <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
 <PropertyGroup>
   <_ProjectFileVersion>10.0.30319.1</_ProjectFileVersion>
   <OutDir>.\</OutDir>
   <IntDir>test-temp\</IntDir>
   <TargetName>something</TargetName>
 </PropertyGroup>
 <ItemDefinitionGroup>
   <Midl>
     <AdditionalIncludeDirectories>%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
     <OutputDirectory>$(IntDir)</OutputDirectory>
     <HeaderFileName>%(Filename).h</HeaderFileName>
     <TypeLibraryName>%(Filename).tlb</TypeLibraryName>
     <InterfaceIdentifierFilename>%(Filename)_i.c</InterfaceIdentifierFilename>
     <ProxyFileName>%(Filename)_p.c</ProxyFileName>
   </Midl>
   <PostBuildEvent>
     <Message />
     <Command>setlocal
"c:\Python34\python3.exe" do_something.py
if %errorlevel% neq 0 goto :cmEnd
:cmEnd
endlocal & call :cmErrorLevel %errorlevel% & goto :cmDone
:cmErrorLevel
exit /b %1
:cmDone
if %errorlevel% neq 0 goto :VCEnd</Command>
   </PostBuildEvent>
 </ItemDefinitionGroup>
 <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
</Project>


Here endeth the lesson.

Saturday, October 31, 2015

Shell script or Python?

Every now and then a discussion arises on whether some piece of scripting functionality should be written as a basic shell script or in Python. Here is a handy step-by-step guide to answering the question. Start at the beginning and work your way down until you have your answer.

Will the script be longer than one screenful of text (80 lines)? If yes ⇒ Python.

Will the script ever need to be run on a non-unix platform such as Windows (including Cygwin)? If yes ⇒ Python.

Does the script require command line arguments? If yes ⇒ Python.

Will the script do any networking (e.g. call curl or wget)? If yes ⇒ Python.

Will the script use any functionality that is in the Python standard library but is not available as a preinstalled binary absolutely everywhere (e.g. compressing with 7zip)? If yes ⇒ Python.

Will the script use any control flow (functions/while/if/etc)? If yes ⇒ Python.

Will the script have modifications in the future rather than being written once and then dropped? If yes ⇒ Python.

Will the script be edited by more than one person? If yes ⇒ Python.

Does the script call into any non-trivial Unix helper tool (awk, sed, etc)? If yes ⇒ Python.

Does the script call into any tool whose behaviour is different on different platforms (e.g. mkdir -p, some builtin shell functionality)? If yes ⇒ Python.

Does the script need to do anything in parallel? If yes ⇒ Python.

If none of the above match then you have a script that proceeds in a straight line calling standard and simple Unix commands and never needs to run on a non-unix platform. In this case a shell script might be ok. But you might still consider doing it in Python, because scripts have a tendency to become more and more complex with time.
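For reference, many of the shell constructs above have direct, portable Python counterparts. The paths below are illustrative:

```python
# Portable equivalents of common shell idioms.
import os
import shutil

os.makedirs("build/output", exist_ok=True)   # mkdir -p build/output
with open("build/a.txt", "w") as f:          # echo hello > build/a.txt
    f.write("hello\n")
shutil.copy("build/a.txt", "build/b.txt")    # cp build/a.txt build/b.txt
```

These behave the same on every platform Python runs on, which is precisely what the checklist is probing for.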

Friday, October 30, 2015

Proposal for launching standalone apps on Linux

One nagging issue Linux users often face is launching third party applications, that is, those that do not come from package repositories. As an example, the Pitivi video editor provides daily bundles that you can just download and run. Starting one takes a surprising amount of effort and may require going to the command line (which is an absolute no-no if your target audience includes regular people).

This page outlines a simple, distro agnostic proposal for dealing with the problem. It is modelled after OSX application bundles (which are really just directories with some metadata) and only requires changes to file managers. It is also a requirement that applications must be installable and runnable without any privilege escalation (i.e. they must not require root).

The core concept is an application bundle. The definition of an app bundle named foobar is a subdirectory named foobar.app which contains a file called foobar.desktop. There are no other requirements for the bundle and it may contain arbitrary files and directories.

When a file manager notices a directory that matches the above two requirements, and thus is a bundle, it must treat it as an application. This means the following:


  • it must display it as an application using the icon specified in the desktop file rather than as a subdirectory
  • when double clicked, it must launch the app as specified by the desktop file rather than showing the contents of the subdir
These two are the only requirements to get a working OSX-like app bundle going on Linux. There can of course be more integration, such as offering to install an app to ~/Applications when opening a zip file with an app bundle or registering the desktop file to the system wide application list (such as the dash on Ubuntu).
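The detection rule is simple enough that the whole check fits in a few lines. This is a sketch of the logic a file manager would apply; the helper name is hypothetical:

```python
import os

def is_app_bundle(path):
    # A bundle "foobar" is a directory foobar.app containing foobar.desktop.
    base = os.path.basename(path.rstrip("/"))
    if not (os.path.isdir(path) and base.endswith(".app")):
        return False
    name = base[:-len(".app")]
    return os.path.isfile(os.path.join(path, name + ".desktop"))
```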

Monday, October 12, 2015

Some comments on the C++ modules talk

Gabriel Dos Reis gave a presentation on C++ modules at CppCon. The video is available here. I highly recommend that you watch it, it's good stuff.

However, as a build system developer, one thing caught my eye. The way modules are used (at around 40 minutes into the presentation) has a nasty quirk. The basic approach is that you have a source file foo.cpp, which defines a module Foobar. To compile it you would say this:

cl -c /module foo.cpp

This causes the compiler to output foo.o as well as Foobar.ifc, which contains the binary definition of the module. To use the module you would compile a second source file like this:

cl -c baz.cpp /module:reference Foobar.ifc

This is basically the same way that Fortran does its modules and it suffers from the same problem which makes life miserable for build system developers.

There are two main reasons. The first is that the name of the ifc file cannot be known beforehand without scanning the contents of the source files. The second is that you cannot know what file name to put on the second command line without scanning the source to see what modules it imports _and_ scanning potentially every other source file in your project to find out which one actually provides them.

Most modern build systems work in two phases. First you parse the build definition and determine how, and in which order, to run the individual build steps. Basically this just serialises the dependency DAG to disk. The second phase loads the DAG, checks its status and takes all the steps necessary to bring the build up to date.

The first phase of the two takes a lot more effort and is usually much slower than the second part. A typical ratio for a medium project is that first phase takes roughly ten seconds of CPU time and the second step takes a fraction of a second. In contemporary C++ almost all code changes only require rerunning the second step, whereas changing build config (adding new targets etc) requires doing the first step as well.

This fast path is possible because output files and their names are fully knowable without looking at the contents of the source files. With the proposed scheme this is no longer the case. A simple (if slightly pathological) example should clarify the issue.

Suppose you have file A that defines a module and file B that uses it. You compile A first and then B. Now change the source code so that the module definition goes to B and A uses it. How would a build system know that it needs to compile B first and only then A?

The answer is that without scanning the contents of A and B before running the compiler this is impossible. This means that to get reliable builds, either all build systems need to grow a full C++ parser or all C++ compilers must grow a full build system. Neither of these is particularly desirable. Even if build systems got such parsers, they would need to reparse the source of every changed file before starting the compiler and adjust the compiler arguments accordingly. This makes every rebuild take the slow path of step one instead of the fast step two.
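To make the ordering problem concrete, here is a Python sketch. The provides/imports entries stand in for information that could only be obtained by scanning the source files themselves, which is exactly the work build systems currently avoid:

```python
# A.cpp imports module Foobar, which B.cpp provides; discovering this
# requires reading both files, not just the build definition.
sources = {
    "A.cpp": {"provides": None, "imports": ["Foobar"]},
    "B.cpp": {"provides": "Foobar", "imports": []},
}

def compile_order(srcs):
    # Map each module name to the source file that provides it.
    providers = {s["provides"]: n for n, s in srcs.items() if s["provides"]}
    order, done = [], set()

    def visit(name):
        if name in done:
            return
        done.add(name)
        # A file's module providers must be compiled before the file itself.
        for mod in srcs[name]["imports"]:
            visit(providers[mod])
        order.append(name)

    for name in srcs:
        visit(name)
    return order

print(compile_order(sources))  # B.cpp must come before A.cpp
```

Swap the module definition from B to A and the correct order flips, even though nothing in the build definition changed. That is the crux of the problem.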

Potential solutions


There are a few possible solutions to this problem, none of which are perfect. The first is to require that module Foo must be defined in a source file Foo.cpp. This makes everything deterministic again but has that iffy Java feeling about it. The second option is to define the module in a "header-like" file rather than in source code. Thus a foo.h file would become foo.ifc and the compiler could pick that up automatically instead of the .h file.