Friday, October 30, 2020

How to build dependencies as Meson subprojects using SDL as an example

Today we released version 0.56.0 of the Meson build system. This is an especially important release, as it marks the 10 000th commit since the start of the project. A huge thank you to everyone who has contributed their time and effort; this project would not exist without all of you. However, in this post we are not going to talk about that; those interested can find further details in the release notes. Instead we are going to talk about how to build your dependencies from source on every platform without needing anything other than Meson.

Last month I had a lightning talk at CppCon about this way of managing dependencies:

Since then there have been many improvements to the workflow for a smoother experience. To demonstrate this I upgraded the sample program to use SDL Mixer and SDL Image instead of relying on plain SDL. The code is available in this Github repo (only tested on Windows, because I ran out of time to do proper multiplatform testing). The core of the build definition is this:

sdl2_dep = dependency('sdl2')
sdl2_image_dep = dependency('sdl2_image')
sdl2_mixer_dep = dependency('sdl2_mixer')
executable('sdltestapp', 'main.cpp',
  dependencies : [sdl2_image_dep, sdl2_mixer_dep, sdl2_dep],
  win_subsystem: 'windows')

This has always worked on Linux and other platforms that provide system dependencies via pkg-config. As of the latest release of Meson and the newest code from the WrapDB, it works transparently on all platforms. Basically, for every dependency you want to build yourself, you need a wrap file, like this:
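(What follows is a representative sketch only; the exact field values of the real sdl2.wrap come from the WrapDB and can be seen in the repository linked above.)

[wrap-file]
directory = SDL2-2.0.12
source_url = https://libsdl.org/release/SDL2-2.0.12.tar.gz
source_filename = SDL2-2.0.12.tar.gz
source_hash = <sha256 checksum of the release tarball>
patch_filename = sdl2-meson-build-files.zip
patch_url = <WrapDB download link for the Meson build files>
patch_hash = <sha256 checksum of the patch archive>

[provide]
sdl2 = sdl2_dep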

The contents consist mostly of download links, hashes and build metadata. Upstream sources are not checked into your repo (unless you want them to be), so it stays small. The actual files are in the repository linked above. When you start building and the project needs some dependency, Meson uses the info in the wrap files to download, patch and build the dependencies as subprojects as needed. This is what it looks like when configuring on Windows with MSVC without any external dependency providers.

Then you can build it. Here I'm using Visual Studio Code because why not.

The end result runs directly as a native Windows application (text annotations not part of the original program) using SDL's Direct3D accelerated rendering.

There are three different source files loaded using SDL Image: a PNG file, a JPEG file and an LZW-compressed TIFF file. The app also plays sound effects in Ogg Vorbis and WAV formats using SDL Mixer. Dependencies of dependencies work automatically, as one would expect. All dependency libraries are statically linked, as that is the easiest way to get things working under Windows (at least until your project gets too big).

If you want to try it yourself, here's what you need to do:

  1. Install Visual Studio, Meson and Ninja (Ninja is not strictly required; you can use the VS backend instead if you wish)
  2. Open the VS x64 dev tools shell.
  3. Clone the repo and cd inside it.
  4. Run meson build
  5. Run ninja -C build
The executable should now be at the root of the build dir, ready to run.
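In other words, something along these lines in the developer prompt (the repository URL is a placeholder; use the repo linked above):

git clone <repository url> sdltest
cd sdltest
meson build
ninja -C build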

Contributing to the WrapDB

The biggest problem the WrapDB has at the moment is that its selection of available libraries is fairly small. Fortunately it is easy to contribute new dependencies. There are two main cases.

Submitting projects that already build with Meson

  1. Make sure your project gets all its dependencies via the dependency function rather than by invoking subproject directly, and that it provides a suitable dependency object via declare_dependency (see the sketch after this list).
  2. Create an upstream.wrap file with the necessary info. See the documentation or, if you prefer examples, how it is done in libtiff.
  3. Request that your project be added to WrapDB as described in the documentation linked above.
That's pretty much it. Once your project is in the WrapDB it can be immediately used by any other Meson project.
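To make steps 1 and 2 a bit more concrete, here is a rough sketch of both pieces for a hypothetical library called foo (all names, URLs and field values are illustrative placeholders; see the documentation linked above for the authoritative format):

# In the subproject's meson.build: build the library and expose it
# as a dependency object that dependency('foo') can fall back to.
foo_inc = include_directories('include')
foo_lib = library('foo', 'foo.c', include_directories: foo_inc)
foo_dep = declare_dependency(include_directories: foo_inc,
                             link_with: foo_lib)

The accompanying upstream.wrap then tells the WrapDB where to fetch the release tarball:

[wrap-file]
directory = foo-1.0.0
source_url = https://example.com/releases/foo-1.0.0.tar.gz
source_filename = foo-1.0.0.tar.gz
source_hash = <sha256 checksum of the release tarball>

[provide]
foo = foo_dep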

Contributing projects that do not build with Meson

The basic outline is the same as above, except that first you need to rewrite the project's build system in Meson. I'm not going to lie: sometimes this is a fair bit of work. Sometimes it is not the most pleasant thing in the world. No pain, no gain and all that.

Given that there are thousands upon thousands of projects available, which ones should be converted first? The most obvious answer is the ones that you personally need, because then you are doing work that directly benefits yourself. After that it gets trickier. One good approach is to look at established cross platform open source projects. Many of them have a directory where they store copies of their third party dependencies. Any libraries there are most likely needed by other programs as well. Here are links to some of them:

Monday, October 26, 2020

The Meson Manual: Good News, Bad News and Good News

Starting with the good news: the Meson Manual has been updated to a third edition. In addition to the usual set of typo fixes, there is an entirely new chapter on converting projects from an existing build system to Meson. Not only are there tips and tricks for each part of the conversion, there is even guidance on how to get it done on projects that are too big to be converted in one go.

Unfortunately there is also bad news, which boils down to this graph.

These are the monthly sales amounts (in euros) since the beginning of this year. As you can tell, they follow a typical exponential decay: a peak at the beginning and then a steady decline. Sales for this month are expected to be noticeably smaller than last month's. Keeping online sales of a product like this going requires both time and money (to pay for the various services needed), and unfortunately the numbers show that this is not financially worthwhile.

Thus I must inform everyone that the manual will be discontinued. You can still buy it and the download links will stay valid until the end of this year. Thank you to everyone who bought the book and especially to those who sent me feedback and typo fixes.

To end on a positive note, there has been a final price reduction and the manual can now be bought for just €19.95.

FAQ: Will the manual be available under a free license after sales end?

No. At least not in the foreseeable future.

Tuesday, October 20, 2020

Cargo-style dependency management for C, C++ and other languages with Meson

My previous blog post about modern C++ got a surprising amount of feedback. Some people even reimplemented the program in other languages, including one in Go, two different ones in Rust and even this slightly brain-bending C++ reimplementation as a declarative-style pipeline. It also got talked about on Reddit and Hacker News. Two major comments kept popping up:

  1. There are potential bugs if the program is ever extended to support non-ASCII text
  2. It is hard to use external dependencies in C++ (and by extension C)
Let's solve both of these problems at the same time. There are many "lite" solutions for doing Unicode aware text processing, but we're going to use the 10 000 kilogram hammer: International Components for Unicode. The actual code changes are not that big, interested parties can go look up the details in this Github repo. The thing that is relevant for this blog post is the build and dependency management. The Meson build definition is quite short:

project('cppunicode', 'cpp',
        default_options: ['cpp_std=c++17',
                          'default_library=static'])
             
icu_dep = dependency('icu-i18n')
thread_dep = dependency('threads')
executable('cppunicode', 'cppunicode.cpp',
           dependencies: [icu_dep, thread_dep])

The threads dependency is for the multithreaded parts (see the end of this post). I developed this on Linux and used the convenient system-provided ICU. Windows and macOS do not provide it as a system library, so we need to build ICU from scratch on those platforms. This is achieved by running the following command in your project's source root:

$ meson wrap install icu
Installed icu branch 67.1 revision 1

This contacts Meson's WrapDB server and downloads build definition files for ICU. That is all you need to do. The build files do not need any changes. When you start building the project, Meson will automatically download and build the dependency. Here is a screenshot of the download step:

Here it is building in Visual Studio:

And here is the final built result running on macOS:

One notable downside of this approach is that the WrapDB does not have all that many packages yet. However, I have been told that, given the next Meson release (in a few weeks) and some upstream patches, it is possible to build the entire GTK widget toolkit as a subproject, even on Windows.

If anyone wants to contribute to the project, contributions are most welcome. You can, for example, convert existing projects and submit them to the WrapDB, or become a reviewer. The Meson web site has the relevant documentation.

Appendix: making it parallel

Several people pointed out that while the original program worked fine, it only uses one thread; this may be a bottleneck, and "in C++ it is hard to execute work in parallel". This is again one of those things that has gotten a lot better in the last few years. The "correct" solution would be to use the parallel version of transform_reduce. Unfortunately most implementations of the parallel STL algorithms are still works in progress, so we can't use them in multiplatform code. We can, however, roll our own fairly easily, without needing to create or lock a single mutex by hand. The code has the actual details, but the (slightly edited) gist of it is this:

for(const auto &e:
    std::filesystem::recursive_directory_iterator(".")) {
    futures.emplace_back(std::async(std::launch::async,
                                    count_word_files,
                                    e));
    if(futures.size() > num_threads) {
        pop_future(word_counts, futures);
    }
}
while(!futures.empty()) {
   pop_future(word_counts, futures);
}

Here the count_word_files function calculates the number of words in a single file and the pop_future function joins individual results into the final result. By using a share-nothing architecture, pure functions and value types, all business logic code can be written as if it were single-threaded, and the details of thread and mutex management can be left to library code. Haskell fans would be proud (or possibly horrified, not really sure).
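Neither helper is anything exotic. As a rough sketch (assuming count_word_files returns a per-file std::unordered_map<std::string, int> and futures is a std::vector of std::future objects holding those maps; the real code is in the repo), pop_future could look like this:

#include <future>
#include <string>
#include <unordered_map>
#include <vector>

void pop_future(std::unordered_map<std::string, int> &word_counts,
                std::vector<std::future<std::unordered_map<std::string, int>>> &futures) {
    // Wait for the oldest task to finish and take its result.
    auto file_counts = futures.front().get();
    futures.erase(futures.begin());
    // Merge the per-file counts into the global result.
    for(const auto &[word, count]: file_counts) {
        word_counts[word] += count;
    }
}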

Thursday, October 15, 2020

Does C++ still deserve the bad rap it has had for so long?

Traditionally C++ has been seen by many (and you know who you are) as just plain bad: the code is unreadably verbose, error messages are undecipherable, it's unsafe, compilation takes forever and so on. In fact, making fun of C++ is even a fun pastime for some. All of this was certainly true in the 90s and even as recently as 10 years ago. But is it still the case? What would be a good way to determine this?

A practical experiment

Let's write a fairly simple program that solves a sort-of-real-worldish problem and see what comes out of it. The problem we want to solve is the following:
Find all files with the extension .txt recursively in the subdirectories of the current directory, count the number of times each word appears in them and print the ten most common words and the number of times they are used.

We're not going to go for extreme performance or handling all possible corner cases, instead going for a reasonable implementation. The full code can be found by following this link.

The first thing we need is a way to detect files with a given extension and words within text. This calls for regular expressions:

const std::regex fname_regex("\\.txt$", std::regex_constants::icase);

const std::regex word_regex("([a-z]{2,})", std::regex_constants::icase);

This might be a bit verbose, but it is quite readable. It only works for ASCII text, but for the purposes of this experiment it is sufficient. To calculate the number of times each word is seen, we need a hash table:

std::unordered_map<std::string, int> word_counts;

Now we need to iterate over all files in the current directory subtree.

for(const auto &e: std::filesystem::recursive_directory_iterator("."))

Skip everything that is not a .txt file.

if(!e.is_regular_file()) {
    continue;
}
if(!std::regex_search(e.path().c_str(), fname_regex)) {
    continue;
}

Process the file line by line:

std::ifstream current_file(e.path());
for (std::string line; std::getline(current_file, line); )

For each line we need to find all words that match the regular expression.

std::sregex_iterator word_end;
for(auto it = std::sregex_iterator(line.begin(), line.end(), word_regex); it != word_end; ++it)

This is a bit verbose, but it takes care of all the state tracking needed to run the regex multiple times on the same input string. Doing the same by hand is fairly tedious and error prone. Now that we have the matches, we need to convert them to standalone lowercase words.

std::string word{it->str()};
for(auto &c : word) {
    c = std::tolower(c);
}

Lowercasing strings is a bit cumbersome, granted, but this is at least fairly readable. Then on to incrementing the word count.

++word_counts[word];

That's all that is needed to count the words. Finding the most common ones can't be done directly in the hash map, so we need to convert the data to an array. That needs some setup code.

struct WordCount {
    std::string word;
    int count;
};

std::vector<WordCount> word_array;
word_array.reserve(word_counts.size());

Since we know how many entries will be in the array, we can reserve enough space for it in advance. Then we need to iterate over all entries and add them to the array.

for(const auto &[word, count]: word_counts) {
    word_array.emplace_back(WordCount{word, count});
}

This uses the new structured bindings feature, which is a lot more readable than the old way of dealing with iterator objects or pairs with their firsts and their seconds and all that horror. Fortunately no more.

The simple way of getting the 10 most used entries is to sort the array and then grab the 10 first elements. Let's do something slightly swaggier instead [0]. We'll do a partial sort and discard all entries after 10. For that we need a descending sorting criterion as a lambda.

auto count_order_desc = [](const WordCount &w1, const WordCount &w2) { return w2.count < w1.count; };

Using it is straightforward.

const auto final_size = std::min(10, (int)word_array.size());
std::partial_sort(word_array.begin(), word_array.begin() + final_size, word_array.end(), count_order_desc);
word_array.erase(word_array.begin() + final_size, word_array.end());

All that remains is to print the final result.

for(const auto& word_info: word_array) {
    std::cout << word_info.count << " " << word_info.word << "\n";
}

Safety

As safety and security are important features in current software development, let's examine how safe this program is. There are two main safety points: thread safety and memory safety. As this program has only one thread, it can be formally proven to be thread safe. Memory safety also divides into two main things to note: use-after-frees (or dangling references) and out-of-bounds array accesses.

In this particular program there are no dangling references. All data is in value types and the only references are iterators. They are all scoped so that they never live past their usages. Most are not even stored into named variables but instead passed directly as function arguments (see especially the call to partial_sort). Thus they can not be accessed after they have expired. There is no manual resource management of any kind. All resources are freed automatically. Running the code under Valgrind yields zero errors and zero memory leaks.

There is one place in the code where an out-of-bounds access is possible: when creating the iterator pointing to 10 elements past the beginning of the array [2]. If you were to forget to check that the array has at least 10 elements, that would be an immediate error. Fortunately most standard libraries have a way to enable checks for this. Here's what GCC reports at runtime if you get this wrong:

Error: attempt to advance a dereferenceable (start-of-sequence) iterator 100000 steps, which falls outside its valid range.
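With GCC this check comes from the libstdc++ debug mode, which can be turned on by adding a single define to the compilation command (a sketch; this is meant for debug builds only, since it changes container layout and has a performance cost):

g++ -o cppexample -O2 -g -Wall -std=c++17 -D_GLIBCXX_DEBUG cppexample.cpp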

Not really a perfect safety record, but overall it's fairly good and certainly a lot better than implementing the same by manually indexing arrays.

Problems

Compiling the program takes several seconds, which is not great. This is mostly due to the regex header, which is known to be heavyweight. Some of the compiler messages encountered during development were needlessly verbose. The actual errors were quite silly, though, such as passing arguments to functions in the wrong order. This could definitely be better. It is reported that the use of concepts in C++20 will fix most of this issue, but there are no practical real-world usage reports yet.

Unique features

This code compiles, runs and works [1] on all major platforms using only the default toolchain provided by the OS vendor, without any external dependencies or downloads. This is something that no other programming language can provide at the time of writing. The closest you can get is plain C, but its standard library does not have the necessary functionality. Compiling the code is also simple and can be done with a single command:

g++ -o cppexample -O2 -g -Wall -std=c++17 cppexample.cpp

Conclusions

If we try to look at the end result with objective eyes it would seem to be fairly nice. The entire implementation takes fewer than 60 lines of code. There's nothing immediately terrible that pops up. What is happening at each step is fairly understandable and readable. There's nothing that one could immediately hate and despise for being "awful", as is tradition.

This is not to say you couldn't still hate C++. If you want to, go right ahead. There are certainly reasons for it. But if you choose to do that, make sure you hate it for real, actual reasons of today, rather than for things that were broken decades ago but have been fixed since.

[0] There is an even more efficient solution using a priority queue that does not need the intermediate array. Implementing it is left as an exercise to the reader.
[1] Reader beware, I have only proven this code to be portable, not tried it.
[2] The implementation in [0] does not have this problem as no iterators are ever modified by hand.

Tuesday, October 6, 2020

Is your project a unique snowflake or do you just think that it is?

If there is a national sport for programmers, it is reinventing existing things from scratch. This seems to be especially common inside corporations that tend to roll their own rather than, for example, using an existing open source library. If you try to ask around why this has been done, you typically get some variation of this:
We can't use off-the-shelf solutions because we have unique needs and requirements that no-one else has.

Now, there genuinely are cases where this is true, but they mostly occur in very specialised situations, such as when you are Google-scale, do space engineering or something similar. There might also be legal or regulatory reasons why you must own all code in a product. Most projects are not like this. In fact almost always someone (typically hundreds of someones) has had the exact same problem and solved it. Yet people keep reinventing the wheel for no real purpose. If you ever find yourself in a debate on why people should use existing solutions rather than roll their own, here are some talking points to consider.

How have you established that your needs are actually unique?

Very often when someone says "the needs for <project X> are unique", what they are actually saying is "this needs a feature I have never personally encountered before". This is not due to malice, it is just how human beings seem to behave. If people have a rough idea of how to solve a problem by coding it from scratch but no knowledge of an existing solution, they tend to go with the former rather than spend time researching the problem. I don't know why. This is especially true if there is external pressure to "get something going fast".

I know this because I have actually done it myself. Ages ago I was working on some Java code and needed to communicate with a different process. I was starting to write a custom socket protocol when my coworker asked why I was not simply using regular HTTP with Jetty. The answer, to the surprise of absolutely no-one, was that I had never heard of it. This single sentence saved days of implementation work and probably the same amount in ongoing maintenance and debugging work.

We need more performance than existing solutions can provide

This splits nicely into two parts. In the first one you actually need the performance. This is rare, but it happens. Even so, you might consider optimizing the existing solution first. If you are successful, submit the changes upstream. This way they benefit everyone and reduce the ongoing maintenance burden to roughly zero.

The other case is that all the performance you think you need, you don't actually need. This is surprisingly common. It turns out that modern computers and libraries are so fast that even "reasonably ok" is performant enough most of the time for most problems [0]. As an example I was working on a product that chose to roll its own functionality X rather than using the standard solution in Linux because it needed to do "thousands of operations per second". These operations were only done in response to human interaction, so there was no way it could generate more than 10 or so events per second. But still. Must have performance, so multiple months of hand-rolling it was.

To add insult to injury, I did some measurements and it turned out that the existing solution could do one thousand operations per second without even needing to go to C. The Python bindings were fast enough.

We need the ability to change the implementation at will

One major downside of using an existing open source solution is that you have to give up some control. Changes must be submitted to people outside your immediate control. They might say no and you can't override them with office politics. In-house code does not have this problem.

This control comes with a significant downside. What fairly often happens is that some developers write and maintain the new functionality and everything is rosy. Then time happens and the original coders switch teams and jobs, become managers and so on. They take the knowledge required to change the system with them. Unless your company has a very strong culture of transferring this knowledge, and most companies most definitely don't, the code will bitrot, fossilize and become a massive clump of legacy that must not be touched because there is no documentation, no tests and nobody really understands it any more.

So even if the corporation has the freedom to change the code at will, it lacks the ability to do so.

Sometimes people just plain lie

There are two main reasons that explain pretty much everything people do, say and recommend:
  1. The reason they give to others
  2. The real reason
Even for "purely technical" decisions like these, the two can be drastically different. Suppose some developer has done several similar "boring" projects. Then when the time comes to choose the approach for the new project. If they choose an existing system, they know that the next project will be just like the preceding five. If, on the other hand, some new tech is chosen, then the entire task becomes more meaningful and exciting. In this case the stated reason is technical fitness, but the real reason is frustration. Another way of stating this is that when asked "should we use library X or write our own from scratch" the answer the coder would want to give is "I don't want to work on that project". 

This is not something people will say out loud, because it easily gets you labeled as a "prima donna", a "work avoider" or an "elitist". Many people will also have trouble saying it because it comes from a "bad" emotional response. All of this leads to shame, and people are willing to go to great lengths to avoid feeling ashamed.

The same reasoning applies to other "impure" emotions such as infatuation ("OMG this new tech is so cool"), disgust ("using tech X means I have to work with Bob from department X"), infidelity ("either you give me this or I'm accepting the job offer from a competitor"), megalomania ("creating this new framework will make me a superstar celebrity coder") and the ever-popular self-disappointment ("I got into programming because I wanted to create new things like an architect, but ended up as a code janitor").