Sunday, April 19, 2020

Do humans or compilers produce faster code?

Modern optimizing compilers are truly amazing. They have tons and tons of tricks to make even crappy code run incredibly fast, faster than most people could write by hand. This has led some people to claim that program optimization can be left to compilers, since they seem to be a lot better at it. This usually sparks a reply from people at the other end of the spectrum, who say that they can write faster code by hand than any compiler, which makes compilers mostly worthless when performance actually matters.

In a way both of these viewpoints are correct. In another way they are both wrong. To see how, let's split this issue into two parts.

A human being can write faster code than a compiler for any given program

This one is fairly easy to prove (semi)formally. Suppose you have a program P, written in some programming language L, whose compiled version runs faster than any existing hand-written version. A human being can look at the assembly output of that program and write an equivalent source version in straight C. Usually when doing this you find some specific optimization you can add to make the hand-written version faster still.

Even if the compiler's output were proved optimal (as in the case of superoptimization), it can still be matched by copying that output into your own program as inline assembly. Thus for any given program a human can always at least match the compiler, and usually beat it.

A human being cannot write faster code than a compiler for every program

Let's take something like Firefox. We know from the previous section that one could eschew complex compiler optimizations, rewrite it in straight C or equivalent, and end up with better performance. The downside is that you would die of old age before the task was finished.

Human beings have a limited shelf life. There are only so many times they can press a key on the keyboard before they expire. Rewriting Firefox in straight C so that it outperforms the current version with all optimizations enabled is simply too big a task.

Even if by some magic you could do this, the requirements on the browser would change during the rewrite. A lot. The end result would be useless until you added all the new functionality that had accumulated in the meantime. This would lead to an eternal chase after the tail lights of the existing project.

And even if you could do that, optimizing compilers keep getting better, so you'd need to go through your entire code base regularly and add the same optimizations by hand to keep up. All of these things could be done in theory, but they are completely impossible in practice.

The entire question is poorly posed

Asking whether compilers or humans write faster code is kind of like asking which one is "bluer", the sea or the sky. Sure, you could spend years debating the issue on Twitter without getting anywhere, but it's not particularly interesting. A more productive approach is to ask instead: "given the requirements, skills and resources I have available, should I hand-optimize this particular routine or leave it to the compiler?"

If you do this you rediscover the basic engineering adage: you get the best bang for the buck by relying on the compiler by default and doing things by hand only for bottlenecks that are discovered by profiling the running application.

PS. Unoptimized C is slow, too

Some people think that when they write C it is "very close" to the underlying assembly and thus does not benefit much from compiler optimizations. This has not been true for years (possibly decades). The performance difference between no optimization and -O2 can be massive, especially for hot inner loops.

When people say that they can write code that is faster than the compiler-optimized version of the same algorithm in some other language, that is not what they are actually saying (unless they are writing 100% pure assembly by hand [0]). Instead they are saying: "I can take any algorithm implementation, write it with an alternative syntax and, when both of these are run through their respective optimizing compilers, end up with a program that runs faster."

[0] Which does happen sometimes, especially for SIMD code.

Saturday, April 11, 2020

Your statement is 100% correct but misses the entire point

Let's assume that there is a discussion going on on the Internet about programming languages. One of the design points that comes up is garbage collection. One participant mentions its advantages with something like this:
Garbage collectors are nice and save a lot of work. If your application does not have strict latency requirements, not having to care about memory management is liberating and can improve developer efficiency by a lot.
This is a fairly neutral statement that most people would agree with, even if they work on code that has strict real time requirements. Yet, inevitably, someone will present this counterpoint.
No! If you have dangling references, memory is never freed and you have to fix that by doing manual memory management anyway. Garbage collectors do not magically fix all bugs.
If you read the reply carefully you'll notice that every asserted statement in it is true. That is what makes it so frustrating to argue against. Most people with engineering backgrounds are quite willing to admit they are wrong when presented with evidence that their statements are not correct. This does not cover everyone, of course, as some people are quite willing to violently disagree with any and all facts that conflict with their pre-held beliefs. We'll ignore those people for the purpose of this post.

While true, that single sentence ignores all of the larger context of the issue, which contains points like the following:

  • Out-of-memory errors caused by dangling references are rare (maybe 1 in 10 programs?), whereas regular memory bugs like use-after-free, double free and off-by-one errors are very common (100-1000 in every program).
  • Modern GCs come with very good profilers; finding dangling references is a lot easier than debugging stack corruption.
  • Being able to create things on a whim and just drop them on the floor makes programmers a lot more productive than forcing them to micromanage the complete life cycle of every single resource.
  • Even if you encounter a dangling reference issue, fixing it probably takes less time than would have been spent fixing memory corruption issues in a GC-less version of the same app.
In brief, the actual sentence is true but misses the entire point of the comment it is replying to. This is sadly common in Internet debates. Let's look at some examples.

Computer security

A statement like this:
Using HTTPS on all web traffic is good for security and anonymity.
 might be countered with something like this:
That provides no real security, if the NSA want your data they will break into your apartment and get it.
This statement is again absolutely true. On the other hand, if you are not the leader of a nation state or doing regular business with international drug cartels, you are unlikely to be the target of a directed NSA operation.

If you think that this is a stupid point that nobody would ever make, I agree with you completely. I have also seen it used in the real world. I wish I hadn't.

Bugs are caused by incompetents

High level programming languages are nice.
Programming languages that guard against buffer overruns are great for security and ease of development.
But not for everyone.
You can achieve the exact same thing in C, you just have to be careful.
This is again true. If every single developer on a code base were 100% focused and 100% careful 100% of the time, then bug-free code would be possible. Reality has shown time and time again that this is not the case; human beings are simply not capable of operating flawlessly for extended periods of time.

Yagni? What Yagni?

There's the simple.
Processing text files with Python is really nice and simple.
And not so simple.
Python is a complete joke, it will fail hard when you need to process ten million files a second on an embedded microcontroller using at most 2 k of RAM.
Yes. Yes it will. In that use case it would be the wrong choice. You are absolutely correct. Thank you for your insight, good sir, here is a shiny solid gold medal to commemorate your important contribution to this discussion.

What could be the cause of this?

The one thing that school trains you for is that being right is what matters. If you get the answers right in your test, you get a good grade. Get them wrong and you don't. Maybe this frame of mind sticks around once you leave school, especially given that most people who post these kinds of comments seem to be from the "smarter" end of the spectrum (personal opinion, not based on any actual research). In the real world being right is not a merit by itself. Being right in a debate is important, of course, but the much more important quality is being relevant. That requires understanding the wider context, and possibly admitting that something that is the most important thing in the world to you personally might be completely irrelevant for the issue at hand.

Being right is easy. Being relevant is extremely difficult.

Sunday, April 5, 2020

Meson manual sales status and price adjustment

The sales dashboard of the Meson manual currently looks like this.

It splits up quite nicely into three parts. The first one is the regular sales from the beginning of the year, which is on average less than one sale per day.

The second part (marked with a line) indicates when I was a guest on CppCast talking about Meson and the book. As an experiment I created a time limited discount coupon so that all listeners could buy it with €10 off. As you can tell from the graph it had an immediate effect, which again shows that marketing and visibility are the things that actually matter when trying to sell any product.

After that we have the "new normal", which means no sales at all. I don't know if this is caused by the coronavirus isolation or whether this is the natural end of life for the product (hopefully the former but you can never really tell in advance).

Price reduction

Thus, effective immediately, the price of the book has been reduced to €24.95. You can purchase it from the official site.

Thursday, March 26, 2020

It's not what programming languages do, it's what they shepherd you to

How many of you have listened to, read, or taken part in a discussion about programming languages that goes like the following:

Person A: "Programming language X is bad, code written in it is unreadable and horrible."

Person B: "No it's not. You can write good code in X, you just have to be disciplined."

Person A: "It does not work, if you look at existing code it is all awful."

Person B: "No! Wrong! Those are just people doing it badly. You can write readable code just fine."

After this the discussion repeats from the beginning until either one gets fed up and just leaves.

I'm guessing more than 99% of you readers have seen this, often multiple times. The sad part is that even though this happens constantly, nobody learns anything and the discussion begins anew every time. Let's see if we can do something about this. A good way to start is to come up with a name and a description for the underlying issue.
shepherding: An invisible property of a programming language and its ecosystem that drives people into solving problems in ways that are natural for the programming language itself rather than in ways that are considered "better" in some sense. These may include things like long term maintainability, readability and performance.
This is a bit abstract, so let's look at some examples.

Perl shepherds you into using regexps

Perl has several XML parsers available and they are presumably good at their jobs (I have never actually used one so I wouldn't know). Yet, in practice, many Perl scripts do XML (and HTML) manipulation with regexes, which is brittle and "wrong" for lack of a better term. This is a clear case of shepherding. Text manipulation in Perl is easy. Importing, calling and using an XML parser is not. And really all you need to do is to change that one string to a different string. It's tempting. It works. Surely it could not fail. Let's just do it and get on with other stuff. Boom, just like that you have been shepherded.

Note that there is nothing about Perl that forces you to do this. It provides all the tools needed to do the right thing. And yet people don't, because they are being shepherded (unconsciously) into doing the thing that is easy and fast in Perl.
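The same dynamic is easy to demonstrate outside Perl. Here is a sketch in Python (the document and names are invented for illustration): the regex substitution happens to work on this exact input, while the parser version keeps working when the markup changes shape.

```python
import re
import xml.etree.ElementTree as ET

doc = '<items><item name="a"/><item name="b and more"/></items>'

# The tempting way: a one-line text substitution. It works on this
# input, but breaks as soon as attribute order, quoting or whitespace
# in the markup changes.
regex_result = re.sub(r'name="a"', 'name="z"', doc)

# The "right" way: parse, modify the tree, serialize.
root = ET.fromstring(doc)
for item in root.iter("item"):
    if item.get("name") == "a":
        item.set("name", "z")
parser_result = ET.tostring(root, encoding="unicode")
```

Both produce the same result here, which is exactly why the shepherding works: the easy path looks indistinguishable from the correct one right up until it isn't.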

Make shepherds you into embedding shell pipelines in Makefiles

Compiling code with Make is tolerable, but it fails quite badly when you need to generate source code, data files and the like. The sustainable solution would be to write a standalone program in a proper scripting language that has all the code logic needed and call that from Make with your inputs and outputs. This rarely happens. Instead people think "I know, I have an entire Unix userland available [1], I can just string together random text mangling tools in a pipeline, write it here and be done". This is how unmaintainability is born.

Nothing about Make forces people to behave like this. Make shepherds people into doing this. It is the easy, natural outcome when faced with the given problem.
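As a sketch of what the sustainable solution could look like (the file names, the generator logic and the Make rule are all hypothetical), the pipeline becomes a small standalone script with explicit inputs and outputs:

```python
#!/usr/bin/env python3
# gen_defines.py -- hypothetical standalone code generator, invoked
# from a Makefile as:
#   defines.h: data.txt gen_defines.py
#           python3 gen_defines.py data.txt defines.h
import sys

def generate(lines):
    # Turn "name value" pairs into #define lines.
    out = ["/* Generated file, do not edit. */"]
    for line in lines:
        name, value = line.split()
        out.append(f"#define {name.upper()} {value}")
    return "\n".join(out) + "\n"

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        text = generate(f.read().splitlines())
    with open(sys.argv[2], "w") as f:
        f.write(text)
```

Unlike a shell pipeline, this can be unit tested, runs the same on every platform, and the logic lives in one reviewable place instead of being smeared across a Makefile.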

Other examples

  • C shepherds you into manipulating data via pointers rather than value objects.
  • C++ shepherds you into providing dependencies as header-only libraries.
  • Java does not shepherd you into using classes and objects, it pretty much mandates them.
  • Turing complete configuration languages shepherd you into writing complex logic with them, even though they are usually not particularly good programming environments.
[1] Which you don't have on Windows. Not to mention that every Unix has slightly different command line arguments and semantics for basic commands meaning shell pipelines are not actually portable.

Wednesday, March 11, 2020

The character that swallowed a week

In the last few posts we have looked at compiling LibreOffice from scratch using Meson. Contrary to what one might expect it was not particularly difficult, just laborious. The codegen bits (yes, there are several) required some deciphering, but other than that it was fairly straightforward. Unfortunately just compiling source code is not sufficient, as usually one also wants to run the result. Further you'd want the new binaries to behave in the same way as the old ones. This is where things get interesting.

Trying to run the main LibreOffice application is not particularly useful because it will almost certainly not work. Fortunately LO provides a bunch of sample and test applications one can use. One of these is a demo app that starts, initialises the LO runtime and opens up a GUI window. Perfect. After a build (which takes an hour) and install the binary is ready to run and produces a … segfault. More annoyingly it produces a segfault without any useful information that would aid in debugging.

Time to take out strace. Several attempts later we discover that LO tries to be dynamic and smart. LO consists of >150 shared libraries. Rather than link all of them together, the code seems to determine where the main executable is, look up shared libraries relative to that binary, dlopen them and then go on its merry way. If this fails it crashes somewhere, for some reason I chose not to dig into. This debugging brought up an interesting discovery about naming. One of the libraries is called SAL, so naturally you would expect the actual file to be called libsal.so. This is also what the Makefile defining the library calls it. But that is not its actual name. Instead it is libsallo.so. Somewhere within the bowels of the 150 000 lines of Make that make (ha) up the build system, some (but not all) library names get an lo appended to them. There does not seem to be any logic to which ones, but fine, they can at least be fixed with manual work.

After that the program failed trying to open some data files. This was easily fixed by swiping all binary files from an existing working installation. Then it failed with a weird assert failure that seemed to indicate that the runtime object system had not been properly initialised. LO is built around a custom object model called UNO or Universal Network Objects. As you could tell by the name it comes from the 90s and has the following properties:
  • It provides an object model very close to the one in Java
  • Absolutely everything is done with inheritance, and all functionality is at least three namespaces deep
  • Objects are defined with a custom language that is compiled to C++ headers using a self built tool (I'm told this generates roughly a million lines of C++ headers in total)
  • All class types and objects are constructed at runtime
  • The runtime is written in C++, but the actual implementation is all void pointers and reinterpret casts

Break all the tools!

At this point the easy avenues had been explored and it was time to bring out gdb. While editing build definition files is something you can do with a simple editor, debugging really requires an IDE. The command line interface, even the curses one, is cumbersome compared to a full-fledged GUI with mouse hovers for variables and so on. Fortunately this is a simple task: create a new Eclipse project, build (taking an hour again), install, run and debug. It worked really nicely. For about a day.

Eventually Eclipse's code indexer crashed while doing a full reindexing operation. When you restart Eclipse after such a crash it restarts the indexing operation and promptly crashes again. The eventual solution was to manually delete all code that was not absolutely necessary for the test app, such as Writer, Calc and the rest. This made the working set small enough that the indexer did not crash any more. And no, increasing the amount of memory allocated to Eclipse's JVM does not fix the issue. It crashes even when given 8 gigs of memory. With this done, the sequence of steps leading to the crash could be debugged:

  1. The runtime tries to initialise itself, fails and throws its own custom exception type.
  2. The exception is caught, ignored and discarded, followed by a call into custom error handling code.
  3. The code retrieves the exception via other methods, and then tries to introspect the object for further information.
  4. Introspecting the error object for the reason why runtime initialisation failed requires the runtime to be initialised and working. As it is not, things crash.
Interestingly if you change the code at step 2 to just grab the exception and print its error message, it works correctly. Hooray for frameworks, I guess.

After some fixes that issue was gone, but a new one took its place. Somehow the code got into a loop where function A called B, which called C, and so on for about 20 functions until something called back into function A. This led to an infinite loop and stack exhaustion.

At this point we have two supposedly identical programs: one built with gbuild and one built with Meson. They should behave identically but don't. This gives us an approach for solving the problem: single step both programs in a debugger until their execution differs. When that happens we know where the bug is. In practice this is less simple. Because the code uses a ton of OO helper classes and templates, even a single function call can lead to many stack frames that you have to step through manually. These helpers go away when optimizations are enabled, but in this particular case optimizations can't be used because they make debugging quite difficult. Even -Og changes the code too much to be useful. So no optimization it is, and every time you change the optimization level to test, it takes, you guessed it, an hour to build (or more if you try -O2).

Manually single stepping through code is the sort of tedious manual labor that humans are bad at. Fortunately gdb has a Python scripting interface. Thus one could write a script for each gdb instance that connects to a common server, which orders both to single step, report their current PC location, and halt when the locations differ. This worked perfectly until the programs called into libc. For some reason the same calls into libc (specifically getenv) took a different number of steps to finish in the two programs, so they desynched almost immediately. Fixing that seemed like too much work, so the idea was scrapped.
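The comparison logic at the heart of that idea is simple; it was the gdb plumbing around it that failed. A minimal sketch of the divergence search, with fake program counter traces standing in for the real single-step output:

```python
def first_divergence(trace_a, trace_b):
    """Return the index of the first program counter where two
    single-step traces differ, or None if one is a prefix of the other."""
    for i, (pc_a, pc_b) in enumerate(zip(trace_a, trace_b)):
        if pc_a != pc_b:
            return i
    return None

# In the real setup each list would be fed by a gdb Python script
# reporting the PC after every "stepi"; here we fake two tiny traces
# that diverge at the third step.
good = [0x1000, 0x1004, 0x1008, 0x100c]
bad  = [0x1000, 0x1004, 0x2000, 0x2004]
divergence = first_divergence(good, bad)
```

The libc problem described above is exactly why this breaks in practice: if the two traces take a different number of steps through the same call, every subsequent index is off by some amount and the naive pairwise comparison reports a false divergence.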

Manually single stepping through code is also difficult because if you accidentally go too far, you have to start over. Fortunately we now have rr, which allows you to step backwards through the execution as well. Recording a trace of one of the programs worked. So did recording the other. Running both at the same time failed miserably for reasons that remained unclear.

Debuglander ][: The Sloggening

Nevertheless at this point I had two debugging aides: what should happen, in the form of an rr trace, and what actually was happening, in a live debugger. Now it was just a matter of finding out where their execution diverges. After about two days of debugger stepping I gave up; doing this sort of work by hand is just not something the human brain does very well (or at least mine doesn't). It was back to the old grasping-at-straws board.

Like most big frameworks, UNO had its own code that does special compiler magic. The Makefile lists several flags that should not be used to compile said code. I tried adding those in the Meson version. It did not help. There is also some assembly code that fiddles with registers. Maybe that was the problem? Nope again. Maybe one of the libraries is missing some preprocessor define that causes bad compilations? No. Everything seemed to be in order and doing the exact same thing as the original build system did.

LO does provide unit tests for some functionality. I had not built them with Meson yet, so I figured I'd convert some of those just to get something done. Eventually I converted one test that just exercises an Any object, and it crashed in the exact same way. Now I had a reproducer for the bug with 10x to 100x less code. Once more unto the debuggers, dear friends, once more!

Eventually the desync point was found. One execution went to a header file line 16 and the other went to the same header, but to line 61. It was getting fairly late so simply noticing the difference between 16 and 61 took a while. Now we had the smoking gun, but there was one more twist to this story: the header file did not have a line 61, it had only about 30 lines of text.

Nevertheless the debugger reported line 61. It even allowed one to inspect variables and all the things you would not expect to be able to do when your process execution is on a nonexisting line. What was happening? Was the debug info in the ELF files corrupted in some way? And then it finally hit me.

LibreOffice generates two different sets of headers for the same IDL files: regular and comprehensive (there is also a third, bootstrap one, but let's ignore that). These headers have the same names and the same supposed behaviour but different implementations. You can #include either one in your code and it will compile and sort-of-work. I don't know why the original developers decided to tempt the ODR violation gods in this way, and nobody on the LO IRC channel seemed to know either, but finally the core issue had been found. When reverse engineering the Makefiles I had found the code generation phase that creates the regular headers rather than the one that creates the comprehensive ones.

The fix consisted of changing one of the code generator's command line arguments from -l to -C, recompiling (which took an hour again) and running the binary. Finally, one week's worth of debugging work was rewarded with this:


A blank window has never looked sweeter.

Final commercial announcement

If you enjoyed this post and would like to know more about the Meson build system, consider buying my ebook.

Monday, March 2, 2020

Meson manual sales numbers and a profitability estimate

The Meson Manual has been available for purchase for about two months now. This is a sufficient amount of time to be able to estimate total sales amounts and the like. As one of the goals of the project was to see if this could be a reasonable way to compensate FOSS maintainers for their work, let's go through the numbers in detail.

Income

At the time of writing, 45 copies of the book have been sold amounting to around 1350€ of income. The first month's sales were about 700€ and the second one's about 500€. This is the common sales pattern where sales start at some level and then gradually decrease as time passes. Depending on how one calculates the drop-off, this would indicate total sales over a year of 2000-2500€.

Expenses

The ecommerce site charges ~10€ per month, which is 120€ per year. Credit card processing fees should land somewhere in the 200-300€ area depending on sales. Accounting services come to about 500€. Then there are many small things that probably sum up to 100-200€ in a year. After these immediate expenses roughly 800-1300€ remains. Unfortunately this is where things get uglier.

The book was launched in my presentation at linux.conf.au. Having some sort of a launch is a necessity, because just putting things on the Internet does not work. This cost about 2000€ total in travel and lodging expenses. This single expense wipes out all the remaining income and then some. One could have gone to a nearer conference, but as Finland is far away from everywhere, the expenses are easily 500€ even for continental Europe. This also assumes that one would have gotten a talk accepted there, which is not at all guaranteed. In fact the vast majority of Meson talks I have submitted have been rejected. Thus, depending on how you look at it, we are now either 700€ in the red, or 300-800€ in the black in the best possible case.
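Using the post's own figures, the two launch scenarios can be checked with a few lines of arithmetic (a back-of-the-envelope sketch, not an accounting statement):

```python
# Figures from the post, all in euros and all rough estimates.
remaining = (800, 1300)    # yearly income left after immediate expenses
launch_travel = 2000       # travel + lodging for the actual launch
nearby_conference = 500    # hypothetical cheaper European alternative

# Worst case: even the high end of the remaining income does not
# cover the launch cost.
worst = remaining[1] - launch_travel            # 700 euros in the red

# Best case: a nearby conference (with a talk accepted) leaves a
# modest profit.
best = (remaining[0] - nearby_conference,
        remaining[1] - nearby_conference)       # 300-800 in the black
```

The point of writing it out is that the entire profitability of the project hinges on a single marketing expense, which is a precarious position for any one-person product.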

Writing a book takes time. A lot of time. The way I got mine was to make a deal with my employer to only work four days a week for several months. I spent that one day a week (specifically Wednesday) writing the book and working out all the requisite bureaucracy. In my day job I'm a consultant and my salary is directly determined by the number of hours billed from customers. Doing the math on this says that writing the manual cost me ~9000€ compared to just being at work.

With this the total compensation of the manual comes out to a loss between 8000€ and 9500€.

So was it worth it?

It depends? As a personal endeavor, writing, publishing and selling a full book is very satisfying and rewarding (even though writing it was at times incredibly tedious). But financially? No way. The break-even point seems to be about 10× the current sales. If 100× sales were possible, it might be a sufficient amount for more people to take the risk and try to make a living this way. With these sales figures it's just not worth it.

Bonus chapter: visibility and marketing

The biggest problem of any project like this is marketing: how to get your project visible. This is either extremely hard or downright impossible. As an example, let's do some Fermi estimations on reachability on Twitter. Suppose you post about your new project on Twitter, how many of your followers would then buy the product? Ten percent is probably too much, and one tenth of a percent is too small, so let's say one percent. Thus to sell 400 copies you'd need to have 40 000 direct Twitter followers. Retweets do not count as their conversion rate is even lower.
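The Fermi estimate in numbers:

```python
copies_wanted = 400
conversion_rate = 0.01   # guessed: 10% is too high, 0.1% too low

# How many direct followers you'd need for a single announcement
# tweet to sell that many copies.
followers_needed = copies_wanted / conversion_rate   # 40 000
```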

Your only real chance is to get visibility in some news media, but that is also difficult. They get inundated by people and projects that want their products to be seen. Thus news sites publish almost exclusively about large established projects, as those are what interest their readers the most. Note that this should not be seen as a slight against these sites; it's just the nature of the beast. A news site posting mostly about crowdfunding campaigns of small unknown projects would not be very tempting and would go out of business quickly.

Advertising might work, but the downsides include a) it costs more money and b) the target audience for a programming manual is the set of people who are the most likely to use ad blockers or otherwise ignore online ads (I have never bought anything based on an ad I have seen online).

Sunday, March 1, 2020

Unity build test with Meson & LibreOffice

In a previous blog post we managed to build a notable chunk of LibreOffice with Meson. This has since been updated so you can build all top level apps (Writer, Calc, Impress, Draw). The results do not actually run, so the conversion may seem pointless. That is not the case, though, because once you have this build setup you can start doing interesting experiments on a large real world C++ code base. One of these is unity builds.

A unity build is a surprisingly simple technique for speeding up builds, especially of C++. The idea is that instead of compiling source files individually, you build them in larger batches by creating files like the following and compiling those instead:

#include<source1.cpp>
#include<source2.cpp>
#include<source3.cpp>
// etc

One of the main downsides of this technique is writing and maintaining the unity files by hand. Fortunately Meson has built-in support for creating unity files for build targets. In the next release you can even specify how many source files you want in a single unity source. Enabling unity builds for a single target is simple and consists of adding the following keyword argument to the target's definition:

override_options: ['unity=on']

In this experiment we used LibreOffice's Writer target, which is a single shared library consisting of almost 700 source files. The unity block size was Meson's default of 4.
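Conceptually, the unity file generation can be sketched like this (an illustration of the idea, not Meson's actual implementation):

```python
def unity_batches(sources, block_size=4):
    """Group source files into unity units of at most block_size files,
    each rendered as a series of #include directives."""
    units = []
    for i in range(0, len(sources), block_size):
        batch = sources[i:i + block_size]
        units.append("\n".join(f"#include<{s}>" for s in batch))
    return units

# 9 source files with a block size of 4 produce 3 unity units
# (4 + 4 + 1 files).
sources = [f"source{n}.cpp" for n in range(1, 10)]
units = unity_batches(sources)
```

The build system then compiles each generated unit instead of the individual files, so the compiler parses the (often enormous) shared headers once per unit rather than once per source file.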

The results

Compiling source code as a unity build for the first time usually leads to build failures, and that is exactly what happened here. There are many reasons for this. The most common are name clashes between static functions, functions in anonymous namespaces, and classes with common names that are used without namespace qualification. LO turned out to have at least three different classes called Size. All these issues need to be fixed either by renaming or by adding namespace qualifiers. This is boring manual work, but on the other hand you may find duplicated static functions across your code base.

Once all the issues have been fixed we can do actual measurements. The test machine was an 8 thread i7-3770 with an SSD drive and the buildmode was set to debug. The full build times are these:

Regular   10m 
Unity      4m 32s

The unity build takes less than half the time of the regular one, which is a fairly typical result. The difference would have been even bigger with a larger unity block size. Incremental builds were tested by deleting one object file and rebuilding.

Regular 26s
Unity   22s

Typically incremental unity builds are slower than regular ones, but in this case the unity build is actually faster. This is probably because unity builds produce smaller output files with less debug data, so the linker has less work to do. Increasing the unity block size would make incremental builds slower.

Converting the code to compile as a unity build took on the order of three hours. Ironically most of it was spent waiting for compilations to finish. Since this saves around 5 minutes per build, the time investment is recovered after the development team has done 60 full builds. The code changes have been done with as little effort as possible, so polishing this to production quality would probably take 2-3 times as long.

Do try this at home!

The code is available in the unitytest branch of this repo. It has only been tested with Meson trunk on Linux. There are two targets that use unity builds: vcl and sw. You can toggle unity builds, as well as the unity block size, per target with the override_options keyword argument.