Monday, May 4, 2020

Let's talk meta

In my previous blog post I wrote about old techs causing problems in getting new developers on board. In it I had the following statement:
As a first order approximation, nobody under the age of 35 knows how to code in Perl, let alone would be willing to sacrifice their free time doing it.
When I wrote this, I spent a lot of time thinking about whether I should add a footnote or an extra sentence saying, roughly, that I'm not claiming there are no people under 35 who know Perl, only that the skill has become quite rare compared to ye olden times. The reason for considering the extra text was that I feared someone would inevitably come in and derail the discussion with some variation of "I'm under 35 and I know Perl, so the entire post is wrong".

In the end I chose not to put the clarification in the post. After all, it was written slightly tongue-in-cheek, and it even specifically says that this is not The Truth (TM), but just an approximation. The post was published. It got linked on a discussion forum. One of the very first comments was this:


This is what makes blogging on the Internet such a frustrating experience. Every single sentence you write has to be scrutinised from all angles and then padded and guarded so its meaning can not be sneakily undermined in this way. This is tiring, as it is difficult to get a good writing flow going. It may also make the text less readable and enjoyable. It makes blogging less fun and thus people less likely to want to do it.

An alternative to this is to not read any comments. This works, but then you are flying blind. You can't tell which writing is good and which is not, and you certainly can't improve. The Internet has ruined everything.

Meta-notes

Contrary to the claim made above, the Internet has not, in fact, ruined everything. The statement is hyperbole, stemming from the author's feelings of frustration. In reality the Internet has improved the quality of life of most people on Earth by a tremendous amount and should be considered one of the greatest inventions of mankind.

"Ye olden times" was not written as "þe olden times" because in the thorny battle between orthographic accuracy and readability the latter won.

The phrase "flying blind" refers neither to actual flying nor to actual blindness. It is merely a figure of speech for any behaviour that is done in isolation without external feedback. You should never operate any vehicle under any sort of vision impairment unless you have been specifically trained and authorized to do so by the appropriate authorities.

Meta-meta-notes

The notes above were not written because the author thought that readers would take the original statements literally. Instead they are there to illustrate what would happen if the defensive approach to writing, as laid out in the post, were taken to absurd extremes. They exist purely for the purposes of comedy. As does this chapter.

Saturday, May 2, 2020

You have to kill your perlings

Preface

This blog post deals only with the social and "human" aspects of various technologies. It is not about the technical merits of any programming language or other such tech. If you intend to write a scathing Reddit comment along the lines of "this guy is an idiot, Perl is a great language for X because you can do Y, Z and W", please don't. That is not the point of this post. Perl was chosen as the canonical example mostly due to its prevalence; the same points apply to things like CORBA, TCL, needing to write XML files by hand, ridiculously long compilation times and so on.

What is the issue at hand?

In the 90s and early 2000s a lot of code was written. As was fashionable at the time, a lot of it was done in Perl. As open source projects are chronically underfunded, a lot of that code is still running. In fact a lot of the core infrastructure of Debian, the BSDs and other such foundational projects is written in Perl. When told about this, many people give the "project manager" reply and say that since the code exists, works and is doing what it should, everything is fine. But it's really not, and to find out why, let's look at the following graph.

[Graph: the number of people capable of and willing to work on Perl. The values peak around 2000 and plummet to zero by 2020.]

As we can see the pool of people willing to work on Perl projects is shrinking fast. This is a major problem for open source, since a healthy project requires a steady influx of new contributors, developers and volunteers. As a first order approximation, nobody under the age of 35 knows how to code in Perl, let alone would be willing to sacrifice their free time doing it.

One could go into long debates and discussions about why this is, how millennials are killing open source and how everyone should just "man up" and start writing sigils in their variable names. It would be pointless, though. The reasons people don't want to do Perl are irrelevant; the only thing that matters is that the use of Perl is actively driving away potential project contributors. That is the zeitgeist. The only thing you can do is to adapt to it. That means migrating your project from Perl to something else.

But isn't that risky and a lot of work?

Yes. A lot of Perl code is foundational. In many cases the people who wrote it have left and no-one has touched it in years. Changing it is risky. No matter how careful you are, there will be bugs. Nasty bugs. Hard to trace bugs. Bugs that work together with other bugs to cancel each other out. It will be a lot of hard work, but that is the price you have to pay to keep your project vibrant.

An alternative is to do nothing. If your project never needs to change, then this might be a reasonable course of action. However if something happens and major changes are needed (and one thing we have learned this year is that unexpected things actually do happen) then you might end up as the FOSS equivalent of the governor of New Jersey trying to find people to code COBOL for free.

Sunday, April 19, 2020

Do humans or compilers produce faster code?

Modern optimizing compilers are truly amazing. They have tons and tons of tricks to make even crappy code run incredibly fast. Faster than most people could write by hand. This has led some people to claim that program optimization is something that can be left to compilers, as they seem to be a lot better at it. This usually sparks a reply from people on the other end of the spectrum who say that they can write faster code by hand than any compiler, which makes compilers mostly worthless when performance actually matters.

In a way both of these viewpoints are correct. In another way they are both wrong. To see how, let's split this issue into two parts.

A human being can write faster code than a compiler for any given program

This one is fairly easy to prove (semi)formally. Suppose you have a program P, written in some programming language L, whose compiled version runs faster than any hand written version. A human being can look at the assembly output of that program and write an equivalent source version in straight C. Usually when doing this you find some specific optimization that you can add to make the hand written version faster.

Even if the compiler's output were proved optimal (such as in the case of superoptimization), it can still be matched by copying the output into your own program as inline assembly. Thus we have proven that for any program humans will always be faster.
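
To make this concrete, here is a minimal C sketch of the "copy the compiler's output back in" trick, assuming GCC or Clang style extended inline assembly on x86-64; the function names and the exact instructions are made up for illustration:

    #include <stdint.h>

    /* Plain C version, left entirely to the optimizer. */
    uint32_t double_sum(uint32_t a, uint32_t b) {
        return (a + b) * 2;
    }

    /* "Hand" version: instructions of the kind a compiler might emit,
     * pasted back in as inline assembly. By construction this can
     * never be slower than the compiled original it was copied from. */
    uint32_t double_sum_asm(uint32_t a, uint32_t b) {
        uint32_t r = a;
        __asm__("addl %1, %0\n\t"  /* r += b                */
                "addl %0, %0"      /* r += r, i.e. times two */
                : "+r"(r)
                : "r"(b));
        return r;
    }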

A human being can not write faster code than a compiler for every program

Let's take something like Firefox. We know from the previous chapter that one could eschew complex compiler optimizations and rewrite it in straight C or equivalent and end up with better performance. The downside is that you would die of old age before the task would be finished.

Human beings have a limited shelf life. There are only so many times they can press a key on the keyboard before they expire. Rewriting Firefox in straight C so that it runs faster than the current version with all optimizations enabled is simply too big a task.

Even if by some magic you could do this, the requirements on the browser would change during the rewrite. A lot. The end result would be useless until you added all the new functionality that had appeared in the meantime. This would lead to an eternal chase after the tail lights of the existing project.

And even if you could do that, optimizing compilers keep getting better, so you'd need to go through your entire code base regularly and add the same optimizations by hand to keep up. All of these things could be done in theory, but they are completely impossible in practice.

The entire question is poorly posed

Asking whether compilers or humans write faster code is kind of like asking which one is "bluer", the sea or the sky. Sure, you could spend years debating the issue on Twitter without getting anywhere, but it's not particularly interesting. A more productive way is to instead ask the question "given the requirements, skills and resources I have available, should I hand-optimize this particular routine or leave it to the compiler?"

If you do this you rediscover the basic engineering adage: you get the best bang for the buck by relying on the compiler by default and doing things by hand only for bottlenecks that are discovered by profiling the running application.

PS. Unoptimized C is slow, too

Some people think that when they write C it is "very close" to the underlying assembly and thus does not benefit much from compiler optimizations. This has not been true for years (possibly decades). The performance difference between no optimization and -O2 can be massive, especially for hot inner loops.
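
If you want to see the difference for yourself, a trivial hot loop is enough. A minimal sketch (timings are machine dependent, but the gap between -O0 and -O2 on code like this is typically severalfold):

    #include <stdio.h>

    /* A classic hot inner loop: sum an array. At -O0 every access goes
     * through memory; at -O2 the accumulator stays in a register and
     * the loop is typically unrolled and vectorized.
     *
     * Compile and time both ways:
     *   gcc -O0 sum.c -o sum_O0 && time ./sum_O0
     *   gcc -O2 sum.c -o sum_O2 && time ./sum_O2
     */
    #define N 10000000L

    static int data[N];

    int main(void) {
        for (long i = 0; i < N; ++i)
            data[i] = (int)i;
        long total = 0;
        for (long i = 0; i < N; ++i)
            total += data[i];
        printf("%ld\n", total);
        return 0;
    }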

When people say that they can write code that is faster than a compiler optimized version of the same algorithm in some other language, that is not what they are actually saying, unless they are writing 100% pure ASM by hand [0]. Instead they are saying "I can take any algorithm implementation, write it with an alternative syntax and, when both of these are run through their respective optimizing compilers, end up with a program that runs faster".

[0] Which does happen sometimes, especially for SIMD code.

Saturday, April 11, 2020

Your statement is 100% correct but misses the entire point

Let's assume that there is a discussion going on on the Internet about programming languages. One of the design points that comes up is garbage collection. One participant mentions the advantages of garbage collection with something like this:
Garbage collectors are nice and save a lot of work. If your application does not have strict latency requirements, not having to care about memory management is liberating and can improve developer efficiency by a lot.
This is a fairly neutral statement that most people would agree with, even if they work on code that has strict real time requirements. Yet, inevitably, someone will present this counterpoint.
No! If you have dangling references memory is never freed and you have to fix that by doing manual memory management anyway. Garbage collectors do not magically fix all bugs.
If you read that reply carefully you'll notice that every asserted statement in it is true. That is what makes it so frustrating to argue against. Most people with engineering backgrounds are quite willing to admit they are wrong when presented with evidence that their statements are not correct. This does not cover everyone, of course, as some people are quite willing to violently disagree with any and all facts that are in conflict with their pre-held beliefs. We'll ignore those people for the purpose of this post.

While true, that single sentence ignores all of the larger context of the issue, which contains points like the following:

  • Out-of-memory errors caused by dangling references are rare (maybe 1 in 10 programs?), whereas regular memory bugs like use-after-free, double free and off-by-one errors are very common (100-1000 in every program); see the sketch after this list.
  • Modern GCs have very good profilers; finding dangling references is a lot easier than debugging stack corruptions.
  • Being able to create things on a whim and just drop them on the floor makes programmers a lot more productive than forcing them to micromanage the complete life cycle of every single resource.
  • Even if you encounter a dangling reference issue, fixing it probably takes less time than you would have spent fixing memory corruption issues in a GC-less version of the same app.
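
To make the first point concrete, here is a minimal C sketch of a use-after-free, the kind of bug that a garbage collected language rules out by design, since the collector keeps every reachable object alive:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        char *name = malloc(16);
        if (!name)
            return 1;
        strcpy(name, "example");
        free(name);            /* one code path releases the buffer... */
        printf("%s\n", name);  /* ...another still reads it: undefined
                                * behaviour that may print garbage,
                                * crash, or silently corrupt memory */
        return 0;
    }
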
In brief, the actual sentence is true but misses the entire point of the comment it is replying to. This is sadly common in Internet debates. Let's see some examples.

Computer security

A statement like this:
Using HTTPS on all web traffic is good for security and anonymity.
 might be countered with something like this:
That provides no real security, if the NSA want your data they will break into your apartment and get it.
This statement is again absolutely true. On the other hand, unless you are the leader of a nation state or do regular business with international drug cartels, you are unlikely to be the target of a directed NSA offensive.

If you think that this is a stupid point that nobody would ever make, I agree with you completely. I have also seen it used in the real world. I wish I hadn't.

Bugs are caused by incompetents

High level programming languages are nice.
Programming languages that guard against buffer overruns are great for security and ease of development.
But not for everyone.
You can achieve the exact same thing in C, you just have to be careful.
This is again true. If every single developer on a code base is being 100% focused and 100% careful 100% of the time, then bug free code is possible. Reality has shown time and time again that this does not happen; human beings are simply not capable of operating flawlessly for extended periods of time.
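
To illustrate how tall an order "just be careful" is, here is a small hypothetical C sketch; the bug in it sails through a casual review:

    #include <string.h>

    /* Looks careful: the length is checked before copying. But the
     * check forgets the terminating NUL, so an input of exactly
     * sizeof(buf) characters writes one byte past the end of the
     * buffer. The correct check is strlen(input) < sizeof(buf). */
    void store_name(const char *input) {
        char buf[32];
        if (strlen(input) <= sizeof(buf)) {
            strcpy(buf, input);
            /* ... use buf ... */
        }
    }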

Yagni? What Yagni?

There's the simple.
Processing text files with Python is really nice and simple.
And not so simple.
Python is a complete joke, it will fail hard when you need to process ten million files a second on an embedded microcontroller using at most 2 k of RAM.
Yes. Yes it will. In that use case it would be the wrong choice. You are absolutely correct. Thank you for your insight, good sir, here is a shiny solid gold medal to commemorate your important contribution to this discussion.

What could be the cause of this?

The one thing that school trains you for is that being right is what matters. If you get the answers right in your test, then you get a good grade. Get them wrong and you don't. Maybe this frame of mind "sticks on" once you leave school, especially given that most people who post these kinds of comments seem to be from the "smarter" end of the spectrum (personal opinion, not based on any actual research). In the real world being right is not a merit by itself. In any debate being right is important, of course, but the much more important feature is being relevant. That requires understanding the wider context and possibly admitting that something that is the most important thing in the world to you personally, might be completely irrelevant for the issue at hand.

Being right is easy. Being relevant is extremely difficult.

Sunday, April 5, 2020

Meson manual sales status and price adjustment

The sales dashboard of the Meson manual currently looks like this.

It splits up quite nicely into three parts. The first is the regular sales from the beginning of the year, averaging less than one sale per day.

The second part (marked with a line) indicates when I was a guest on CppCast talking about Meson and the book. As an experiment I created a time limited discount coupon so that all listeners could buy it with €10 off. As you can tell from the graph it did have an immediate response, which again proves that marketing and visibility are the things that actually matter when trying to sell any product.

After that we have the "new normal", which means no sales at all. I don't know if this is caused by the coronavirus isolation or whether this is the natural end of life for the product (hopefully the former but you can never really tell in advance).

Price reduction

Thus, effective immediately, the price of the book has been reduced to €24.95. You can purchase it from the official site.

Thursday, March 26, 2020

It's not what programming languages do, it's what they shepherd you to

How many of you have listened to, read or taken part in a discussion about programming languages that goes like the following:

Person A: "Programming language X is bad, code written in it is unreadable and horrible."

Person B: "No it's not. You can write good code in X, you just have to be disciplined."

Person A: "It does not work, if you look at existing code it is all awful."

Person B: "No! Wrong! Those are just people doing it badly. You can write readable code just fine."

After this the discussion repeats from the beginning until either one gets fed up and just leaves.

I'm guessing more than 99% of you readers have seen this, often multiple times. The sad part is that even though this happens all the time, nobody learns anything and the discussion just begins anew. Let's see if we can do something about this. A good way to go about it is to try to come up with a name and a description for the underlying issue.
shepherding: An invisible property of a programming language and its ecosystem that drives people into solving problems in ways that are natural for the programming language itself rather than ways that are considered "better" in some sense. These may include things like long term maintainability, readability and performance.
This is a bit abstract, so let's look at some examples.

Perl shepherds you into using regexps

Perl has several XML parsers available and they are presumably good at their jobs (I have never actually used one so I wouldn't know). Yet, in practice, many Perl scripts do XML (and HTML) manipulation with regexes, which is brittle and "wrong" for lack of a better term. This is a clear case of shepherding. Text manipulation in Perl is easy. Importing, calling and using an XML parser is not. And really all you need to do is to change that one string to a different string. It's tempting. It works. Surely it could not fail. Let's just do it and get on with other stuff. Boom, just like that you have been shepherded.

Note that there is nothing about Perl that forces you to do this. It provides all the tools needed to do the right thing. And yet people don't, because they are being shepherded (unconsciously) into doing the thing that is easy and fast in Perl.

Make shepherds you into embedding shell pipelines in Makefiles

Compiling code with Make is tolerable, but it fails quite badly when you need to generate source code, data files and the like. The sustainable solution would be to write a standalone program in a proper scripting language that has all the code logic needed and call that from Make with your inputs and outputs. This rarely happens. Instead people think "I know, I have an entire Unix userland available [1], I can just string together random text mangling tools in a pipeline, write it here and be done". This is how unmaintainability is born.

Nothing about Make forces people to behave like this. Make shepherds people into doing this. It is the easy, natural outcome when faced with the given problem.

Other examples

  • C shepherds you into manipulating data via pointers rather than value objects (see the sketch after this list).
  • C++ shepherds you into providing dependencies as header-only libraries.
  • Java does not shepherd you into using classes and objects, it pretty much mandates them.
  • Turing complete configuration languages shepherd you into writing complex logic with them, even though they are usually not particularly good programming environments.
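
To illustrate the first bullet, here is a small made-up C example. Nothing stops you from writing the value version, but the pointer version is what the language makes effortless:

    typedef struct {
        double x, y;
    } Point;

    /* The path of least resistance in C: take a pointer and mutate
     * in place. Cheap, idiomatic, and what the language nudges
     * you toward. */
    void scale_in_place(Point *p, double factor) {
        p->x *= factor;
        p->y *= factor;
    }

    /* The value-object style: no aliasing, no hidden mutation, but
     * it feels wasteful in C, so in practice people rarely write it. */
    Point scaled(Point p, double factor) {
        p.x *= factor;
        p.y *= factor;
        return p;
    }
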
[1] Which you don't have on Windows. Not to mention that every Unix has slightly different command line arguments and semantics for basic commands meaning shell pipelines are not actually portable.

Wednesday, March 11, 2020

The character that swallowed a week

In the last few posts we have looked at compiling LibreOffice from scratch using Meson. Contrary to what one might expect it was not particularly difficult, just laborious. The codegen bits (yes, there are several) required some deciphering, but other than that it was fairly straightforward. Unfortunately just compiling source code is not sufficient, as usually one also wants to run the result. Further you'd want the new binaries to behave in the same way as the old ones. This is where things get interesting.

Trying to run the main LibreOffice application is not particularly useful because it will almost certainly not work. Fortunately LO provides a bunch of sample and test applications one can use. One of these is a demo app that starts, initialises the LO runtime and opens up a GUI window. Perfect. After a build (which takes an hour) and install the binary is ready to run and produces a … segfault. More annoyingly it produces a segfault without any useful information that would aid in debugging.

Time to take out strace. Several attempts later we discover that LO tries to be dynamic and smart. LO consists of >150 shared libraries. Rather than link all of them together, what seems to be happening is that the code tries to determine where the main executable is, then looks up shared libraries relative to that binary, dlopens them and then goes on its merry way. If this fails it crashes somewhere for some reason I chose not to dig into. This debugging brought up an interesting discovery about naming. One of the libraries is called SAL, so naturally you would expect the actual file to be called libsal.so. This is also what the Makefile defining the library calls it. But that is not its actual name. Instead it is libsallo.so. Somewhere within the bowels of the 150 000 lines of Make that make (ha) up the system, some (but not all) library names get an lo appended to them. There does not seem to be any logic to which ones, but fine, they can at least be fixed with manual work.

After that the program failed trying to open some data files. This was easily fixed by swiping all binary files from an existing working installation. Then it failed with a weird assert failure that seemed to indicate that the runtime object system had not been properly initialised. LO is built around a custom object model called UNO or Universal Network Objects. As you could tell by the name it comes from the 90s and has the following properties:
  • It provides an object model very close to the one in Java
  • Absolutely everything is done with inheritance, and all functionality is at least three namespaces deep
  • Objects are defined with a custom language that is compiled to C++ headers using a self built tool (I'm told this generates roughly a million lines of C++ headers in total)
  • All class types and objects are constructed at runtime
  • The runtime is written in C++, but the actual implementation is all void pointers and reinterpret casts

Break all the tools!

At this point the easy avenues had been explored and it was time to bring out gdb. While editing build definition files is something you can do with a simple editor, debugging really requires an IDE. The command line interface, even the curses one, is cumbersome compared to a full fledged GUI with mouse hovers for variables and so on. Fortunately this is a simple task: create a new Eclipse project, build (taking an hour again), install, run and debug. It worked really nicely. For about a day.

Eventually Eclipse's code indexer crashed while doing a full reindexing operation. When you restart Eclipse after such a crash it will restart the indexing operation and promptly crash again. The eventual solution was to manually delete all code that is not absolutely necessary for the test app, such as Writer, Calc and the rest. This made the working set small enough that the indexer did not crash any more. And no, increasing the amount of memory allocated to Eclipse's JVM does not fix the issue. It crashes even if given 8 gigs of memory. With this done the sequence of steps leading to the crash could be debugged:

  1. The runtime tries to initialise itself, fails and throws its own custom exception type.
  2. The exception is caught, ignored and discarded, followed by a call to custom error handling code.
  3. The code retrieves the exception via other methods, and then tries to introspect the object for further information.
  4. Introspecting the error object for the reason why runtime initialisation failed requires the runtime to be initialised and working. As it is not, things crash.
Interestingly if you change the code at step 2 to just grab the exception and print its error message, it works correctly. Hooray for frameworks, I guess.

After some fixes the issue was gone but a new one took its place. Somehow the code got into a loop where function A called B, which called C and so on for about 20 functions until something called back to function A. This led to an eternal loop and stack exhaustion.

At this point we have two supposedly identical programs: one built with gbuild and one built with Meson. They should behave identically but don't. This gives us an approach to solve the problem: single step both programs in a debugger until their execution differs. When that happens we know where the bug is. In practice this is less simple. Because the code uses a ton of OO helper classes and templates, even a single function call can lead to many stack frames that you have to step through manually. These helpers go away when optimisations are enabled, but in this particular case they can't be used as they make debugging quite difficult. Even -Og changes the code too much to be useful. So no optimization it is, and every time you change the optimization level to test, it takes, you guessed it, an hour to build (or more if you try -O2).

Manually single stepping through code is the sort of tedious manual labor that humans are bad at. Fortunately gdb has a Python scripting interface. Thus one could write a script that runs inside each gdb, connects to a common server that orders both debuggers to single step and report their current PC location, and halts when the locations differ. This worked perfectly until the programs called into libc. For some reason the same calls into libc (specifically getenv) took a different number of steps to finish, so the programs desynched almost immediately. Fixing that seemed like too much work, so the idea was scrapped.

Manually single stepping through code is difficult because if you accidentally go too far, you have to start over again. Fortunately we now have rr, which allows you to step backwards in code execution as well. Recording a trace of one of the programs worked. The other program worked as well. Running both of them at the same time failed miserably for reasons that remained unclear.

Debuglander ][: The Sloggening

Nevertheless at this point I had two debugging aides: what should actually happen, as an rr trace, and what was actually happening, in a live debugger. Now it was just a matter of finding out where their execution differs. After about two days of debugger stepping I gave up; doing this sort of work by hand is just not something the human brain does very well (or at least mine doesn't). It was back to the old straw-grasping-at board.

Like most big frameworks, UNO has its own code that does special compiler magic. The Makefile lists several flags that should not be used to compile said code. I tried adding those in the Meson version. It did not help. There is also some assembly code that fiddles with registers. Maybe that was the problem? Nope again. Maybe one of the libraries is missing some preprocessor define that causes bad compilations? No. Everything seemed to be in order and doing the exact same thing as the original build system did.

LO does provide unit tests for some functionality. I had not built them with Meson yet, so I figured I'd fix some of those just to get something done. Eventually I converted one test that just exercises an Any object, and it crashed in the exact same way. Now I had a reproducer for the bug with 10x to 100x less code. Once more unto the debuggers, dear friends, once more!

Eventually the desync point was found. One execution went to a header file line 16 and the other went to the same header, but to line 61. It was getting fairly late so simply noticing the difference between 16 and 61 took a while. Now we had the smoking gun, but there was one more twist to this story: the header file did not have a line 61, it had only about 30 lines of text.

Nevertheless the debugger reported line 61. It even allowed one to inspect variables and all the things you would not expect to be able to do when your process execution is on a nonexisting line. What was happening? Was the debug info in the ELF files corrupted in some way? And then it finally hit me.

LibreOffice generates two different sets of headers for the same IDL files: regular and comprehensive (there is also a third bootstrap one, but let's ignore that). These headers have the same names and the same supposed behaviour but different implementations. You can #include either in your code and it will compile and sort-of-work. I don't know why the original developers had decided to tempt the ODR violation gods in this way and nobody on the LO IRC channel seemed to know either, but finally the core issue had been found. When reverse engineering the Makefiles I had found the code generation phase that creates the regular headers rather than the comprehensive ones.
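
To illustrate the bug class (this is a deliberately broken single-file C simulation, not LibreOffice's actual code, and the type names are invented), consider what happens when two definitions of the "same" type with different layouts end up in one program:

    #include <stdio.h>

    /* What the "regular" header flavour would define. */
    typedef struct { int type; double value; } AnyRegular;

    /* What the "comprehensive" flavour would define: conceptually the
     * same type, in the real code base even the same name, but with a
     * different layout. */
    typedef struct { double value; int type; } AnyComprehensive;

    int main(void) {
        AnyRegular a = { 42, 3.14 };
        /* One translation unit writes the object with one layout and
         * another reads it assuming the other. The cast simulates what
         * the linker silently allows across object files: no error,
         * just fields read from the wrong offsets. */
        AnyComprehensive *view = (AnyComprehensive *)&a;
        printf("type=%d value=%f\n", view->type, view->value);
        return 0;
    }

Everything compiles and links cleanly; the failure only shows up at runtime, which is exactly why this class of bug can swallow a week.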

The fix consisted of changing one of the code generator's command line arguments from -l to -C, recompiling (which took an hour again) and running the binary. Finally, one week's worth of debugging work was rewarded with this:


A blank window has never looked sweeter.

Final commercial announcement

If you enjoyed this post and would like to know more about the Meson build system, consider buying my ebook.