Wednesday, February 24, 2021

Millennium prize problems but for Linux

There is a longstanding tradition in mathematics to create a list of hard unsolved problems to drive people to work on solving them. Examples include Hilbert's problems and the Millennium Prize problems. Wouldn't it be nice if we had the same for Linux? A bunch of hard problems with sexy names that would drive development forward? Sadly there is no easy source for tens of millions of euros in prize money, not to mention it would be very hard to distribute as this work would, by necessity, be spread over a large group of people.

Thus it seems unlikely that this would work in practice, but that does not prevent us from stealing a different trick from the mathematicians' toolbox and pondering how it would work in theory. The list of problems will probably never exist, but let's assume that it does. What would it contain? Here's one example I came up with. It is left as an exercise to the reader to work out what prompted me to write this post.

The Memory Depletion Smoothness Property

When running the following sequence of steps:
  1. Check out the full source code for LLVM + Clang
  2. Configure it to compile Clang and Clang-tools-extra, use the Ninja backend and RelWithDebInfo build type, leave all other settings to their default values
  3. Start watching a video file with VLC or a browser
  4. Start compilation by running nice -19 ninja
The outcome must be that the video playback works without skipping a single frame or audio sample.

What happens currently?

When Clang starts linking, each linker process takes up to 10 gigabytes of RAM. This leads to memory exhaustion, flushing active memory to swap and eventually crashing the linker processes. Before that happens, however, every other app freezes completely and the entire desktop remains frozen until things get swapped back into memory, which can take tens of seconds. Interestingly, all browser tabs are killed before the linker processes start failing. This happens both with Firefox and Chromium.

What should happen instead?

The system should handle the problematic case in a better way. The linker processes will still die, as there is not enough memory to run them all, but the UI should never noticeably freeze. For extra points the same should happen even if you run Ninja without nice.

The wrong solution

A knee-jerk reaction many people have is something along the lines of "you can solve this by limiting the number of linker processes by doing X". That is not the answer. It solves the symptoms but not the underlying cause, which is that bad input causes the scheduler to do the wrong thing. There are many other ways of triggering the same issue, for example by copying large files around. A proper solution would fix all of those in one go.

Saturday, February 6, 2021

Why most programming language performance comparisons are most likely wrong

For as long as programming languages have existed, people have fought over which one of them is the fastest. These debates have ranged from serious scientific research to many a heated late night bar discussion. Rather than getting into this argument, let's look at the problem at a higher level, namely how would you compare the performance of two different programming languages. The only really meaningful approach is to do it empirically, that is, implementing a bunch of test programs in both programming languages, benchmarking them and then declaring the winner.

This is hard. Really hard. Insanely hard in some cases and very laborious in any case. Even though the problem seems straightforward, there are a ton of error sources that can trip up the unaware (and even many very-much-aware) performance tester.

Equivalent implementations?

In order to make the two implementations comparable they should be "of equal quality". That is, they should have been implemented by people with roughly the same amount of domain knowledge as well as programming skills in their chosen language. This is difficult to organise. If the implementations are written by different people, they may approach the problem with different algorithms making the relative performance not a question of programming languages per se, but of the programming approaches chosen by each programmer.

Even if both implementations are written by the same person using the same algorithm, there are still problems. Typically people are better at some programming languages than others. Thus they tend to provide faster implementations in their favourite language. This causes bias, because the performance is not a measure of the programming languages themselves, but rather of the individual programmer. These sorts of tests can be useful in finding usability and productivity differences, but not so much for performance.

Thus you might want to evaluate existing programs written by many expert programmers. This is a good approach, but sometimes even seasoned researchers get it wrong. There is a paper that tries to compare different programming languages for performance and power efficiency using this approach. In their test results one particular program's C implementation was 30% faster than the same program written in C++. This single measurement casts serious doubt over the entire paper. If we took the original C source, changed the file extensions from .c to .cpp and recompiled, the end result should have the same performance within a few percentage points. Thus we have to conclude that one of the following is happening (in decreasing order of probability):
  1. The C++ version is suboptimally coded.
  2. The testing methodology has a noticeable flaw.
  3. The compiler used has a major performance regression for C++ as opposed to C.
Or, in other words, the performance difference comes from somewhere other than the choice of programming language.

The difficulty of measurement

A big question is how one actually measures the performance of any given program. A common approach is to run the test multiple times in a row and then do something like the following:
  • Handle outliers by dropping the points at extreme ends (that is, the slowest and fastest measurements)
  • Calculate the mean and/or median for the remaining data points
  • Compare the results between different programs; the one with the fastest time wins
Those who remember their high school statistics lessons might calculate standard deviation as well. This approach seems sound and rigorous, but it contains several sources of systematic error. The first of these is quite surprising and has to do with noise in measurements.
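
Before digging into those error sources, here is what that common procedure might look like as a minimal Python sketch. The run times are invented and the trimming simply drops the single fastest and slowest measurement; real test harnesses obviously differ in the details.

    import statistics

    def naive_summary(samples):
        # The common approach: drop the extreme measurements, then
        # summarise whatever is left.
        trimmed = sorted(samples)[1:-1]
        return {
            'mean': statistics.mean(trimmed),
            'median': statistics.median(trimmed),
            'stdev': statistics.stdev(trimmed),  # the high school favourite
        }

    # Invented wall clock times (in seconds) from repeated runs.
    program_a = [10.4, 10.1, 10.9, 10.2, 10.3, 11.5, 10.2]
    program_b = [9.8, 9.5, 10.7, 9.6, 9.9, 9.4, 9.7]

    print(naive_summary(program_a))
    print(naive_summary(program_b))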

Most basic statistical tools assume that the error is normally distributed with an average value of zero. If you are measuring something like temperature or speed this is a reasonable assumption. In this case it is not. A program's measured time consists of the "true" time spent solving the problem plus overhead that comes from things like OS interruptions, disk accesses and so on. If we assume that the noise is gaussian with a zero average, then it means that the physical machine has random processes that make the program run faster than it would if the machine were completely noise free. This is, of course, impossible. The noise is strongly non-gaussian simply because it can never have a negative value.

In fact, the measurement that is the closest to the platonic ideal answer is the fastest one. It has the least amount of noise interference from the system. That is the very same measurement that was discarded in the first step when outliers were cleaned out. Sometimes doing established and reasonable things makes things worse.

Statistics even harder

Putting that aside, let's assume we have measurements for the two programs, which do look "sufficiently gaussian". Numerical analysis shows that language #1 takes 10 seconds to run whereas language #2 takes 9 seconds. A 10% difference is notable and thus we can conclude that language #2 is faster. Right?

Well, no. Suppose the actual measurement data look like this:

[Figure: the measured run times of the two programs, plotted side by side.]

Is the one on the right faster or not? Maybe? Probably? Could be? Answering this question properly requires going all the way to university level statistics. First one formulates a null hypothesis, namely that the two programs have no performance difference. Then one calculates the probability that both of these measurements have come from the same probability distribution. If that probability is small (typically below 5%), the null hypothesis is rejected and we have strong evidence that one program is indeed faster than the other. This method is known as Student's t-test, and it is commonly used in heavy duty statistics. Note that some implementations of the test assume gaussian data, and if your data has some other shape the results you get might not be reliable.
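
As a minimal sketch, assuming the measurements are available as plain lists of seconds, the test can be run with SciPy. The numbers below are invented; equal_var=False selects Welch's variant, which does not assume the two samples have the same variance.

    from scipy import stats

    # Invented run times (seconds) for the same benchmark in two languages.
    language_1 = [10.1, 10.4, 9.9, 10.2, 10.6, 10.0, 10.3, 10.5]
    language_2 = [9.0, 9.4, 8.8, 9.6, 9.1, 9.3, 8.9, 9.5]

    # Null hypothesis: both samples come from the same distribution.
    result = stats.ttest_ind(language_1, language_2, equal_var=False)

    if result.pvalue < 0.05:
        print('Null hypothesis rejected, the difference is significant '
              f'(p = {result.pvalue:.4f}).')
    else:
        print(f'Could not reject the null hypothesis (p = {result.pvalue:.4f}).')

If the data is clearly non-gaussian, a nonparametric alternative such as scipy.stats.mannwhitneyu may be a better fit.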

This works for a single test program, but a rigorous comparison has many different programs. There are statistical methods for evaluating those, but they get even more complicated. Looking up how they work is left as an exercise to the reader.

All computers' alignment is Chaotic Neutral

Statistics are hard, but fortunately computers are simple because they are deterministic, reliable and logical. For example if you have a program and you modify it by adding a single NOP instruction somewhere in the instruction stream, the biggest impact it could possibly have is one extra instruction cycle, which is so vanishingly small as to be unmeasurable. If you do go out and measure it, though, the results might confuse and befuddle you. Not only can this one minor change make the program 10% slower (or possibly even more), it can even make it 10% faster. Yes. Doing pointless extra work can make your program faster.

If this is the first time you encounter this issue you probably won't believe it. Some fraction might already have gone to Twitter to post their opinion about this "stupid and wrong" article written by someone who is "clearly incompetent and an idiot". That's ok, I forgive you. Human nature and all that. You'll grow out of it eventually. The phenomenon is actually real and can be verified with measurements. How is it possible that having the CPU do extra work could make the program faster?

The answer is that it doesn't. The actual instruction is irrelevant. What actually matters is code alignment. As code shifts around in memory, its performance characteristics change. If a hot loop gets split by a cache boundary it slows down. Unsplitting it speeds it up. The NOP does not need to be inside the loop for this to happen, simply moving the entire code block up or down can cause this difference. Suppose you measure two programs in the most rigorous statistical way possible. If the performance delta between the two is under something like 10%, you can not reasonably say one is faster than the other unless you use a measurement method that eliminates alignment effects.

It's about the machine, not the language

As programs get faster and faster, optimisation undergoes an interesting phase transition. Once performance gets to a certain level the problem is no longer about what the compiler and CPU can do to run the developer's program as fast as possible. Instead it becomes about how the programmer can utilize the CPU's functionality as efficiently as possible. This includes things like arranging your data into a layout that the processor can crunch with minimal effort and so on. In effect this means replacing language based primitives with hardware based primitives. In some circles optimization works weirdly backwards in that the programmer knows exactly what SIMD instructions they want a given loop to be optimized into and then fiddles around with the code until it does. At this point the functionality of the programming language itself is immaterial.

This is the main reason why languages like C and Fortran are still at the top of many performance benchmarks, but the techniques are not limited to those languages. Years ago I worked on a fairly large Java application that had been very thoroughly optimized. Its internals consisted of integer arrays. There were no classes or even Integer objects in the fast path, it was basically a recreation of C inside Java. It could have been implemented in pretty much any programming language. The performance differences between them would have mostly come down to each compiler's optimizer. They produce wildly different performance outcomes even when using the same programming language, let alone different ones. Once this happens it's not really reasonable to claim that any one programming language is clearly faster than others since they all get reduced to glorified inline assemblers.
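
As a rough illustration of the "arrays instead of objects" style (in Python with NumPy rather than Java, and with an invented workload), compare an object-per-element layout with flat arrays that the optimizer and hardware can stream through:

    import numpy as np

    class Particle:
        def __init__(self, x, y):
            self.x = x
            self.y = y

    def total_distance_objects(particles):
        # One small object per element, pointer chasing everywhere.
        return sum((p.x ** 2 + p.y ** 2) ** 0.5 for p in particles)

    def total_distance_arrays(xs, ys):
        # The same data as two flat arrays crunched in bulk.
        return float(np.sqrt(xs * xs + ys * ys).sum())

    n = 100_000
    xs = np.random.rand(n)
    ys = np.random.rand(n)
    particles = [Particle(x, y) for x, y in zip(xs, ys)]

    print(total_distance_objects(particles))
    print(total_distance_arrays(xs, ys))

In an interpreted language most of the speedup comes from interpreter overhead going away, but the underlying idea is the same: the data layout, not the surface language, determines how fast the hot loop can go.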

References

Most of the points discussed here have been gathered from other sources and presentations, which the interested reader is encouraged to look up. These include the many optimization talks by Andrei Alexandrescu as well as the highly informative CppCon keynote by Emery Berger. Despite its venue the latter is fully programming language agnostic, so you can watch it even if you are the sort of person who breaks out in hives whenever C++ is mentioned.

Monday, February 1, 2021

Using a gamepad to control a painting application

One of the hardest things in drawing and painting is controlling the individual strokes. Not only do you have to control the location but also the pressure, tilt and rotation of the pen or brush. This means mastering five or six degrees of freedom at the same time with extreme precision. Doing it well requires years of practice. Modern painting applications and tools like drawing tablets emulate this experience quite well, but the beauty of computers is that we can do even more.

As an experiment I wrote a test application that separates tilt and pressure from drawing. In this approach one hand draws the shape as before, while tilt and pressure are controlled with the other hand using a regular gamepad controller. Here's what it looks like (in case your aggregator strips embedded YouTube players, here is the direct link).
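
The app itself is linked below; as a rough sketch of the idea (not the actual app's code), the following reads stick positions with pygame and maps them to pressure and tilt values a brush engine could consume. The axis numbers are assumptions and vary from controller to controller.

    import pygame

    pygame.init()
    pygame.joystick.init()
    if pygame.joystick.get_count() == 0:
        raise SystemExit('No gamepad found.')
    pad = pygame.joystick.Joystick(0)
    pad.init()

    # Axis numbers are controller specific, adjust to taste.
    PRESSURE_AXIS = 1  # left stick, vertical
    TILT_X_AXIS = 2    # right stick, horizontal
    TILT_Y_AXIS = 3    # right stick, vertical

    def read_brush_state():
        # Map raw [-1, 1] stick values to pressure [0, 1] and tilt in degrees.
        pygame.event.pump()
        pressure = (1.0 - pad.get_axis(PRESSURE_AXIS)) / 2.0
        tilt_x = pad.get_axis(TILT_X_AXIS) * 60.0
        tilt_y = pad.get_axis(TILT_Y_AXIS) * 60.0
        return pressure, tilt_x, tilt_y

    while True:
        print('pressure=%.2f tilt=(%.1f, %.1f)' % read_brush_state())
        pygame.time.wait(100)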

The idea itself is not new; there are discussions about it in e.g. Krita's web forum. Nonetheless it was a fun weekend hack (creating the video actually took longer than writing the app). After playing around with the app for a while this seems like a useful feature for an actual painting application. It is not super ergonomic, though, but that may just be an issue with the Logitech gamepad I had. Something like the Wii remote would probably feel smoother, but I don't have one to test.

The code is available here for those who want to try it out.

Wednesday, January 6, 2021

Quick review of Lenovo Yoga 9i laptop

Some time ago I pondered getting a new laptop. Eventually I bought a Lenovo Yoga 9i, which ticked pretty much all the boxes. I also considered a Dell 9310 but decided against it for two reasons. Firstly, several reviews say that the keyboard feels bad with too shallow a key travel. Secondly, Dell's web site for Finland does not actually sell computers to individuals, only corporations, and their retailers did not have any of the new models available.

The hardware

It's really nice. Almost everything you need is there, such as USB A and C, touch screen, pen, 16GB of RAM, Tiger Lake CPU, Xe graphics and so on. The only real missing things are a microSD card slot and an HDMI port. The trackpad is nice, with multitouch working flawlessly in e.g. Firefox. You can only do right click by clicking on the right edge rather than clicking with two fingers, but that's probably a software limitation (of Windows?). The all glass trackpad surface is a bit of a fingerprint magnet, though.

There are two choices for the screen, either FullHD or 4k. I took the latter because once you have experienced retina, you'll never go back. This reduces battery life, but even the 4k version gets 4-8 hours of battery life, which is more than I need. The screen itself is really, really nice apart from the fact that it is extremely glossy, almost like a mirror. Colors are very vibrant (to the point of being almost too saturated in some videos) and bright. Merely looking at the OS desktop background and app icons feels nice because the image is so sharp and calm. On the negative side, just watching YouTube videos makes the fan spin up.

The touchscreen and pen work as expected, though pen input is broken in Windows Krita by default. You need to change the input protocol from the default to the other option (whose actual name I don't remember).

When it comes to laptop keyboards, I'm very picky. I really like the 2015-era MBPro and Thinkpad keyboards. This keyboard is not either of those two but it is very good. The key travel is slightly shallower and the resistance is crisper. It feels pleasant to type on.

Linux support

This is ... not good. Fedora live USBs do not even boot, and an Ubuntu 20.10 live USB has a lot of broken stuff, though surprisingly wifi works nicely. Things that are broken include:
  • Touchscreen
  • 3D acceleration (it uses LLVM softpipe instead)
  • Trackpad
  • Pen
The trackpad bug is strange. Clicking works, but motion does not unless you push it with a very, very, very specific amount of pressure that is incredibly close to the strength needed to activate the click. Once the click activates, motion breaks again. In practice it is unusable.

All of these are probably due to the bleeding-edgeness of the hardware and will presumably be fixed in the future. For the time being, though, it is not really usable as a Linux laptop.

In conclusion

This is the best laptop I have ever owned. It may even be the best one I have ever used.

Monday, December 28, 2020

Some things a potential Git replacement probably needs to provide

Recently there has been renewed interest in revision control systems. This is great, as improvements to tools are always welcome. Git is, sadly, extremely entrenched and trying to replace it will be an uphill battle. This is not due to technical issues but social ones. What this means is that approaches like "basically Git, but with a mathematically proven model for X" are not going to fly. While having this extra feature is great in theory, in practice it is not sufficient. The sheer amount of work needed to switch a revision control system and the ongoing burden of using a niche, nonstandard system is just too much. People will keep using their existing system.

What would it take, then, to create a system that is compelling enough to make the change? In cases like these you typically need a "big design thing" that makes the new system 10× better in some way and which the old system can not do. Alternatively the new system needs to have many small things that are better but then the total improvement needs to be something like 20× because the human brain perceives things nonlinearly. I have no idea what this "major feature" would be, but below is a list of random things that a potential replacement system should probably handle.

Better server integration

One of Git's design principles was that everyone should have all the history all the time so that every checkout is fully independent. This is a good feature to have and one that should be supported by any replacement system. However, that is not how revision control systems are commonly used. 99% of the time developers are working against some sort of centralised server, be it Gitlab, Github or a corporation's internal revision control server. The user interface should be designed so that this common case is as smooth as possible.

As an example let's look at keeping a feature branch up to date. In Git you have to rebase your branch and then force push it. If your branch had any changes you don't have in your current checkout (because they were done on a different OS, for example), they are now gone. In practice you can't have more than one person working on a feature branch because of this (unless you use merges, which you should not do). This should be more reliable. The system should store, somehow, that a rebase has happened and offer to fix out-of-date checkouts automatically. Once the feature branch gets to trunk, it is ok to throw this information away. But not before that.

Another thing one could do is that repository maintainers could mandate things like "pull requests must not contain merges from trunk to the feature branch" and the system would then automatically prohibit these. Telling people to remove merges from their pull requests and to use rebase instead is something I have to do over and over again. It would be nice to be able to prohibit the creation of said merges rather than manually detecting and fixing things afterwards.

Keep rebasing as a first class feature

One of the reasons Git won was that it embraced rebasing. Competing systems like Bzr and Mercurial did not and advocated merges instead. It turns out that people really want their linear history and that rebasing is a great way to achieve it. It also helps code review, as fixes can be made in the original commits rather than in new commits afterwards. The counterargument to this is that rebasing loses history. This is true, but on the other hand it also means that your commit history gets littered with messages like "Some more typo fixes #3, lol." In practice people seem to strongly prefer the former to the latter.

Make it scalable

Git does not scale. The fact that Git-LFS exists is proof enough. Git only scales in the original, narrow design spec of "must be scalable for a process that only deals in plain text source files where the main collaboration method is sending patches over email" and even then it does not do it particularly well. If you try to do anything else, Git just falls over. This is one of the main reasons why game developers and the like use other revision control systems. The final art assets for a single level in a modern game can be many, many times bigger than the entire development history of the Linux kernel.

A replacement system should handle huge repos like these effortlessly. By default a checkout should only download those files that are needed, not the entire development history. If you need to do something like bisection, then files missing from your local cache (and only those) should be downloaded transparently during checkout operations. There should be a command to download the entire history, of course, but it should not be done by default.
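
As a toy sketch of what "download only on demand" could mean in practice (everything here, from the blob ids to the fake server, is hypothetical and not part of any existing system):

    import os
    import shutil

    CACHE_DIR = os.path.expanduser('~/.vcs-cache')

    # Stand-in for the central server: maps content ids to file contents.
    FAKE_SERVER = {
        'blob-123': b'int main() { return 0; }\n',
    }

    def fetch_blob(blob_id):
        # In a real system this would be a network round trip.
        return FAKE_SERVER[blob_id]

    def materialise(blob_id, destination):
        # Put a file into the working tree, downloading it only if it is
        # not already in the local cache.
        cached = os.path.join(CACHE_DIR, blob_id)
        if not os.path.exists(cached):
            os.makedirs(CACHE_DIR, exist_ok=True)
            with open(cached, 'wb') as out:
                out.write(fetch_blob(blob_id))
        shutil.copyfile(cached, destination)

    materialise('blob-123', 'main.c')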

Further, it should be possible to do only partial checkouts. People working on low level code should be able to get just their bits and not have to download hundreds of gigs of textures and videos they don't need to do their work.

Support file locking

This is the one feature all coders hate: the ability to lock a file in trunk so that no-one else can edit it. It is disruptive, annoying and just plain wrong. It is also necessary. Practice has shown that artists at large either can not or will not use revision control systems. There are many studios where the revision control system for artists is a shared network drive, with file names like character_model_v3_final_realfinal_approved.mdl. It "works for them" and trying to mandate a more process heavy revision control system can easily lead to an open revolt.

Converting these people means providing them with a better work flow. Something like this:
  1. They open their proprietary tool, be it Photoshop, Final Cut Pro or whatever.
  2. Click on GUI item to open a new resource.
  3. A window pops up where they can browse the files directly from the server as if they were local.
  4. They open a file.
  5. They edit it.
  6. They save it. Changes go directly in trunk.
  7. They close the file.
There might be a review step as well, but it should be automatic. Merge requests should be filed and kept up to date without the need to create a branch or even to know that such a thing exists. Anything else will not work. Specifically, requiring any sort of conflict resolution does not work, even if it were the "right" thing to do. The only way around this (that we know of) is to provide file locking. Obviously it should be possible to restrict locking to binary files only.
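
To make the flow above concrete, here is a toy sketch of what a plugin-facing locking flow might look like. Every name in it is hypothetical and the server is faked with an in-memory object; a real plugin would talk to the system through its C API (see below).

    class FakeClient:
        # In-memory stand-in for the revision control server.
        def __init__(self):
            self.files = {'tower.psd': b'original pixels'}
            self.locks = set()

        def lock(self, path):
            if path in self.locks:
                raise RuntimeError(path + ' is locked by someone else')
            self.locks.add(path)

        def unlock(self, path):
            self.locks.discard(path)

        def download(self, path):
            return self.files[path]

        def upload(self, path, data):
            # Goes straight to trunk, no branch needed.
            self.files[path] = data

    client = FakeClient()
    client.lock('tower.psd')                  # steps 3-4: browse and open the file
    data = client.download('tower.psd')
    data = data + b' plus new brush strokes'  # step 5: the artist edits it
    client.upload('tower.psd', data)          # step 6: save, changes land in trunk
    client.unlock('tower.psd')                # step 7: close the file, lock released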

Provide all functionality via a C API

The above means that you need to be able to deeply integrate the revision control system with existing artist tools. This means plugins written in native code using a stable plain C API. The system can still be implemented in whatever SuperDuperLanguage you want, but its one true entry point must be a C API. It should be full-featured enough that the official command line client should be implementable using only functions in the public C API.

Provide transparent Git support

Even if a project would want to move to something else, the sad truth is that for the time being the majority of contributors only know Git. They don't want to learn a whole new tool just to contribute to the project. Thus the server should serve its data in two different formats: once in its native format and once as a regular Git endpoint. Anyone with a Git client should be able to check out the code and not even know that the actual backend is not Git. They should be able to even submit merge requests, though they might need to jump through some minor hoops for that. This allows you to do incremental upgrades, which is the only feasible way to get changes like these done.

Friday, November 27, 2020

How Apple might completely take over end users' computers

Many people are concerned about Apple's ongoing attempts to take more and more control of end user machines from their users. Some go so far as to say that Apple won't be happy until they have absolute and total control over all programs running on end user devices, presumably so that they can enforce their 30% tax on every piece of software. Whether this is true or not we don't really know.

What we can do instead is a thought experiment. If that was their end goal, how would they achieve it? What steps would they take to obtain this absolute control? Let's speculate.

Web apps

A common plan against tightening app store requirements is to provide a web app instead. You can do a lot of cool things with WebAssembly and its state is continuously improving. Thus it must be blocked. This is trivial: require that web browsers may only run WASM programs that are notarized by Apple. This is an easy change to sell, all it needs is a single tear jerking please-think-of-the-children presentation about the horrible dangers of online predators, Bitcoin miners and the like. Notarization adds security and who wouldn't want to have more of that?

There is still the problem that you can run an alternative browser like Chrome or Firefox. This can be solved simply by adding a requirement that third party browsers can only get notarized if they block all non-notarized web apps. On iOS this is of course already handled by mandating that all browsers must use the system's browser engine. At some point this might get brought over to macOS as well. For security.

Javascript

Blocking WASM still leaves the problem of Javascript. There is a lot of it and even Apple can not completely block non-notarized JS from running. Here you have to run the long game. An important step is, surprisingly, to drive the adoption of WebAssembly. There are many ways of doing this, the simplest is to stop adding any new JS functionality and APIs that can instead be done in WebAssembly. This forces app developers to either drop Apple support or switch to WASM. This transition can be boosted by stopping all development and maintenance on the existing JS engine and letting it bitrot. Old web pages will get worse and worse over time and since Apple won't fix their browser, site operators will be forced to upgrade to technologies like WASM that come with mandatory notarization. For security.

Scripting languages

Scripting languages such as Perl and Python can be used to run arbitrary programs so they must go. First they are removed from the core install so people have to download and install them separately. That is only an inconvenience, though. To achieve total control notarization requirements must again be updated. Any program that loads "external code" must add a check that the code it is running is notarized by Apple. At first you will of course allow locally written script files to be run, as long as you first hunt through system security settings to add run permissions to the script file. This must be done with physical human interaction like a mouse or touchpad. It must not be automatable to prevent exploits. The obtained permissions are of course revoked every time the file is edited. For security.

Compilers

There is still a major hole in this scheme: native compilers. It might be tedious, but it is possible to compile even something as big as Firefox from scratch and run the result. Therefore this must be blocked, and notarization is again the key. This can be done by requiring all binaries, even self-built ones, to be notarized. This is again easy to sell, because it blocks a certain class of malware injection attacks. Following iOS's lead you have to get a developer certificate from Apple to sign your own code to run on your own machine.

Once the basic scheme is in place you have to tighten security and block the signing from any compiler except the Apple provided system one. This has to be done for security, because existing third party compilers may have bugs and features that could be used to circumvent notarization requirements somehow. Only Apple can get this right as the system provider. There must be one, and only one, way of going from source code to executable binaries and that path must be fully controlled by Apple. For security.

Total separation of development and use

Even with all this you can still compile and run your own code, meaning people will find ways of getting around these requirements and doing what they want to do rather than what Apple permits them to do. This means that even tighter reins are required. The logical end result is to split the macOS platform into two separate entities. The first one is the "end user" system that can only run Apple-notarized apps and nothing else. The second is the "dev platform" that runs only XCode, Safari (in some sort of a restricted mode) and one other program that has to have been fully compiled on the current physical machine. Remote compilation servers are forbidden as they are a security risk. This is roughly how iOS development and things like game console dev kits already work. The precedent is there, waiting for the final logical step to be taken.

This has the side effect that every developer who wants to support Apple platforms now has to buy two different Apple laptops, one for development and one for testing. But let us be absolutely clear about one thing. This is not done to increase sales and thus profits. No. Not under any circumstances! It is for a higher purpose: for security.

Saturday, November 21, 2020

Adding (very) preliminary support for C++ modules in Meson

One of the most common questions people ask about Meson is why it does not yet have support for building C++ modules. Up until now the answer has been simple: no compiler really supports them yet. However Visual Studio has added sufficient functionality in its latest 2019 developer preview that an implementation in Meson has become feasible. The actual code can be found in this merge request for those brave enough to try it out.

The basic problem with C++ modules is the same as with Fortran modules: you can no longer build source files in an arbitrary order. Instead you have to scan the contents of files, see what modules each source file generates and consumes and orchestrate the build so that all source files that produce modules are built before any source files that consume them. This requires dynamic dependency generation that has been added to Ninja only fairly recently.

The original idea was that compiler toolchain vendors would provide scanner binaries, because parsing C++ code is highly unreliable due to C preprocessor macro shenanigans. It turns out that a "toolchain provided" dependency scanner can not obtain all the necessary data reliably, because doing so requires higher level knowledge about the project setup. This can only be provided reliably by the build system. An alternative would be to pass all this information to the compiler/scanner via compiler flags, but that turns out to be a terrible thing to define and keep stable over time. It also has the downside that you'd need to spawn a separate process for every file, which is fairly slow on Windows. Meson's approach is to write a custom dependency scanner. Yes, it is based on regexes, so it is not 100% reliable, but on the other hand you only need to spawn one process per build target (exe, shared lib, static lib) as opposed to one per source file.
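
As a rough sketch of the regex approach (much simplified compared to Meson's actual scanner, and ignoring module partitions and header units), the scanner only needs to extract which modules a source file exports and which it imports:

    import re

    # Rough approximations of the relevant declarations. Real sources can hide
    # these behind preprocessor macros, which is why this is not 100% reliable.
    EXPORT_RE = re.compile(r'^\s*export\s+module\s+([A-Za-z_][\w.]*)\s*;', re.MULTILINE)
    IMPORT_RE = re.compile(r'^\s*import\s+([A-Za-z_][\w.]*)\s*;', re.MULTILINE)

    def scan_source(text):
        # Returns (modules this file provides, modules this file needs).
        return set(EXPORT_RE.findall(text)), set(IMPORT_RE.findall(text))

    provides, needs = scan_source('''
    export module mymod;
    import somedep;

    export int answer() { return 42; }
    ''')
    print(provides)  # {'mymod'}
    print(needs)     # {'somedep'}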

Still, the end result does work for simple projects. It does not handle things like module partitions but those can be added later. Even this simple project and test has brought about several notes and questions:

  • Where should the generated module files be put? In the target private dir? In a global dir? If the latter, what happens if two unrelated parts in the code base specify the same module?
  • Microsoft has not documented the module compiler flags and cl /? does not even list them. Because of this all module files get dumped to the build directory root.
  • Only .ixx files are supported. VS does not enforce file name extensions, but I would really want to enforce a single module file name extension. We can't change legacy code and force everyone to use a single extension for C++ sources, but we totally should do that for new ones. Having to support many file name extensions for the same thing is madness.
Sadly I don't have any numbers on how much modules improve compilation speed. Feel free to try it out yourself, though. Bug reports and especially fixes are welcome.