Wednesday, February 24, 2021

Millennium prize problems but for Linux

There is a longstanding tradition in mathematics of creating lists of hard unsolved problems to drive people to work on solving them. Examples include Hilbert's problems and the Millennium Prize Problems. Wouldn't it be nice if we had the same for Linux? A bunch of hard problems with sexy names that would drive development forward? Sadly there is no easy source for tens of millions of euros in prize money, not to mention that the money would be very hard to distribute, as the work would, by necessity, be spread over a large group of people.

Thus it seems unlikely that this would work in practice, but that does not prevent us from stealing a different trick from the mathematicians' toolbox and pondering how it would work in theory. The list of problems will probably never exist, but let's assume that it does. What would it contain? Here's one example I came up with. It is left as an exercise to the reader to work out what prompted me to write this post.

The Memory Depletion Smoothness Property

When running the following sequence of steps:
  1. Check out the full source code for LLVM + Clang
  2. Configure it to compile Clang and Clang-tools-extra, using the Ninja backend and the RelWithDebInfo build type, leaving all other settings at their default values
  3. Start watching a video file with VLC or a browser
  4. Start compilation by running nice -19 ninja
The outcome must be that the video playback works without skipping a single frame or audio sample.
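
For concreteness, the steps above correspond roughly to the following commands (a sketch; the repository URL and CMake option names are the standard upstream ones, but double-check them against the current LLVM documentation):

  # 1. Check out the full LLVM + Clang source
  git clone https://github.com/llvm/llvm-project.git
  cd llvm-project

  # 2. Configure Clang and clang-tools-extra with the Ninja backend
  #    and the RelWithDebInfo build type, everything else default
  cmake -S llvm -B build -G Ninja \
      -DCMAKE_BUILD_TYPE=RelWithDebInfo \
      -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra"

  # 3. Start video playback in VLC or a browser, then:

  # 4. Build at the lowest CPU priority
  cd build && nice -19 ninja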

What happens currently?

When Clang starts linking, each linker process takes up to 10 gigabytes of RAM. This leads to memory exhaustion, flushing of active memory to swap and eventually crashing linker processes. Before that happens, however, every other app freezes completely and the entire desktop remains frozen until things get swapped back into memory, which can take tens of seconds. Interestingly, all browser tabs are killed before the linker processes start failing. This happens with both Firefox and Chromium.
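
If you want to watch this happen, one rough way (assuming a kernel with pressure stall information enabled, i.e. /proc/pressure exists) is to monitor memory pressure while the build runs and then check whether the kernel's OOM killer fired:

  # The "full" line rises sharply when every process is stuck
  # waiting for memory, which is when the desktop freezes.
  watch -n1 cat /proc/pressure/memory

  # Afterwards, see whether the kernel OOM killer reaped anything.
  dmesg | grep -i "out of memory"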

What should happen instead?

The system should handle the problematic case in a better way. The linker processes will still die, as there is not enough memory to run them all, but the UI should never noticeably freeze. For extra points, the same should happen even if you run Ninja without nice.

The wrong solution

A knee-jerk reaction many people have is something along the lines of "you can solve this by limiting the number of linker processes by doing X". That is not the answer. It treats the symptoms but not the underlying cause, which is that bad input causes the scheduler to do the wrong thing. There are many other ways of triggering the same issue, for example by copying large files around. A proper solution would fix all of those in one go.
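
For the record, the workaround people usually have in mind looks something like the following (shown purely to illustrate what is being argued against; LLVM's build system does expose such a knob):

  # Cap concurrent link jobs via LLVM's Ninja job pool. This hides
  # the problem for this one project but fixes nothing for the next
  # memory-hungry workload that comes along.
  cmake -S llvm -B build -G Ninja \
      -DCMAKE_BUILD_TYPE=RelWithDebInfo \
      -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" \
      -DLLVM_PARALLEL_LINK_JOBS=2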

9 comments:

  1. How much swap do you have? I've always thought swap on an interactive system should be sized by the time it takes to page it back into RAM. With regard to large IO processes, perhaps interactive processes should be ioniced above batch processes.
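
    For illustration, that suggestion amounts to something like running the batch job in the idle I/O class as well as at low CPU priority (assuming an I/O scheduler such as BFQ or CFQ where ionice actually has an effect):

      # Interactive processes win both CPU and disk contention.
      ionice -c 3 nice -n 19 ninja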

    1. The point of the article was that this should work out of the box with a default install of the distro. No fiddling. No tweaking. No editing of any config files. Nothing. Just things working out of the box.

    2. IMHO, "large swaps and interactivity" is an unsolvable problem. At some point, if you swap out a lot of stuff, you're in for a world of pain. But the installer (or a daemon) could handle this by choosing adequate sizing. The laptop I'm using now has a default install and it got a 16 GB swap file (I have 32 GB of RAM). I don't think that much swap can cause anything but problems.

      About ionicing: yes, you're right :)
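
      For reference, the sizing a given install ended up with can be checked with standard tools:

        # Configured swap areas, their sizes and priorities
        swapon --show

        # Overall memory and swap usage at a glance
        free -h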

  2. The good news is, there has been a lot of progress!

    Thanks to cgroups, we can do most of what is needed. We can prioritize one application over another, and we can prevent a task with a lot of processes from effectively fork-bombing the system.

    There are still some rough edges. For example, IO scheduling may not work because the setup is not supported (e.g. LUKS is problematic) and because the underlying performance properties of the disk are unknown. Unfortunately, we may need to distribute performance characteristics of disks to solve part of this.

    However, overall you should be seeing improvements in this area on some distributions already. Fedora already uses uresourced to better configure cgroups (https://lwn.net/Articles/829567/) and is now also switching to systemd-oomd for improved out-of-memory handling (https://fedoraproject.org/wiki/Changes/EnableSystemdOomd).

    More work will be needed, and hopefully we will start to do more tricks in the desktop itself (e.g. prioritizing the focused application). However, this should have improved a lot already, and work is underway for further improvements in this area.
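
    As a rough, hypothetical example of what the cgroup machinery makes possible, a build could be confined to its own transient scope with a low CPU/IO weight and a memory ceiling (property names are from systemd.resource-control; this assumes cgroup v2 with the relevant controllers delegated to the user session):

      # Throttle the build under memory pressure before it can drag
      # the whole desktop down with it.
      systemd-run --user --scope \
          -p CPUWeight=20 \
          -p IOWeight=20 \
          -p MemoryHigh=16G \
          -- nice -19 ninja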

  3. Or possibly the build systems should learn to monitor their children and back off (including killing some of them) when such things happen.
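
    For what it's worth, Ninja already has a crude mechanism in this direction for CPU load, though nothing comparable for memory (a sketch, assuming a reasonably recent Ninja):

      # Do not start new jobs while the system load average is above 8.
      ninja -l 8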

    1. That is again treating the symptoms, not the cause. There are dozens of ways to get into the same buggy state. Fixing the main issue makes all of them go away, including for programs that will only be written sometime in the future.

  4. Just a question (and hopefully food for thought): how is this currently handled in other major OSes? Do current Windows, macOS or Android (to the extent it can have heavy background, non-interactive load) manage these situations any better, at least when the bottleneck is just CPU and not also RAM?

    1. I don't know about Android, but at least Windows can be brought to a halt just by calling malloc in an infinite loop. I have not done testing on macOS, but it's fairly easy to get it lagging just with regular usage.

    2. Oh well, so it's not like Linux is the class dunce; it's (still) a widespread problem.
      I'm a bit disappointed that even macOS is so easily derailed; I'd expect it to just slow down a bit but still have some "constant" behaviour. Actually, I remember that some years ago I was able to make some Macs almost freeze simply by running "cat /dev/urandom" in the terminal, so it all makes sense.
