Tuesday, October 11, 2016

Unix Userland as C++ Coroutines

The Unix userland concept is simple. It has standalone programs that do only one thing which are then chained together. A classical example is removing duplicates from a file, which looks like this:

cat filename.txt | sort | uniq

This is a very powerful concept. However there is a noticeable performance overhead. In order to do this, the shell needs to launch three different processes. In addition all communication between these programs happens over Unix pipes, meaning a round trip call into the kernel for each line of text transferred (roughly, not an exact number in all cases). Even though processes and pipes are cheap on Unix they still cause noticeable slowdown (as anyone who has run large shell scripts knows).

Meanwhile there has been a lot of effort to put coroutines into C++. A very simple (and not fully exhaustive) way of describing coroutines is to say that they basically implement the same functionality as Python generators. That is, this Python code:

def gen():
    yield 1
    yield 2
    yield 3

could be written in C++ like this (syntax is approximate and might not even compile):

int gen() {
    co_yield 1;
    co_yield 2;
    co_yield 3;

If we look more closely into the Unix userland specification we find that even though it is (always?) implemented with processes and pipes, it is not required to. If we use C's as if rule we are free to implement shell pipelines in any way we want as long as the output is the same. This gives us a straightforward guide to implement the Unix userland in C++ coroutines.

We can model each Unix command line application as a coroutine that reads input data, either stdout or stderr, one line at a time, operates on it and outputs the result in the same way. Then we parse a shell pipeline into its constituent commands, connect the output of element n as the input of element n + 1 and keep reading the output of the last pipeline element until EOF is reached. This allows us to run the entire pipeline without spawning a single process.

A simple implementation with a few basic commands is available on Github. Unfortunately real coroutines were not properly available on Linux at the time of writing, only a random nonmerged branch of Clang. Therefore the project contains a simple class based implementation that mimics coroutines. Interested readers are encouraged to go through the code and work out how much boilerplate code a proper coroutine implementation would eliminate.


This implementation does not handle background processes/binary data/file redirection/invoking external programs/yourthinghere!

That is correct.

Isn't this just a reinvented Busybox?

Yes and no. Busybox provides all executables in a single binary but still uses processes to communicate between different parts of the pipeline. This implementation does not spawn any processes.

Couldn't you run each "process" in its own thread to achieve parallelism?

Yes. However I felt like doing this instead.