Saturday, February 23, 2019

What if everything you know is wrong?

In interesting intellectual design challenge is to take a working thing (library, architecture, etc) and then see what would happen if you would reimplement it with the exact opposite way. Not because you'd use the end result anywhere, but just to see if you can learn something new. Or, in other words:
Screenshot of Marble Madness Silly Race level with text "Everything you know is wrong".
As an example let's apply this approach to C++'s fixed size array object or std::array. Some of its core design principles include:
  1. Indexing is by default unsafe, user is responsible for providing valid indexes.
  2. Errors are reported via exceptions.
  3. Iterators may be invalid and invoking invalid iterators is allowed but UB.
  4. Indexing must be exactly as fast as accessing a C array.
Inverting all of these give us the following design principles:
  1. Out of bound accesses must be impossible.
  2. No exceptions (assuming contained objects do not throw exceptions).
  3. All iterator dereferences are guarded and may never lead to bad accesses.
  4. It's ok to use some (but not much) processor time to ensure safety. Aim for zero overhead when possible.

So how does it look like?

An experimental PoC implementation can be found in this Github repo. Note that the code is intentionally unpolished. There are silly choices made. That is totally fine, it's not meant for actual use, only to explore the problem space.

The most important operation for an array type is indexing. It must work for all index values, even for those out of bounds. As no exceptions are allowed, the natural way to make this work is to return a Maybe type. This could be a std::optional<std::reference_wrapper<T>>, but for reasons that will become apparent later, we use a custom type. The outcome for this is kind of nice, allowing you to do things like (assuming an array of size 20):

assert(x[0]);       // check if index is valid.
assert(!x[20]);     // OOB indexes are invalid
*x[0] = 5           // assignment
assert(*x[0] == 5); // dereference
int i = *x[20];     // aborts program

The overall usability is kind of nice. This is similar to languages that have int? variables. The biggest problem here is that there is no way to prevent dereferencing an invalid maybe object, leading to termination. A typical bug would look like this:

if(maybe) {
  *maybe = 3;
} else {
  *maybe = 4; // Legal code but should not be.
}

There are at least three possible architectural ways to solve this:
  1. Pattern matching (a switch/case on the object type) with language enforcement.
  2. A language construct for "if attempting to use an invalid maybe object, exit the current function with an error". There have been talks of a try statement that would do something like this.
  3. Maybes can not be dereferenced, only called with a method like visit(if_valid_function, if_not_valid_function).
#3 can be done today but is tedious and still does not permit automatically returning an error from the current function block.

Iteration

Creating a safe iterator is fairly simple. This iterator has a pointer to the original object and an integer offset. Dereferencing it calls the indexing operation and returns the maybe to the caller. This works fine until you test it with std::sort and after a lot of debugging find out that the implementation has a code block looking like this:

T tmp = std::move(a);
a = std::move(b);
b = std::move(tmp);

Even if you have swap defined for your objects, it will call this at some point. The problem here is that since the temporary does not point to any existing object, it can not store the value of the first move. There does not seem to be a good solution for this. Swap-only types might work, but such a type can not be defined with C++'s type system (or at least std::sort can not handle such types). The solution used in this example is that if a maybe object is created so that it does not point to any existing object, it stores objects inside itself. This works (for this use case) but feels a bit iffy.

Another problem this exposes is that the STL does not really work with these kinds of types. The comparison function returns a boolean, but if you compare (for whatever reason) invalid iterators and don't use exceptions there should be three different return values: true, false and erroneous_comparison. That is, an std::optional<bool>. Fixing this properly would mean changing std::sort to handle failing comparisons.

But what about performance?

For something like sorting, checking every access might cause a noticeable performance hit. If you do something like this:

std::sort(coll.begin(), coll.end())

it is fairly easy to verify that the range is valid and all thus accesses will be valid (assuming no bugs in the sort implementation). It would be nice to be able to opt out of range checks in these cases. Here's one way of doing it:

std::sort(coll.unsafe().begin(), coll.unsafe().end())

Here unsafe is a helper class that returns raw pointers to the underlying object. In this way individual accesses (which are bug prone) are guarded but larger batch operations (which are safer) can optionally request a faster code path. The unsafe request is immediately apparent and easily greppable.

Is this fully safe?

No. There are at least three different ways of getting an invalid reference.
  1. Deleting the main object while iterators/maybes are outstanding. Would require tracking all outstanding objects. Probably too heavyweight. In Rust this is handled by the borrow checker at compile time.
  2. Unallocating the memory backing the object without running its destructor. This is actually legal. Can't be guarded against.
  3. Changing the object's backing memory to read only with a syscall. Can't be guarded against.

1 comment:

  1. I was hoping you'd implement a Python style indexing (with a twist) where all indexes are actually valid, but higher indexes just wrap around to the start. Oh and negative indexes just count from the end. So for x = "abcdefg", x[0]=="a", x[8]=="b", x[-1]=="g" and so on :)

    ReplyDelete