Thursday, June 25, 2015

Simple structure of arrays with C++ templates

One feature that has been getting more popular recently is called the structure of arrays. As an example Jonathan Blow's new Jai programming language should have built in support for that. The basic principle is very simple. Instead of having this:

struct Foo {
  int x;
  std::string name;
  FooObj bar;
};


std::vector<Foo> items;

You'd want to have something like this:

struct Items {
  std::vector<int> x;
  std::vector<std::string> name;
  std::vector<FooObj> bar;

  struct Foo& operator[](size_t index);
};

The reason for this is better cache locality. First of all you don't get padding holes between items. The second issue is that usually when you doing operations on elements of these arrays you only touch a subset of items. This data layout reduces cache misses as you can skip a large portion of data.

In C++ terms you would like to have a declaration like the following:

Soa<int, std::string, FooObj> items;

This would then create a data structure like the one discussed above with all basic operations such as push_back, capacity and so on. Then you could do things like items[0].name = "hello" or std::sort(items.begin(), items.end(), mypredicate).

C++ does not support this natively but with templates you can create this yourself. There are some things that you can't do (such as accessing elements by name) but other bits are fairly straightforward. The code for this example is on github. For simplicity this implementation only has two items in the struct and they must have the same type. Doing arbitrary items would require working with tuples and other fun stuff that is beyond my template-fu.

The core class contains the two vectors that hold the data. The core building block is this helper struct:

template<typename T>
struct SoaItemRef {
    T &item1;
    T &item2;
    // rest omitted for clarity
};

Whenever we need to access the underlying data we return this struct instead. The refs are initialised to point to the underlying data so all reads and writes get proxied to the actual data. Then there are a bunch of helper methods to assignments. In addition we need iterator definitions. Those can be found in the actual code.

This single struct gets you most functionality. You just need a swap method that does elementwise value copying between SoaItemRefs. The second addition is a value only type that looks like this:

template<typename T>
struct SoaItem {
    T item1;
    T item2;
    // stuff such as comparison operators omitted
};

This allows you to do things like std::find(items.begin(), items.end(), SoaItem<int>{1, 2}).

Missing functionality

There's quite a lot of it. The most obvious thing is that you should be able to specify an arbitrary amount of arbitrary types. The problem here is that this gets really complicated really fast. Another niggle is that the helper structs are standalone where they should actually be inner classes of Soa. I tried to make this happen for quite a while but I could not make std::swap resolve to these inner types. Moving them to standalone classes made everything magically work.

It is possible that you can't do items[2].name = "foo" without resorting to the preprocessor and possibly not even then. It's not going to be pretty in any case. Feel free to try it out, though.