Sunday, January 29, 2017

Testing exception vs error code behaviour with real world code

Our previous looks at exception performance have used microbenchmarks and generated code (here and here). These are fun and informative tests but ultimately flawed. What actually matters is performance on real world code. The straightforward way of measuring this is to convert a program using error codes into one using exceptions. This is harder than it seems.

The problem is that most code using error codes is not exception safe so this change can not be done easily (I tried, would not recommend). Fortunately there is a simple solution: going the other way. A fully exception safe code base can be easily converted into one using GLib style error objects with the following simple steps:

  1. replace all instances of throw with a call to a function such as Error* create_error(const char *msg)
  2. replace all catch statements with equivalent error object handling code
  3. alter the function signature of all functions creating or using error objects from void func() to void func(Error **e)
  4. change every call site to pass an error object and check if it points to non-null after the call, exit immediately if so
  5. repeat steps 4 and 5 recursively until compiler errors go away
This implements the exception code flow with explicit hand-written code

For testing we used Parzip, a high performance PkZip implementation. For simplicity we removed all parallel code and also the parts that deal with Zip file creation. Thus we ended up with a simple single threaded Zip file unpacker. The full code is available in this repository.

Source code size

LOC is a poor metric in general but illuminating in this case, because it directly shows the difference between these two approaches. The measurement is done with wc and if we ignore the license header on each file we find that the implementation using exceptions is 1651 lines and the one with error objects has 1971 lines. This means that error codes have roughly 19% more lines of source code than the equivalent code using exceptions. 

Looking at the code it is easy to see where this difference comes from. As an extreme example there is this function that reads data from file with endianness swapping:

localheader read_local_entry(File &f) {
    localheader h;
    uint16_t fname_length, extra_length;
    h.needed_version = f.read16le();
    h.gp_bitflag = f.read16le();
    h.compression = f.read16le();
    h.last_mod_time = f.read16le();
    h.last_mod_date = f.read16le();
    h.crc32 = f.read32le();

    /* And so on for a while. */
    return h;
}

And this is how it looks when using error objects:

localheader read_local_entry(File &f, Error **e) {
    localheader h;
    uint16_t fname_length, extra_length;
    h.needed_version = f.read16le(e);
    if(*e) {
        return h;
    }
    h.gp_bitflag = f.read16le(e);
    if(*e) {
        return h;
    }
    h.compression = f.read16le(e);
    if(*e) {
        return h;
    }
    h.last_mod_time = f.read16le(e);
    if(*e) {
        return h;
    }
    h.last_mod_date = f.read16le(e);
    if(*e) {
        return h;
    }
    h.crc32 = f.read32le(e);
    if(*e) {
        return h;
    }

    /* And so on and so on. */
    return h;
}

This example nicely shows the way that exceptions can make code more readable. The former sample is simple, straightforward, linear and understandable code. The latter is not. It looks like idiomatic Go (their words, but also mine). Intermixing the happy path with error handling makes the code quite convoluted.

Binary size

We tested the size of the generated code with GCC 6.2.0, Clang 3.8.1 and with Clang trunk. We built the code on 64 bit Ubuntu 16/10 using -g -O2 and the error code version was built with and without -fno-exceptions. The results look like this.
The results are, well, weird. Building with -fno-exceptions reduces code size noticeably in every case but the other parts are odd. When not using -fno-exceptions, the code that uses exceptions is within a few dozen bytes within the size of the code that uses error objects. GCC manages to make the error code version smaller even though it has the abovementioned 19% more lines of code. Some of this may be caused by the fact that -fno-exceptions links in less stuff from the C++ standard library. This was not researched further, though.

Looking at the ELF tables we find that the extra size taken by exceptions goes in the text segment rather than data segment. One would have expected it to have gone to unwind tables in a readonly data segment instead of text (which houses runnable code).

Things get even weirder with noexcept specifications. We added those to all functions in the exception code base which do not take an error object pointer in the error code version (meaning they will never throw). We then measured the sizes again and found that the resulting binaries had exactly the same size. On all three compilers. One would have expected some change (such as smaller exception unwind tables or anything) but no.

What have we learned from all this?

Mainly that things have more complex behavior than one would expect. The binary sizes especially are unexpected, especially the way Clang produces the same binary size for both error objects and exceptions. Even with this contradictory information we can make the following conclusions:
  • using exceptions can cause a noticeable reduction in lines of code compared to the equivalent functionality using error objects
  • compiling with -fno-exceptions can reduce binary size by 10-15%
Other than that these measurements really raise more questions than they answer. Why does noexcept not change output size? Why does Clang produce the same code size with exceptions and error objects? Are exception performance and code size fully optimized or could they be made smaller (this area probably has not had as much work done because many compiler contributors do not use exceptions internally)? 

2 comments:

  1. Nice article!
    For noexcept:
    "We then measured the sizes again and found that the resulting binaries had exactly the same size."
    Are they really the same on the provided hist?

    ReplyDelete
  2. Can you measure performance in addition to binary size?

    ReplyDelete