Tuesday, December 27, 2016

Comparing executable size of C++ exceptions vs plain C error structs

One of the Grand Pieces of Hacker Folklore is that you should not use C++ exceptions because they cause massive code bloat. Instead you should use error codes, as exemplified by plain C, Go, Rust and other languages. These opinions have caused many a heated debate both on and off the Internet.

One of the problems is that accurately measuring the size difference between code that uses exceptions and code that uses error codes is hard to do properly. Legend has it that when Symbian developers were considering whether to add exception support, they compiled a hello world application (which has no error paths at all), noted that it was somewhat bigger with exceptions enabled, and declared this unacceptable. What they did not consider is that this increase in code size is a constant overhead, not one that grows in proportion to the code size. They then went on to build a completely nonstandard cleanup mechanism on top of C++, which was both terrible and bloaty.

A more methodical approach

There is really only one way to reliably test whether error codes or exceptions produce smaller code: take a large program that uses one of the two mechanisms and convert it to use the other without changing anything else. This could take months of work and is obviously not feasible in the real world, but it would make for an interesting research project.

Instead we can simulate the problem. It is relatively easy to generate code with the same functionality in both C and C++ and compare the sizes of the resulting binaries. Let's create a function that simulates a simple workload. First it grabs some resource, then it calls some other functions and finally returns a result. In simple C++ it would look like this:

int func222() {
    DummyObject dobj;
    int a = func227();
    int b = func224();
    int c = func228();
    return a + b + c;
}

Here DummyObject is a simple object that exists purely so the exception unwinding mechanism has something to destroy. Its constructor and destructor are defined elsewhere so that the optimizer cannot remove it completely. We create 1000 of these functions, where each one calls into three random functions with higher numbers. Calls to functions numbered above 1000 go to function 1000 instead, which just returns 1. This program does not do anything useful and its runtime is exponential, so you should probably not run it.
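The exact definition of DummyObject lives in the repo, but based on the description above it is presumably something along these lines (a minimal sketch, not the repo's actual code):

struct DummyObject {
    // Declarations only: the constructor and destructor bodies are in a
    // separate translation unit, so the optimizer cannot prove the object
    // does nothing and must keep its destruction on every exit path.
    DummyObject();
    ~DummyObject();
};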

In plain C using glib style error "objects" the function looks like this:

int func222(char **error) {
    int a, b, c;
    struct Dummy *d = dummy_new();

    a = func227(error);
    if(*error) {
        dummy_delete(d);
        return -1;
    }
    b = func224(error);
    if(*error) {
        dummy_delete(d);
        return -1;
    }
    c = func228(error);
    if(*error) {
        dummy_delete(d);
        return -1;
    }
    dummy_delete(d);
    return a + b + c;
}

The functionality is the same; we just have to check manually whether each function call succeeded and deallocate the resource on every exit path. The code that generates and builds these programs is downloadable from this GitHub repo.

The results

We measure three different compilation methods: plain C, plain C++, and C++ with -fno-exceptions. The last one represents the use case where any error is immediately fatal and thus recovery is pointless. We tested with a bunch of compilers. The C++ standard used was C++14, except on GCC 4.8.4, which does not support it, so C++11 was used there instead. The C version was gnu11. All tests were done on the AMD64 architecture, except one which was done on 32 bit ARM (a Raspberry Pi 2 running Debian). -O2 and -g were used for optimization and debug info. Here are the sizes of the resulting stripped binaries.


The results are surprising in many ways. Perhaps the biggest surprise is that using C++ with exceptions produces smaller binaries than plain C every single time. The closest case is Clang, but even there the C binaries take over 6% more space. The no-exceptions build is always the smallest, but this is expected, as it has no error recovery mechanism at all.

Another thing to note is that GCC seems to have had a big C++ code size increase between versions 4.9.2 and 6.2.0. Clang also produces much smaller C binaries than GCC; for C++ the difference is there, but the gap is not as big.

What is causing this?

I don't know for sure. It would be interesting to hear from Clang/GCC developers what is happening here and what actually goes on under the covers. But what it (probably) boils down to is that the compiler has more freedom to optimize the steps needed for stack unwinding. When the user does the deallocation manually, the compiler needs to obey the code paths they have written, even if they are not optimal. Autogenerated stack unwinding just needs to call a few destructors, in any way it wants. Optimizing this is tedious work, the kind best left to a machine.
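To make the difference concrete, here is a hand-merged variant of the C function from above, where all the error paths share a single cleanup label. This is only an illustrative sketch (not code from the repo) of the shape the optimizer has to work towards when the cleanup is written out manually:

int func222(char **error) {
    int a = 0, b = 0, c = 0;
    struct Dummy *d = dummy_new();

    a = func227(error);
    if(*error) {
        goto cleanup;
    }
    b = func224(error);
    if(*error) {
        goto cleanup;
    }
    c = func228(error);

cleanup:
    /* One shared exit path, analogous to what the unwind tables
       describe for the C++ version. */
    dummy_delete(d);
    return *error ? -1 : a + b + c;
}

With exceptions, the compiler gets this merged form essentially for free: the cleanup actions are recorded once in the unwind tables instead of being duplicated along every error path in the instruction stream.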

Conclusions

This is only one aspect of the (seemingly) eternal debate between error codes and exceptions. It does not prove (I say again, does not) that one is faster than the other, or even that one always produces more efficient or smaller code than the other. However, what we can say is that there are cases where converting from error codes to full exceptions makes the final binary smaller. It would be interesting to know whether this is true for other code bases as well. That requires a lot more testing, though.

The GitHub repo contains more material for those interested in digging deeper. For example, it contains a version of the C code that is a bit more size optimized (though a bit less readable), which uses only plain error codes rather than full error structs. It is still bigger than the straightforward C++ implementation.
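For reference, that size-optimized variant is presumably shaped something like this (a guess for illustration; the real code is in the repo), with failure signalled through the return value itself rather than a separate error argument:

int func222(void) {
    int a, b, c;
    struct Dummy *d = dummy_new();

    /* Assume a negative return value signals failure; this convention
       is a hypothetical stand-in for whatever the repo actually uses. */
    if((a = func227()) < 0 ||
       (b = func224()) < 0 ||
       (c = func228()) < 0) {
        dummy_delete(d);
        return -1;
    }
    dummy_delete(d);
    return a + b + c;
}

Even in this form, both the success and failure paths have to spell out the call to dummy_delete, which is exactly the kind of duplication the exception version leaves to the unwinder.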

Comments

  1. G++ doesn't actually emit any code for stack unwinding! Instead, the compiler translates try/catch/throw into a few function calls, adds some extra debug info, and otherwise generates straight-line code as if exceptions didn't exist.

    When an exception gets thrown, the stack is unwound *at runtime* by platform-specific code in libsupc++. At this point it's pretty similar to what gdb would do: it looks at the program state and special sections in the binary to figure out what function you're in, what the local variables are, etc.

    Therefore:
    - Code compiled with exceptions is smaller and faster than it would be with manual error checking.
    - Handling an exception is much slower than manual error recovery.
    - If you mix code compiled with and without exceptions, or strip the ".eh_frame" section out of the binary, things will break horribly.

    Read more: http://www.airs.com/blog/archives/460 http://www.airs.com/blog/archives/462 http://www.airs.com/blog/archives/464

    1. By "stack unwinding" I actually meant that table based runtime thing. Thanks for the extra details.

      Delete
  2. I loved your blog post and I'm an advocate of proper automated error handling (rather than manual). However, it feels that nowadays the performance impact of error handling matters more than the size of the binaries, IMHO.

  3. Could you repeat your test compiling everything statically? We tend to forget that the C++ runtime is way bigger than the C runtime, and maybe in this particular example most of the meat (in the case of C++) is inside the runtime rather than the executable itself.

    Reply: That would not really be a fair experiment, since it brings in a ton of stuff that is not relevant to this test. We are testing the code size difference of error codes vs exceptions, not the absolute size of everything.