Nibble Stew: Measuring execution performance of C++ exceptions vs error codes

In an earlier blog post we found out that C++ exceptions produce smaller executables than corresponding code using error codes. The most common comment to it was that executable size is not that important, performance is what matters. Since we have the tooling, let's do perf measurements as well.

Test setup

We generate code that calls a few other functions that call other functions and so on until we reach the bottommost function. In that function we either raise an exception or error or return success. This gives us two parameters to vary: the depth of the call hierarchy and what percentage of calls produce an error.

The code itself is simple. The C++ version looks like this:

int func0() {

int num = random() % 5;

if(num == 0) {

return func1();

}

if(num == 1) {

return func2();

}

if(num == 2) {

return func3();

}

if(num == 3) {

return func4();

}

return func5();

}

We generate a random number and based on that call a function deeper in the hierarchy. This means that every time we call the test it goes from the top function to the final function in a random way. We use the C random number generator and seed it with a known value so both C and C++ have the same call chain path, even though they are different binaries. Note that this function calls on average a function whose number is its own number plus three. So if we produce 100 functions, the call stack is on average 100/3 = 33 entries deep when the error is generated.

The plain C version that uses error objects is almost identical:

int func0(struct Error **error) {

int num = random() % 5;

int res;

if(num == 0) {

res = func1(error);

} else if(num == 1) {

res = func2(error);

} else if(num == 2) {

res = func3(error);

} else if(num == 3) {

res = func4(error);

} else {

res = func5(error);

}

/* This is a no-op in this specific code but is here to simulate

* real code that would check the error and based on that

* do something. We must have a branch on the error condition,

* because that is what real world code would have as well.

if(*error) {

return -1;

}

return res;

}

The code for generating this code and running the measurements can be found in this Github repo.

Results

We do not care so much about how long the executables take to run, only which one of them runs faster. Thus we generate output results that look like this (Ubuntu 16/10, GCC 6.2, Intel i7-3770):

CCeCCCCCCC

CCCCCCccec

eEcCCCeECE

cceEEeECCe

EEEcEEEeCE

EEEEEeccEE

EEEEEEEEEE

In this diagram C means that plain C error codes are faster and E that exceptions are faster for that particular pair of parameters. For easier reading all cases where error codes win have been given a red background color. Each row represents a different number of functions. The top row has 50 functions (call depth of 16) and each row adds 50 so the bottommost row has 500 functions.

The columns represent different error rates. The leftmost column has 0% errors, the next one has 1% and so on. A capital letter means that the runtime difference was bigger than 10%. Each executable was executed five times and the fastest run was taken in each case. Full results look like the following:

GCC 6.2 Clang 3.8.1 Clang 4.0 trunk

CCeCCCCCCC EecCcCcCcC ECcECcEeCC

CCCCCCccec CceEEeEcEC EEeEECEcCC

eEcCCCeECE EEEEEECEeE EEEECECeCC

cceEEeECCe EcEeEECEEE EEEEceeEeC

EEEcEEEeCE EEEeEEEEEe EEEEEEEEEE

EEEEEeccEE EEEEEEEEEE EEEEEEEEEE

EEEEEEEEEE EEEEEEEEEE EEEEEEEEEE

Immediately we see that once the stack depth grows above a certain size (here 200/3 = 66), exceptions are always faster. This is not very interesting, because call stacks are usually not this deep (enterprise Java notwithstanding). For lower depths there is a lot of noise, especially for GCC. However we can see that for small error rates exceptions are sometimes faster, especially for Clang. Because of the noise we redid the measurements several times. This changed the individual measurements but produced the same overall shape where small errors seem to be faster with exceptions but as the error rate grows error codes become faster.

On a Raspberry Pi the results look like this:

GCC 4.9.2

eeccCCCCCC

EeeeccccCC

Eeeeeecccc

Eeeeeeeccc

EEeeeeeeec

EEEEEeeeee

EEEEEEEEee

EEEEEEEEEE

Here the results are a lot clearer and consistent, probably due to the simpler architecture of the ARM processor. Exceptions are faster except for small call depths and large error rates. When using 50 functions error objects only become noticeable faster at 4% error rate.

Finally let's do the same measurements on an i5-6600K.

GCC 4.8.4 gcc 5.4.0 Clang 3.{4, 6, 8}

ECCCCCCCCCC eeccCCCCCC EeeccCCCCC

EecCCCCCCCC EEEeeeeccc EEEEEeeeee

EEecCCCCCCC EEEEEEeeee EEEEEEEEEE

EEEeecCCCCC EEEEEEEEEe EEEEEEEEEE

EEEEeecccCC EEEEEEEEEE EEEEEEEEEE

EEEEEeeeccc EEEEEEEEEE EEEEEEEEEE

EEEEEEeeeec EEEEEEEEEE EEEEEEEEEE

EEEEEEEEeee EEEEEEEEEE EEEEEEEEEE

EEEEEEEEEee EEEEEEEEEE EEEEEEEEEE

EEEEEEEEEEe EEEEEEEEEE EEEEEEEEEE

Here we see that the compiler used can make a massive difference. GCC 4.8.4 favors error codes but 5.4.0 is the opposite as are all versions of Clang (which produce identical results). On this platform exceptions seem to be even more performant than on the Pi.

The top left corner seems to be the most interesting part of this entire diagram, as it represents shallow call chains with few errors. This is similar to most real world code used in, for example, XML or JSON parsing. Let's do some more measurements focusing on this area.

i5 Raspberry Pi

GCC Clang GCC Clang

5.4.0 3.8 4.9.2 3.5.0

cCCCC eCCCC eCCCC eCCCC

ecCCC EeCCC ecCCC ecCCC

ecCCC EeccC eccCC eccCC

eccCC EEecc eecCC eecCC

eeccC EEEee eeccC eeccC

In these measurements the parameters go from 10 functions at the top row to 50 in the bottom row in increments of 10 and the colums go from 0% error on the left to 4% on the right in increments of 1. Here we see that exceptions keep their performance advantage when there are no errors, especially when using Clang, but error codes get better as soon as the error rate goes up even a little.

Conclusions

Whenever a discussion on C++ exceptions occurs, there is That Guy who comes in and says "C++ exceptions are slow, don't use them". At this point the discussion usually breaks down on fighting about whether exceptions are fast or not. These discussion are ultimately pointless because asking whether exceptions are "fast" is the wrong question.

The proper question to ask is "under which circumstances are exceptions faster". As we have seen here, the answer is surprisingly complex. It depends on many factors including which platform, compiler, code size and error rate is used. Blanket statements about performance of X as compared to Y are usually simplifying, misleading and just plain wrong. Such is the case here as well.

(Incidentally if someone can explain why i7 has those weird performance fluctuations, please post in the comments.)

Thanks to Juhani Simola for running the i5 measurements.

6 comments:

KuhankeittäjäJanuary 8, 2017 at 10:33 PM
One possible suspect for varying performance is Intel's Turbo Boost, which varies the clock speed based on load over several cores. The i5 I used for tests doesn't drop clock speed from maximum boost even under mprime torture test.
KuhankeittäjäJanuary 8, 2017 at 11:09 PM
For completeness' sake, I also ran the tests on a Macbook with i5-3230M (same generation as Jussi's i7 but a bit lower rung, questionable cooling) and something that is called "gcc" but has version string Apple LLVM version 8.0.0 (clang-800.0.42.1).

And sure, these are the patterns that showed up:

EEeEeCCCCc
EEEEEEEEEE
EEEEEEEEEE
EEEEEEEEEE
EEEEEEEEEE
EEEEEEEEEE
EEEEEEEEEE
EEEEEEEEEE
EEEEEEEEEE
EEEEEEEEEE

cCCCC
EccCC
ceccC
EccCc
eEEeE
AnonymousJanuary 9, 2017 at 12:46 PM
Did you build with -fno-exceptions and -fno-rtti?

Also, is there a difference when using a switch rather than the if's? IIRC, switch has some optimisations applied but they're likely invoked by the repeated if's you're using here too.

Finally, you should probably disable hyper-threading and turbo boost in the BIOS to get consistent results.
StyxJanuary 9, 2017 at 6:54 PM
I've made a version of this that passes int pointers for errors instead of a struct, which is idiomatic of C code.

The code can be found here: https://github.com/julian-goldsmith/excspeed , and I've also made a pull request on the Git repo.

Sunday, January 8, 2017

Measuring execution performance of C++ exceptions vs error codes

Test setup

Results

Conclusions

6 comments: