Sunday, April 10, 2016

Testing performance of build optimisations

This blog post examines how various compiler options affect performance. The test consists of compressing a 270 megabyte uncompressed tar file with zlib. As training data for profile-guided optimisation (PGO) we used a different 5 megabyte tar file.

Each measurement was run five times and the fastest time was kept. Both the zlib library and the code using it are compiled from scratch. For this we used the Wrap dependency system of the Meson build system. This should make the test portable to all platforms, though we tested only Linux + GCC. The test code is available on Github.
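The "run five times, keep the fastest" approach can be sketched as follows. This is only an illustration using Python's standard zlib module, not the actual test code (which is on Github); the names `best_of` and `time_compression` are made up for this sketch.

```python
import time
import zlib


def best_of(func, repeats=5):
    """Call func `repeats` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(timings)


def time_compression(data, repeats=5):
    """Return the fastest time needed to zlib-compress `data` (a bytes object)."""
    return best_of(lambda: zlib.compress(data), repeats)


if __name__ == "__main__":
    # Small in-memory stand-in for the 270 megabyte tar file used in the post.
    payload = b"some fairly repetitive example payload " * 100000
    print(f"fastest of 5 runs: {time_compression(payload):.4f} s")
```

Taking the minimum rather than the mean is a common choice for this kind of benchmark, since background noise can only make a run slower, never faster.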

We used two different machines for testing. The first is a desktop machine with an Intel i7 running the latest Ubuntu and GCC 5.2.1. The other is a Raspberry Pi 2 running Raspbian and GCC 4.9.2. All tests were run with both basic optimisation (-O2) and release optimisation (-O3).
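As a rough illustration, the kinds of build variants discussed in this post (optimisation level, shared versus static linking, LTO and PGO) could be expressed as Meson option sets like the ones below. The options `buildtype`, `default_library`, `b_lto` and `b_pgo` are Meson built-ins, but the labels and the exact set of combinations are assumptions made for this sketch and not necessarily the ones that were measured.

```python
def meson_configurations():
    """Return a dict mapping a hypothetical label to `meson setup` option arguments."""
    configs = {}
    # Assumption: -O2 via buildtype=debugoptimized (which also adds -g),
    # -O3 via buildtype=release.
    for buildtype, opt in (("debugoptimized", "O2"), ("release", "O3")):
        base = [f"-Dbuildtype={buildtype}"]
        static = base + ["-Ddefault_library=static"]
        configs[f"{opt}-shared"] = base + ["-Ddefault_library=shared"]
        configs[f"{opt}-static"] = static
        configs[f"{opt}-static-lto"] = static + ["-Db_lto=true"]
        # PGO is a two-step process: build with -Db_pgo=generate, run the
        # program on training data, then rebuild with -Db_pgo=use.
        configs[f"{opt}-static-pgo"] = static + ["-Db_pgo=use"]
    return configs


if __name__ == "__main__":
    for label, args in meson_configurations().items():
        print(label, "->", "meson setup build-" + label, " ".join(args))
```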

Here are the results for the desktop machine. Note that the vertical scale does not start at zero, which makes the differences more visible. The first measurement uses a shared library for zlib; all others use static linking.

Overall the differences are quite small: the fastest and slowest times differ by roughly 4%. Other things to note:

  • static libraries are noticeably faster than shared ones
  • -O2 and -O3 do not have a clear performance order; sometimes one is faster, sometimes the other
  • PGO seems to have a bigger performance impact than link-time optimisation (LTO)

The times for the Raspberry Pi look similar.

The difference between the fastest and slowest options is 5%, roughly the same as on the desktop. On this platform PGO also has a bigger impact than LTO. Here, however, LTO has a measurable effect (though only about 1%), whereas on the desktop it was basically unmeasurable.
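For reference, a percentage like the ones quoted above can be computed from two timings as in this small sketch. It assumes the difference is taken relative to the fastest time, and the timings in the usage example are made-up values, not measurements from these tests.

```python
def relative_difference(fastest, slowest):
    """Return how much longer the slowest run took than the fastest, in percent."""
    if fastest <= 0:
        raise ValueError("fastest time must be positive")
    return (slowest - fastest) / fastest * 100.0


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    print(f"{relative_difference(10.0, 10.4):.1f}%")
```

With differences this small, dividing by the slowest time instead of the fastest changes the result only marginally.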
