Tuesday, December 27, 2016

Comparing executable size of C++ exceptions vs plain C error structs

One of the Grand Pieces of Hacker Folklore is that you should not use C++ exceptions because they cause massive code bloat. Instead you should use error codes, as exemplified by plain C, Go, Rust and other languages. These opinions have caused many a heated debate both on and off the Internet.

One of the problems is that accurately measuring the size difference of code that uses exceptions vs code that uses error codes is hard to do properly. Legend has it that when Symbian developers were considering whether to add exception support, they compiled a helloworld application (which has no error paths at all), noted that it was somewhat bigger with exceptions enabled and declared this unacceptable. What they did not consider is that this increase in code size was a constant overhead, not one that grows linearly with the size of the code. They then went on to build a completely nonstandard cleanup mechanism on top of C++, which was both terrible and bloaty.

A more methodical approach

There is really only one way to fully reliably test whether error codes or exceptions produce smaller code. That is to take a large program that uses one or the other mechanism and convert it to use the other without changing anything else. This can take months of work and is obviously not feasible in the real world but would make for an interesting research project.

Instead we can simulate the problem. It is relatively easy to generate code with the same functionality in C and in C++ and compare the resulting binaries. Let's create a function that simulates a simple workload. First it grabs some resource, then calls some other functions and finally returns a result. In simple C++ it would look like this:

int func222() {
    DummyObject dobj;
    int a = func227();
    int b = func224();
    int c = func228();
    return a + b + c;
}

Here DummyObject is a simple object that exists only so the exception unwinding mechanism has something to destroy. Its constructor and destructor are defined elsewhere so that the optimizer can not remove it completely. We create 1000 of these functions, where each one calls three random functions with a higher number. Calls to functions with numbers greater than 1000 go to function 1000 instead. Function 1000 just returns 1. This program does not do anything useful and its runtime is exponential, so you should probably not run it.

In plain C using glib style error "objects" the function looks like this:

int func222(char **error) {
    int a, b, c;
    struct Dummy *d = dummy_new();

    a = func227(error);
    if(*error) {
        dummy_delete(d);
        return -1;
    }
    b = func224(error);
    if(*error) {
        dummy_delete(d);
        return -1;
    }
    c = func228(error);
    if(*error) {
        dummy_delete(d);
        return -1;
    }
    dummy_delete(d);
    return a + b + c;
}

The functionality is the same; we just need to manually check whether each function call succeeded and deallocate the resource on every path. The code to generate and run this test is downloadable from this Github repo.

The results

We measure three different compilation methods: plain C, plain C++ and C++ with -fno-exceptions. The last one represents the use case where any error is immediately fatal and thus recovery is pointless. We tested with a bunch of compilers. The C++ standard version was C++14, except on GCC 4.8.4, which does not support it, so C++11 was used there instead. The C version was gnu11. All tests were done on the AMD64 architecture except one, which was done on 32 bit ARM (Raspberry Pi 2 + Debian). -O2 and -g were used for optimization and debug info. Here are the sizes of the resulting stripped binaries.


The results are surprising in many ways. Perhaps the biggest surprise is that using C++ and exceptions produces smaller binaries than plain C every single time. The closest is Clang, but even there the C binaries take over 6% more space. The no-exceptions build is always the smallest, but this is expected as it has no error recovery mechanism at all.

Another thing to note is that GCC seems to have had a big C++ code size increase between versions 4.9.2 and 6.2.0. Clang seems to produce a lot smaller C binaries than GCC. For C++ the difference is there but the gap is not as big.

What is causing this?

I don't know for sure. It would be interesting to hear from Clang/GCC developers what is happening here and what actually goes on under the covers. But basically what it (probably) boils down to is that the compiler has more freedom to optimize the steps needed for stack unwinding. When the user does deallocation manually, the compiler needs to obey the code paths they have written, even if they are not optimal. Autogenerated stack unwinding just needs to call a few destructors in any way it wants. Optimizing this is tedious work, the kind best left to a machine.

Conclusions

This is only one aspect of the (seemingly) eternal debate between error codes and exceptions. It does not prove (I say again, does not) that one is faster than the other, or even that one always produces more efficient or smaller code than the other. However what we can say is that there are cases where converting from error codes into full exceptions makes the final binary smaller. It would be interesting to know if this is true for other code bases as well. That requires a lot more testing, though.

The Github repo contains more stuff for those interested in going deeper. For example it contains a version of the C code that is a bit more size optimized (though a bit less readable) that only has error codes rather than full error structs. It is still bigger than the straightforward C++ implementation.

Monday, December 19, 2016

What is the global cost of Autoconf?

Waiting for autoconf to finish is annoying. But have you ever considered how much time and effort is wasted every day doing that? Neither have I, but let's estimate it anyway.

First we need an estimate of how many projects are using Autotools. 100 is too low and 10 000 is probably too high, so let's say 1000. We also need an assumption of how long an average autoconf run takes. Variability here is massive ranging from 30 seconds to (tens of) minutes on big projects on small hardware. Let's say an average of one minute as a round number.

Next we need to know how many times builds are run during a day. Some projects are very active with dozens of developers, whereas most have only one person doing occasional development. Let's say each project has on average 2 developers and each one does an average of 2 builds per day.

This gives us an estimate of how much time is spent globally just waiting for autoconf to finish:

1000 proj * 2 dev/proj * 2 build/dev * 1 minute/build =
   4000 minutes = 66 hours

66 hours. Every day. If you are a business person, feel free to do the math on how much that costs.

But it gets worse. Many organisations and companies build a lot of projects each day. As an example there are hundreds of companies that have their own (embedded) Linux distribution that they do a daily build on. Linux distros do rebuilds constantly and so on. If we assume 10 000 organisations that do a daily build and we do a lowball estimate of 5 dependencies per project (many projects are not full distros, but instead build a custom runtime package or something similar), we get different numbers:

10 000 organisations * 1 build/organisation * 5 dependencies/build * 1 min/dependency = 50 000 minutes = 833 CPU hours

That is, every single day over 800 CPU hours are burned just to run autoconf. Estimating power consumption is difficult, but a hand-wavy average of 20 Watts gives us roughly 16 kWh every day or, assuming 300 working days a year, 5 MWh a year.

That is a lot of energy to waste just to be absolutely sure that your C compiler ships with stdlib.h.

Saturday, December 17, 2016

On reversing btrees in job interviews

Hiring new people is hard. No one has managed to find a perfect way to separate poorly performing job applicants from good ones. The method used in most big companies nowadays is the coding interview. In it the applicant is given some task, such as reversing a btree, and asked to write the code to do that on a whiteboard, without a computer.

At regular intervals this brings up Internet snarkiness.



The points these comments make are basically sound, but they miss the bigger point, which is this:

When you are asked to reverse a btree on a whiteboard, you are not being evaluated based on whether you can reverse a btree on a whiteboard.

Before we look into what you are evaluated on, some full disclosure may be in order. I have interviewed for Google a few times but each time they have chosen not to hire me. I have not interviewed people with this method nor have I received any training in doing so. Thus I don't know what aspects of interview performance are actually being evaluated, and presumably different companies have different requirements. These are just estimates which may be completely wrong.

Things tested before writing code

Getting an onsite interview is a challenge in itself. You have to pass multiple phone screens and programming tests just to get there. This test methodology is not foolproof. Every now and then (rarely, but still) a person manages to slip through the tests even if they would be unqualified for the job. Starting simple is an easy way to make sure that the applicant really is up to the challenge.

The second big test is one of attitude. While we all like to think that programming is about having fun solving hard problems with modern functional languages and creating world-altering projects with ponies, rainbows and kittens, reality is more mundane. Over 80% of all costs in a software project are incurred during maintenance. This is usually unglamorous and monotonous work. Even new projects have a ton of “grunt work” that just needs to be done. The attitude one displays when asked to do this sort of menial task can be very revealing. Some people get irritated because such a task is “beneath their intelligence”. These people are usually poor team players whom you definitely do not want to hire.

The most important aspect of team work is communication. Yes, even more important than coding. Reversing a btree is a simple problem, but the real work is in explaining what is being done and why. It is weird that even though communication is so important, very little effort is spent on teaching it (in many schools it is not taught at all). Being able to explain how to implement this kind of simple task matters, because a large fraction of your work consists of describing your solutions and ideas to other people. It does not matter if you can come up with the greatest ideas in the world if you can not make others “get” them.

Writing the code

Writing the code to reverse a btree should not take more than a few minutes. It is not hard, anyone who has done basic CS courses can get it done in a jiffy. But writing the code is an almost incidental part of the interview, in fact it is just a springboard to the actual problem.

Things tested after the code is written

Once the basic solution is done, you will get asked to expand it. For a btree questions might include:

  • If you used stack based iteration, what if the btree is deeper than there is stack space?
  • What if you need to be able to abort the reversal and restore the original tree given an external signal during reversal?
  • What if the btree is so large it won't fit in RAM? Or on one machine?
  • What if you need to reverse the tree while other threads are reading it?
  • What if you need to reverse the tree while other threads are mutating it?
  • And so on.

As we can see, even the most simple-seeming problem can be made arbitrarily hard. The point of the original question is not to test whether you can reverse the tree, but to find out how far you can go from there and where the limit of what you know lies. If you come out of an interview where you were asked to reverse a tree and the only thing you did was reverse the tree, you have most likely failed.

When the edge of knowledge has been reached comes the next evaluation point: what do you do when you don't know what to do. This is important because a large fraction of programming is solving problems you have not solved before. It can also be very frustrating, because it goes against the natural problem solving instinct most people have. Saying “honestly, at this point I would look this up on Google” gets you the reply “Suppose you do that and find nothing. What do you do then?”.

Different people react in different ways in this situation. These include:

  • freezing up completely
  • whining
  • trying random incoherent things
  • trying focused changes
  • trying a solution that won't solve the problem fully but is better than the current solution (and explicitly stating this before starting work)

This test is meant to reveal how the person works under pressure. This is not a hard requirement for many jobs, but even then those who can keep a cool head under stressful circumstances have a distinct advantage over those who can't.

Points of criticism

There are really only two reliable ways of assessing a candidate. The first is word of mouth from someone who has worked with the person before. The second is taking the candidate in and having them work with the team for a while. Neither of these approaches is scalable.

The coding interview aims to be a simplification of this process, where the person is tested on things close to “real world” problems. But it is not the real world, only a model. And like all models, it is a simplification that leaves some things out and emphasizes other things more than they should.
The above-mentioned lack of Internet connectivity is one. It can be very frustrating to be denied the most obvious and simplest approach (which is usually the correct one) for no apparently good reason. There actually is a reason: to test behavior when information is not available. But it can still seem arbitrary and cold, especially since looking up said information usually yields useful data even if the exact solution is not found.

A different problem has to do with thinking. Interviewers encourage the applicant to think out loud as much as possible. The point of this is to see the thought process: what approaches they are going through and what things they discard outright. The reasoning behind this is sound, but on the other hand it actively makes thinking about the problem harder for some people.

Every time you start explaining your thought process you have to do a mental context switch, and then another one to go back to thinking. Brain research has shown that different people think in very different ways. Talking while pondering a problem is natural for some people; for others it is very taxing. This gives a natural advantage to people who talk while thinking, but there are not, as far as I know, any research results showing that these sorts of people perform better at a given job.

Mental context switches are not even the most problematic thing: for some people, explaining their own thought process can be impossible. I don't know how common this is (or if it's even a thing in general), but at least for me personally, I can't really explain how I solve problems. I think about them but I can't really conceptualize how the solution is reached. Once a solution presents itself, explaining it is simple, but all details about the path that led there remain in the dark.

People who think in this manner have a bit of an unfortunate handicap, because it is hard to tell the difference between a person thinking without speaking and a person who has just frozen with no idea what to do. Presumably interviewers are taught to notice this and help the interviewee but since all books and guidelines emphasize explaining your thought process, this might not always succeed perfectly.

But does it really matter?

That's the problem with human issues. Without massive research in the issue we don't know.

Friday, December 2, 2016

What does configure actually run

One of the claimed advantages of Autotools is that it depends (paraphrasing here) "only on make + sh". Some people add sed to the list as well. This is a nice talking point, but is it actually true? Let's find out.

Determining this is simple, we just run the GNU Hello program's configure script under strace like this:

strace -e execve -f ./configure 2>stderr.txt > stdout.txt

This puts all command invocations of the process and its children into stderr.txt. Then we can massage it slightly with Python and get the following list of commands:

arch
as
basename
bash
cat
cc
cc1
chmod
collect2
conftest
cp
diff
dirname
echo
expr
file
gawk
gcc
getsysinfo
grep
hostinfo
hostname
install
ld
ln
ls
machine
make
mkdir
mktemp
msgfmt
msgmerge
mv
oslevel
print
rm
rmdir
sed
sh
sleep
sort
tr
true
uname
uniq
universe
wc
which
xgettext

This list contains a total of 49 different commands, including heavyweights such as diff and gawk.

Thus we find that the answer to the question is no. Configure requires a lot more than just a shell plus Make. In fact it implicitly requires a big chunk of the Unix userland.

Pedantic postscriptum

It could be argued that not all of these programs are strictly required and that configure could (potentially) work without them present. This is probably correct, but many of these programs provide functionality that is essential and not provided by either a plain shell or Make.