Waiting for autoconf to finish is annoying. But have you ever considered how much time and effort is wasted every day doing that? Neither have I, but let's estimate it anyway.
First we need an estimate of how many projects are using Autotools. 100 is too low and 10 000 is probably too high, so let's say 1000. We also need an assumption of how long an average autoconf run takes. Variability here is massive ranging from 30 seconds to (tens of) minutes on big projects on small hardware. Let's say an average of one minute as a round number.
Next we need to know how many times builds are run during a day. Some projects are very active with dozens of developers whereas most have only one person doing occasional development. Let's say each project has on average 2 developers and each one does an average of 4 builds per day.
This gives us an estimate of how much time is spent globally just waiting for autoconf to finish:
1000 proj * 2 dev/proj * 2 build/dev * 1 minute/build =
4000 minutes = 66 hours
66 hours. Every day. If you are a business person, feel free to do the math on how much that costs.
But it gets worse. Many organisations and companies build a lot of projects each day. As an example there are hundreds of companies that have their own (embedded) Linux distribution that they do a daily build on. Linux distros do rebuilds constantly and so on. If we assume 10 000 organisations that do a daily build and we do a lowball estimate of 5 dependencies per project (many projects are not full distros, but instead build a custom runtime package or something similar), we get different numbers:
10 000 organisations * 1 build/organisation * 5 dependencies/build * 1 min/dependency = 50 000 minutes = 833 CPU hours
That is, every single day over 800 CPU hours are burned just to run autoconf. Estimating CPU consumption is difficult but based on statistics and division we can estimate an average consumption of 20 Watts. This amounts to 16 kWh every day or, assuming 300 working days a year, 5 MWh a year.
That is a lot of energy to waste just to be absolutely sure that your C compiler ships with stdlib.h.
A gathering of development thoughts of Jussi Pakkanen. Some of you may know him as the creator of the Meson build system.
Monday, December 19, 2016
Saturday, December 17, 2016
On reversing btrees in job interviews
Hiring new people is hard. No-one has managed to find a perfect way to separate poorly performing job applicants from the good ones. The method used in most big companies nowadays is the coding interview. In it the applicant is given some task, such as reversing a btree, and they are asked to write the code to do that on a whiteboard without a computer.
At regular intervals this brings up Internet snarkiness.
The point these comments make are basically sound, but they miss the bigger point which is this:
When you are asked to reverse a btree on a whiteboard, you are not being evaluated based on whether you can reverse a btree on a whiteboard.
Before we look into what you are evaluated on, some full disclosure may be in order. I have interviewed for Google a few times but each time they have chosen not to hire me. I have not interviewed people with this method nor have I received any training in doing so. Thus I don't know what aspects of interview performance are actually being evaluated, and presumably different companies have different requirements. These are just estimates which may be completely wrong.
The second big test is one of attitude. While we all like to think that programming is about having fun solving hard problems with modern functional languages, creating world altering projects with ponies, rainbows and kittens, reality is more mundane. Over 80% of all costs in a software project are incurred during maintenance. This is usually nonglamorous and monotonous work. Even new projects have a ton of “grunt work” that just needs to be done. The attitude one displays when asked to do this sort of menial task can be very revealing. Some people get irritated because such a task is “beneath their intelligence”. These people are usually poor team players who you definitely do not want to hire.
The most important aspect of team work is communication. Yes, even more important than coding. Reversing a btree is a simple problem but the real work is in explaining what is being done and why. It is weird that even though communication is so important, very little effort is spent in teaching it (in many schools it is not taught at all). Being able to explain how to implement this kind of simple task is important, because a large fraction of your work consists of describing your solutions and ideas to other people. It does not matter if you can come up with the greatest ideas in the world if you can not make other “get” them.
As we see even the most seemingly simple problem can be made arbitrarily hard. The point of the original question is not to test if you can reverse the tree, but find how far you can go from there and where is the limit of what you don't know. If you come out of an interview where you are asked to reverse a tree and the only thing you do is reverse the tree, you have most likely failed.
When the edge of knowledge has been reached comes the next evaluation point: what do you do when you don't know what to do. This is important because a large fraction of programming is solving problems you have not solved before. It can also be very frustrating because it goes against the natural problem solving instinct most people have. Saying “honestly at this point I would look this up on Google” gives you a reply of “Suppose you do that and find nothing, what do you do?”.
Different people react in different ways in this situation. These include:
This test is meant to reveal how the person works under pressure. This is not a hard requirement for many jobs but even then those who can keep a cool head under stressing circumstances have a distinct advantage to those who can't.
The coding interview aims to be a simplification of this process, where the person is tested on things close to “real world” problems. But it is not the real world, only a model. And like all models, it is a simplification that leaves some things out and emphasizes other things more than they should.
The above mentioned lack of Internet connectivity is one. It can be very frustrating to be denied the most obvious and simplest approach (which is usually the correct approach) for a problem for no seemingly good reason. There actually is a reason: to test behavior when information is not available, but it still may seem arbitrary and cold, especially since looking up said information usually yields useful data even if the exact solution is not found.
A different problem has to do with thinking. The interviewers encourage the applicant to think out loud as much as possible. The point of this is to see the thought process, what approaches they are going through and what things they are discarding outright. The reasoning behind this is sound but on the other hand it actively makes thinking about the problem harder for some people.
Every time you start explaining your thought process you have to do a mental context switch and then another one to go back to thinking. Brain research has shown us that different people think in very different ways. Talking while pondering a problem is natural for some people. It is very taxing for others. This gives a natural advantage to people who talk while thinking but there is not, as far as I know, any research results that these sort of people perform better at a given job.
Mental context switches are not the most problematic thing. For some people explaining their own thought process can be impossible. I don't know how common this is (or it it's even a thing in general) but at least for me personally, I can't really explain how I solve problems. I think about them but I can't really conceptualize how the solution is reached. Once a solution presents itself, explaining it is simple, but all details about the path that lead there remain in the dark.
People who think in this manner have a bit of an unfortunate handicap, because it is hard to tell the difference between a person thinking without speaking and a person who has just frozen with no idea what to do. Presumably interviewers are taught to notice this and help the interviewee but since all books and guidelines emphasize explaining your thought process, this might not always succeed perfectly.
At regular intervals this brings up Internet snarkiness.
The point these comments make are basically sound, but they miss the bigger point which is this:
When you are asked to reverse a btree on a whiteboard, you are not being evaluated based on whether you can reverse a btree on a whiteboard.
Before we look into what you are evaluated on, some full disclosure may be in order. I have interviewed for Google a few times but each time they have chosen not to hire me. I have not interviewed people with this method nor have I received any training in doing so. Thus I don't know what aspects of interview performance are actually being evaluated, and presumably different companies have different requirements. These are just estimates which may be completely wrong.
Things tested before writing code
Getting an onsite interview is a challenge in itself. You have to pass multiple phone screens and programming tests just to get there. This test methodology is not foolproof. Every now and then (rarely, but still) a person manages to slip through the tests even if they would be unqualified for the job. Starting simple is an easy way to make sure that the applicant really is up to the challenge.The second big test is one of attitude. While we all like to think that programming is about having fun solving hard problems with modern functional languages, creating world altering projects with ponies, rainbows and kittens, reality is more mundane. Over 80% of all costs in a software project are incurred during maintenance. This is usually nonglamorous and monotonous work. Even new projects have a ton of “grunt work” that just needs to be done. The attitude one displays when asked to do this sort of menial task can be very revealing. Some people get irritated because such a task is “beneath their intelligence”. These people are usually poor team players who you definitely do not want to hire.
The most important aspect of team work is communication. Yes, even more important than coding. Reversing a btree is a simple problem but the real work is in explaining what is being done and why. It is weird that even though communication is so important, very little effort is spent in teaching it (in many schools it is not taught at all). Being able to explain how to implement this kind of simple task is important, because a large fraction of your work consists of describing your solutions and ideas to other people. It does not matter if you can come up with the greatest ideas in the world if you can not make other “get” them.
Writing the code
Writing the code to reverse a btree should not take more than a few minutes. It is not hard, anyone who has done basic CS courses can get it done in a jiffy. But writing the code is an almost incidental part of the interview, in fact it is just a springboard to the actual problem.Things tested after the code is written
Once the basic solution is done, you will get asked to expand it. For a btree questions might include:- If you used stack based iteration, what if the btree is deeper than there is stack space?
- What if you need to be able to abort the reversal and restore the original tree given an external signal during reversal?
- What if the btree is so large it won't fit in RAM? Or on one machine?
- What if you need to reverse the tree while other threads are reading it?
- What if you need to reverse the three while other threads are mutating it?
- And so on.
As we see even the most seemingly simple problem can be made arbitrarily hard. The point of the original question is not to test if you can reverse the tree, but find how far you can go from there and where is the limit of what you don't know. If you come out of an interview where you are asked to reverse a tree and the only thing you do is reverse the tree, you have most likely failed.
When the edge of knowledge has been reached comes the next evaluation point: what do you do when you don't know what to do. This is important because a large fraction of programming is solving problems you have not solved before. It can also be very frustrating because it goes against the natural problem solving instinct most people have. Saying “honestly at this point I would look this up on Google” gives you a reply of “Suppose you do that and find nothing, what do you do?”.
Different people react in different ways in this situation. These include:
- freezing up completely
- whining
- trying random incoherent things
- trying focused changes
- trying a solution that won't solve the problem fully but is better than the current solution (and explicitly stating this before starting work)
This test is meant to reveal how the person works under pressure. This is not a hard requirement for many jobs but even then those who can keep a cool head under stressing circumstances have a distinct advantage to those who can't.
Points of criticism
There are really only two reliable ways of assessing a candidate. The first one is word of mouth from someone who has worked with the person before. The second one is taking the candidate in and make them work with the team for a while. Neither of these approaches is scalable.The coding interview aims to be a simplification of this process, where the person is tested on things close to “real world” problems. But it is not the real world, only a model. And like all models, it is a simplification that leaves some things out and emphasizes other things more than they should.
The above mentioned lack of Internet connectivity is one. It can be very frustrating to be denied the most obvious and simplest approach (which is usually the correct approach) for a problem for no seemingly good reason. There actually is a reason: to test behavior when information is not available, but it still may seem arbitrary and cold, especially since looking up said information usually yields useful data even if the exact solution is not found.
A different problem has to do with thinking. The interviewers encourage the applicant to think out loud as much as possible. The point of this is to see the thought process, what approaches they are going through and what things they are discarding outright. The reasoning behind this is sound but on the other hand it actively makes thinking about the problem harder for some people.
Every time you start explaining your thought process you have to do a mental context switch and then another one to go back to thinking. Brain research has shown us that different people think in very different ways. Talking while pondering a problem is natural for some people. It is very taxing for others. This gives a natural advantage to people who talk while thinking but there is not, as far as I know, any research results that these sort of people perform better at a given job.
Mental context switches are not the most problematic thing. For some people explaining their own thought process can be impossible. I don't know how common this is (or it it's even a thing in general) but at least for me personally, I can't really explain how I solve problems. I think about them but I can't really conceptualize how the solution is reached. Once a solution presents itself, explaining it is simple, but all details about the path that lead there remain in the dark.
People who think in this manner have a bit of an unfortunate handicap, because it is hard to tell the difference between a person thinking without speaking and a person who has just frozen with no idea what to do. Presumably interviewers are taught to notice this and help the interviewee but since all books and guidelines emphasize explaining your thought process, this might not always succeed perfectly.
But does it really matter?
That's the problem with human issues. Without massive research in the issue we don't know.Friday, December 2, 2016
What does configure actually run
One of the claimed advantages of Autotools is that it depends (paraphrasing here) "only on make + sh". Some people add sed to the list as well. This is a nice talking point, but it is actually true? Lets' find out.
Determining this is simple, we just run the GNU Hello program's configure script under strace like this:
strace -e execve -f ./configure 2>stderr.txt > stdout.txt
This puts all command invocations of the process and its children to stderr.txt. Then we can massage it slightly with Python and get the following list of commands.
arch
as
basename
bash
cat
cc
cc1
chmod
collect2
conftest
cp
diff
dirname
echo
expr
file
gawk
gcc
getsysinfo
grep
hostinfo
hostname
install
ld
ln
ls
machine
make
mkdir
mktemp
msgfmt
msgmerge
mv
oslevel
print
rm
rmdir
sed
sh
sleep
sort
tr
true
uname
uniq
universe
wc
which
xgettext
Determining this is simple, we just run the GNU Hello program's configure script under strace like this:
strace -e execve -f ./configure 2>stderr.txt > stdout.txt
This puts all command invocations of the process and its children to stderr.txt. Then we can massage it slightly with Python and get the following list of commands.
arch
as
basename
bash
cat
cc
cc1
chmod
collect2
conftest
cp
diff
dirname
echo
expr
file
gawk
gcc
getsysinfo
grep
hostinfo
hostname
install
ld
ln
ls
machine
make
mkdir
mktemp
msgfmt
msgmerge
mv
oslevel
rm
rmdir
sed
sh
sleep
sort
tr
true
uname
uniq
universe
wc
which
xgettext
This list contains a total of 49 different commands including heavyweights such as diff and gawk.
Thus we find that the answer to the question is no. Configure requires a lot more stuff than just shell plus make. In fact it requires a big chunk of the Unix userland implictly.
Pedantic postscriptum
It could be said that all of these programs are not strictly required and that configure could (potentially) work without them present. This is probably correct, but many of these programs provide functionality which is essential and not provided by either plain shell or Make.
Tuesday, November 8, 2016
Building code faster and why recursive Make is so slow
One of the most common reactions to Meson we have gotten has been that build time is dominated by the compiler, thus trying to make the build system faster is pointless. After all, Make just launches subproject processes and waits for them to finish, so there's nothing to be gained.
This is a reasonable assumption but also false. Even if we ignore the most obvious slow parts, such as Configure that can take up to 15 minutes to run on an embedded device to Libtool, which slows everything down for no good reason, there are still gains to be had.
Meson uses Ninja as its main backend. One of the (many) cool things about it is that it is being actively developed and maintained. A recent addition is the Ninja tracer framework. It takes the output of Ninja's log file and converts it to the timing format understood by Chrome's developer tools. It creates output that looks like this (thanks to Nirbheek for gathering the data).
This is a trace of building all of GStreamer using their new Meson based aggregate repo called gst-build. This build has been done using six parallel processes making up the six horizontal lanes in the diagram. Each colored square represents one build task such as compiling or linking with time increasing from left to right. This picture shows us both why Meson is so efficient and also why Make based builds are slow (though the latter requires some analysis).
The main reason for speed is that Ninja keeps all processor cores working all the time. It can do that because it has a global view of the problem. As an example if we build two targets A and B and B links to A, Ninja can start compiling the source code of B before it has finished linking A. This allows it to keep all cores pegged.
The perceptive reader might have noticed the blank space at the left end of the diagram. During that time only one core is doing anything, all others are idle. The process that is running is the GIR tool processing the gstreamer-1.0 library, which all other subprojects depend on. In theory we should be able to run compiles on other projects but for reliability reasons Meson assumes that some source might include the output of the tool as a header so it does not start compilations until the file is generated. This may seem weird, but there are projects that do these kinds of things and require that they work.
The white gap is also what causes Make to be so slow. Most projects write recursive makefiles because maintaining a non-recursive makefile for even a moderate sized project is a nightmare. When building recursively Make goes into each subdirectory in turn, builds it to completion and only then goes to the next one. The last step in almost every case is linking, which can only be done using one process. If we assume that linking takes one second, then for a project that has 20 subdirectories that adds up to 20 wasted seconds of wall time. For an n core machine it corresponds to (n-1)*(num_directories) seconds of CPU time. Those things add up pretty quickly. In addition to Make, the same issue crops up in Visual Studio project files and pretty much any build system that does not have a global view of the dependency tree.
Those interested in going through the original data can do so. Just open this link with either Chromium or Chrome and the chart will be displayed in devtools. Unfortunately this won't work with Firefox.
This is a reasonable assumption but also false. Even if we ignore the most obvious slow parts, such as Configure that can take up to 15 minutes to run on an embedded device to Libtool, which slows everything down for no good reason, there are still gains to be had.
Meson uses Ninja as its main backend. One of the (many) cool things about it is that it is being actively developed and maintained. A recent addition is the Ninja tracer framework. It takes the output of Ninja's log file and converts it to the timing format understood by Chrome's developer tools. It creates output that looks like this (thanks to Nirbheek for gathering the data).
This is a trace of building all of GStreamer using their new Meson based aggregate repo called gst-build. This build has been done using six parallel processes making up the six horizontal lanes in the diagram. Each colored square represents one build task such as compiling or linking with time increasing from left to right. This picture shows us both why Meson is so efficient and also why Make based builds are slow (though the latter requires some analysis).
The main reason for speed is that Ninja keeps all processor cores working all the time. It can do that because it has a global view of the problem. As an example if we build two targets A and B and B links to A, Ninja can start compiling the source code of B before it has finished linking A. This allows it to keep all cores pegged.
The perceptive reader might have noticed the blank space at the left end of the diagram. During that time only one core is doing anything, all others are idle. The process that is running is the GIR tool processing the gstreamer-1.0 library, which all other subprojects depend on. In theory we should be able to run compiles on other projects but for reliability reasons Meson assumes that some source might include the output of the tool as a header so it does not start compilations until the file is generated. This may seem weird, but there are projects that do these kinds of things and require that they work.
The white gap is also what causes Make to be so slow. Most projects write recursive makefiles because maintaining a non-recursive makefile for even a moderate sized project is a nightmare. When building recursively Make goes into each subdirectory in turn, builds it to completion and only then goes to the next one. The last step in almost every case is linking, which can only be done using one process. If we assume that linking takes one second, then for a project that has 20 subdirectories that adds up to 20 wasted seconds of wall time. For an n core machine it corresponds to (n-1)*(num_directories) seconds of CPU time. Those things add up pretty quickly. In addition to Make, the same issue crops up in Visual Studio project files and pretty much any build system that does not have a global view of the dependency tree.
Those interested in going through the original data can do so. Just open this link with either Chromium or Chrome and the chart will be displayed in devtools. Unfortunately this won't work with Firefox.
Bonus challenge
The output format that Chrome's dev tools understands is straightforward. It would be interesting if someone modified Make to output the same format and ran it on a moderate sized project using Make. GStreamer is an obvious candidate as it has both Meson and Autotools setups in master, though gst-build only works with Meson.
Wednesday, November 2, 2016
Exposing a C++ library with a stable plain C API
There has been a lot of talk recently about using Rust to create shared libraries that have a plain C ABI. This allows you to transition libraries piecewise to Rust. This is seen as a contrast to C++ which has an unstable ABI and thus is hard to maintain and to integrate with other programs and libraries (Python, Ruby etc) that only work with plain C calling conventions.
However just like you can export Rust code with a C ABI by writing the required boilerplate, the same can be done in C++. Since C++ has native support for C, this is very simple. As an example a class that looks like this (lot of stuff omitted for clarity):
class Foo {
public:
Foo();
~Foo();
void poke(int x);
};
becomes this:
struct CFoo;
struct CFoo* foo_create();
void foo_destroy(struct CFoo *f);
void foo_poke(struct CFoo *f, int x, char **error);
The implementation for each of these just casts CFoo* into Foo* and then calls the corresponding method. The last function handles errors roughly in the same way as GLib's GError. In case of an error the error message is put in the error parameter and it is the responsibility of the caller to check it. For simplicity this is a string, but it could be an error object as well.
This is straightforward but the big problem comes in reporting errors. Writing code to convert exceptions to error messages for each and every function is tedious and error-prone. Fortunately there is a simple solution called an hourglass shaped interface using a Lippincott function. With it the implementation of foo_poke looks like this:
void convert_exception(char **error) noexcept {
char *msg;
try {
throw; // The magic happens here
} catch(const std::exception &e) {
msg = strdup(e.what());
} catch(...) {
msg = strdup("An unknown error happened.");
}
*error = msg;
}
However just like you can export Rust code with a C ABI by writing the required boilerplate, the same can be done in C++. Since C++ has native support for C, this is very simple. As an example a class that looks like this (lot of stuff omitted for clarity):
class Foo {
public:
Foo();
~Foo();
void poke(int x);
};
becomes this:
struct CFoo;
struct CFoo* foo_create();
void foo_destroy(struct CFoo *f);
void foo_poke(struct CFoo *f, int x, char **error);
The implementation for each of these just casts CFoo* into Foo* and then calls the corresponding method. The last function handles errors roughly in the same way as GLib's GError. In case of an error the error message is put in the error parameter and it is the responsibility of the caller to check it. For simplicity this is a string, but it could be an error object as well.
This is straightforward but the big problem comes in reporting errors. Writing code to convert exceptions to error messages for each and every function is tedious and error-prone. Fortunately there is a simple solution called an hourglass shaped interface using a Lippincott function. With it the implementation of foo_poke looks like this:
void convert_exception(char **error) noexcept {
char *msg;
try {
throw; // The magic happens here
} catch(const std::exception &e) {
msg = strdup(e.what());
} catch(...) {
msg = strdup("An unknown error happened.");
}
*error = msg;
}
void foo_poke(struct CFoo* f, int x, char **error) {
try {
reinterpret_cast<Foo*>(f)->poke(x);
} catch(...) {
convert_exception(error);
}
}
Here we have moved all error handling away from the wrapper into a standalone helper function. All interface functions have now been reduced into just calling to the real function and, in case of errors, calling the converter. The reason this works is that C++ stores the exception that is currently in flight and allows it to be rethrown. You can think of it as an invisible argument to the converter function.
This simple construct allows us to expose C++ libraries with a plain C ABI with all the stability guarantees and interoperability goodness that entails. A full sample project can be found here. It also contains a C++ "unwrapper" on top of the plain C API that makes exceptions travel transparently over the interface (that is, using the exact same ABI).
Tuesday, October 11, 2016
Unix Userland as C++ Coroutines
The Unix userland concept is simple. It has standalone programs that do only one thing which are then chained together. A classical example is removing duplicates from a file, which looks like this:
cat filename.txt | sort | uniq
This is a very powerful concept. However there is a noticeable performance overhead. In order to do this, the shell needs to launch three different processes. In addition all communication between these programs happens over Unix pipes, meaning a round trip call into the kernel for each line of text transferred (roughly, not an exact number in all cases). Even though processes and pipes are cheap on Unix they still cause noticeable slowdown (as anyone who has run large shell scripts knows).
Meanwhile there has been a lot of effort to put coroutines into C++. A very simple (and not fully exhaustive) way of describing coroutines is to say that they basically implement the same functionality as Python generators. That is, this Python code:
def gen():
yield 1
yield 2
yield 3
could be written in C++ like this (syntax is approximate and might not even compile):
int gen() {
co_yield 1;
co_yield 2;
co_yield 3;
}
If we look more closely into the Unix userland specification we find that even though it is (always?) implemented with processes and pipes, it is not required to. If we use C's as if rule we are free to implement shell pipelines in any way we want as long as the output is the same. This gives us a straightforward guide to implement the Unix userland in C++ coroutines.
We can model each Unix command line application as a coroutine that reads input data, either stdout or stderr, one line at a time, operates on it and outputs the result in the same way. Then we parse a shell pipeline into its constituent commands, connect the output of element n as the input of element n + 1 and keep reading the output of the last pipeline element until EOF is reached. This allows us to run the entire pipeline without spawning a single process.
A simple implementation with a few basic commands is available on Github. Unfortunately real coroutines were not properly available on Linux at the time of writing, only a random nonmerged branch of Clang. Therefore the project contains a simple class based implementation that mimics coroutines. Interested readers are encouraged to go through the code and work out how much boilerplate code a proper coroutine implementation would eliminate.
cat filename.txt | sort | uniq
This is a very powerful concept. However there is a noticeable performance overhead. In order to do this, the shell needs to launch three different processes. In addition all communication between these programs happens over Unix pipes, meaning a round trip call into the kernel for each line of text transferred (roughly, not an exact number in all cases). Even though processes and pipes are cheap on Unix they still cause noticeable slowdown (as anyone who has run large shell scripts knows).
Meanwhile there has been a lot of effort to put coroutines into C++. A very simple (and not fully exhaustive) way of describing coroutines is to say that they basically implement the same functionality as Python generators. That is, this Python code:
def gen():
yield 1
yield 2
yield 3
could be written in C++ like this (syntax is approximate and might not even compile):
int gen() {
co_yield 1;
co_yield 2;
co_yield 3;
}
If we look more closely into the Unix userland specification we find that even though it is (always?) implemented with processes and pipes, it is not required to. If we use C's as if rule we are free to implement shell pipelines in any way we want as long as the output is the same. This gives us a straightforward guide to implement the Unix userland in C++ coroutines.
We can model each Unix command line application as a coroutine that reads input data, either stdout or stderr, one line at a time, operates on it and outputs the result in the same way. Then we parse a shell pipeline into its constituent commands, connect the output of element n as the input of element n + 1 and keep reading the output of the last pipeline element until EOF is reached. This allows us to run the entire pipeline without spawning a single process.
A simple implementation with a few basic commands is available on Github. Unfortunately real coroutines were not properly available on Linux at the time of writing, only a random nonmerged branch of Clang. Therefore the project contains a simple class based implementation that mimics coroutines. Interested readers are encouraged to go through the code and work out how much boilerplate code a proper coroutine implementation would eliminate.
Mini-FAQ
This implementation does not handle background processes/binary data/file redirection/invoking external programs/yourthinghere!
That is correct.
Isn't this just a reinvented Busybox?
Yes and no. Busybox provides all executables in a single binary but still uses processes to communicate between different parts of the pipeline. This implementation does not spawn any processes.
Couldn't you run each "process" in its own thread to achieve parallelism?
Yes. However I felt like doing this instead.
Monday, September 12, 2016
The unspeakable horror of Visual Studio PDB files
Disclaimer: everything said in the following blog post may be completely wrong. Since PDB files are (until very recently) completely undocumented everything below has been determined by trial and error. Doubt everything, believe nothing.
Debugging information, how hard can it be?
When compiling C-like languages, debug information is not a problem. It gets written in the object file along with code and when objects are linked to form an executable or shared library, each individual file's debug info is combined and written in the result file. If necessary, this information can then be stripped into a standalone file.
This seems like the simplest thing in the world. Which it is. Unless you are Microsoft.
Visual Studio stores debug information in standalone files with the extension .pdb. Whenever the compiler is invoked it writes an .obj file and then opens the pdb file to write the debug info in it (while removing the old one). At first glance this does not seem a bad design, and indeed it wasn't in the 80s when it was created (presumably).
Sadly multicore machines break this model completely. Individual compilation jobs should be executable in parallel (and in Unix they are) but with pdb files they can't be. Every compile process needs to obtain some sort of lock to be able to update the pdb file. Processes that could be perfectly isolated now spend a lot of time fighting with each other over access to the pdb file.
Fortunately you can tell VS to write each object file's pdb info in a standalone file which is then merged into the final pdb. It even works, unless you want to use precompiled headers.
VS writes a string inside the obj file a string pointing to the corresponding pdb file. However if you use precompiled headers it fails because the pdb strings are different in the source object and precompiled header object file and VS wants to join these two when generating the main .obj file. VS will then note that the strings are different and refuse to create the source object file because merging two files with serialised arrays is a known unsolvable problem in computer science. The merge would work if you compiled the precompiled header separately for each object file and gave them the same temporary pdb file name. In theory at least, this particular rat hole is not one I wish to get into so I did not try.
This does work if you give both compilations the same target pdb file (the one for the final target). This gives you the choice between a slowdown caused by locking or a slowdown caused by lack of precompiled headers. Or, if you prefer, the choice of not having debug information.
But then it gets worse.
If you choose the locking slowdown then you can't take object files from one target and use them in other targets. The usual reasons are either to get just one object file for a unit test without needing a recompilation or emulating -Wl,--whole-archive (available natively only in VS2015 or later) by putting all object files in a different target. Trying gets you a linking error due to an incorrect pdb file name.
There was a blog post recently that Microsoft is having problems in their daily builds because compiling Windows is starting to take over 24 hours. I'm fairly certain this is one of the main reasons why.
But then it gets worse.
Debug information for foo.exe is written to a file called foo.pdb. The debug information for a shared library foo.dll is also written to a file called foo.pdb. That means you can't have a dll and an exe with the same name in the same directory. Unfortunately this is what you almost always want because Windows does not have rpath so you can't instruct an executable to look up its dependencies elsewhere (Though you can fake it with PATH. Yes, really.)
Fortunately you can specify the name of the output pdb, so you can tell VS to generate foo_exe.pdb and foo_lib.pdb. Unfortunately VS will also generate a dozen other files besides the pdb and whose names come from the target basename and which you can not change. No matter what you do, files will be clobbered. Even if they did not clobber the files would still be useless because VS the IDE assumes the file is called foo.pdb and refuses to work if it is not.
All this because writing debug information inside object files, which is where it belongs, is not supported.
But wait, it gets even worse.
Putting debug info in obj files was supported but is now deprecated.
In conclusion
Meson recently landed support for generating PDB files. Thanks to Nirbheek for getting this ball rolling. If you know that the above is complete bull and that it is possible to do all the things discussed above then please file bugs with information or, preferably, patches to fix them.
Subscribe to:
Posts (Atom)

