Sunday, July 14, 2019

Initializing all local variables with Clang-Tidy

A common source of all kinds of bugs is using variables without properly initializing them. Out of all security problems this one is the simplest to fix, just convert all declarations of type int x; to int x=0;. The main reason for not doing that is laziness, manually going through existing code bases and adding initialization statements is boring and nobody wants to do that.

Fortunately nowadays we don't have to. Clang-tidy provides a nice toolkit for writing source code refactoring tools for C and C++. As an exercise I wrote a checker to do this. It is submitted upstream and is undergoing code review. Implementing it was fairly straightforward. There were only two major problems. The first one was that existing documentation consists mostly of reference manuals. There is no easy to follow tutorials, only Doxygen pages. But if you dig around on the net and work on it a bit, you can get it working.

The second, and bigger, obstacle is that doing anything in the LLVM code base is sloooow. Everything in LLVM and Clang is linked to single, huge, monolithic libraries which take forever to link. Because of reasons I started doing this work on my secondary machine, which is a 4 core i5 with 16 gigs of RAM. I had to limit simultaneous linker jobs to 2 because otherwise it would just crash spectacularly to an out of memory error. Presumably it is impossible to compile the code base on a machine that has only 8 gigs of RAM. It seems that if you want to do any real development on LLVM you need a spare data center to run the compilations, which is unfortunate.

There is an evil hack to work around this, though. Set the CMake build type to Debug and then change CMAKE_CXX_FLAGS_DEBUG and CMAKE_C_FLAGS_DEBUG from -g to -Og. This makes the compilation faster and reduces memory usage to a fraction of the original. The downside is that there is no debug information, but it turns out to not be necessary when writing simple Clang-Tidy checkers.

Once all that is done the actual checker is almost trivial. This is the part that looks up all local variables without initial values:

void InitLocalVariablesCheck::registerMatchers(MatchFinder *Finder) {
  Finder->addMatcher(
      varDecl(unless(hasInitializer(anything()))).bind("vardecl"), this);
}

Then you determine what the initial value should be based on the type of the variable and add a warning and a fixit:

diag(location, "variable %0 is not initialized")
    << MatchedDecl;
diag(location, "insert initial value", DiagnosticIDs::Note)
    << FixItHint::CreateInsertion(
           location.getLocWithOffset(VarName.size()),
           InitializationString);

All in all this amounts to about 100 lines of code plus tests.

But what about performance?

The other major reason not to initialize variables is that it "may cause a runtime performance degradation of unknown magnitude". That is true, but with this tooling the degradation is no longer unknown. You can run the tool and then measure the results. This is trivial for all code bases that have performance benchmarks.

No comments:

Post a Comment