Saturday, December 28, 2019

What can clang-format teach us about the human condition?

Most people who do programming have taken part in at least one code formatting war. Usually these come about when companies want to standardise their code bases and thus want everything formatted according to a single style. Style wars, much like real wars, are not pleasant places to be in. They cause havoc and destruction, make reasonable people into life-long sworn enemies and halt work on anything useful.

In a typical style argument statements like the following are often thrown about:

  • Indentation should be done with tabs, because everyone can set tab width to whatever they want in their editor.
  • The opening brace must be on the same line as its preceding clause. This saves vertical space and thus makes the code more readable.
  • The opening brace must be on its own line. This makes code blocks stand out better and thus makes the code more readable.
  • When laying out things like arguments vertically, the separating comma must be at the beginning of the line rather than the end. This way, when you add or remove an entry, the diff is always only one line.
  • In a declaration like int *bob, the asterisk must be next to the variable name, because that is what it binds to.
  • In a declaration like int* bob, the asterisk must be next to the type name, because "pointerness" is logically a feature of the type, not the variable.
  • Class variables must begin with m_ so they stand out better.
  • Class variables must not be separated with a prefix. The syntax highlighter will already draw them in a different color and if you have so many variables in your methods that you can't immediately tell which is which, your methods are too big and must be split.
  • Et cetera, et cetera, ad infinitum, ad nauseam.
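
The two pointer arguments in the list both appeal to how C and C++ parse a declaration, and a small sketch can show what the language actually does. The variable names `first` and `second` and the helper `second_is_pointer` are made up for illustration; the checks run at compile time.

```cpp
#include <type_traits>

// One declaration, two variables. The asterisk attaches to the
// declarator (the name), not to the shared type specifier.
int* first, second;  // first is int*, second is a plain int

static_assert(std::is_same<decltype(first), int*>::value,
              "first is a pointer to int");
static_assert(std::is_same<decltype(second), int>::value,
              "second is a plain int, despite the int* spelling");

// Hypothetical helper so the same fact can be queried from other code.
constexpr bool second_is_pointer() {
    return std::is_pointer<decltype(second)>::value;
}
```

This is the usual evidence for the asterisk-by-the-name camp. The other camp typically answers by declaring one variable per line, so neither observation settles which style reads better.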

It is unknown when code formatting wars first began. Given that FORTRAN was the first real programming language with an actual syntax and was first released in 1957, the answer is probably "way, way before that". A reasonable guess would be the first or second design meeting on the syntax. Fighting over code style kept raging for almost sixty years after that. The arguments were the same, the discussion was the same, no progress was ever made. Then clang-format was introduced and suddenly everything changed.

This was surprising, because automatic code formatters had existed for decades and clang-format was "the same, just slightly better". Yet it made this problem mostly go away. Why?

Enter the human element

With all earlier formatters it was fairly easy to find code where they failed. C macros were especially treacherous in this regard. This meant one of two things. Either you manually added (and kept updating) comments that disabled formatting for some blocks, or the formatter was run only every now and then by hand and the result had to be inspected and fixed by hand after the fact. With clang-format this manual work went effectively to zero. You could just run it at any time, even automatically before every commit. In a weird, backwards kind of way, once we had the correct solution we could finally understand what the real problem was.
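
For readers who have not seen formatter-disabling comments, here is a minimal sketch of what they look like, using clang-format's own `// clang-format off` and `// clang-format on` markers as the example. The macro table, the `Color` enum and the `rgb_of` helper are hypothetical, chosen only because hand-aligned macro tables are the kind of code people tend to protect this way.

```cpp
// Everything between the two marker comments is left exactly as typed.
// clang-format off
#define COLOR_TABLE(X) \
    X(Red,   0xFF0000) \
    X(Green, 0x00FF00) \
    X(Blue,  0x0000FF)
// clang-format on

// Expand the table once into enumerator names...
enum class Color {
#define AS_NAME(name, rgb) name,
    COLOR_TABLE(AS_NAME)
#undef AS_NAME
};

// ...and once into the matching RGB values, in the same order.
#define AS_RGB(name, rgb) rgb,
constexpr unsigned kRgb[] = {COLOR_TABLE(AS_RGB)};
#undef AS_RGB

// Look up the RGB value for a color by its position in the table.
constexpr unsigned rgb_of(Color c) {
    return kRgb[static_cast<int>(c)];
}
```

The point of the paragraph above is that with clang-format such markers become the rare exception rather than routine maintenance.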

Every programmer writes code in their own way. Maybe they put braces on the same line, maybe not. Maybe they indent with spaces, maybe not. The details don't matter. The real point is that writing code in your own style is effortless. It just flows from your brain to the screen. Coding in any other style means spending brain energy on either typing in some non-natural style or fixing the code afterwards. This is manual and tedious work, just the kind that programmers hate with a passion. Thus when the threat of an externally mandated code style appears, the following internal monologue takes place:

If they choose a style different than mine, then I will forever have to write in a style that is unnatural to me. This is tedious. However if I spend some energy now and convince everyone else to use my style, then I can keep on doing what I have been doing thus far. All I have to do is to factually explain why my chosen style is the best, and since other programmers are rational they will understand my point, agree and adopt my chosen style.

Unfortunately everyone else participating in the debate has the exact same idea and things end in a stalemate almost immediately. The sunk-cost fallacy ensures that once a person has publicly committed to a style choice, they will never budge from it.

Note the massive dichotomy at play here. The real reason people have for any style choice is "this is what I have gotten used to" but when they debate their chosen style they always, always use reasoning that aims to be objective and scientific. At this point you might want to pause and reread the sample arguments listed above. They follow this reasoning exactly and most claim to improve some real world objective metric such as readability. They are also all lies. These are all post-rationalising arguments, invented after the fact to make the opinion you already have sound as good as possible. They are not the real reason. They have never been the real reason. It is unlikely they will ever be the real reason. But the debate is carried on as if they were the real reason. This is why it will never end.

The lengths people are willing to go to in their post-rationalising arguments are nothing short of astounding. In this video on indenting with tabs vs spaces many tab advocates say that indenting with tabs is better because "you only need to press tab once rather than press space multiple times". Every single programmer's text editor since 1985 (possibly 1975 and potentially even 1965) has had the feature where pressing the tab key does the logically equivalent indent with spaces. Using this as an argument only shows that you have not done even the most minimal thinking on the issue, but have instead already made up your mind and don't want to even consider changing it.

This is why code style discussions never go anywhere. They are not about bringing people together to find the best possible choice. They are about trying to make other people submit to your will by repeatedly bashing them on the head with your style guide. This does not work because the average programmer's head is both thicker and more durable than the average style guide.

1 comment:

  1. I don't know one way or the other if your argument that everything changed with `clang-format` is correct - but _everything_ else here is definitely correct! Up to and _especially_ including the last paragraph, and its last sentence!

    I have found (as a 40+yr programmer) that there is only _one_ important coding style rule (and it isn't even that strong): Consistency counts (within a file). The most common argument for why you need a particular format (braces at end of line or braces on its own line, or whatever) is _readability_. Everyone cites that. But in fact, when I'm looking at code all day - and I look at a lot of large legacy C++ projects that use multiple 3rd party (source) libraries - the only thing that affects readability is consistency within a file. And that, not that much.

    Actually the real killers are #1 bad indentation and #2 overly long and/or overly nested abstractions, even more than consistency of, say, brace location, or whether members have prefix symbols or not.

    `clang-format` fixes #1 for sure, but nothing, not even code reviews, can fix #2 (complaints about overly long methods or too much scope nested in a code review are frequently ignored, in my experience).

    I used to have only TWO HARD RULES for coding style: #1 No hard tabs in files, and #2 ALL source/text files must end with a return character. Lately, I don't need #1.