Thursday, March 26, 2020

It's not what programming languages do, it's what they shepherd you to

How many of you have listened to, read or taken part in a discussion about programming languages that goes like the following:

Person A: "Programming language X is bad, code written in it is unreadable and horrible."

Person B: "No it's not. You can write good code in X, you just have to be disciplined."

Person A: "It does not work, if you look at existing code it is all awful."

Person B: "No! Wrong! Those are just people doing it badly. You can write readable code just fine."

After this the discussion repeats from the beginning until one of them gets fed up and just leaves.

I'm guessing more than 99% of you readers have seen this, often multiple times. The sad part is that even though this happens all the time, nobody learns anything and the same discussion starts over again and again. Let's see if we can do something about this. A good way to go about it is to try to come up with a name and a description for the underlying issue.

shepherding: An invisible property of a programming language and its ecosystem that drives people into solving problems in ways that are natural for the programming language itself rather than ways that are considered "better" in some sense. "Better" here may mean things like long term maintainability, readability and performance.

This is a bit abstract, so let's look at some examples.

Perl shepherds you into using regexps

Perl has several XML parsers available and they are presumably good at their jobs (I have never actually used one so I wouldn't know). Yet, in practice, many Perl scripts do XML (and HTML) manipulation with regexes, which is brittle and "wrong" for lack of a better term. This is a clear case of shepherding. Text manipulation in Perl is easy. Importing, calling and using an XML parser is not. And really all you need to do is to change that one string to a different string. It's tempting. It works. Surely it could not fail. Let's just do it and get on with other stuff. Boom, just like that you have been shepherded.

Note that there is nothing about Perl that forces you to do this. It provides all the tools needed to do the right thing. And yet people don't, because they are being shepherded (unconsciously) into doing the thing that is easy and fast in Perl.
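
To make the brittleness concrete, here is a minimal sketch. It is written in Python rather than Perl, on the assumption that the same trap looks much the same in both languages. The XML snippets, the attribute names and both helper functions are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Two documents that mean the same thing as XML but differ as text:
# only the attribute order has changed.
DOC_A = '<config><server name="main" port="8080"/></config>'
DOC_B = '<config><server port="8080" name="main"/></config>'


def set_port_regex(xml_text, new_port):
    """The tempting way: treat the XML as a plain string."""
    pattern = r'(<server name="main" port=")\d+(")'
    return re.sub(pattern, r'\g<1>' + str(new_port) + r'\g<2>', xml_text)


def set_port_parser(xml_text, new_port):
    """The sturdier way: parse, change the attribute, serialise again."""
    root = ET.fromstring(xml_text)
    for server in root.iter("server"):
        if server.get("name") == "main":
            server.set("port", str(new_port))
    return ET.tostring(root, encoding="unicode")


# The regex version updates DOC_A but silently leaves DOC_B untouched.
print(set_port_regex(DOC_A, 9090))
print(set_port_regex(DOC_B, 9090))

# The parser version updates both.
print(set_port_parser(DOC_A, 9090))
print(set_port_parser(DOC_B, 9090))
```

The regex version is shorter and needs no thought about trees or nodes, which is exactly why it is the one that tends to get written. It also fails without any error the day someone reorders the attributes.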

Make shepherds you into embedding shell pipelines in Makefiles

Compiling code with Make is tolerable, but it fails quite badly when you need to generate source code, data files and the like. The sustainable solution would be to write a standalone program in a proper scripting language that has all the code logic needed and call that from Make with your inputs and outputs. This rarely happens. Instead people think "I know, I have an entire Unix userland available [1], I can just string together random text mangling tools in a pipeline, write it here and be done". This is how unmaintainability is born.

Nothing about Make forces people to behave like this. Make shepherds people into doing this. It is the easy, natural outcome when faced with the given problem.
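
For comparison, the standalone-program approach could look roughly like the sketch below. Everything in it is hypothetical: the script name `gen_enum.py`, the input file `colors.txt`, the output `colors.h` and the shape of the generated header are assumptions chosen only to show the idea.

```python
#!/usr/bin/env python3
"""Generate a C header with an enum from a text file of names, one per line.

All file names and the header layout are illustrative assumptions.
"""
import sys


def read_names(path):
    """Read identifiers from a file, skipping blank lines and # comments."""
    names = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            name = line.strip()
            if name and not name.startswith("#"):
                names.append(name)
    return names


def render_header(names, enum_name="Color"):
    """Return the full header text for the given identifiers."""
    prefix = enum_name.upper()
    guard = prefix + "_H"
    lines = [f"#ifndef {guard}", f"#define {guard}", "", f"enum {enum_name} {{"]
    lines += [f"    {prefix}_{name.upper()}," for name in names]
    lines += ["};", "", "#endif"]
    return "\n".join(lines) + "\n"


def main(input_path, output_path):
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(render_header(read_names(input_path)))


# Only act when called as: gen_enum.py INPUT OUTPUT
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

A Make rule would then list `colors.txt` and `gen_enum.py` as prerequisites of `colors.h` and run `python3 gen_enum.py colors.txt colors.h` as its recipe. The logic lives in one file that can be read, tested and run on its own, and it does not depend on which flavour of sed or awk happens to be installed.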

Other examples

  • C shepherds you into manipulating data via pointers rather than value objects.
  • C++ shepherds you into providing dependencies as header-only libraries.
  • Java does not shepherd you into using classes and objects, it pretty much mandates them.
  • Turing complete configuration languages shepherd you into writing complex logic with them, even though they are usually not particularly good programming environments.

[1] Which you don't have on Windows. Not to mention that every Unix has slightly different command line arguments and semantics for basic commands, meaning that shell pipelines are not actually portable.


  1. Python (and really most scripting languages) shepherds you into using exceptions as control flow

    1. Python shepherds you into thinking about how others will read your code. Optimisation is for readability rather than speed.

    2. Readability (and maintainability) goes sideways with exceptions. The best way of handling errors is ADTs, which btw are well supported by Python: almost everything you work with in Python represents an algebraic data type.

  2. I like this idea as a way to talk about programming languages, but I think it would be hard to reach a consensus on this. For example, would you say that languages like Python, Ruby or Haskell shepherd their users into writing overly complicated list comprehensions?

    1. No, because there are other options that are just as easy. Java shepherds you into using inversion of control with all this crappy Spring and other dependency injection frameworks, which leads to dependency hell, runtime errors instead of compile time errors, and other complexity issues.

    2. You don't have to use Spring.

  3. Some of it is timing too. Java's shepherding into XML (and later, to a lesser extent, JSON) config files is a legacy of J2EE which it just can't shake as a language. The way it handled the split of code and config back then was about the companies that developed J2EE and their needs. That reminds me of the Mythical Man-Month essay on the topic of organisation structure impacting code.

    Language is part of it, but even if you fix the language you won't necessarily fix its history. Go code will always be riddled with if err != nil even if a new language spec fixes it for good. Idioms and tutorials drive the culture around a language too, and they may set a course of behaviour that never changes. The latest C++ specs are quite different from the C++ of old, but culturally and historically the sheer volume of existing code in that language shapes future code as well. Sometimes just a popular tutorial doing it badly can really hurt a language. Your documentation (or lack of it) drives a lot of problems too, which then persist in the language forever.

  4. I think you are on the right track. Languages tend to have a central concept or problem area that they are trying to solve, so the syntax and structure is optimized in a certain way to address or encourage a way of thinking towards a problem set. If languages were like tools in a toolbox, someone skilled in their craft would use the tool best suited for the problem at hand. We don't do that often in the software industry because companies tend to mandate some subset of possible languages to use to ease training, code maintenance, and hiring concerns. The result is that our solutions are strongly influenced by our use of language. Perhaps the interesting question is if it is possible to determine the degree of "impedance" mis-match between the problem and the language chosen to solve the problem.

  5. One thing I've noticed after some years of working in Python is that the C++ community shepherds you into thinking about performance for trivial things. For instance, the typical C++ book would introduce building blocks, aka abstractions, and then spend much time discussing their overhead, e.g. virtual functions.

    In Python, people just stuff things in a dict and forget about it, until they actually have a slow program.

    But this laid back attitude means that something like hash tables are trivial in Python (dict + tuple for keys), while C++ struggled with it for years. I've personally written code in C++ that was O(n²) or perhaps O(n log n) after fighting with a red-black tree, while the code in Python was O(n), simply because the infrastructure is designed with ergonomics rather than performance in mind.

    I think this is an interesting contradiction.

  6. Excellent insight into how and why people write bad code in a variety of languages. I like the shepherding concept.

  7. This could be quantified. If we agree on practices that are "bad", then looking through GitHub for examples in different languages could actually tell us what percentage of the codebases use those "bad" practices, which would be a good measurement of how much each language shepherds people into them.

  8. It's a 38-year-old issue, see: "Determined Real Programmer can write FORTRAN programs in any language." #humor

    Be cool, write maintainable code!

  9. I've been complaining about this a lot lately. A language like C# shepherds you into mutable data by default (the "readonly" keyword has to be added), and Java shepherds you into inheritance because methods are virtual by default. All procedural-based languages shepherd you into spaghetti code because it's always easier to just add another "if"/"else" than it is to refactor code out into a separate class or method. When modern best practice includes things like "data should be immutable" or "prefer delegation over inheritance" or "keep cyclomatic complexity low", these should be easiest to do but they are often harder.

  10. I find it interesting how most of the comments reinforce the first part of the essay :-D

  11. Congratulations, you've reformulated the Sapir-Whorf hypothesis

  12. This is a fun way to combine the Blub Paradox and people's laziness. You basically take a feature from a Blub language, and you take someone lazy enough to abuse the feature.

    You see, these are two separate things. What you describe is a human problem, not a language one.