Wednesday, August 21, 2024

Meson's New Option Setup ‒ The Largest Refactoring

The problem

Meson has had togglable options from almost the very beginning. These split into two camps. The first one is "common options" like optimizations, warning level, language standard version and so on. The second one is "per project" options that are specific to each project, such as which backend to use. For a long time things were quite nice but as people started using subprojects more and more, the need to configure common options on a per-subproject basis became more and more important.

Meson added a limited way of setting some options per subproject, but it never really felt like a proper integrated solution. Doing it properly turns out to have a lot of requirements because you want to be able to:

  • Override any shared option for any subproject
  • Do this at runtime from the command line
  • Unset any override given
  • Convert existing per-project settings to the new override format
  • Provide a UI that is readable and sensible
  • Do all of this without needing to edit subproject build files

The last one of these is important. It means that you can use deps directly (e.g. from WrapDB) without any local patches.

What benefits do you get out of it?

The benefits are most easily seen via examples. Let's say you are developing a program that uses a dependency that does heavy number crunching. You need to build that (and only that) subproject with optimizations enabled, otherwise your development experience is intolerably slow. This is done by defining an augment, like so:

meson configure -Acruncher:optimization=2

A stronger version of this would be to compile all subprojects with optimizations but the top level project without them. This is how you'd do it:

meson configure -Doptimization=2 -A:optimization=0

Augments can be unset:

meson configure -Usubproject:option

This scheme permits you to do all sorts of useful things, like disabling -Werror on specific projects, building some subprojects with a different language version (such as gnu99), compiling LGPL deps as shared libraries and everything else as static libraries, and so on.

Implementing

This is a big internal change. How big? Big! This is the largest refactoring operation I have done in my life. It is big enough that it took me over two years of procrastination before I managed to gather enough strength to start work on this. Pretty much all of my Meson work in the last six months or so has been spent on this one issue. The feature is still not done, but the merge request already has 80 commits and 1700+ new lines and even that is an understatement. I have chopped off bits of the change and merged them on their own. All in all this meant that the schedule for most days of my summer vacation went like this:

  • Wake up
  • Work on Meson refactoring branch until fed up
  • Work on my next book until fed up
  • Maybe do something else
  • Sleep

FTR I don't recommend this style of working to anyone else. Or even to myself. But sometimes you just gotta.

The main reason this change is so complex lies in the architecture. In existing code each built target "knew" the option settings needed for it (options could and can be overridden in build files on a per-target basis). This does not work any more. Instead the code needs one place that encapsulates all option data and provides methods like "what is the value of option X when building target Y in subproject Z". Option code was everywhere, so changing this meant touching the entire code base and that the huge change blob must be landed in master atomically.
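To make the "one place that encapsulates all option data" idea concrete, here is a minimal sketch. It is not Meson's actual implementation: the class name OptionStore, its method names and the lookup precedence (per-target override first, then per-subproject augment, then the global value) are all assumptions made for illustration. The empty string stands for the top level project, mirroring the -A:optimization=0 example above.

```python
from typing import Dict, Optional, Tuple


class OptionStore:
    """Hypothetical central store; all names here are illustrative only."""

    def __init__(self) -> None:
        # Global values, as set with something like -Doptimization=2.
        self.global_values: Dict[str, str] = {}
        # Augments keyed by (subproject, option). '' means the top level project.
        self.augments: Dict[Tuple[str, str], str] = {}
        # Per-target overrides keyed by (subproject, target, option).
        self.target_overrides: Dict[Tuple[str, str, str], str] = {}

    def set_global(self, name: str, value: str) -> None:
        self.global_values[name] = value

    def set_augment(self, subproject: str, name: str, value: str) -> None:
        self.augments[(subproject, name)] = value

    def unset_augment(self, subproject: str, name: str) -> None:
        # Unsetting a missing augment is treated as a no-op (an assumption).
        self.augments.pop((subproject, name), None)

    def set_target_override(self, subproject: str, target: str,
                            name: str, value: str) -> None:
        self.target_overrides[(subproject, target, name)] = value

    def get_value(self, name: str, subproject: str = '',
                  target: Optional[str] = None) -> str:
        """Value of option `name` when building `target` in `subproject`."""
        if target is not None:
            key = (subproject, target, name)
            if key in self.target_overrides:
                return self.target_overrides[key]
        if (subproject, name) in self.augments:
            return self.augments[(subproject, name)]
        return self.global_values[name]


# Roughly the situation after: -Doptimization=2 -A:optimization=0
store = OptionStore()
store.set_global('optimization', '2')
store.set_augment('', 'optimization', '0')
cruncher_level = store.get_value('optimization', subproject='cruncher')
toplevel_level = store.get_value('optimization')
```

With this shape, targets no longer carry their own copy of the settings. They ask the store, which is why every call site that used to read options directly has to change.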

The only thing that made this change even remotely feasible was that Meson has an extensive test suite. The main code changes were done months ago, and all work since then has gone into making existing unit tests pass. They still don't pass, so work continues. Without this test suite there would have been hundreds of regressing projects, people would be angry and everyone would pin their Meson to an old version and refuse to update. These are the sorts of breakages that kill projects dead. So, write tests, even if it does not seem fun. Without them every project will eventually end up in a fork in the road where the choice is between "death by stagnation" and "death by breaking end users". Most projects are not Python 3. They probably won't survive a similar level of breakage.

Refactoring, types and Python

Python is, at the same time, my favourite programming language and very much not my favourite programming language. Python in the small is nice, readable, wonderful and productive. As the project size grows, the lack of static types becomes aggravating and eventually you end up debugging cases like "why is this argument that should be a dict an array one out of 500 times at random". Types make these problems go away and make refactoring easy.

But not always.

For this very specific case the complete lack of types actually made the refactoring easier. Meson currently supports more than one hundred different compilers. I needed to change the way compiler classes work, but I did not know how. Thus I started by just using the GNU C compiler. I could change that (and its base class) as much as I wanted without having to care about any other compiler class. As long as I did not use any other compiler their code was not called and it did not matter that their method signatures were completely different. In a static language all type changes would need to be done up front just to make the dang thing compile.
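A toy illustration of that effect follows. The class and method names are hypothetical and far simpler than Meson's real compiler classes. The base class and one subclass have been moved to a new method signature while another subclass still has the old one. Python accepts the mismatch at definition time and only complains if the stale method is actually called.

```python
class Compiler:
    # New-style signature: look the value up from shared option data.
    def get_optimization_args(self, options, subproject):
        raise NotImplementedError


class GnuCCompiler(Compiler):
    # Already converted to the new signature.
    def get_optimization_args(self, options, subproject):
        return ['-O' + options[(subproject, 'optimization')]]


class NotYetConvertedCompiler(Compiler):
    # Still uses the old signature. Nothing checks this until it is called.
    def get_optimization_args(self, level):
        return ['/O' + level]


# Hypothetical option data keyed by (subproject, option name).
options = {('', 'optimization'): '2'}

# Works, because only the converted class is exercised.
gcc_args = GnuCCompiler().get_optimization_args(options, '')

# The stale class fails only at the moment it is invoked the new way.
try:
    NotYetConvertedCompiler().get_optimization_args(options, '')
    stale_call_failed = False
except TypeError:
    stale_call_failed = True
```

A statically typed language would typically reject the mismatched override, or the call to it, at compile time, forcing every subclass to be converted before anything runs.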

Still, you can have my types when you drag them from my cold, dead fingers. But maybe this is something for language designers of the future to consider. It would be kind of cool to have a strictly typed language where you could add a compiler flag to say "convert all variables into Python style variant dictionaries and make all type checks, method invocations etc work at runtime". Yes, people would abuse the crap out of this feature, but the same can be said about every new feature.

When will this land?

It is not done yet, so we don't know. At the earliest this will be in the next release, but more likely in the one after that.

If you like trying out new things and living dangerously, you can try the code from this MR. Be sure to post comments on that page if you do.

Saturday, August 10, 2024

Refactoring Python dicts to proper classes

When doing a major refactoring in Meson, I came up with an interesting refactoring technique that I have not seen before. Some searching did not turn up suitable hits. Obviously it is entirely possible that this is a known refactoring but I don't know its name. In any case, here's my version of it.

The problem

Suppose you have a Python class with the following snippet:

class Something:
    def __init__(self):
        self.store = {}

Basically you have a dictionary as a member variable. This is then used all around the class, which grows and grows. Then you either find a bug in how the dict is used or you want to add some functionality like, to pick an arbitrary requirement, all keys for this object that are strings must begin with "s_".

Now you have a problem because you need to make arbitrary changes all around the code. You can't easily debug this. You can't add a breakpoint inside this specific dictionary's setter function (or maybe Python's debugger can do that but I don't know how). Reading code that massages dictionaries directly is tricky, because it's all brackets and open code rather than calling named methods like do_operation_x.

The solution, step one

Create a Python class that looks like this:

class MeaningfulName:
    def __init__(self, *args, **kwargs):
        self.d = dict(*args, **kwargs)

    def __contains__(self, key):
        return key in self.d

    def __getitem__(self, key):
        return self.d[key]

    def __setitem__(self, key, value):
        self.d[key] = value

    ...

Basically you implement all the special methods so that they do nothing but forward to the underlying dictionary. Then replace the self.store dictionary with this object. Nothing should have changed. Run tests to make sure. Then commit this to main. Let it sit in the code base for a while in case there are untested code paths that use dictionary functionality that the wrapper does not forward yet.

Just doing this gives an advantage: it is easy to add breakpoints to methods that mutate the object's state.
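As a sketch of what step one might look like once it is dropped into the original class: which methods need forwarding depends entirely on how the calling code uses the dictionary, so the set below (plus the remember and recall helpers on Something) is an assumption for illustration, not a complete list.

```python
class MeaningfulName:
    def __init__(self, *args, **kwargs):
        self.d = dict(*args, **kwargs)

    def __contains__(self, key):
        return key in self.d

    def __getitem__(self, key):
        return self.d[key]

    def __setitem__(self, key, value):
        # Every write to the store now passes through here, so a single
        # debugger breakpoint on this line catches all of them.
        self.d[key] = value

    def __delitem__(self, key):
        del self.d[key]

    def __len__(self):
        return len(self.d)

    def __iter__(self):
        return iter(self.d)

    def get(self, key, default=None):
        return self.d.get(key, default)

    def items(self):
        return self.d.items()


class Something:
    def __init__(self):
        # The only change to the original class: a wrapper instead of {}.
        self.store = MeaningfulName()

    def remember(self, key, value):
        self.store[key] = value

    def recall(self, key):
        return self.store[key]
```

Code that indexes, iterates or tests membership on self.store keeps working unchanged. Anything that relies on the store being a real dict (for example isinstance checks) would surface as a test failure at this point.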

Step two

Pick any of the special dunder methods and rename it to a more meaningful name. Add validation code if you need. Run tests. Fix all errors by rewriting the calling code to use the new named method. Some methods might need to be replaced with multiple new methods that do slightly different things. For example you might want to add methods like set_value and update_if_changed.
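A sketch of one round of step two, assuming the "s_" key requirement from earlier. Here __setitem__ has been replaced by set_value and update_if_changed. The exact semantics of update_if_changed (return whether the stored value changed) are an assumption, since the text only names the method.

```python
class MeaningfulName:
    def __init__(self, *args, **kwargs):
        self.d = {}
        # Route initial contents through the validating setter too.
        for key, value in dict(*args, **kwargs).items():
            self.set_value(key, value)

    # Not yet converted; these still forward directly.
    def __contains__(self, key):
        return key in self.d

    def __getitem__(self, key):
        return self.d[key]

    # Formerly __setitem__.
    def set_value(self, key, value):
        if isinstance(key, str) and not key.startswith('s_'):
            raise ValueError(f'string key {key!r} must begin with "s_"')
        self.d[key] = value

    def update_if_changed(self, key, value):
        """Store the value and report whether anything actually changed."""
        if key in self.d and self.d[key] == value:
            return False
        self.set_value(key, value)
        return True


store = MeaningfulName()
store.set_value('s_level', 2)

# Old-style call sites now fail loudly, which is how the tests find them.
try:
    store['s_other'] = 3
    old_style_still_works = True
except TypeError:
    old_style_still_works = False
```

Because the class no longer defines __setitem__, any remaining bracket assignment raises TypeError, so the test suite points at every call site that still needs rewriting.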

Step three

Repeat step two until all dunder methods are gone.