Saturday, August 10, 2024

Refactoring Python dicts to proper classes

When doing a major refactoring in Meson, I came up with an interesting refactoring technique that I have not seen before. Searching around did not turn up any suitable hits. Obviously it is entirely possible that this is a known refactoring and I just don't know its name. In any case, here's my version of it.

The problem

Suppose you have a Python class with the following snippet:

class Something:
    def __init__(self):
        self.store = {}

Basically you have a dictionary as a member variable. It is then used all around the class, which grows and grows. Then you either find a bug in how the dict is used or you want to add some functionality like, to pick an arbitrary requirement, that all string keys in this dictionary must begin with "s_".

Now you have a problem, because you need to make changes all around the code. You can't easily debug this. You can't add a breakpoint inside this specific dictionary's setter function (or maybe Python's debugger can do that, but I don't know how). Reading code that massages dictionaries directly is tricky, because it's all brackets and open code rather than calls to named methods like do_operation_x.
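To make the problem concrete, here is a small sketch of what such a class tends to look like. The method names are made up for illustration. The point is that the dict is written to in several different ways, so a rule like the "s_" prefix would have to be checked in every one of them.

```python
class Something:
    def __init__(self):
        self.store = {}

    # Hypothetical methods; each one writes to the dict in a different way.
    def add_setting(self, key, value):
        # A key check would be needed here...
        self.store[key] = value

    def merge(self, other):
        # ...and here, because update() never goes through the line above...
        self.store.update(other)

    def ensure_default(self, key, default):
        # ...and here as well.
        return self.store.setdefault(key, default)
```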

The solution, step one

Create a Python class that looks like this:

class MeaningfulName:
    def __init__(self, *args, **kwargs):
        self.d = dict(*args, **kwargs)

    def __contains__(self, key):
        return key in self.d

    def __getitem__(self, key):
        return self.d[key]

    def __setitem__(self, key, value):
        self.d[key] = value

    ...

Basically you implement all the special methods so that they do nothing but forward to the underlying dictionary. Then replace the self.store dictionary with this object. Nothing should have changed. Run tests to make sure. Then commit this to main. Let it sit in the code base for a while in case there are untested code paths that use dict functionality you did not forward.
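A slightly fuller version of the wrapper might look like the sketch below. Which methods you need depends entirely on what the calling code actually uses, so the selection here (including the plain get and items methods) is an assumption rather than a complete list.

```python
class MeaningfulName:
    def __init__(self, *args, **kwargs):
        self.d = dict(*args, **kwargs)

    def __contains__(self, key):
        return key in self.d

    def __getitem__(self, key):
        return self.d[key]

    def __setitem__(self, key, value):
        self.d[key] = value

    def __delitem__(self, key):
        del self.d[key]

    def __len__(self):
        return len(self.d)

    def __iter__(self):
        return iter(self.d)

    # Non-dunder dict methods that callers use need forwarding too.
    def get(self, key, default=None):
        return self.d.get(key, default)

    def items(self):
        return self.d.items()


class Something:
    def __init__(self):
        # The only change in the original class: a wrapper instead of {}.
        self.store = MeaningfulName()
```

Any dict operation that was not forwarded fails with an AttributeError or TypeError at the call site, which is what makes the untested code paths show up.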

Just doing this gives an advantage: it is easy to add breakpoints to methods that mutate the object's state.

Step two

Pick any of the special dunder methods and rename it to something more meaningful. Add validation code if you need it. Run tests. Fix all errors by rewriting the calling code to use the new named method. Some methods might need to be replaced with multiple new methods that do slightly different things. For example, you might want to add methods like set_value and update_if_changed.
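As an illustration, here is what the wrapper could look like after __setitem__ has been given this treatment. The names set_value and update_if_changed come from the text above, but their exact behaviour here is my assumption, as is the _validate_key helper and the choice of ValueError.

```python
class MeaningfulName:
    def __init__(self):
        self.d = {}

    def __getitem__(self, key):
        return self.d[key]

    # Formerly __setitem__. Callers still written as `store[key] = value`
    # now fail with a TypeError, which is how the tests find them.
    def set_value(self, key, value):
        self._validate_key(key)
        self.d[key] = value

    def update_if_changed(self, key, value):
        """Store value; return True only if it differs from the current one."""
        if key in self.d and self.d[key] == value:
            return False
        self.set_value(key, value)
        return True

    def _validate_key(self, key):
        # The arbitrary example rule: string keys must begin with "s_".
        if isinstance(key, str) and not key.startswith("s_"):
            raise ValueError(f"string keys must begin with 's_': {key!r}")
```

The validation now lives in exactly one place, and a breakpoint in set_value catches every write.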

Step three

Repeat step two until all the forwarding dunder methods are gone.
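The end result could look something like the sketch below. All method names are hypothetical; the ones you end up with should describe what your code actually does with the data.

```python
class MeaningfulName:
    def __init__(self):
        self.d = {}

    def has_value(self, key):
        return key in self.d

    def get_value(self, key):
        return self.d[key]

    def set_value(self, key, value):
        # The arbitrary example rule: string keys must begin with "s_".
        if isinstance(key, str) and not key.startswith("s_"):
            raise ValueError(f"string keys must begin with 's_': {key!r}")
        self.d[key] = value

    def remove_value(self, key):
        del self.d[key]

    def all_keys(self):
        # Return a copy so callers cannot mutate the internal dict.
        return list(self.d)
```

At this point the object no longer behaves like a dict at all: bracket access and the in operator both raise TypeError, so nothing can reach the data without going through a named method.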
