Monday, December 28, 2020

Some things a potential Git replacement probably needs to provide

Recently there has been renewed interest in revision control systems. This is great as improvements to tools are always welcome. Git is, sadly, extremely entrenched and trying to replace it will be an uphill battle. This is due to social rather than technical issues. What this means is that approaches like "basically Git, but with a mathematically proven model for X" are not going to fly. While having this extra feature is great in theory, in practice it is not sufficient. The sheer amount of work needed to switch revision control systems and the ongoing burden of using a niche, nonstandard system is just too much. People will keep using their existing system.

What would it take, then, to create a system that is compelling enough to make the change? In cases like these you typically need a "big design thing" that makes the new system 10× better in some way and which the old system can not do. Alternatively the new system needs to have many small things that are better but then the total improvement needs to be something like 20× because the human brain perceives things nonlinearly. I have no idea what this "major feature" would be, but below is a list of random things that a potential replacement system should probably handle.

Better server integration

One of Git's design principles was that everyone should have all the history all the time so that every checkout is fully independent. This is a good feature to have and one that should be supported by any replacement system. However it is not how revision control systems are commonly used. 99% of the time developers are working against some sort of centralised server, be it GitLab, GitHub or a corporation's internal revision control server. The user interface should be designed so that this common case is as smooth as possible.

As an example let's look at keeping a feature branch up to date. In Git you have to rebase your branch and then force push it. If your branch had any changes you don't have in your current checkout (because they were done on a different OS, for example), they are now gone. In practice you can't have more than one person working on a feature branch because of this (unless you use merges, which you should not do). This should be more reliable. The system should store, somehow, that a rebase has happened and offer to fix out-of-date checkouts automatically. Once the feature branch gets to trunk, it is ok to throw this information away. But not before that.
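To make the "store that a rebase has happened" idea a bit more concrete, here is a minimal sketch. Everything in it is made up for illustration (the `Branch` record, `record_rebase`, `plan_update`, the toy commit ids) and it is not how Git or any existing system works. The assumption is that the server keeps a map from each pre-rebase commit to its rewritten counterpart, so an out-of-date checkout can tell which of its commits were rewritten and which are local-only work that must be kept.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    """Hypothetical server-side record of a feature branch."""
    commits: list                                   # current commit ids, oldest first
    rewritten: dict = field(default_factory=dict)   # old commit id -> rewritten id

    def record_rebase(self, mapping, new_commits):
        # Chain through earlier rebases so that very old checkouts still resolve.
        for old, new in list(self.rewritten.items()):
            self.rewritten[old] = mapping.get(new, new)
        self.rewritten.update(mapping)
        self.commits = list(new_commits)


def plan_update(branch, local_commits):
    """Return (new_base, commits_to_replay) for a possibly stale checkout.

    local_commits is the checkout's view of the branch, oldest first.
    Commits the server already knows (current or rewritten) are dropped.
    Anything else exists only locally and is replayed on the new tip.
    """
    known = set(branch.commits) | set(branch.rewritten)
    to_replay = [c for c in local_commits if c not in known]
    return branch.commits[-1], to_replay
```

With a record like this, a checkout that still has the old commits plus one extra local commit would be told to move to the new tip and replay only that one commit. Once the branch lands in trunk, the `rewritten` map can simply be discarded.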

Another thing one could do is that repository maintainers could mandate things like "pull requests must not contain merges from trunk to the feature branch" and the system would then automatically prohibit these. Telling people to remove merges from their pull requests and to use rebase instead is something I have to do over and over again. It would be nice to be able to prohibit the creation of said merges rather than manually detecting and fixing things afterwards.
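A check like this does not need to be complicated. The following is only a sketch over a toy commit graph, not any real server's hook API. The function name and data layout are assumptions, as is the convention (borrowed from Git) that the first parent of a merge is the branch being merged into.

```python
def forbidden_merges(commits, trunk_commits):
    """Return ids of pull request commits that merge trunk into the branch.

    commits: dict mapping commit id -> list of parent ids, covering only
             the commits in the pull request.
    trunk_commits: set of commit ids that are already on trunk.
    """
    bad = []
    for commit_id, parents in commits.items():
        # A merge has more than one parent. It is a merge *from trunk* if any
        # parent other than the first one is a trunk commit.
        if len(parents) > 1 and any(p in trunk_commits for p in parents[1:]):
            bad.append(commit_id)
    return bad
```

A server could run this when a pull request is pushed and reject the push if the returned list is not empty, ideally with a message that tells the submitter to rebase instead.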

Keep rebasing as a first class feature

One of the reasons Git won was that it embraced rebasing. Competing systems like Bzr and Mercurial did not and advocated merges instead. It turns out that people really want their linear history and that rebasing is a great way to achieve that. It also helps code review as fixes can be done in the original commits rather than in new commits afterwards. The counterargument to this is that rebasing loses history. This is true, but keeping everything means that your commit history gets littered with messages like "Some more typo fixes #3, lol." In practice people seem to strongly prefer losing a bit of history to the litter.

Make it scalable

Git does not scale. The fact that Git-LFS exists is proof enough. Git only scales in the original, narrow design spec of "must be scalable for a process that only deals in plain text source files where the main collaboration method is sending patches over email" and even then it does not do it particularly well. If you try to do anything else, Git just falls over. This is one of the main reasons why game developers and the like use other revision control systems. The final art assets for a single level in a modern game can be many, many times bigger than the entire development history of the Linux kernel.

A replacement system should handle huge repos like these effortlessly. By default a checkout should only download those files that are needed, not the entire development history. If you need to do something like bisection, then files missing from your local cache (and only those) should be downloaded transparently during checkout operations. There should be a command to download the entire history, of course, but it should not be done by default.
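The "download only what is missing" behaviour could look roughly like the sketch below. It is a toy model under stated assumptions: `LazyStore`, the `fetch` callable standing in for the server, and the path-to-object-id manifest are all hypothetical names, not part of any existing tool.

```python
class LazyStore:
    """Hypothetical local object cache that fetches missing content on demand."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable: object id -> bytes (stands in for the server)
        self._cache = {}
        self.downloads = 0    # number of server round trips, for illustration only

    def read(self, object_id):
        # Only objects that are not cached yet cause a download.
        if object_id not in self._cache:
            self._cache[object_id] = self._fetch(object_id)
            self.downloads += 1
        return self._cache[object_id]


def checkout(store, manifest):
    """Materialise one revision. manifest maps path -> object id."""
    return {path: store.read(object_id) for path, object_id in manifest.items()}
```

Checking out an older revision during bisection then only fetches the objects that differ from what is already cached. A "download the entire history" command would just call `read` for every object id the server knows about.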

Further, it should be possible to do only partial checkouts. People working on low level code should be able to get just their bits and not have to download hundreds of gigs of textures and videos they don't need to do their work.
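In the same toy model as a path-to-object-id manifest (an assumption, not an existing format), a partial checkout is little more than a filter that runs before anything is downloaded. The function name and the directory names in the example are hypothetical.

```python
def select_paths(manifest, wanted_dirs):
    """Keep only manifest entries that live under one of the wanted directories.

    manifest maps path -> object id; wanted_dirs is a list like ["src/engine"].
    """
    # Normalise to a trailing slash so "src/engine" does not match "src/engine2".
    prefixes = tuple(d.rstrip("/") + "/" for d in wanted_dirs)
    return {path: oid for path, oid in manifest.items() if path.startswith(prefixes)}
```

Someone working on low level code would ask for something like `["src/engine"]` and never see an object id for a texture, let alone download one.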

Support file locking

This is the one feature all coders hate: the ability to lock a file in trunk so that no-one else can edit it. It is disruptive, annoying and just plain wrong. It is also necessary. Practice has shown that artists at large either can not or will not use revision control systems. There are many studios where the revision control system for artists is a shared network drive, with file names like character_model_v3_final_realfinal_approved.mdl. It "works for them" and trying to mandate a more process heavy revision control system can easily lead to an open revolt.

Converting these people means providing them with a better work flow. Something like this:
  1. They open their proprietary tool, be it Photoshop, Final Cut Pro or whatever.
  2. They click on a GUI item to open a new resource.
  3. A window pops up where they can browse the files directly from the server as if they were local.
  4. They open a file.
  5. They edit it.
  6. They save it. Changes go directly in trunk.
  7. They close the file.
There might be a review step as well, but it should be automatic. Merge requests should be filed and kept up to date without the need to create a branch or to even know that such a thing exists. Anything else will not work. Specifically, doing any sort of conflict resolution does not work, even if it were the "right" thing to do. The only way around this (that we know of) is to provide file locking. Obviously it should be possible to limit locking to binary files only.
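The server side of locking can be quite small. Below is a sketch only: the `LockTable` class, its methods and the suffix-based rule for deciding which files count as binary are assumptions made for illustration, not a description of any existing system.

```python
class LockTable:
    """Hypothetical server-side table of exclusive file locks."""

    def __init__(self, lockable_suffixes):
        # Locking is limited to these file types, e.g. binary art assets.
        self._suffixes = tuple(lockable_suffixes)
        self._owners = {}   # path -> user currently holding the lock

    def acquire(self, path, user):
        """Try to lock path for user. Returns True if user now holds the lock."""
        if not path.endswith(self._suffixes):
            raise ValueError(f"{path} is not a lockable file type")
        owner = self._owners.setdefault(path, user)
        return owner == user

    def release(self, path, user):
        # Only the lock holder can release the lock.
        if self._owners.get(path) == user:
            del self._owners[path]

    def may_commit(self, path, user):
        """A commit touching path is allowed if nobody else holds its lock."""
        return self._owners.get(path, user) == user
```

In the workflow above the plugin would call `acquire` when the artist opens a file and `release` when they close it, and the server would consult `may_commit` before accepting a change. Source files never end up in the table, so coders are not affected.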

Provide all functionality via a C API

The above means that you need to be able to deeply integrate the revision control system with existing artist tools. This means plugins written in native code using a stable plain C API. The system can still be implemented in whatever SuperDuperLanguage you want, but its one true entry point must be a C API. It should be full-featured enough that the official command line client should be implementable using only functions in the public C API.

Provide transparent Git support

Even if a project wanted to move to something else, the sad truth is that for the time being the majority of contributors only know Git. They don't want to learn a whole new tool just to contribute to the project. Thus the server should serve its data in two different formats: once in its native format and once as a regular Git endpoint. Anyone with a Git client should be able to check out the code and not even know that the actual backend is not Git. They should even be able to submit merge requests, though they might need to jump through some minor hoops for that. This allows you to do incremental upgrades, which is the only feasible way to get changes like these done.

10 comments:

  1. Thank you for this thoughtful assessment of the technical landscape. I think you have provided eloquent design points along this roadmap.

  2. The example about a feature branch, isn't that what force-with-lease is about?

    1. Possibly. I tried to read that documentation page and I could not for the life of me understand what it is supposed to do. Which gives us yet another thing the replacement should provide:

      Have documentation that is actually readable and understandable by humans.

  3. As an architect (not of software but of buildings) and artist, I totally agree.
    So far I have found no good solution for versioning binary CAD files, 3D files, images... The scheme currently used in major studios?
    2020-11-09_my house.file
    2020-11-10_my house.file
    2020-11-22_my house Erik fix.file
    2020-12-02_my house changed roof.file

    ...

  4. Are you aware of Perforce/Helix? I don't want to advertise this (commercial) application, but some of the points you're making seem to match its features (server-based, file-locking). I haven't used it in production (especially not what seems to be a git-compatible interface) but have looked at it as a way to handle large binary files.

    Since a creative studio relies on a lot of different applications, using a plain file server still seems to be the most compatible and transparent way to handle files. You just have to make it easy for artists to use a versioning scheme so you don't end up with what you've described (my_file_v5_final_approved_...)

    1. From what I remember Perforce is server-only. And terrible. Only used it for a while ages ago, though.

  5. I think the best solution would be to have git for mergable/diffable files and perforce for binary assets.

    It would also be good if artist/designers/architects etc. used mergable file formats for their work, but that is not really possible today.

    1. The most important thing a version control system gives you is atomic state. For that you need only one system, two separate ones don't really work.

    2. I think that atomic state is way less important for binary/unmergable assets (compared to code). So from a game dev perspective a dual system should work way better than a single one.

  6. You don't even mention the name of the project "basically Git, but with a mathematically proven model for X"

    I think you might refer to Pijul https://pijul.org/ :

    "Pijul is a free and open source (GPL2) distributed version control system. Its distinctive feature is to be based on a sound theory of patches, which makes it easy to learn and use, and really distributed."

    It has been in the news recently:
    https://pijul.org/posts/2020-11-07-towards-1.0/
    https://pijul.org/posts/2020-12-19-partials/

    and

    Pijul - The Mathematically Sound Version Control System Written in Rust
    https://initialcommit.com/blog/pijul-version-control-system

    Q&A with the Creator of the Pijul Version Control System
    https://initialcommit.com/blog/pijul-creator
