Random access to files is a useless feature
For most use cases this is probably true, but random access was not the main point of the experiment. The point was to enable parallel compression without making the final size much bigger, and, even more importantly, parallel decompression. Very few file formats support that without losing compression performance. As a classical example, PKZIP supports parallel decompression because it compresses every file independently, but for the same reason it never exploits redundancy between files, so its archives are a lot bigger than compressed tar files.
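To make the idea concrete, here is a minimal sketch of the block-based approach in Python (the experiment itself was not necessarily written in Python, and the block size and function names are illustrative, not part of any real format). The input is split into independently compressed LZMA streams, so compression fans out across cores, while the size penalty stays small because each block is still large enough to compress well.

```python
import lzma
from concurrent.futures import ProcessPoolExecutor

BLOCK_SIZE = 32 * 1024 * 1024  # illustrative: large blocks keep the size penalty small

def compress_block(block: bytes) -> bytes:
    # Each block becomes a self-contained LZMA stream, so it can later
    # be decompressed without touching any other block.
    return lzma.compress(block)

def pack(data: bytes) -> list[bytes]:
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    with ProcessPoolExecutor() as pool:  # one worker per CPU core
        return list(pool.map(compress_block, blocks))
```

Storing an index of block offsets next to the compressed blocks is what would then provide both random access and parallel decompression.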
Tarballs are just fine and need no changes
Many people were of the opinion that tarballs are "good enough for source distribution". This is probably correct. Unfortunately, when people hear the word "tar" they immediately think of source tarballs, yet those are only a small fraction of tar usage in the world. Many file formats have tar files (or equivalents) hidden inside them, so all operations on those files are serial and can use only one CPU at a time.
As an example, the deb file format consists of a bunch of metadata and an embedded tar file. RPM is the same, except that it contains a cpio archive rather than a tar file. This means that installing updates is an inherently serial operation: packages can only be installed one at a time, and because they are in a tar-like format, only one CPU can be used to process each of them. This is particularly slow on platforms such as the Raspberry Pi, which have underpowered multicore CPUs.
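The container structure is easy to inspect by hand. A .deb is a plain Unix ar archive, and the following sketch (the helper name is made up for this post) walks its fixed-size member headers and prints the members, typically debian-binary, a control tarball and a data tarball; the last of these is the tar file whose unpacking is the serial bottleneck.

```python
def list_deb_members(path: str) -> None:
    """Print the members of a .deb, which is just a Unix ar archive."""
    with open(path, "rb") as f:
        if f.read(8) != b"!<arch>\n":  # global ar magic
            raise ValueError("not an ar archive")
        while True:
            header = f.read(60)  # fixed-size ar member header
            if len(header) < 60:
                break
            name = header[0:16].decode("ascii").strip()
            size = int(header[48:58])  # member size, decimal ASCII
            print(f"{name}: {size} bytes")
            f.seek(size + (size % 2), 1)  # data is padded to an even offset
```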
When running workloads such as a CI builder, there are multiple steps that are serial when using tar-like file formats. As an example, pbuilder has the following steps (a sketch of parallelising the unpack side follows the list):
- unpack base image <- serial
- install updates <- serial
- install dependencies <- serial
- unpack source <- serial
- compile <- parallel
- compress final result <- serial
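All of the serial steps above boil down to decompressing and unpacking a single stream on one core. With independently compressed blocks like the ones sketched earlier, the decompression side parallelises just as easily; again, this is a minimal illustration rather than the experiment's actual implementation:

```python
import lzma
from concurrent.futures import ProcessPoolExecutor

def unpack(compressed_blocks: list[bytes]) -> bytes:
    # Every block is a self-contained LZMA stream, so the unpack and
    # install steps above could keep all cores busy at once.
    with ProcessPoolExecutor() as pool:
        return b"".join(pool.map(lzma.decompress, compressed_blocks))
```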
This is not suitable for production due to X
It was never meant to be. It was merely an experiment to see whether a better file format was possible.
Pack/unpack operations are bound by disk, not by CPU
This is true on some platforms, but not all of them. LZMA compression is almost always CPU-bound even on the fastest CPUs, and so is decompression if you have a fast SSD. Disks are getting faster all the time while CPUs are not, so CPUs will become even more of a bottleneck in the future.
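This is easy to verify on your own machine. The micro-benchmark below (the payload and preset are arbitrary choices for illustration) measures single-core LZMA compression throughput, which on typical hardware lands far below what a modern SSD can sustain.

```python
import lzma
import time

# Arbitrary, moderately repetitive payload of roughly 9 MB.
data = (b"some moderately repetitive payload " * 4096) * 64

start = time.perf_counter()
lzma.compress(data, preset=6)  # xz's default effort level
elapsed = time.perf_counter() - start
print(f"one core compresses at about {len(data) / elapsed / 1e6:.1f} MB/s")
```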
What we have is good enough, the benefits are not worth an overhaul, and even if they were, inertia would prevent uptake
Maybe. But experiments are fun to do regardless.