Thursday, May 11, 2023

The real reason why open source software is better

The prevailing consensus at the current time seems to be that open source software is of higher quality than corresponding proprietary ones. Several reasons have been put forth on why this is. One main reason given is that with open source any programmer in the world can inspect the code and contribute fixes. Closely tied to this is the fact that it is plain not possible to hide massive blunders in open source projects whereas behind closed walls it is trivial.

All of these and more are valid reasons for improved quality. But there are other, more sinister reasons that are usually not spoken of. In order to understand one of them, we need to first do a slight detour.

Process invocation on Windows

In this Twitter thread Bruce Dawson explains an issue he discovered while developing Chrome on Windows. The tl/dr version:

  • Invoking a new process on Windows takes 16 ms (in the thread other people report values of 30-60 ms)
  • 10 ms of this is taken by Windows Defender that scans the binary to be executed
  • Said executables are stored in a Defender excluded directory
  • 99% of the time the executable is either the C++ compiler or Python
  • The scan results are not cached, so every scan except the first one is a waste of time and energy
  • The Defender scanner process seems to be single threaded making it a bottleneck (not verified, might not be the case)
  • In a single build of Chrome, this wastes over 14 minutes of CPU and wall clock time
Other things of note in the thread and comments:

  • This issue has been reported to Microsoft decades ago
  • The actual engineers working on this can't comment officially but it is implied that they want to fix it but are blocked by office politics
  • This can only be fixed by Microsoft and as far as anyone knows, there is no work being done despite this being a major inconvenience affecting a notable fraction of developers

What is the issue?

We don'r know for sure due to corporate confidentiality reasons, but we can make some educated guesses. The most probable magic word above is "office politics". A fairly probable case is that somewhere in the middle damagement chain of Microsoft there is a person who has decreed that it is more important to create new features than spend resources on this as the code "already works" and then sets team priorities accordingly. Extra points if said person insists on using a Macbook as their work computer so their MBA friends won't make fun of them at the country club. If this is true then what we have is a case where a single person is making life miserable for tens (potentially hundreds) of thousands of people without really understanding the consequences of their decision. If this was up to the people on "the factory floor", the issue would probably have been fixed or at least mitigated years ago.

To repeat: I don't know if this is the case for Microsoft so the above dramatization is conjecture on my part. On the other hand I have seen exactly this scenario play out many times behind the scenes of various different corporations. For whatever reason this seems to a common occurrence in nonpublic hierarchical organisations. Thus we can postulate why open source leads to better code in the end.

In open source development the technologically incompetent can not prevent the technologically competent from improving the product.

Shameless self-promotion

If you enjoyed this text and can read Finnish, you might enjoy my brand new book in which humanity's first interplanetary space travel is experienced as an allegory to a software startup. You can purchase it from at least these online stores: Link #1Link #2. Also available at your local library.

Appendix: the cost and carbon footprint

Windows 11 has a bunch of helpful hints on how to reduce your carbon footprint. One of these is warning if your screen blanking timeout is less than the computer suspend timeout. At the same time they have in the core of their OS the gross inefficiency discussed above. Let us compare and contrast the two.

The display on a modern laptop consumes fairly little power. OTOH running a CPU flat out takes a lot of juice. Suspending earlier saves power when consumption is at its lowest, whereas virus scanners add load at the point of maximum resource usage. Many people close the lid of their computer when not using it so they would not benefit that much from different timeout settings. For developers avoiding process invocations is not possible, that is what the computer is expected to do.

Even more importantly, this also affects every single cloud computer running Windows including every Windows server, CI pipeline and, well, all of Azure. All of those machines are burning oil recomputing pointless virus checks. It is left as an exercise to the reader to compute how much energy has been wasted in, say, the last ten years of cloud operations over the globe (unless Microsoft runs Azure jobs with virus scanners disabled for efficiency, but surely they would not do that). Fixing the issue properly would take a lot of engineering effort and risk breaking existing applications, but MS would recoup the money investment in electricity savings from their own Azure server operations alone fairly quickly. I'm fairly sure there are ex-Googlers around who can give them pointers on how to calculate the exact break-even point.

All of this is to say that having said energy saving tips in the Windows UI is roughly equivalent to a Bitcoin enthusiast asking people to consider nature before printing their emails.

1 comment:

  1. The reason why open-source software is better is that the users' priorities and incentives cannot possibly align with those of the vendors. The rationale for closed source is to lock those conflicts in place.

    ReplyDelete