Tuesday, December 31, 2019

How about not stabbing ourselves in the leg with a rusty fork?

Corporations are funny things. Many things no reasonable person would do on their own are done every day in thousands of business conglomerates around the world. With pride even. Let us consider as an arbitrary example a corporation where every day is started by employees stabbing themselves in the leg with a rusty fork. This is (I hope) not actually done for real, but there could be a company out there where this is the daily routine.

If you think that such a thing could never possibly happen, congratulations on having never worked in a big corporation. Stick with that if you can!

When faced with this kind of pointless and harmful routine, one might suggest not doing it any more or replacing it with some other, more useful procedure. This does not succeed, of course, but that is not the point. The reasons you get back are the interesting thing, because they will tell you what kind of manager and coworkers you are dealing with. Here are some possible options. Can you think of more?

The survivor fallacist

This is a multi-billion dollar company. If stabbing oneself in the leg was bad, as you seem to claim, we could not have succeeded.

The minimum energy spender

It would take too much work to get this changed. Just bite the bullet and do it every morning. You're better off this way.

The blame shifter

This is mandated by our head office, we can't do anything about this even if we wanted to.

The metric optimizer

Our next year's bonus metric will measure the number of leg stabbings reduced that year. We must get as many of them in this year as we possibly can.

The traditionalist

We have always done this. We must always do it.

The cornered animal

How dare you! Do you have any idea how much work it is to get pre-rusted forks? They are all made of stainless steel nowadays. Your derogatory insinuations are a slap on the face of all people working to keep this system running!

The folklorist

This is a commonly accepted best practice in software companies, thus we should do it also.

The brainwashee

This is actually a great invention. Getting a nice jolt of adrenaline first thing in the morning really wakes you up and gives you focus for the entire day. Try it for a month or two! You'll see.

The control freak messiah

This procedure was put in place by the founder/CEO. You do not challenge his choices if you know what is good for you.

The team spiritist

If you don't stab yourself in the leg, you are setting a very bad example that demoralizes everybody else who does their part diligently.

And finally the (sadly) most common one

Our product is special.

Saturday, December 28, 2019

What can clang-format teach us about the human condition?

Most people who do programming have taken part in at least one code formatting war. Usually these come about when companies want to standardise their code bases and thus want everything formatted according to a single style. Style wars, much like real wars, are not pleasant places to be in. They cause havoc and destruction, make reasonable people into life-long sworn enemies and halt work on anything useful.

In a typical style argument statements like the following are often thrown about:

  • Indentation should be done with tabs, because everyone can set tab width to whatever they want in their editor.
  • The opening brace must be on the same line as its preceding clause. This saves vertical space and thus makes the code more readable.
  • The opening brace must be on its own line. This makes code blocks stand out better and thus makes the code more readable.
  • When laying things like arguments vertically, the separating comma must be at the beginning of the line rather than the end. In this way when you add or remove an entry, the diff is always only one line.
  • In a declaration like int *bob, the asterisk must be next to the variable name, because that is what it binds to.
  • In a declaration like int* bob, the asterisk must be next to the type name, because "pointerness" is logically a feature of the type, not the variable.
  • Class variables must begin with m_ so they stand out better.
  • Class variables must not be separated with a prefix. The syntax highlighter will already draw them in a different color and if you have so many variables in your methods that you can't immediately tell which is which, your methods are too big and must be split.
  • Et cetera, et cetera, ad infinitum, ad nauseam.
It is unknown when code formatting wars first began. Given that FORTRAN was the first real programming language with an actual syntax and was first released in 1957, the answer probably is "way, way before that". A reasonable guess would be the first or second design meeting on the syntax. Fighting over code style kept raging for almost sixty years after that. The arguments were the same, the discussion was the same, no progress was ever made. Then clang-format was introduced and suddenly everything changed.

This was surprising, because automatic code formatters had existed for decades and clang-format was "the same, just slightly better". Yet it made this problem mostly go away. Why?

Enter the human element

With all existing formatters it was fairly easy to find code where it failed. C macros were especially treacherous in this regard. This meant that either one needed to manually add (and update) comments that disabled formatting for some blocks or the formatter was run only every now and then by hand and the result had to be inspected and fixed by hand after the fact. With clang-format this manual work went effectively to zero. You could just run it at any time, even automatically before every commit. In a weird kind of backwards way once we had the correct solution, we could finally understand what the real problem was.
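As a rough illustration of what "automatically before every commit" could look like, here is a minimal sketch of a Git pre-commit hook written in Python. It is not taken from any particular project: the list of file suffixes and the function names are my own assumptions, and it assumes both git and clang-format are on the PATH and that the repository has a .clang-format file.

```python
"""Pre-commit hook sketch: reformat staged C/C++ files with clang-format."""
import subprocess
from pathlib import PurePath

# Assumption: the suffixes that count as C/C++ sources in this project.
CXX_SUFFIXES = {".c", ".h", ".cc", ".cpp", ".cxx", ".hpp"}


def select_sources(paths):
    """Return only the paths whose suffix marks them as C/C++ sources."""
    return [p for p in paths if PurePath(p).suffix in CXX_SUFFIXES]


def staged_files():
    """List files that are added, copied or modified in the Git index."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()


def main():
    sources = select_sources(staged_files())
    if sources:
        # -i rewrites the files in place using the project's style file.
        subprocess.run(["clang-format", "-i", *sources], check=True)
        # Re-stage the files so the commit contains the formatted version.
        subprocess.run(["git", "add", *sources], check=True)
    return 0
```

An actual hook script would end with a call to main() and be saved as .git/hooks/pre-commit. Note that re-staging whole files like this does not play well with partially staged changes, so treat it as a starting point only.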

Every programmer writes code in their own way. Maybe they put braces on the same line, maybe not. Maybe they indent with spaces, maybe not. The details don't matter, the real point is that writing code in this style is effortless. It just flows from your brain to the screen. Coding in any other style means spending brain energy on either typing in some non-natural style or fixing the code afterwards. This is manual and tedious work, just the kind that programmers hate with a passion. Thus when the threat of an externally mandated code style appears, the following internal monologue takes place:
If they choose a style different than mine, then I will forever have to write in a style that is unnatural to me. This is tedious. However if I spend some energy now and convince everyone else to use my style, then I can keep on doing what I have been doing thus far. All I have to do is to factually explain why my chosen style is the best, and since other programmers are rational they will understand my point, agree and adopt my chosen style.
Unfortunately everyone else participating in the debate has the exact same idea and things end in a stalemate almost immediately. The sunk-cost fallacy ensures that once a person has publicly committed to a style choice, they will never budge from it.

Note the massive dichotomy at play here. The real reason people have for any style choice is "this is what I have gotten used to" but when they debate their chosen style they always, always use reasoning that aims to be objective and scientific. At this point you might want to pause and reread the sample arguments listed above. They follow this reasoning exactly and most claim to improve some real world objective metric such as readability. They are also all lies. These are all post-rationalising arguments, invented after the fact to make the opinion you already have sound as good as possible. They are not the real reason. They have never been the real reason. It is unlikely they will ever be the real reason. But the debate is carried on as if they were the real reason. This is why it will never end.

The lengths people are willing to go to in their post-rationalising arguments are nothing short of astounding. In this video on indenting with tabs vs spaces many tab advocates say that indenting with tabs is better because "you only need to press tab once rather than press space multiple times". Every single programmer's text editor since 1985 (possibly 1975 and potentially even 1965) has had the feature where pressing the tab key does the logically equivalent indent with spaces. Using this as an argument only shows that you have not done even the most minimal of thinking on the issue, but instead just have already made up your mind and don't want to even consider changing it.

This is why code style discussions never go anywhere. They are not about bringing people together to find the best possible choice. They are about trying to make other people submit to your will by repeatedly bashing them on the head with your style guide. This does not work because the average programmer's head is both thicker and more durable than the average style guide.

Tuesday, December 17, 2019

Notes on tax issues on selling digital goods internationally

Note: This blog post should not be considered tax, legal or any other sort of advice. There are no guarantees of any kind, even that any of the information below is correct. Consult a qualified professional before embarking on any international business ventures.

Some tax requirements for selling pretty much anything internationally (that I found out by googling and looking up random government web sites)

Nowadays it is easy to start a web store and sell products such as digital downloads to any country in the world. You might think that simply paying appropriate taxes in your own country would be enough. It's not. There are cases where you need to pay taxes or fees to other countries as well, specifically the ones you sell your products to. Surprisingly this can be the case even for very small amounts of money.

There seem to be three main cases: USA, the EU and individual countries. Let's go through them in increasing order of difficulty.

Individual countries

Most countries have a requirement that if you sell digital goods to them you need to register in said country, collect the appropriate amount of tax on your sales and then report and pay it. Most countries have a lower limit under which you don't need to do anything. This is usually on the order of 10 000 to 100 000 euros per year, which small scale operations won't ever reach. Unfortunately in some countries this limit is zero. That is, if your sales are even one euro, you need to register and do the full bureaucratic dance. These countries include Albania, Russia, South Korea and India among others. Lists of limits per country can be found online. Be careful when reading them, though, as web pages can get out of date quickly.

For small businesses the only realistic choice is to geoblock countries where the tax limit is zero. Dealing with the hassles is just not worth it. This is fairly easy, as most payment providers have good geoblocking tools.
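To make the selection rule concrete, here is a small sketch that picks the countries to geoblock from a table of registration thresholds. The zero entries are the countries named above (as of writing, so they may well be out of date). "Exampleland" and its limit are entirely made up, and the function name is my own.

```python
# Registration thresholds in euros per year. The zero entries come from
# the text above; "Exampleland" and its limit are hypothetical.
THRESHOLDS_EUR = {
    "Albania": 0,
    "Russia": 0,
    "South Korea": 0,
    "India": 0,
    "Exampleland": 10_000,  # made-up country with a made-up limit
}


def zero_threshold_countries(thresholds):
    """Return, in sorted order, the countries where any sale at all
    triggers the registration requirement."""
    return sorted(name for name, limit in thresholds.items() if limit == 0)


blocklist = zero_threshold_countries(THRESHOLDS_EUR)
```

The resulting list is what you would then feed into your payment provider's geoblocking settings, whatever form those happen to take.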

VAT in the European Union

In the EU you can register in each country separately, as described above for individual countries. However there is also a new, simplified system for digital services called VAT MOSS. The idea there is that you don't need to register in each country. Instead you report the VAT on your sales to your own tax authorities and they take care of the rest. This is highly convenient, because you can then sell to every EU member state but only have to deal with the bureaucracy of one of them.

There is a similar thing for non-EU companies, but I have not looked at how it works in detail for obvious reasons. Just note that the registration limit for the EU is also zero, meaning you must register if you sell anything at all to the EU. Sadly this means that for some people geoblocking all of the EU is an entirely reasonable thing to do.

Sales tax in the USA

The good news is that the USA does not have a federal sales tax. The bad news is that each state has its own laws on sales taxes. Whether or not you need to pay sales taxes in a given state depends on whether you have a "nexus" in the state. This used to mean something like an office. However then buying stuff over the Internet happened and now having a nexus simply means selling more than a given threshold's worth of goods or services to people in the state. Lists of these limits per state can be found online as well.

This is where things get unpleasant for small players. The limit for Kansas is zero, meaning that any sale to Kansas obliges you to register, collect sales tax and pay it to the state authorities. Other states have reasonable limits such as 100 000 dollars, but the obligation is also triggered if you have more than 200 sales events in total regardless of their value. This is a lot easier to trigger by accident.
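The two triggers are easy to mix up, so here is a small sketch of the rule as I understand it from the description above. The class and function names are my own invention and the numbers are only the examples from the text, not any state's actual law.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NexusRule:
    """Thresholds for one state. The numbers used below are illustrative."""
    revenue_limit_usd: float
    transaction_limit: Optional[int] = None  # None: no sales-count rule


def has_nexus(rule: NexusRule, revenue_usd: float, transactions: int) -> bool:
    """Return True if a year's sales to the state trigger registration."""
    if transactions == 0:
        return False
    # Trigger 1: total value of sales exceeds the revenue limit.
    if revenue_usd > rule.revenue_limit_usd:
        return True
    # Trigger 2: too many individual sales, regardless of their value.
    return (rule.transaction_limit is not None
            and transactions > rule.transaction_limit)


typical = NexusRule(revenue_limit_usd=100_000, transaction_limit=200)
zero_limit = NexusRule(revenue_limit_usd=0)

# 300 sales of a five dollar download: far below the revenue limit,
# but the sales count alone is enough.
many_small_sales = has_nexus(typical, revenue_usd=1_500, transactions=300)
# A single sale into a zero-limit state.
one_sale = has_nexus(zero_limit, revenue_usd=5, transactions=1)
```

Both many_small_sales and one_sale come out as True here, which is exactly the "easy to trigger by accident" problem.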

Unfortunately payment processors don't seem to provide state-based geoblocking. Thus if you enable sales to the USA and that leads to even one sale in Kansas, you just got hit by a bunch of legal requirements. Dealing with all of these is not really feasible for small operations. On the other hand blocking all of the USA means losing a fairly large chunk of your revenue.

To keep things from being too simple, there are web pages that claim that having an "economic nexus" is actually different for digital products. According to one such page Kansas does not have a sales tax at all for purely digital products, so you could sell an arbitrary amount of products there without needing to pay any sales tax. Which one of these is correct? I don't actually know. I have spent the entire day reading up on international tax laws and now my head hurts and I just want to close the computer and have a drink.

What does this mean for tipping services like Patreon, crowdfunding et al?

Again: I don't really know. However a case can reasonably be made (at least by a tax collector that wants to get your money) that paying through one of those platforms constitutes a "sale of services" or something similar and is thus subject to a sales tax. For example Patreon's documentation page states that they take care of paying VAT for EU customers but that customers in the USA need to take care of their tax responsibilities themselves. Presumably this also applies to all other countries.

Thus it may be the case that everyone who is running a tipping service and has taken any money from countries or states with a zero limit on sales taxes has unexpectedly been burdened with legal responsibilities to tens of different tax offices around the world.

In any case all of this means that things like gathering funds for open source development are a lot more complicated than they appear at first glance.

Sunday, December 8, 2019

There are (at least) three distinct dependency types

Using dependencies is one of the main problems in software development today. It has become even more complicated with the recent emergence of new programming languages and the need to combine them with existing programs. Most discussion about it has been informal and high level, so let's see if we can make it more disciplined and examine how different dependency approaches work.

What do we mean when we say "work"?

In this post we are going to use the word "work" in a very specific way. A dependency approach is said to work if and only if we can take two separate code projects where one uses the other and use them together without needing to write special case code. That is, we should be able to snap the two projects together like Lego. If this can be done to arbitrary projects with a success rate of more than 95%, then the approach can be said to work.

It should be especially noted that "I tried this with two trivial helloworld projects and it worked for me" does not fulfill the requirements of working. Sadly this line of reasoning is used all too often in online dependency discussions, but it is not a response that holds any weight. Any approach that has not been tested with at least tens (preferably hundreds) of packages does not have enough real world usage experience to be taken seriously.

The phases of a project

Every project has three distinct phases on its way from source code to a final executable.
  1. Original source in the source directory
  2. Compiled build artifacts in the build directory
  3. Installed build artifacts on the system
In the typical build workflow step 1 happens after you have done a Git checkout or equivalent. Step 2 happens after you have successfully built the code with ninja all. Step 3 happens after a ninja install.

The dependency classes

Each of these phases has a corresponding way to use dependencies. The first and last ones are simple so let's examine those first.

The first one is the simplest. In a source-only world you just copy the dependency's source inside your own project, rewrite the build definition files and use it as if it was an integral part of your own code base. The monorepos used by Google, Facebook et al are done in this fashion. The main downside is that importing and updating dependencies is a lot of work.

The third approach is the traditional Linux distro approach. Each project is built in isolation and installed either on the system or in a custom prefix. The dependencies provide a pkg-config file which defines both the dependency and how it should be used. This approach is easy to use and scales really well, the main downside being that you usually end up with multiple versions of some dependency libraries on the same file system, which means that they will eventually get mixed up and crash in spectacular but confusing ways.
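For those who have never looked inside one, a pkg-config file is a small text file with a few variables and keyword lines. The sketch below generates one for a made-up library called "examplelib". The function name, the library and the directory layout under the prefix are all assumptions of mine. The keywords themselves (Name, Description, Version, Libs, Cflags) are the standard ones.

```python
def render_pc(name, version, description, prefix="/usr"):
    """Return the text of a minimal pkg-config (.pc) file.

    Assumes the library is installed as lib<name> under <prefix>/lib
    with its headers under <prefix>/include.
    """
    lines = [
        "prefix=" + prefix,
        "libdir=${prefix}/lib",          # expanded by pkg-config, not Python
        "includedir=${prefix}/include",
        "",
        "Name: " + name,
        "Description: " + description,
        "Version: " + version,
        "Libs: -L${libdir} -l" + name,   # how to link against the library
        "Cflags: -I${includedir}",       # how to find its headers
    ]
    return "\n".join(lines) + "\n"


pc_text = render_pc("examplelib", "1.2.0", "A made-up example library")
```

Once a file like this is installed as examplelib.pc somewhere on the pkg-config search path, any build system can run pkg-config --cflags --libs examplelib and get the flags it needs without knowing how the library was built.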

The second approach

A common thing people want to do is to mix two different languages and build systems in the same build directory. That is, to build multiple different programming languages with their own build systems intermixed so that one uses the built artifacts of the other directly from the build dir.

This turns out to be much, much, much more difficult than the other two. But why is that?

Approach #3 works because each project is clearly separated and the installed formats are simple, unambiguous and well established. This is not the case for build directories. Most people don't know this, but binaries in build directories are not the same as the installed ones. Every build system conjures its own special magic and does things slightly differently. The unwritten contract has been that the build directory is each build system's internal implementation detail. They can do with it whatever they want, just as long as after install they provide the output in the standard form.

Mixing the contents of two build systems' build directories is not something that "just happens". Making one "just call" the other does not work simply because of the N^2 problem. For example, currently you'd probably want to support C and C++ with Autotools, CMake and Meson, D with Dub, Rust with Cargo, Swift with SwiftPM, Java with Maven (?) and C# with MSBuild. That is already up to 8*7 = 56 integrations to write and maintain.

The traditional way out is to define a data interchange protocol to declare build-dir dependencies. This has to be at least as rich in semantics as pkg-config, because that is what it is: a pkg-config for build dirs. In addition to that you need to formalise all the other things about setup and layout that pkg-config gets for free by convention and in addition you need to make every build system adhere to that. This seems like a tall order and no-one's really working on it as far as I know.
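The arithmetic behind the two options can be spelled out in a few lines. The pairwise count is the 8*7 from above. The count for the shared format assumes that each build system needs exactly one exporter and one importer for it, which is my simplification rather than an established fact.

```python
# The build systems listed above.
BUILD_SYSTEMS = [
    "Autotools", "CMake", "Meson", "Dub",
    "Cargo", "SwiftPM", "Maven", "MSBuild",
]


def pairwise_integrations(n):
    """Every build system knows how to call every other one directly."""
    return n * (n - 1)


def shared_format_integrations(n):
    """Assumption: one exporter plus one importer per build system."""
    return 2 * n


n = len(BUILD_SYSTEMS)
direct = pairwise_integrations(n)       # 8 * 7 = 56
shared = shared_format_integrations(n)  # 2 * 8 = 16
```

Adding a ninth build system costs 16 new integrations in the pairwise world but only 2 with a shared format, which is why the interchange protocol is the traditional way out despite being a tall order.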

What can we do?

If build directories can't be mixed and system installation does not work due to the potential of library mixups, is there anything that we can do? It turns out not only that we can, but that there is already a potential solution (or at least an approach for one): Flatpak.

The basic idea behind Flatpak is that it defines a standalone file system for each application that looks like a traditional Linux system's root file system. Dependencies are built and installed there as if one was installing them to the system prefix. What makes this special is that the filesystem separation is enforced by the kernel. Within each application's file system only one version of any library is visible. It is impossible to accidentally use the wrong version. This is what traditional techniques such as rpath and LD_LIBRARY_PATH have always tried to achieve, but have never been able to do reliably. With kernel functionality this becomes possible, even easy.

What sets Flatpak apart from existing app container technologies such as iOS and Android apps, UWP and so on is its practicality. Other techs are all about defining new, incompatible worlds that are extremely limited and invasive (for example spawning new processes is often prohibited). Flatpak is not. It is about making the app environment look as much as possible like the enclosing system. In fact it goes to great lengths to make this work transparently and it succeeds admirably. There is not a single developer on earth who would tolerate doing their own development inside a, say, iOS app. It is just too limited. By contrast developing inside Flatpak is not only possible and convenient, but something people already do today.

The possible solution, then, is to shift the dependency consumption from option 2 to option 3 as much as possible. It has only one real new requirement: each programming language must have a build system agnostic way of providing prebuilt libraries. Preferably this should be pkg-config but any similar neutral format will do. (For those exclaiming "we can't do that, we don't have a stable ABI", do not worry. Within the Flatpak world there is only one toolchain, system changes cause a full rebuild.)

With this the problem is now solved. All one needs to do is to write a Flatpak builder manifest that builds and installs the dependencies in the correct order. In this way we can mix and match languages and build systems in arbitrary combinations and things will just work. We know it will, because this is basically how Debian, Fedora and all other distros are already put together.
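Getting "the correct order" is a plain topological sort. Here is a minimal sketch that orders three made-up modules and emits them as the modules list of a manifest. The module names and their dependencies are hypothetical, and the entries are deliberately incomplete (no sources, no build commands). Only the "name" and "buildsystem" keys are shown, following the flatpak-builder manifest format as I understand it. It needs Python 3.9 or later for graphlib.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical modules: name -> (build system, modules it depends on).
MODULES = {
    "app": ("cmake", ["libcore", "rustparser"]),
    "rustparser": ("simple", ["libcore"]),
    "libcore": ("meson", []),
}


def ordered_modules(modules):
    """Return manifest module entries with dependencies listed first."""
    graph = {name: set(deps) for name, (_, deps) in modules.items()}
    # static_order() yields every node after all of its dependencies
    # and raises graphlib.CycleError on circular dependencies.
    order = TopologicalSorter(graph).static_order()
    return [{"name": n, "buildsystem": modules[n][0]} for n in order]


manifest_fragment = json.dumps({"modules": ordered_modules(MODULES)}, indent=2)
```

Because each module is installed into the same prefix before the next one is built, the CMake project at the end can find the Meson and Cargo built libraries through their pkg-config files just like it would on a regular distro.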