Monday, March 2, 2020

Meson manual sales numbers and a profitability estimate

The Meson Manual has been available for purchase for about two months now. This is a sufficient amount of time to be able to estimate total sales amounts and the like. As one of the goals of the project was to see if this could be a reasonable way to compensate FOSS maintainers for their work, let's go through the numbers in detail.

Income

At the time of writing, 45 copies of the book have been sold amounting to around 1350€ of income. The first month's sales were about 700€ and the second one's about 500€. This is the common sales pattern where sales start at some level and then gradually decrease as time passes. Depending on how one calculates the drop-off, this would indicate total sales over a year of 2000-2500€.

Expenses

The ecommerce site charges ~10€ per month, which is 120€ per year. Credit card processing fees should land somewhere in the 200-300€ area depending on sales. Accounting services come to about 500€. Then there are many small things that probably sum up to 100-200€ in a year. After these immediate expenses roughly 800-1300€ remains. Unfortunately this is where things get uglier.

The book was launched in my presentation at Linux.conf Australia. Having some sort of a launch is a necessity, because just putting things on the Internet does not work. This cost about 2000€ total in travel and lodging expenses. This single expense takes out all remaining income and then some. One could have gone to a conference nearer by but as Finland is far away from everywhere, the expenses are easily 500€ even for continental Europe. This also assumes that one would have gotten a talk accepted there, which is not at all guaranteed. In fact the vast majority of Meson talks I have submitted have been rejected. Thus depending on how you look, we are now either 700€ in the red or 300-800€ in the black in the best possible case.

Writing a book takes time. A lot of time. The way I got mine was to make a deal with my employer to only work four days a week for several months. I spent that one day a week (specifically Wednesday) writing the book and working out all the requisite bureaucracy. In my day job I'm a consultant and my salary is directly determined by the number of hours billed from customers. Doing the math on this says that writing the manual cost me ~9000€ compared to just being at work.

With this the total compensation of the manual comes out to a loss between 8000€ and 9500€.

So was it worth it?

It depends? As a personal endeavor writing, publishing and selling a full book is very satisfying and rewarding (even though writing it was at times incredibly tedious). But financially? No way. The break-even point seems to be about 10× the current sales. If 100× sales were possible, it might be sufficient amount for more people to take the risk and try to make a living this way. With these sales figures it's just not worth it.

Bonus chapter: visibility and marketing

The biggest problem of any project like this is marketing: how to get your project visible. This is either extremely hard or downright impossible. As an example, let's do some Fermi estimations on reachability on Twitter. Suppose you post about your new project on Twitter, how many of your followers would then buy the product? Ten percent is probably too much, and one tenth of a percent is too small, so let's say one percent. Thus to sell 400 copies you'd need to have 40 000 direct Twitter followers. Retweets do not count as their conversion rate is even lower.

Your only real chance is to get visibility on some news media, but that is also difficult. They get inundated by people and projects that want to get their products to be seen. Thus news sites publish news almost exclusively on large established projects as they are the things that interest their readers the most. Note that this should not be seen as a slag against these sites, it's just the nature of the beast. A news site posting mostly about crowdfunding campaigns of small unknown projects would not be a very tempting site and would go out of business quickly.

Advertising might work, but the downsides include a) it costs more money and b) the target audience for a programming manual is the set of people who are the most likely to use ad blockers or otherwise ignore online ads (I have never bought anything based on an ad I have seen online).

Sunday, March 1, 2020

Unity build test with Meson & LibreOffice

In a previous blog post we managed to build a notable chunk of LibreOffice with Meson. This has since been updated so you can build all top level apps (Writer, Calc, Impress, Draw). The results do not actually run, so the conversion may seem pointless. That is not the case, though, because once you have this build setup you can start doing interesting experiments on a large real world C++ code base. One of these is unity builds.

A unity build is a surprisingly simple technique to speed up builds, especially C++. The idea is that instead of compiling source files individually, you build them in larger batches by creating files like these and compiling them instead:

#include<source1.cpp>
#include<source2.cpp>
#include<source3.cpp>
// etc

One of the main downsides of this technique is writing and maintaining the unity files by hand. Fortunately Meson has builtin support for creating unity files for build targets. In the next release you can even specify how many source files you want to have in a single unity source. Enabling unity builds for a single target is simple and consists of adding the following keyword argument to a target's definition:

override_options: ['unity=on']

In this experiment we used LibreOffice's Writer target, which is a single shared library consisting of almost 700 source files. The unity block size was Meson's default of 4.

The results

Compiling source code as a unity build for the first time usually leads to build failures. This is exactly what happened here as well. There are many reasons for this. The most popular are name clashes from static and anonymous namespace functions and of classes with common names that are used without namespace qualifications. LO turned out to have at least three different classes called Size. All these issues need to be fixed either by renaming or by adding namespace qualifiers. This is boring manual work, but on the other hand you may find duplicated static functions across your code base.

Once all the issues have been fixed we can do actual measurements. The test machine was an 8 thread i7-3770 with an SSD drive and the buildmode was set to debug. The full build times are these:

Regular   10m 
Unity      4m 32s

This unity build is over 50% faster than the regular one, which is a fairly typical result. The difference would have been even bigger if we had used a bigger unity block size. Incremental builds were tested by deleting one object file and rebuilding.

Regular 26s
Unity   22s

Typically incremental unity builds are slower than regular ones, but in this case it is actually faster. This is probably because unity builds produce smaller output files that have less debug data. Thus the linker has less work. Increasing the unity block size would make the incremental build time slower.

Converting the code to compile as a unity build took on the order of three hours. Ironically most of it was spent waiting for compilations to finish. Since this saves around 5 minutes per build, the time investment is recovered after the development team has done 60 full builds. The code changes have been done with as little effort as possible, so polishing this to production quality would probably take 2-3 times as long.

Do try this at home!

The code is available in the unitytest branch of this repo, it has only been tested with Meson trunk on Linux. There are two targets that use unity builds: vcl and sw. You can toggle unity builds as well as the unity block size per target with the override_options keyword argument.

Sunday, February 23, 2020

Open source does not have a reward mechanism for tedious

A common talking point about everyday broken things is "I can't believe we put a man on the moon, but we still don't have <something seemingly straightforward>". The phrase is thankfully going out of style (maybe people are starting to forget that we did, in fact, put a man on the moon) but the sentiment still holds true. There are many things in our every day life that should be a lot better but aren't. This is also the case for open source development.

Since about the year 2000, when the term "open source" came into common usage, tens of thousands of software projects have sprung up, seemingly from thin air. The projects range from simple scripts to incredibly large and complex pieces of software. Some are purely volunteer run such as the Mame and Dolphin emulators whereas others have corporate support such as the Webkit browser engine. Yet, many seemingly simpler things are broken. Connecting projectors to laptops is still a bit of a gamble, touchpads behave wonkily almost any open source project's documentation is either thin or nonexisting, many dev tools break when faced with nonstandard setups and so on.

This does not seem to make sense. Writing a full HTML rendering engine from scratch is a lot harder than, say, writing documentation for the classes of a widget toolkit. Like most real world problems there are many causes and effects, but in this post we are going to examine the fact that the word "hard" has two different meanings. Things can be either be hard because they are difficult or they can be hard because they are tedious.

Many software developers are creators and builders. They are drawn to problems of the first type. The fact that they are difficult is not a downside, it is a challenge to be overcome. It can even be a badge of merit which you can wave around your fellow developers. These projects include things like writing your own operating system or 3D game engine, writing device drivers that saturate the fastest of transfer links, lock free atomic parallelism, distributed file systems that store exabytes of data as well as embedded firmware that has less than 1 kilobyte of RAM. Working on these kinds of problems is rewarding on its own, even if the actual product never finishes or fails horribly when eventually launched. They are, in a single word, sexy.

Most problems are not like that, but are instead the programming equivalent of ditch digging. They consist of a lot of hard work, which is not very exciting on its own but it still needs to be done. It is difficult to get volunteers to work on these kinds of problems and this is where the problem gets amplified in open source. Corporations have a very strong way to motivate people to work on tedious problems and it is called a paycheck. Volunteer driven open source development does not have a way to incentivise people in the same way. This is a shame, because the chances of success for any given software project (and startup) is directly proportional to the amount of tedious work people working on it are willing to do.

Sunday, February 16, 2020

Building even more of LibreOffice with Meson, now with graphics

I have converted some more of LO's build system to Meson as an experiment. This is the current status:


Note that this contains only the main deliverables, i.e. the shared libraries and executables. Unit tests and the like are not converted apart from a few sample tests.

It was mentioned in an earlier blog post that platform abstraction layers are the trickiest ones to build. This turns out to be the case here also. LO has at least three such frameworks (depending on how you count them). SAL is the very basic layer, UNO is a component model used to, for example, expose functionality to Java. Finally VCL is the GUI toolkit abstraction layer. Now that we have the GUI toolkit and its GTK plugin built we can build a VCL sample application and launch it. It looks like this:


This is again fairly typical of existing projects that have custom build systems, where they require a specific layout of files in the build directory or some environment settings that specify where to run things. This does not seem to be a case of incorrect build configuration, since trying to run executables directly from the build dir of a checkout built with the existing Make-based system fails in much the same way. This is a problem because the project builds code generator executables and uses them to create sources during build, and running them directly does not work. This is the sort of issue which is hard to debug as an outsider, and could really use help from people who know what the code is actually doing.

The curious can get full code can be found in this Github repo.

More info on Meson

I have written and self-published a user manual on Meson. It can be purchased via this web site

Sunday, February 9, 2020

Trying to build a slightly larger slice of Libreoffice with Meson

One of the few comments I got on my previous blog post was that building only the sal library is uninteresting because it is so small. So let's go deeper and build the base platform of LO, which is called Uno. Based on docs and slides from conference presentations, it is roughly the marked area in LO's full dependency graph.


The gloves come off

A large fraction of the code is generated from IDL files. So one might imagine that if one has a file like com/sun/foo/bar.idl then one would convert that into a header file com/sun/foo/bar.hpp using a program called idlc. One would be wrong. That is not what happens at all.

When trying to understand what humongous Make builds are doing, the best approach is not to even try. Do not look at the Makefiles, or the macro code they include. Instead run the build, capture all executed commands with Make's verbose mode and reverse engineer what the sysetm is doing based on them. Implementing the same functionality is a lot easier this way, because you know exactly what command lines you must end up with. In this particular case we find that the idl files are compiled to an intermediate binary blob using a program called unoidl-write.  After that the header files (yes, multiple per idl file) are created from this blob with a different program called cppumaker.

This has the unfortunate side effect that when you run the header generator you can not know in advance what files the program will generate and what sort of a directory hierarchy they are put in without a side channel. This makes it very difficult to integrate with build systems, because they need to know outputs exactly in order to make build steps reliable. Java has the exact same problem. Because of the way it handles inner classes, the output set of a Java compilation is unknowable beforehand. If you ever find yourself designing a code generator tool, please make sure you don't have this Sunism in your program, as it makes integration needlessly complicated and unreliable.

Even if you manage to recreate the command lines with a different system, things may still fail in interesting ways. In LO's case the internal SAL file system library has a policy that callers are not allowed to create temporary files in a directory with a relative pathname. This limitation is reported with the helpful error message of "could not open  for writing". (In case blog aggregators break the formatting, there are two spaces between the words "open" and "for".) It is also possible that relative paths appear to succeed but produce error messages such as "can not read file in legacy format" later in the process.

Other fun stuff

If one were to look at the command lines that eventually get invoked, they would look like this:


This may seem overwhelming, so let's add some markers.


An interesting question is how many processes does this spawn per source file? I don't know the correct answer, but a reasonable guess would be either 4 or 7 (5 or 8 if you count the outer shell).

Status

The code can be obtained via Github. It's Linux only and some bits are fairly ugly. It does generate the UNO code and build all the libs, but unit tests, Java and other pieces are missing and dependency tracking does not work for code generators. The build definitions consist of ~550 lines of Meson. There are a few Python helper programs in addition to these. They have a lot of duplicated code, but for a prototype such as this one it's tolerable.

I tried to go a bit further and build basegfx, but then I got stuck. The generated code has #ifdef toggles that define whether some variables are defined as ints or enums. Other libraries fail to build either when the define is set or when it is unset. For some reason basegfx has multiple files that fail both when the define is set and when it is unset (with different error messages, though). Either the code generation step goes awry or there are even more magic defines that have to be set in order for things to work.

Wednesday, February 5, 2020

Building (a very small subset of) LibreOffice with Meson

At FOSDEM I got into a discussion with a LibreOffice dev about whether it would be possible to switch LO's build system to Meson. It would be a lot of manual work for sure, but would there be any fundamental problems. Since a simple test can eliminate a ton of guesswork, I chose to take a look.

Like most cross platform programs, LO has its own platform abstraction layer called Sal. According to experience, these kinds of libraries usually have the nastiest build configurations requiring a ton of configure checks and the like. The most prominent example is GLib, whose configure steps are awe-inspiring.

Sal turned out to be fairly simple to port to Meson. It did not require all that much in platform setup, probably because the C++ stdlib provides a lot more out of the box than libc. After a few hours I could compile all of Sal and run some unit tests. The results of the experiment can be found in this Github repo. The filenames and layouts are probably not the same as in the "real" LO build, but for a simple experiment like this they'll do.

So what did we learn?

  • Basic compilation seems to be straightforward, but there were some perennial favourites including source files that you must not compile standalone, magic compilation defines, using declarations hidden at the bottom of public headers behind #ifdefs and so on.
  • There may be further platform funkiness in the GUI toolkit configuration step.
  • A lot of code seems to be generated from IDL files, which might require some work.
  • Meson's Java support probably needs some work to build all the JARs.
  • Meson should scale at least to building LO itself, building all dependency projects at the same time is a different matter.
The last of these is not really an issue on Linux, since you get almost all deps from the system. On Windows and Mac you can achieve the same by using something like Conan for dependency management. However Meson might be scalable enough to build LO and all deps in a single go, and if it's not then making it do that would be an interesting optimization challenge.

How would one express LO's sheer size with a single number?

It has more than 150 000 lines of Makefiles.

Monday, February 3, 2020

Creating your own physical or PDF manual

This post is part 2 of N describing the creation process of the Meson manual, which you can purchase via this web site. Part 1 can be read here.

There are three main ways of producing a technical book using only free software.

  1. LibreOffice + Scribus
  2. Libreoffice only
  3. LaTeX
The first option of these mimics the way traditional book publishers work. This is highly convenient when the final formatting is done by a dedicated person rather than the author. It is not so nice for solo operators, because migrating changes from LO to the Scribus program is tedious. So you either must only do the layout as the very final step or at some point switch to having the master data in Scribus and doing all edits directly there. These approaches are either cumbersome or require strong resolve to not keep changing the text after it has gone to the DTP program.

The second approach is easier, but the downside is that LibreOffice (or MS Office or other similar programs) do not produce documents that are as aesthetically pleasing as DTP programs. I don't know the specifics, but at least the line and page splitting algorithms seem to produce worse results. This is probably because they need to be fast enough to be used in real time. There are also various reliability and layout issues. For example it is difficult to get figures to remain where you want them as the text is edited, and I have experienced cases where entries in the bibliography disappear for no reason. Also, if you are not very pedantic in using styles, changing the global document appearance is problematic.

Enter LaTeX!

The Meson manual is written in LaTeX. It is a bit quirky, but there are many major features that other systems do not provide (or at least not easily). The main point is that in LaTeX you only write the structure of the document and the system takes care of all formatting, kind of like HTML and CSS work. The default look produced by LaTeX is magical in the way that it provides an air of gravitas to any piece of text automatically.

Splitting the style and formatting is nice in that you get to work in the "markdown" syntax level when editing but can easily see the typeset version of your text at any time. Since LaTeX is plain text, you can use revision control tools to manage your book sources just like source code. Some things that are difficult in GUI apps work effortlessly in LaTeX. Just the way it handles floating figures is great and saves you so much time and effort that it could be worth the price of admission by itself.

One of the big problems in writing a book about programming is to keep your code samples both up to date and working. LaTeX provides a simple and elegant solution to this problem. Since it is just a macro processing system, it provides a way to include text from standalone files on the file system. In the Meson manual all code samples are written in standalone projects and there is a script that builds and runs all of them. Or, in other words, with LaTeX you can write unit tests for your book.

The main downside of LaTeX is that its output looks exactly like LaTeX and if you want your book to look different, it takes a fair bit of work. The syntax takes a bit of getting used to and if your keyboard layout requires multiple keys to type a backslash, typing it can get tiring.  There are tools that can convert e.g. markdown to LaTeX to make the writing process easier, but usually you need to do some fine tuning on the output or use custom LaTeX packages to get the output you want

How many copies of the manual have been sold thus far?

31.

If you are a corporation and want to support the project by buying a bulk license for all your employees, send me an email.