Sunday, February 9, 2020

Trying to build a slightly larger slice of LibreOffice with Meson

One of the few comments I got on my previous blog post was that building only the sal library is uninteresting because it is so small. So let's go deeper and build the base platform of LO, which is called Uno. Based on docs and slides from conference presentations, it is roughly the marked area in LO's full dependency graph.


The gloves come off

A large fraction of the code is generated from IDL files. So one might imagine that if one has a file like com/sun/foo/bar.idl then one would convert that into a header file com/sun/foo/bar.hpp using a program called idlc. One would be wrong. That is not what happens at all.

When trying to understand what humongous Make builds are doing, the best approach is not to even try. Do not look at the Makefiles, or the macro code they include. Instead run the build, capture all executed commands with Make's verbose mode and reverse engineer what the system is doing based on them. Implementing the same functionality is a lot easier this way, because you know exactly what command lines you must end up with. In this particular case we find that the idl files are compiled to an intermediate binary blob using a program called unoidl-write. After that the header files (yes, multiple per idl file) are created from this blob with a different program called cppumaker.
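As a rough illustration of the "capture and reverse engineer" approach, here is a minimal Python sketch that tallies which programs appear in a captured build log. It assumes the log has one command per line with the program as the first word, which is a simplification (real verbose Make output also contains shell wrappers and status lines). The function name summarize_commands is made up for this example and is not part of the prototype.

```python
import shlex
from collections import Counter
from pathlib import PurePath


def summarize_commands(log_text):
    """Count how often each program appears as the first word of a logged line."""
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            words = shlex.split(line)
        except ValueError:
            # Unbalanced quotes, e.g. a command that continues on the next line.
            continue
        if words:
            # Strip the directory part so /some/path/tool and tool count as one.
            counts[PurePath(words[0]).name] += 1
    return counts
```

Feeding the captured log through this and sorting the result by count gives a quick overview of which tools the build actually runs, which is how programs like unoidl-write and cppumaker would show up.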

This has the unfortunate side effect that when you run the header generator you can not know in advance what files the program will generate and what sort of a directory hierarchy they are put in without a side channel. This makes it very difficult to integrate with build systems, because they need to know outputs exactly in order to make build steps reliable. Java has the exact same problem. Because of the way it handles inner classes, the output set of a Java compilation is unknowable beforehand. If you ever find yourself designing a code generator tool, please make sure you don't have this Sunism in your program, as it makes integration needlessly complicated and unreliable.
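One common workaround, sketched below in Python, is to wrap the generator so that the build system only has to know about a single, predictable output: a manifest file listing whatever the generator happened to produce. This is only a sketch under stated assumptions. The helper name run_and_record is hypothetical, the generator is assumed to write into its current working directory, and this is not necessarily how the prototype's helper scripts work.

```python
import subprocess
from pathlib import Path


def run_and_record(command, output_dir, manifest_path):
    """Run a generator inside output_dir, then list every file found there."""
    output_dir = Path(output_dir).resolve()
    output_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the generator fails.
    subprocess.run(command, check=True, cwd=output_dir)
    produced = sorted(
        p.relative_to(output_dir).as_posix()
        for p in output_dir.rglob("*")
        if p.is_file()
    )
    # The manifest is the one output whose name is known in advance.
    Path(manifest_path).write_text("".join(name + "\n" for name in produced))
    return produced
```

The manifest should live outside output_dir, otherwise it lists itself on the next run. Note that this only hides the problem from the build system: stale files from earlier runs and per-file dependency tracking still need separate handling.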

Even if you manage to recreate the command lines with a different system, things may still fail in interesting ways. In LO's case the internal SAL file system library has a policy that callers are not allowed to create temporary files in a directory with a relative pathname. This limitation is reported with the helpful error message of "could not open  for writing". (In case blog aggregators break the formatting, there are two spaces between the words "open" and "for".) It is also possible that relative paths appear to succeed but produce error messages such as "can not read file in legacy format" later in the process.

Other fun stuff

If one were to look at the command lines that eventually get invoked, they would look like this:


This may seem overwhelming, so let's add some markers.


An interesting question is how many processes does this spawn per source file? I don't know the correct answer, but a reasonable guess would be either 4 or 7 (5 or 8 if you count the outer shell).

Status

The code can be obtained via GitHub. It's Linux only and some bits are fairly ugly. It does generate the UNO code and build all the libs, but unit tests, Java and other pieces are missing and dependency tracking does not work for code generators. The build definitions consist of ~550 lines of Meson. There are a few Python helper programs in addition to these. They have a lot of duplicated code, but for a prototype such as this one it's tolerable.

I tried to go a bit further and build basegfx, but then I got stuck. The generated code has #ifdef toggles that define whether some variables are defined as ints or enums. Other libraries fail to build either when the define is set or when it is unset. For some reason basegfx has multiple files that fail both when the define is set and when it is unset (with different error messages, though). Either the code generation step goes awry or there are even more magic defines that have to be set in order for things to work.

Wednesday, February 5, 2020

Building (a very small subset of) LibreOffice with Meson

At FOSDEM I got into a discussion with a LibreOffice dev about whether it would be possible to switch LO's build system to Meson. It would be a lot of manual work for sure, but would there be any fundamental problems? Since a simple test can eliminate a ton of guesswork, I chose to take a look.

Like most cross platform programs, LO has its own platform abstraction layer called Sal. According to experience, these kinds of libraries usually have the nastiest build configurations requiring a ton of configure checks and the like. The most prominent example is GLib, whose configure steps are awe-inspiring.

Sal turned out to be fairly simple to port to Meson. It did not require all that much in platform setup, probably because the C++ stdlib provides a lot more out of the box than libc. After a few hours I could compile all of Sal and run some unit tests. The results of the experiment can be found in this GitHub repo. The filenames and layouts are probably not the same as in the "real" LO build, but for a simple experiment like this they'll do.

So what did we learn?

  • Basic compilation seems to be straightforward, but there were some perennial favourites including source files that you must not compile standalone, magic compilation defines, using declarations hidden at the bottom of public headers behind #ifdefs and so on.
  • There may be further platform funkiness in the GUI toolkit configuration step.
  • A lot of code seems to be generated from IDL files, which might require some work.
  • Meson's Java support probably needs some work to build all the JARs.
  • Meson should scale at least to building LO itself; building all dependency projects at the same time is a different matter.
The last of these is not really an issue on Linux, since you get almost all deps from the system. On Windows and Mac you can achieve the same by using something like Conan for dependency management. However Meson might be scalable enough to build LO and all deps in a single go, and if it's not then making it do that would be an interesting optimization challenge.

How would one express LO's sheer size with a single number?

It has more than 150 000 lines of Makefiles.

Monday, February 3, 2020

Creating your own physical or PDF manual

This post is part 2 of N describing the creation process of the Meson manual, which you can purchase via this web site. Part 1 can be read here.

There are three main ways of producing a technical book using only free software.

  1. LibreOffice + Scribus
  2. LibreOffice only
  3. LaTeX
The first option of these mimics the way traditional book publishers work. This is highly convenient when the final formatting is done by a dedicated person rather than the author. It is not so nice for solo operators, because migrating changes from LO to the Scribus program is tedious. So you either must only do the layout as the very final step or at some point switch to having the master data in Scribus and doing all edits directly there. These approaches are either cumbersome or require strong resolve to not keep changing the text after it has gone to the DTP program.

The second approach is easier, but the downside is that LibreOffice (or MS Office or other similar programs) does not produce documents that are as aesthetically pleasing as those made with DTP programs. I don't know the specifics, but at least the line and page splitting algorithms seem to produce worse results. This is probably because they need to be fast enough to be used in real time. There are also various reliability and layout issues. For example it is difficult to get figures to remain where you want them as the text is edited, and I have experienced cases where entries in the bibliography disappear for no reason. Also, if you are not very pedantic in using styles, changing the global document appearance is problematic.

Enter LaTeX!

The Meson manual is written in LaTeX. It is a bit quirky, but there are many major features that other systems do not provide (or at least not easily). The main point is that in LaTeX you only write the structure of the document and the system takes care of all formatting, kind of like HTML and CSS work. The default look produced by LaTeX is magical in the way that it provides an air of gravitas to any piece of text automatically.

Splitting the style and formatting is nice in that you get to work in the "markdown" syntax level when editing but can easily see the typeset version of your text at any time. Since LaTeX is plain text, you can use revision control tools to manage your book sources just like source code. Some things that are difficult in GUI apps work effortlessly in LaTeX. Just the way it handles floating figures is great and saves you so much time and effort that it could be worth the price of admission by itself.

One of the big problems in writing a book about programming is to keep your code samples both up to date and working. LaTeX provides a simple and elegant solution to this problem. Since it is just a macro processing system, it provides a way to include text from standalone files on the file system. In the Meson manual all code samples are written in standalone projects and there is a script that builds and runs all of them. Or, in other words, with LaTeX you can write unit tests for your book.
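The actual script used for the manual is not shown here, so the following is only a minimal Python sketch of the idea. It assumes every sample lives in its own subdirectory with a meson.build file. The names check_samples and MESON_STEPS and the "{build}" placeholder are invented for this example.

```python
import subprocess
import tempfile
from pathlib import Path

# Assumed build steps; "{build}" is replaced with a throwaway build directory.
MESON_STEPS = [
    ["meson", "setup", "{build}"],
    ["meson", "compile", "-C", "{build}"],
    ["meson", "test", "-C", "{build}"],
]


def check_samples(samples_root, steps=MESON_STEPS):
    """Run every step in each sample directory; return the names of failing samples."""
    failures = []
    for sample in sorted(Path(samples_root).iterdir()):
        if not (sample / "meson.build").is_file():
            continue
        with tempfile.TemporaryDirectory() as build_dir:
            for step in steps:
                command = [word.replace("{build}", build_dir) for word in step]
                if subprocess.run(command, cwd=sample).returncode != 0:
                    failures.append(sample.name)
                    break  # no point in testing something that did not build
    return failures
```

If check_samples returns an empty list, every sample that the book includes still configures, compiles and passes its tests.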

The main downside of LaTeX is that its output looks exactly like LaTeX and if you want your book to look different, it takes a fair bit of work. The syntax takes a bit of getting used to and if your keyboard layout requires multiple keys to type a backslash, typing it can get tiring. There are tools that can convert e.g. markdown to LaTeX to make the writing process easier, but usually you need to do some fine tuning on the output or use custom LaTeX packages to get the output you want.

How many copies of the manual have been sold thus far?

31.

If you are a corporation and want to support the project by buying a bulk license for all your employees, send me an email.

Sunday, January 26, 2020

How the Meson manual sales pipeline is set up and how to set your own

Setting up all the pieces to get the Meson manual sales page up and running was a fair bit of work. Since other people might be interested in setting up something similar for their projects, here are some random notes of things that I had to do. All of this comes with the usual disclaimer that this is not accounting or legal advice, speak to an actual professional before embarking on your own venture.

A company

The first thing you need is a company. IIRC credit card processors either only deal with corporations or they charge a lot more in processing fees for individuals. The choice of corporation type depends on the country you live in.

A sales platform

This is the platform that provides the "web store", manages product downloads and so on. The Meson manual uses SendOwl, but there are other providers as well. In theory you could write this yourself, it is only a web shop after all, but then you get the headache of ops, backups, storing user data in a GDPR compliant manner and all that jazz. Using an existing store is fairly cheap and saves you a ton of work.

A credit card processor

This operator takes care of charging money from users' credit cards and delivering it to you. Your choices are limited to those supported by your sales platform. SendOwl supports PayPal and Stripe and Meson uses the latter because their fees were noticeably lower. Interestingly Stripe requires a "proof of location" via a utility bill. This is an interesting challenge in countries like Finland where this is a completely foreign concept.

Web site hosting

The requirements here depend on how fancy a web site you want to have. Meson's is a single static page with a link to the sales platform. 

Taxes

This is a very complicated topic. In fact even more complicated than that page implies. The crux is that a seller of digital goods may have the responsibility to gather and pay VAT and/or sales tax to the countries they sell to, not just their own country. Most countries have a minimum threshold (such as 100 000 dollars) of sales before a foreign operator needs to register and gather tax. Some don't (meaning the threshold is zero).

The EU is quite simple if you are within the EU. You need to register to a special VAT MOSS program, gather the necessary (country dependent) tax and then report and pay it to the tax authorities of your own country. The sales platform will automatically calculate the correct tax amount and provide a report required by the tax office. For EU residents this is highly convenient. For operators outside of the EU it is a bit more work as you need to register, but only in one EU country of your choice. All bureaucracy is dealt with through this single point.

You don't have to do anything about countries that have a sufficiently high threshold. Countries with low thresholds can be geoblocked in the sales platform.

This leaves the United States, which is special. Each state has their own way of dealing with sales tax and you can't geoblock individual states.

GDPR et al compliance

If you use third party sales platforms and credit card processors, and never store any transaction data on your own servers, this is actually fairly simple. Stripe will even autogenerate a compliance document for you.

How many copies of the Meson manual have been sold through the web site by now?

22.

That is about half the number of people who participated in the failed Indiegogo campaign last year.

Tuesday, January 21, 2020

The Meson Manual is now available for purchase

Some of you might remember that last year I ran a crowdfunding campaign to create a full written user manual for Meson. That failed fairly spectacularly, mostly due to the difficulty of getting any sort of visibility for these kinds of projects (i.e. on the Internet, everything drowns).

Not taking the hint I chose to write and publish it on my own anyway. It is now available on this web page for the price of 29.95€ plus a tax that depends on the country of purchase. Some countries which have unreasonable requirements for foreign online sellers such as India, Russia and South Korea have been geoblocked. Sorry about that. However you can still buy the book if you are traveling outside the country in question, but in that case all tax responsibilities for importing fall upon you.

What if you don't care about books?

I don't have a Patreon or any other crowdfunding thing ongoing, because of the considerable legal uncertainties of running a donation based service for the public good in Finland. Selling digital goods for money is fine, so this is a convenient way for people to support my work on Meson financially.

Will the book be made available under a free license?

No. We already have one set of free documentation on the project web site. Everyone is free to use and contribute to that documentation. This book contains no text from the existing documentation, it is all new and written from scratch.

Is it available as a hard copy?

No, the only available format is PDF. This is both to save trees and because international shipping of physical items is both time consuming and expensive.

Getting review copies

If you are a journalist and wish to write a review of the book for a publication, send me an email and I'll provide you with a free review copy.

When was the book first made public?

It was announced at the very beginning of my LCA2020 talk. See it for yourself in the embedded video below.

Can you post about this on your favourite social media site / news aggregator / etc?

Yes, by all means. It is hard to get visibility without it, so I appreciate all the help I can get.

What was that site's URL again?


Tuesday, December 31, 2019

How about not stabbing ourselves in the leg with a rusty fork?

Corporations are funny things. Many things no reasonable person would do on their own are done every day in thousands of business conglomerates around the world. With pride even. Let us consider as an arbitrary example a corporation where every day is started by employees stabbing themselves in the leg with a rusty fork. This is (I hope) not actually done for real, but there could be a company out there where this is the daily routine.

If you think that such a thing could never possibly happen, congratulations on having never worked in a big corporation. Stick with that if you can!

When faced with this kind of pointless and harmful routine, one might suggest not doing it any more or replacing it with some other, more useful procedure. This does not succeed, of course, but that is not the point. The reasons you get back are the interesting thing, because they will tell you what kind of manager and coworkers you are dealing with. Here are some possible options, can you think of more?

The survivor fallacist

This is a multi-billion dollar company. If stabbing oneself in the leg was bad, as you seem to claim, we could not have succeeded.

The minimum energy spender

It would take too much work to get this changed. Just bite the bullet and do it every morning. You're better off this way.

The blame shifter

This is mandated by our head office, we can't do anything about this even if we wanted to.

The metric optimizer

Our next year's bonus metric will measure the number of leg stabbings reduced that year. We must get as many of them in this year as we possibly can.

The traditionalist

We have always done this. We must always do it.

The cornered animal

How dare you! Do you have any idea how much work it is to get pre-rusted forks? They are all made of stainless steel nowadays. Your derogatory insinuations are a slap in the face of all people working to keep this system running!

The folklorist

This is a commonly accepted best practice in software companies, thus we should do it also.

The brainwashee

This is actually a great invention. Getting a nice jolt of adrenaline first thing in the morning really wakes you up and gives you focus for the entire day. Try it for a month or two! You'll see.

The control freak messiah

This procedure was put in place by the founder/CEO. You do not challenge his choices if you know what is good for you.

The team spiritist

If you don't stab yourself in the leg, you are setting a very bad example that demoralizes everybody else who does their part diligently.

And finally the (sadly) most common one

Our product is special.

Saturday, December 28, 2019

What can clang-format teach us about the human condition?

Most people who do programming have taken part in at least one code formatting war. Usually these come about when companies want to standardise their code bases and thus want everything formatted according to a single style. Style wars, much like real wars, are not pleasant places to be in. They cause havoc and destruction, make reasonable people into life-long sworn enemies and halt work on anything useful.

In a typical style argument statements like the following are often thrown about:

  • Indentation should be done with tabs, because everyone can set tab width to whatever they want in their editor.
  • The opening brace must be on the same line as its preceding clause. This saves vertical space and thus makes the code more readable.
  • The opening brace must be on its own line. This makes code blocks stand out better and thus makes the code more readable.
  • When laying things like arguments vertically, the separating comma must be at the beginning of the line rather than the end. In this way when you add or remove an entry, the diff is always only one line.
  • In a declaration like int *bob, the asterisk must be next to the variable name, because that is what it binds to.
  • In a declaration like int* bob, the asterisk must be next to the type name, because "pointerness" is logically a feature of the type, not the variable.
  • Class variables must begin with m_ so they stand out better.
  • Class variables must not be separated with a prefix. The syntax highlighter will already draw them in a different color and if you have so many variables in your methods that you can't immediately tell which is which, your methods are too big and must be split.
  • Et cetera, et cetera, ad infinitum, ad nauseam.
It is unknown when code formatting wars first began. Given that FORTRAN was the first real programming language with an actual syntax and was first released in 1957, the answer probably is "way, way before that". A reasonable guess would be the first or second design meeting on the syntax. Fighting over code style kept raging for almost sixty years after that. The arguments were the same, the discussion was the same, no progress was ever made. Then clang-format was introduced and suddenly everything changed.

This was surprising, because automatic code formatters had existed for decades and clang-format was "the same, just slightly better". Yet it made this problem mostly go away. Why?

Enter the human element

With all existing formatters it was fairly easy to find code where it failed. C macros were especially treacherous in this regard. This meant that either one needed to manually add (and update) comments that disabled formatting for some blocks or the formatter was run only every now and then by hand and the result had to be inspected and fixed by hand after the fact. With clang-format this manual work went effectively to zero. You could just run it at any time, even automatically before every commit. In a weird kind of backwards way once we had the correct solution, we could finally understand what the real problem was.
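To make "automatically before every commit" concrete, here is a minimal sketch of a Git pre-commit hook written in Python. It is one possible setup, not something described in this post. The suffix list and function names are assumptions, it requires a clang-format new enough to support --dry-run and --Werror, and it checks the working tree copy of each staged file rather than the staged content.

```python
import subprocess

# Assumed set of file types worth formatting; adjust per project.
CXX_SUFFIXES = (".c", ".cc", ".cpp", ".h", ".hpp")


def formattable(paths):
    """Keep only the paths that look like C or C++ sources."""
    return [p for p in paths if p.endswith(CXX_SUFFIXES)]


def staged_files():
    """Ask Git for files that are added, copied or modified in the index."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()


def main():
    files = formattable(staged_files())
    if not files:
        return 0
    # --dry-run together with --Werror makes clang-format exit with an
    # error if any of the files would be changed by formatting.
    return subprocess.run(["clang-format", "--dry-run", "--Werror", *files]).returncode
```

To use it, save the file as .git/hooks/pre-commit, make it executable, and end it with a call such as sys.exit(main()) (after importing sys). A non-zero exit status makes Git abort the commit.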

Every programmer writes code in their own way. Maybe they put braces on the same line, maybe not. Maybe they indent with spaces, maybe not. The details don't matter, the real point is that writing code in this style is effortless. It just flows from your brain to the screen. Coding in any other style means spending brain energy on either typing in some non-natural style or fixing the code afterwards. This is manual and tedious work, just the kind that programmers hate with a passion. Thus when the threat of an externally mandated code style appears, the following internal monologue takes place:
If they choose a style different than mine, then I will forever have to write in a style that is unnatural to me. This is tedious. However if I spend some energy now and convince everyone else to use my style, then I can keep on doing what I have been doing thus far. All I have to do is to factually explain why my chosen style is the best, and since other programmers are rational they will understand my point, agree and adopt my chosen style.
Unfortunately everyone else participating in the debate has the exact same idea and things end in a stalemate almost immediately. The sunk-cost fallacy ensures that once a person has publicly committed to a style choice, they will never budge from it.

Note the massive dichotomy at play here. The real reason people have for any style choice is "this is what I have gotten used to" but when they debate their chosen style they always, always use reasoning that aims to be objective and scientific. At this point you might want to pause and reread the sample arguments listed above. They follow this reasoning exactly and most claim to improve some real world objective metric such as readability. They are also all lies. These are all post-rationalising arguments, invented after the fact to make the opinion you already have sound as good as possible. They are not the real reason. They have never been the real reason. It is unlikely they will ever be the real reason. But the debate is carried on as if they were the real reason. This is why it will never end.

The lengths people are willing to go to in their post-rationalising arguments are nothing short of astounding. In this video on indenting with tabs vs spaces many tab advocates say that indenting with tabs is better because "you only need to press tab once rather than press space multiple times". Every single programmer's text editor since 1985 (possibly 1975 and potentially even 1965) has had the feature where pressing the tab key does the logically equivalent indent with spaces. Using this as an argument only shows that you have not done even the most minimal of thinking on the issue, but instead just have already made up your mind and don't want to even consider changing it.

This is why code style discussions never go anywhere. They are not about bringing people together to find the best possible choice. They are about trying to make other people submit to your will by repeatedly bashing them on the head with your style guide. This does not work because the average programmer's head is both thicker and more durable than the average style guide.