Monday, September 20, 2021

Glyphtracer 2.0

Ages ago I wrote a simple GUI app called Glyphtracer to simplify the task of creating fonts from scanned images. It seems people are still using it. The app is written in Python 2 and Qt 4, so getting it running becomes harder and harder as time goes by.

Thus I spent a few hours porting it to Python 3 and PyQt 5 and bumped the major version to 2.0. The program can be obtained from this GitHub repo.
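
For those curious what such a port involves, the changes are mostly mechanical. Here is a minimal sketch of the typical differences (illustrative only, not actual Glyphtracer code):

# Python 2 + PyQt 4 would have looked roughly like:
#   from PyQt4.QtGui import QApplication, QLabel
#   print "starting"
#
# Python 3 + PyQt 5: widget classes moved from QtGui to QtWidgets,
# and print is now a function.
import sys
from PyQt5.QtWidgets import QApplication, QLabel

print('starting')
app = QApplication(sys.argv)
label = QLabel('Glyphs go here')
label.show()
sys.exit(app.exec_())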


Saturday, September 11, 2021

Swappable faceplates for laptops

On the whole, laptops are boring. There are three different types.

  1. Apple aluminium laptops that all look the same.
  2. Other manufacturers' laptops that try to look like Apple laptops and, as a result, all look the same.
  3. RGB-LED-studded gamer laptops that all look the same.
There is a cottage industry that manufactures stickers the size of the entire laptop, but those are inconvenient and not particularly slick. You only have one chance to apply them, and any misalignment or air bubbles will be there forever.

But what if some laptop manufacturer chose to do things differently and designed their laptops so that the entire faceplate was detachable and replaceable? This would allow everybody to customize their own laptop in exactly the way they want (this should naturally extend to the inner keyboard plate, but we'll ignore that for now). For example, you could create a laptop with the exact Pantone color you want rather than what the manufacturer saw fit to give you.

Or maybe you'd like to have a laptop cover which lights up not the manufacturer's branding but instead the logo of your favourite band?

Or perhaps you are a fan of classic muscle cars?

The thing that separates these covers from what we currently have is that they would not be pictures of things or stickers. The racing stripe cover could be manufactured in exactly the same way as real car hoods. It would shine like real metal. It would feel different when touched. You could, one imagines, polish it with real car wax. It would be the real thing. Being able to create these plates in isolation makes all sorts of experimentation possible: holograms, laser engravings, embossings, unusual metals and all sorts of funky stuff. In case of failure the outcome is just a lost cover plate you can toss into the recycling bin rather than a very expensive laptop that is either unusably ugly or flat out broken.

But wait, there's more. Corporate PR departments should be delighted with this. Currently, whenever people give presentations at conferences, their laptops are clearly visible and advertise the manufacturer. The same goes for sales people visiting customers (well, eventually, once COVID-19 passes) and so on. Suppose the laptop lid could carry your company's branding instead.

Suddenly you have reclaimed a ton of prime advertising real estate that can be used for brand promotion (or something like that; I'm not a PR person, so I'm not intimately familiar with the terminology). If you are a sufficiently large customer ordering thousands of laptops at a time, it might even be worth getting all the custom plates fitted at the main factory during assembly. Providing this service would also increase the manufacturer's profit margin.

Tuesday, August 10, 2021

Hercule Poirot and the Mystery Box, as written by JJ Abrams

Outside, the scorching heat of the British summer could not have made a starker contrast with the icy cold atmosphere of the main living room of Lord Ellington's mansion. All fifteen people sitting around it were staring at the plump Belgian private inspector, who seemed to take great pleasure in waxing his mustache and letting the audience simmer in their own sweat. Finally he spoke.

"You may have wondered why I have called all you here. It has to do with the untimely death of Lady Sybill. We have reason to believe that the three dozen stab wounds in her back were not an accident but that she was in fact ..."

He paused for effect.

"... murdered."

Madame Smith shrieked in terror and dropped her tea cup, which shattered into a million trillion pieces, each of which glittered with the sparkle of a thousand suns. Which was strange, since the curtains were closed and there were no sources of direct light in the room.

"Exactly, mon frére" he said to her, even though his mother tongue was French and he should have known not to call a woman "my brother". And also to spell his sentences properly. But who cares about trivial details such as these when there is a mystery afoot?

"And furthermore, I'm happy to tell you all that our investigation on this has finally come to a conclusion."

Poirot took off his monocle, polished it and then put it back on.

"Oui, we can finally reveal that the identity of the murderer ... will remain unknown forever. Even with our plethora of clues, we could not determine what actually happened. Whoever the murderer was, they got away scott free as we are shutting down the investigation. Mon dieu, that makes this our seventy-sixth unsuccessful murder investigation in a row!"

The audience in the salon seemed perplexed, so Captain Hastings chose to interject.

"It could also be space aliens!"

"My money is on an interdimensional time travel accident" countered Inspector Japp with the calm voice of an experienced police officer.

For a while, the room was enveloped in silence. Then the house was blown up by Nazis.

Saturday, July 31, 2021

Looking at building O3DE with Meson, part II

After the first post, some more time was spent on building O3DE with Meson. This is the second and most likely last post on the subject. Currently the repository builds all of AzCore's basic code and a notable chunk of its Qt code. Tests are not built and there are some caveats in the existing code, which will be discussed below. The rest of the conversion would most likely be just more of the same and would probably not provide much new to tackle.

Code parts and dependencies

Like most projects, the code is split into several independent modules like core, testing, various frameworks and so on. The way Meson is designed is that you traverse the source tree one directory at a time. You enter it, do something, possibly recurse into subdirectories and then exit it. Once exited, you can never return to that directory. This imposes some extra limitations on project structure, such as making circular dependencies impossible, but it also makes the build definition more readable.

This is almost always what you want. However, there is one exception that many projects share: the lowest layer library has no internal dependencies, the unit testing library uses that library, and the tests for the core library use the unit testing library. This is not a circular dependency as such, but if the unit tests are defined in the same subdir as the core library, it causes problems because you can't return to that directory. The cycle needs to be broken in some way, like the following:

# AzCore's unit tests depend on AzTest, so they are entered only
# after AzTest has been processed, via a separate subdirectory.
subdir('AzCore')
subdir('AzTest')
subdir('AzCore/tests')

Code generation

Most large projects have a code generator, and O3DE is no exception. Its code generator is called AutoGen and it's a Python script that expands XML using Jinja templates. What is strange is that it is only used in three places, only one of which is in the core code. Further, if you look at the actual XML source file, it only has a few definitions. This seems like a heavyweight way to go about it. Maybe someone could summon Jason Turner to constexprify it and get rid of this codegen.
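
For illustration, the general XML-plus-Jinja approach looks roughly like the following. This is only a sketch of the technique; the element and template names are made up and this is not AutoGen's actual code.

import xml.etree.ElementTree as ET
from jinja2 import Template

# A made-up template in the spirit of the real ones, which
# generate C++ boilerplate from a handful of XML definitions.
template = Template('''{% for cls in classes %}
class {{ cls }}Interface {
public:
    virtual ~{{ cls }}Interface() = default;
};
{% endfor %}''')

def generate(xml_path, out_path):
    root = ET.parse(xml_path).getroot()
    classes = [node.get('name') for node in root.iter('Class')]
    with open(out_path, 'w') as f:
        f.write(template.render(classes=classes))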

This part is not converted; I just commented out the bits that used it.

Extra dependencies

Several other dependencies seem superfluous. As an example, the code uses a standalone library for MD5 even though it also uses OpenSSL, which provides an MD5 implementation. As for XML parsers, there are three: RapidXML, Expat and the one from Qt (though the latter is only used in the editor).

Editor GUI

Almost all major game engines seem to write their own GUI toolkits from scratch. Therefore it was a bit surprising to find out that O3DE has gone all-in on Qt. This makes it easy to use Meson's built-in Qt 5 support, though not without some teething issues. First of all, the code has been set up so that each .cpp file #includes the moc file generated from its header:

#include "Components/moc_DockBarButton.cpp"

Meson does things differently: it builds the moc files automatically so users don't have to do things like this. The moc files are also written to a different directory than the existing configuration uses, so this include cannot work; the path is incorrect. The #include could be removed altogether, but since you probably need to support both build systems at the same time (during a transition period, for example), you'd need to do something like this:

#ifndef MESON_BUILD
#include "Components/moc_DockBarButton.cpp"
#endif

What is more unfortunate is that the code uses Qt internal headers. For some reason or another I could not make them work properly, as there were missing private symbols when linking. I suspect this is because distro Qt libraries hide those symbols so they are not exported. As above, I just commented these out.

The bigger problem is that O3DE seems to have custom patches in its version of Qt. At least it refers to style enum values that do not exist; googling for the exact string produces zero relevant matches. If this is the case, the editor cannot be used with official Qt releases. Further, if said patches exist, they would need to be provided to the public as per the LGPL, since the project ships prebuilt dependency binaries. As mentioned in the first blog post, the project does not provide original sources for its patched dependencies or, if it does, finding them is not particularly easy.

What next?

Probably nothing. It is unlikely that upstream would switch from CMake to Meson, so converting more of the code would not be particularly beneficial. The point of this experiment was to see if Meson could compile O3DE. The answer is yes; there have not been any major obstacles. The second goal was to see if the external dependencies could be provided via Meson's Wrap mechanism. This is also true, with the possible exception of Qt.

The next interesting step would be to build the code on multiple platforms. The biggest hurdle here is the dependency on OpenSSL. Compiling it yourself is a bear, and there is no Wrap for it yet. However, once this merge request is merged, you should be able to build OpenSSL as a Meson subproject transparently. Then you could build the core fully from source on any platform.

Friday, July 30, 2021

How much effort would it take to convert OpenSSL's Perl source code generators to Python?

There is an ongoing discussion about writing Meson build definitions for OpenSSL so it can be added to the WrapDB and built transparently as a subproject. One major issue is that OpenSSL generates a lot of assembly at build time with Perl. Having a Perl dependency would be bad, but shipping pregenerated source files would also be bad. Having "some pregenerated asm" that comes from "somewhere" would understandably be bad in a crypto library.

The obvious third option would be to convert the generator script from Perl to Python. This is not the first time this has been proposed, and the counterargument has always been that such a conversion would take an unreasonable amount of effort and could never be done. Since nobody has actually tried the conversion, we don't really know whether that claim is accurate. I converted the x86_64 AES asm generator to see how much work it would actually take. The code is here.

The code setup

The asm generator has two levels. First a Perl script generates asm in a specific format, and after that a second Perl script converts it to a different asm dialect if needed (NASM, AT&T, Intel etc.). The first script can be found here and the second one is here. In this test only the first script was converted.

At first sight the task seems formidable. The generator script is 2927 lines of code. Interestingly, the asm file it outputs is only 2679 lines. Thus the script is not only larger than its output, it is also less understandable than the thing it generates, both because it is a mishmash of text processing operations and because it is written in Perl.

Once you get over the original hump, things do get easier. Basically you have a lot of strings with statements that look like this:

movzb `&lo("$s2")`,$acc0

This means that the string needs to be evaluated so that $s2 and $acc0 are replaced with the values of the variables s2 and acc0. The backticks mean that you then have to call the lo function with the given value and substitute the result into the output string. This is very Perlistic and until recently would not have been straightforward to do in Python. Fortunately we now have f-strings, so it becomes simply:

f'movzb {lo(s2)},{acc0}'
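
To make the substitution concrete, here is a guess at what the helper could look like on the Python side. The regex is a simplification of whatever the real lo does; assume only that it maps a 32-bit register name to its low-byte alias:

import re

def lo(reg):
    # Simplified stand-in for the Perl lo() helper: map a register
    # to its low-byte alias, e.g. %eax -> %al. The real script
    # handles more register forms than this.
    return re.sub(r'%e([a-d])x', r'%\1l', reg)

s2, acc0 = '%eax', '%ecx'
print(f'movzb {lo(s2)},{acc0}')  # prints: movzb %al,%ecx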

With that worked out, the actual problem is no longer hard, only laborious. First you replace all the Perl multiline strings with Python equivalents, then change the function declarations from something like this:

sub enctransform_ref()
{ my $sn = shift;

to this:

def enctransform_ref(sn):

and then it's a matter of repeated search-and-replaces for all the replacement variables.

The absolute best thing about this is that it is almost trivial to verify that the conversion has been done without errors. First you run the original script and store the generated assembly. Then you run your own script and compare the output. If the two are not identical, you know exactly where the bug is. It's like having a unit test for every print statement in the program.
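
A sketch of that verification loop, assuming both scripts write their output to stdout (the file names here are hypothetical):

import difflib
import subprocess

# Run the original generator and the port, capturing the asm they emit.
original = subprocess.run(['perl', 'aes-x86_64.pl'],
                          capture_output=True, text=True).stdout
ported = subprocess.run(['python3', 'aes-x86_64.py'],
                        capture_output=True, text=True).stdout

if original == ported:
    print('Outputs identical, conversion verified.')
else:
    # Any difference points straight at the offending print statement.
    for line in difflib.unified_diff(original.splitlines(),
                                     ported.splitlines(), lineterm=''):
        print(line)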

How long did it take?

From scratch, the conversion took less than a day. Once you know how it's done, a similar conversion would take maybe a few hours. The asm type converter script seems more complicated, so it would probably take longer.

A reasonable port would contain these conversions for the most popular algorithms to the most popular CPU architectures (x86, x86_64, arm, aarch64). It would require a notable amount of work but it should be measured in days rather than months or years. I did browse through some of the other asm files and it seems that they have generators that work in quite different ways. Converting them might take more or less work, but probably it would still be within an order of magnitude.

Tuesday, July 20, 2021

A quick look at the O3DE game engine and building it with Meson

Earlier today I livestreamed an attempt to build a small part of the recently open sourced O3DE game engine. The attempt did not get very far, so here is a followup. It should not be considered exhaustive in any way; it is literally just me poking at the code for a few hours and writing down what was discovered.

Size

The build system consists of 56 thousand lines of CMake.

Name prefixes

The code has a lot of different naming conventions and preserved prefixes. These include "Cry", which seems to be the oldest and comes from the original CryEngine. The most common one is "az", which probably stands for Amazon. The "ly" prefix probably stands for "Lumberyard", the engine's prior name. Finally there is "PAL", which is not really explained but seems to stand for "platform abstraction layer" or something similar.

Compiling

The code can be obtained from the upstream repo. There's not much more you can do with it since it does not actually build on Linux (tested on latest Fedora) but instead errors out with a non-helpful CMake error message.

When I finally managed to compile something, the console practically drowned in compiler warnings. These ranged from the common (missing virtual destructors in interface classes) to the bizarre (superfluous pragma pops that come from "somewhere" due to macros).

Compiler support

The code explicitly supports only Visual Studio and Clang; it errors out when trying to build with GCC. Looking through the code, it seems to be mostly a case of adding some defines. I tried that but pretty quickly ran into page-long error messages. A person with more knowledge of GCC's inner workings could probably make it work with moderate effort.

Stdlib reimplementations

O3DE is based on CryEngine, which predates C++11. One place this shows is that rather than using the threading functionality in the standard library, the code has its own thread, mutex, etc. classes implemented on top of either pthreads or Windows threads. There may be some specific use cases (thread affinity?) where you'd need to stoop to plain primitives, but otherwise this seems like legacy stuff nobody has gotten around to cleaning up.

Yes, there is a string class. Several, in fact. But you already knew that.

Dependencies

This is where things get weird. The code uses, for example, RapidXML and RapidJSON. For some reason I could not get them to work, even though I used the exact same versions listed in the CMake definition. After a fair bit of head scratching, things eventually became clear. For example, the system has its own header for RapidXML called rapidxml.h whose contents are roughly the following:

#define RAPIDXML_SKIP_AZCORE_ERROR

// the intention is that you only include the customized version
// of rapidXML through this header, so that
// you can override behavior here.
#include <rapidxml/rapidxml.h>

Upstream does not provide its header in a rapidxml subdirectory; it is under include. The same is true when the header is installed on the system. Thus the include as given cannot work. Even more importantly, the upstream header is not named rapidxml.h but rapidxml.hpp.

It turns out that O3DE has its own dependency system, which takes upstream source, makes arbitrary changes to it, builds it and provides the result as a "package", which is basically a zip file with headers and prebuilt static libs. These are downloaded from not-at-all-suspicious-looking URLs like this one. What changes are made to these packages is not readily apparent. There are two different repos with info, but they don't seem to have everything.

When using external libraries like this, there are typically two choices. Either you patch the original to "fit in" with the rest of your code, or you write a very small adapter wrapper. This project does both, and with preprocessor macros no less.

The whole dependency system seems to be basically a reimplementation of Conan in pure CMake. If that sentence on its own does not make cold sweat run down your back, then let it be noted that one of the dependencies obtained this way is OpenSSL.

The way the system is set up prevents you from building the project using system dependencies. This includes Qt as the editor GUI is written with it. Neither can you build the entire project from source yourself because the existing source only works with its own special prebuilt libraries and the changes applied do not seem to be readily available as patches.

Most of this is probably because CryEngine was originally written for internal use on Windows only. This sort of an approach works for that use case but not all that well for a multiplatform open source project.

Get the code

My experimental port, which compiles only one static library (AzCore), can be downloaded from this GitHub repo. It still only supports Clang, but adding GCC support should be straightforward since at least you don't need to battle the build mechanism.

Thursday, July 8, 2021

A followup to the Refterm blog post

My previous blog post was about some measurements I made of Refterm. It got talked about in certain places on the net, from where it got to Twitter. Then Twitter did the thing it always does, which is to make everything terrible. For example, I got dozens of comments saying that I was incompetent, an idiot, a troll and even a Microsoft employee. All comments on this blog are manually screened, but this was the only time I just had to block them all. Going through those replies seemed to indicate that absolutely everyone involved [1] communicated badly and also misunderstood most other people involved. Thus I felt compelled to write a followup explaining what the blog post was and was not about. Hopefully this will bring the discussion down to a more civilized footing.

What we agree on

Let's start by writing down all the things we agree on.

  1. The current Windows terminal is slow.
  2. It can be made faster.
  3. A GPU-based renderer (such as the one in Refterm) can render terminal text hundreds of times faster than the current implementation in Windows terminal.
Note that even the people working on Microsoft Terminal acknowledged all of these to be true before any Refterm code had been written. From what I can tell, #3 is what Refterm set out to prove, and it was successful at it.

So what's the issue then?

Once the code was out many people started making these kinds of statements.

Windows terminal should switch to using this method because it is obviously superior and not doing that is just laziness and excuses.

Now, and this is important, the original people who worked on Refterm never made these kinds of claims. They did not! And further, I never claimed that they did. Other people ("the talking heads on the Internet") made those claims and then mental misattribution took over. This is unfortunate but sadly almost inevitable whenever these kinds of debates happen. That then leads to the obvious follow-up question:

Could the rendering mechanism used in Refterm be put in Windows terminal proper? If not, why not? If yes, what would the outcome be like and would the code need changing?

This is what my original blog post was about. Since this was outside the original project's design goals, I should have stated those goals explicitly. I did not, and that is a flaw on my part; sorry about that.

The problems with prototypes

Implementing a simple prototype of an existing program (or a part of it), achieving great success and then extrapolating from that to the whole program (and to reiterate: the original Refterm authors did not engage in this speculation, other people did) is a well-known pitfall. I could write an entire blog post about it. Fortunately I don't have to, since Matthew Garrett already wrote one. I recommend everyone read that before continuing with this post.

The tl;dr version is that when you bring a prototype up to sufficient feature parity with an existing implementation, you will encounter unexpected problems. The algorithms or approaches you took might not be feasible. You might need to add a bunch of functionality that you never considered or had never even heard of. Until you have the entire implementation, you don't know whether your approach will work. In fact you can't know it. Anyone who claims to know is lying, either to others or to themselves. (Reminder again: the Refterm authors did not make these kinds of estimates.)

We can try to come up with some of the obstacles one could encounter when turning the prototype into a real implementation and then examine those. They can't prove fitness, but they can reveal unfitness. The points discussed in the blog post were just some that I came up with; there are undoubtedly many others.

Resource usage

Let's start by acknowledging a notable flaw in the original post [2]. When evaluating memory usage, the post only compared Refterm against other types of apps, not other terminals. I ran some new measurements by starting both the Windows cmd.exe shell and Git SCM's MSYS2 terminal, running a few simple commands and looking at memory consumption in the Task Manager. Refterm took 350 MB of RAM, MSYS2 took 4 MB and cmd.exe took 7 MB.
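
If you want to repeat the measurement without clicking through Task Manager, something like the following works (a sketch; the process names are illustrative and depend on how the binaries are named on your machine):

import psutil

# Print the resident set size of each terminal process, in megabytes.
targets = ('refterm.exe', 'cmd.exe', 'mintty.exe')
for proc in psutil.process_iter(['name', 'memory_info']):
    if proc.info['name'] in targets:
        rss_mb = proc.info['memory_info'].rss / (1024 * 1024)
        print(f"{proc.info['name']}: {rss_mb:.1f} MB")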

People really love their terminals. I have seen setups where a single developer has 10+ different terminals open at the same time, all as separate processes (with several tabs each). So even if 300 MB of RAM for a single app would be fine, using 3 GB in this case would not be. Thus one would either need to dramatically reduce memory usage or have something like a shared glyph cache between the various processes. That requires shared GPU resources, some sort of IPC mechanism, multiprocess cache invalidation and all other fun stuff (or that is my understanding at least, as a Windows and GPU neophyte; if there is a simpler way, do let me know).

This piece of information is useful and important on its own. It improves our understanding of what the code does and does not do.

A retort to this, which actually was made by Refterm developers, was that "there are variables and knobs in the code you can tweak to affect the performance". To this I say: no. The first tests should always be done with the exact setup the code ships with. There are two reasons for this. First, it makes experiments by different people directly comparable with each other. Second, the original author(s) know the code best, and thus it makes sense to use the parameter values they went with.

Code layout and nonstandardness

Let's start again with the thing we all agree on:

For your own projects you can choose whatever code layout, build system, organization and so on that you want. Do whatever works best for you and don't let anyone tell you otherwise!

Things get more complicated when you start including other people, especially outside your own circle of devs. An open source project is a typical example. An anonymous commenter told me the following:

This is also the simplest possible code structure, very simple to work with for new contributors.

This sentence is interesting in that it is both true and not true. If you have a person who has no prior programming knowledge then yes, the layout is simplest possible and easy to get started with. On the other hand if the potential contributor is already accustomed to the "standard way of setting up a project" then things change and the nonstandard layout can get confusing [3] and can be a barrier to entry for new people. This is the nature of teamwork: sometimes it might make sense to do the thing that is inconvenient for you personally for the benefit of others. Sometimes it does not make sense. Like most things in life being nonstandard is not an absolute thing, it has its advantages but also disadvantages.

I actually encountered a technical disadvantage caused by this. I wanted to compile and run Refterm under AddressSanitizer, which is a really great tool for finding bugs. ASan is integrated into the latest VS, and all you need to do is add the /fsanitize=address flag to the compiler to use it. This does not work for Refterm; instead it leads to a bunch of linker errors. The ASan library depends on the C runtime, and Refterm is explicitly set up not to use it. It took me a fair bit of time to work out that the way to get it working is to go through the code and replace the WinMainCRTStartup function with a "normal" WinMain function, after which the linker does the right thing [4].

That SIMD memcpy thing

I pondered for a while whether I should mention the memcpy thing and now I really wish I hadn't. But not for the reasons you might think.

The big blunder I made was to mention SIMD by name, because the issue was not really about SIMD; the compiler does convert the loop to SIMD automatically. I don't have a good reference, but I have been told that Google devs have measured that 1% of all CPU usage over their entire fleet of computers is spent on memcpy, and they have spent massive amounts of time and resources on improving its performance. At least as late as 2013, optimizing memcpy was still a subject of fundamental research (or software patents, at least). For reference, here is the code for the glibc version of memcpy, which seems to be doing some special tricks.

If this is the case and the VS standard library provides a really fast memcpy, then rolling your own does cause a performance hit (though in this particular case the difference is probably minimal, possibly even lost in the noise). On the other hand, it might be that VS already optimizes the simple version to the optimal code, in which case the outcome is the same for both approaches. I don't know what actually happens; finding out for sure would require its own set of tests and benchmarks.

Concluding and a word about blog comments

That was a very long post and it did not even go through all the things I had in mind. If you have any comments, feel free to post them below, but note the following:

  • All comments are prescreened and only appear after manual approval.
  • Any comment that contains insults, whining, an offensive tone or any other such thing will be trashed regardless of its other merits.
  • The same goes for any comment whose contents make it obvious that the commenter has not read the whole text but is just lashing out.

[1] Yes, this includes me. It most likely includes you as well.

[2] Thanks to an anonymous commenter for pointing this out.

[3] I can't speak for others, but it was for me.

[4] There may have been other steps, but this was the crucial one.