Monday, March 20, 2023

The joy of font debugging

Remember how in the previous blog post it was said that creating text in PDF would be "just a matter of setting some parameters"?

Well let's start by creating the main text in two justified columns.

Ok, nice. Next we add an author name.

Whoopsies. After some debugging one can find out that this happens only if you use fewer than 32 characters from the font you are subsetting. Obviously. But no matter, after fixing this minor niggle we only need to display the title, author names and finally an email address. This is more of the same so nothing can go wrong.

After some hours of debugging it becomes clear that the values of left side bearings are sometimes read from the source file using incorrect offsets (while still being 100% memory safe, as all accesses are inside the source data). Good. Now that that's fully fix...

This is where things get extremely weird. No matter where you look or how deeply you peruse the binary data files, nothing seems to be incorrect. Maybe this is a bug in the Noto Mono font used here? So you try Liberation Mono. It fails too. And then, just to be sure, you try Ubuntu Mono. It works correctly. As does Free Mono.

Hmmmmmhmhm.

Opening the file in Fontforge says that the width of all characters is 1228 font units. That is also what Freetype reports. Which is comforting because in the TrueType file format struct fields that are designated as 32 bit integers might be either a) a 32 bit integer, b) 26.6 fixed point or c) 16.16 fixed point. You can't ever really be sure which, though, because it depends on values of bitfields far and away from the actual structs themselves.
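To make the three readings concrete, here is a minimal sketch of how one and the same raw 32 bit value turns into three very different numbers. The function names are made up for this illustration; the only facts used are that 26.6 fixed point keeps 6 fractional bits (divide by 64) and 16.16 keeps 16 fractional bits (divide by 65536).

```python
def as_plain_integer(raw: int) -> int:
    """Interpret the raw value as an ordinary integer."""
    return raw


def from_26_6(raw: int) -> float:
    """Interpret the raw value as 26.6 fixed point (6 fractional bits)."""
    return raw / 64.0


def from_16_16(raw: int) -> float:
    """Interpret the raw value as 16.16 fixed point (16 fractional bits)."""
    return raw / 65536.0


# A width of 1228 stored as 26.6 fixed point is the raw value 1228 * 64.
raw = 1228 * 64
print(as_plain_integer(raw))  # the raw bits read as an integer
print(from_26_6(raw))         # the value the writer intended
print(from_16_16(raw))        # what you get if you guess 16.16 instead
```

Guess the wrong interpretation and the number is off by a factor of 64, 1024 or 65536, which at least tends to be noticeable.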

Things get even weirder if you test exporting a PDF that uses those broken fonts either with Cairo or LibreOffice. In the generated PDF files the width of characters in this font is set to 600. Not 1228. Trying to read their source to find out how and why they do this is problematic, because they support many different font formats and thus convert the input data to their internal representation and then generate the output from that. Trying to understand how the input data correlates with the output data can give you a major headache without even trying too hard.
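One piece of arithmetic that might explain the 600: PDF expresses glyph widths of regular (non-Type 3) fonts in thousandths of a text space unit, whereas TrueType metrics are in font units per em. The sketch below assumes the font has 2048 units per em. That is a common value for TrueType fonts but it is an assumption here, not something verified for these particular files, and the function name is made up for illustration.

```python
def font_units_to_pdf_width(font_units: int, units_per_em: int) -> float:
    """Convert a width in font units to PDF's 1/1000 text space units."""
    return font_units * 1000.0 / units_per_em


# 2048 units per em is an ASSUMPTION, not a value read from the fonts above.
width = font_units_to_pdf_width(1228, 2048)
print(width, round(width))
```

Under that assumption 1228 font units land within half a unit of 600, which matches what Cairo and LibreOffice write out.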

The actual solution is even weirder than all of the above. TrueType fonts store the horizontal metrics of glyphs in a table called hmtx. It stores both the glyph advance and left side bearings. As a special case you can specify only the latter and use a common value for the former. This provides space savings of UP TO 2 BYTES PER CHARACTER but the downside is more complex parsing. Further, going through Freetype's public structs reveals that they contain a field called x_scale. After a lot of trial and error you can eventually decipher the actual algorithm needed:

If the character has both glyph advance and left side bearings defined then you use them as-is, but if it only has left side bearings defined then you must divide the default width value by the scale.
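As a rough sketch of what this looks like in code: the hmtx table starts with a number of full records (unsigned 16 bit advance followed by signed 16 bit left side bearing, both big endian) and then continues with bare left side bearings for the remaining glyphs. The count of full records comes from the hhea table and the total glyph count from maxp. In the file format the shared advance for the bare entries is the advance of the last full record. The function names, the default_width and scale parameters and the sample numbers below are all made up for illustration; how the scale is obtained from x_scale is not shown because the exact conversion is not something I can state with confidence.

```python
import struct


def parse_hmtx(data: bytes, num_h_metrics: int, num_glyphs: int):
    """Return (advance_or_None, left_side_bearing) for every glyph.

    advance is None for glyphs that only store a left side bearing.
    """
    if num_h_metrics < 1:
        raise ValueError("hmtx needs at least one full record")
    metrics = []
    offset = 0
    # Full records: uint16 advance width + int16 left side bearing.
    for _ in range(num_h_metrics):
        advance, lsb = struct.unpack_from(">Hh", data, offset)
        metrics.append((advance, lsb))
        offset += 4
    # Remaining glyphs: int16 left side bearing only. Advancing the offset
    # by 2 here (not 4) is exactly the kind of thing that is easy to get wrong.
    for _ in range(num_glyphs - num_h_metrics):
        (lsb,) = struct.unpack_from(">h", data, offset)
        metrics.append((None, lsb))
        offset += 2
    return metrics


def resolve_advance(advance, default_width, scale):
    """Apply the rule above: explicit advances as-is, default divided by scale."""
    if advance is not None:
        return advance
    return default_width / scale


# Hypothetical table: two full records, then two glyphs with only a bearing.
table = struct.pack(">HhHhhh", 600, 10, 600, 20, 30, -5)
parsed = parse_hmtx(table, num_h_metrics=2, num_glyphs=4)
# default_width=1200 and scale=2.0 are invented numbers for the demo.
widths = [resolve_advance(adv, default_width=1200, scale=2.0) for adv, _ in parsed]
print(parsed)
print(widths)
```

With these invented numbers every glyph ends up 600 wide, which is what you want from a monospaced font.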

Then finally.

Addendum: Freetype

Freetype has many flags and knobs to specify whether you want metrics in original font coordinates or "output coordinates". I could not come up with a combination that would have provided consistent values, some postprocessing seems to always be needed. This might be a bug, it might be a limitation of the TrueType format, it might be something completely different. I don't really know, and I don't have the energy to dig further to uncover the underlying issue.
