PDF has a fairly straightforward model for rendering text. Basically you specify a font to use, define a transformation matrix if needed and specify the text string to render. The PDF renderer will then load the embedded font file, extract the curves needed for your letters and render them on canvas. And it works quite nicely.
Assuming you are using plain ASCII. Because 127 glyphs should be enough for everybody.
If, for some weirdo reason, you belong to that ~80% minority population of planet Earth whose native tongue uses characters beyond ASCII, you are not going to be in a happy place. The whole situation is nicely summarized in this Computerphile video.
It may be comforting to know that nothing has changed since 1993. Dealing with fonts and text in PDF is still very much a case of voodoo incantations and guesswork. It does not really help that phrases containing words like "PDF", "text", and "fonts" are completely ungoogleable because they find a million hits on people trying to create PDF files with existing tools as opposed how font generation works under the hood.
How does one create text with PDF?
What most people generating PDF would want is to be able to specify a font file, starting location and the text to render in UTF-8. Some people might want finer control over text rendering and instead provide an array of Unicode glyphs and coordinates. Cairo provides both of these. PDF provides neither. Even the ASCII subset PDF does provide is mostly useless by itself for the following reasons.
Let's start with font metadata. When you embed a font inside a PDF file you also need to provide the width of each character as a PDF array in the font specification dictionary. This data is exactly the same as is in the font file itself. The PDF renderer must know how to parse said files to get the glyph curves out but the specification still mandates that every PDF creator program must replicate this data. Don't believe me? This is what the PDF specification document says in subsection 9.3.4:
The width information for each glyph shall be stored both in the font dictionary and in the font program itself.
The two sets of widths shall be identical.
Why is this? A reasonable assumption is that the person who was tasked with adding TrueType font support at Adobe back in the day got lazy and just insisted that font embedders must provide this information in PDF metadata format because the code to process those was already implemented. But it gets worse. The glyph width data is specified with three items, FirstChar, LastChar and Widths. The last of these contains the widths of all glyphs between the two endpoints. This means that if you wanted to create a PDF document containing the single word παν語 (the stylised form of "Pango") then you'd need an array with some 35 thousand entries, all but four of which would be unused. You still need to define them, or at least, you would need to if PDF text worked this way. It does not. We'll come back to this later.
As a palette cleanser let's note that even though PDF documentation always speaks of glyph widths, this array does not contain glyph widths. What it actually contains is glyph advancements, but given that the PDF text model predates TrueType font technology we can let that one pass.
What we can't ignore, though, is kerning. If you want your text to look good, you have to do proper kerning and the font file has all the information necessary to do that. PDF does not read that (and, as far as I can tell, you can't make it do so) but instead it requires that you specify kerning inline in your text rendering command. This means that in practice you can't tell PDF to render more than one glyph of text and expect it to do the right thing. Instead you have to manually process each glyph and add kerning directives between each pair of characters as necessary.
The magic 255
Creating PDF text with glyph IDs and locations seems like a reasonable approach then. The PDF documentation even says that you can specify glyph ids directly with one of two different quoting methods: octal numbering or UTF-16BE values. This works fine until you try to specify a glyph number bigger than 255, at which point fails incomprehensibly. I spent days trying to debug this. There are several different indices that could be used, mapped to each other and so on. I could not for the life of me figure out a combination that worked. Every attempt to use glyph indices failed. There is no useful documentation for this, you basically have to read code for existing PDF creators and examine their output with a hex editor or bespoke tools. The various PDF validators I tried were not very helpful either because their error messages were of the type "this PDF file is not valid in some way lol ;-)".
Eventually I tried creating a LO document that had more than 255 unique glyphs and exported that. The generated PDF file had two different subsets of the used font. The first had 255 characters and the second one had the rest. Cairo's PDF output is roughly similar. This leads me to believe that, contrary to what the documentation implies, it is not possible to have more than 255 glyphs in a single font in PDF. Or, if it is possible, then the way you go about getting that done is so bizarrely complicated that nobody has managed to get it working. Instead what you have to do is to create font subsetting code from scratch that packs all used glyphs into blocks of 255, map each input glyph to these packed ids, and write a cmap file so that the PDF renderer can convert the packed glyph ids back to Unicode values. You can't even use Freetype to do the subsetting, since it does not provide font creation functionality, only reading and rendering. Instead you get to do binary data mangling by hand. In big endian. Obviously.
If someone reading this knows for sure whether all of the above is true or if there is a simpler way (converting text to curves by hand does not count), do let me know in the comments.