Wednesday, February 8, 2023

More PDF, C API and Python

After a whole lot of bashing my head against the desk I finally managed to find out what Acrobat reader's "error 14" means, I managed to make both font subsetting and graphics generation work. Which means you can now do things like this:

After this improving the code to handle full PDF graphics seems mostly to be a question of adding functions for all primitives in the PDF graphics model. The big unknown thing is PDF form support, of which I know nothing. Being a new design it probably is a lot less convoluted than PDF fonts were.


The code is a few thousand lines of C++ 20. It requires surprisingly few dependencies:

  • Fmt
  • Zlib
  • Libpng
  • Freetype
  • LittleCMS
Some of these are not actually necessary. Fmtlib will be in the standard library in C++23. Libpng is only used to load PNG images from disk. The library could require its users to load graphics themselves and pass images in as pixel arrays. Interestingly doing font subsetting requires parsing the raw data of TrueType files by hand, so Freetype is not strictly mandatory, though it does make some things easier.

The only things you'd actually need are Zlib and LittleCMS. If one wanted to support CCIT Group 4 compression for 1 bit images, you'd need a dependency on libtiff.

A plain C API

The unfortunate side of library development is that if you want your library to be widely used, you have to provide a plain C API. For PDF it's not all that bad as you can mostly copy what Cairo does as its C API is quite nice to use. You might want to design this early on as getting the C API as easy and reliable to use as possible has effects on how the internal architecture works. As an example you should make all objects independent of each other. If the end user has to do things like "be sure to destroy all objects of type X before calling function F on object Y", then, because this is C, they are going to get it wrong and cause segfaults (at best).

Python integration

Once you have the C API, though, you can do all sorts of fun things, such as using Python's ctypes module. It takes a bit of typing and drudgery, but eventually you can create a "dependencyless" Python wrapper. With it you can do this to create an empty PDF file:

o = PdfOptions()
g = PdfGenerator(b"python.pdf", o)

That's all you can do ATM, as these are the only methods exposed in the C API. Just implementing these made it very clear that the API is not good and needs to be changed.

No comments:

Post a Comment