Designing APIs is hard. Designing good APIs that future people will not instantly classify as "total crap" is even harder. There are typically many competing requirements such as:

API stability
ABI stability (if you are into that sort of thing, some are not)
Maximize the amount of functionality supported
Minimize the number of functions exposed
Make the API as easy as possible to use
Make the API as difficult as possible to use incorrectly (preferably it should be impossible)
Make the API as easy as possible to use from scripting languages

Recently I have been trying to create a proper API for PDF generation so let's use that as an example.

Cairo, simple but limited

The API that Cairo exposes is on the whole pretty good. It has a fair bit of functions, but only one main "painter", the Cairo context. Cairo is a general drawing library with many backends, but the drawing commands map very closely to the ones in PDF. This is probably because Cairo's drawing model is patterned after PostScript, which is almost the same as PDF. Having only one context type means that the users do not have to manually keep track of life times between different object types, which is the source of many C bugs.

This approach works nicely with Cairo but not so well if you want to expose the full functionality of PDF directly, specifically patterns. In PDF you can specify a "pattern object". The basic use case for it is if you need to draw a repeating shape, like a brick wall, by specifying how to draw a single tile and then telling the PDF interpreter to "fill in" the area you specify with this pattern. (Cairo also has pattern support which behaves mostly the same but is ideologically slightly different. We'll ignore those for the rest of this text.)

When defining a pattern you can use almost but not exactly the same drawing commands as when doing regular painting on page surfaces. There are also at least two different pattern types with slightly varying semantics. Since we want to expose PDF functionality directly, we need to have one function for each command, like pdf_draw_cmd_l(ctx, x, y) to draw a line. The question then becomes how does one expose all this as types and functions.

Keep everything in a single object

The simplest thing objectwise would be to keep everything in a single god object and have functions like pdf_draw_page_cmd_l, pdf_draw_pattern1_cmd_l and pdf_draw_pattern2_cmd_l. This is a terrible API because everything is smooshed together and you need to remember to finish patterns before using them. Don't do this.

Fully separate object types

Another approach is to make each concept their own separate type. Then you can have functions like pdf_page_cmd_l(page, x, y), pdf_pattern_cmd_l(pattern, x, y) and so on. This also makes it easy to prevent using commands that are not supported. If, say, a command called bob is not supported on patterns, then all you have to do is to not implement the corresponding function pdf_pattern_cmd_bob.

The big downside is that there are a lot of drawing commands in PDF and in this approach almost all of them need to be defined three times, once for each context type. Their implementations are identical, so they all need to call a fourth function or the code needs to be triplicated.

A common context class

One approach is to abstract this have a PaintContext class that internally knows whether it is used for page or pattern painting. This reduces the number of functions back to one. pdf_ctx_cmd_l(ctx, x, y). The main downside is that now it is possible to accidentally call a function that requires a page drawing context with a pattern drawing context and the type system will not stop you.

A second problem is that you can call the aforementioned bob command with a pattern context. The library needs to detect that and return an error code if it happens. What this means is that a bunch of functions that previously could not fail, can now return error codes. For consistency you might want to change all paint commands to return error codes instead, but then >90% of them never return anything except success.

A common base class

The "object oriented" way of doing this would be to have a common base class for the painting functionality and then inherit that. In this approach functions that can take any context would have names like pdf_ctx_cmd_l(ctx, x, y) wheres functions that don't get specializations like pdf_page_cmd_bob. Since C does not have any OO functionality this would need to be reimplemented from scratch, probably using some Gobject-style preprocessor macro hackery like pdf_ctx_cmd_l(PDF_CTX(page), x, y) or alternatively pdf_ctx_cmd_l(pdf_page_get_ctx(page), x, y). This works, but means a lot of typing for end users and macros are type unsafe even by C standards. If you use the wrong type, woe is you. Macros make providing wrappers harder because they require you to always compile some glue code rather than using something simple like Python's ctypes.

Is there a way to cheat?

I have not managed to come up with a way. Do let me know if you do.

3 comments:

RemcoFebruary 14, 2023 at 1:05 PM
I guess a Kobayashi-style solution would be to implement the library in a different programming language, one with an expressive type system, and then (auto)generate the most optimal C-API as a downstream of that. You don't have to care about code duplication in the API, because it is generated anyway.
GonzusApril 19, 2023 at 6:12 PM
I would go the "fully separate object types" route, because it is what makes the API usage the safest / simplest. True, the implementation becomes messier, but that can be hidden and only affects the implementer; you could even use macros to hide the triple implementation. "The Needs of the Many Outweigh the Needs of the Few" and all that.

Nibble Stew

Monday, February 13, 2023

Plain C API design, the real world Kobayashi Maru test