Monday, December 23, 2024

CapyPDF 0.14 is out

I have just released version 0.14 of CapyPDF. This release has a ton of new functionality. So much, in fact, that I don't even remember all of it. The reason is that CapyPDF is actually starting to see real world usage, specifically as the new color managed PDF exporter for Inkscape. This has required a lot of refactoring work in the color code of Inkscape proper. That work has been done mostly by Doctormo, who has several videos on the issue.

The development cycle has consisted mostly of him reporting missing features like "specifying page labels is not supported", "patterns can be used for fill, but not for stroke" and "loading CMYK TIFF images with embedded color profiles does not work" and me then implementing said features or finding out how setjmp/longjmp actually works and debugging corrupted stack traces when it doesn't.

Major change coming in the next version

The API for CapyPDF is not stable, but in the next release it will be extra unstable. The reason is C strings. Null terminated UTF-8 strings are a natural text format for PDF, as strings in PDF must not contain the zero glyph. Thus there are many functions like this in the public C API:

void do_something(const char *text);

This works and is simple, but there is a common use case it can't handle. All strings must be zero terminated, so you can't point to the middle of an existing buffer, because the part you want is not guaranteed to be followed by a zero byte. Thus you always have to make a copy of the text you want to pass. In other words, you can't use C++'s string_view (or any equivalent string) as a source of text data. The public API should support this use case.
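To make the problem concrete, here is a minimal sketch of what a caller has to do today when the text lives in the middle of a larger buffer. The names do_something_zero_terminated and do_something_with_slice are made up for this illustration and are not part of CapyPDF. The stand-in returns the number of bytes it saw only so that the effect is observable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an API function that only accepts
   zero terminated strings. Returns the number of bytes it read. */
size_t do_something_zero_terminated(const char *text) {
    assert(text != NULL);
    return strlen(text);
}

/* Pass the len bytes starting at start to the function above.
   The slice is not zero terminated, so it has to be copied into
   a temporary buffer that is. Returns 0 if the allocation fails. */
size_t do_something_with_slice(const char *start, size_t len) {
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return 0;
    }
    memcpy(copy, start, len);
    copy[len] = '\0';
    size_t bytes_seen = do_something_zero_terminated(copy);
    free(copy);
    return bytes_seen;
}
```

Given the buffer "Hello, world", passing buffer + 7 directly hands over everything up to the terminator ("world"), while getting only the first three bytes of that ("wor") requires the allocation and copy in do_something_with_slice.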

Is this premature optimization? Maybe. But it is also a usability issue, as string views seem to be fairly common nowadays. There does not seem to be a perfect solution, but the best one I managed to crib seems to be to do this:

void do_something(const char *text, int32_t len_or_negative);

If the last argument is positive, use it as the length of the buffer. If it is negative, then treat the char data as a zero terminated plain string. This requires changing all functions that take strings and makes the API more unpleasant to use.
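Internally, each such function would first have to turn the pair of arguments into an actual byte count. A rough sketch of that step could look like the following. The helper name resolve_text_length is hypothetical, and treating a length of zero as an explicitly empty string is my assumption here, since the rule above only talks about positive and negative values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a (text, len_or_negative) pair
   into the number of bytes that should be read from text. */
size_t resolve_text_length(const char *text, int32_t len_or_negative) {
    assert(text != NULL);
    if (len_or_negative < 0) {
        /* Negative: the caller promises a zero terminated string. */
        return strlen(text);
    }
    /* Zero or positive: explicit length, no terminator required.
       Accepting zero as "empty" is an assumption of this sketch. */
    return (size_t)len_or_negative;
}
```

With this in place a caller holding a string view passes its data pointer and size, while a caller with a plain C string passes a negative value and skips computing the length by hand.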

If someone has an idea for a better API, do post a comment here.

1 comment:

  1. For string handling, you can take inspiration from GLib, for example GString:

    https://docs.gtk.org/glib/method.String.append_len.html
    https://docs.gtk.org/glib/method.String.append.html

    Only -1 is allowed by the API, not all negative values. It doesn't make sense to call the function with -2 for instance. The negative value can only come from a literal number, or it is passed along as-is from a variable.

    About the int32_t type: GLib uses gssize instead in such situations ("signed size").

    > corresponding to the ssize_t defined in POSIX or the similar SSIZE_T in Windows.
