Thursday, June 29, 2023

PDF and embedded videos

PDF has supported playing back video content since version 1.5. I could do the whole shtick and shpiel routine of "surely this is working technology as the specification is over 20 years old by now". But you already know that it is not the case. Probably you came here just to see how badly broken things are. So here you go:

Time specification (or doing pointlessly verbose nested dictionaries before it was cool)

A video in PDF is a screen annotation. It is defined with a PDF object dictionary that houses many different subdictionaries. In addition to playing back the entire video you can also specify to play back only a subsection of it. To do the latter you need an annotation dictionary with an action subdictionary of subtype rendition, which has a subdictionary of type rendition (no, that is not a typo), which has a subdictionary of a mediasource, which has three subdictionaries: a MediaClip and the start and end times.
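To get a feel for how deep that nesting goes, here is a rough Python sketch of the chain described above. The key names ("action", "rendition", "media_source" and so on) are descriptive labels made up for this illustration, not the actual keys from the PDF specification, and the end time of 40.0 seconds is a made-up value.

```python
# Descriptive stand-in for the nesting described in the text.
# Key names are hypothetical labels, NOT the real PDF dictionary keys.
annotation = {
    "action": {                      # action subdictionary of subtype rendition
        "rendition": {               # subdictionary of type rendition
            "media_source": {
                "media_clip": {},    # the clip itself, contents omitted
                "start": {"S": "T", "T": {"S": "S", "V": 33.4}},
                "end": {"S": "T", "T": {"S": "S", "V": 40.0}},  # made-up value
            }
        }
    }
}


def depth(value):
    """Return how many dictionary levels are nested inside value."""
    if not isinstance(value, dict):
        return 0
    return 1 + max((depth(v) for v in value.values()), default=0)


print(depth(annotation))  # prints 6
start = annotation["action"]["rendition"]["media_source"]["start"]
print(start["T"]["V"])    # prints 33.4
```

Six levels of dictionaries just to reach a single floating point number.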

There are three different ways of specifying the start and end times: time (in seconds), frames or bookmarks. We'll ignore the last one and look at frames first. As most people are probably not familiar with the PDF dictionary syntax, I'm going to write those out in JSON. The actual content is the same, it is just written in a different syntax. Here's what it looks like:

  "B": {
    "S": "F",
    "F": 1000
  }

Here we define the start time "B", which is a MediaOffset. It has a subtype that uses frames ("S" is "F") and finally we have the frame number of 1000. Thus you might expect that specifying the time in seconds would mean changing the value of S to T and the key–value pair "F" -> 1000 to something like "V" -> 33.4. Deep in your heart you know that not to be true, because the required format is this:

  "B": {
    "S": "T",
    "T": {
      "S": "S",
      "V": 33.4
    }
  }

Expressing the time in seconds requires a whole new dictionary just to hold the number. The specification says that the value of that dictionary's "S" must always be "S". Kinda makes you wonder why they chose to do this. Probably to make it possible to add other time formats in the future. But in the 20 years since this specification was written no such functionality has appeared. Even if it did, you could easily support it by adding a new value type in the outer dictionary (something like "T2" in addition to "T").
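For those who want to see what the two variants look like in actual PDF dictionary syntax, here is a minimal Python sketch that converts the JSON-style structures above. The helper name to_pdf is made up for this post, and it assumes that every string value is a PDF name object (written with a leading slash), which holds for these two dictionaries but not for PDF in general.

```python
def to_pdf(value):
    """Serialize nested dicts, strings and numbers into PDF object syntax.

    Assumption: every string is a PDF name (e.g. "F" becomes /F).
    Real PDF also has string objects, arrays, references and so on.
    """
    if isinstance(value, dict):
        inner = " ".join(f"/{key} {to_pdf(val)}" for key, val in value.items())
        return f"<< {inner} >>"
    if isinstance(value, str):
        return f"/{value}"
    if isinstance(value, (int, float)):
        return str(value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


frames_offset = {"S": "F", "F": 1000}
seconds_offset = {"S": "T", "T": {"S": "S", "V": 33.4}}

print(to_pdf(frames_offset))   # << /S /F /F 1000 >>
print(to_pdf(seconds_offset))  # << /S /T /T << /S /S /V 33.4 >> >>
```

The extra dictionary for the seconds case is just as redundant in native syntax as it is in JSON.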

Most people probably have heard the recommendation of not doing overly general solutions to problems you don't have yet. This is one of the reasons why.

Does it work in practice?

No. Here's what happened when I tried using a plain h264 MP4 file (which is listed as a supported format on Adobe's site and which plays back in Windows and macOS using the system's native video player).


Instead of a video, Okular displays a screenshot of the user desktop with additional rendering glitches. Those are probably courtesy of bugs in either DRM code or the graphics card's device driver.


Evince shows the first frame of the video and even a player GUI, but clicking on it does nothing and the video can't be played.

Apple Preview

Displays nothing. Shows no error message.


Displays nothing. Shows no error message.


Displays nothing. Shows no error message.

Acrobat Reader, macOS

Plays back videos correctly but only if you play them back in their entirety. If you specify a subrange it instead prints an error message saying that the video can't be presented "in the way the author intended" and does nothing.

Acrobat Reader, Windows

Video playback works. Even the subrange file that the macOS version failed on plays and the playback is correctly limited to the time span specified.


Video playback is always wonky. Either the video is replaced with a line visualisation of the audio track or the player displays a black screen until the video stream hits a key frame and plays correctly after that.

In conclusion

Out of the major PDF implementations, 100% are broken in one way or another.

Friday, June 23, 2023

PDF subpage navigation

A common presentation requirement is that you want to have a list of bullet points that appear one by one as you click forward. Almost all PDF presentations that do this fake it by having multiple pages, one for each state. So if you have a presentation with one page and five bullet points, the PDF has six pages, one for the empty state and a further one for each bullet point appearing.
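The faking approach is simple enough to spell out in a few lines of Python. This is only an illustrative sketch (the function name page_states and the bullet texts are made up), showing which bullets are visible on each of the generated pages:

```python
def page_states(bullets):
    """Return one list of visible bullets per generated PDF page.

    The first page is the empty state, each later page adds one bullet.
    """
    return [bullets[:count] for count in range(len(bullets) + 1)]


bullets = ["one", "two", "three", "four", "five"]
pages = page_states(bullets)
print(len(pages))  # prints 6
print(pages[2])    # prints ['one', 'two']
```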

This need not be so. The PDF specification has a feature called subpage navigation (see the PDF 2.0 spec). This kind of makes you wonder. Why are people not using this clearly useful functionality? After trying to implement it in CapyPDF the answer became obvious. Quite quickly.

While most of the PDF specification is fairly readable, this section is not. It is very confusing. Once you get over the initial bafflement you realize that the spec is actually self-contradictory. I was so badly confused that eventually I filed a bug report against the PDF specification itself.

That does not really help in the short term so I did the only thing that can be done under these circumstances, namely looking at how existing software handles the issue. The answer turned out to be: they don't. The only PDF viewer that does anything with subpage navigation is Acrobat Reader. Even more annoyingly just getting your hands on a PDF document that has subpage navigation defined is next to impossible. None of the presentation software I tried exports subpage navigation tags. Trying to use a search engine leads to disappointment as well. They won't let you search for "a PDF whose internal data structures contain the word PresSteps" but instead will helpfully search for "PDF files whose payload text contains the word PresSteps". As you might expect this finds various versions of the PDF specification and nothing much else.

The most probable reason why PDF subpage navigation is not used is that nobody else is using it.

How does it actually work then?

Nobody knows for real. But after feeding Acrobat Reader a bunch of sample documents and seeing what it actually does we can create a hypothesis.

The basic operations used are optional content groups and navigation nodes. The former are supported by all PDF viewers, the latter are not. Basically each navigation node represents a state and when the user navigates forwards or backwards the PDF viewer runs a specified action that can be used to either hide or display an optional content group. A reasonable assumption for getting this working, then, would be to set the default visibility of all optional content groups to hidden and then have transitions that enable the bullet points one by one.

That does not work. Instead you have to do this:

Writing arbitrary state machines as graphs just for appearing bullet points? We're in the big leagues now! The reason the root node exists is that when you enter a page, the root node's transition is performed automatically (in one part of the spec but not another, hence the self-inconsistency). Note especially how there is no forward navigation for state 2 or backwards navigation for state 0. This is intentional, that is how you get the subpage navigation algorithm to leave the current page. I think. It kinda works in Acrobat Reader but you have to press the arrow key twice to leave the page.
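As a toy model of that hypothesis, here is a small Python simulation. Everything in it is an assumption made for illustration: the node names, the two bullet groups, and which action hangs off which node are guesses that match the description above (a root node, states 0 to 2, no forward link from state 2, no backward link from state 0). It is not a transcription of the spec or of what any viewer does internally.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class NavNode:
    # Hypothetical model of a navigation node: where to go and what to toggle.
    next: Optional[str] = None
    prev: Optional[str] = None
    next_action: Optional[Tuple[str, str]] = None  # ("show" or "hide", group)
    prev_action: Optional[Tuple[str, str]] = None


NODES = {
    "root": NavNode(next="s0"),
    "s0": NavNode(next="s1", next_action=("show", "bullet1")),
    "s1": NavNode(next="s2", prev="s0",
                  next_action=("show", "bullet2"),
                  prev_action=("hide", "bullet1")),
    "s2": NavNode(prev="s1", prev_action=("hide", "bullet2")),
}


class Page:
    def __init__(self, nodes):
        self.nodes = nodes
        self.visible = set()   # optional content groups currently shown
        self.current = "root"
        self.step(forward=True)  # root transition runs on page entry

    def step(self, forward):
        """Move one state; return False when navigation leaves the page."""
        node = self.nodes[self.current]
        target = node.next if forward else node.prev
        if target is None:
            return False
        action = node.next_action if forward else node.prev_action
        if action is not None:
            verb, group = action
            if verb == "show":
                self.visible.add(group)
            else:
                self.visible.discard(group)
        self.current = target
        return True


page = Page(NODES)
print(page.current)        # prints s0
page.step(forward=True)
page.step(forward=True)
print(sorted(page.visible))      # prints ['bullet1', 'bullet2']
print(page.step(forward=True))   # prints False, i.e. go to the next page
```

Since the graph is just data, nothing stops you from wiring the links into loops or branches instead of a straight line.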

On the other hand you could implement a full choose-your-own-adventure book as a single page PDF using only subpage navigation.

Wednesday, June 14, 2023

Functionality currently implemented in CapyPDF

CapyPDF has a fair bit of functionality and it might be difficult to tell from the outside what works and what does not. Here is a rough outline of implemented functionality.

In the public C API (and Python)

  • Basic draw commands in RGB, gray and CMYK
  • ICC profile support
  • Loading PNG, JPG and TIFF images (including CMYK TIFFs)
  • Embedding JPG files directly without unpacking first
  • Using images as a paint mask
  • Handling color profiles embedded in images
  • Using builtin PDF fonts
  • Using TrueType fonts with font subsetting
  • All PDF font operators like extra padding, raise/lower and set as clipping path
  • Page transitions

Implemented but not exposed

The items listed here vary in implementation completeness from "sort of ready, just not exposed yet" to "the Mmest of MVP". You can only get to them by poking at the innards directly.

  • Document navigation tree
  • Overprinting
  • Structured and annotated PDF
  • Additional color channels (called separations in the PDF spec)
  • Form XObjects
  • File embedding
  • Annotations (only a few types)
  • L*a*b* color space support
  • ICC colors in primitive paint operations
  • Type 2, 3 and 4 shadings (i.e. gradients)
  • Color patterns

Saturday, June 10, 2023

A4PDF has been renamed to CapyPDF

As alluded to in the previous post, A4PDF has changed its name. The new project name is CapyPDF. The name refers to capybaras.

Original picture from here. I was in the process of drawing a proper mascot logo, but work on that stalled. Hopefully it'll get done at some point.

There is a new release, but it does not provide new functionality, just the renaming and some general code cleanups.