Thursday, June 29, 2023

PDF and embedded videos

PDF supports playing back video content since version 1.5. I could do the whole shtick and shpiel routine of "surely this is working technology as the specification is over 20 years old by now". But you already know that it is not the case. Probably you came here just to see how badly broken things are. So here you go:

Time specification (or doing pointlessly verbose nested dictionaries before it was cool)

A video in PDF is a screen annotation. It is defined with a PDF object dicrionary that houses many a different subdictionaries as keys. In addition to playing back the entire video you can also specify to play back only a subsection of it. To do the latter you need an annotation dictionary with an action subdictionary of subtype rendition, which has a subdictionary of type rendition (no, that is not a typo), which has a subdictionary of a mediasource, which has three subdictionaries: a MediaClip and the start and end times.

There are three different ways of specifying the start and end times: time (in seconds), frames or bookmarks. We'll ignore the last one and look at frames first. As most people are probably not familiar with the PDF dictionary syntax, I'm going to write those out in JSON. The actual content is the same, it is just written in a different syntax. Here's what it looks like:

{
  "B": {
    "S": "F",
    "F": 1000
  }
}

Here we define the start time "B", which is a MediaOffset. It has a subtype that uses frames ("S" is "F") and finally we have the frame number of 1000. Thus you might expect that specifying the time in seconds would mean changing the value of S to T and the key–value pair "F" -> 1000 to something like "V" -> 33.4. Deep in your heart you know that not to be true, because the required format is this.

{
  "B": {
    "S": "T",
    "T": {
      "S": "S",
      "V": 33.4
    }
  }
}

Expressing the time in seconds requires an entire whole new dictionary just to hold the number. The specification says that the value of that dictionary's "S" must always be "S". Kinda makes you wonder why they chose to do this. Probably to make it possible to add other time formats in the future. But in the 20 years since this specification was written no such functionality has appeared. Even if it did, you could easily support it by adding a new value type in the outer dictionary (something like "T2" in addition to "T").

Most people probably have heard the recommendation of not doing overly general solutions to problems you don't have yet. This is one of the reasons why.

Does it work in practice?

No. Here's what happens when I tried using a plain h264 MP4 file (which is listed as a supported format on Adobe's site and which plays back in Windows and macOS using the system's native video player).

Okular


Instead of a video, Okular displays a screenshot of the user desktop with additional rendering glitches. Those are probably courtesy of bugs in either DRM code or the graphic card's device driver.

Evince

Evince shows the first frame of the video and even a player GUI, but clicking on it does nothing and the video can't be played.

Apple Preview

Displays nothing. Shows no error message.

Firefox

Displays nothing. Shows no error message.

Chromium

Displays nothing. Shows no error message.

Acrobat Reader, macOS

Plays back videos correctly but only if you play it back in its entirety. If you specify a subrange it instead prints an error message saying that the video can't be presented "in the way the author intended" and does nothing.

Acrobat Reader, Windows

Video playback works. Even the subrange file that the macOS version failed on plays and the playback is correctly limited to the time span specified.

But!

Video playback is always wonky. Either the video is replaced with a line visualisation of the audio track or the player displays a black screen until the video stream hits a key frame and plays correctly after that.

In conclusion

Out of the major PDF implementations, 100% are broken in one way or another.

No comments:

Post a Comment