Friday, June 23, 2023

PDF subpage navigation

A common presentation requirement is that you want to have a list of bullet points that appear one by one as you click forward. Almost all PDF presentations that do this fake it by having multiple pages, one for each state. So if you have a presentation with one page and five bullet points, the PDF has six pages, one for the empty state and a further one for each bullet point appearing.

This need not be so. The PDF specification has a feature called subpage navigation (PDF 2.0 spec, 12.4.4.2). This kind of makes you wonder. Why are people not using this clearly useful functionality? After trying to implement it in CapyPDF the answer became obvious. Quite quickly.

While most of the PDF specification is fairly readable, this section is not. It is very confusing. Once you get over the initial bafflement you realize that the spec is actually self-contradictory. I was so badly confused that eventually I filed a bug report against the PDF specification itself.

That does not really help in the short term so I did the only thing that can be done under these circumstances, namely looking at how existing software handles the issue. The answer turned out to be: they don't. The only PDF viewer that does anything with subpage navigation is Acrobat Reader. Even more annoyingly just getting your hands on a PDF document that has subpage navigation defined is next to impossible. None of the presentation software I tried exports subpage navigation tags. Trying to use a search engine leads to disappointment as well. They won't let you search for "a PDF whose internal data structures contain the word PresSteps" but instead will helpfully search for "PDF files whose payload text contains the word PresSteps". As you might expect this finds various versions of the PDF specification and nothing much else.

The most probable reason why PDF subpage navigation is not used is that nobody else is using it.

How does it actually work then?

Nobody knows for real. But after feeding Acrobat Reader a bunch of sample documents and seeing what it actually does we can create a hypothesis.

The basic operations used are optional content groups and navigation nodes. The former are supported by all PDF viewers, the latter are not. Basically each navigation node represents a state and when the user navigates forwards or backwards the PDF viewer runs a specified action that can be used to either hide or display an optional content group. A reasonable assumption for getting this working, then, would be to set the default visibility of all optional content groups to hidden and then have transitions that enable the bullet points one by one.

That does not work. Instead you have to do this:

Writing arbitrary state machines as graphs just to for appearing bullet points? We're in the big leagues now! The reason the root node exists is that when you enter a page, the root node's transition is performed automatically (in one part of the spec but not another, hence the self-inconsistency). Note especially how there is no forward navigation for state 2 or backwards navigation for state 0. This is intentional, that is how you get the subpage navigation algorithm to leave the current page. I think. It kinda works in Acrobat Reader but you have to press the arrow key twice to leave the page. 

On the other hand you could implement a full choose-your-own-adventure book as a single page PDF using only subpage navigation.

No comments:

Post a Comment