Sunday, April 16, 2023

PDF forms, the standard that seemingly isn't

Having gotten the basic graphical output or A4PDF working I wanted to see if I could make PDF form generation work.

This was of course a terrible idea but sadly I lacked foresight.

After a lot of plumbing code it was time to start defining form widgets. I chose to start simple and create a form with a single togglable check button. This does not seem like an impossibly difficult problem and the official PDF specification even has a nice code sample for this:

The basic idea is simple. You define the "widget" and give it two "state objects" that contain PDF drawing operations. The idea is that the PDF renderer will draw one of the two on top of the basic PDF document depending on whether the checkbox is toggled or not. The code above sets things up so that the appearance of the checkbox is one of two different DingBat symbols. Their values are not shown in the specification, but presumably they are a checked box and an empty square.

I created a test PDF with LibreOffice's form designer and then set about trying to recreate it. LO's form generator uses OpenSymbol for the checked status of the checkbox and an empty appearance for the off state. A4PDF uses the builtin Helvetica "X" character.The actual files can be downloaded here.

What we have here is a failure to communicate

No matter how much I tried I could not make form generation actually work. The output was always broken in weird ways I could not explain. Unfortunately this part of the PDF spec is not very helpful, because it does not give out full examples, only snippets, and search engines are worthless at finding any technical content when using "PDF" as a search term. It may even be that information about this is not available in public web sites. Who knows?

Anyhow, when regular debugging does not work, it's time approach things sideways. Let's start by opening the LibreOffice test document with Okular:

This might seem to be working just fine, but people with sharp eyes might notice a problem. That check mark is not from OpenSymbol. FWICT it is the standard Qt checkbox widget. Still, the checkbox works and its appearance is passable. But what happens if you increase the zoom level?

Oh dear. That "Yes" text is the PDF-internal label given to the "on" state. Why is it displayed? No idea. It's time to bring out the heavy guns and see how things work in The Gold Standard of PDF Rendering, Adobe Reader.

Nope, that's not the OpenSymbol checkmark either. Adobe Reader seems to be ignoring the spec and drawing its own checkmarks instead. After seeing this I knew I had to try this on every PDF renderer I could reasonably get my hands on. Here's the results:

  • Okular
    • LO: incorrect appearance, breaks when zooming
    • A4PDF: shows both the "correct" checkmark as well as the Qt widget on top of each other, takes a noticeable amount of time after clicking until the widget state is updated
  • Evince
    • LO: does not respond to clicks
    • A4PDF: works correctly
  • Adobe Reader win64
    • LO: incorrect appearance
    • A4PDF: incorrect appearance, does not always respond to button clicks
  • Firefox
    • LO: Incorrect appearance
    • A4PDF: Incorrect appearance
  • Chromium
    • LO: Incorrect appearance
    • A4PDF: works correctly
  • Apple Preview:
    • LO: works correctly (though the offset is a bit wonky, probably an issue in the drawing commands themselves)
    • A4PDF: works correctly

The only viewer that seems to be working correctly in all cases is Apple Preview.

PDF has a ton of toggleable flags and the like to make things invisible when printing and so on. It is entirely possible that the PDF files are "incorrect" in some way. But still, either the behaviour should either be the same on all viewers or they should report format errors. No errors are reported, though, even by this online validator.

No comments:

Post a Comment