Saturday, April 29, 2023

The unbearable tightness of printing

Let's say you want to print a full colour comic book in the best possible quality. For simplicity we'll use this image as an example.

As you can probably guess, just putting this image in a PDF does not work, even if it had sufficient resolution. Instead what you need to do is to create two images. One for linework that is monochrome and has at least 600 PPI and one for colours, which is typically a 300 PPI colour managed CMYK TIFF.

The colour image is drawn first and then the monochrome image is drawn on top of it. In this way you get both smooth colours and crisp linework. Most people would stop here, but this is where the actual work begins. It is also where things start to wander into undocumented (or, rather, "implementation defined") territory.

Printing in true black

In computer monitors the blackest colour possible is when all colour components are off, or (0, 0, 0) in RGB values. Thus you might expect that the blackest CMYK colour is either (0, 0, 0, 1) or (1, 1, 1, 1). Surprisingly it is neither. The former looks grayish when printed whereas the latter can't be printed at all because of physical limitations. If you put too much ink in one place on the page, the underlying paper gets too wet, warps and might even rip. And tear.

Instead what you need to do is to use a colour called rich black. Each print shop has their own values for this, as the exact amount of inks to use to get the deepest black colour is dependent on the inks, paper and printing machine used. We'll use the value (0.1, 0.1, 0.1, 1.0) for rich black in this text.
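The ink limit argument can be made concrete with a bit of arithmetic. The sketch below sums the four ink fractions into a "total ink coverage" percentage. The 300% limit is a made-up placeholder, not a value from any real print shop, and the function names are only for illustration.

```python
# Total ink coverage: the sum of the four ink fractions, as a percentage.
# ASSUMPTION: the 300% default limit is a placeholder. Ask your print shop
# for the real limit for their paper, inks and press.

PLAIN_BLACK = (0.0, 0.0, 0.0, 1.0)
ALL_INKS = (1.0, 1.0, 1.0, 1.0)
RICH_BLACK = (0.1, 0.1, 0.1, 1.0)  # the example value used in this text


def total_ink_coverage(cmyk):
    """Return the total ink coverage in percent for CMYK components in 0..1."""
    return sum(cmyk) * 100.0


def within_limit(cmyk, limit_percent=300.0):
    """Return True if the colour stays at or under the (hypothetical) ink limit."""
    return total_ink_coverage(cmyk) <= limit_percent


if __name__ == "__main__":
    for name, colour in [("plain black", PLAIN_BLACK),
                         ("all inks", ALL_INKS),
                         ("rich black", RICH_BLACK)]:
        print(name, round(total_ink_coverage(colour)), within_limit(colour))
```

With these numbers plain black is 100%, all inks at full is 400% and the example rich black lands at 130%.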

Thus we need three different images rather than two.

First the colour image is laid down, then the image holding the areas that should be printed in rich black. This is a 300 PPI colour image with the colour value (0.1, 0.1, 0.1, 0) on pixels that should be painted with rich black. Finally the line work is drawn on top of the other two. The first two images can be combined into one. This is usually done by graphic artists when preparing their artwork to print. However the middle image can be automatically generated from the linework image with some Python so we're doing that to reduce manual work and reduce the possibility of human error.
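As a rough illustration of how the middle image could be generated, here is a dependency-free sketch that works on plain nested lists rather than real image files. It is not the script used for this post. The choice to mark a 300 PPI pixel only when all four of its 600 PPI linework pixels are inked is an assumption made here so that the under-colour never pokes out from under the lines.

```python
# Sketch: derive the rich black under-colour layer from 1-bit linework.
# Linework is a list of rows, each a list of 0/1 where 1 means "black ink".

UNDERCOLOUR = (0.1, 0.1, 0.1, 0.0)  # the CMY part of the example rich black
NO_INK = (0.0, 0.0, 0.0, 0.0)


def downsample_2x(linework):
    """Halve the resolution (e.g. 600 PPI to 300 PPI).

    ASSUMPTION: an output pixel is set only if all four source pixels are set.
    A trailing odd row or column is dropped.
    """
    height, width = len(linework), len(linework[0])
    return [
        [
            1 if all(linework[y + dy][x + dx] for dy in (0, 1) for dx in (0, 1)) else 0
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]


def rich_black_layer(mask):
    """Turn a 0/1 mask into rows of CMYK tuples for the under-colour image."""
    return [[UNDERCOLOUR if pixel else NO_INK for pixel in row] for row in mask]


if __name__ == "__main__":
    linework = [
        [1, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ]
    for row in rich_black_layer(downsample_2x(linework)):
        print(row)
```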

If you create a PDF with these images you are still not done. In fact the output would be identical to the previous setup. There are still more quirks to handle.

Trapping and overprinting

Since all of the colours are printed separately they are susceptible to misregistration. That is, the various colours might shift relative to each other during the printing process. This causes visual artifacts in the edges between two colours. This is a fairly complicated topic and Wikipedia has more details. This issue can be fixed by trapping, that is, "spreading" the colour under the "edge" of the linework. Like so:

If you look closely at the middle image, the gray area is slightly smaller than in the previous picture. This shrunk image can be automatically generated from the linework image with morphological erode/dilate operations. Now we have everything needed to print things properly, but if you actually try it it still won't work.
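For reference, a binary erosion can be sketched in a few lines of plain Python. A real pipeline would more likely use an image library. The square neighbourhood and the radius of one pixel are assumptions picked for illustration, since the right amount of shrinkage depends on the press.

```python
# Sketch: binary erosion of a 0/1 mask with a square structuring element.
# A pixel survives only if every pixel within `radius` of it is also set.
# Pixels outside the image count as unset, so shapes touching the border shrink too.


def erode(mask, radius=1):
    """Return a copy of `mask` shrunk by `radius` pixels in every direction."""
    height, width = len(mask), len(mask[0])

    def is_set(y, x):
        return 0 <= y < height and 0 <= x < width and mask[y][x] == 1

    offsets = range(-radius, radius + 1)
    return [
        [
            1 if all(is_set(y + dy, x + dx) for dy in offsets for dx in offsets) else 0
            for x in range(width)
        ]
        for y in range(height)
    ]


if __name__ == "__main__":
    solid = [[1] * 5 for _ in range(5)]
    for row in erode(solid):
        print(row)
```

Eroding a solid 5 by 5 block with radius 1 leaves a 3 by 3 block in the middle.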

The way the PDF imaging model works is that if you draw on the canvas with any colour, all colour channels of the existing colour on the page get affected. That is, if the existing colour on the canvas is (0.1, 0.1, 0.1, 0) and you draw on top of it with (0, 0, 0, 1) the output is (0, 0, 0, 1). All the work we did getting the proper rich black colour under the linework gets erased as if it was never there.

PDF has a feature called overprinting to handle this exact case (you could also use the "multiply" blend mode, but it requires the use of transparency, which is still prohibited in some workflows). It does pretty much what it says on the tin. When overprinting is enabled any draw operations accumulate over the existing inks. Thus the final step is to enable overprinting for the final line work image and then Bob's your uncle?
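The difference between the two behaviours can be modelled for a single pixel. This is a simplified toy model based on my reading of the specification for DeviceCMYK with overprint mode 1, where a source component of zero leaves that ink untouched. It is not how any particular renderer is implemented.

```python
# Toy model of painting one CMYK pixel onto an existing one.


def paint_knockout(backdrop, source):
    """Default behaviour: the new colour replaces every channel."""
    return tuple(source)


def paint_overprint_nonzero(backdrop, source):
    """Overprint with overprint mode 1 (simplified).

    Channels where the source is zero keep the ink that is already there,
    non-zero channels are replaced by the source value.
    """
    return tuple(s if s != 0 else b for b, s in zip(backdrop, source))


if __name__ == "__main__":
    under = (0.1, 0.1, 0.1, 0.0)  # rich black under-colour already on the page
    line = (0.0, 0.0, 0.0, 1.0)   # linework painted in plain black
    print(paint_knockout(under, line))
    print(paint_overprint_nonzero(under, line))
```

In this model the first call yields (0, 0, 0, 1), losing the under-colour, while the second yields (0.1, 0.1, 0.1, 1), which is the rich black we were after.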

In theory yes. In practice lol no, because this part of the PDF specification is about as hand-wavy as things go. There are several toggles that affect how overprinting gets handled. What they actually do is only explained in descriptive text. One of the outcomes of this is that every single generally available PDF viewer renders the output incorrectly. Poppler, Ghostscript, Apple Preview and even Adobe Acrobat Reader all produce outputs that are incorrect in different ways. They don't even warn you that the PDF uses overprinting and that the output might be incorrect. This makes development and debugging this use case somewhat challenging.

The only way to get correct output is to use Adobe Acrobat Pro and tell it to enable overprint simulation. Fortunately I have a friend who has a 10 year old version (remember, back when you could actually buy software once and keep using it as opposed to a monthly license that can get yanked at any time?). After pestering him with an endless flow of test PDFs I finally managed to work out the exact steps needed to make this work:

Create a 300 PPI image with the colours, a 300 or 600 PPI monochrome image with the rich black areas and a 600 PPI monochrome image for the linework (the rich black image can be autogenerated from the linework image and/or precomposited in the colour image)
Load and draw the colour image as usual
Load the rich black image and store it as a PDF ImageMask rather than a plain image
Set nonstroke colour to (0.1, 0.1, 0.1, 0), set the rich black image as a stencil and fill it
Load the linework image as an ImageMask
Enable overprinting mode
Set overprinting mode to 1
Set nonstroke colour to (0, 0, 0, 1)
Draw the line image as a stencil
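To give an idea of what the steps above might look like at the PDF operator level, here is a sketch that builds the relevant fragments as text. The resource names (/Colour, /RichBlack, /Linework, /OverprintGS) are invented for this example, and mapping "enable overprinting" to the /op entry is my interpretation of the steps, not something confirmed by A4PDF's output. A real file also needs the image streams, a resource dictionary and the rest of the PDF structure.

```python
# Sketch: page content stream and graphics state for the steps listed above.
# ASSUMPTION: all resource names are hypothetical and must be declared in the
# page's /Resources dictionary by whatever writes the actual PDF.


def overprint_ext_gstate():
    """Graphics state dictionary: overprint for fills, overprint mode 1."""
    return "<< /Type /ExtGState /op true /OPM 1 >>"


def image_mask_entries(width, height):
    """Dictionary entries that make an image XObject a 1-bit stencil mask."""
    return (f"/Type /XObject /Subtype /Image /Width {width} /Height {height} "
            f"/ImageMask true /BitsPerComponent 1")


def page_content(width_pt, height_pt):
    """Draw the three layers, each scaled to cover the whole page."""
    place = f"{width_pt} 0 0 {height_pt} 0 0 cm"
    ops = [
        # 1. Colour image, drawn as usual.
        "q", place, "/Colour Do", "Q",
        # 2. Rich black stencil, filled with the under-colour.
        "q", place, "0.1 0.1 0.1 0 k", "/RichBlack Do", "Q",
        # 3. Linework stencil, filled with plain black with overprint enabled.
        "q", place, "/OverprintGS gs", "0 0 0 1 k", "/Linework Do", "Q",
    ]
    return "\n".join(ops)


if __name__ == "__main__":
    print(overprint_ext_gstate())
    print(image_mask_entries(4960, 7016))
    print(page_content(595, 842))
```

The lowercase k operator sets the nonstroking CMYK colour, which is the colour a stencil mask is painted with.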

If you deviate from any of the above steps, the output will be silently wrong. If you process the resulting PDF with anything except Adobe's tool suite the end result might become silently wrong. As an example here is the output of colour separation using Adobe Acrobat and Ghostscript.

Acrobat has preserved the rich black values under the linework whereas Ghostscript has instead cleared the colour value to zero losing the "rich" part of black. Interestingly Ghostscript seems to handle overprinting correctly in basic PDF shape drawing operations but not in stencil drawing operations.

Or maybe it does and Acrobat is incorrect here. The only way to know for sure would be to print test samples on a dozen or so commercial offset printing presses, inspecting the plates manually and then seeing what ends up on paper. Sadly I don't have the resources for that.

Sunday, April 16, 2023

PDF forms, the standard that seemingly isn't

Having gotten the basic graphical output of A4PDF working I wanted to see if I could make PDF form generation work.

This was of course a terrible idea but sadly I lacked foresight.

After a lot of plumbing code it was time to start defining form widgets. I chose to start simple and create a form with a single togglable check button. This does not seem like an impossibly difficult problem and the official PDF specification even has a nice code sample for this:

The basic idea is simple. You define the "widget" and give it two "state objects" that contain PDF drawing operations. The idea is that the PDF renderer will draw one of the two on top of the basic PDF document depending on whether the checkbox is toggled or not. The code above sets things up so that the appearance of the checkbox is one of two different DingBat symbols. Their values are not shown in the specification, but presumably they are a checked box and an empty square.
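As a rough sketch of the structure being described (not the specification's own sample and not A4PDF's output), a checkbox widget dictionary could be assembled like this. The object numbers, the field name and the rectangle are placeholders, and the two referenced objects would be the form XObjects holding the drawing operations for each state.

```python
# Sketch: text of a checkbox widget dictionary with two appearance states.
# ASSUMPTION: `on_ref` and `off_ref` are object numbers of appearance streams
# defined elsewhere in the file. The field name must be plain ASCII without
# parentheses or backslashes, since no string escaping is done here.


def checkbox_widget(name, rect, on_ref, off_ref, checked=False):
    """Return the dictionary text for a simple checkbox form field."""
    state = "/Yes" if checked else "/Off"
    x0, y0, x1, y1 = rect
    return "\n".join([
        "<<",
        "  /Type /Annot",
        "  /Subtype /Widget",
        "  /FT /Btn",
        f"  /T ({name})",
        f"  /Rect [{x0} {y0} {x1} {y1}]",
        f"  /V {state}",   # the field's value
        f"  /AS {state}",  # which appearance state to draw
        f"  /AP << /N << /Yes {on_ref} 0 R /Off {off_ref} 0 R >> >>",
        ">>",
    ])


if __name__ == "__main__":
    print(checkbox_widget("checkbox1", (10, 10, 30, 30), on_ref=12, off_ref=13))
```

The "Yes" name in /AP is the label of the on state. The off state is conventionally called "Off".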

I created a test PDF with LibreOffice's form designer and then set about trying to recreate it. LO's form generator uses OpenSymbol for the checked status of the checkbox and an empty appearance for the off state. A4PDF uses the builtin Helvetica "X" character. The actual files can be downloaded here.

What we have here is a failure to communicate

No matter how much I tried I could not make form generation actually work. The output was always broken in weird ways I could not explain. Unfortunately this part of the PDF spec is not very helpful, because it does not give out full examples, only snippets, and search engines are worthless at finding any technical content when using "PDF" as a search term. It may even be that information about this is not available in public web sites. Who knows?

Anyhow, when regular debugging does not work, it's time to approach things sideways. Let's start by opening the LibreOffice test document with Okular:

This might seem to be working just fine, but people with sharp eyes might notice a problem. That check mark is not from OpenSymbol. FWICT it is the standard Qt checkbox widget. Still, the checkbox works and its appearance is passable. But what happens if you increase the zoom level?

Oh dear. That "Yes" text is the PDF-internal label given to the "on" state. Why is it displayed? No idea. It's time to bring out the heavy guns and see how things work in The Gold Standard of PDF Rendering, Adobe Reader.

Nope, that's not the OpenSymbol checkmark either. Adobe Reader seems to be ignoring the spec and drawing its own checkmarks instead. After seeing this I knew I had to try this on every PDF renderer I could reasonably get my hands on. Here are the results:

Okular

LO: incorrect appearance, breaks when zooming
A4PDF: shows both the "correct" checkmark as well as the Qt widget on top of each other, takes a noticeable amount of time after clicking until the widget state is updated

Evince

LO: does not respond to clicks
A4PDF: works correctly

Adobe Reader win64

LO: incorrect appearance
A4PDF: incorrect appearance, does not always respond to button clicks

Firefox

LO: incorrect appearance
A4PDF: incorrect appearance

Chromium

LO: incorrect appearance
A4PDF: works correctly

Apple Preview

LO: works correctly (though the offset is a bit wonky, probably an issue in the drawing commands themselves)
A4PDF: works correctly

The only viewer that seems to be working correctly in all cases is Apple Preview.

PDF has a ton of toggleable flags and the like to make things invisible when printing and so on. It is entirely possible that the PDF files are "incorrect" in some way. But still, the behaviour should either be the same on all viewers or they should report format errors. No errors are reported, though, even by this online validator.

Monday, April 3, 2023

Some details about creating print-quality PDFs

At its core, PDF is an image file format. In theory it is not at all different from the file formats of Gimp, Krita, Photoshop and the like. It consists of a bunch of raster and vector objects on top of each other. In practice there are several differences, the biggest of which is the following:

In PDF you can have images that have different color spaces and resolutions (that is, PPI values). This is by design as it is necessary to achieve high quality printing output.

As a typical example, comic books that are printed in color consist of two different images. The "bottom" one contains only the colors and is typically 300 PPI. On top of that you have the black linework, which is a 1 bit image at 600 or even 1200 PPI. Putting both the linework and colors in the same image file would not work. In the printout the lines would be fuzzy, even if the combined image had a resolution of 1200 PPI.

A deeper explanation can be found in the usual places but the short version is that these two different image types need to be handled in completely opposite ways to make them look good when printed. When converting color images to printing plates the processing software prioritizes smoothness. On the other hand for monochrome images the system prioritizes sharpness. Doing this wrong means either getting color images that are blocky or linework that is fuzzy.

When working on A4PDF it was clear from the start that it needs to be able to create PDF files that can be used for commercial quality print jobs. To test this I wrote a Python script that recreates the cover of my recently published book originally typeset with Scribus. The end result was about 100 lines of code in total.

The background image

The main image is a single file without any adornments. It was provided by the illustrator as a single 8031 by 5953 image file. A fully color managed workflow demands the image to be in CMYK format and have a corresponding ICC color profile. There is basically only one file format that supports this use case: TIFF. Interestingly the specification for this file format was finalized in 1992. It is left as an exercise to the reader to determine how many image file formats have been introduced since that time.

A4PDF extracts the embedded ICC profile and stores it in the PDF file. It could also convert the image from the image's ICC colorspace to the specified output color space if they are different, but currently does not.

Text objects

All text color is white and is specified in CMYK colorspace, though it could also be specified in DeviceGray. Defining any object in RGB (even if the actual color was full white) could make the printing house reject the file as invalid and thus unsuitable for printing.

The author name in the front cover uses PDF's character spacing to "spread out" the text. The default character spacing in this font is too tight for use in covers.

PDF can only produce horizontal text. Creating vertical text, as in the spine, requires you to modify the drawing state's transformation matrix stack. In practice this is almost identical to OpenGL, though the PostScript drawing model that PDF uses predates OpenGL by 8 years or so. In this case the text needs a rotate + translate.
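A rotate + translate can be expressed as a single six-number matrix for PDF's cm operator. The sketch below computes one and applies it to a point to show where things end up. The function names and the example coordinates are made up for illustration, and this is not the code used for the cover.

```python
import math

# Sketch: build a PDF transformation matrix [a b c d e f] that rotates
# counterclockwise by an angle and then translates by (tx, ty).
# A point maps as x' = a*x + c*y + e and y' = b*x + d*y + f.


def rotate_translate(angle_deg, tx, ty):
    """Return the matrix as a tuple (a, b, c, d, e, f)."""
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    return (cos_t, sin_t, -sin_t, cos_t, tx, ty)


def apply(matrix, x, y):
    """Transform the point (x, y) with the given matrix."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)


def cm_operator(matrix):
    """Format the matrix as a content stream line ending in the cm operator."""
    # Adding 0.0 after rounding turns a negative zero into a plain zero.
    return " ".join(f"{round(v, 4) + 0.0:.4f}" for v in matrix) + " cm"


if __name__ == "__main__":
    spine = rotate_translate(90, 100, 50)
    print(cm_operator(spine))
    print(apply(spine, 10, 0))
```

With a 90 degree rotation, text that would have advanced along the x axis now advances up the page, starting from the point (100, 50).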

The bar code

Ideally this should be defined with PDF's vector drawing operations. Unfortunately that would require me to implement reading SVG files somehow. It turned out to be a lot less effort to export the SVG from Inkscape at 600 PPI and then convert that to a 1 bit image with the Gimp. The end result is pretty much the same.

This approach works in Scribus as well, but not in LibreOffice. LibreOffice converts all 1 bit images to 8 bit grayscale, meaning that the bar code might be fuzzy when printed. LO used to do this correctly but the behaviour was intentionally changed at some point.

The logo

This is the logo used in the publisher's sci-fi books. You can probably guess that the first book in the series was Philip K. Dick's Do Androids Dream of Electric Sheep?

Like the bar code, this should optimally be defined with vector operations, but again for simplicity a raster image is used instead. However it is different from the bar code image in that it has an alpha channel so that the background starfield shows through. When you export a 1 bit image that has an alpha channel to a PNG, Gimp writes it out as an indexed image with 4 colors (black opaque, white opaque, black transparent, white transparent). A4PDF detects files of this type and stores them in the PDF as 1 bit monochrome images with a 1 bit alpha channel.
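The detection and conversion could work roughly as follows. This is a guess at the general approach on plain Python data, not A4PDF's actual code. It assumes the PNG has already been decoded into a palette of RGBA entries and rows of palette indices.

```python
# Sketch: recognise a "1 bit image plus 1 bit alpha" stored as an indexed image
# and split it into two 1-bit planes.
# ASSUMPTION: `palette` is a list of (r, g, b, a) tuples with values 0..255 and
# `indexed` is a list of rows of palette indices, both decoded elsewhere.


def is_mono_with_alpha(palette):
    """True if every entry is pure black or white and fully opaque or transparent."""
    return len(palette) <= 4 and all(
        (r, g, b) in ((0, 0, 0), (255, 255, 255)) and a in (0, 255)
        for r, g, b, a in palette
    )


def split_planes(indexed, palette):
    """Return (color, alpha) planes: color 1 = white, alpha 1 = opaque."""
    color = [[1 if palette[i][0] == 255 else 0 for i in row] for row in indexed]
    alpha = [[1 if palette[i][3] == 255 else 0 for i in row] for row in indexed]
    return color, alpha


if __name__ == "__main__":
    palette = [(0, 0, 0, 255), (255, 255, 255, 255), (0, 0, 0, 0), (255, 255, 255, 0)]
    indexed = [[0, 1], [2, 3]]
    if is_mono_with_alpha(palette):
        print(split_planes(indexed, palette))
```

In a PDF the alpha plane would then typically end up as a soft mask or stencil attached to the 1 bit image.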

This is something even Scribus does not seem to handle correctly. In my testing it seemed to convert these kinds of images to 8 bit grayscale instead.

Saturday, April 1, 2023

Got the Star Trek - The Motion Picture Director's Edition box set? You might want to check your discs

TL;DR

Star Trek The Motion Picture The Complete Adventure box set claims to contain a special, longer cut of the film. However it seems that this is not the case for some editions. The British edition does contain the longer cut, but the Scandinavian one seems not to. The back of the box still claims that the box set does contain the longer cut.

The claim and evidence

This is the box set in question.

At the back of the box we find the following text:

I bought this at the end of last year but I could not find the longer edition anywhere in the menus. I mentioned this to a friend of mine who has the same box set and he had found it immediately. After a lot of debugging we discovered that he has the British edition of the box set whereas I have the Scandinavian one.

The British box set, which has the longer edition, consists of the following discs:

EU151495BLB Blu-ray The Director's Edition Bonus Disc
EU151496ULB 4K UltraHD The Director's Edition Feature Film, Special Features
EU151460ULB 4K UltraHD Feature Film, Special Features (Special Longer Version, Theatrical Version)
EU151496BLB Blu-ray The Director's Edition Feature Film, Special Features
EC150885BLB Blu-ray Feature Film, Special Features

My Scandinavian box set has the following discs:

eu151495blb bonus disc
eu151496ulb director's edition 4k ultrahd, Feature film
eu150884ulb 4k, regular version, feature film
eu151496blb blu-ray, director's edition
eu150885blb blu-ray, regular version, feature film

The only differences here are that one disc has a serial number with letters EC instead of EU and that the British edition has this disc:

Note how it says "special longer version" in microscopic letters under the Paramount logo. This text does not appear on any disc in the Scandinavian edition.

The Scandinavian edition does not have this disc. Instead it has the disc with the product id EU150884ULB whereas the corresponding British disc has the product id EU151460ULB. The missing content cannot be on any of the other discs, because they are the same in both editions (the EU/EC issue notwithstanding).

I reported this to the store I bought the box set from. After a lot of convincing they agreed to order a replacement box set. When it arrived we inspected the contained discs and they were the same as in the box set I already had. This would imply that the entire print run is defective. After more convincing we managed to get the Finnish importer to report the issue to the distributor in Sweden.

They eventually replied that the longer edition is in "the extra disc in the set that is in a cardboard sleeve". This is not the case, as that disc contains only extras, is the same in the British box set and further is a regular Blu-Ray, not a 4k UHD one as it should be and as it is on the British edition.

We reported all this back up the chain and have not heard anything back for several weeks now.

What to do if you have a non-British (and presumably non-US) edition of this box set?

Check whether your discs contain the special longer edition. The British disc has this fairly unambiguous selection screen for it when you insert the disc:

If your box set does not contain this version of the film, report the issue to where you bought the box set from. If this really is a larger issue (as it would seem), the more bug reports they get the faster things will get fixed.

The film store said that they have sold tens of these box sets and that I was the first to notice the issue so don't assume that other people have already reported it. This is not an issue of the physical copy that I have because the replacement box set was defective in the same way.

Speculation: why is it broken?

The British disc that has the special longer edition does not contain Finnish subtitles (or possibly any non-English languages). When the box was being assembled someone found this out, decided that they can't ship a disc without localisation and replaced the disc with a regular version of the film that does not have the longer cut. But they did not change the back of the box, which states that the box set contains the longer edition which it does not seem to have.