Monday, April 3, 2023

Some details about creating print-quality PDFs

At its core, PDF is an image file format. In theory it is not at all different from the file formats of Gimp, Krita, Photoshop and the like. It consists of a bunch of raster and vector objects on top of each other. In practice there are several differences, the biggest of which is the following:

In PDF you can have images that have different color spaces and resolutions (that is, PPI values). This is by design as it is necessary to achieve high quality printing output.

As a typical example, comic books that are printed in color consist of two different images. The "bottom" one contains only the colors and is typically 300 PPI. On top of that you have the black linework, which is a 1 bit image at 600 or even 1200 PPI. Putting both the linework and colors in the same image file would not work. In the printout the lines would be fuzzy, even if the combined image did contain 1200 PPI.

A deeper explanation can be found in the usual places but the short version is that these two different image types need to be handled in completely opposite ways to make them look good when printed. When converting colors images to printing plates the processing software prioritizes smoothness.  On the other hand for monochrome images the system prioritizes sharpness. Doing this wrong means either getting color images that are blocky or linework that is fuzzy.

When working on A4PDF it was clear from the start that it needs to be able to create PDF files that can be used for commercial quality print jobs. To test this I wrote a Python script that recreates the cover of my recently published book originally typeset with Scribus. The end result was about 100 lines of code in total.

The background image

The main image is a single file without any adornments. It was provided by the illustrator as a single 8031 by 5953 image file. A fully color managed workflow demands the image to be in CMYK format and have a corresponding ICC color profile. There is basically only one file format that supports this use case: TIFF. Interestingly the specification for this file format was finalized in 1992. It is left as an exercise to the reader to determine how many image file formats have been introduced since that time.

A4PDF extracts the embedded ICC profile and stores it in the PDF file. It could also convert the image from the image's ICC colorspace to the specified output color space if they are different, but currently does not.

Text objects

All text color is white and is specified in CMYK colorspace, though it could also be specified in DeviceGray. Defining any object in RGB (even if the actual color was full white) could make the printing house reject the file as invalid and thus unsuitable for printing.

The author name in the front cover uses PDF's character spacing to "spread out" the text. The default character spacing in this font is too tight for use in covers.

PDF can only produce horizontal text. Creating vertical text, as in the spine, requires you to modify the drawing state's transformation matrix stack. In practice this is almost identical to OpenGL, though the PostScript drawing model that PDF uses predates OpenGL by 8 years or so. In this case the text needs a rotate + translate. 

The bar code

Ideally this should be defined with PDF's vector drawing operations. Unfortunately that would require me to implement reading SVG files somehow. It turned out to be a lot less effort to export the SVG from Inkscape at 600 PPI and then convert that to a 1 bit image with the Gimp. The end result is pretty much the same.

This approach works in Scribus as well, but not in LibreOffice. It converts all 1 bit images to 8 bit grayscale meaning that it might be fuzzy when printed. LO used to do this correctly but the behaviour was intentionally changed at some point.

The logo

This is the logo used in the publisher's sci-fi books. You can probably guess that the first book in the series was Philip K. Dick's Do Androids Dream of Electric Sheep?

Like the bar code, this should optimally be defined with vector operations, but again for simplicity a raster image is used instead. However it is different from the bar code image in that it has an alpha channel so that the background starfield shows through. When you export a 1 bit image that has an alpha channel to a PNG, Gimp writes it out as an indexed image with 4 colors (black opaque, white opaque, black transparent, white transparent). A4PDF detects files of this type and stores them in the PDF as 1 bit monochrome images with a 1 bit alpha channel.

This is something even Scribus does not seem to handle correctly. In my testing it seemed to convert these kinds images to 8 bit grayscale instead.

No comments:

Post a Comment