Friday, December 13, 2024

CMYK me baby one more time!

Did you know that Jpeg supports images in the CMYK colorspace? And that people are actually using them in the wild? This being the case, I needed to add support for them to CapyPDF. The development steps are quite simple: first you create a CMYK Jpeg file, then you create a test document that embeds it, and finally you look at the result in a PDF renderer.

Off to a painter application then. This is what the test image looks like.

Then we update the Jpeg parsing code to detect CMYK images and write the corresponding metadata to the output PDF. What does the end result look like?

Aaaaand now we have a problem. Specifically one of an arbitrary color remapping. It might seem this is just a case of inverted colors. It's not (I checked); something weirder is going on. For reference, Acrobat Reader's output looks identical.
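For readers who want to test the "just inverted" hypothesis themselves, here is a minimal Python sketch. It is not necessarily how the check above was done. It assumes 8-bit CMYK samples and uses the naive, profile-free CMYK to RGB formula, which only approximates what a color-managed renderer does; the function names are made up for illustration.

```python
def invert_cmyk(cmyk):
    """Invert every channel of an 8-bit CMYK sample (255 - value)."""
    return tuple(255 - v for v in cmyk)


def naive_cmyk_to_rgb(cmyk):
    """Naive conversion without any ICC profile: R = 255 * (1 - C) * (1 - K), etc."""
    c, m, y, k = cmyk

    def scale(ink):
        # Integer arithmetic with rounding to the nearest value.
        return ((255 - ink) * (255 - k) + 127) // 255

    return scale(c), scale(m), scale(y)


# Compare how one sample looks when read as-is and when read inverted.
sample = (255, 0, 0, 0)  # full cyan ink, nothing else
as_is = naive_cmyk_to_rgb(sample)
inverted = naive_cmyk_to_rgb(invert_cmyk(sample))
```

If the colors in the renderer match neither `as_is` nor `inverted` for a handful of known samples, plain inversion is not the explanation.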

At this point, rather than poking at things at random and hoping for the best, a good strategy is to get more test data. Since Scribus is pretty much a gold standard for print-quality PDF production, I went about recreating the test document in it.

Which failed immediately on loading the image.

Here we have Gwenview and Scribus presenting their interpretations of the exact same image. If you use Scribus to generate a PDF, it will convert the Jpeg into an image that uses some three-channel (i.e. RGB) ICC profile.

Take-home exercise

Where is the bug (or a hole in the spec) in this case:

  • The original CMYK Jpeg is correct, but Scribus and PDF renderers read it incorrectly?
  • The original image is incorrect and Gwenview has a separate inverse bug, so the two cancel each other out?
  • The image is correct but the metadata written in the file by CapyPDF is incorrect?
  • The PDF spec has a big chunk of UB here and the final result can be anything?
  • Aliens?
I don't know the correct answer. If someone out there does, do let me know.
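A possible first step for the exercise is to look at what the Jpeg file itself declares. The sketch below is not CapyPDF's parsing code (which is not shown here); it is a minimal Python illustration, with a made-up function name, that walks the marker segments before the image data, reads the component count from the start-of-frame segment and, if an Adobe APP14 segment is present, its color transform byte. Four components means the PDF image would be written as /DeviceCMYK.

```python
import struct

# Start-of-frame markers are 0xC0..0xCF, except DHT (0xC4), JPG (0xC8) and DAC (0xCC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}

# PDF device color spaces by component count.
PDF_COLORSPACE = {1: "/DeviceGray", 3: "/DeviceRGB", 4: "/DeviceCMYK"}


def inspect_jpeg(data: bytes) -> dict:
    """Return the component count and Adobe transform byte (None if absent)."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a Jpeg: missing SOI marker")
    info = {"components": None, "adobe_transform": None}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker, skip it
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # start of scan or end of image: stop
            break
        # The segment length includes its own two bytes but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2 : pos + 4])
        payload = data[pos + 4 : pos + 2 + length]
        if marker in SOF_MARKERS and len(payload) >= 6:
            # precision (1), height (2), width (2), component count (1)
            info["components"] = payload[5]
        elif marker == 0xEE and payload[:5] == b"Adobe" and len(payload) >= 12:
            # "Adobe" (5), version (2), flags0 (2), flags1 (2), transform (1)
            info["adobe_transform"] = payload[11]
        pos += 2 + length
    return info
```

Calling `inspect_jpeg(open("test.jpg", "rb").read())` on the test image (the file name is a placeholder) shows whether the file really has four components and whether an Adobe segment is there for decoders to act on.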

1 comment:

  1. Without doing any research or investigation, my gut reaction is that the wrong image loaders are mapping arbitrary color values into red, green, and blue channels. Code-wise, something along the lines of "Whatever the first color channel is will become red. I'll treat the second color channel as green. Then blue. Then ignore whatever is in the fourth color channel."
