Friday, June 11, 2021

Typesetting a full book part II, Scribus

Some time ago I wrote a blog post on what it's like to typeset an entire book using nothing but LibreOffice. One of the comments mentioned that LO does not do a great job of aligning text. This is again probably because it needs to copy MS Word's behaviour, which means greedy line splitting. Supposedly Scribus does this a lot better, but the only way to be really sure was to typeset the whole text with Scribus. So that's what I did (using the latest 1.5 release from Flathub).

Workflow for Scribus

Every program has the things it is good for and things it's not that good for. Scribus' strengths lie in producing output with fairly short pieces of text with precise layout requirements, especially if there are many images. A traditional "single flow of text" is not that, so there are some things you need to plan for.

First of all, a Scribus document should not be created until the text is almost completely finished. Doing big changes (like adding text to existing chapters, changing physical page size etc) can become quite tedious. Scribus also does not do long pieces of text particularly smootly. I tried loading all 350is pages to a single linked frame sequence. It sort of worked, but things got quite laggy quite quickly. Eventually I converged on a layout where every chapter was its own set of linked frames. The text was imported directly from LO files that held one chapter each. The original had just one big LO file, so I had to split it up by hand for the import. If the original had been done with master documents, this would have been simpler.

The table of contents had to be done by hand again. Scribus has support for tables, but they could not be used, because tables drew outlines around each cell and I could not find a way to switch that off. Websearching found several pages with info, but none of them worked. It also turns out that you can not add page references to table cells, only to text frames. No, I don't know why either. The option was greyed out in the menus and trying to sneakily copypaste a page reference from a text frame to a table caused a segfault.

Issues discovered

While LO was surprisingly bug free, Scribus was less so and I encountered many bugs and baffling missing features, such as:
  • Scribus would sometimes create empty text frames far outside the document (i.e. to page 600 on a 300 page document)
  • Text frames got a strange empty character at their end which would cause text overflow warnings, deleting it did not help as the empty characters kept reappearing
  • Adding a page reference to an anchor point would always link to the page where the linked frame sequence started, not where the anchor was placed
  • Text is not hyphenated automatically, only by selecting a text frame and then selecting extras > hyphenate text in the main menu, one would imagine hyphenation being a paragraph style property instead
  • I managed to create an anchor point that does not exist anywhere except the mark list, but deleting it leads to an immediate segfault
None of these obstacles were insurmountable, but they made for a nonsmooth experience. Eventually the work was done and here is how they compare (LO on the left, Scribus on the right).
As you can probably tell, Scribus creates more condensed output. The settings were the same for both programs (automatically translated from LO styles by Scribus, not verified by hand) and LO's output file was 339 pages compared to 326 for Scribus.

Which one should you use then?

Like most things in life, that depends. If your document has a notable amount of mathematics, then you most likely want to go with LaTeX. If the document is something like a magazine or you require the highest typographical quality possible, then Scribus is a good choice. For "plain old books" the question becomes more complicated.

If you need a fully color managed workflow, then Scribus is the only viable option. If the default output of LO is good enough for you, the document has few figures and you are fine with needing to have a great battle at the end to line the images up, LO provides a fairly smooth experience.  You have to use styles properly, though, or the whole thing will end up in tears. LO is especially suitable for documents with lots of levels, headings and cross references between the two. LaTeX is also very good with those, but its unfortunate downside is that defining new styles is really hard. So is changing fonts, so you'd better be happy with Computer Modern. If the document has lots of images, then LaTeX's automatic figure floats make a ton of manual work completely disappear.

Original data

The original source documents as well as the PDF output for both programs can be found in this Github repo

3 comments:

  1. calibre would be an other open-source option.

    calibre has an integrated e-book editor that can be used to edit books in the EPUB and AZW3(Kindle) formats. The editor shows you the HTML and CSS that is used internally inside the book files, with a live preview that updates as you make changes. It also contains various automated tools to perform common cleanup and fixing tasks.

    https://manual.calibre-ebook.com/edit.html

    ReplyDelete
  2. An interesting alternative using a completely different philosophy is to create a book in FictionBook format (which at the moment means FB2, since FB3 development seems to have stalled; which is a pity, because FB3 is supposed to be much better at describing how a book is to be laid out).

    ReplyDelete
  3. > So is changing fonts, so you'd better be happy with Computer Modern.

    FWIW in XeLaTeX or LuaLaTeX (which I have to use anyway for proper non-latin support) setting the default font is as simple as

    \usepackage{fontspec}
    \setmainfont{Liberation Serif}

    which works with pretty much any TTF/OTF font from the system.

    > The original source documents as well as the PDF output for both programs can be found in this Github repo.

    Thanks!

    ReplyDelete