This is a little story that happened a few days ago, it explains well how I
usually get involved into ports in OpenBSD.
1 - Lurking into ports/graphics/
At first, I was looking in various ports there are in the graphics category,
searching for an image editor that would run correctly on my offline laptop.
Grafx2 is laggy when using the zoom mode and GIMP won’t run, so I just open
ports randomly to read their pkg/DESCR file.
This way, I often find gems I reuse later, sometimes I have less luck and I
only tried 20 ports which are useless to me. It happens I find issues in ports
looking randomly like this…
2 - Find the port « comix »
Then, the second or third port I look at is « comix », here is the DESCR file.
Comix is a user-friendly, customizable image viewer. It is specifically
designed to handle comic books, but also serves as a generic viewer. It
reads images in ZIP, RAR or tar archives (also gzip or bzip2 compressed)
as well as plain image files.
That looked awesome, I have lot of books as PDF I want to read but it’s not
convenient in a “normal” PDF reader, so maybe comix would help!
3 - Using comix
Once comix was compiled (a mix of python and gtk), I start it and I get errors
opening PDFs… I start it again from console, and in the output I get the
explanation that PDF files are not usable in comix.
Then I read about the CBZ or CBT files, they are archives (zip or tar)
containing pictures, definitely not what a PDF is.
4 - mcomix > comix
After a few searches on the Internet, I find that comix last release is from
2009 and it never supported PDF, so nothing wrong here, but I also found comix
had a fork named mcomix.
mcomix forked a long time ago from comix to fix issues and add support for new
features (like PDF support), while last release is from 2016, it works and
still receive commits (last is from late 2019). I’m going for using comix!
5 - Installing mcomix from ports
Best way to install a program on OpenBSD is to make a port, so it’s correctly
packaged, can be deinstalled and submit to ports@ mailing list later.
I did copy comix folder into mcomix, use a brain dead sed command to replace all
occurrence of comix by mcomix, and it mostly worked! I won’t explain little
details, but I got mcomix to work within a few minutes and I was quite happy!
Fun fact is that comix port Makefile was mentioning mcomix as a suggestion
for upgrade.
6 - Enjoying a CBR reader
With mcomix installed, I was able to read some PDF, it was a good experience
and I was pretty happy with it. I’ve spent a few hours reading, a few moments
after mcomix was installed.
7 - mcomix works but not all the time
After reading 2 longs PDFs, I got issues with the third, some pages were not
rendered and not displayed. After digging this issue a bit, I found about
mcomix internals. Reading PDF is done by rendering every page of the PDF using
mutool binary from mupdf software, this is quite CPU intensive, and for
some reason in mcomix the command execution fails while I can do the exact same
command a hundred time with no failure. Worse, the issue is not reproducible in
mcomix, sometimes some pages will fail to be rendered, sometimes not!
8 - Time to debug some python
I really want to read those PDF so I take my favorite editor and start
debugging some python, adding more debug output (mcomix has a -W parameter
to enable debug output, which is very nice), to try to understand why it
fails at getting output of a working command.
Sadly, my python foo is too low and I wasn’t able to pinpoint the issue. I just
found it fail, sometimes, but I wasn’t able to understand why.
9 - mcomix on PowerPC
While mcomix is clunky with PDF, I wanted to check if it was working on
PowerPC, it took some times to get all the dependencies installed on my old
computer but finally I got mcomix displayed on the screen… and dying on PDF
loading! Crash seems related to GTK and I don’t want to touch that, nobody will
want to patch GTK for that anyway so I’ve lost hope there.
10 - Looking for alternative
Once I knew about mcomix, I was able to search the Internet for alternatives of
it and also CBR readers. A program named zathura seems well known here and
we have it in the OpenBSD ports tree.
Weird thing is that it comes with two different PDF plugins, one named
mupdf and the other one poppler. I did try quickly on my amd64 machine
and zathura was working.
11 - Zathura on PowerPC
As Zathura was working nice on my main computer, I installed it on the PowerPC,
first with the poppler plugin, I was able to view PDF, but installing this
plugin did pull so many packages dependencies it was a bit sad. I deinstalled
the poppler PDF plugin and installed mupdf plugin.
I opened a PDF and… error. I tried again but starting zathura from the
terminal, and I got the message that PDF is not a supported format, with a lot
of lines related to mupdf.so file not being usable. The mupdf plugin work on
amd64 but is not usable on powerpc, this is a bug I need to report, I don’t
understand why this issue happens but it’s here.
12 - Back to square one
It seems that reading PDF is a mess, so why couldn’t I convert the PDF to CBT
files and then use any CBT reader out there and not having to deal with that
PDF madness!!
13 - Use big calibre for the job
I have found on the Internet that Calibre is the most used tool to convert a
PDF into CBT files (or into something else but I don’t really care here). I
installed calibre, which is not lightweight, started it and wanted to change
the default library path, the software did hang when it displayed the file
dialog. This won’t stop me, I restart calibre and keep the default path, I
click on « Add a book » and then it hang again on file dialog. I did report
this issue on ports@ mailing list, but it didn’t solve the issue and this mean
calibre is not usable.
14 - Using the command line
After all, CBT files are images in a tar file, it should be easy to reproduce
the mcomix process involving mutool to render pictures and make a tar of that.
IT WORKED.
I found two ways to proceed, one is extremely fast but may not make pages in
the correct order, the second requires CPU time.
Making CBT files - easiest process
The first way is super easy, it requires mutool (from mupdf package) and it
will extract the pictures from the PDF, given it’s not a vector PDF, not sure
what would happen on those. The issue is that in the PDF, the embedded pictures
have a name (which is a number from the few examples I found), and it’s not
necessarily in the correct order. I guess this depend how the PDF is made.
$ mutool extract The_PDF_file.pdf
$ tar cvf The_PDF_file.tar *jpg
That’s all you need to have your CBT file. In my PDF there was jpg files in it,
but it may be png in others, I’m not sure.
Making CBT files - safest process (slow)
The other way of making pictures out of the PDF is the one used in mcomix, call
mutool for rendering each page as a PNG file using width/height/DPI you
want. That’s the tricky part, you may not want to produce pictures with larger
resolution than the original pictures (and mutool won’t automatically help you
for this) because you won’t get any benefit. This is the same for the DPI. I
think this could be done automatically using a correct script checking each PDF
page resolution and using mutool to render the page with the exact same
resolution.
As a rule of thumb, it seems that rendering using the same width as your screen
is enough to produce picture of the correct size. If you use large values, it’s
not really an issue, but it will create bigger files and take more time for
rendering.
$ mutool draw -w 1920 -o page%d.png The_PDF_file.pdf
$ tar cvf The_PDF_file.tar page*.png
You will get PNG files for each page, correctly numbered, with a width of 1920
pixels. Note that instead of tar, you can use zip to create a zip file.
15 - Finally reading books again
After all this LONG process, I was finally able to read my PDF with any CBR
reader out there (even on phone), and once the process is done, it uses no cpu
for viewing files at the opposite of mcomix rendering all the pages when you
open a file.
I have to use zathura on PowerPC, even if I like it less due to the continuous
pages display (can’t be turned off), but mcomix definitely work great when not
dealing with PDF. I’m still unsure it’s worth committing mcomix to the ports
tree if it fails randomly on random pages with PDF.
16 - Being an open source activist is exhausting
All I wanted was to read a PDF book with a warm cup of tea at hand.
It ended into learning new things, debugging code, making ports, submitting
bugs and writing a story about all of this.