Host your own Wikipedia backup

Written by Solène, on 13 November 2019.
Tags: #openbsd #wikipedia #life

Wikipedia and openzim

If you ever wanted to host your own Wikipedia replica, here is the simplest way.

As Wikipedia is REALLY huge, you don’t really want to host the PHP MediaWiki software and load the huge database. Instead, the project created the openzim format to compress the huge database that Wikipedia has become while still allowing fast searches.

Sadly, on OpenBSD, we have no software reading zim files, and most software requires the openzim library, which would need extra work to be packaged on OpenBSD.

Fortunately, there is a Python package implementing everything you need in pure Python to serve zim files over HTTP, and it’s easy to install.

This tutorial should work on all other Unix-like systems, but package or binary names may differ.

Downloading Wikipedia

The Kiwix project is responsible for the Wikipedia files. It regularly creates files from various projects (including Stack Exchange, Gutenberg, Wikibooks, etc.), but for this tutorial we want Wikipedia: https://wiki.kiwix.org/wiki/Content_in_all_languages

You will find a lot of files; the language is part of the filename. Some filenames also indicate whether they contain everything or only some categories, and whether they include pictures or not.
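To illustrate how much information sits in a filename, here is a small sketch that splits one into its parts. It assumes the five-field layout seen in wikipedia_fr_all_maxi_2019-08.zim; the field names (project, language, selection, flavour, period) are my own labels, and files that follow another layout are rejected rather than guessed at.

```python
from pathlib import Path
from typing import NamedTuple


class ZimName(NamedTuple):
    # Field names are illustrative labels, not an official specification
    project: str
    language: str
    selection: str
    flavour: str
    period: str


def parse_zim_name(filename: str) -> ZimName:
    """Split a name such as wikipedia_fr_all_maxi_2019-08.zim into its fields.

    Assumes exactly five underscore-separated fields; anything else
    raises ValueError instead of being guessed at.
    """
    stem = Path(filename).stem  # drop the directory and the ".zim" suffix
    parts = stem.split("_")
    if len(parts) != 5:
        raise ValueError("unexpected zim filename layout: " + filename)
    return ZimName(*parts)


info = parse_zim_name("/path/to/wikipedia_fr_all_maxi_2019-08.zim")
print(info.language, info.flavour, info.period)  # prints: fr maxi 2019-08
```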

The full French file weighs 31.4 GB.
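With files this large, it may be worth checking the free disk space before starting the download. This is a minimal sketch using only the Python standard library; the helper name has_room_for is hypothetical, and it counts a gigabyte as 10^9 bytes.

```python
import shutil


def has_room_for(directory: str, size_gb: float) -> bool:
    """Return True if `directory` has at least `size_gb` gigabytes free."""
    free_bytes = shutil.disk_usage(directory).free
    return free_bytes >= size_gb * 10**9


# Check the current directory against the size of the full French file
print(has_room_for(".", 31.4))
```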

Running the server

For the next steps, I recommend setting up a new user dedicated to this.

On OpenBSD, we will require python3 and pip:

$ doas pkg_add py3-pip--

Then we can use pip to fetch and install zimply and its dependencies. The --user flag is rather important, as it allows any user to download and install Python libraries in their home directory instead of polluting the whole system as root.

$ pip3.7 install --user --upgrade zimply 
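If you are curious where --user puts the files, Python can tell you. This short sketch only uses the standard site module; the exact paths depend on your system and Python version.

```python
import site

# Base directory for --user installs (typically ~/.local on Unix-like systems)
print(site.getuserbase())

# Directory inside that base where the libraries themselves are installed
print(site.getusersitepackages())
```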

I wrote a small script to start the server, taking the zim file as a parameter. I rarely write Python, so the script may not be up to high standards.

File server.py:

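from zimply import ZIMServer
import sys
import os.path

# The path to the zim file must be given as first argument
if len(sys.argv) < 2:
    print("usage: " + sys.argv[0] + " file")
    sys.exit(1)

# Creating the ZIMServer object is what starts the server
if os.path.exists(sys.argv[1]):
    ZIMServer(sys.argv[1])
else:
    print("Can't find file " + sys.argv[1])
    sys.exit(1)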

And then you can start the server using the command:

$ python3.7 server.py /path/to/wikipedia_fr_all_maxi_2019-08.zim

You will be able to access Wikipedia at the URL http://localhost:9454/
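If you want to check from a script that the server is answering, something like the following should do. It is a sketch relying only on the standard library; the helper name is_serving is hypothetical, and the URL is the one given above.

```python
import urllib.request


def is_serving(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP server answers at `url` with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError, refused connections and timeouts
        # all derive from OSError
        return False


print(is_serving("http://localhost:9454/"))
```

A large zim file can take a while to load, so a False result right after starting the server does not necessarily mean something is broken.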

Note that this is not a “wiki”, as you can’t view the history or edit/create pages.

This kind of backup is used in places like Cuba or parts of Africa where people don’t have unlimited internet access. The project led by Kiwix allows more people to access knowledge.