My open-source machine learning toolbox

Written by Solène, on 04 October 2022.
Tags: #linux #opensource #machinelearning #ml

1. Introduction

I recently got interested in what's possible with machine learning programs, and this has been an exciting journey. Let me share a few programs I added to my toolbox.

They all work well on NixOS. Upscayl and Whisper are packaged in nixpkgs; the others need specific setup instructions. It's not that hard, but it may not be accessible to everyone.

2. Whisper

This program analyzes the audio track of an audio or video file and makes a transcript of it. It supports many languages; I tried it with English, French and Japanese, and it worked very reliably.

Not only does it create a transcript text file, it also generates a subtitles (.srt) file, so you can create video subtitles automatically. It also has a translation function that gives you the transcript in English; the translation is done by the Whisper model itself rather than by an external service.
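To give an idea of what a subtitles file contains, here is a minimal Python sketch that turns timed text segments into SRT text. It assumes segments shaped like the ones Whisper's Python API returns from transcribe() (dictionaries with "start", "end" and "text" keys); the sample segments and the helper names srt_timestamp and segments_to_srt are made up for illustration, and this is not necessarily how Whisper writes its own .srt files.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm, the timestamp style SRT uses."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text: a counter, a time range, then the subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


# Made-up segments standing in for a real transcription result.
sample_segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 6.0, "text": " This is a test."},
]

print(segments_to_srt(sample_segments))
```

With a real transcription, the sample list would be replaced by the "segments" entry of the result dictionary.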

It's quite slow on a CPU, but it definitely works; using a GPU gives an 80 times speed boost.

It requires a weight file (a model) to work. Weights exist in different sizes: tiny, base, small, medium and large, and every size except large has an English-only variant. They are downloaded automatically on demand into the ~/.cache/whisper/ directory.
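As a rough sketch of how the model choice shows up on the command line, the Python snippet below assembles a whisper invocation. It uses the --model, --task and --output_dir options of the whisper command, and assumes (as in the Whisper README) that English-only weights are named with a ".en" suffix and exist for every size except large. The helper build_whisper_command and the file name interview.mp4 are hypothetical, and refusing to translate with an English-only model is this helper's own choice.

```python
import shlex

# Model sizes mentioned in this article; ".en" marks the English-only weights.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")
ENGLISH_ONLY = tuple(f"{size}.en" for size in MODEL_SIZES if size != "large")


def build_whisper_command(media_path, model="base", translate=False, output_dir="."):
    """Return the argument list for a whisper run (hypothetical helper)."""
    if model not in MODEL_SIZES + ENGLISH_ONLY:
        raise ValueError(f"unknown model name: {model}")
    if translate and model.endswith(".en"):
        # Design choice of this sketch: English-only weights are for English audio.
        raise ValueError("pick a multilingual model to translate into English")
    task = "translate" if translate else "transcribe"
    return ["whisper", media_path, "--model", model,
            "--task", task, "--output_dir", output_dir]


command = build_whisper_command("interview.mp4", model="medium", translate=True)
# Only print the command here; pass the list to subprocess.run() to execute it.
print(shlex.join(command))
```

The first run with a given model name is when the matching weight file gets downloaded into the cache directory.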

whisper GitHub project page

3. Stable-diffusion

This program generates pictures from a sentence, and it's actually very effective. You need a weight file, which is like a database on how to interpret the words of the sentence.

You need an account on https://huggingface.co/CompVis/stable-diffusion-v-1-4-original to download the free weight file (4 GB).

a man on a horse, black and white

Solid Snakes on a unicorn in a cyberpunk style

stable-diffusion GitHub project page

stable-diffusion GitHub project page with openvino support for CPU based rendering

4. DeOldify.NET

This program can be used to colorize a picture. The weights are provided. This works well without a GPU.

I tried it on manga and it works to some extent: it adds some shading and identifies things with colors, but the colorization isn't reliable and the colors may be weird. However, this improves readability for me 👍🏻.

a man on a horse, black and white but colorized with DeOldify

DeOldify.NET GitHub project page

5. Upscayl

This program upscales a picture to 4 times its resolution. The result can be very impressive, but in some situations it gives a "plastic" and unnatural feeling.
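To put numbers on that, here is a tiny Python sketch of the arithmetic, assuming the factor of 4 applies to both the width and the height. The function name upscaled_size and the 640x480 example picture are made up for illustration.

```python
def upscaled_size(width, height, factor=4):
    """Return the (width, height) of a picture after upscaling by `factor`."""
    return width * factor, height * factor


# Hypothetical old phone picture.
old_width, old_height = 640, 480
new_width, new_height = upscaled_size(old_width, old_height)

# Scaling both dimensions by 4 multiplies the pixel count by 4 * 4 = 16.
pixel_ratio = (new_width * new_height) // (old_width * old_height)
print(f"{old_width}x{old_height} -> {new_width}x{new_height}, {pixel_ratio}x the pixels")
```

Under that assumption, most of the pixels in the result are generated by the model, which may be why some results feel unnatural.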

I've been very impressed by it: I've been able to improve some old pictures taken with a poor phone.

a man on a horse, black and white but colorized with DeOldify and upscaled with Upscayl

Upscayl GitHub project page

6. Going further

If you know other tools of that kind that could interest me, please share! :) Especially if it's something to colorize manga 😁.