Port of the week: pup

Written by Solène, on 22 April 2021.
Tags: #internet

1. Introduction §

Today I will introduce you to the utility "pup", which filters HTML documents using CSS selectors. It is a perfect companion to curl when you want to fetch only a specific piece of data from an HTML page.

On OpenBSD you can install it with pkg_add pup and check its documentation at /usr/local/share/doc/pup/README.md

pup official project

2. Examples §

pup is quite easy to use once you understand the filters. Let's see a few examples to illustrate practical uses.

2.1. Fetch the list of my blog titles as JSON §

The following command returns a JSON array with one entry for each "a" tag found within an "h4" tag.

curl https://dataswamp.org/~solene/index.html | pup "h4 a json{}"

The output (only an extract here) looks like this:

[
 {
  "href": "2021-04-18-ipfs-bandwidth-mgmt.html",
  "tag": "a",
  "text": "Bandwidth management in go-IPFS"
 },
 {
  "href": "2021-04-17-ipfs-openbsd.html",
  "tag": "a",
  "text": "Introduction to IPFS"
 },
 [truncated]
 {
  "href": "2016-05-02-3.html",
  "tag": "a",
  "text": "How to add a route through a specific interface on FreeBSD 10"
 }
]
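
When JSON is more than you need, pup can also print plain lines with the text{} and attr{} display functions described in its README. Here is a minimal sketch; the function names list_titles and list_links are made up for this example, and both expect an HTML document on standard input.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this example).
# Both read an HTML document on stdin.

# Print the text of every link found inside an h4 tag.
list_titles() {
    pup 'h4 a text{}'
}

# Print the href attribute of the same links, one per line.
list_links() {
    pup 'h4 a attr{href}'
}
```

For instance, curl -s https://dataswamp.org/~solene/index.html | list_links should print the same relative URLs that appear in the "href" fields above.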

2.2. Fetch OpenBSD -current specific changes §

The page https://www.openbsd.org/faq/current.html contains specific instructions required for people running OpenBSD -current, and you may want to be notified of changes. With pup, it's easy to write a script that compares the current headings with the ones you saved last time to see what has been appended.

curl https://www.openbsd.org/faq/current.html | pup "h3 json{}"

Sample output, as JSON, which is convenient for further processing with a scripting language:

[
 {
  "id": "r20201107",
  "tag": "h3",
  "text": "2020/11/07 - iked.conf \u0026#34;to dynamic\u0026#34;"
 },
 {
  "id": "r20210312",
  "tag": "h3",
  "text": "2021/03/12 - IPv6 privacy addresses renamed to temporary addresses"
 },
 {
  "id": "r20210329",
  "tag": "h3",
  "text": "2021/03/29 - [packages] yubiserve replaced with yubikeyedup"
 }
]
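
The comparison script mentioned above could look like the following sketch. It assumes that tracking the "id" attribute of each h3 heading is enough to detect new entries, which is my choice for this example and not the only way to do it. STATE_FILE and the three function names are hypothetical.

```shell
#!/bin/sh
# Sketch: report headings added to faq/current.html since the last run.
# STATE_FILE and the function names are hypothetical, chosen for this example.

STATE_FILE="${STATE_FILE:-$HOME/.openbsd-current-ids}"

# Print the id of every h3 heading, one per line (e.g. r20210329).
fetch_ids() {
    curl -s https://www.openbsd.org/faq/current.html | pup 'h3 attr{id}'
}

# Print the lines of file $2 that are absent from file $1, keeping their order.
# FILENAME is compared instead of using NR == FNR so that an empty
# first file (first run) is handled correctly.
new_entries() {
    awk 'FILENAME == ARGV[1] { seen[$0] = 1; next } !($0 in seen)' "$1" "$2"
}

# Fetch the current ids, print the ones not seen before, save the new state.
check_current() {
    tmp=$(mktemp) || return 1
    fetch_ids > "$tmp"
    touch "$STATE_FILE"
    new_entries "$STATE_FILE" "$tmp"
    mv "$tmp" "$STATE_FILE"
}
```

Calling check_current from a cron job prints nothing when the page has no new heading, and on the very first run it prints every id because the state file starts empty.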

I provide an RSS feed for these changes

3. Conclusion §

There are many possibilities with pup and I won't list them all. I highly recommend reading the project's README.md file: it is the documentation and it explains the filtering syntax.