OpenBSD's package manager pkg_add is known to be quite slow and to use a lot of bandwidth. I'm trying to figure out easy ways to improve it, and I may have nailed something today by replacing the ftp(1) HTTP client with curl.
Testing protocol §
On an OpenBSD -current amd64 system, I ran the command "pkg_add -u -v | head -n 70", which checks for updates of the first 70 packages and then stops. The packages tested are always the same, so the test is reproducible.
The traditional "ftp" is tested, along with "curl" and "curl -N" (-N disables curl's output buffering).
The bandwidth usage was measured with "pfctl -s labels", using a labeled match rule matching the mirror IP. The counters were reset after each test.
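The accounting can be sketched roughly as follows. This is not my exact setup: the rule in the comment, the address 192.0.2.10 and the label name "pkgmirror" are placeholders, and the helper assumes the column order documented in pfctl(8) for "-s labels" (label, evaluations, packets total, bytes total, ...), which is worth checking on your own system.

```sh
#!/bin/sh
# Assumed pf.conf rule (placeholder address and label name):
#   match proto tcp to 192.0.2.10 label pkgmirror
# Assumed reset between two runs: pfctl -z (clears per-rule statistics)

# label_bytes LABEL
# Reads "pfctl -s labels" output on stdin and prints the total number of
# bytes accounted to LABEL. Column 4 is assumed to be "bytes total".
# Lines with the same label are summed.
label_bytes() {
    awk -v label="$1" '$1 == label { total += $4 } END { printf "%.0f\n", total }'
}

# Typical use on the OpenBSD machine (as root):
#   pfctl -s labels | label_bytes pkgmirror
```

The helper only parses text, so it can be tried on saved pfctl output from any machine.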
What happens when pkg_add runs §
Here is a quick intro to what happens in the code when you run pkg_add -u on http://:
- pkg_add downloads the package list from the mirror (which could be considered an index.html file), which weighs ~2.5 MB. If you add two packages separately, the index will be downloaded twice.
- pkg_add runs /usr/bin/ftp on the first package to upgrade in order to read its first bytes. The output is piped to gunzip (done in Perl from pkg_add) and then to signify to check the package signature. The signature pkg_add compares at this step is the list of dependencies and their versions, which tells pkg_add whether the package requires an update, while the signify signature of the whole package is stored in the gzip header, for when the whole package is downloaded (there are 2 signatures: the signify one and the package dependencies one, don't be misled!).
- if everything is fine, the package is downloaded and the old one is replaced.
- if there is no need to update, the package is skipped.
- new package = new connection with ftp(1) and new pipes to set up
Using the FETCH_CMD variable, it's possible to tell pkg_add to use a command other than /usr/bin/ftp, as long as it understands the "-o -" parameter and also "-S session" for https:// connections. Because curl doesn't support the "-S session=..." parameter, I used a shell wrapper that discards this parameter.
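A minimal sketch of such a wrapper is shown below. It is not the exact script I used: the function name fetch_wrapper and the FETCH_CURL override variable are made up for this illustration, and it assumes pkg_add passes the session as two separate arguments ("-S" followed by "session=...").

```sh
#!/bin/sh
# Hypothetical FETCH_CMD wrapper: drop ftp(1)'s "-S value" option,
# which curl does not understand, and pass everything else to curl.

fetch_wrapper() {
    n=$#
    # Rotate through the original arguments exactly once: each kept
    # argument is re-appended at the end, each dropped one is discarded.
    while [ "$n" -gt 0 ]; do
        arg=$1
        shift
        n=$((n - 1))
        case $arg in
            -S)
                # Also drop the value that follows -S, if there is one.
                if [ "$n" -gt 0 ]; then
                    shift
                    n=$((n - 1))
                fi
                ;;
            *)
                set -- "$@" "$arg"
                ;;
        esac
    done
    # FETCH_CURL is an illustrative override; it defaults to the packaged curl.
    "${FETCH_CURL:-/usr/local/bin/curl}" -L -s -q -N "$@"
}

# Only fetch when called with arguments, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    fetch_wrapper "$@"
fi
```

Saved as an executable file, the script's path can then be given to pkg_add through FETCH_CMD.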
Raw results §
I measured the whole execution time and the total bytes downloaded for each combination. I don't show the full results here, but I ran the tests multiple times and the standard deviation is close to 0, meaning a test run multiple times gave the same result each time.

operation           time to run    data transferred
---------           -----------    ----------------
ftp http://         39.01          26
curl -N http://     28.74          12
curl http://        31.76          14
ftp https://        76.55          26
curl -N https://    55.62          15
curl https://       54.51          15

There are a few surprising facts in the results.
- ftp(1) does not take the same time over http and https, although it is supposed to reuse the same TLS socket to avoid a handshake for every package.
- ftp(1) bandwidth usage is drastically higher than curl's, and the time seems proportional to the bandwidth difference.
- curl -N and curl perform almost identically over https (same data transferred, about one second apart).
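To put numbers on these comparisons, here is a small sketch that computes a few ratios from the table above. The ratio helper is only an illustration, and the figures are copied from the table as-is.

```sh
#!/bin/sh
# ratio A B: print A divided by B with two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "http,  time ftp / curl -N: $(ratio 39.01 28.74)"
echo "http,  data ftp / curl -N: $(ratio 26 12)"
echo "https, time ftp / curl -N: $(ratio 76.55 55.62)"
echo "https, data ftp / curl -N: $(ratio 26 15)"
echo "ftp,   time https / http:  $(ratio 76.55 39.01)"
```

With these figures, ftp takes roughly 1.4 times as long as curl -N on both protocols while transferring about 1.7 to 2.2 times as much data, and ftp over https takes nearly twice as long as over http.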
Using http:// is way faster than https://. The risk is about privacy: in case of a man in the middle, the downloaded packages will be known, but the signify signature will prevent any maliciously modified package from being installed. Using 'FETCH_CMD="/usr/local/bin/curl -L -s -q -N"' gave the best results.
However, I can't yet explain the very different behaviors between ftp and curl, or between http and https.
Extra: set a download speed limit to pkg_add operations §
By using curl as FETCH_CMD, you can add the "--limit-rate 900k" parameter to cap the transfer speed at the given rate.
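As a sketch, this reuses the curl flags from the results section and appends the rate limit. The RATE variable is only an illustration, and this assumes an http:// mirror so that no "-S session" argument reaches curl.

```sh
#!/bin/sh
# RATE is an illustrative variable; 900k is the example value from the text.
RATE=${RATE:-900k}

# Same curl flags as in the benchmark, plus the rate limit.
FETCH_CMD="/usr/local/bin/curl -L -s -q -N --limit-rate $RATE"
export FETCH_CMD

echo "FETCH_CMD is: $FETCH_CMD"
# Then, in the same shell, run for example: pkg_add -u
```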