Download files listed in a http index with wget

Written by Solène, on 16 June 2020.
Tags: #wget #internet

Sometimes I need to download files through http from a list on an “autoindex” page and it’s always painful to find a correct command for this.

The easy solution is wget but you need to use the correct parameters because wget has a lot of mirroring options but you only want specific ones to achieve this goal.

I ended up with the following command:

wget --continue --accept "*.tgz" --no-directories --no-parent --recursive http://ftp.fr.openbsd.org/pub/OpenBSD/6.7/amd64/

This will download every tgz files available at the address given as last parameter.

The parameters given will filter to only download the tgz files, put the files in the current working directory and most important, don’t try to escape to the parent directory to start downloading again. The `–continue`` parameter allow to interrupt wget and start again, downloaded file will be skipped and partially downloaded files will be completed.

Do not reuse this command if files changed on the remote server because continue feature only work if your local file and the remote file are the same, this simply look at the local and remote names and will ask the remote server to start downloading at the current byte range of your local file. If meanwhile the remote file changed, you will have a mix of the old and new file.

Obviously ftp protocol would be better suited for this download job but ftp is less and less available so I find wget to be a nice workaround for this.