About me: My name is Solène Rapenne, pronouns she/her. I like learning and sharing knowledge. Hobbies: '(BSD OpenBSD Qubes OS Lisp cmdline gaming security QubesOS internet-stuff). I love percent and lambda characters. OpenBSD developer solene@. No AI is involved in this blog.

Contact me: solene at dataswamp dot org or @solene@bsd.network (mastodon).

I'm a freelance OpenBSD, FreeBSD, Linux and Qubes OS consultant; this includes DevOps, DevSecOps, technical writing or documentation work. If you enjoy this blog, you can sponsor my open source work financially so I can write this blog and contribute to Free Software as my daily job.

Securing backups using S3 storage

Written by Solène, on 19 October 2024.
Tags: #security #network #backup


1. Introduction §

In this blog post, you will learn how to make secure backups using Restic and an S3-compatible object storage.

Backups are incredibly important: you may lose important files that only existed on your computer, or lose access to some encrypted accounts or drives. When you need backups, you need them to be reliable and secure.

There are two methods to handle backups:

  • pull backups: a central server connects to the system and pulls data to store it locally; this is how rsnapshot, backuppc or bacula work
  • push backups: each system runs the backup software locally to store data in the backup repository (either locally or remotely); this is how most backup tools work

Both workflows have pros and cons. Pull backups are generally not encrypted, and a single central server has access to everything, which is rather bad from a security point of view. Push backups keep encryption and access control on the system where they run, but an attacker who compromises that system could destroy the backup using the backup tool itself.

I will explain how to leverage S3 features to protect your backups from an attacker.

2. Quick intro to object storage §

S3 is the name of an AWS service used for Object Storage. Basically, it is a huge key-value store in which you can put data and retrieve it, with very little metadata associated with each object. Objects are stored in a "bucket" and each one has a path, so you can organize a bucket with directories and subdirectories (in practice, these are key prefixes separated by slashes).

Buckets can be encrypted, which matters if you do not want your S3 provider to be able to access your data. However, most backup tools already encrypt their repository, so adding encryption to the bucket is not really useful. I will not explain how to use bucket encryption in this guide, although you can enable it if you want: keep in mind that it requires more secrets to be stored outside of the backup system in order to restore, without real benefits since the repository is already encrypted.

S3 was designed to be highly efficient for storing and retrieving data, but it is not a replacement for POSIX file systems. A bucket can be public or private: you can host your website in a public bucket (and it is rather common!). Permissions are associated with a bucket: you certainly do not want to allow random people to put files in your public bucket (or list its files), but you need to be able to do so yourself.

The S3 protocol was reused by what we call "S3-compatible" services, which work with any "S3-compatible" client, so you are not stuck with AWS.

This blog post exists because I wanted to share a cool S3 feature (not strictly S3-specific, but almost every implementation supports it) that goes well with backups: a bucket can be versioned, so every change happening in a bucket can be reverted. Now, think about an attacker escalating to root privileges: they can access the backup repository, delete all the files there, then destroy the server. With a backup on a versioned S3 storage, you could revert your bucket to just before the deletion happened and recover your backup. To prevent this, the attacker would also need the S3 storage account credentials, which are different from the credentials required to use the bucket.
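As a sketch of what reverting can look like in practice, the shell function below removes the delete markers that hide deleted objects in a versioned bucket, which makes the previous version of each object current again. It assumes the AWS CLI is installed, that it runs from a trusted machine with the S3 account credentials (not the restic key), and that the provider implements the ListObjectVersions call; the bucket name my-backups and the endpoint are placeholders. It only undoes deletions: files overwritten by an attacker would also require deleting their newer versions.

```shell
#!/bin/sh
# Sketch: undo deletions in a versioned bucket by deleting the delete
# markers that are currently the latest version of each object.
# Assumption: run with the S3 *account* credentials, not the restic key.
# Set DRY_RUN=1 to print the commands instead of running them.

remove_delete_markers() {
    bucket=$1
    tab=$(printf '\t')
    # stdin: one "KEY<TAB>VERSION_ID" line per delete marker
    while IFS=$tab read -r key version; do
        # skip blank lines and the "None" printed when there are no markers
        case $key in ''|None) continue ;; esac
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'aws s3api delete-object --bucket %s --key %s --version-id %s\n' \
                "$bucket" "$key" "$version"
        else
            # deleting a delete marker makes the previous version current again
            aws s3api delete-object --bucket "$bucket" \
                --key "$key" --version-id "$version"
        fi
    done
}

# Usage (add --endpoint-url https://s3.example.com for non-AWS providers):
#   aws s3api list-object-versions --bucket my-backups \
#       --query 'DeleteMarkers[?IsLatest==`true`].[Key, VersionId]' \
#       --output text | DRY_RUN=1 remove_delete_markers my-backups
```

Running it with DRY_RUN=1 first lets you review the list of objects that would come back before touching the bucket.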

Finally, restic supports S3 as a backend, and this is what we want.

2.1. Open source S3-compatible storage implementations §

Here is a list of free and open source S3-compatible storage implementations. I played with all of them; they have different goals and purposes, and they all worked well enough for me:

Seaweedfs GitHub project page

Garage official project page

Minio official project page

A quick note about those:

  • I consider seaweedfs to be the Swiss army knife of storage: you can mix multiple storage backends and expose them over different protocols (like S3, HTTP, WebDAV), and it can also replicate data to remote instances. You can do tiering (based on last access time or speed) as well.
  • Garage is a relatively new project; it is quite bare-bones in terms of features, but it works fine and supports high availability with multiple instances. It only offers S3.
  • Minio is the big player: it has a paid version (which is extremely expensive), although the free version should be good enough for most users.

3. Configure your S3 §

You need to pick an S3 provider: you can self-host one or use a paid service, it is up to you. I like Backblaze as it is super cheap, at $6/TB/month, but I also have a local Minio instance for some needs.

Create a bucket, enable versioning on it and define the data retention (how long old versions are kept); for the current scenario, I think a few days is enough.
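For providers that implement the S3 lifecycle API (AWS, or MinIO for instance), versioning and retention can be set from the command line. The sketch below prints a lifecycle rule that permanently deletes object versions once they have been noncurrent for a given number of days, and removes delete markers left without any older version. The rule ID, the bucket name my-backups and the 7-day value in the usage comments are examples; some providers may expose retention through their own web interface or API instead.

```shell
#!/bin/sh
# Sketch: print an S3 lifecycle configuration expiring noncurrent versions
# after DAYS days. "expire-old-versions" is an arbitrary rule name.

lifecycle_json() {
    days=$1
    # NoncurrentDays must be a positive integer
    case $days in
        ''|0*|*[!0-9]*) echo "usage: lifecycle_json DAYS" >&2; return 1 ;;
    esac
    cat <<EOF
{
  "Rules": [
    {
      "ID": "expire-old-versions",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionExpiration": { "NoncurrentDays": $days },
      "Expiration": { "ExpiredObjectDeleteMarker": true }
    }
  ]
}
EOF
}

# Usage (bucket name is an example):
#   aws s3api put-bucket-versioning --bucket my-backups \
#       --versioning-configuration Status=Enabled
#   lifecycle_json 7 > lifecycle.json
#   aws s3api put-bucket-lifecycle-configuration --bucket my-backups \
#       --lifecycle-configuration file://lifecycle.json
```

The number of days is the window you have to notice a problem: once a deleted file's old version expires, it is gone for good.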

Create an application key for your restic client with the following permissions: "GetObject", "PutObject", "DeleteObject", "GetBucketLocation", "ListBucket". The names may differ between providers, but the key needs to be able to put, delete and list data in the bucket (and only in this bucket!). Once this is done, you will get a pair of values: an identifier and a secret key.
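On AWS itself, these permissions translate into an IAM policy; the sketch below prints one for a given bucket. Other providers use their own key or policy systems, so treat it as an illustration, and the user and policy names in the usage comments are hypothetical. Bucket-level actions apply to the bucket ARN and object-level actions to bucket/*. What is missing matters as much: without s3:DeleteObjectVersion, s3:PutBucketVersioning or s3:PutLifecycleConfiguration, a DeleteObject call on a versioned bucket only adds a delete marker, so a stolen key cannot purge old versions or turn versioning off.

```shell
#!/bin/sh
# Sketch: print an AWS IAM-style policy limited to a single bucket.
# Deliberately does NOT grant s3:DeleteObjectVersion, s3:PutBucketVersioning
# or s3:PutLifecycleConfiguration.

restic_policy() {
    bucket=$1
    [ -n "$bucket" ] || { echo "usage: restic_policy BUCKET" >&2; return 1; }
    cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
      "Resource": "arn:aws:s3:::$bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::$bucket/*"
    }
  ]
}
EOF
}

# Usage (user and policy names are hypothetical):
#   restic_policy my-backups > policy.json
#   aws iam put-user-policy --user-name restic-backup \
#       --policy-name restic-backup --policy-document file://policy.json
```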

Now, you will have to provide the following environment variables to restic when it runs:

  • AWS_DEFAULT_REGION which contains the region of the S3 storage; this information is given when you configure the bucket.
  • AWS_ACCESS_KEY_ID which contains the key identifier generated when you created the application key.
  • AWS_SECRET_ACCESS_KEY which contains the secret key generated when you created the application key.
  • RESTIC_REPOSITORY which looks like s3:https://$ENDPOINT/$BUCKET, with $ENDPOINT being the bucket endpoint address and $BUCKET the bucket name.
  • RESTIC_PASSWORD which contains the passphrase used to encrypt your backup repository; make sure to write it down somewhere else because you need it to recover the backup.
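A common way to provide these variables is a file readable only by root, sourced before running restic. The sketch below assumes such a file at the hypothetical path /etc/restic.env and shows its content with placeholder values; the two functions are hypothetical helpers, not restic features: one builds the repository URL from an endpoint and a bucket name, the other stops early if a variable is missing. Note that restic reads the key identifier from AWS_ACCESS_KEY_ID.

```shell
#!/bin/sh
# Example /etc/restic.env (hypothetical path, mode 0600, placeholder values):
#   export AWS_DEFAULT_REGION=us-east-1
#   export AWS_ACCESS_KEY_ID=<key identifier>
#   export AWS_SECRET_ACCESS_KEY=<secret key>
#   export RESTIC_REPOSITORY=s3:https://s3.example.com/my-backups
#   export RESTIC_PASSWORD=<long random passphrase>

# Hypothetical helper: build s3:https://$ENDPOINT/$BUCKET
restic_repository() {
    printf 's3:https://%s/%s\n' "$1" "$2"
}

# Hypothetical helper: report every required variable that is unset or empty
check_restic_env() {
    missing=0
    for name in AWS_DEFAULT_REGION AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY \
                RESTIC_REPOSITORY RESTIC_PASSWORD; do
        # indirect expansion in POSIX sh: value=${VARNAME:-}
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```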

Here is a simple script to back up some directories and remove old data, keeping 5 hourly, 2 daily, 2 weekly and 2 monthly backups:

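# back up these directories; -x (--one-file-system) skips other file systems mounted below them
restic backup -x /home /etc /root /var
# keep 5 hourly, 2 daily, 2 weekly and 2 monthly snapshots, then --prune deletes unreferenced data
restic forget --prune -H 5 -d 2 -w 2 -m 2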

Do not forget to run restic init the first time, to initialize the restic repository.
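Putting the pieces together, here is a sketch of a single backup run that initializes the repository on first use; the function name run_backup is hypothetical. It assumes the variables above are already exported, and uses restic cat config as an existence test: that command fails when no repository exists yet, but also on a wrong password or a network error, in which case restic init refuses to overwrite an existing repository and the function returns an error.

```shell
#!/bin/sh
# Sketch: one backup run; expects the RESTIC_* and AWS_* variables to be set.

run_backup() {
    # first run only: create the repository if it cannot be opened
    if ! restic cat config >/dev/null 2>&1; then
        restic init || return 1
    fi
    restic backup -x /home /etc /root /var || return 1
    restic forget --prune -H 5 -d 2 -w 2 -m 2
}
```

Calling it hourly, for instance from cron after sourcing the environment file, would match the 5 hourly snapshots kept by forget.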

4. Conclusion §

I really like this backup system as it is cheap, very efficient and provides a fallback in case of a problem with the repository (mistakes happen, you do not always need an attacker to lose data ^_^').

If you do not want to use S3 backends, you should know that Borg backup and Restic both support an "append-only" mode, which prevents an attacker from deleting or altering existing backups. However, I have always found it hard to use, and you need another system to do the prune/cleanup on a regular basis.
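For reference, here is a sketch of the Borg variant: on the backup server, the client's SSH key is restricted in authorized_keys so that it can only run borg serve in append-only mode, inside one directory. The helper name, repository path and key are placeholders. The Restic equivalent is to serve the repository with rest-server --append-only; in both cases, pruning has to be done with other credentials that are not restricted this way.

```shell
#!/bin/sh
# Sketch: print an authorized_keys line forcing "borg serve" in append-only
# mode, limited to one repository path. Helper name is hypothetical.

borg_append_only_key() {
    repo_path=$1
    pubkey=$2
    printf 'command="borg serve --append-only --restrict-to-path %s",restrict %s\n' \
        "$repo_path" "$pubkey"
}

# Usage on the backup server (path and key file are placeholders):
#   borg_append_only_key /srv/borg/host1 "$(cat host1.pub)" \
#       >> ~borg/.ssh/authorized_keys
```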

5. Going further §

This approach could work on any backend supporting snapshots, like BTRFS or ZFS: if you can restore the backup repository to a previous point in time, you will be able to access a working backup repository.
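Here is a sketch of this idea with ZFS, meant to run on the storage server rather than on the backed-up clients, so a compromised client cannot touch the snapshots. The dataset name tank/restic and the auto- prefix are placeholders; on BTRFS, btrfs subvolume snapshot -r would play the same role. Rolling back with -r destroys the snapshots newer than the chosen one.

```shell
#!/bin/sh
# Sketch: timestamped ZFS snapshots of the dataset holding the repository.

snapshot_name() {
    # DATASET@auto-YYYYmmdd-HHMM, UTC timestamp
    printf '%s@auto-%s\n' "$1" "$(date -u +%Y%m%d-%H%M)"
}

take_snapshot() {
    zfs snapshot "$(snapshot_name "$1")"
}

# Usage (dataset and snapshot names are placeholders):
#   take_snapshot tank/restic                 # e.g. hourly from cron
#   zfs list -t snapshot -r -o name,creation tank/restic
#   zfs rollback -r tank/restic@auto-20241019-1300
```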

You could also do a backup of the backup repository, on the backend side, but you would waste a lot of disk space.