I notoriously like btrfs. Everything I do runs on a btrfs filesystem of some sort. However, btrfs exhibits bugs now and then. I’ve done more data recovery on btrfs volumes than I dare to admit, and that includes working around the infamous RAID5/6 write hole.

I’ve been extremely lucky in my life so far. There’s this saying “there are those who have lost data, and those who still have to”. So far, I’m still in the second category, and I don’t plan to move to the former any time soon.

In order to stay in the no-loss category, people tend to back up their data. Until today, I have been using snapper, which protects against accidental deletion, and very occasionally running Déjà Dup to dump my homedir on my home server. The former doesn’t protect against my laptop SSD being damaged, and the latter becomes useless if my apartment burns down while my laptop is there. My home server itself contains quite a bit of important data too, and that thing is still without back-up.

The reason for not having any off-site back-up is not laziness, but rather the lack of a KISS tool that I like. In my ideal world, I can point my back-up tool to a btrfs subvolume, and when things go haywire, ask it to go back in time, potentially restoring the full subvolume.

Introducing butterback

butterback (for btrfs back-up) is this back-up tool. It will do three things.

  • Maintaining local snapshots for the purpose of backing them up (not yet implemented);
  • Encrypting said snapshots;
  • Dumping them in The Cloud™ (currently only Amazon Glacier).

This blog post is an attempt to document the current design and my (limited) vision.

snapper for the cloud

At the time of writing, butterback only sends full, read-only subvolumes. This is temporary, and will soon no longer be the default. The goal is to semi-intelligently decide what to do with a given subvolume, which is already reflected in the CLI:

butterback-backup 0.1.0

USAGE:
    butterback backup [OPTIONS] --key <key> --region <region> --subvolume <subvolume> --vault <vault> <SUBCOMMAND>

FLAGS:
    -h, --help
            Prints help information

    -V, --version
            Prints version information


OPTIONS:
    -i, --inventory <inventory>
            The filepath where `butterback` may store its inventory to compute incremental backups

            Defaults to `inventory.yaml` [default: inventory.yaml]
    -k, --key <key>
            The path to the public key that should encrypt this backup

    -r, --region <region>
            Name of the AWS region to upload to

    -s, --subvolume <subvolume>
            The path to the subvolume

    -t, --temp <temp>
            The path to where `butterback` should make volume snapshots

            Defaults to `subvolume/.butterback/`
    -v, --vault <vault>
            Name of the Glacier Vault to upload to


SUBCOMMANDS:
    full
    help           Prints this message or the help of the given subcommand(s)
    incremental

In incremental mode, the aim is to manage local snapshots for the purpose of backing up the subvolume passed by --subvolume, not unlike snapper’s hourly snapshots. But unlike with snapper, hourly snapshots do not need to stay on the local system, and should be discarded as soon as they are no longer needed for generating further back-ups.

In full mode, the end user will be responsible for managing snapshots.
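The difference between the two modes comes down to which btrfs commands get run. Here is a minimal sketch, assuming the standard `btrfs subvolume snapshot -r` and `btrfs send -p` commands; the helper names are hypothetical and this is not how butterback itself is structured. The pruning rule in `discardable` is likewise an assumption: it keeps only the newest snapshot, as the parent for the next increment.

```python
from typing import List, Optional


def snapshot_cmd(subvolume: str, snapshot: str) -> List[str]:
    """Command that creates a read-only snapshot of `subvolume` at `snapshot`."""
    return ["btrfs", "subvolume", "snapshot", "-r", subvolume, snapshot]


def send_cmd(snapshot: str, parent: Optional[str] = None) -> List[str]:
    """Command that serialises `snapshot` to stdout.

    Without `parent` this is a full send; with `parent`, only the
    difference relative to that earlier snapshot is sent.
    """
    cmd = ["btrfs", "send"]
    if parent is not None:
        cmd += ["-p", parent]
    return cmd + [snapshot]


def discardable(uploaded: List[str]) -> List[str]:
    """Uploaded snapshots (oldest first) that can be deleted locally.

    Assumption: only the newest one must stay around, because it is the
    parent of the next incremental send.
    """
    return uploaded[:-1]
```

The returned lists can be handed to `subprocess.run` as-is; they are kept as plain data here so the logic can be inspected without touching a real filesystem.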

Encryption

Making back-ups to a public cloud provider brings the challenge of maintaining confidentiality. Assuming the standard “honest but curious” cloud provider, this means we have to encrypt and authenticate the transmitted data.

The design is relatively simple, and it has some composability in mind for the future. With butterback key-gen, you generate a key pair (butterback.priv, butterback.pub). The private key can be stowed away (securely, in multiple locations), and is only used for restoring volumes. The public key is only used for backing up volumes. The reason for using a key pair (as opposed to a symmetric master key) is extensibility and composability. Some advantages:

  • The machine making back-ups can be rendered unable to restore them without external help. This means that deleting a file on said machine makes it irretrievable without the private key, while the machine can still make back-ups.
  • The private key can be (trivially, because it’s a 25519 scalar) secret-shared, for instance printing five copies for storing at five different locations, but still requiring at least two or three copies. Think: RAID for keys.
  • The possibility for future introduction of more complex key systems.
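To make the “RAID for keys” idea concrete, here is a minimal sketch of Shamir secret sharing over the ristretto255 scalar field, using only the Python standard library. It is an illustration of the technique, not something butterback implements; the function names and the integer representation of shares are assumptions. `ORDER` is the (prime) order of the ristretto255 group.

```python
import secrets
from typing import List, Tuple

# Prime order of the ristretto255 group; private scalars live modulo this.
ORDER = 2**252 + 27742317777372353535851937790883648493

Share = Tuple[int, int]  # a point (x, f(x)) on the sharing polynomial


def split(secret: int, threshold: int, shares: int) -> List[Share]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not (0 <= secret < ORDER and 1 <= threshold <= shares):
        raise ValueError("invalid parameters")
    # Random polynomial of degree threshold - 1 with f(0) = secret.
    coeffs = [secret] + [secrets.randbelow(ORDER) for _ in range(threshold - 1)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation
            acc = (acc * x + c) % ORDER
        return acc

    return [(x, f(x)) for x in range(1, shares + 1)]


def recover(points: List[Share]) -> int:
    """Recover f(0) by Lagrange interpolation over the given points."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (-xj) % ORDER
                den = den * (xi - xj) % ORDER
        # pow(den, -1, ORDER) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, ORDER)) % ORDER
    return secret
```

With `split(key, 3, 5)` you would print five shares and need any three of them for `recover`; fewer than three reveal nothing about the key.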

The key pair is a ristretto255 key pair, which is practically Curve25519 with the cofactor eliminated. For every back-up, an ephemeral key pair is generated, which is pulled through SHAKE256 to yield a symmetric key. The symmetric key is then used as the key for the ChaCha20-Poly1305 authenticated cipher, which encrypts and authenticates the btrfs send stream in chunks of 1 MB, with a simple counter as nonce.
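The two non-cipher steps, deriving the symmetric key and numbering the chunks, can be sketched with the standard library alone. This is a sketch under assumptions, not butterback’s actual code: the post does not spell out exactly which key material goes into SHAKE256 (presumably a Diffie-Hellman shared secret between the ephemeral key and the long-term public key), and the 12-byte little-endian nonce assumes the IETF variant of ChaCha20-Poly1305 from RFC 8439.

```python
import hashlib


def derive_key(key_material: bytes) -> bytes:
    """Turn key material into a 32-byte symmetric key using SHAKE256.

    Assumption: `key_material` is whatever the ephemeral key pair yields,
    e.g. a shared secret; the exact input is not documented.
    """
    return hashlib.shake_256(key_material).digest(32)


def chunk_nonce(counter: int) -> bytes:
    """96-bit nonce for chunk number `counter`.

    Assumption: little-endian encoding of a counter that starts at 0.
    """
    return counter.to_bytes(12, "little")
```

Because every back-up gets a fresh key, restarting the counter at zero for each image does not reuse a (key, nonce) pair. Binding the chunk index into the nonce is also what makes reordered chunks fail authentication.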

This is a simple construction that should protect well enough, even against reordering attacks. I did not bother implementing anything against truncation attacks: truncating right after a 1 MB + 16 B chunk does not trigger a decryption or MAC error, but btrfs itself will probably end up detecting the problem. And after all, we’re in an “honest but curious” setting; confidentiality and simplicity are the main goals.

This is probably also a good place to document the volume image format:

ephemeral, compressed ristretto255 key | 32 B
[chunk of encrypted `btrfs send` | 1MB + 16B] * (N - 1)
[chunk of encrypted `btrfs send` | max 1MB + 16B/min 17B]
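As an illustration of this layout, here is a minimal sketch that computes the image size for a given stream length and splits an image back into its parts. The names are hypothetical, and “1MB” is taken to mean 1,000,000 bytes, which is an assumption (it could equally be 2**20).

```python
from typing import List, Tuple

KEY_LEN = 32        # compressed ristretto255 ephemeral public key
CHUNK = 1_000_000   # plaintext bytes per chunk; assumption, see above
TAG = 16            # Poly1305 authentication tag per chunk


def image_size(stream_len: int) -> int:
    """Size in bytes of the image for a `btrfs send` stream of `stream_len` bytes."""
    if stream_len < 1:
        raise ValueError("the format needs at least one byte of stream data")
    chunks = -(-stream_len // CHUNK)  # ceiling division
    return KEY_LEN + stream_len + chunks * TAG


def split_image(image: bytes) -> Tuple[bytes, List[bytes]]:
    """Split an image into the ephemeral key and its encrypted chunks."""
    if len(image) < KEY_LEN + TAG + 1:
        raise ValueError("image too short")
    key, body = image[:KEY_LEN], image[KEY_LEN:]
    step = CHUNK + TAG
    chunks = [body[i:i + step] for i in range(0, len(body), step)]
    if len(chunks[-1]) <= TAG:
        raise ValueError("final chunk is shorter than 17 bytes")
    return key, chunks
```

Note that an image cut off exactly at a chunk boundary still splits cleanly into full-size chunks, which is the truncation case discussed above: nothing in the layout itself marks the final chunk.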

Cloud

Amazon S3 Glacier (Glacier) is a storage solution for “cold data.”

Other cloud providers have competing products. I picked Glacier because the name stuck, and nothing prevents butterback from implementing other providers. Pull requests welcome. Amazon also has Glacier Deep Archive, which could be interesting for storing “older” volumes in the future.

The amazing thing about this is its price. I would be looking at less than a euro per month to store everything except my Linux ISOs.

Conclusion

This is very preliminary software (at the time of writing). Don’t use it, but do be interested, ping me on Twitter/GitLab/Matrix/email, and let me know what you think.

The thing is written in Rust, and therefore shouldn’t take a lot of memory and CPU time. It looks like on my Threadripper 1920X, I’m not going to have any trouble hitting 300 Mbps. If you have any reason to use my software for stuff that pushes beyond 3 TB per day, please do get in touch.
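The two figures are the same number in different units, as a quick calculation shows. The function name is just for illustration, and the conversion assumes decimal units (1 Mbps = 10^6 bits per second, 1 TB = 10^12 bytes).

```python
def bytes_per_day(megabits_per_second: float) -> float:
    """Bytes transferred in 24 hours at a sustained rate, using decimal units."""
    bytes_per_second = megabits_per_second * 1e6 / 8
    return bytes_per_second * 86_400  # seconds in a day


# 300 Mbps sustained works out to about 3.24e12 bytes, i.e. roughly 3.24 TB per day.
terabytes = bytes_per_day(300) / 1e12
```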

Alternative software

This is a list that I may maintain (or may leave as is) to document related programs.

Marc’s bash script

While it does some (arguably enough) volume management, it uses ssh and doesn’t have encryption. It seems to be meant more for transferring from your own PC to your own server. For that reason, it may also be interesting to use it in cooperation with butterback.

btrbck

Seems unmaintained, though maybe there’s a fork of it. It also has volume management, but like Marc’s script, it uses ssh without encryption, so it is also meant for your own server. It has quite a bit of built-in management for retention periods and whatnot.

btrbk (added 11 June 2021)

Encrypted btrfs send for multiple sources and targets, but a public cloud is not among those targets. Uses GnuPG for encryption, with its pros and cons. More complex, more features, and it seems well maintained too. It might make sense to add public cloud providers to btrbk.