
ZFS Fileserver: Automated snapshots

Backing up the new ZFS Fileserver #

In the last post, I got the LXC fileserver working with a ZFS dataset passed through as a bind mount. I ran some quick benchmarks and discovered that the container is significantly faster than the old VM in many workloads.

Before ditching the VM fileserver completely, I still have to nail down a backup solution. Proxmox Backup Server can back up the container's primary disk and its configuration file, but it won't be able to back up the data on the bind mount.

ZFS snapshots #

Since my Proxmox Backup Server uses ZFS as well, I should be able to use the built-in zfs tools to snapshot the fileserver's dataset and ship the snapshots over with zfs send/receive. Old snapshots on the backup server could then be pruned with a little script.

There are a number of solutions to automate this process, including zfs_autobackup and Sanoid / Syncoid.

But before I try one of those auto-magic scripts, I want to do it manually a few times just so I understand the process.

Learning to snapshot and zfs send/receive #

From what I understand, the first time I send a snapshot it will transfer a full copy of the data. After that, I'll be able to send incrementals that contain only the changes between snapshots.
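
For reference, an incremental send later on would look something like this (the @second snapshot name is just a placeholder):

zfs send -i fast/fileserver@initial fast/fileserver@second | ssh thinkstation2 zfs receive bigpool/snapshots/fileserver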

So let's take the first snapshot. Connected to the Proxmox host with ssh:

root@thinkstation:~# zfs list
NAME                      USED  AVAIL  REFER  MOUNTPOINT
fast                     2.43T  4.79T  4.00G  /fast
fast/fileserver            96K  4.79T    96K  /fast/fileserver

There she is. Let's snapshot:

zfs snapshot fast/fileserver@initial

No output... where'd it go?

root@thinkstation:~# zfs list -t snapshot
NAME                      USED  AVAIL  REFER  MOUNTPOINT
fast/fileserver@initial     0B      -    96K  -
root@thinkstation:~# 

Cool! But where is that?

root@thinkstation:/fast/fileserver/.zfs/snapshot# ls
initial

So snapshots are exposed in the root of the dataset, in a hidden folder called .zfs/snapshot -- got it. (The snapshot data itself lives in the pool; this folder is a read-only view of it.)
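
A nice side effect of that hidden folder: individual files can be copied straight back out of a snapshot without rolling anything back (somefile.txt is a hypothetical example):

cp /fast/fileserver/.zfs/snapshot/initial/somefile.txt /fast/fileserver/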

Now how do I send this to the zfs pool on the proxmox backup server?

First I need to create a new dataset on the backup server for the snapshots. I created another dataset inside that one just for the fileserver.

root@thinkstation2:~# zfs create bigpool/snapshots
root@thinkstation2:~# zfs create bigpool/snapshots/fileserver
root@thinkstation2:~# zfs list
NAME                           USED  AVAIL  REFER  MOUNTPOINT
bigpool                       2.76T  18.9T   104K  /bigpool
bigpool/pbs                   2.72T  18.9T  2.72T  /bigpool/pbs
bigpool/snapshots              192K  18.9T    96K  /bigpool/snapshots
bigpool/snapshots/fileserver    96K  18.9T    96K  /bigpool/snapshots/fileserver

Ok, let's try to send it.

zfs send fast/fileserver@initial | ssh thinkstation2 zfs receive bigpool/snapshots/fileserver

I got an error that the destination dataset already exists and I must add -F to overwrite it. Sure!
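
So the command that actually went through was:

zfs send fast/fileserver@initial | ssh thinkstation2 zfs receive -F bigpool/snapshots/fileserver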

Did it work?

root@thinkstation2:~# ls /bigpool/snapshots/fileserver/.zfs/snapshot/
initial

Heck yea! Now I know how to create the snapshots. How do I restore them?

Restoring a snapshot #

First, I'll try a local snapshot restore. I created a file on the fileserver named "ghost-file". Let's roll back the fileserver to the initial state and make the ghost disappear.

zfs rollback fast/fileserver@initial

It worked! The ghost disappeared:

brian@thinkpad-x390 ~/test $ ls
ghost-file
brian@thinkpad-x390 ~/test $ ls
brian@thinkpad-x390 ~/test $ 

Now can we do the same thing with the remote snapshot?

root@thinkstation:~# ssh thinkstation2 "zfs send bigpool/snapshots/fileserver@initial" | zfs receive -F fast/fileserver

It worked again! I had created another ghost file beforehand, and after the restore it was gone:

brian@thinkpad-x390 ~/test $ touch ghost
brian@thinkpad-x390 ~/test $ ls
brian@thinkpad-x390 ~/test $ 

Alright! I just wanted to make sure I understood the process before I started running random scripts from github. Knowing how to manually snapshot, send and receive, restore, and delete seems like enough to get started with some automation.
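
Deleting is the one step I didn't show above. It's a one-liner, but worth double-checking before pressing enter: without the @snapshot part, zfs destroy deletes the whole dataset (the snapshot name here is a placeholder):

zfs destroy fast/fileserver@some-old-snapshot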

Automating the process #

Sanoid/Syncoid looks like a solid choice to test out first. The author's great posts on various zfs forums are what caused me to go down that volblocksize rabbit hole in the last few posts.

Sanoid creates and prunes snapshots according to policy. The policy is contained in /etc/sanoid/sanoid.conf, and it's pretty easy to understand.

Syncoid then syncs those snapshots from one machine to another.

I will be using both tools to accomplish my goal here.

Sanoid will run on both the fileserver (Proxmox host) and the backup server. It will create the snapshots on the fileserver, and it will prune the snapshots on the backup server.

Syncoid will only run on the backup server, pulling the snapshots at a set interval. There are security benefits to this "pull" method: if the primary server is compromised, the attacker does not also gain access to the backup server. The backup server only needs an ssh key and read access to complete the pull. If I were "pushing" instead, the primary server would have write access to snapshots on the backup server.
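
To push the least-privilege idea further, ZFS can delegate just the permissions a non-root pull user needs with zfs allow. A rough sketch, run as root on the fileserver (the backupuser account is hypothetical, and the exact permission list syncoid needs -- it creates and prunes its own sync snapshots on the source -- is covered in its README):

zfs allow backupuser send,hold,snapshot,destroy,mount fast/fileserver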

Before trying to configure Sanoid, I tested Syncoid. I ssh'd into the backup server and pulled the initial snapshot from the fileserver:

root@thinkstation2:~# syncoid --no-privilege-elevation thinkstation:fast/fileserver bigpool/snapshots/fileserver

root@thinkstation2:~# ls /bigpool/snapshots/fileserver/.zfs/snapshot
initial  syncoid_thinkstation2_2024-03-24:21:36:29-GMT-04:00

That worked great. (The extra syncoid_* snapshot is one Syncoid takes itself, so it always has a consistent base for the next incremental pull.) Next, I need to configure the Sanoid policy on the fileserver to create and retain snapshots.

/etc/sanoid/sanoid.conf on the fileserver:

root@thinkstation:/etc/sanoid# cat sanoid.conf
[fast/fileserver]
        use_template = production


## templates

[template_production]
        frequently = 4
        hourly = 24
        daily = 3
        monthly = 0
        yearly = 0
        autosnap = yes
        autoprune = yes

[template_backup]
        autoprune = yes
        frequently = 0
        hourly = 72
        daily = 30
        monthly = 6
        yearly = 0
        autosnap = no

It's short and sweet. Name the pool/dataset at the top, and specify a template. Then write the templates at the bottom.

The production template I made here will take a snapshot every 15 minutes (Sanoid's default "frequent" period), every hour, and every day. The numbers are retention counts, not rates: keep 4 of the frequents (an hour's worth), 24 of the hourlies, and the last 3 dailies. That should be good enough for rewinding mistakes -- I don't want to clutter it up with any more than that.

The backup template will retain 0 of the 15-minute snapshots, 72 of the hourlies, 30 dailies, and 6 monthlies. That should be plenty for my homelab.

autosnap and autoprune are important options here. autosnap tells Sanoid to create the snapshots at the specified intervals. autoprune tells Sanoid it's okay to delete snapshots that have expired according to the policy.

For my setup, I want autosnap enabled on the fileserver so it will create snapshots. It needs to be disabled on the backup server though, or else the backup server will start snapshotting the snapshots! autoprune needs to be enabled on both machines, since they both need to delete snapshots according to their respective policies.
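
The packaged systemd service just runs sanoid periodically, but the policy can also be exercised by hand to test it (these flags come from sanoid's own usage output):

sanoid --take-snapshots --verbose
sanoid --prune-snapshots --verbose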

The stock systemd units look for Sanoid in /usr/local/sbin, so it didn't launch at first. I added a symbolic link there pointing to /sbin/sanoid for now.
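
For anyone copying along, that workaround plus enabling the packaged timer (assuming the Debian package's sanoid.timer unit) was just:

ln -s /sbin/sanoid /usr/local/sbin/sanoid
systemctl enable --now sanoid.timer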

This config has been running in the background while I type all this mess, so let's see if it's working:

root@thinkstation:/fast/fileserver/.zfs/snapshot# ls -t
autosnap_2024-03-25_23:00:06_frequently
autosnap_2024-03-25_23:00:06_hourly
autosnap_2024-03-25_22:45:05_frequently
autosnap_2024-03-25_22:30:00_frequently
autosnap_2024-03-25_22:15:06_frequently
autosnap_2024-03-25_22:00:06_frequently
initial

The times are in UTC to avoid DST issues. You can list them newest to oldest with ls -t (ls -tr oldest to newest).

Looking good so far! Now I'll just copy and paste that same sanoid.conf to the backup server. Only need to change the path to the pool and the template:

[bigpool/snapshots/fileserver]
        use_template = backup

Let's run the syncoid command again and see if it worked:

root@thinkstation2:/bigpool/snapshots/fileserver/.zfs/snapshot# ls -lt

drwxrwxrwx 1 root root 0 Mar 25 19:51 syncoid_thinkstation2_2024-03-25:19:51:20-GMT-04:00
drwxrwxrwx 1 root root 0 Mar 25 19:00 autosnap_2024-03-25_23:00:06_hourly
drwxrwxrwx 1 root root 0 Mar 25 18:00 autosnap_2024-03-25_22:00:06_hourly
drwxrwxrwx 1 root root 0 Mar 25 17:30 autosnap_2024-03-25_21:30:00_daily
drwxrwxrwx 1 root root 0 Mar 25 17:30 autosnap_2024-03-25_21:30:00_hourly
drwxrwxrwx 1 root root 0 Mar 24 16:49 initial

Perfect. It's deleting everything except what's specified in the 'backup' policy template.

Now we're ready to schedule this process.

Running syncoid periodically #

Now the backup server just needs a systemd timer that will run the syncoid service periodically. Since I'm only pulling hourly or older snapshots to the backup server, a timer that runs hourly should be fine.

First I made the syncoid.service file by copying sanoid.service and changing a couple of words (plus another symbolic link in /usr/local/sbin pointing to /sbin/syncoid).

root@thinkstation2:/etc/systemd/system# cat syncoid.service 

[Unit]
Description=Replicate snapshots
Requires=zfs.target
After=zfs.target
Wants=sanoid-prune.service
Before=sanoid-prune.service
ConditionFileNotEmpty=/etc/sanoid/sanoid.conf

[Service]
Environment=TZ=UTC
Type=oneshot
ExecStart=/usr/local/sbin/syncoid --no-privilege-elevation thinkstation:fast/fileserver bigpool/snapshots/fileserver

Then I made the timer file, which will call the service file:

root@thinkstation2:/etc/systemd/system# cat syncoid.timer

[Unit]
Description=Run Syncoid Every Hour
Requires=syncoid.service

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
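
After creating both files, the timer gets enabled the usual systemd way:

systemctl daemon-reload
systemctl enable --now syncoid.timer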

Let's see if they work:

root@thinkstation2:~# systemctl status syncoid


Mar 25 22:08:54 thinkstation2 systemd[1]: Finished syncoid.service - Replicate snapshots.

Great! And is pruning working?

root@thinkstation2:/bigpool/snapshots/fileserver/.zfs/snapshot# ls -lt

drwxrwxrwx 1 root root 0 Mar 25 22:08 syncoid_thinkstation2_2024-03-26:02:08:49-GMT00:00
drwxrwxrwx 1 root root 0 Mar 25 22:00 autosnap_2024-03-26_02:00:06_hourly
drwxrwxrwx 1 root root 0 Mar 25 21:00 autosnap_2024-03-26_01:00:06_hourly
drwxrwxrwx 1 root root 0 Mar 25 20:00 autosnap_2024-03-26_00:00:01_daily

Heck yea! That was easier than I thought it would be.

I'll let this run for a few days to make sure I didn't miss anything, but then it's time to send that fileserver VM to the trash and start using the new container!

For now, I will still use Proxmox Backup Server for everything else.

Bonus: Enable "previous versions" in Windows #

Samba can use vfs shadow_copy2 to enable the "Previous Versions" feature when you right-click on files on Windows machines.

It was super easy to turn on; I just added the following at the bottom of the file share's section in /etc/samba/smb.conf:

;[enable shadow copies in windows]
vfs objects = shadow_copy2
shadow:snapdir = .zfs/snapshot
shadow:sort = desc
shadow:format = autosnap_%Y-%m-%d_%H:%M:%S_hourly

The container has samba 4.2, so it will only let me select one snapshot type (frequently, hourly, daily, etc.) in the shadow:format string. Hourly seems good.

That was really fun to learn! Thanks for reading and happy homelabbing.