Understanding ZFS: Replication, Archive and Backup
Posted on November 6, 2008
As with other features of ZFS, the traditionally complex is made simple and straightforward. That simplicity, however, can lull administrators into a false sense of security.
In ZFS, backup, archive, migration… any activity that fundamentally involves moving data from one system to another is a replication activity. I propose that the traditional idea of weekly backups is, in fact, just really slow, crappy replication. An HA cluster replicates every 5 seconds, but your website replicates once a week… it's really the same thing, just at a different interval and possibly with different tools. So understand that when I say “replication” I refer to all forms of data movement, both intra- and inter-system.
ZFS replication is performed through the use of two simple subcommands: zfs send and zfs recv. These are commands that utilize STDIN and STDOUT… and why? Pipes, my friend, pipes. Rather than bake piles of functionality into these commands, Matt Ahrens and the ZFS team opted to keep them very simple and utilize the traditional UNIX ideology of connecting things together for something even better.
Let's look at a simple intra-system example of replication. Say I have a workstation with a couple of internal disks, perhaps in a RAIDZ, who knows, and I then attach a USB or FireWire external drive on which I create a pool called “backups”. Let's now migrate a simple dataset from my local “data” pool to my external drive's “backups” pool:
root@quadra ~$ zfs list -r data
NAME                USED  AVAIL  REFER  MOUNTPOINT
data                222K   218M    19K  /data
data/home           114K   200M    24K  /data/home
data/home/benr       18K   200M    18K  /data/home/benr
data/home/conradr    18K   200M    18K  /data/home/conradr
data/home/glennr     18K   200M    18K  /data/home/glennr
data/home/novar      18K   200M    18K  /data/home/novar
data/home/tamr       18K   200M    18K  /data/home/tamr
root@quadra ~$ zfs list -r backups
NAME      USED  AVAIL  REFER  MOUNTPOINT
backups  67.5K   218M    18K  /backups
root@quadra ~$ zfs snapshot data/home/benr@001
root@quadra ~$ zfs send data/home/benr@001 | zfs recv -d backups
root@quadra ~$ zfs list -r backups
NAME                    USED  AVAIL  REFER  MOUNTPOINT
backups                 191K   218M    19K  /backups
backups/home            106K   218M    18K  /backups/home
backups/home/benr        88K   218M    88K  /backups/home/benr
backups/home/benr@001      0      -    88K  -
Let's step through this together.
Replication is always based on a static point in time, meaning a snapshot. We create a snapshot of the dataset(s) we want to replicate, in this case the snapshot “001” of benr's home directory. Using the zfs send command we send that snapshot to STDOUT. Using a UNIX pipe, that STDOUT is fed to the STDIN of the zfs recv command, which has been told via the -d backups argument to preserve the dataset name and hierarchy under the “backups” dataset. This could just as easily be a “backups/data-pool” dataset under which things are created, like so:
root@quadra ~$ zfs destroy -r backups/home
root@quadra ~$ zfs create backups/data-pool
root@quadra ~$ zfs send data/home/benr@001 | zfs recv -d backups/data-pool
root@quadra ~$ zfs list -r backups
NAME                              USED  AVAIL  REFER  MOUNTPOINT
backups                           217K   218M    20K  /backups
backups/data-pool                 125K   218M    19K  /backups/data-pool
backups/data-pool/home            106K   218M    18K  /backups/data-pool/home
backups/data-pool/home/benr        88K   218M    88K  /backups/data-pool/home/benr
backups/data-pool/home/benr@001      0      -    88K  -
What about incrementals? I mean, I'll want to freshen the copy, right? This is done by creating another snapshot and then telling zfs send to send only the difference between the two:
root@quadra ~$ cp -r /etc/security/* /data/home/benr
root@quadra ~$ zfs snapshot data/home/benr@002
root@quadra ~$ zfs list -r data/home/benr
NAME                 USED  AVAIL  REFER  MOUNTPOINT
data/home/benr       379K   199M   355K  /data/home/benr
data/home/benr@001    24K      -    88K  -
data/home/benr@002      0      -   355K  -
root@quadra ~$ zfs send -i data/home/benr@001 data/home/benr@002 | zfs recv -d backups/data-pool
root@quadra ~$ zfs list -r backups/data-pool
NAME                              USED  AVAIL  REFER  MOUNTPOINT
backups/data-pool                 417K   217M    19K  /backups/data-pool
backups/data-pool/home            398K   217M    19K  /backups/data-pool/home
backups/data-pool/home/benr       379K   217M   355K  /backups/data-pool/home/benr
backups/data-pool/home/benr@001    24K      -    88K  -
backups/data-pool/home/benr@002      0      -   355K  -
So here I used ZFS send/recv almost exactly as before, but this time I told zfs send about another snapshot from which to create an incremental stream. Notice that the zfs recv command didn't change at all.
But what if I want to send it to another system? Easy: pipe the data through ssh (or rsh, or whatever) like so:
root@quadra ~$ zfs send data/home/benr@002 | ssh root@thumper.cuddletech.com zfs recv -d backups/data-pool
So that's the basics… but what does this mean? Let's get creative!
First, we can write a script that creates a new snapshot every 30 seconds and then, thanks to pre-shared SSH keys, uses SSH as above to recv the data elsewhere. Add a little error checking and presto! A really nice, simple replication scheme. Even if you have a lot of data change, copying it every 30 seconds keeps each increment small, so no single transfer takes very long. When it comes to data that changes frequently, the key is to move early and often!
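Here's a minimal sketch of such a loop, assuming pre-shared SSH keys and reusing the dataset and target host from the earlier examples (the timestamp-style snapshot names and the retry logic are my own illustration, not a blessed recipe):

#!/bin/ksh
# Near-continuous replication sketch: seed the target with one full
# send, then every 30 seconds snapshot and ship only the increment.
DATASET=data/home/benr
TARGET=root@thumper.cuddletech.com

PREV=$(date '+%Y%m%d%H%M%S')
zfs snapshot ${DATASET}@${PREV}
zfs send ${DATASET}@${PREV} | ssh ${TARGET} zfs recv -d backups/data-pool || exit 1

while true; do
    sleep 30
    NOW=$(date '+%Y%m%d%H%M%S')
    zfs snapshot ${DATASET}@${NOW}
    if zfs send -i ${DATASET}@${PREV} ${DATASET}@${NOW} | \
       ssh ${TARGET} zfs recv -d backups/data-pool; then
        # The increment landed; the old snapshot is no longer
        # needed as a send base on this side.
        zfs destroy ${DATASET}@${PREV}
        PREV=${NOW}
    else
        # Send failed; discard the new snapshot and retry from PREV.
        echo "incremental send of ${DATASET}@${NOW} failed" >&2
        zfs destroy ${DATASET}@${NOW}
    fi
done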
Now, say we don't need that, and simple backups are fine. We can create a script that takes a new snapshot each day at midnight, named for the day of the week. When Wednesday comes around, the old “wed” snapshot is destroyed and a new one created; then a simple weekend job zfs send/recv's the Friday snapshot. Simple to do, plus we have those daily snapshots to fall back on in a pinch, hopefully saving us a trip out to the remote copy.
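A sketch of the daily rotation half, meant to run from cron at midnight (deriving the day name with date and tr is my own choice):

#!/bin/ksh
# Daily rotation: destroy last week's snapshot for today's day name,
# then take a fresh one (e.g. data/home@wed), keeping 7 days of history.
DATASET=data/home
DAY=$(date +%a | tr '[:upper:]' '[:lower:]')   # mon, tue, wed, ...

zfs destroy -r ${DATASET}@${DAY} 2>/dev/null   # may not exist the first week
zfs snapshot -r ${DATASET}@${DAY}

A second cron job each weekend then ships the “fri” snapshot off-host using the same send | ssh | recv pipeline shown above.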
So we've used pipes in a simple way, to securely transport our datastream from one system to another. Consider other unique possibilities, such as piping zfs send through gzip before sending across the network!
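The pipeline takes roughly this shape; note that the far side has to gunzip before the recv. A sketch, reusing the @002 snapshot and hosts from above:

root@quadra ~$ zfs send data/home/benr@002 | gzip -c | \
    ssh root@thumper.cuddletech.com "gunzip -c | zfs recv -d backups/data-pool"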
Or… say what you really want is a portable dump of your ZFS dataset(s). Remember that zfs send outputs a datastream… just redirect STDOUT to a file!
root@quadra ~$ zfs send data/home/benr@002 > /tmp/home-benr.zdump
root@quadra ~$ ls -lh /tmp/home-benr.zdump
-rw-r--r--   1 root     root        421K Nov  6 15:14 /tmp/home-benr.zdump
Now let's test a restore from this “zdump”:
root@quadra ~$ zfs create backups/dump-restore
root@quadra ~$ cat /tmp/home-benr.zdump | zfs recv -d backups/dump-restore
root@quadra ~$ zfs list -r backups/dump-restore
NAME                                 USED  AVAIL  REFER  MOUNTPOINT
backups/dump-restore                 392K   217M    19K  /backups/dump-restore
backups/dump-restore/home            373K   217M    18K  /backups/dump-restore/home
backups/dump-restore/home/benr       355K   217M   355K  /backups/dump-restore/home/benr
backups/dump-restore/home/benr@002      0      -   355K  -
Works like a charm! Again, we can use pipes for fun here too. Let's say we really want a dump that is compressed and encrypted:
root@quadra ~$ pktool genkey keystore=file outkey=zdump.key keytype=aes keylen=128
root@quadra ~$ zfs send data/home/benr@002 | gzip | encrypt -a aes -k zdump.key > /tmp/home_benr-AES256GZ.zdump
So we've output a datastream based on a snapshot (002), compressed it, encrypted it with 128-bit AES, and then dumped it to a file. We could just as easily dump it to a tape (/dev/rmt/0cbn or something) for archiving purposes.
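Restoring such a dump is just the reverse pipeline; a sketch using decrypt(1), the counterpart of encrypt, with the same key file (I'm assuming decrypt writes to STDOUT when no output file is given):

root@quadra ~$ decrypt -a aes -k zdump.key -i /tmp/home_benr-AES256GZ.zdump | \
    gunzip -c | zfs recv -d backups/dump-restore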
Finally, what if we want to work with more than just a single snapshot? What if we want to send all the “home” datasets? For some time now (although just now arriving in Solaris 10) we've had recursive flags for both zfs snapshot and zfs send. Let's give it a try:
root@quadra ~$ zfs snapshot -r data/home@nov6
root@quadra ~$ zfs list -r data/home
NAME                     USED  AVAIL  REFER  MOUNTPOINT
data/home                755K   199M    24K  /data/home
data/home@nov6              0      -    24K  -
data/home/benr           379K   199M   355K  /data/home/benr
data/home/benr@001        24K      -    88K  -
data/home/benr@002          0      -   355K  -
data/home/benr@nov6         0      -   355K  -
data/home/conradr         88K   199M    88K  /data/home/conradr
data/home/conradr@nov6      0      -    88K  -
data/home/glennr          88K   199M    88K  /data/home/glennr
data/home/glennr@nov6       0      -    88K  -
data/home/novar           88K   199M    88K  /data/home/novar
data/home/novar@nov6        0      -    88K  -
data/home/tamr            88K   199M    88K  /data/home/tamr
data/home/tamr@nov6         0      -    88K  -
root@quadra ~$ zfs destroy -r backups/home
root@quadra ~$ zfs list -r backups
NAME     USED  AVAIL  REFER  MOUNTPOINT
backups   86K   218M    20K  /backups
root@quadra ~$ zfs send -R data/home@nov6 | zfs recv -d backups
root@quadra ~$ zfs list -r backups
NAME                        USED  AVAIL  REFER  MOUNTPOINT
backups                     902K   217M    18K  /backups
backups/home                755K   199M    24K  /backups/home
backups/home@nov6              0      -    24K  -
backups/home/benr           379K   199M   355K  /backups/home/benr
backups/home/benr@001        24K      -    88K  -
backups/home/benr@002          0      -   355K  -
backups/home/benr@nov6         0      -   355K  -
backups/home/conradr         88K   199M    88K  /backups/home/conradr
backups/home/conradr@nov6      0      -    88K  -
backups/home/glennr          88K   199M    88K  /backups/home/glennr
backups/home/glennr@nov6       0      -    88K  -
backups/home/novar           88K   199M    88K  /backups/home/novar
backups/home/novar@nov6        0      -    88K  -
backups/home/tamr            88K   199M    88K  /backups/home/tamr
backups/home/tamr@nov6         0      -    88K  -
Simple: just snapshot the parent dataset with the -r flag, then send the parent dataset's snapshot with the -R flag. Otherwise, it's all the same! And, of course, you can combine this with all our other pipe tricks just the same.
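For instance, a compressed, encrypted archive of the entire home hierarchy in a single stream (the output filename here is only illustrative):

root@quadra ~$ zfs send -R data/home@nov6 | gzip | encrypt -a aes -k zdump.key \
    > /tmp/home-nov6-AESGZ.zdump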
And so we see that using a single pair of commands, we get simple yet powerful replication, backup, and archive capabilities. A lot of power unleashed with just a little imagination; that's the ZFS way.