Understanding ZFS: Replication, Archive and Backup

Posted on November 6, 2008

As with other features of ZFS, the traditionally complex is made simple and straightforward. But that very simplicity can coax administrators into a false sense of complacency.

In ZFS, backup, archive, migration… any activity that fundamentally involves the movement of data from one system to another is a replication activity. I propose that the traditional idea of weekly backups is, in fact, just really slow, crappy replication. An HA cluster replicates every 5 seconds, but your website replicates once a week… it’s really the same thing, just at a different interval and possibly with different tools. So understand that when I say “replication” I refer to all forms of data movement, both intra- and inter-system.

ZFS replication is performed through the use of two simple subcommands: zfs send and zfs recv. zfs send writes a datastream to STDOUT, and zfs recv reads one from STDIN… and why? Pipes, my friend, pipes. Rather than bake piles of functionality into these commands, Matt Ahrens and the ZFS team opted to keep them very simple and rely on the traditional UNIX ideology of connecting small tools together into something even better.

Let’s look at a simple intra-system example of replication. Say I have a workstation with a couple of internal disks, perhaps a RAIDZ, who knows, and I then attach a USB or FireWire external drive on which I create a pool called “backups”. Let’s now migrate a simple dataset from my local “data” pool to my external drive’s “backups” pool:

root@quadra ~$ zfs list -r data
NAME                USED  AVAIL  REFER  MOUNTPOINT
data                222K   218M    19K  /data
data/home           114K   200M    24K  /data/home
data/home/benr       18K   200M    18K  /data/home/benr
data/home/conradr    18K   200M    18K  /data/home/conradr
data/home/glennr     18K   200M    18K  /data/home/glennr
data/home/novar      18K   200M    18K  /data/home/novar
data/home/tamr       18K   200M    18K  /data/home/tamr
root@quadra ~$ zfs list -r backups
NAME      USED  AVAIL  REFER  MOUNTPOINT
backups  67.5K   218M    18K  /backups

root@quadra ~$ zfs snapshot data/home/benr@001
root@quadra ~$ zfs send data/home/benr@001 | zfs recv -d backups

root@quadra ~$ zfs list -r backups
NAME                    USED  AVAIL  REFER  MOUNTPOINT
backups                 191K   218M    19K  /backups
backups/home            106K   218M    18K  /backups/home
backups/home/benr        88K   218M    88K  /backups/home/benr
backups/home/benr@001      0      -    88K  -

Let’s step through this together.

Replication is always based on a static point in time, meaning a snapshot. We create a snapshot of the dataset(s) we want to replicate, in this case the snapshot “001” of benr’s home directory. Using the zfs send command we send that snapshot to STDOUT. Via a UNIX pipe, that STDOUT becomes the STDIN of the zfs recv command, which has been told via the -d backups argument to preserve the dataset name and hierarchy under the “backups” dataset. This could just as easily be a “backups/data-pool” dataset under which things are created, like so:

root@quadra ~$ zfs destroy -r backups/home
root@quadra ~$ zfs create backups/data-pool
root@quadra ~$ zfs send data/home/benr@001 | zfs recv -d backups/data-pool
root@quadra ~$ zfs list -r backups
NAME                              USED  AVAIL  REFER  MOUNTPOINT
backups                           217K   218M    20K  /backups
backups/data-pool                 125K   218M    19K  /backups/data-pool
backups/data-pool/home            106K   218M    18K  /backups/data-pool/home
backups/data-pool/home/benr        88K   218M    88K  /backups/data-pool/home/benr
backups/data-pool/home/benr@001      0      -    88K  -

What about incrementals? I mean, I’ll want to freshen the copy, right? This is done by creating another snapshot and then telling zfs send to send only the difference between the two:

root@quadra ~$ cp -r /etc/security/* /data/home/benr
root@quadra ~$ zfs snapshot data/home/benr@002
root@quadra ~$ zfs list -r data/home/benr
NAME                 USED  AVAIL  REFER  MOUNTPOINT
data/home/benr       379K   199M   355K  /data/home/benr
data/home/benr@001    24K      -    88K  -
data/home/benr@002      0      -   355K  -

root@quadra ~$ zfs send -i data/home/benr@001 data/home/benr@002 | zfs recv -d backups/data-pool
root@quadra ~$ zfs list -r backups/data-pool
NAME                              USED  AVAIL  REFER  MOUNTPOINT
backups/data-pool                 417K   217M    19K  /backups/data-pool
backups/data-pool/home            398K   217M    19K  /backups/data-pool/home
backups/data-pool/home/benr       379K   217M   355K  /backups/data-pool/home/benr
backups/data-pool/home/benr@001    24K      -    88K  -
backups/data-pool/home/benr@002      0      -   355K  -

So here I used ZFS send/recv almost exactly as before, but this time I told zfs send about an earlier snapshot from which to create an incremental stream. Notice that the zfs recv command didn’t change at all.

But what if I want to send it to another system? Easy: pipe the data through ssh (or rsh, or whatever) like so:

root@quadra ~$ zfs send data/home/benr@002 | ssh root@thumper.cuddletech.com zfs recv -d backups/data-pool

So that’s the basics… but what does this mean? Let’s get creative!

Firstly, we can write a script that creates a new snapshot every 30 seconds and then, thanks to pre-shared SSH keys, uses SSH as above to recv the data elsewhere. Add a little error checking and presto! A really nice, simple replication scheme. Even if you have a lot of data change, copying it every 30 seconds keeps each increment small, so no single transfer takes very long. When it comes to data that changes frequently, the key is to move early and often!
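
A minimal sketch of one pass of such a script might look like the following. All names here (dataset, target pool, remote host) are hypothetical examples, and the ZFS/SSH variables default to echo so you can dry-run it without a real pool; clear them (ZFS=zfs SSH="ssh root@host") for real use.

```shell
#!/bin/sh
# Sketch of the 30-second replication idea: one pass snapshots the
# dataset, sends the (incremental) stream over SSH, and records the
# new snapshot as the baseline for the next pass.
# ZFS/SSH default to echo so the sketch can be dry-run safely.
ZFS=${ZFS:-"echo zfs"}
SSH=${SSH:-"echo ssh root@thumper.cuddletech.com"}
DATASET="data/home/benr"
TARGET="backups/data-pool"
STATE=${STATE:-/tmp/last-rep-snap}   # remembers the previous snapshot

replicate_once() {
    snap="${DATASET}@rep-$(date '+%Y%m%d%H%M%S')"
    $ZFS snapshot "$snap"
    if [ -s "$STATE" ]; then
        # We have a baseline: send only the difference.
        $ZFS send -i "$(cat "$STATE")" "$snap" | $SSH zfs recv -d "$TARGET"
    else
        # First pass: send the full stream.
        $ZFS send "$snap" | $SSH zfs recv -d "$TARGET"
    fi
    echo "$snap" > "$STATE"
}

# Real use would wrap this in: while true; do replicate_once; sleep 30; done
replicate_once
```

A production version would also destroy superseded snapshots and check exit statuses, per the error-checking caveat above.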

Now, say we don’t need that and simple backups are fine. We can create a script that takes a new snapshot each day at midnight, named for the day of the week. When Wednesday comes around, the old “wed” snapshot is destroyed and a new one created, and then a simple script zfs send/recv’s the Friday snapshot every weekend. Simple to do, plus we have those daily snapshots to fall back on in a pinch, hopefully keeping us from having to go out to a remote copy.
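
That rotation could be sketched roughly like this (host and pool names are hypothetical; ZFS/SSH default to echo for a safe dry run, so swap in the real commands before trusting it):

```shell
#!/bin/sh
# Sketch of the midnight rotation job: recreate today's day-named
# snapshot, and on Fridays ship it to the remote pool.
ZFS=${ZFS:-"echo zfs"}
SSH=${SSH:-"echo ssh root@thumper.cuddletech.com"}
DATASET="data/home"

rotate_daily() {
    day=$(date '+%a' | tr '[:upper:]' '[:lower:]')   # mon, tue, wed...
    # Last week's snapshot of the same name goes away, then is recreated.
    $ZFS destroy -r "${DATASET}@${day}"
    $ZFS snapshot -r "${DATASET}@${day}"
    # Weekend off-box copy, keyed off the Friday snapshot.
    if [ "$day" = "fri" ]; then
        $ZFS send -R "${DATASET}@fri" | $SSH zfs recv -d backups
    fi
}

rotate_daily
```

Run it from cron at midnight and you get a rolling week of local snapshots plus a weekly remote copy.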

So we’ve used pipes in a simple way to securely transport our datastream from one system to another. Consider other unique possibilities, such as piping the zfs send output into gzip before sending it across the network!
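
A compressed transfer might look like this sketch. Note that the far side must gunzip the stream before zfs recv sees it, so the remote pipeline is quoted as a single ssh argument; the host name is hypothetical, and ZFS/SSH default to echo so it can be dry-run without a pool.

```shell
#!/bin/sh
# Sketch of a compressed network send: gzip locally, gunzip remotely,
# then feed the restored stream to zfs recv on the far side.
ZFS=${ZFS:-"echo zfs"}
SSH=${SSH:-"echo ssh root@thumper.cuddletech.com"}

$ZFS send data/home/benr@002 | gzip | \
    $SSH "gunzip | zfs recv -d backups/data-pool"
```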

Or… say what you really want is a portable dump of your ZFS dataset(s). Remember that zfs send outputs a datastream… just redirect STDOUT to a file!

root@quadra ~$ zfs send data/home/benr@002 > /tmp/home-benr.zdump  
root@quadra ~$ ls -lh /tmp/home-benr.zdump
-rw-r--r-- 1 root root 421K Nov  6 15:14 /tmp/home-benr.zdump

Now let’s test a restore from this “zdump”:

root@quadra ~$ zfs create backups/dump-restore               
root@quadra ~$ cat /tmp/home-benr.zdump | zfs recv -d backups/dump-restore
root@quadra ~$ zfs list -r backups/dump-restore
NAME                                 USED  AVAIL  REFER  MOUNTPOINT
backups/dump-restore                 392K   217M    19K  /backups/dump-restore
backups/dump-restore/home            373K   217M    18K  /backups/dump-restore/home
backups/dump-restore/home/benr       355K   217M   355K  /backups/dump-restore/home/benr
backups/dump-restore/home/benr@002      0      -   355K  -

Works like a charm! Again, we can use pipes for fun here too. Let’s say that we really want a dump that is compressed and encrypted!

root@quadra ~$ pktool genkey keystore=file outkey=zdump.key keytype=aes keylen=128
root@quadra ~$ zfs send data/home/benr@002 | gzip | encrypt -a aes -k zdump.key > /tmp/home_benr-AES256GZ.zdump

So we’ve output a datastream based on a snapshot (002), compressed it, encrypted it with 128-bit AES, and then dumped it to a file. We could just as easily dump it to tape (/dev/rmt/0cbn or something) for archival purposes.
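
Restoring such a dump just runs the pipeline in reverse: decrypt(1) (the counterpart to encrypt(1) in the Solaris cryptographic framework) with the same key, then gunzip, then zfs recv. A sketch, with each command defaulting to echo so it can be dry-run on a box without the dump, the key, or a pool:

```shell
#!/bin/sh
# Sketch of restoring the encrypted, compressed dump from above:
# decrypt with the same AES key, decompress, then receive the stream.
DECRYPT=${DECRYPT:-"echo decrypt"}
GUNZIP=${GUNZIP:-"echo gunzip"}
ZFS=${ZFS:-"echo zfs"}

$DECRYPT -a aes -k zdump.key -i /tmp/home_benr-AES256GZ.zdump | \
    $GUNZIP | $ZFS recv -d backups/dump-restore
```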

Finally, what if we want to work with more than just a single snapshot? What if we want to send all the “home” datasets? For some time now (although only just arriving in Solaris 10) we’ve had recursive flags for both zfs snapshot and zfs send. Let’s give it a try:

root@quadra ~$ zfs snapshot -r data/home@nov6
root@quadra ~$ zfs list -r data/home
NAME                     USED  AVAIL  REFER  MOUNTPOINT
data/home                755K   199M    24K  /data/home
data/home@nov6              0      -    24K  -
data/home/benr           379K   199M   355K  /data/home/benr
data/home/benr@001        24K      -    88K  -
data/home/benr@002          0      -   355K  -
data/home/benr@nov6         0      -   355K  -
data/home/conradr         88K   199M    88K  /data/home/conradr
data/home/conradr@nov6      0      -    88K  -
data/home/glennr          88K   199M    88K  /data/home/glennr
data/home/glennr@nov6       0      -    88K  -
data/home/novar           88K   199M    88K  /data/home/novar
data/home/novar@nov6        0      -    88K  -
data/home/tamr            88K   199M    88K  /data/home/tamr
data/home/tamr@nov6         0      -    88K  -

root@quadra ~$ zfs destroy -r backups/home
root@quadra ~$ zfs list -r backups
NAME      USED  AVAIL  REFER  MOUNTPOINT
backups    86K   218M    20K  /backups

root@quadra ~$ zfs send -R data/home@nov6 | zfs recv -d backups

root@quadra ~$ zfs list -r backups
NAME                        USED  AVAIL  REFER  MOUNTPOINT
backups                     902K   217M    18K  /backups
backups/home                755K   199M    24K  /backups/home
backups/home@nov6              0      -    24K  -
backups/home/benr           379K   199M   355K  /backups/home/benr
backups/home/benr@001        24K      -    88K  -
backups/home/benr@002          0      -   355K  -
backups/home/benr@nov6         0      -   355K  -
backups/home/conradr         88K   199M    88K  /backups/home/conradr
backups/home/conradr@nov6      0      -    88K  -
backups/home/glennr          88K   199M    88K  /backups/home/glennr
backups/home/glennr@nov6       0      -    88K  -
backups/home/novar           88K   199M    88K  /backups/home/novar
backups/home/novar@nov6        0      -    88K  -
backups/home/tamr            88K   199M    88K  /backups/home/tamr
backups/home/tamr@nov6         0      -    88K  -

Simple: just snapshot the parent dataset with the -r flag, then send the parent dataset snapshot with the -R flag. Otherwise, it’s all the same! And, of course, you can combine this with all our other pipe tricks just the same!
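
For instance, a recursive send can also be made incremental and compressed across the wire in one shot. The @oct30 baseline snapshot and host name below are hypothetical, and ZFS/SSH default to echo for a dry run:

```shell
#!/bin/sh
# Sketch combining the tricks: a recursive *incremental* send (-R with
# -i), gzipped locally and received on the far side over SSH.
ZFS=${ZFS:-"echo zfs"}
SSH=${SSH:-"echo ssh root@thumper.cuddletech.com"}

$ZFS send -R -i data/home@oct30 data/home@nov6 | gzip | \
    $SSH "gunzip | zfs recv -d backups"
```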

And so we see that using a single pair of commands, we have simple yet powerful replication, backup, and archive capabilities. A lot of power unleashed with just a little imagination; that’s the ZFS way.