First Look at ZFS Deduplication

Posted on November 10, 2009

ZFS Deduplication was recently putback (Sun terminology for “commit”) into ON (Solaris’s primary codebase). That means it should ship in snv_128 (Build 128), due later this week.

Unable to wait for the BFU archives, I resorted to actually building the code myself to play with; something I’ve not felt the burning need to do for at least two years (I’ll blog about that shortly). Here’s the initial review…

In typical ZFS fashion, putting dedup to work is trivial. Zpools are created in the normal way; dedup is enabled on a per-dataset basis, so it’s a simple matter of turning it on:

root@quadra ~$ zpool create stick c4t0d0
root@quadra ~$ zpool get all stick
NAME   PROPERTY       VALUE       SOURCE
stick  size           3.75G       -
stick  capacity       0%          -
stick  altroot        -           default
stick  health         ONLINE      -
stick  guid           12142487970365036186  default
stick  version        21          default
stick  bootfs         -           default
stick  delegation     on          default
stick  autoreplace    off         default
stick  cachefile      -           default
stick  failmode       wait        default
stick  listsnapshots  off         default
stick  autoexpand     off         default
stick  dedupratio     1.00x       -
stick  free           3.75G       -
stick  allocated      76.5K       -

Notice that there is no option to enable dedup for the pool itself; there is, however, a read-only “dedupratio” property. Because ZFS properties are inherited by child datasets, we’ll enable dedup on the root dataset, in this case “stick”:

root@quadra ~$ zfs set dedup=on stick

Done! That’s it. Really, you’re done! Stop reading this now. 🙂

… ok, maybe I’ll go into it a bit more.

As with many ZFS dataset properties, “dedup” accepts more than one setting. The default value is “off”. It can also be set to “on”, “sha256”, “verify”, or “fletcher4,verify”. “on” is simply an alias for “sha256”, and “verify” is an alias for “sha256,verify”, which byte-compares any blocks whose checksums match, so that a hash collision is detected and handled correctly instead of silently deduping the wrong block. Verification is very system intensive and is not recommended for casual use; if you require absolute integrity at all costs, go for it, but test your workload first. Phrases like “hash collision” can cause a panic, but remember that the odds of one are astronomical. For details on this see Jeff Bonwick’s post on ZFS Dedup.
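To make the “verify” semantics concrete, here’s a toy sketch in Python of how a checksum-keyed dedup table behaves (purely illustrative — the class and names are mine, not ZFS code):

```python
import hashlib

# Illustrative model of the dedup write path: checksum each block,
# look it up in a dedup table (DDT), and -- with "verify" -- byte-compare
# on a checksum hit to rule out a hash collision before sharing the block.
class DedupTable:
    def __init__(self, verify=False):
        self.verify = verify
        self.blocks = {}  # checksum -> (data, refcount)

    def write(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        entry = self.blocks.get(key)
        if entry is not None:
            stored, refs = entry
            # With verify on, a matching checksum alone is not trusted:
            # the block contents are compared byte for byte.
            if self.verify and stored != data:
                raise ValueError("hash collision detected")
            self.blocks[key] = (stored, refs + 1)  # dedup hit: bump refcount
        else:
            self.blocks[key] = (data, 1)           # new block: store it once
        return key

ddt = DedupTable(verify=True)
ddt.write(b"same block")
ddt.write(b"same block")  # second write dedupes against the first
print(len(ddt.blocks))    # -> 1
```

The extra byte comparison on every checksum hit is exactly where the “system intensive” cost of verify comes from.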

So, now for some testing. I’ve created my “stick” pool on a new 4GB micro-USB stick and enabled dedup. Let’s copy a bunch of JPEGs into several directories and see what happens:

root@quadra ~$ zfs list stick
NAME   USED   AVAIL  REFER  MOUNTPOINT
stick  73.5K  3.69G    21K  /stick
root@quadra ~$ mkdir /stick/userA
root@quadra ~$ mkdir /stick/userB
root@quadra ~$ mkdir /stick/userC
root@quadra ~$ cd img
root@quadra img$ time cp * /stick/userA
real    0m15.395s
user    0m0.005s
sys     0m0.174s
root@quadra img$ time cp * /stick/userB
real    0m15.952s
user    0m0.004s
sys     0m0.112s
root@quadra img$ time cp * /stick/userC
real    0m2.347s
user    0m0.004s
sys     0m0.125s

root@quadra img$ zfs list stick
NAME   USED  AVAIL  REFER  MOUNTPOINT
stick   203M  3.62G   203M  /stick

root@quadra img$ cd /stick/userA/
root@quadra userA$ du -sh .
74M     .

OK, notice that I’m copying in 74MB of data, 3 times, each copy to a different directory. (It’s slow because it’s a crappy USB stick.) If we run du, it registers the proper size; if we look at zfs list, it shows the full logical size of 203MB. In fact, if I look at the dataset properties, I have no indication at all of its deduplicated on-disk size:

root@quadra userA$ zfs get all stick
NAME   PROPERTY              VALUE                  SOURCE
stick  type                  filesystem             -
stick  creation              Tue Nov 10  0:07 2009  -
stick  used                  220M                   -
stick  available             3.62G                  -
stick  referenced            220M                   -
stick  compressratio         1.00x                  -
stick  mounted               yes                    -
stick  quota                 none                   default
stick  reservation           none                   default

So here’s the magic… look at the pool size:

root@quadra ~$ zpool list stick
NAME   SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
stick  3.75G  72.5M  3.68G   1%  3.06x  ONLINE  -

Beautiful. Only 72.5MB allocated, and we correctly see a dedup ratio of about 3x. (The allocation is actually slightly less than the 74MB of source files, leading me to believe a few of the images are themselves duplicates, which I don’t doubt.)
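The numbers line up with a simple model: du and zfs list count logical bytes, while zpool list counts physical blocks after dedup. A toy sketch in Python (illustrative only; the function and the fake data are my assumptions, and I’m assuming the ZFS default 128K recordsize):

```python
import hashlib

BLOCK = 128 * 1024  # assumed 128K recordsize, the ZFS default

def account(writes):
    """Return (logical_bytes, physical_bytes) for a sequence of block writes."""
    logical = 0
    unique = set()
    for data in writes:
        logical += len(data)                              # every write counts logically
        unique.add(hashlib.sha256(data).hexdigest())      # but only unique blocks allocate
    physical = len(unique) * BLOCK
    return logical, physical

# Three "users" copy the same ~74MB of images: the logical size triples,
# while physical allocation stays at roughly one copy's worth.
blocks = [i.to_bytes(2, "big") * (BLOCK // 2) for i in range(592)]  # 592 x 128K = 74MB
logical, physical = account(blocks * 3)
print(logical // 2**20, "MB logical,", physical // 2**20, "MB physical")
# -> 222 MB logical, 74 MB physical
```

Same shape as the real numbers: ~3x the data referenced, one copy allocated.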

Yet again, ZFS makes it “just work”. And you don’t need a big, expensive piece of gear; I’m deduping on a cheap 4GB USB stick.

Suck it Data Domain. 🙂

For the elite ZFS internals hackers out there, you can get a closer look at dedup using zdb -S (thanks to Jeff Victor for the tip):

root@quadra ~$ zdb -S stick
Simulated DDT histogram:

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     2      623   69.9M   69.9M   69.9M    1.83K    210M    210M    210M
     4       14   1.63M   1.63M   1.63M       84    9.8M    9.8M    9.8M
 Total      637   71.5M   71.5M   71.5M    1.91K    219M    219M    219M

dedup = 3.07, compress = 1.00, copies = 1.00, dedup * compress / copies = 3.07
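That bottom line is just arithmetic over the histogram: the dedup ratio is the total referenced size divided by the total allocated size. Reproducing it from the two bucket rows above (sizes in MB as printed; compress and copies are both 1.00 here, so they drop out):

```python
# Bucket rows from the simulated DDT histogram: (allocated MB, referenced MB)
rows = [
    (69.9, 210.0),  # refcnt-2 bucket
    (1.63, 9.8),    # refcnt-4 bucket
]

allocated = sum(a for a, _ in rows)    # ~71.5M, matching the Total row
referenced = sum(r for _, r in rows)   # ~219M, matching the Total row
print(f"dedup = {referenced / allocated:.2f}")  # -> dedup = 3.07
```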

So now it’s time to really beat on this thing and see if and where it breaks. Dedup for the masses is coming in the mail!!!