First Look at ZFS Deduplication
Posted on November 10, 2009
ZFS Deduplication was recently putback (Sun terminology for “commit”) to ON (Solaris’s primary codebase). That means it should ship in snv_128 (Build 128), due later this week.
Unable to wait for the BFU archives, I resorted to actually building the code myself to play with it, something I haven’t felt the burning need to do for at least 2 years (I’ll blog about that shortly). Here’s the initial review…
In typical fashion, putting ZFS Dedup to work is trivial. Zpools are created in the normal way; the dedup feature is enabled on a per-dataset basis, so it’s a simple matter of turning it on:
root@quadra ~$ zpool create stick c4t0d0
root@quadra ~$ zpool get all stick
NAME   PROPERTY       VALUE                 SOURCE
stick  size           3.75G                 -
stick  capacity       0%                    -
stick  altroot        -                     default
stick  health         ONLINE                -
stick  guid           12142487970365036186  default
stick  version        21                    default
stick  bootfs         -                     default
stick  delegation     on                    default
stick  autoreplace    off                   default
stick  cachefile      -                     default
stick  failmode       wait                  default
stick  listsnapshots  off                   default
stick  autoexpand     off                   default
stick  dedupratio     1.00x                 -
stick  free           3.75G                 -
stick  allocated      76.5K                 -
Notice that there is no option to enable dedup for the pool itself; all the pool exposes is a read-only “dedupratio” property. Because ZFS properties are inherited by child datasets, we’ll enable dedup on the root dataset, in this case “stick”:
root@quadra ~$ zfs set dedup=on stick
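To see the inheritance in action, any child dataset created under “stick” picks the property up automatically. A hypothetical transcript (the “home” dataset name is my example, not from the session above):

```
root@quadra ~$ zfs create stick/home
root@quadra ~$ zfs get dedup stick/home
NAME        PROPERTY  VALUE  SOURCE
stick/home  dedup     on     inherited from stick
```

The SOURCE column is how you tell an inherited setting from one set locally on the dataset.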
Done! That’s it. Really, you’re done! Stop reading this now. 🙂
… ok, maybe I’ll go into it a bit more.
As with many ZFS dataset properties, “dedup” accepts more than one setting. The default value is “off”. It can also be set to “on”, “sha256”, “verify”, or “fletcher4,verify”. “on” is simply an alias for “sha256”, and “verify” is an alias for “sha256,verify”, which adds the ability to detect and correct hash collisions by comparing blocks byte-for-byte whenever their hashes match. That verification is very system intensive and is not recommended for casual use; if you require absolute integrity at all costs, go for it, but test your workload first. Phrases like “hash collision” can cause a panic, but remember that the odds are astronomical. For details on this see Jeff Bonwick’s post on ZFS Dedup.
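To put “astronomical” in perspective, here is a back-of-envelope birthday bound. The sizing is my own hypothetical, not from the post: suppose 2^38 unique 4K blocks (about 1PB) are hashed with 256-bit SHA-256; the chance of any collision is then roughly n²/2^257 = 2^-181.

```shell
# Hypothetical sizing: n = 2^38 unique 4K blocks (~1 PB) under SHA-256.
# Birthday bound: P(any collision) <~ n^2 / 2^257 = 2^(2*38 - 257) = 2^-181.
# Convert that power of two into a power of ten:
awk 'BEGIN { printf "P(collision) < 10^%d\n", int((2*38 - 257) * log(2) / log(10)) }'
```

That works out to under 10^-54, which is vastly smaller than the odds of undetected disk, controller, or memory corruption handing you bad data anyway.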
So, now for some testing. I’ve created my “stick” pool on a new 4GB micro-USB stick and enabled dedup. Let’s copy in a bunch of JPEGs to several directories and see what happens:
root@quadra ~$ zfs list stick
NAME    USED  AVAIL  REFER  MOUNTPOINT
stick  73.5K  3.69G    21K  /stick
root@quadra ~$ mkdir /stick/userA
root@quadra ~$ mkdir /stick/userB
root@quadra ~$ mkdir /stick/userC
root@quadra ~$ cd img
root@quadra img$ time cp * /stick/userA

real    0m15.395s
user    0m0.005s
sys     0m0.174s
root@quadra img$ time cp * /stick/userB

real    0m15.952s
user    0m0.004s
sys     0m0.112s
root@quadra img$ time cp * /stick/userC

real    0m2.347s
user    0m0.004s
sys     0m0.125s
root@quadra img$ zfs list stick
NAME   USED  AVAIL  REFER  MOUNTPOINT
stick  203M  3.62G   203M  /stick
root@quadra img$ cd /stick/userA/
root@quadra userA$ du -sh .
 74M    .
OK, notice that I’m copying in 74MB of data, 3 times, each to a different directory. (It’s slow because it’s a crappy USB stick.) Running du registers the proper size, while zfs list shows the full 203MB. In fact, the dataset properties give no indication at all of the true on-disk size:
root@quadra userA$ zfs get all stick
NAME   PROPERTY       VALUE                  SOURCE
stick  type           filesystem             -
stick  creation       Tue Nov 10  0:07 2009  -
stick  used           220M                   -
stick  available      3.62G                  -
stick  referenced     220M                   -
stick  compressratio  1.00x                  -
stick  mounted        yes                    -
stick  quota          none                   default
stick  reservation    none                   default
...
So here’s the magic… look at the pool size:
root@quadra ~$ zpool list stick
NAME   SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
stick  3.75G  72.5M  3.68G   1%  3.06x  ONLINE  -
Beautiful. Only 72.5MB allocated, and we correctly see a dedup ratio just over 3x. The allocation is actually slightly less than the 74MB of a single copy, leading me to believe there are some duplicate images within the set itself, which I don’t doubt.
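A quick sanity check on those numbers, taking ALLOC and the DEDUP column from the zpool list output above: multiplying the allocated space by the dedup ratio should recover the logical size of the three copies.

```shell
# 72.5M allocated x 3.06 dedup ratio ~= 222M, i.e. three ~74M copies.
awk 'BEGIN { printf "logical ~= %.0fM\n", 72.5 * 3.06 }'
```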
Yet again, ZFS makes it “just work”. And you don’t need a big expensive piece of gear; I’m deduping on a 4GB micro-USB stick.
Suck it Data Domain. 🙂
For the elite ZFS Internals hackers out there, you can get a closer look at dedup using zdb -S (thanks to Jeff Victor for the tip):
root@quadra ~$ zdb -S stick
Simulated DDT histogram:

bucket             allocated                      referenced
______   ______________________________  ______________________________
refcnt   blocks  LSIZE  PSIZE  DSIZE     blocks  LSIZE  PSIZE  DSIZE
------   ------  -----  -----  -----     ------  -----  -----  -----
     2      623  69.9M  69.9M  69.9M      1.83K   210M   210M   210M
     4       14  1.63M  1.63M  1.63M         84   9.8M   9.8M   9.8M
 Total      637  71.5M  71.5M  71.5M      1.91K   219M   219M   219M

dedup = 3.07, compress = 1.00, copies = 1.00, dedup * compress / copies = 3.07
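As a cross-check, the histogram totals reproduce the ratio themselves: referenced size divided by allocated size. Using the rounded figures printed above it comes out at 3.06; zdb’s 3.07 comes from the unrounded byte counts.

```shell
# dedup ratio = referenced total / allocated total from the DDT histogram
awk 'BEGIN { printf "dedup ~= %.2f\n", 219 / 71.5 }'
```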
So now it’s time to really beat on this thing and see if and where it breaks. Dedup for the masses is coming in the mail!!!