ZFS and Thin Provisioning

Posted on August 12, 2006

ZFS is an amazing and wonderful technology. I say technology because it’s more than just any one of its capabilities. Being able to dish out, from a single pool, both filesystems and traditional volumes (which I’ll call zvols) makes for an extremely powerful storage foundation on which to build monumental structures without the traditional complexity that comes from such beloved products as my old friend Veritas Volume Manager (VxVM). In a world in which storage design and management only seemed to get more and more complex, a calm and peaceful breeze has come over the landscape and refreshed all of us baking under the heat lamp of rusty and incapable software. Yes, ZFS makes me happy, very very happy indeed.

When ZFS appeared on the scene a while back there was such an outpouring of blogging about it, particularly from its creators themselves, that I felt unable to really add anything to the conversation and instead picked some small topics to help fill in the discussion, such as the Look ma’ no disks! ZFS Testing On A Budget entry, in which I highlight a storage admin’s dream feature: architecting ZFS without having to own a Thumper or a large disk subsystem, for when you don’t need capacity, just a big disk count. If you feel like your little single-disk workstation isn’t sufficient to really enjoy and play with ZFS’s capabilities, please read that entry!!! Don’t get left out of the fun.

Okay, intro out of the way, let’s get on with it… look at my “little” home dev workstation:

root@aeon ~$ df -h
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c3d0s0         16G    15G   632M    97%    /
/devices                 0K     0K     0K     0%    /devices
...
/dev/zvol/dsk/zonepool/thinvol2
                       4.0T   1.2G   4.0T     1%    /b

That’s right baby, 4 Terabytes, count ’em. Well, sorta. But just to further drive the point home that you are in fact looking at a 4TB UFS filesystem, check out the creation:

root@aeon /$ newfs /dev/zvol/rdsk/zonepool/thinvol2
newfs: construct a new file system /dev/zvol/rdsk/zonepool/thinvol2: (y/n)? y
Warning: 4096 sector(s) in last cylinder unallocated
/dev/zvol/rdsk/zonepool/thinvol2:       8589934592 sectors in 1398102 cylinders of 48 tracks, 128 sectors
        4194304.0MB in 9777 cyl groups (143 c/g, 429.00MB/g, 448 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 878752, 1757472, 2636192, 3514912, 4393632, 5272352, 6151072, 7029792,
 7908512,
Initializing cylinder groups:
...............................................................................
...............................................................................
.....................................
super-block backups for last 10 cylinder groups at:
 8581213088, 8582091808, 8582970528, 8583849248, 8584727968, 8585606688,
 8586485408, 8587364128, 8588242848, 8589121568
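
And to round out the demonstration, putting it to use is just a normal UFS mount of the zvol’s block device, which is how it ended up on /b in the df output above:

# Note the dsk (block) device for mount, versus the rdsk (raw) device for newfs.
root@aeon /$ mkdir -p /b
root@aeon /$ mount /dev/zvol/dsk/zonepool/thinvol2 /b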

So, then, since I said that this is on a “little” workstation, how did I do it? The topic gives it away: the answer is thin provisioning or, as it’s more properly called by ZFS, sparse volumes.

If it’s not yet obvious, what is happening here is that I’m allowing ZFS to “fake out” the size of the volume that I created. The real disk available is a rather wimpy 53GB. So what’s the point of faking out a volume? Here’s the problem with traditional “scalable” storage: you buy 1TB of disk, you create a 1TB filesystem on it, and you’re happy for a time. One day you hit 90% on the filesystem and, like a good storage consumer, you buy an additional expansion array, bolt it on, and increase the size of the volume up to 2TB. At this point you need to increase the size of your filesystem to match the 2TB available… growing a filesystem is, in most cases, a real painful and scary process. At best you’re going to write-lock the filesystem during the grow operation; at worst you need to move out your data and then restore it back in after a fresh newfs. Almost all modern filesystems allow you to grow them, and most allow it online, so the backup-and-restore upgrade isn’t often an issue anymore, but it’s still a sucky process.
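
If you’ve lived through that on Solaris, you know the drill; the traditional in-place grow looks something like this (a sketch, assuming an SVM metadevice d10 that you’ve just expanded underneath a filesystem mounted at /export; both names are made up for illustration):

# Extend the mounted UFS filesystem to fill the newly enlarged volume.
# growfs write-locks the filesystem while it runs.
root@aeon /$ growfs -M /export /dev/md/rdsk/d10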

Filesystem suckiness aside, how often are you actually using all the blocks you allocate? And how often are you buying more disk than you need just so that you don’t have to allocate more in the nearish future? Constraints, constraints!

And so, for these and yet more reasons, sparse volumes, better known as “thin provisioning”, allow us to set the advertised volume size to almost anything we want and to use the volume just as if it actually were that large. That means that in the future, when we get close to the capacity of the real disk, we can simply add another disk to the pool to meet the demand and, this is the beauty part, you’re done! Because you’ve already sized the end filesystem, just adding disk to the pool is all that’s required, meaning no down time, no confusion, no fear, and best of all, your users and customers think that you invested in a, say, 10TB storage subsystem when in fact you’ve only got a 500GB drive connected via USB 2.0. 🙂 w00t!
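
And when that day comes, the “fix” really is just one command; a sketch, assuming the new disk showed up as c4d0 (a hypothetical device name):

# Add the new disk to the pool; the 4TB UFS filesystem sitting on the
# sparse zvol was already created full-size, so no growfs, no remount.
root@aeon /$ zpool add zonepool c4d0
root@aeon /$ zpool status zonepool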

Okay, so how do you do it? Simple: you create the zvol with -s, for “sparse”, and set the volume size to whatever you want. Observe:

root@aeon ~$ zpool status
...
  pool: zonepool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zonepool    ONLINE       0     0     0
          c2d0s1    ONLINE       0     0     0

errors: No known data errors

root@aeon /$ zfs create -V 10g zonepool/zvol1
root@aeon /$ zfs create -V 10g zonepool/zvol2

So here is my little test pool, an 88.4GB partition on a SATA-II drive. I’ve created two “normal” volumes above. Now let’s add some sparse (“thin”) volumes:

root@aeon /$ zfs create -s -V 100g zonepool/thinvol1
root@aeon /$ zfs create -s -V 100g zonepool/thinvol2
root@aeon ~$ zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
...
zonepool/thinvol1     22.5K  57.1G  22.5K  -           -- Un-reserved (unallocated) 100G
zonepool/thinvol2     22.5K  57.1G  22.5K  -                        ""
zonepool/zvol1        22.5K  67.1G  22.5K  -           -- Reserved (allocated) 10G  
zonepool/zvol2        22.5K  67.1G  22.5K  -                          ""

There are two important values to consider, both of which can be viewed with zfs get <property> pool/fs: volsize and reservation. Notice the difference between zvol1, which is a reserved 10G, and thinvol1, which is an un-reserved 100G, and which are, mind you, in the same 88GB pool:

root@aeon ~$ zfs get reservation,volsize zonepool/thinvol1 zonepool/zvol1
NAME               PROPERTY       VALUE                      SOURCE
zonepool/thinvol1  reservation    none                       default
zonepool/thinvol1  volsize        100G                       -
zonepool/zvol1     reservation    10G                        local
zonepool/zvol1     volsize        10G                        -

Because the reservation is disabled (“none”) we can define volsize as large as we want, whenever we want, in fact, right on up to and beyond my not-so-wimpy 4TB.
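
In fact, that’s exactly how thinvol2 became the 4TB volume you saw newfs’d at the top: bump volsize before running newfs. A quick sketch:

# Grow the advertised size of the sparse volume on the fly; with no
# reservation, the pool doesn't need 4TB of real disk behind it.
root@aeon /$ zfs set volsize=4t zonepool/thinvol2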

While this might seem like a simple feature at first, not worthy of all the reading you’ve done to arrive at this very point, trust me, when you pair this fine wine with a good cheese, say iSCSI, you can do some very interesting and exciting things and make your life a whole lot less stressful, all without having to shell out $50,000+.
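
As a teaser, handing one of these sparse zvols to an iSCSI initiator looks roughly like this (a sketch, assuming a Solaris build that ships the iSCSI target daemon and iscsitadm; the target name is made up):

# Export the sparse zvol; the initiator sees a 100GB LUN while the
# pool only ever holds the blocks actually written.
root@aeon /$ iscsitadm create target -b /dev/zvol/rdsk/zonepool/thinvol1 thintarget
root@aeon /$ iscsitadm list target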