ZFS and Thin Provisioning
Posted on August 12, 2006
ZFS is an amazing and wonderful technology. I say technology because it's more than just any one of its capabilities. Being able to dish out, from a single pool, both filesystems and traditional volumes (which I'll call zvols) makes for an extremely powerful storage foundation on which to build monumental structures without the traditional complexity that comes from such beloved products as my old friend Veritas Volume Manager (VxVM). In a world in which storage design and management only seemed to get more and more complex, a calm and peaceful breeze has come over the landscape and refreshed all of us baking under the heat lamp of rusty and incapable software. Yes, ZFS makes me happy, very very happy indeed.
When ZFS appeared on the scene a while back there was such an outpouring of blogging about it, particularly from its creators themselves, that I felt unable to really add anything to the conversation, so instead I picked some small topics to help fill in the discussion, such as my Look ma', no disks! ZFS Testing On A Budget entry, in which I highlight a storage admin's dream feature: architecting with ZFS without having to own a Thumper or a large disk subsystem, for those times when you don't need capacity, you just need a big disk count. If you feel like your little single-disk workstation isn't sufficient to really enjoy and play with ZFS's capabilities, please read that entry! Don't get left out of the fun.
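If you want a quick taste of that trick right here, a minimal sketch follows; the file paths, sizes, and pool name are mine for illustration, but ZFS really will build a pool out of plain files:

root@aeon /$ mkfile 256m /files/d1 /files/d2 /files/d3 /files/d4
root@aeon /$ zpool create playpool raidz /files/d1 /files/d2 /files/d3 /files/d4
root@aeon /$ zpool status playpool

Four "disks", zero hardware budget.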
Okay, intro out of the way, let's get on with it… take a look at my "little" home dev workstation:
root@aeon ~$ df -h
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c3d0s0         16G    15G   632M    97%    /
/devices                 0K     0K     0K     0%    /devices
...
/dev/zvol/dsk/zonepool/thinvol2
                       4.0T   1.2G   4.0T     1%    /b
That's right baby, 4 terabytes, count 'em. Well, sorta. But just to further drive home the point that you are in fact looking at a 4TB UFS filesystem, check out the creation:
root@aeon /$ newfs /dev/zvol/rdsk/zonepool/thinvol2
newfs: construct a new file system /dev/zvol/rdsk/zonepool/thinvol2: (y/n)? y
Warning: 4096 sector(s) in last cylinder unallocated
/dev/zvol/rdsk/zonepool/thinvol2:  8589934592 sectors in 1398102 cylinders of 48 tracks, 128 sectors
        4194304.0MB in 9777 cyl groups (143 c/g, 429.00MB/g, 448 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 878752, 1757472, 2636192, 3514912, 4393632, 5272352, 6151072, 7029792,
 7908512,
Initializing cylinder groups:
...............................................................................
...............................................................................
.....................................
super-block backups for last 10 cylinder groups at:
 8581213088, 8582091808, 8582970528, 8583849248, 8584727968, 8585606688,
 8586485408, 8587364128, 8588242848, 8589121568
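For completeness, the only step between that newfs and the df output above is a plain old mount; a sketch, assuming the /b mountpoint already exists:

root@aeon /$ mount -F ufs /dev/zvol/dsk/zonepool/thinvol2 /b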
So, then, since I said that this is on a "little" workstation, how did I do it? The title gives it away: the answer is thin provisioning, or, as it's more properly called by ZFS, sparse volumes.
If it's not yet obvious, what is happening here is that I'm allowing ZFS to "fake out" the size of the volume that I created. The real disk available is a rather wimpy 53GB. So what's the point of faking out a volume? Here's the problem with traditional "scalable" storage: you buy 1TB of disk, you create a 1TB filesystem on it, and you're happy for a time. One day you hit 90% on the filesystem and, like a good storage consumer, you buy an additional expansion array, bolt it on, and increase the size of the volume up to 2TB. At this point you need to increase the size of your filesystem to match the 2TB available… and growing a filesystem is, in most cases, a painful and scary process. At best you're going to write-lock the filesystem during the growing operation; at worst you need to move out your data and then restore it back in after a fresh newfs. Almost all modern filesystems allow you to grow them, and most allow it online, so the backup-and-restore upgrade isn't often an issue anymore, but it's still a sucky process.
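For the curious, the online version of that dance on Solaris UFS looks something like this sketch (the metadevice, slice, and mountpoint are hypothetical); growfs write-locks the filesystem while it runs, which is exactly the suckiness I mean:

root@aeon /$ metattach d10 c4d0s0            # grow the underlying metadevice first
root@aeon /$ growfs -M /data /dev/md/rdsk/d10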
Filesystem suckiness aside, how often are you actually using all the blocks you allocate? And how often are you buying more disk than you need just so that you don't have to allocate more in the nearish future? Constraints, constraints!
And so, for these reasons and more, sparse volumes, better known as "thin provisioning", let us set the advertised volume size to almost anything we want and use the volume just as if it actually were that large. That means that in the future, when we get close to the capacity of the real disk, we can simply add another disk to the pool to meet the demand and, this is the beauty part, you're done! Because you've already sized the end filesystem, just adding disk to the pool is all that's required, meaning no downtime, no confusion, no fear, and best of all, your users and customers think that you invested in, say, a 10TB storage subsystem when in fact you've only got a 500GB drive connected via USB 2.0. 🙂 w00t!
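To make that concrete, here's what that future day looks like; a sketch with a hypothetical device name, and notice it's a single online command, with no touching of the filesystem at all:

root@aeon /$ zpool add zonepool c4d0
root@aeon /$ zpool list zonepool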
Okay, so how do you do it? Simple: you create a zvol with the -s flag, for "sparse", and set the volume size to whatever you want. Observe:
root@aeon ~$ zpool status
...
  pool: zonepool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zonepool    ONLINE       0     0     0
          c2d0s1    ONLINE       0     0     0

errors: No known data errors
root@aeon /$ zfs create -V 10g zonepool/zvol1
root@aeon /$ zfs create -V 10g zonepool/zvol2
So here is my little test pool, an 88.4GB partition on a SATA-II drive. I've created two "normal" volumes above. Now let's add some sparse ("thin") volumes:
root@aeon /$ zfs create -s -V 100g zonepool/thinvol1
root@aeon /$ zfs create -s -V 100g zonepool/thinvol2
root@aeon ~$ zfs list
NAME                USED   AVAIL  REFER  MOUNTPOINT
...
zonepool/thinvol1   22.5K  57.1G  22.5K  -    <- un-reserved (unallocated) 100G
zonepool/thinvol2   22.5K  57.1G  22.5K  -    <- un-reserved (unallocated) 100G
zonepool/zvol1      22.5K  67.1G  22.5K  -    <- reserved (allocated) 10G
zonepool/zvol2      22.5K  67.1G  22.5K  -    <- reserved (allocated) 10G
There are two important values to consider, both of which can be viewed with zfs get <property> <pool/volume>: volsize and reservation. Notice the difference between zvol1, which is a reserved 10GB, and thinvol1, which is an un-reserved 100GB, both of which, mind you, live in the same 88GB pool:
root@aeon ~$ zfs get reservation,volsize zonepool/thinvol1 zonepool/zvol1
NAME               PROPERTY     VALUE  SOURCE
zonepool/thinvol1  reservation  none   default
zonepool/thinvol1  volsize      100G   -
zonepool/zvol1     reservation  10G    local
zonepool/zvol1     volsize      10G    -
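In fact, that reservation is the only real difference between the two kinds of volume, so you can convert one after the fact; a sketch, dropping zvol1's reservation to make it sparse and then (space permitting) putting it back:

root@aeon /$ zfs set reservation=none zonepool/zvol1
root@aeon /$ zfs set reservation=10g zonepool/zvol1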
Because the reservation is disabled ("none"), we can define volsize to be as large as we want, whenever we want; in fact, right on up to and beyond my not-so-wimpy 4TB.
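And "whenever we want" is literal, since volsize is just a property; a sketch of how I'd presumably take thinvol2 from its original 100GB up to the 4TB you saw earlier (the UFS on top still needs its own newfs or growfs to use the new space):

root@aeon /$ zfs set volsize=4t zonepool/thinvol2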
While this might seem like a simple feature at first, not worthy of all the reading you've done to arrive at this very point, trust me: when you pair this fine wine with a good cheese, say iSCSI, you can do some very interesting and exciting things and make your life a whole lot less stressful, all without having to shell out $50,000+.
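Just a taste of where that pairing goes; this is a sketch, and I'm assuming your build ships the ZFS iSCSI integration (the shareiscsi property; on builds without it you'd configure the target by hand with iscsitadm instead):

root@aeon /$ zfs set shareiscsi=on zonepool/thinvol1
root@aeon /$ iscsitadm list target

One property, and that sparse volume is now a LUN that every initiator on your network sees at its full advertised size.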