Zones start looking like Containers: CPU and Memory Caps
Posted on March 26, 2007
When Solaris Containers debuted with Solaris 10, many of us were blown away. Zones quickly became my primary passion, because I was suffering from a massive shortage of test systems for everything from deployment testing to Enlightenment build-and-test installs. I needed lots and lots of systems, but they didn't need lots of CPU or memory; they just needed to be isolated Solaris installations, and Zones filled that hole perfectly. But "Containers", the term signifying the union of Solaris Zones with Solaris Resource Controls, wasn't quite what I was hoping for. In fact, from my point of view it was a complete joke. Solaris resource controls are applied to workloads: a process, a task, or a project (where that project is likely a user or group). With the introduction of Zones we finally had the kind of workgroup granularity that I really wanted, but I didn't have the ability to slap the controls onto a zone.
Sun's worked very hard since that time to make things right, and it's been a slow but constant effort that's seriously paying off now. With Nevada Build 56 we got Duckhorn, which integrated memory and swap capping into the zone configuration. Now, with Nevada Build 61, we get its big brother: CPU capping. We can pair those up with the rest of the rctls we can apply to a zone, such as zone.max-lwps, zone.cpu-shares (FSS shares), and the shared-memory limits. Finally we've got something seriously slick.
Configuring these limits is just smooth as silk. No files to muck with, no /etc/project, no BS… just create or modify your zone and go:
root@aeon ~$ zonecfg -z testing1
testing1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:testing1> create
zonecfg:testing1> set zonepath=/zones/testing1
zonecfg:testing1> set autoboot=true
zonecfg:testing1> set scheduling-class=FSS
zonecfg:testing1> add capped-cpu
zonecfg:testing1:capped-cpu> set ncpus=1.0
zonecfg:testing1:capped-cpu> end
zonecfg:testing1> add capped-memory
zonecfg:testing1:capped-memory> set physical=512m
zonecfg:testing1:capped-memory> set swap=512m
zonecfg:testing1:capped-memory> end
zonecfg:testing1> add rctl
zonecfg:testing1:rctl> set name=zone.max-lwps
zonecfg:testing1:rctl> add value (priv=privileged,limit=500,action=deny)
zonecfg:testing1:rctl> end
zonecfg:testing1> add rctl
zonecfg:testing1:rctl> set name=zone.cpu-shares
zonecfg:testing1:rctl> add value (priv=privileged,limit=10,action=none)
zonecfg:testing1:rctl> end
Once your zone is configured, do the usual dance to install and boot it (a sketch follows below), then throw down the hurt and enjoy. Notice above that I set the CPU cap to ncpus=1.0. I'm testing on my home workstation, a dual-core Athlon64, and ncpus=1.0 says I'm allocating one full CPU's worth of cycles to the zone. If you had a 4-CPU system and wanted to give 2.5 of those CPUs to a zone, you'd set ncpus=2.5, so you can get really creative in how you carve up the resources. And what's really slick is that the cap is spread across all the processors in the zone's pset, which by default is every processor in the system. So on my 2 cores a cap of 1.0 can still use both cores for threaded applications, which is nice.
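For reference, here's roughly what that dance looks like; this is just the standard zonecfg/zoneadm workflow rather than anything specific to the new caps, and the install output is omitted:

root@aeon ~$ zonecfg -z testing1 verify
root@aeon ~$ zonecfg -z testing1 commit
root@aeon ~$ zoneadm -z testing1 install
root@aeon ~$ zoneadm -z testing1 boot
root@aeon ~$ zlogin -C testing1    # console login to answer the first-boot questions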
Here is an example of some pain:
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
115910 benr     5936K 4040K cpu0    59    0   0:01:13 0.7% prstat/1
101179 benr      162M  107M sleep   49    0   0:03:55 0.5% firefox-bin/9
100474 benr      210M  164M sleep   59    0   0:02:58 0.4% Xorg/1
116150 root     3304K 1184K wait    28    0   0:00:05 0.3% cpuhog.pl/1
109850 daemon   7644K 2756K sleep   59    0   0:00:02 0.2% rcapd/1
116211 root     3304K 1184K wait     1    0   0:00:05 0.2% cpuhog.pl/1
116138 root     3304K 1184K wait     2    0   0:00:05 0.2% cpuhog.pl/1
116335 root     3304K 1184K wait    12    0   0:00:05 0.2% cpuhog.pl/1
116028 root     3304K 1184K wait     2    0   0:00:05 0.2% cpuhog.pl/1
ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE
    10      410  217M  134M   6.6%   0:31:53  46% testing1
     0      125  638M  589M    29%   0:10:42 2.2% global
Total: 535 processes, 738 lwps, load averages: 378.18, 377.42, 343.45
The load average is high because I've got 400 of those cpuhog scripts going, each of which runs and then gets right back in the run queue to get back on CPU. Even with a load that high I'm not feeling the effects at all in the global zone, where, coincidentally, I'm typing this blog entry right now (see firefox-bin; that's me writing this entry). Now check out mpstat for a closer look:
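The cpuhog.pl script itself isn't shown here; any tight CPU-bound loop will do the job. If you want to reproduce the load, a hypothetical stand-in like this, run inside the zone, generates a comparable pile of runnable processes:

# Spawn 400 CPU-bound busy loops in the background
# (hypothetical stand-in for cpuhog.pl; adjust the count to taste).
i=0
while [ $i -lt 400 ]; do
    ( while :; do :; done ) &
    i=`expr $i + 1`
done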
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0   17   384  275  132   32   34    3    0   135   51   1   0  48
  1    0   0   11   190   63  203   54   35    2    0    83   48   0   0  52
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0   20   394  291  152   46   23    1    0    81   56   1   0  43
  1    0   0    2   185   67  186   46   28    5    0   116   45   1   0  54
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0   15   406  290  146   50   34    0    0   730   48  15   0  38
  1    0   0   11   197   64  234   68   31    9    0   628   60   1   0  39
Of course, we can alter the allocation of CPU on the fly without a zone reboot:
root@aeon ~$ prctl -n zone.cpu-cap -i zone testing1
zone: 10: testing1
NAME    PRIVILEGE       VALUE    FLAG   ACTION            RECIPIENT
zone.cpu-cap
        privileged        100       -   deny                      -
        system          4.29G     inf   deny                      -
root@aeon ~$ prctl -r -t privileged -n zone.cpu-cap -v 150 -i zone testing1
root@aeon ~$ prctl -n zone.cpu-cap -i zone testing1
zone: 10: testing1
NAME    PRIVILEGE       VALUE    FLAG   ACTION            RECIPIENT
zone.cpu-cap
        privileged        150       -   deny                      -
        system          4.29G     inf   deny                      -
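Keep in mind that prctl only touches the running zone; as far as I can tell, the change won't survive a zone reboot unless you also update the configuration, along these lines:

root@aeon ~$ zonecfg -z testing1
zonecfg:testing1> select capped-cpu
zonecfg:testing1:capped-cpu> set ncpus=1.5
zonecfg:testing1:capped-cpu> end
zonecfg:testing1> commit
zonecfg:testing1> exit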
Look at the zone CPU usage now….
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
115910 benr     5984K 4092K cpu1    59    0   0:01:57 0.8% prstat/1
101179 benr      162M  107M sleep   49    0   0:04:49 0.7% firefox-bin/9
116067 root     3304K 1184K wait    16    0   0:00:10 0.5% cpuhog.pl/1
116323 root     3304K 1184K wait    37    0   0:00:09 0.4% cpuhog.pl/1
116073 root     3304K 1184K wait     1    0   0:00:10 0.4% cpuhog.pl/1
100474 benr      217M  165M sleep   59    0   0:03:20 0.4% Xorg/1
116084 root     3304K 1184K wait    43    0   0:00:10 0.4% cpuhog.pl/1
116118 root     3304K 1184K wait    35    0   0:00:10 0.3% cpuhog.pl/1
116050 root     3304K 1184K wait    56    0   0:00:10 0.3% cpuhog.pl/1
ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE
    10      410  217M  136M   6.6%   1:01:26  68% testing1
     0      125  600M  553M    27%   0:12:44 2.1% global
Total: 535 processes, 736 lwps, load averages: 380.23, 379.94, 372.62
Solaris Containers are really looking like what they promised to be: a fine pairing of Solaris Resource Controls and Solaris Zones. It's awesome to watch that little zone thrash its brains out while I watch movies on the same system.
While CPU and memory are top of mind, the network inevitably becomes a concern next. With Nevada 61 we can also allocate full, real NICs to a zone using IP Instances (PDF), part of Project Crossbow. VNICs should be upon us soon, which will close the loop and put zones exactly where they should be.
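As a rough sketch of what an exclusive IP instance looks like today (this assumes you have a spare physical NIC to dedicate to the zone; e1000g1 here is just a placeholder):

root@aeon ~$ zonecfg -z testing1
zonecfg:testing1> set ip-type=exclusive
zonecfg:testing1> add net
zonecfg:testing1:net> set physical=e1000g1
zonecfg:testing1:net> end
zonecfg:testing1> commit
zonecfg:testing1> exit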
A big round of applause for Alexander Kolbasov, Erik Nordmark, Yukun Zhang, Dong-Hai Han, Stephen Lawrence, Andrei Dorofeev, Jerry Jelinek, and everyone else involved with these great advancements in Solaris.
PS: If you hadn't noticed, Solaris Resource Controls are starting to look very sexy indeed.