Crossbow for Christmas

Posted on December 29, 2008

After 2 years of waiting, Project Crossbow has arrived! It integrated into Nevada Build 105 on Dec 4th, and BFUs became available around the middle of the month. SX:CE isn’t available just yet, but should be out in about a week, I hope. Crossbow is huge: a monumental improvement to Solaris that continues to push the bar out of reach of its competitors.

Simply put, Crossbow redefines the nature of network virtualization. Until now, virtualization was limited to creating traditional “virtual interfaces” like so:

root@quadra ~$ ifconfig e1000g1:1 plumb 10.0.0.50 netmask 255.255.255.0 up
root@quadra ~$ ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000 
e1000g1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,CoS,IPv4> mtu 1500 index 2
        inet 10.0.0.18 netmask ffffff00 broadcast 10.0.0.255
        ether 0:1b:21:25:3e:7b 
e1000g1:1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,CoS,IPv4> mtu 1500 index 2
        inet 10.0.0.50 netmask ffffff00 broadcast 10.0.0.255

Creating virtual interfaces like this gets the job done, but it has a number of drawbacks, all stemming from the fact that it’s not a real interface: stats are screwed up, you can’t snoop the interface, you can’t tune it, etc.

Crossbow changes all that. Now we can create Virtual NICs (VNICs) which are, for all intents and purposes, real interfaces. They have their own network stack and queues, they can be tuned, they can be snooped, they can be VLAN’ed, etc. Anything you can do to a real interface you can do to a VNIC.
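For instance, once a VNIC exists (we’ll create vnic0 shortly), it can be snooped and its link properties inspected just like any physical NIC:

root@quadra ~$ snoop -d vnic0                # capture directly on the VNIC
root@quadra ~$ dladm show-linkprop vnic0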

While VNICs are handy things to have in the global zone, they really shine when used with virtualization such as Solaris Containers (zones) or Xen guests, because we can now hand off interfaces that are fully controllable from within the virtual environment without having to dedicate a physical NIC to each one. The result is virtualized environments that feel way more like real servers.

If you’re not already familiar with the dladm command, it’s time for you to get acquainted. dladm is short for “Data Link Administration”, and now complements ifconfig. For some time it’s been used for managing WiFi, 802.3ad link aggregation (“teaming” or “trunking”, depending on your pedigree), and more recently VLANs. It’s even replacing the old (and crappy) ndd with dladm‘s “link properties”… a welcome improvement.
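For example, here’s a quick peek at link properties on this box (which properties are available varies by driver):

root@quadra ~$ dladm show-linkprop e1000g1
root@quadra ~$ dladm set-linkprop -p flowctrl=no e1000g1   # tune without touching ndd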

As of snv_105 several new options are available, namely sub-commands for creating VNICs and etherstubs. A VNIC is a virtual network interface with all the trimmings of a real network interface. For the moment, it appears the max number of VNICs is 799, but that’s not set in stone, and frankly if you need more than that you need to re-architect. Etherstubs are in-software switches which can be used in concert with VNICs to create entirely virtualized in-software networks! In short, a standard VNIC will be associated with a physical GLDv3 network adapter, but we can also create a VNIC associated with an etherstub to keep anything from ever touching the wire.

Let’s ponder this. Why would you want a VNIC that uses a software switch (etherstub)? Seems completely useless, right? Not entirely. On a traditional network you would create a DMZ with a firewall and other goodies which routes to a private internal network… imagine that you can now do all of that inside a single system!

OK, so let’s get cracking. Once you have snv_105 installed, we’ll create a VNIC associated with physical e1000g1, then an etherstub and 3 more internal VNICs on top of it:

root@quadra ~$ dladm show-link
LINK        CLASS    MTU    STATE    OVER
e1000g1     phys     1500   up       --
e1000g2     phys     1500   down     --
e1000g0     phys     1500   unknown  --

root@quadra ~$ dladm create-vnic -l e1000g1 vnic0
root@quadra ~$ dladm create-etherstub etherstub0
root@quadra ~$ dladm create-vnic -l etherstub0 vnic1
root@quadra ~$ dladm create-vnic -l etherstub0 vnic2
root@quadra ~$ dladm create-vnic -l etherstub0 vnic3
root@quadra ~$ dladm show-link
LINK        CLASS    MTU    STATE    OVER
e1000g1     phys     1500   up       --
e1000g2     phys     1500   down     --
e1000g0     phys     1500   unknown  --
vnic0       vnic     1500   up       e1000g1
etherstub0  etherstub 9000  unknown  --
vnic1       vnic     9000   up       etherstub0
vnic2       vnic     9000   up       etherstub0
vnic3       vnic     9000   up       etherstub0

So we have a variety of VNICs at our disposal. We now treat these like regular interfaces, using ifconfig to plumb them and assign IPs:

root@quadra ~$ ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000 
e1000g1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,CoS,IPv4> mtu 1500 index 2
        inet 10.0.0.18 netmask ffffff00 broadcast 10.0.0.255
        ether 0:1b:21:25:3e:7b 

root@quadra ~$ ifconfig vnic0 plumb 10.0.0.19 up
root@quadra ~$ ifconfig vnic1 plumb 10.100.0.2 netmask 255.255.255.0 up
root@quadra ~$ ifconfig vnic2 plumb 10.100.0.3 netmask 255.255.255.0 up
root@quadra ~$ ifconfig vnic3 plumb 10.100.0.4 netmask 255.255.255.0 up

root@quadra ~$ ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000 
e1000g1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,CoS,IPv4> mtu 1500 index 2
        inet 10.0.0.18 netmask ffffff00 broadcast 10.0.0.255
        ether 0:1b:21:25:3e:7b 
vnic0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,CoS,IPv4> mtu 1500 index 7
        inet 10.0.0.19 netmask ff000000 broadcast 10.255.255.255
        ether 2:8:20:3a:70:5a 
vnic1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,CoS,IPv4> mtu 9000 index 8
        inet 10.100.0.2 netmask ffffff00 broadcast 10.100.0.255
        ether 2:8:20:f2:56:4d 
vnic2: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,CoS,IPv4> mtu 9000 index 9
        inet 10.100.0.3 netmask ffffff00 broadcast 10.100.0.255
        ether 2:8:20:bc:b1:a1 
vnic3: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,CoS,IPv4> mtu 9000 index 10
        inet 10.100.0.4 netmask ffffff00 broadcast 10.100.0.255
        ether 2:8:20:55:11:56

Please notice that they all have individual MAC addresses! There are several methods for how the MAC is chosen, but I won’t go into them here.
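If you do care which MAC a VNIC gets, create-vnic accepts -m with an explicit address; vnic4 and the address here are purely illustrative:

root@quadra ~$ dladm create-vnic -l e1000g1 -m 2:8:20:aa:bb:cc vnic4   # made-up MAC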

If you are using Solaris Containers, these VNICs would be given to a zone as an exclusive “IP instance”, a feature which was added some time ago but until now was only usable by dedicating a physical interface. The same should apply to Xen or other virtualization tools.
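As a rough sketch, handing vnic1 to a hypothetical zone named web01 looks something like this:

root@quadra ~$ zonecfg -z web01
zonecfg:web01> set ip-type=exclusive
zonecfg:web01> add net
zonecfg:web01:net> set physical=vnic1
zonecfg:web01:net> end
zonecfg:web01> commit
zonecfg:web01> exit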

Finally, in our whirlwind tour of this amazing technology, let’s look at my favorite feature of Crossbow.

Crossbow is both Network Virtualization (we looked at that above) and Network Resource Control. With Crossbow we have a real network resource control capability that is free from the terror that is IPQoS.

There are three types of resource controls at present: max bandwidth (rate limiting), priority (relative to other traffic), and CPUs (which processors service the traffic). Please note that these controls are not cumulative, but rather apply at any given point in time. These controls can be applied either to an entire link (NIC or VNIC) or to a particular network flow.
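Before we get to flows, the link-level controls are just dladm link properties; a quick sketch (the values here are arbitrary):

root@quadra ~$ dladm set-linkprop -p maxbw=300 vnic0      # cap at 300 Mbps
root@quadra ~$ dladm set-linkprop -p priority=high vnic0
root@quadra ~$ dladm set-linkprop -p cpus=0,1 vnic0       # bind processing to CPUs 0 and 1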

Let me pause here. If you’re not familiar with a “network flow”, it is a defined collection of network communication. For instance, a flow might refer to all HTTP (port 80) traffic to a given IP address, or perhaps all TCP traffic, or perhaps a combination of FTP, SMTP, and HTTP ports. If you’ve worked with firewall rules you’re familiar with the concept; a flow simply gives us a way to apply some action to a specific stream of traffic.

Crossbow adds the new command flowadm to define and control network flows. Here is an example:

root@quadra ~$ flowadm add-flow -l vnic0 -a transport=tcp,local_port=80 httpflow
root@quadra ~$ flowadm add-flow -l vnic0 -a transport=tcp,local_port=443 httpsflow
root@quadra ~$ flowadm show-flow
FLOW        LINK        IP ADDR                        PROTO  PORT    DSFLD
httpflow    vnic0       --                             tcp    80      --
httpsflow   vnic0       --                             tcp    443     --

flowadm relies on attributes, which describe a flow, and properties, which assign some resource control. We’ll add bandwidth control to the flows above by setting the “maxbw” property, then verify the result:

root@quadra ~$ flowadm set-flowprop -p maxbw=50 httpflow
root@quadra ~$ flowadm set-flowprop -p maxbw=80 httpsflow
root@quadra ~$ flowadm show-flowprop
FLOW         PROPERTY        VALUE          DEFAULT        POSSIBLE
httpflow     maxbw              50          --             50M 
httpflow     priority        --             --             
httpsflow    maxbw              80          --             80M 
httpsflow    priority        --             --      

Here the maxbw is specified in Mbps. The docs show that percentages, Kbps, etc. are supported, but they don’t seem to work right now.

maxbw will rate-limit to the specified throughput; priority can be set to “low”, “normal”, “high”, or “rt” (real time). Using these controls carefully, you can partition off bandwidth pretty nicely.
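Setting a priority works just like maxbw; for instance, to favor HTTPS over HTTP in the flows defined above:

root@quadra ~$ flowadm set-flowprop -p priority=high httpsflow
root@quadra ~$ flowadm set-flowprop -p priority=low httpflow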

In addition to all this, the extended accounting framework now incorporates accounting based on links or flows, but I’ll save the details for another day.
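If you can’t wait, acctadm(1M) is the place to start; enabling link-level accounting should look something like this (the file name is arbitrary, and I haven’t dug in yet):

root@quadra ~$ acctadm -e extended -f /var/tmp/net.exacct net   # arbitrary output file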

Congrats to everyone on the Crossbow team. This is a major achievement and an amazing technological advance!