NFS on vSphere Part 4 – Technical Deep Dive on Load Based Teaming
In my past three posts, I covered some misconceptions about how NFS behaves on vSphere, along with a pair of deep dives on load balancing in single-subnet and multiple-subnet environments. If you’re just catching up on this series and are unfamiliar with how NFS works on vSphere, I recommend giving these articles a glance. The summation is that NFS requires multiple subnets in order to use multiple uplinks, with the exception of a situation where an EtherChannel is properly utilized on a single subnet. Even with a static EtherChannel, though, multiple storage targets and planning for unique least significant bits are still required to actually utilize more than one uplink.
There is another option available to those with Enterprise Plus licensing: the “Route based on physical NIC load” load balancing policy, otherwise known as load based teaming (LBT). This policy is only available on distributed switches (vDS), which cannot be created if you are not licensed for Enterprise Plus.
Load based teaming is a very powerful technology that monitors vmnics (uplinks) for saturation. When a vmnic reaches 75% utilization for 30 seconds, LBT tries to move workloads to other, non-saturated vmnics. It is my opinion that this was mostly created with the mindset of balancing VM traffic, but it also works well for vmknics carrying NFS traffic. In this post, I’ll go over how this process works, how to configure it, and a lab test.
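To make the trigger condition concrete, here is a minimal Python sketch of the saturation check. The 75% and 30 second values come from the description above; everything else is an assumption for illustration: the function name, one throughput sample per second, and the choice to average utilization across the window. This is a model of the idea, not VMware's implementation.

```python
from statistics import mean

# Values taken from the post: 75% utilization over 30 seconds.
THRESHOLD = 0.75
WINDOW_SECONDS = 30


def is_saturated(samples_mbps, link_speed_mbps, threshold=THRESHOLD):
    """Return True if mean utilization over the last window exceeds the threshold.

    samples_mbps: throughput samples, one per second (a hypothetical
    sampling rate chosen for this sketch).
    """
    if len(samples_mbps) < WINDOW_SECONDS:
        return False  # not enough history to judge a full window
    window = samples_mbps[-WINDOW_SECONDS:]
    return mean(window) / link_speed_mbps > threshold


# A sustained 800 Mbps on a 1 Gbps link crosses the threshold,
# while a 10 second burst inside an otherwise quiet window does not.
print(is_saturated([800] * 30, 1000))
print(is_saturated([100] * 20 + [1000] * 10, 1000))
```

The point of the window is that a brief spike should not cause traffic to be shuffled around; only sustained load does.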
Load Based Teaming
The concept of monitoring load and migrating traffic around is nothing new to the world of vSphere. VMware admins are constantly leveraging the ability to vMotion workloads around for maintenance and balance, along with tools such as the Distributed Resource Scheduler (DRS) to assist in automated workload distribution.
Some interesting things about load based teaming and how it works:
- The power of load based teaming exists outside of the portgroup construct, meaning you don’t need all of your VMs or vmkernels to exist in a single portgroup to take advantage of load based teaming.
- As long as “Route based on physical NIC load” is selected, any portgroup will proactively monitor the vmnic utilization in its team and shift workloads around, even if another portgroup is responsible for generating the load.
- Ultimately, vmnic utilization triggers moving workloads.
- Turning on LBT is non-invasive and does not impact the active workloads.
- Only active vmnics are considered for movement. Any standby or unused vmnics are not targeted as destinations.
- Saturated 100 Mbps links do not trigger LBT movement, and I tested this in the lab to confirm – though, is anyone seriously using 100 Mbps links on their vSphere host?
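The behavior in the list above can be modeled with a toy Python sketch. To be clear, the selection policy here is my assumption, not VMware's documented algorithm: it moves the lightest ports off the saturated vmnic first, always onto the least-loaded active vmnic, and stops once the vmnic drops below the threshold or only one port remains. The function name, the 1 Gbps link speed, and the load numbers are all hypothetical.

```python
def rebalance(placement, load_mbps, nic_state, saturated_nic,
              link_speed_mbps=1000, threshold=0.75):
    """Return a new port -> vmnic mapping after relieving a saturated vmnic.

    placement: dict of port name -> vmnic name (ports from any portgroup)
    load_mbps: dict of port name -> current throughput
    nic_state: dict of vmnic name -> "active", "standby", or "unused"
    """
    new_placement = dict(placement)  # never mutate the caller's mapping

    def nic_load(nic):
        return sum(load_mbps[p] for p, n in new_placement.items() if n == nic)

    # Only active vmnics are valid destinations.
    targets = [n for n, s in nic_state.items()
               if s == "active" and n != saturated_nic]
    if not targets:
        return new_placement

    # Lightest ports first (an assumed policy for this sketch).
    on_saturated = sorted(
        (p for p, n in placement.items() if n == saturated_nic),
        key=lambda p: load_mbps[p],
    )
    limit = threshold * link_speed_mbps
    while len(on_saturated) > 1 and nic_load(saturated_nic) > limit:
        port = on_saturated.pop(0)
        new_placement[port] = min(targets, key=nic_load)
    return new_placement


# Hypothetical numbers: one busy vmkernel sharing vmnic3 with two quiet ones.
placement = {"vmk0": "vmnic3", "vmk1": "vmnic3", "vmk4": "vmnic3",
             "vmk2": "vmnic0", "vmk3": "vmnic0"}
load = {"vmk0": 1, "vmk1": 950, "vmk4": 5, "vmk2": 5, "vmk3": 5}
state = {"vmnic0": "active", "vmnic3": "active", "vmnic1": "standby"}
print(rebalance(placement, load, state, "vmnic3"))
```

Note that the sketch never looks at which portgroup a port belongs to, and the standby vmnic1 is never chosen as a destination, which mirrors the points in the list.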
With that said, let’s cover the configuration of this lab environment to showcase the power of LBT.
This time around I’ve reconfigured the lab entirely. The NetApp simulators are incredibly sluggish to configure and test against, so I have switched over to Nexenta’s Community Edition on a virtual machine.
Below is the logical configuration. I’ve created a lab using a single NAS server (Nexenta CE) presenting 4 exports. All traffic is on VLANs 1, 2, 3, and 4 (which map to 10.0.X.0/24 in my lab, where the VLAN number equals the third octet) to an ESXi host running 5.0 Update 1 (build 623860). The host has 2 uplinks along with 4 vmkernels. In order to consistently create traffic, I have deployed 4 of the VMware IO Analyzer appliances – one on each export. This allows me to quickly simulate VM traffic going to all of the exports at the same time.
Rather than using a virtual host, I have rebuilt the lab network to work on my “production” hosts and switches. This makes it much easier to generate enough traffic to trigger an LBT movement and eliminates the massive amount of duplicate frames received (as seen with virtual hosts on a promiscuous portgroup).
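The addressing scheme is easy to express in a few lines of Python using the standard `ipaddress` module. The helper name is made up for this sketch; the 10.0.X.0/24 pattern and VLAN numbers come from the lab description.

```python
import ipaddress


def storage_subnet(vlan_id):
    """Map a lab VLAN number to its subnet (VLAN number = third octet)."""
    return ipaddress.ip_network(f"10.0.{vlan_id}.0/24")


subnets = [storage_subnet(vlan) for vlan in (1, 2, 3, 4)]
for vlan, net in zip((1, 2, 3, 4), subnets):
    print(f"VLAN {vlan} -> {net}")

# Sanity check: no two storage subnets overlap, so each export is
# reached through a distinct subnet.
for i, a in enumerate(subnets):
    for b in subnets[i + 1:]:
        assert not a.overlaps(b)
```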
Additionally, the storage has been presented by the Nexenta over NFS. Under the hood are a pair of SSD drives, giving me plenty of IO for this test and the ability to simply mount the same datastore repeatedly using different VLANs.
Lab Test – Triggering Load Based Teaming
Let’s first take a look at the environment and identify the relationships between vmkernels and vmnics (uplinks). vmk1 and vmk4 have been put on vmnic3, while vmk2 and vmk3 are using vmnic0. This was decided by the hypervisor; I had no input in the matter.
Also, note that vmk0 (my management vmkernel) is using vmnic3 and is in an entirely different portgroup. I enabled LBT for that portgroup as well, to prove that LBT doesn’t care about portgroups as a delimiting factor.
Let’s see if we can generate a lot of traffic on vmnic3 and get the other guys to use vmnic0. I’ll fire up the IO Analyzer that is sitting on VLAN1 (vmk1) and see if we can get LBT to shuffle things around. Below is a screenshot showing the results, along with a zoomed image of the ESXTOP data.
The IO Analyzer saturated all of vmnic3, so LBT moved all other vmkernels over to vmnic0, even the management vmk0 on an entirely different portgroup. As you might imagine, this is a very powerful method for load balancing.
For the sake of fun, I’ll generate another big spike of load on the 3 vmkernels sitting on vmnic0 and watch LBT balance them. Below you can see vmk2, vmk3, and vmk4 kick off a large read spike that saturates all of vmnic0.
After trending the traffic for 30 seconds, LBT kicks in and migrates vmk2 and vmk3 to vmnic3. It’s somewhat difficult to balance 3 workloads that are going full speed on 2 uplinks, but LBT makes a solid attempt.
It seems that load based teaming is a great way to address dynamic shifts in workload, and is relatively easy to set up. If you’re using Enterprise Plus licensing and are comfortable with distributed switches, this is probably the best way to go. Keep in mind, however, that you will need to oversubscribe your vmnics (uplinks) with a higher ratio of vmkernels. Otherwise, LBT will have nothing to balance. For example, if you had 2 vmkernels for 2 vmnics, each vmkernel has a dedicated uplink – there’s nothing it can move around.
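The oversubscription point boils down to a simple count, sketched below in Python. The function name is hypothetical, and the sketch only counts vmkernels, following the example above; in a real environment VM ports sharing the same uplinks would also give LBT something to move.

```python
def lbt_has_room_to_balance(vmk_count, active_vmnic_count):
    """True when at least one active uplink must carry more than one vmkernel."""
    return active_vmnic_count > 1 and vmk_count > active_vmnic_count


# 2 vmkernels on 2 vmnics: each has a dedicated uplink, nothing to move.
print(lbt_has_room_to_balance(2, 2))
# 4 vmkernels on 2 vmnics (the lab layout): uplinks are shared.
print(lbt_has_room_to_balance(4, 2))
```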
I hope you’ve gained some valuable insight into the world of NFS on vSphere through my deep dive series, and no longer feel that the protocol is only suitable for ISO storage.
Also, if you want the official VMware white paper on the topic, “vSphere on NFS” by Cormac Hogan was released in February of 2013.
NFS on vSphere – Deep Dive Series
The entire series of NFS on vSphere deep dives: