NFS is a rather new experience for me in the VMware space, at least in regard to hosting virtual machine workloads. I have traditionally mounted NFS volumes to vSphere clusters for the purpose of storing and utilizing CD ISO images (operating systems, applications, updates, etc.). However, the times-they-are-a-changin’, and now NFS can be considered a powerful storage protocol with some pretty interesting advantages over VMFS. These include breaking the 2TB (-512 byte) limitation, being able to dynamically shrink (and grow) a volume, and support for some really neat storage-side cloning and snapshot capabilities.
The glaring downside is that it is clunky when it comes to getting the session (connection) to use more than one NIC. For those new to handling IP Storage on VMware, the issue is that by default the NFS session will be established on the first vmkernel port whose subnet matches the storage array’s subnet, and that vmkernel port in turn only utilizes a single physical NIC from the vSwitch’s team. The remaining NICs will go unused!
To demonstrate, I’ve created a graphic of what would happen if you created 4 vmkernel ports on a single vSwitch, and gave that vSwitch a team of 4 physical NICs to use. The storage array is using the 10.0.0.X subnet, so VMware finds the first vmkernel that is also on this subnet, vmk0. It just so happens that vmk0 is bound to vmnic0, which is something you could see in ESXTOP. Thus, the session would ride on vmk0, which uses vmnic0, to get to the storage. The remaining 3 vmkernels would never be used, and the other 3 vmnics would sit idle until a failover (switch or link) occurred.
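To make the selection rule concrete, here is a minimal Python sketch of the "first vmkernel on a matching subnet wins" behavior described above. It is only a model of the idea, not how ESX implements it; the function name pick_vmkernel, the host addresses, and the /24 masks are all assumptions made for illustration.

```python
import ipaddress

def pick_vmkernel(vmkernels, target_ip):
    """Return the name of the first vmkernel port whose subnet contains target_ip."""
    target = ipaddress.ip_address(target_ip)
    for name, cidr in vmkernels:
        # ip_interface accepts a host address plus prefix and exposes its network
        if target in ipaddress.ip_interface(cidr).network:
            return name
    return None  # no matching subnet: traffic would head for the default gateway

# Hypothetical addressing: four vmkernel ports, all on the array's 10.0.0.X subnet
vmkernels = [
    ("vmk0", "10.0.0.11/24"),
    ("vmk1", "10.0.0.12/24"),
    ("vmk2", "10.0.0.13/24"),
    ("vmk3", "10.0.0.14/24"),
]

# Every mount against 10.0.0.50 resolves to the same port
print(pick_vmkernel(vmkernels, "10.0.0.50"))
```

Because all four ports share one subnet, the lookup never gets past vmk0, which is exactly why the other three sit idle.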
Have no fear, there are some methods to get sessions to establish on multiple NICs.
#1 – Using Network Isolation
One method is to force traffic out different physical NICs by creating multiple virtual interfaces on the storage array (called a vif – virtual interface), each on a different subnet. You then create a vmkernel port on the host for each of those subnets and mount each NFS volume through a different subnet. VMware will send out storage traffic using a vmkernel on the same subnet as the target when possible, only using the default gateway when not possible. Each subnet should also be a unique VLAN to create broadcast isolation.
In the graphic below the storage has been logically divided into 4 vifs, on subnets 10.0.0.X through 10.0.3.X. 4 volumes exist on the array and are accessible from any vif. On the VMware side, the 4 vmkernel ports are set to IPs on each of the corresponding storage vif subnets. When the NFS mounts are added to the cluster, make sure to use a different vif for each volume to force that volume to use a specific vmkernel port. For example, in the graphic I mounted Volume A using the IP 10.0.0.50, Volume B using IP 10.0.1.50, and so on. This makes sure that each volume is reached through a different vmkernel port.
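The mapping above can be sketched as a small Python check that each mount lands on its own vmkernel port. This is an illustration only: the vmkernel host addresses, the /24 masks, and the vif IPs for Volumes C and D are assumed by following the "and so on" pattern, and the helper names are hypothetical.

```python
import ipaddress

# Hypothetical vmkernel addressing, one port per storage subnet (/24 assumed)
VMKERNELS = {
    "vmk0": "10.0.0.11/24",
    "vmk1": "10.0.1.11/24",
    "vmk2": "10.0.2.11/24",
    "vmk3": "10.0.3.11/24",
}

# Each volume is mounted through a different vif address on the array
# (A and B are from the example above; C and D are assumed to follow suit)
MOUNTS = {
    "Volume A": "10.0.0.50",
    "Volume B": "10.0.1.50",
    "Volume C": "10.0.2.50",
    "Volume D": "10.0.3.50",
}

def vmkernel_for(vif_ip):
    """Return the vmkernel port that shares a subnet with the given vif IP."""
    target = ipaddress.ip_address(vif_ip)
    for name, cidr in VMKERNELS.items():
        if target in ipaddress.ip_interface(cidr).network:
            return name
    return None

def plan_mounts(mounts):
    """Map every volume to the vmkernel port its mount address would select."""
    return {volume: vmkernel_for(ip) for volume, ip in mounts.items()}

plan = plan_mounts(MOUNTS)
for volume, vmk in plan.items():
    print(f"{volume} via {MOUNTS[volume]} -> {vmk}")

# Sanity check: no two volumes share a vmkernel port
assert len(set(plan.values())) == len(plan)
```

If two volumes were mounted through the same vif, the final assertion would fail, which is a quick way to catch a mount that defeats the isolation.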
There may also be some other secret sauce needed to get the vmkernel ports to choose different vmnics (it would do no good if 2 vmkernels tried to use the same vmnic), but for this demonstration I’m assuming that has been handled.
#2 – Using Etherchannel
Traditional Etherchannel is nothing new; it is simply a link aggregation technology used to bond several physical links together into a single logical link. I typically see it in back end networks for trunking large data links. When used with VMware, an IP Hash algorithm “rolls the dice” to determine which physical NIC the session will use to get to the storage array. As my colleague terms it, this is “fake etherchannel” in that a single session cannot exceed the bandwidth of a single link. It is simply a method of getting NFS sessions to use more than one link. The hash takes no account of actual load, so you could see a single vmnic get saturated if enough hash values tell sessions to use that NIC.
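Here is a rough Python model of how an IP hash “rolls the dice”. It assumes the commonly described scheme of XOR-ing the source and destination addresses and taking the result modulo the number of active uplinks; treat that as a simplification rather than the exact vSphere implementation. The function name and the IP addresses are made up for the example.

```python
import ipaddress

def ip_hash_uplink(src_ip, dst_ip, uplink_count):
    """Pick an uplink index from a source/destination IP pair.

    Simplified model (an assumption): XOR the two addresses as integers,
    then take the result modulo the number of active uplinks.
    """
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % uplink_count

# One vmkernel port talking to four hypothetical storage addresses over 4 uplinks
vmk_ip = "10.0.0.11"
targets = ["10.0.0.50", "10.0.0.51", "10.0.0.52", "10.0.0.53"]

for target in targets:
    print(f"{vmk_ip} -> {target}: uplink {ip_hash_uplink(vmk_ip, target, 4)}")
```

The same address pair always produces the same uplink, which is why one session never gets more than one link's worth of bandwidth. Spreading traffic requires several distinct source/destination pairs, and nothing stops two busy pairs from hashing to the same uplink.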
From what I’ve heard, LACP can solve this issue, but requires the Nexus 1000V switch, a Cisco guy, and Enterprise Plus licensing.
Here are some screenshots showing a live configuration of an environment set up in this fashion.
This first graphic shows the layout of a dvSwitch populated with four 1GbE NICs. A single vmkernel port (vmk2) is configured with a private Class A IP and lives inside the NFS portgroup.
The next graphic shows the teaming and failover policy for the NFS portgroup. All four dvUplinks are active and load balancing is set to route based on IP hash.
The final graphic gives a glimpse at the ESXTOP view of the uplink traffic. In this particular screenshot, vmnic6 and vmnic7 are transmitting some data outbound.
#3 – Using Load-Based Teaming (LBT) … ?
Could Load-Based Teaming, the new load balancing policy in vSphere 4.1, benefit NFS and alleviate the headache of all that switch configuration?
This was rattling around in my brain for a while until I finally gave it a shot. The short answer is “It can, but requires a lot of configuration and Enterprise Plus”. The long answer is … well, let’s take a look.
First off, let’s imagine the scenario. You have a server with just 2 NICs that you can dedicate to IP Storage. The storage array has 4 vifs on 4 subnets, and the host has 4 vmkernels that correspond to the subnet of those vifs. So far, nothing very different from scenario #1. Here’s the graphic:
At some point in this pretend scenario, a VM that is using Volume A suddenly goes hot with storage I/O, pushing thousands of transactions to the storage array and eating up more than 75% of the available throughput of vmnic0. LBT notices this and moves vmk1, which was sharing vmnic0, over to vmnic4, which has only light traffic because vmk2 and vmk3 are not really doing that much. Eureka, load balancing.
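The pretend scenario can be sketched in Python as a toy rebalancer. This is not VMware's LBT algorithm, just a model of the behavior described above: when an uplink passes the 75% mark, one port is moved to the least-utilized other uplink. Moving the lightest port, the 1000 Mbps capacities, the load figures, and the initial port placement are all assumptions made for the example.

```python
THRESHOLD = 0.75  # fraction of uplink capacity, per the 75% figure above

def rebalance(placement, load, capacity):
    """Return a new vmk -> vmnic placement after one rebalancing pass.

    placement: vmk -> vmnic, load: vmk -> Mbps, capacity: vmnic -> Mbps.
    Assumption: the lightest port on a saturated uplink is the one moved.
    """
    placement = dict(placement)  # work on a copy

    def usage(nic):
        return sum(load[v] for v, n in placement.items() if n == nic)

    for nic in capacity:
        if usage(nic) / capacity[nic] <= THRESHOLD:
            continue  # uplink is below the threshold, leave it alone
        ports = [v for v, n in placement.items() if n == nic]
        others = [n for n in capacity if n != nic]
        if len(ports) < 2 or not others:
            continue  # nothing useful to move, or nowhere to move it
        mover = min(ports, key=lambda v: load[v])
        target = min(others, key=lambda n: usage(n) / capacity[n])
        # Only move if the target stays under the threshold afterwards
        if (usage(target) + load[mover]) / capacity[target] <= THRESHOLD:
            placement[mover] = target
    return placement

# Hypothetical starting point: two vmkernel ports per uplink
placement = {"vmk0": "vmnic0", "vmk1": "vmnic0", "vmk2": "vmnic4", "vmk3": "vmnic4"}
load = {"vmk0": 800, "vmk1": 50, "vmk2": 30, "vmk3": 20}  # Volume A goes hot on vmk0
capacity = {"vmnic0": 1000, "vmnic4": 1000}

print(rebalance(placement, load, capacity))
```

With these numbers vmnic0 sits at 85% utilization, so vmk1 is shifted over to vmnic4 while the hot vmk0 keeps vmnic0 to itself.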
I’ll admit this is a bit of a niche scenario for a lot of people. I had hoped LBT would be able to shift traffic at the vmkernel level, so that it could move VM storage traffic between vmkernels that were on the same subnet. Unfortunately, VM-specific storage traffic is not analyzed, only the aggregate traffic on the vmkernel port as it relates to the vmnic.
Additionally, you will probably need roughly a 2:1 oversubscription of vifs to NICs to see any significant gain from LBT, since it shines with very granular setups. The more active paths to the storage array, the finer the load balancing.
I’m quite pleased with the performance of NFS, although I do still enjoy the simplicity of MPIO with Fibre Channel storage. As technology heads to NFS v4.1, a lot of the pathing headache of NFS as a storage medium should evaporate, as it supports multiple session mappings and some other fun goodies. Until then, I will continue to use the Etherchannel approach.
Also, yes, the article image is a NetApp V3140.