I’ve already squashed the loop avoidance myth in the post entitled vSphere Does Not Need LAG Bandaids and pointed out some advantages of Load Based Teaming (LBT). However, I neglected to cover some of the positive impacts that a LAG can have on the rest of your upstream network (kudos to Ivan and Greg for their valuable perspectives). In this post I play Dr. Jekyll to my Mr. Hyde and review some reasons why you would want to create a LAG on a vSphere host.
Switching Choices and MAC Address Tables
Let’s kick this adventure off with a look at the network topology in a LAG-less design. As shown below, I have two vSphere hosts with 2 network adapters being used as uplinks within a vSwitch (it doesn’t really matter what kind of vSwitch). Each host has a single VM for simplicity’s sake and both are on the same subnet and VLAN. When VM1 sends traffic to VM2 there are two potential paths for the data to traverse:
- Red Line: In this scenario, the hypervisor has put the VMs on different uplinks
  - VM1 is bound to vmnic0
  - VM2 is bound to vmnic1
  - The MAC address for VM2, labeled as MAC 2, is learned on a port on 5K-B
  - The upstream switches must use their inter-switch link to forward data from 5K-A to 5K-B
- Gold Line: In this scenario, the hypervisor has put the VMs on the same uplink
  - VM1 and VM2 are both bound to vmnic1 (or both to vmnic0)
  - The upstream switch can locally forward data
Obviously the Gold path is preferred, since it does not use the inter-switch link to forward data traffic. As a vSphere administrator, there are ways that you can make this behavior predictable, such as binding the port group to a specific side of the fabric by making vmnic0 Active and vmnic1 Standby (or the reverse).
If you do decide to use LBT, however, all uplinks will need to be made Active and it’s possible to encounter the Red path as LBT shifts around VMs. You’ll also need to use a Distributed vSwitch.
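To make the Red and Gold paths concrete, here is a minimal Python sketch of the LAG-less topology. It assumes vmnic0 on every host is cabled to 5K-A and vmnic1 to 5K-B, which matches the scenario above; the function name is my own invention for illustration.

```python
# Assumed cabling: vmnic0 -> 5K-A, vmnic1 -> 5K-B on every host.
UPLINK_TO_SWITCH = {"vmnic0": "5K-A", "vmnic1": "5K-B"}


def crosses_inter_switch_link(src_uplink: str, dst_uplink: str) -> bool:
    """Return True when traffic between two VMs must use the 5K-A/5K-B link."""
    return UPLINK_TO_SWITCH[src_uplink] != UPLINK_TO_SWITCH[dst_uplink]


# Red path: VM1 on vmnic0, VM2 on vmnic1.
print(crosses_inter_switch_link("vmnic0", "vmnic1"))  # True
# Gold path: both VMs on vmnic1.
print(crosses_inter_switch_link("vmnic1", "vmnic1"))  # False
```

With Active/Standby pinning, both VMs always land on the same key in the mapping, so the result is predictably False. With LBT, either outcome is possible at any given moment.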
Introducing a LAG
A few interesting things occur when you create a LAG, so let’s look at a scenario where we’ve created a Virtual Port Channel (vPC) to each vSphere host. Combining the uplinks into a LAG eliminates the idea of binding the VM to a specific uplink – any vmnic can now be used to send or receive traffic for any virtual machine on that vSwitch. Additionally, each upstream switch now has a local entry for the VM MAC addresses in its table. When VM1 sends traffic to VM2, the LAG hashing algorithm will determine which uplink to use, and the switch will always have a local path over which to forward data to VM2.
Both the Red and Gold paths are identical from a forwarding perspective. In essence, vmnic0 and vmnic1 are now one logical interface. Thus, while there are two unique control plane fabrics (5K-A and 5K-B), there is one single data plane fabric.
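As a rough illustration of how a hash picks an uplink, here is a small Python sketch. It is not the algorithm used by vSphere or any particular switch; real implementations hash on different fields (MAC addresses, IP addresses, ports, or combinations of these). The XOR-of-last-octet scheme, the function name, and the MAC addresses below are all assumptions made up for the example.

```python
from typing import List


def pick_uplink(src_mac: str, dst_mac: str, uplinks: List[str]) -> str:
    """Pick a LAG member by XOR-ing the last octet of each MAC address."""
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return uplinks[(src_last ^ dst_last) % len(uplinks)]


uplinks = ["vmnic0", "vmnic1"]
vm1 = "00:50:56:00:00:01"  # made-up address for VM1
vm2 = "00:50:56:00:00:02"  # made-up address for VM2

# 0x01 XOR 0x02 = 3, and 3 % 2 = 1, so this flow uses vmnic1.
print(pick_uplink(vm1, vm2, uplinks))  # vmnic1
```

The point is that a given conversation consistently maps to one member link, and whichever link it maps to, the upstream switch can forward locally.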
Layer 3 Routing
It’s worth noting that we’re talking about layer 2 switching in the topologies above. Any time traffic must hop across VLANs or into different subnets, some sort of device will need to route the data. It may be that the upstream switch is doing this (such as a 5K with the layer 3 daughter card) or another upstream device is handling the routing table (such as a 7K pair that is upstream from the 5K pair).
Additional Network Benefits
Forming a LAG has some other impactful benefits that may or may not be visible to the vSphere side of the house. And this can be a superb thing!
- Topology Change Notifications (TCN) – Using a LAG for the host facing ports reduces the number of TCNs that will float around on the network. I also want to point out that you should be placing your host facing ports into a Spanning Tree Protocol (STP) edge mode, which moves the port straight to forwarding and keeps link flaps on that port from generating TCNs. Examples include:
  - IOS: spanning-tree portfast
  - NX-OS: spanning-tree port type edge
- Convergence – It could take several seconds for a network to converge after a link state failure (such as link down or switch maintenance). A LAG can reconverge in under a second, perhaps even as fast as 200ms depending on the topology, which is far faster. This allows for much greater flexibility when performing switch maintenance or upgrades because the host will (hopefully) have no idea the network has shifted traffic around a dead link or switch. Keep in mind that we’re talking best case scenario here – a badly designed upstream topology can’t be wholly fixed with LACP.
- Unified MAC Address Table – Because the VM MAC addresses are no longer pinned to a single uplink, each upstream switch fabric has a unified view of the MAC address table. There is no need to update one another as a MAC address migrates from one uplink to another on the same host. This often occurs when LBT is enabled.
  - Note that this excludes vMotion. When a VM moves from one host to another, the MAC address table will need to be updated due to the VM now “living” on a new LAG interface.
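The unified MAC address table point can be sketched in a few lines of Python. This is a simplified model under my own assumptions: each host's LAG is treated as one logical interface, and the interface naming and both function names are hypothetical.

```python
def learned_interface(host: str, uplink: str, lag: bool) -> str:
    """Return the switch-side interface on which a VM's MAC is learned."""
    if lag:
        # With a LAG, both vmnics on a host form one logical interface.
        return f"{host}-lag"
    return f"{host}-{uplink}"


def needs_mac_update(before: tuple, after: tuple, lag: bool) -> bool:
    """True when moving a VM changes the interface its MAC is learned on.

    `before` and `after` are (host, uplink) pairs.
    """
    return learned_interface(*before, lag) != learned_interface(*after, lag)


# LBT shifts a VM between uplinks on the same host (no LAG).
print(needs_mac_update(("host1", "vmnic0"), ("host1", "vmnic1"), lag=False))  # True
# The same uplink change inside a LAG leaves the entry untouched.
print(needs_mac_update(("host1", "vmnic0"), ("host1", "vmnic1"), lag=True))   # False
# vMotion to another host changes the LAG interface, so an update is needed.
print(needs_mac_update(("host1", "vmnic0"), ("host2", "vmnic0"), lag=True))   # True
```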
Now that you have the benefits and drawbacks of a LAG and LAG-less design, you should be rather well armed to make the decision in your environment. For those without a Distributed vSwitch, I think the choice is simple – avoid a LAG. I mainly say this because you’re without the ability to use LACP to form a dynamic LAG and static LAGs are prone to human error.
For those with a Distributed Switch, you may want to further examine your upstream switch topology. Here are several sample questions to further identify which design is best for you:
- Are you connecting to single or dual homed FEXs for your Top of Rack (ToR) switches?
- Do you often suffer from hot spots on your uplinks and want to let LBT smooth them out?
- Are you looking at taking advantage of iSCSI NIC binding or multi-NIC vMotion?
- Do you need to take advantage of port mirroring (SPAN) on your vSwitch?