vSphere Does Not Need LAG Bandaids

Just over a year ago, I wrote a post outlining my distaste for using Link Aggregation Groups (LAGs) on vSphere hosts where storage traffic is concerned. For those not into network hipster slang, any time you bond links together into a logical group – be it static or dynamic – you’re forming a LAG. Examples include a static EtherChannel or a dynamic LACP bundle, but the end result is the same: one logical link. LAGs really serve one higher master – they bandaid the fact that classical Ethernet can’t handle loops and will rain misery down on your network if it finds one. Hence the deployment of the Spanning Tree Protocol (STP) suite on almost every network.

This guy loves LAGs

People don’t like blocked links. For one, they’re expensive – ports don’t grow on trees, y’know! – and they reduce the usable bandwidth / throughput available on a switch. So, we form a LAG to play peek-a-boo with STP, hiding multiple physical links behind a single logical link. Of course, the switch then needs an algorithm to decide which physical link carries each frame, which is why LAGs offer a choice of hashing algorithms.
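To put some numbers on that, here’s a minimal Python sketch, assuming a simplified model in which STP keeps a loop-free tree of links between switches and blocks everything else. The helpers `blocked_links` and `bundle_as_lag` are hypothetical and only for illustration; the sketch counts how many links get blocked, not which ones a real switch would pick (that depends on bridge priorities and path costs).

```python
def blocked_links(links):
    """Return the links STP would have to block to keep the topology loop-free.

    Simplified model: links are processed in order with union-find. A link whose
    two switches are already connected through forwarding links would close a
    loop, so it is blocked. Real STP picks the blocked port using bridge IDs
    and path costs; only the *number* of blocked links matches here.
    """
    parent = {}

    def find(switch):
        parent.setdefault(switch, switch)
        while parent[switch] != switch:
            parent[switch] = parent[parent[switch]]  # path halving
            switch = parent[switch]
        return switch

    blocked = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            blocked.append((a, b))  # this link would form a loop
        else:
            parent[root_a] = root_b  # link forwards; merge the two trees
    return blocked


def bundle_as_lag(links):
    """Collapse parallel links between the same two switches into one logical
    link, which is all STP sees once those links form a LAG."""
    logical = {}
    for a, b in links:
        logical.setdefault(frozenset((a, b)), (a, b))
    return list(logical.values())


# Two switches cabled to each other twice.
physical = [("sw1", "sw2"), ("sw1", "sw2")]
print(len(blocked_links(physical)))                 # 1 -> one cable sits idle
print(len(blocked_links(bundle_as_lag(physical))))  # 0 -> STP sees one logical link
```

Note that the LAG doesn’t remove the loop; it just hides it from STP by making two cables look like one.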

In the vSphere 5.5 world, this boils down to a few choices (read up on the “enhanced” LACP support in 5.5) for deciding which link to use:

  • Layer 2 – MAC address (frame)
  • Layer 3 – IP address (packet)
  • Layer 4 – Port (segment)
  • 802.1Q – VLAN ID

Traffic is inspected for the value of one of the above fields, depending on the hash being used, and an uplink within the LAG is chosen from the result. Because those values don’t change for the life of a session, a single traffic session will only ever use a single uplink. Period. And that’s why I think using a LAG for storage traffic is silly. Only NFS has any sort of semi-valid reason to use a LAG, and it’s a bit of an edge case.
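Here’s a rough sketch of how a hash-based LAG picks an uplink. It assumes CRC32 as the hash and uses made-up addresses and ports; real switches and the vSphere 5.5 enhanced LACP implementation use their own hash functions and field combinations, and `Flow` / `pick_uplink` are hypothetical names. What carries over is the property that matters: the same input fields always produce the same uplink.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Header fields a LAG hash might inspect (hypothetical model)."""
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    vlan: int


def pick_uplink(flow: Flow, uplinks: list[str], mode: str) -> str:
    """Choose an uplink by hashing the fields selected by `mode`.

    CRC32 stands in for whatever hash the real switch uses.
    """
    if mode == "l2":      # MAC addresses (frame)
        key = f"{flow.src_mac}|{flow.dst_mac}"
    elif mode == "l3":    # IP addresses (packet)
        key = f"{flow.src_ip}|{flow.dst_ip}"
    elif mode == "l4":    # TCP/UDP ports (segment)
        key = f"{flow.src_port}|{flow.dst_port}"
    elif mode == "vlan":  # 802.1Q VLAN ID
        key = str(flow.vlan)
    else:
        raise ValueError(f"unknown hash mode: {mode}")
    return uplinks[zlib.crc32(key.encode()) % len(uplinks)]


# Hypothetical NFS session: one VMkernel port talking to one storage IP.
uplinks = ["vmnic0", "vmnic1"]
nfs = Flow("00:50:56:00:00:11", "00:a0:98:00:00:50",
           "10.0.0.11", "10.0.0.50", 1023, 2049, 100)

# Every frame in the session carries the same fields, so the hash lands
# on the same uplink every time, no matter how busy that uplink is.
chosen = {pick_uplink(nfs, uplinks, "l3") for _ in range(1000)}
print(chosen)  # a set with exactly one uplink in it
```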

The LAG hash determines which uplink is chosen

vSphere Can’t Form a Loop

Physical switches can form loops, and it’s easy to see for yourself. Take any two switches and connect them to each other with two or more cables. Congratulations, you’ve formed a loop. If STP is enabled, it will block all but one of those links; if not, the switch will probably start to look something like a Vegas casino.

A vSwitch doesn’t operate in this manner. It never forwards a frame that arrives on one uplink back out another uplink, so you can cable a vSwitch to a physical switch with as many cables as you’d like and it will work just fine, with no way to form a loop. Additionally, you can’t form a loop between two vSwitches, because there is no way to connect one vSwitch directly to another. Rip off your LAG bandaid and embrace simplicity in your network designs.
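Here’s a deliberately simplified model of the rule that keeps a vSwitch loop-free. It assumes a single VLAN, no MAC learning from the physical network, and an uplink choice fixed to the first uplink instead of the real teaming policy; the `VSwitch` class and its `forward` method are hypothetical, not a VMware API. The branch that matters is the one for frames arriving on an uplink: they are only ever delivered to local virtual ports.

```python
from dataclasses import dataclass

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class VSwitch:
    uplinks: list[str]      # physical NICs, e.g. vmnic0
    local: dict[str, str]   # virtual port name -> MAC of the VM or VMkernel port

    def forward(self, ingress: str, dst_mac: str) -> set[str]:
        """Return the set of ports a frame arriving on `ingress` is sent out of."""
        other_ports = {p for p in self.local if p != ingress}
        if dst_mac == BROADCAST:
            targets = other_ports
        else:
            targets = {p for p in other_ports if self.local[p] == dst_mac}

        if ingress in self.uplinks:
            # Frames from the physical network go only to local virtual
            # ports, never back out another uplink, so no loop can form.
            return targets
        if targets and dst_mac != BROADCAST:
            # Known local destination: VM-to-VM traffic stays on the host.
            return targets
        # Broadcast or unknown destination from a local port leaves through
        # exactly ONE uplink (simplified here to the first one).
        return targets | {self.uplinks[0]}


vs = VSwitch(uplinks=["vmnic0", "vmnic1"],
             local={"vm1": "00:50:56:00:00:01", "vm2": "00:50:56:00:00:02"})
print(sorted(vs.forward("vmnic0", BROADCAST)))  # ['vm1', 'vm2'], no uplinks
print(sorted(vs.forward("vm1", BROADCAST)))     # ['vm2', 'vmnic0']
```

However many uplinks you add to the model, nothing that comes in from the wire can go back out to it, which is why the vSwitch has no need for STP.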

[symple_box color=”yellow” text_align=”left” width=”100%” float=”none”]
Shameless plug: I go pretty deep into this topic in Networking for VMware Administrators, releasing soon.
[/symple_box]

In my mind, using a LAG means you’re missing out on one of the more awesome and less highlighted technologies in the distributed vSwitch tool belt: Load Based Teaming (LBT). Even on its best day, a LAG can only perform traffic distribution. It’s a bit like a mindless zombie choosing an uplink by rolling the dice. Maybe the uplink it chose is already saturated with traffic while another one sits idle. Too bad! The algorithm has spoken. 🙂

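LBT, on the other hand, actively monitors uplink utilization on the distributed vSwitch and makes informed decisions about which uplink each virtual machine and VMkernel port uses: when an uplink stays above 75% utilization over a 30-second period, LBT moves a port to a less busy uplink. Shown as “Route based on physical NIC load” in the GUI, it’s an incredibly awesome way to provide load balancing.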
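For contrast with the dice-rolling hash, here’s a rough sketch of a single LBT-style rebalancing pass. It assumes VMware’s documented 75% utilization threshold and ignores the rest of the real algorithm (send versus receive direction, the averaging window, and how the real implementation chooses which port to move); `rebalance` is a hypothetical helper and the loads are made-up Gbps figures.

```python
def rebalance(placement: dict[str, str],
              port_load: dict[str, float],
              capacity: dict[str, float],
              threshold: float = 0.75) -> dict[str, str]:
    """One LBT-style pass over the uplinks.

    placement: virtual port -> uplink it currently uses
    port_load: virtual port -> current load (same unit as capacity)
    capacity:  uplink -> link speed
    Any uplink above `threshold` utilization has its busiest port moved to
    the least-utilized uplink. Returns a new placement; input is unchanged.
    """
    new = dict(placement)

    def utilization(uplink: str) -> float:
        used = sum(port_load[p] for p, u in new.items() if u == uplink)
        return used / capacity[uplink]

    for uplink in capacity:
        if utilization(uplink) > threshold:
            busiest = max((p for p, u in new.items() if u == uplink),
                          key=lambda p: port_load[p])
            target = min(capacity, key=utilization)
            if target != uplink:
                new[busiest] = target
    return new


# Two 10 Gbps uplinks; vmnic0 is running at 90% while vmnic1 idles at 10%.
placement = {"vm-a": "vmnic0", "vm-b": "vmnic0", "vmk-nfs": "vmnic0", "vm-c": "vmnic1"}
load = {"vm-a": 3.0, "vm-b": 2.0, "vmk-nfs": 4.0, "vm-c": 1.0}
links = {"vmnic0": 10.0, "vmnic1": 10.0}

print(rebalance(placement, load, links))
# {'vm-a': 'vmnic0', 'vm-b': 'vmnic0', 'vmk-nfs': 'vmnic1', 'vm-c': 'vmnic1'}
```

Unlike the hash, the decision here is driven by measured load rather than by header fields, so the busy NFS VMkernel port ends up on the idle uplink.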

Thoughts

This is most likely part 1 in a series I plan to do on LAGs and how they actually behave across a wide variety of vSphere network topologies and traffic types. To date, I have not found a good reason to use a LAG on a vSphere host, aside from a handful of folks who say they have one (please blog about it). Until the day comes when a vSwitch is able to form a loop, there’s no need to slap a bandaid on it.