If you’re looking to use VMware’s NSX along with Cisco’s UCS, there are a few physical changes that must be made in order to support the larger frame size used with VXLAN and STT (Stateless Transport Tunnel, which I mentioned in this blog post). Specifically, VXLAN adds roughly 50 bytes of encapsulation overhead to each frame, while STT adds roughly 80 bytes. For further reading, I also enjoyed this detailed post on STT from Plexxi.
In an NSX-mh (multi-hypervisor) world, the transport protocol used is STT, while NSX-v (NSX for vSphere) relies on VXLAN. Thus, we need to allow at least 1580 bytes (1500 + 80) through to cover both scenarios, which is often rounded up to 1600 bytes in the documentation to allow for some wiggle room.
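To make the arithmetic concrete, here’s a quick sketch (plain Python, using the overhead values above) showing how the 1580-byte transport requirement falls out:

```python
# Encapsulation overhead, in bytes, added to each frame (per the figures above)
OVERHEAD = {"vxlan": 50, "stt": 80}

def required_mtu(payload_mtu=1500):
    """Smallest transport MTU that carries a standard frame under either encapsulation."""
    return payload_mtu + max(OVERHEAD.values())

print(required_mtu())  # 1580 -- typically rounded up to 1600 for wiggle room
```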
I’ve found that while the VMware NSX Network Virtualization Design Guide does a spectacular job of outlining what the UCS configuration should be, it glosses over a few changes and their impact on a Cisco UCS domain. Additionally, much of the official Cisco documentation tends to leave out the impact of making changes to various configs and policies. This post details the three major changes required for UCS and their impact on running workloads.
QoS System Class
In order for the Fabric Interconnects to pass along jumbo frames, meaning any frame with a payload beyond 1500 bytes, you’ll need to edit the appropriate priority within the QoS System Class section. In the example below, I’m passing all LAN traffic over the Best Effort priority. As such, I’ve edited the MTU value for the Best Effort priority to allow for the largest jumbo frames possible – 9216 bytes. For NSX transport traffic, you’ll need to input 1600 bytes or larger due to the VXLAN or STT overhead (around 50 or 80 additional bytes, respectively).
Adjusting settings in the QoS System Class has no impact on running workloads unless you are lowering the MTU value, which may cause fragmentation or dropped frames if your vNICs are set to a higher value. In the example above, I changed the MTU value of the Best Effort priority from “normal” (1500 bytes) to 9216 bytes.
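If you prefer the UCS Manager CLI over the GUI, the equivalent change looks roughly like this. Treat it as a sketch based on the standard `scope qos` hierarchy – the exact scopes can vary by UCSM release, so verify against the CLI configuration guide for your version:

```
UCS-A# scope qos
UCS-A /qos # scope eth-best-effort
UCS-A /qos/eth-best-effort # set mtu 9216
UCS-A /qos/eth-best-effort* # commit-buffer
```

Nothing takes effect until the `commit-buffer`, which mirrors clicking Save Changes in the GUI.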
The FIs will now allow jumbo frames to flow on the Best Effort priority, but no blades within the environment have been configured to use jumbo frames, yet. Let’s fix that.
In order to associate vNIC Templates with the correct QoS priority, we’ll need a QoS Policy. This is the glue between a vNIC and a QoS priority. Creating a QoS Policy has no impact on the environment. Here is the one I use, which matches the one demonstrated in the VMware NSX Network Virtualization Design Guide.
- Priority = the same priority you adjusted in the QoS System Class, above. In my example, it is the Best Effort priority.
- Burst (Bytes) = Use the same size as the link itself, which is 10 Gb.
- Rate (Kbps) = Use line-rate to avoid any constraints on bandwidth for this link.
- Host Control = By using Full, the UCS domain will respect any CoS priority markings it receives when prioritizing traffic; otherwise, those markings would be ignored. I tend to use Full to allow L2 markings to be inserted from within the VDS. Your choice, though. 😉
The policy does nothing until it is applied to a vNIC.
The final step is to edit a vNIC Template with the QoS Policy and adjust the MTU size. This one does have an impact: a reboot of every blade that uses this vNIC Template, assuming it is an Updating Template.
I strongly suggest making sure your service templates and service profiles are set to User Ack (user acknowledgement) via the associated maintenance policy. This ensures that all changes are placed into the Pending Activities bucket instead of triggering a massive reboot of all the affected blades at once.
- Set the MTU value to 1600 bytes or larger. I’ve chosen 9000 bytes because I also plan to use this vNIC for IP Storage.
- Change the QoS Policy to the new NSX policy created previously.
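Before committing the template change, it’s worth double-checking that the MTU values line up end to end: the guest MTU plus the worst-case encapsulation overhead must fit within the vNIC MTU, which in turn must fit within the QoS System Class MTU. A quick sanity check using the values from this post:

```python
# MTU chain sanity check (values from the example in this post)
GUEST_MTU = 1500       # VM-facing MTU
OVERHEAD = 80          # worst-case encapsulation overhead (STT)
VNIC_MTU = 9000        # set on the vNIC Template
QOS_CLASS_MTU = 9216   # set on the QoS System Class

# Every hop must accommodate the frame handed to it, or frames get dropped
assert GUEST_MTU + OVERHEAD <= VNIC_MTU <= QOS_CLASS_MTU
print("MTU chain is consistent")  # prints only if the chain holds
```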
Once you click OK, the system will analyze the impact and let you know what needs to be rebooted.
- Immediate maintenance policy = UCS is going to ask if it is OK to reboot all of the affected blades at once – but you can still click Cancel if you don’t want this. Once you click OK, all of the affected blades are rebooted.
- UserAck maintenance policy = UCS will ask if it is OK to put the list of blades into the Pending Activities list. The blades can be rebooted one-by-one using vSphere’s host maintenance mode options.
I’m more of a fan of UserAck.
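Once the blades are back up, a quick way to confirm end-to-end jumbo support is to ping between VTEP interfaces with fragmentation disabled. Here’s a sketch from an ESXi host – the destination address is a hypothetical peer VTEP IP for your environment, and the `++netstack` name assumes the VXLAN netstack instance exists on the host:

```
# 1572-byte payload = 1600-byte MTU minus 28 bytes of IP + ICMP headers;
# -d sets the don't-fragment bit so an undersized path fails loudly
vmkping ++netstack=vxlan -d -s 1572 192.168.150.12
```

If this ping fails while a default-sized ping succeeds, some hop in the path is still at 1500 bytes.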
This is a very simple example. You could use multiple QoS priorities and specify a different priority for NSX VXLAN traffic, or use multiple vNIC Templates, or any combination of the two. I strive for simple configurations because they are plenty powerful for 99% of use cases.
My point in this article was to clear up what requires a reboot or otherwise impacts the system, and what can be changed without causing a fuss. With that said, it’s still best to submit a change control for these sorts of things and perform the changes during a maintenance window.