A Caveat to Multi-Cluster VDS Design with NSX

One of the many design decisions surrounding an NSX implementation concerns the distributed switch (VDS) and how far it should stretch. Many folks create one large VDS and stretch it across clusters for simplicity’s sake, which is fine. Be aware, however, that VMs can be manually pushed from one cluster to the next via vMotion or erroneous administrator placement. If your VM happens to be consuming a port on an NSX logical switch, the VM can only live on a host on which NSX has been installed and configured.

This is because the port groups that represent NSX logical switches – made obvious by the “virtualwire” string in their names – are attached to the VDS and are therefore visible to every host joined to that VDS, regardless of whether the NSX software is installed. Here are a few examples of logical switches on a VDS:
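As a quick illustration of the naming convention, here’s a minimal Python sketch that flags which port groups on a VDS are backed by NSX logical switches. The port group names below are made-up examples in the usual NSX-v style, not pulled from a real environment:

```python
# Sketch: NSX logical switches show up on the VDS as port groups whose names
# contain "virtualwire". Given a list of port group names (these are invented
# examples), flag the ones that require NSX-prepared hosts.

def nsx_backed_port_groups(port_group_names):
    """Return the port groups that represent NSX logical switches."""
    return [pg for pg in port_group_names if "virtualwire" in pg]

port_groups = [
    "vxw-dvs-21-virtualwire-1-sid-5000-Web-Tier",   # NSX logical switch
    "vxw-dvs-21-virtualwire-2-sid-5001-App-Tier",   # NSX logical switch
    "Management",                                   # ordinary port group
]

print(nsx_backed_port_groups(port_groups))
```

Any VM attached to one of the “virtualwire” port groups is the kind of VM this post is warning you about.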

Logical Switches

I’ll admit that the risk here is small because DRS won’t automatically migrate a VM across clusters, but it may not be entirely obvious which clusters have NSX installed and which do not. Just be aware that normal vMotion checks do not look to see if the destination host has NSX installed. If the network and storage are available, the migration will be allowed. I’ve provided a sample illustration below:
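To make the gap concrete, here’s a hypothetical sketch of the check that vMotion doesn’t perform. The host inventory below is invented; the point is simply that a migration should only be considered safe when either the VM isn’t on a logical switch, or the destination host has the NSX bits installed:

```python
# Illustrative only: vMotion's compatibility checks cover network and storage
# availability, but not NSX installation. This guard mimics the missing check
# against a made-up inventory of hosts.

nsx_prepared = {
    "esx-01.lab.local": True,    # compute cluster, NSX installed
    "esx-02.lab.local": True,    # compute cluster, NSX installed
    "esx-10.lab.local": False,   # cluster without NSX
}

def safe_to_migrate(vm_uses_logical_switch, destination_host):
    """Allow the move unless the VM sits on an NSX logical switch
    and the destination host lacks the NSX software."""
    if not vm_uses_logical_switch:
        return True
    return nsx_prepared.get(destination_host, False)

print(safe_to_migrate(True, "esx-10.lab.local"))  # this move would break the VM
```

In the real product, both migrations would be allowed; only the VM’s traffic would tell you something went wrong.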

VDS Across Mixed NSX Clusters

Once a VM that relies upon NSX lands in a cluster without NSX installed, traffic will cease to flow. Additionally, the vNIC will be disconnected on the VM.

VM is disconnected

If you try to reconnect the vNIC, an error will pop up stating “Invalid configuration for device ‘0’” even after you’ve migrated the VM back onto a host with NSX installed. There are a few ways to fix this, such as temporarily throwing the vNIC into a different port group and then moving it back. I’ll show you the cleaner method.
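For completeness, the quick-and-dirty workaround mentioned above amounts to bouncing the vNIC through any ordinary port group to clear its stale binding. The objects below are simple stand-ins, not the vSphere SDK; this just sketches the order of operations:

```python
# Mock illustration of the "park and move back" workaround. Vnic is a
# stand-in object, not a real vSphere API class.

class Vnic:
    def __init__(self, port_group):
        self.port_group = port_group
        self.connected = False  # stuck disconnected after the bad migration

def repark_vnic(vnic, temp_port_group, logical_switch_pg):
    """Bounce the vNIC through another port group, then restore it."""
    vnic.port_group = temp_port_group      # temporary placement clears the stale binding
    vnic.port_group = logical_switch_pg    # move back to the logical switch
    vnic.connected = True                  # reconnect now succeeds
    return vnic

nic = repark_vnic(Vnic("vxw-dvs-21-virtualwire-1-sid-5000-Web-Tier"),
                  "Management",
                  "vxw-dvs-21-virtualwire-1-sid-5000-Web-Tier")
```

The cleaner method below accomplishes the same result from the NSX side of the house.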

First, make sure the VM is back on a host that is attached to the NSX transport zone. Then open the vSphere Web Client and navigate to Networking & Security > Logical Switches. Select the logical switch that the VM should be using and click the Add Virtual Machine button.

Add a VM to the Logical Switch

Next, find the VM that needs to be fixed and check the box next to it.

Select the VM

Check the box next to the network adapter (vNIC) on that virtual machine.

Select the vNIC

Complete the wizard. The VM should now be properly reconnected to the logical switch and can be pinged once more.

Successful Pings

I enjoy breaking things and then figuring out how to fix them. Should you do the same in your lab (or in production), you now have a solution. I don’t see this particular problem being all that common, but it would be nice if future versions of vSphere validated that the NSX bits are installed on the destination host before allowing a migration. Even a simple warning would help.