One of the many design decisions surrounding an NSX implementation concerns the distributed switch (VDS) and how far it should stretch. Many folks create one large VDS and stretch it across clusters for simplicity’s sake, which is fine. Be aware, however, that VMs can be manually pushed around from one cluster to the next via vMotion or erroneous administrator placement. If your VM happens to be consuming a port on an NSX logical switch, the VM can only live on a host in which NSX has been installed and configured.
This is because the port groups that represent NSX logical switches (made obvious by the “virtualwire” in their names) are attached to the VDS, and the VDS exposes them to every host joined to it, whether or not the NSX software is installed on that host. Here are a few examples of logical switches on a VDS:
I’ll admit that the risk here is small because DRS won’t automatically migrate a VM across clusters, but it may not be entirely obvious which clusters have NSX installed and which do not. Just be aware that normal vMotion checks do not look to see if the destination host has NSX installed. If the network and storage are available, the migration will be allowed. I’ve provided a sample illustration below:
Once a VM that relies upon NSX lands in a cluster without NSX installed, traffic will cease to flow. Additionally, the vNIC will be disconnected on the VM.
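Since vMotion does not run this check for you, here is a minimal sketch of the kind of pre-flight test you could script yourself. It is not a VMware API: the `Host` record, the `nsx_prepared` flag, and the host and port group names are all made up for illustration, and you would need to populate them from your own inventory. The only thing taken from the environment described here is that logical switch port groups carry “virtualwire” in their names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical inventory record; fill it from your own tooling."""
    name: str
    cluster: str
    nsx_prepared: bool  # assumption: you track which hosts have NSX installed


def is_logical_switch(port_group: str) -> bool:
    """NSX logical switch port groups carry 'virtualwire' in their name."""
    return "virtualwire" in port_group.lower()


def migration_warnings(vm_port_groups, destination: Host):
    """Return one warning per logical switch the destination cannot serve."""
    if destination.nsx_prepared:
        return []
    return [
        f"{pg} is an NSX logical switch, but {destination.name} "
        f"({destination.cluster}) does not have NSX installed"
        for pg in vm_port_groups
        if is_logical_switch(pg)
    ]


# Made-up example data
mgmt_host = Host("esx-mgmt-01", "Management", nsx_prepared=False)
compute_host = Host("esx-comp-01", "Compute", nsx_prepared=True)
port_groups = ["vxw-dvs-21-virtualwire-1-sid-5000-Web", "VM Network"]

for warning in migration_warnings(port_groups, mgmt_host):
    print(warning)
```

An empty list means the move looks safe from an NSX point of view; anything else is worth reading before you start the migration.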
If you try to reconnect the vNIC, an error will pop up stating “Invalid configuration for device ‘0’”, and it will keep appearing even after you’ve migrated the VM onto a host with NSX installed. There are a few ways to fix this, such as temporarily throwing the vNIC into a different port group and then moving it back. I’ll show you the cleaner method.
First, make sure the VM is back on a host that is attached to the NSX transport zone. Then open the vSphere Web Client and navigate to Networking & Security > Logical Switches. Select the logical switch that the VM should be using and click the Add Virtual Machine button.
Next, find the VM that needs to be fixed and check the box next to it.
Check the box next to the network adapter (vNIC) on that virtual machine.
Complete the wizard. The VM should now be properly reconnected to the logical switch and can be pinged once more.
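If you have more than one VM to clean up, the order of operations above can be written down as a small triage function. This is only a sketch under stated assumptions: the `Vnic` record and its fields are hypothetical and would need to be filled from your own inventory export. It does not talk to vCenter or NSX; it simply tells you which step applies to each vNIC.

```python
from typing import NamedTuple


class Vnic(NamedTuple):
    """Hypothetical record describing one vNIC; not a VMware object."""
    vm: str
    port_group: str
    connected: bool
    host_has_nsx: bool  # assumption: you know which hosts have NSX installed


def next_step(vnic: Vnic) -> str:
    """Pick the step that applies, in the same order as the manual fix."""
    if "virtualwire" not in vnic.port_group.lower():
        return "not on a logical switch"
    if not vnic.host_has_nsx:
        return "migrate to a host with NSX first"
    if not vnic.connected:
        return "re-add via Logical Switches > Add Virtual Machine"
    return "ok"


# Made-up example data
vnics = [
    Vnic("web-01", "vxw-dvs-21-virtualwire-1-sid-5000-Web", False, False),
    Vnic("web-02", "vxw-dvs-21-virtualwire-1-sid-5000-Web", False, True),
    Vnic("web-03", "vxw-dvs-21-virtualwire-1-sid-5000-Web", True, True),
]

for vnic in vnics:
    print(f"{vnic.vm}: {next_step(vnic)}")
```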
I enjoy breaking things and then figuring out how to fix them. Should you do the same in your lab (or in production), you now have a solution. I don’t see this particular problem being all that common, but it would be nice if future versions of vSphere validated that the NSX bits were installed on the destination host before allowing a migration. Even a warning would help.