One of the many design decisions surrounding an NSX implementation concerns the distributed switch (VDS) and how far it should stretch. Many folks create one large VDS and stretch it across clusters for simplicity’s sake, which is fine. Be aware, however, that VMs can be manually pushed from one cluster to the next via vMotion or erroneous administrator placement. If your VM happens to be consuming a port on an NSX logical switch, it can only live on a host on which NSX has been installed and configured.
This is because the port groups that represent NSX logical switches (easily spotted by the “virtualwire” string in their names) are attached to the VDS, which makes them available to every host joined to that VDS regardless of whether the NSX software has been installed. Here are a few examples of logical switches on a VDS:

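If you prefer the command line to the UI, here is a minimal pyVmomi sketch that lists the virtualwire port groups on each VDS and the hosts that can see them. The vCenter address and credentials are placeholders, and matching on the “virtualwire” string is just a convenient heuristic, not an official NSX query.

```python
# Minimal sketch: list "virtualwire" port groups per VDS and the hosts joined to
# that VDS. Hostname and credentials below are placeholders for a lab setup.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only; use valid certs in production
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.DistributedVirtualSwitch], True)
    for vds in view.view:
        # NSX logical switches appear as auto-generated "virtualwire" port groups
        wires = [pg.name for pg in vds.portgroup if "virtualwire" in pg.name]
        hosts = [h.name for h in (vds.summary.hostMember or [])]
        for name in wires:
            print(f"{name} (on {vds.name}) is visible to: {', '.join(hosts)}")
    view.Destroy()
finally:
    Disconnect(si)
```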
I’ll admit that the risk here is small because DRS won’t automatically migrate a VM across clusters, but it may not be entirely obvious which clusters have NSX installed and which do not. Just be aware that the normal vMotion validation checks do not verify that the destination host has NSX installed; if the network and storage are available, the migration will be allowed. I’ve provided a sample illustration below:

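Since vMotion won’t make this check for you, a quick manual pre-check is an option. The sketch below assumes an NSX-v environment where prepared hosts carry a dedicated “vxlan” TCP/IP netstack for their VTEP vmkernel ports and uses that as a rough indicator of host preparation; the connection details are placeholders, and this is a heuristic rather than an authoritative NSX host-preparation query.

```python
# Sketch of a manual pre-vMotion check: flag hosts that lack the "vxlan"
# netstack typically created during NSX-v host preparation (heuristic only).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def host_looks_nsx_prepared(host):
    """True if the host carries a 'vxlan' netstack (assumed sign of host prep)."""
    if not host.config or not host.config.network:
        return False
    stacks = host.config.network.netStackInstance or []
    return any(stack.key == "vxlan" for stack in stacks)

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        state = "prepared" if host_looks_nsx_prepared(host) else "NOT prepared"
        print(f"{host.name}: {state} for NSX/VXLAN")
    view.Destroy()
finally:
    Disconnect(si)
```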
Once a VM that relies upon NSX lands in a cluster without NSX installed, traffic will cease to flow. Additionally, the vNIC will be disconnected on the VM.

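If you want to spot this condition without clicking through each VM, a small helper like the hypothetical one below reports the connection state of every vNIC on a VM object you have already retrieved (for example, via a container view as in the earlier snippet).

```python
# Helper sketch: print the connection state of each vNIC on a given VM object.
from pyVmomi import vim

def report_vnic_state(vm):
    """Print connected/disconnected for every network adapter on the VM."""
    for dev in vm.config.hardware.device:
        if isinstance(dev, vim.vm.device.VirtualEthernetCard):
            state = "connected" if dev.connectable.connected else "DISCONNECTED"
            print(f"{vm.name} / {dev.deviceInfo.label}: {state}")
```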
If you try to reconnect the vNIC, an error pops up stating Invalid configuration for device ‘0’, even after you’ve migrated the VM back onto a host with NSX installed. There are a few ways to fix this, such as temporarily throwing the vNIC into a different port group and then moving it back, but I’ll show you the cleaner method.
First, make sure the VM is back on a host that is attached to the NSX transport zone. Then open the vSphere Web Client and navigate to Networking & Security > Logical Switches. Select the logical switch that the VM should be using and click the Add Virtual Machine button.

Next, find the VM that needs to be fixed and check the box next to it.

Check the box next to the network adapter (vNIC) on that virtual machine.

Complete the wizard. The VM should now be properly reconnected to the logical switch and can be pinged once more.

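If you’d rather script the fix than click through the wizard, the sketch below takes an equivalent but API-driven approach: it re-points the VM’s first network adapter at the virtualwire distributed port group and reconnects it via a VM reconfigure task. The vCenter details, VM name (“web-01”), and port group name are placeholders, and this isn’t the wizard’s exact mechanism, just one way to reach the same end state with pyVmomi.

```python
# Sketch: re-attach a VM's first vNIC to a virtualwire port group and reconnect
# it. VM name and port group name are illustrative placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

def find_obj(content, vimtype, name):
    """Return the first managed object of the given type with a matching name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.Destroy()

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
try:
    content = si.RetrieveContent()
    vm = find_obj(content, vim.VirtualMachine, "web-01")
    pg = find_obj(content, vim.dvs.DistributedVirtualPortgroup,
                  "vxw-dvs-21-virtualwire-1-sid-5001-Web-Tier")  # example name

    # Grab the VM's first network adapter.
    nic = next(dev for dev in vm.config.hardware.device
               if isinstance(dev, vim.vm.device.VirtualEthernetCard))

    # Point the vNIC back at the logical switch's distributed port group...
    nic.backing = vim.vm.device.VirtualEthernetCard.DistributedVirtualPortBackingInfo(
        port=vim.dvs.PortConnection(
            portgroupKey=pg.key,
            switchUuid=pg.config.distributedVirtualSwitch.uuid))
    # ...and ask for it to be connected as part of the same change.
    nic.connectable = vim.vm.device.VirtualDevice.ConnectInfo(
        connected=True, startConnected=True)

    spec = vim.vm.ConfigSpec(deviceChange=[
        vim.vm.device.VirtualDeviceSpec(
            operation=vim.vm.device.VirtualDeviceSpec.Operation.edit, device=nic)])
    WaitForTask(vm.ReconfigVM_Task(spec=spec))
    print("vNIC reconnected to", pg.name)
finally:
    Disconnect(si)
```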
I enjoy breaking things and then figuring out how to fix them. Should you do the same in your lab (or in production), you now have a solution. I don’t see this particular problem being all that common, but it would be nice if future versions of vSphere validated that the NSX bits are installed on the destination host before allowing a migration, or at least threw a warning.
You can prevent this by putting the vMotion vmkernel interfaces of the two clusters in different subnets. What’s your take?
@Hari – So long as the vMotion subnets can reach one another, the VM will be able to migrate. There is also the possibility that someone places a new VM in a non-NSX cluster and attempts to use the Logical Switch port group.
Hopefully they add a feature that checks if the NSX bits are on the destination before allowing a migration.
Won’t this be solved in NSX 6 and vSphere 6, with the new NSX features?