An interesting question was brought to my attention on Twitter by Anthony Elizondo:
@complex That sounds complex, I'm not sure you could do it with zero data interruption.
— Chris Wahl (@ChrisWahl) June 9, 2014
For those with Twitter blocked, Anthony wants to migrate from a standard vSwitch with no LAG to a distributed vSwitch with LACP, and there are only two uplinks (vmnics).
After spending some time pondering at my whiteboard, I drafted a solution that should overcome reasonable constraints without much risk. I’m assuming it is acceptable to disrupt traffic for less than 5 seconds, using an old trick: fooling the hypervisor with an empty standard vSwitch. It’s something I’ve done a few times in the past during rather extreme workload migrations across data centers, and it has worked rather well for me.
Here’s the high-level overview:
The four major steps for migration are as follows, with rough details provided:
- Create a new VDS with all of the required management, vMotion, virtual machine, and other port groups.
- Add Host B to the VDS but don’t migrate over any uplinks (vmnics) yet.
  - Put Host B in maintenance mode to remove all of the VMs.
  - Do not configure LACP on the VDS or the physical network at this time.
  - Migrate over one vmnic from the VSS, followed by all of the VMkernel ports to the VDS, and ensure they are operational.
  - Once verified, migrate over the remaining vmnic from the VSS, leaving the standard vSwitch devoid of uplinks.
  - At this point you can configure LACP on the host and then the physical network.
  - Although there will be a slight disruption to VMkernel traffic, the LAG will form in short order and traffic will once again flow.
  - If you goof this up, just restore the config using the ESXi shell or use the VDS rollback feature.
- Set the cluster’s DRS settings to partially automated or automation level 1 to avoid any automatic vMotion activity.
- Exit maintenance mode and use vMotion to migrate a VM workload onto the host.
  - The VM will no longer have network connectivity once it arrives on Host B.
  - Immediately change the VM’s network from the standard vSwitch port group to the distributed vSwitch port group.
  - Traffic will resume.
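The host-side preparation in the first three steps could also be scripted. What follows is only a rough PowerCLI sketch under stated assumptions, not a tested procedure. It presumes an existing Connect-VIServer session, and every name in it (`DC01`, `Cluster01`, `VDS-01`, `hostB`, `vmnic0`, `vmnic1`, `vmk0`, the port group names, and the VLAN IDs) is a placeholder invented for illustration. It moves a single VMkernel port, so any additional VMkernel ports would need the same treatment, and the LACP configuration itself is left out.

```powershell
# Assumes an existing Connect-VIServer session; every name below is a placeholder.
$vmhost  = Get-VMHost -Name "hostB"
$cluster = Get-Cluster -Name "Cluster01"

# Step 1: create the VDS and its port groups (names and VLAN IDs are illustrative).
$vds    = New-VDSwitch -Name "VDS-01" -Location (Get-Datacenter -Name "DC01") -NumUplinkPorts 2
$mgmtpg = New-VDPortgroup -VDSwitch $vds -Name "Management" -VlanId 10
New-VDPortgroup -VDSwitch $vds -Name "vMotion" -VlanId 20 | Out-Null
New-VDPortgroup -VDSwitch $vds -Name "VM-1" -VlanId 30 | Out-Null

# Step 2: put the host in maintenance mode, then add it to the VDS without uplinks.
Set-VMHost -VMHost $vmhost -State Maintenance | Out-Null
Add-VDSwitchVMHost -VDSwitch $vds -VMHost $vmhost

# Move the first vmnic together with the management VMkernel port.
$vmnic0 = Get-VMHostNetworkAdapter -VMHost $vmhost -Physical -Name "vmnic0"
$vmk0   = Get-VMHostNetworkAdapter -VMHost $vmhost -VMKernel -Name "vmk0"
Add-VDSwitchPhysicalNetworkAdapter -DistributedSwitch $vds -VMHostPhysicalNic $vmnic0 `
    -VMHostVirtualNic $vmk0 -VirtualNicPortgroup $mgmtpg -Confirm:$false

# Check connectivity by hand before running this part: it moves the second vmnic,
# which leaves the standard vSwitch without uplinks.
$vmnic1 = Get-VMHostNetworkAdapter -VMHost $vmhost -Physical -Name "vmnic1"
Add-VDSwitchPhysicalNetworkAdapter -DistributedSwitch $vds -VMHostPhysicalNic $vmnic1 -Confirm:$false

# Step 3: stop DRS from moving VMs on its own.
Set-Cluster -Cluster $cluster -DrsAutomationLevel PartiallyAutomated -Confirm:$false
```

Running this in stages rather than top to bottom is probably wise, since the connectivity check between the two uplink moves is manual.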
Scripting with PowerCLI
I’d prefer to do the vMotion and port group change with a script to maximize speed. Here’s a quick PowerCLI script I wrote as an example. It primarily leverages the Set-NetworkAdapter cmdlet:
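# Destination host and the VMs to move (placeholder names)
$desthost = Get-VMHost -Name "hostname"
[array]$vmlist = Get-VM -Name "list of VMs to move"
# Distributed port group to attach once the VM lands on the host
$destpg = "VM-1"
foreach ($vm in $vmlist)
{
    # vMotion the VM, then immediately repoint its adapters at the VDS port group
    Move-VM -VM $vm -Destination $desthost
    Get-NetworkAdapter -VM $vm | Set-NetworkAdapter -Portgroup $destpg -Confirm:$false
}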
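The example above points every adapter at a single port group. If the VMs sit on several standard port groups, one possible variation is to look up the target from the adapter's current network name. This is a sketch under assumptions: the standard port group names (`VM Network`, `DMZ`) and the second distributed port group (`VM-2`) are hypothetical, and adapters with no entry in the map are left untouched.

```powershell
# Hypothetical mapping of standard port group names to distributed port groups.
$pgmap = @{
    "VM Network" = Get-VDPortgroup -Name "VM-1"
    "DMZ"        = Get-VDPortgroup -Name "VM-2"
}
$desthost = Get-VMHost -Name "hostname"

foreach ($vm in (Get-VM -Name "list of VMs to move"))
{
    Move-VM -VM $vm -Destination $desthost | Out-Null
    foreach ($nic in (Get-NetworkAdapter -VM $vm))
    {
        # NetworkName is the port group the adapter is currently attached to.
        $target = $pgmap[$nic.NetworkName]
        if ($target)
        {
            Set-NetworkAdapter -NetworkAdapter $nic -Portgroup $target -Confirm:$false | Out-Null
        }
    }
}
```

Resolving the port groups once, before the loop, keeps the per-VM work down to the vMotion and the adapter change, which is where the network interruption happens.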