Migrating vSphere from ESX to ESXi 4.1

VMware has made it very clear that ESX is on the way out the door, and ESXi requires less patching (Woot), so I migrated my company’s vSphere cluster from ESX over to ESXi 4.1. This is far from a definitive guide on how to migrate, but it should offer some insight to others who wish to pursue the same adventure.

First off, there are many good reasons to adopt ESXi 4.1. Here are the two major reasons I decided to take the plunge:

  1. Significantly less patching – Since ESXi does away with the COS (Console Operating System) entirely, there is no COS to patch. Historically, most patches fix security flaws and bugs in the COS; removing it eliminates many vulnerabilities, which directly reduces OpEx (Operating Expenses – or in layman’s terms, the cost of paying people to maintain the system).
  2. Deployment has been simplified – Even if you’re running a small operation with a few servers, the installation process for ESXi has been greatly simplified. Realistically, you could just pop in a CD with the vSphere ESXi hypervisor, answer a few easy questions, and go to work. Not a fan of Linux? No problem – ESXi doesn’t ask how big to make the cryptic-sounding /var or /tmp partitions, because they’re gone!

Hardware Considerations

For this migration, I’m fortunate to be dealing with all enterprise grade servers that have CPUs in the same family (Intel Xeon 56XX). I will stress that you should read the HCL (Hardware Compatibility List) to find out if your hardware is compatible with ESXi 4.1 prior to the upgrade. I’d assume that if you’re already running ESX 4.0 or 4.1 you should be 99% of the way there, but due diligence is always rewarded in the end, so check first.

From experience, most HCL issues pop up in three areas:

  1. NIC – An unsupported NIC will cause the installation to fail with a “failed to load lvmdriver” message. Without a compatible NIC, the hypervisor is unable to generate a unique identifier needed to continue the installation.
  2. SCSI controller (RAID) – If you’re doing a disk-based installation, as opposed to using embedded flash or boot from SAN (Storage Area Network), the disk may not appear as an option for install if you have an unsupported SCSI controller. Don’t be tempted to use a single SATA or SAS disk – it’s not worth the risk of having your server suffer downtime because of something as simple and as common as a hard drive failure.
  3. CPU – I have not seen a specific CPU cause installation issues, but CPU differences can cause problems with vMotion inside of a cluster. Fortunately, vCenter doesn’t allow migrations across instruction sets without Enhanced vMotion Compatibility (EVC), as doing so could crash or corrupt the VM – here’s a video showing a Hyper-V migration that did not stop a VM from moving to a less instruction-rich CPU. Additionally, stick with one manufacturer for all host CPUs. Even with EVC, vCenter “does not allow for migration with VMotion between Intel and AMD processors” as per this EVC support link. Of course, you could power down the VM and then migrate it, but then – what’s the point?
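
If you’d rather script the CPU survey than click through each host, the sketch below pulls every host’s CPU model string out of the vCenter inventory using the pyVmomi Python SDK. Note that pyVmomi post-dates the 4.1 era, and the vCenter address and credentials are placeholders, so treat this as an illustration of the inventory properties involved rather than something I ran during this migration:

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder vCenter address and credentials.
    si = SmartConnect(host="vcenter.example.com", user="administrator",
                      pwd="password", sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()

    # Walk every host in the inventory and print its CPU model string, which is
    # enough to spot a stray CPU family (or an Intel/AMD mix) before planning EVC.
    view = content.viewManager.CreateContainerView(content.rootFolder,
                                                   [vim.HostSystem], True)
    for host in view.view:
        hw = host.summary.hardware
        print("%s: %s (%d sockets, %d cores)"
              % (host.name, hw.cpuModel, hw.numCpuPkgs, hw.numCpuCores))

    view.Destroy()
    Disconnect(si)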

The One Upgrade Path

A word of caution: If you’re not comfortable doing this in production, I’d suggest setting up a practice lab using VMware Workstation 7 with a set of ESX and ESXi hosts running as VMs and a vCenter VM, all using evaluation licenses. You get 60 days to figure things out without any effect on production.

First off, Eric Siebert (vExpert) has posted a great “seven step” guide to upgrading hosts from ESX to ESXi. Make sure to read what he has to say about upgrading, as he covers some more detailed concerns that I did not have to work around in my specific environment. I’d also suggest watching Episode 7 of vChat with Eric Siebert, Simon Seagrave, and David Davis.

The bad news of going from ESX to ESXi is that there is no upgrade path. Veterans of patching and maintaining a vSphere host are used to utilizing VUM (VMware Update Manager) and the Host Update Utility; however, your only choice here is a Clean Install.

Documentation

The very first step of any project should be documentation. Since you’re doing a Clean Install, it’s vital that you document your vSwitches, resource shares, IP addresses, etc. I used the free edition of Veeam Reporter, which was easy to use, to record this information.

I strongly advise not skipping this step: once you pull the trigger and wipe a host, its configuration is gone forever. Take the time to ensure a smooth upgrade.
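
Veeam Reporter did the job for me, but if you prefer a plain-text record you can also dump the basics yourself. Here’s a minimal pyVmomi sketch (the vCenter address and credentials are placeholders) that prints each host’s standard vSwitches, port groups with VLAN IDs, and vmkernel interfaces:

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.com", user="administrator",
                      pwd="password", sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder,
                                                   [vim.HostSystem], True)

    for host in view.view:
        print(host.name)
        net = host.config.network
        for vsw in net.vswitch:
            # Uplink keys look like 'key-vim.host.PhysicalNic-vmnic0'; keep just the NIC name.
            uplinks = [p.split("-")[-1] for p in vsw.pnic]
            print("  vSwitch %s: %d ports, uplinks %s" % (vsw.name, vsw.numPorts, uplinks))
        for pg in net.portgroup:
            print("  Port group %s: VLAN %d on %s"
                  % (pg.spec.name, pg.spec.vlanId, pg.spec.vswitchName))
        for vnic in net.vnic:
            # vmkernel interfaces (management, vMotion, etc.) with their IP configuration.
            print("  %s: %s/%s" % (vnic.device, vnic.spec.ip.ipAddress, vnic.spec.ip.subnetMask))

    view.Destroy()
    Disconnect(si)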

The First Host Upgrade

In order to get the ball rolling, you’ll need to vMotion all VMs off the first host. Then, set that host to Maintenance mode, Disconnect it from the cluster (if applicable), and Remove it from vCenter. You can then power it down and boot off your install media.
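
For what it’s worth, the maintenance-mode and removal steps can also be driven through the API. Here’s a hedged pyVmomi sketch; the host name and credentials are placeholders, and it assumes the running VMs have already been evacuated (by you, or by DRS in fully automated mode):

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask

    si = SmartConnect(host="vcenter.example.com", user="administrator",
                      pwd="password", sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()

    # 'esx01.example.com' is a placeholder for the host being rebuilt.
    host = content.searchIndex.FindByDnsName(None, "esx01.example.com", False)

    # Enter maintenance mode (timeout <= 0 means wait indefinitely).
    WaitForTask(host.EnterMaintenanceMode_Task(timeout=0))

    # Disconnect the host, then remove it from the vCenter inventory entirely.
    WaitForTask(host.DisconnectHost_Task())
    WaitForTask(host.Destroy_Task())

    Disconnect(si)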

It may also be wise to disconnect all production LUNs from the host (Fibre Channel cables, iSCSI connections, etc.) to avoid any possible damage to your SAN datastores.

This is an opportune time to consider introducing your cluster to EVC. At this point, you can simply create a new cluster, make sure EVC is enabled, and begin adding the new ESXi hosts to it. Note that if VMs are running on ESX hosts whose CPUs expose instructions not available under the new cluster’s EVC baseline, live migration into that cluster will fail; you will most likely need to cold migrate those VMs.

If you’re not sure what the “lowest” EVC mode you support is, check your CPU specs against the settings in EVC. For example, the Intel modes are Core 2, 45nm Core 2, Core i7, and 32nm Core i7. Additionally, the little blue speech bubble next to the VMware EVC mode information item tells you what modes the host itself supports, even if EVC is disabled.
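
Once the new cluster exists, you can also confirm its EVC baseline programmatically. This pyVmomi sketch (the cluster name and connection details are placeholders) simply reads the cluster’s current EVC mode key:

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.com", user="administrator",
                      pwd="password", sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()

    # 'Production-EVC' is a placeholder for the new cluster's name.
    view = content.viewManager.CreateContainerView(content.rootFolder,
                                                   [vim.ClusterComputeResource], True)
    for cluster in view.view:
        if cluster.name == "Production-EVC":
            # currentEVCModeKey is unset when EVC is disabled on the cluster.
            print(cluster.name, cluster.summary.currentEVCModeKey)

    view.Destroy()
    Disconnect(si)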

Go through the installation of ESXi on your first host, making sure to input the IP address, subnet mask, gateway, DNS servers, and NTP server information.

Connect the vSphere Client to vCenter and add the host to the existing cluster (or to the new cluster if you’re beginning to use EVC). If an alert warns that the host is not configured properly for HA, select the “Reconfigure for VMware HA” option on the host; I’ve only seen the issue once, and the reconfigure solved the problem. Reconnect your SAN to the host and check to make sure your Datastores appear as they should (the Fibre Channel SAN environment I worked in needed no extra work because the host’s WWNs did not change, so the fabric still considered this to be the same host as soon as it was plugged in).
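
If you’re scripting the rebuilds, adding the host back can also be done through the API. A sketch under the same placeholder assumptions as the earlier ones:

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.com", user="administrator",
                      pwd="password", sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()

    # Locate the target cluster ('Production-EVC' is a placeholder name).
    view = content.viewManager.CreateContainerView(content.rootFolder,
                                                   [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Production-EVC")
    view.Destroy()

    # Add the freshly installed ESXi host. On the first attempt vCenter will
    # typically reject the host's certificate; the fault it returns carries the
    # SSL thumbprint, which you can copy into spec.sslThumbprint and retry.
    spec = vim.host.ConnectSpec(hostName="esxi01.example.com",
                                userName="root", password="password")
    WaitForTask(cluster.AddHost_Task(spec=spec, asConnected=True))

    # If the host later complains about HA, the API equivalent of the
    # "Reconfigure for VMware HA" option is host.ReconfigureHostForDAS_Task().

    Disconnect(si)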

Additional Hosts

Migrating additional hosts does not call for any deviation. Repeat the steps above, omitting the creation of a new EVC cluster because it already exists.

Virtual vCenter Server Considerations

Working with a virtual vCenter server can present special challenges when administrative work must be done to the VM. Migrating a vCenter VM to a new EVC cluster involves a bit of a trick. VMware has a similar article outlining the process, but it is a bit heavy on steps.

The steps I performed were (a scripted sketch of the same sequence follows the list):

  1. Make sure you have one host running ESXi in the newly created EVC cluster.
  2. Connect the vSphere Client directly to the host that contains the vCenter VM in inventory.
  3. Record the Datastore location of the vCenter VM.
  4. Power off the vCenter VM.
  5. Remove the vCenter VM from inventory.
  6. Connect the vSphere Client directly to the host that is running ESXi in the newly created EVC cluster.
  7. Manually add the vCenter VM to inventory.
  8. Power on the vCenter VM.
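
For completeness, here’s roughly what steps 2 through 8 look like when scripted with pyVmomi. The host names, VM name, and datastore path below are all placeholders, and the hard power-off in step 4 is best preceded by a clean shutdown of the vCenter guest:

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask

    ctx = ssl._create_unverified_context()

    # Steps 2-5: connect straight to the host that still owns the vCenter VM.
    old = SmartConnect(host="esx01.example.com", user="root", pwd="password", sslContext=ctx)
    old_dc = old.RetrieveContent().rootFolder.childEntity[0]    # a host exposes one datacenter
    old_host = old_dc.hostFolder.childEntity[0].host[0]
    vm = next(v for v in old_host.vm if v.name == "vcenter01")  # placeholder VM name
    print(vm.config.files.vmPathName)       # step 3: record the datastore path
    WaitForTask(vm.PowerOffVM_Task())       # step 4: hard power-off (shut the guest down cleanly first)
    vm.UnregisterVM()                       # step 5: drops it from inventory; files stay on the datastore
    Disconnect(old)

    # Steps 6-8: connect straight to the new ESXi host in the EVC cluster.
    new = SmartConnect(host="esxi01.example.com", user="root", pwd="password", sslContext=ctx)
    new_dc = new.RetrieveContent().rootFolder.childEntity[0]
    cr = new_dc.hostFolder.childEntity[0]   # the host's compute resource (owns the resource pool)
    task = new_dc.vmFolder.RegisterVM_Task("[san01] vcenter01/vcenter01.vmx",   # step 7
                                           asTemplate=False, pool=cr.resourcePool,
                                           host=cr.host[0])
    WaitForTask(task)
    WaitForTask(task.info.result.PowerOnVM_Task())   # step 8
    Disconnect(new)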

Thoughts

Overall, the process is pretty straightforward, and a nice exercise in the fundamentals of VMware host and guest management. While it would be nice if there was an in-place upgrade option, the radical differences between ESX and ESXi make it reasonable to require a fresh install. From a day-to-day administration standpoint, there really isn’t any difference between the two, making the switch transparent to pretty much everyone but the person doing the monthly patches, who should be happy about the significant reduction in workload.