I was working on a vCenter 4.1 to 5.1 upgrade on a physical host when I ran across this little bugaboo that I wanted to share. In this particular situation, a new physical vCenter server was being put into production at version 5.1, and the old vCenter server running 4.1 was being decommissioned. This has the added advantage of giving the process an easy rollback – just power on the old box. Sort of like a physical version of a snapshot rollback. 😉
To simplify the upgrade process, I stole the network cable and name off the old vCenter server. This allows the user base to connect to the same server, as well as removing a lot of complexity around IP addresses, DNS A records, and the like. However, I hit a bit of a snag when doing the actual upgrade to the vCenter portion of 5.1.
SSL Certificate Checking
If you read the upgrade documents from VMware, you’ll see this little jewel:
Make sure that SSL certificate checking is enabled for all vSphere HA clusters. If certificate checking is not enabled when you upgrade, HA will fail to configure on the hosts. Select Administration > vCenter Server Settings > SSL Settings > vCenter requires verified host SSL certificates. Follow the instructions to verify each host SSL certificate and click OK. (source)
OK, that makes sense, since the new “HA engine” in version 5.X is FDM. AAM is dead and gone. I always just figured that I could do a “reconfigure for HA” operation after the upgrade, or even configure a fresh new HA cluster if necessary. No big deal, right?
The upgrade actually stops you as it’s about to kick off the “vCenter Services” portion and alerts that you did not enable SSL certificate checking! It then exits and expects you to go set this variable. So, I was kind of stuck – I couldn’t install vCenter to change the option.
The only way I could really think of to fix this was to go re-cable the old vCenter 4.1 box to the network, power it all back up, and fire off the vSphere Client – just so I could check these boxes:
This just goes to show how a little arrogance can lead to a lot of extra work. I would imagine that if you no longer had your vCenter 4.1 server, the only other ways to fix this would have been a support call (to have a VMware DBA ninja go into the bowels of the database and manually set the options) or installing vCenter 4.1 just to toggle the setting.
If you’re not on vCenter 5.0 already, make sure you turn on host SSL certificate checking in vCenter before you upgrade. You should be able to do it during production with no adverse effects – just make sure to check the “verified” box next to each host. There will be a lot of tasks in the vSphere Client that show “reconnecting host” and immediately finish. If you have a concern over this, however, it may be best to just get a change control completed and schedule a maintenance window. It’s going to take a fair bit of time (depending on the number of hosts) to check all the thumbprints.
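If the thumbprint checking is the part that worries you, a small script can at least gather them all in one place ahead of time. This is only a rough sketch using the Python standard library, not anything from VMware: it assumes each host answers HTTPS on port 443 and that the thumbprint you are comparing against is the SHA-1 hash of the certificate written as colon-separated hex. The function names and the host names in the usage line are made up for illustration.

```python
import hashlib
import ssl
import sys


def format_thumbprint(der_cert: bytes) -> str:
    """Return the SHA-1 hash of a DER-encoded cert as colon-separated uppercase hex."""
    digest = hashlib.sha1(der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fetch_thumbprint(host: str, port: int = 443) -> str:
    """Fetch the certificate a host presents (without validating it) and hash it."""
    # Assumption: the host serves its SSL certificate on port 443.
    pem = ssl.get_server_certificate((host, port))
    return format_thumbprint(ssl.PEM_cert_to_DER_cert(pem))


if __name__ == "__main__":
    # Host names are passed on the command line, one thumbprint printed per host.
    for host in sys.argv[1:]:
        try:
            print(f"{host}\t{fetch_thumbprint(host)}")
        except OSError as exc:  # covers connection failures and SSL errors
            print(f"{host}\tERROR: {exc}")
```

Run it as something like "python thumbprints.py esx01.example.local esx02.example.local" (hypothetical names) and keep the output next to you while ticking the boxes. Keep in mind that a thumbprint pulled over the network only tells you what the host is presenting right now, so if you need real assurance, compare it against what the host itself reports on its console.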
Also, if you want more certificate goodness, check out fellow VCDX Michael Webster’s post on “The Trouble with CA SSL Certificates and ESXi 5” over at his blog. It came up when I was first searching on this error via Google and contains some great information.