On vSphere Upgrades, Version Numbers, and Semantic Versioning

I tend to read a fair number of blogs, and one that is high on my list is Bob Plankers’ The Lone Sysadmin blog. Bob recently published an in-depth post entitled When Should I Upgrade To VMware vSphere 6? It’s a bit of an introspective conversation but, like most everything that Mr. Plankers writes, I certainly found it to be a worthwhile read peppered with real-world experience and saucy language. :)

Having published a Pluralsight course on Upgrading Your vSphere Environment, I thought I’d weigh in on a few things to move this discussion along. The first is the idea that version numbers are somewhat arbitrary, or as Bob puts it:

Thing is, a version number is just a name, often chosen more for its marketing value than its basis in software development reality.

Perhaps. I’m more of a student of semantic versioning and the proper use of three-part dotted version numbers (x.y.z, or major.minor.patch) to reveal mysteries about generally available code. While I’m certainly not anywhere near VMware’s internal developer trenches, there is a rather well-published set of rules around what numbers you pick and how to pick them. I’d likely nod to marketing having some influence, but incrementing the major version signals a pretty serious release from an API perspective.

Assuming that there is a pattern here, a major release increment is associated with backwards-incompatible API changes, such as removing previously deprecated APIs, and potentially other forward-facing changes. This should influence the approach as to when upgrades occur, especially since every environment out there has dependencies on VMware and third-party code.

Semantic Versioning Slide from the Upgrade Your vSphere Environment course

If a minor or bugfix value had been incremented instead, I would be less worried about backwards-compatibility.
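To make the major / minor / bugfix distinction concrete, here is a minimal Python sketch that classifies the jump between two version strings. It assumes the product follows semantic versioning, which is my assumption rather than anything VMware has confirmed, and the `parse` and `classify_upgrade` names are hypothetical helpers invented for illustration.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse(text: str) -> Version:
    """Parse an 'x.y.z' string; missing parts default to 0, so '6.0' works."""
    parts = [int(p) for p in text.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected up to three dotted numbers, got {text!r}")
    parts += [0] * (3 - len(parts))
    return Version(*parts)


def classify_upgrade(current: str, target: str) -> str:
    """Name the most significant part that increases between two versions."""
    old, new = parse(current), parse(target)
    if new <= old:
        return "none"   # same version or a downgrade
    if new.major != old.major:
        return "major"  # under semver: expect backwards-incompatible API changes
    if new.minor != old.minor:
        return "minor"  # under semver: new functionality, backwards-compatible
    return "patch"      # under semver: backwards-compatible bug fixes only


print(classify_upgrade("5.5.0", "6.0.0"))
print(classify_upgrade("5.5.0", "5.5.1"))
```

A "major" result is the cue to go re-check every script and third-party integration that talks to the API, while "minor" and "patch" should, in theory, be gentler.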

It All Falls Back to Functional Design

The second point I wanted to riff on was:

Some people say that you should wait until the first major update, like the first update pack or first service pack.

History has taught many folks to be cautious, especially with the older Windows Server release cycles. Using GA code has typically meant that I’m paying to be a beta tester and find all the broken things that didn’t get enough real-world testing. However, an update or service pack doesn’t necessarily include the latest branch of fixes nor does it necessarily protect me from GA-level harm. See Explicit Failover Shenanigans when Upgrading to ESXi 5.X as a good reminder. :)

These days, it seems that public betas, with wide swaths of folks getting their hands on the code, are the norm. This is in contrast to the release candidates and golden master builds that were typically released to specific sets of individuals and OEMs to validate compatibility and squash bugs. More than likely, a code upgrade will be delayed in your environment because certain dependencies are not yet certified compatible.

One of the key points that I spent half of my course focusing on is that upgrades, like any other design, should be directly tied into functional design elements. These are requirements, constraints, risks, and assumptions. Crafting a design against these four buckets helps turn thoughts and “following the gut” into finite, measurable, and quantitative directives. As an added bonus, writing down your functional design will also help determine scope, flesh out missing items, and craft the bones of a logical and physical design document.

Upgrade projects should kick off a new effort to iterate on previous functional design elements to see which ones remain valid, which are now missing, and which can be wiped away. This is especially true for longer-term environments or ones that you inherited from another architect. As Bob wisely states, he’s doing a fresh installation in a test environment to build a model that can be repeated in production.

Here are some specific thoughts on this:

  • What is the driver behind your upgrade project – new features, support concerns, addressing a dependency, providing new security, squashing bugs, or a combination of these?
  • What are the risks versus rewards that wrap around the upgrade project?
  • How much internal code (scripts, modules, manifests, configuration management) and external code (ecosystem partners and their API compatibility) will be supported in the new version, and has that been tested (internally and by vendors)?
  • Where is your environment located on the lifecycle, compatibility, and support road maps?
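The dependency question in particular lends itself to a quick check. Below is a rough Python sketch, assuming you keep an inventory of internal and external code along with the highest platform version each item has been tested or certified against. The `Dependency` class, the `blockers` function, and every inventory entry are hypothetical placeholders, not real products or real certification data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Dependency:
    name: str
    # Highest (major, minor) platform version tested internally or certified by the vendor.
    certified_up_to: Tuple[int, int]


def blockers(target: Tuple[int, int], deps: List[Dependency]) -> List[str]:
    """Return the names of dependencies not yet certified for the target version."""
    return [d.name for d in deps if d.certified_up_to < target]


# Hypothetical inventory, made up purely for illustration.
inventory = [
    Dependency("backup-agent", (5, 5)),
    Dependency("provisioning-scripts", (6, 0)),
    Dependency("monitoring-plugin", (5, 1)),
]

print(blockers((6, 0), inventory))
```

An empty list does not mean the upgrade is safe; it only means this one known constraint is cleared. The drivers, risks, and road map questions above still need a human answer.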