How vSphere APIs for IO Filtering (VAIO) Changes Everything

During this year’s US-based VMworld 2014 conference, I had the pleasure of sitting down with the Tech Field Day Extra at VMworld crew to hear a variety of vendors discuss their products and solutions. One such company was SanDisk, or more specifically the FlashSoft folks, who talked about their collaboration with VMware on vSphere APIs for IO Filtering, which has been shortened to VAIO. No, not the laptop. :)

The idea behind VAIO is to create a specific set of APIs available for the express purpose of exposing disk operations to vendors that wish to fiddle with the IO bits before they land on the storage device. This gives VMware and third-party developers the ability to be a first-class citizen within the VM’s IO path, rather than having to use the PSA (Pluggable Storage Architecture), hacks, or other workarounds at the kernel level. I think Cormac Hogan (who was a dinner guest of ours) put the reason behind this into words quite nicely:

[VAIO] will not be like Microsoft’s 3rd party Device Driver program, where a misbehaving driver could blue screen the host. Instead, the filters will run in VM user space so that in the event of a failure, it only impacts the VM, and the ESXi host continues to run. (source)

VAIO will be a feature baked into the next major version of vSphere and is, to me, much more important in the grand scheme of things than any sort of configuration maximum value. In today’s ecosystem, it’s quite difficult to insert additional value into the IO path of a VM. Commonly, vendors get around this by using virtual appliances and protocol tricks. This adds overhead and complexity that will ultimately be reduced or eliminated by VAIO.
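
To make "first-class citizen in the IO path" a bit more concrete, here is a toy Python model of the idea: each VM carries its own ordered chain of filters, and every write passes through that chain before it lands on the disk. To be clear, this is not the VAIO interface. Every name below (ToyDisk, ToyVM, audit_filter, xor_filter) is something I made up for illustration, and the XOR "transform" just stands in for whatever a vendor might actually do to the IO.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only; none of this is a VMware API.
Write = Tuple[int, bytes]              # (offset, data)
IOFilter = Callable[[Write], Write]    # a filter sees a write and returns a write


class ToyDisk:
    """Stand-in for the storage device at the end of the IO path."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}

    def write(self, offset: int, data: bytes) -> None:
        self.blocks[offset] = data


class ToyVM:
    """Each VM owns its filter chain, so policy can be applied per VM."""

    def __init__(self, disk: ToyDisk, filters: List[IOFilter]) -> None:
        self.disk = disk
        self.filters = filters

    def write(self, offset: int, data: bytes) -> None:
        io: Write = (offset, data)
        # Run the write through every filter, in order, before it hits the disk.
        for io_filter in self.filters:
            io = io_filter(io)
        self.disk.write(*io)


seen_offsets: List[int] = []


def audit_filter(io: Write) -> Write:
    # Observes the write without changing it.
    seen_offsets.append(io[0])
    return io


def xor_filter(io: Write) -> Write:
    # Placeholder transform: flips every bit of the payload.
    offset, data = io
    return offset, bytes(b ^ 0xFF for b in data)


disk = ToyDisk()
vm = ToyVM(disk, [audit_filter, xor_filter])
vm.write(4096, b"\x00\x0f")
```

The point of the sketch is the shape, not the details: the filters sit with the VM rather than in the kernel, so a second VM could be given a completely different chain, or none at all.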

The SanDisk Partnership

Interestingly enough, VMware selected SanDisk to be a partner for the journey to create the VAIO design. This is the same company that acquired both FlashSoft and Fusion-io. That’s a lot of brain power, along with software and hardware knowledge, to tap into, so it makes a lot of sense. To wrap my head around it, I had a few conversations with Serge Shats, Engineering Fellow at SanDisk, to discuss the implications of this development. In my own words, I compared it to building a new highway (the APIs) so that everyone in the ecosystem can then bring their car design to market and see what works.


I’ve already shared that VAIO will leverage filters in the VM user space (user world) rather than the kernel itself. This means that, as one would expect, each VM can be treated as a granular object and handled accordingly. It also means that the first set of use cases that VMware is interested in exploring revolves around distributed, cache-coherent, write-back acceleration along with replication (such as EMC and their RecoverPoint software).

The FlashSoft team went on to further state:

[We] are not only working on the API design, but we are actively working on implementing our next version of FlashSoft for vSphere based on the vSphere APIs for IO Filtering in ESXi 6.0. We plan to have our product ready for the launch of ESXi 6.0 early next year. (source)

Distributed Caching Challenges

When new layers of cache are inserted into the IO flow of a VM, old performance problems are solved and new dirty data problems are created. Let’s take a deeper dive by looking at a slide that SanDisk used in their Tech Field Day presentation.

[Slide: VAIO with SanDisk]

If each VM has dirty data, which is data that has yet to be written down to the storage target, then there are some new challenges to solve:

  1. Storage array backups or snapshots are missing the dirty data
  2. Storage array level replication would also miss the dirty data
  3. If a VM crashes and is HA restarted on a different host, the dirty data may be on different hosts

This is a short list (there are more), but it gets the conversation going. VAIO will have much tighter control over these scenarios, since the end-to-end stack, including the third-party vendor, will have visibility into the IO path. Let’s turn those three issues above into potential solved problems below:

  1. Storage array queries VAIO and waits for the dirty data to be flushed before committing a backup or snapshot
  2. Replication engine grabs the dirty data using a VAIO query, or waits for it to be flushed down to the array
  3. VAIO holds open the VMDK lock, thus HA is prevented from starting up the VM. VAIO can then flush the data before the lock is removed and the VM is powered on.
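
Here is a rough Python sketch of those three scenarios, mostly to show why being able to ask the cache layer about its state matters. It is a toy model under my own assumptions, not the VAIO API: ToyArray, ToyWriteBackFilter, dirty_count, dirty_blocks, holds_vmdk_lock and the helper functions are all hypothetical names I invented for the example.

```python
from typing import Dict


class ToyArray:
    """Stand-in for the shared storage array (hypothetical, not a real API)."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}

    def snapshot(self) -> Dict[int, bytes]:
        # An array-level snapshot only sees what has reached the array.
        return dict(self.blocks)


class ToyWriteBackFilter:
    """Stand-in for a per-VM write-back cache filter (hypothetical)."""

    def __init__(self, array: ToyArray) -> None:
        self.array = array
        self.dirty: Dict[int, bytes] = {}
        self.holds_vmdk_lock = False

    def write(self, offset: int, data: bytes) -> None:
        # The write is acknowledged from cache; the array has not seen it yet.
        self.dirty[offset] = data
        self.holds_vmdk_lock = True

    def dirty_count(self) -> int:
        return len(self.dirty)

    def dirty_blocks(self) -> Dict[int, bytes]:
        return dict(self.dirty)

    def flush(self) -> None:
        # Push dirty data down to the array, then release the lock.
        self.array.blocks.update(self.dirty)
        self.dirty.clear()
        self.holds_vmdk_lock = False


def replicate(array: ToyArray, cache: ToyWriteBackFilter) -> Dict[int, bytes]:
    # Scenario 2: the replication engine queries the cache for dirty data.
    image = array.snapshot()
    image.update(cache.dirty_blocks())
    return image


def consistent_snapshot(array: ToyArray, cache: ToyWriteBackFilter) -> Dict[int, bytes]:
    # Scenario 1: ask about dirty data and wait for the flush before snapshotting.
    if cache.dirty_count() > 0:
        cache.flush()
    return array.snapshot()


def ha_can_power_on(cache: ToyWriteBackFilter) -> bool:
    # Scenario 3: HA cannot restart the VM while the lock is still held.
    return not cache.holds_vmdk_lock


array = ToyArray()
cache = ToyWriteBackFilter(array)
cache.write(0, b"new data")

naive = array.snapshot()                    # taken blind: block 0 is missing
replica = replicate(array, cache)           # includes the dirty block
blocked = not ha_can_power_on(cache)        # lock is held while data is dirty
safe = consistent_snapshot(array, cache)    # flushes first, then snapshots
restartable = ha_can_power_on(cache)        # lock released after the flush
```

The naive snapshot is the problem from the first list: it was taken without any knowledge of the cache, so the acknowledged write is simply not in it. The other calls only work because something can be asked about the dirty data.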

Again, I’m just spitballing here, but being able to query an API that understands the VM state is critical. Lots of new options will be available beyond just caching and replication once everyone at VMware and in the vendor ecosystem gets the warm and fuzzies on VAIO.

Thoughts

Although I think VAIF makes a cooler acronym (wink), the technology behind VAIO is spectacular and, quite frankly, long overdue. The vFlash Read Cache (vFRC) fiasco really shone a light on how not to implement server-side flash caching in a number of ways: it was read-only, it was manual, and it was clunky. VMware’s new focus is on an open set of APIs for the ecosystem, which is a much better plan of attack: it strategically lets the vendor space focus on doing what they do best, creating great user experiences on top of the vSphere platform.

Looking forward to VAIO in vSphere.next, along with FlashSoft’s latest product release that will take advantage of it.