NetApp FlashCache Considerations in a VMware View Environment

I’ve worked with NetApp storage for a decent amount of time now, and have historically only given moderate appreciation to FlashCache. Perhaps this is because much of my time has been spent in the server world, where the VM architecture is so widely varied that it’s difficult to design for FlashCache. However, I’m getting ahead of myself: let me give you the short version of what this technology does, then we’ll go into some of my design thoughts on using it with VMware View.

So What Is This FlashCache Stuff?

Formerly known as PAM (Performance Acceleration Module), the FlashCache technology is a layer of cache installed on PCIe card(s), the number of which depends on the NetApp model and the size of your wallet. These cards live inside the filer (the new term is storage system) and serve read requests that are not in the system memory. Essentially, as data flows up from disk, FlexScale (the software side of FlashCache) uses Intelligent Caching to figure out what data is worth holding on to.
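
To make the read path a little more concrete, here is a toy sketch in Python. It is not NetApp's algorithm: the LruCache class, the read_block function, and the plain LRU eviction policy are all my own stand-ins for illustration, and the real Intelligent Caching logic is certainly smarter than this. The only point is the lookup order: system memory first, then the flash layer, then disk.

```python
from collections import OrderedDict


class LruCache:
    """A tiny LRU cache keyed by block number (hypothetical, illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def get(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        return False

    def put(self, block):
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block


def read_block(block, memory, flash):
    """Return which tier served the read: 'memory', 'flash', or 'disk'."""
    if memory.get(block):
        return "memory"
    if flash.get(block):
        memory.put(block)
        return "flash"
    # Miss in both tiers: fetch from disk and keep a copy on the way up.
    flash.put(block)
    memory.put(block)
    return "disk"


memory = LruCache(capacity=2)  # deliberately tiny "system memory"
flash = LruCache(capacity=8)   # larger flash layer behind it
served = [read_block(b, memory, flash) for b in [1, 2, 3, 1]]
print(served)
```

In this run, block 1 has already been pushed out of the tiny memory cache by the time it is read again, so the flash layer answers instead of the disk.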

A graphical look at FlashCache that I blatantly stole from the Interwebs

It’s All About the Linked Clones

In a VMware View environment, and I’m going to use View 5 moving forward, this makes an interesting match with linked-clones. Why, you might ask? The replica, or the “parent VM” that all linked-clones are linking back to, is a read-only file. Logically, the more linked-clones that refer back to the replica, the more often reads should occur on the replica. This sounds like a really good use case for FlashCache.
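
Here is a back-of-the-napkin way to see that logic, sketched in Python. The model is an assumption on my part and very simplified: every clone reads every replica block exactly once (think of a boot), and the cache simply fills until it is full and never evicts. The function name replica_hit_rate and the block counts are made up for illustration.

```python
def replica_hit_rate(clones, replica_blocks, cache_blocks):
    """Hit rate when every clone reads every replica block once, in order."""
    cache = set()
    hits = 0
    reads = 0
    for _ in range(clones):
        for block in range(replica_blocks):
            reads += 1
            if block in cache:
                hits += 1
            elif len(cache) < cache_blocks:
                cache.add(block)  # simplistic: fill until full, never evict
    return hits / reads


# Hypothetical sizes: a 1000-block replica and a cache that can hold all of it.
for clones in (1, 10, 100):
    print(clones, replica_hit_rate(clones, replica_blocks=1000, cache_blocks=1000))
```

In this model the first clone pays for the trip to disk and every clone after it reads from cache, so the hit rate climbs toward 100% as the pool grows. Shrink cache_blocks below the replica size and the hit rate drops accordingly.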

I’ll take a moment here to pause and point out an article written by a vSpecialist named Andre Leibovici entitled “Use Flash Drives (SSD) for Linked Clones, not Replicas” that demonstrates that perhaps replicas aren’t so deserving of tier zero storage after all (depending on the pool design). He specifically states the following two points:

It all comes down to how the VDI environment is utilized by the users. If boot storms, logon storms and VM refreshes are often requested, such in a Non-Persistent Pool serving school classes, then it is important to support Replica disks as much as possible. That means Replica disks should be on the fastest storage tier.

If there are no constant VM refresh or extreme login storms, then it is probably it is better to support Linked Clones as much as possible. That means Linked Clones should be on the fastest storage tier.

Go ahead and read the article (it’s worth it) … don’t worry, I’ll be here when you get back.

Done? Great. First off, I like that Andre is thinking outside the box and has some hard evidence to back his claim; it’s certainly more than what I have here in my pondering. The one difference I’ll point out is that FlashCache can only serve reads and holds a copy of the underlying replica; the replica does not “live” on the cache. I think that makes a bit of a difference compared with SSD used as a storage tier, as with something like EMC FAST (tiering). I’ll underline right now that this is not a NetApp vs EMC post; I’m just talking about two approaches here, as I can appreciate and have used both.

But How Does This Influence Design?

So we’ve covered that FlashCache and replicas work well together. The benefits are:

  1. Because the replica IO is not hitting disk, you have freed up those spindles to handle more writes for your linked-clone disks.
  2. FlashCache is able to serve the reads faster than going to disk (regardless of what kind of disk it is) since the read request never goes any further down than the storage system.

A candid photo of myself thinking about View on FlashCache design
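
The first benefit in the list above can be put into rough numbers. This Python sketch is plain arithmetic under stated assumptions: the workload figures (10,000 IOPS, a 50/50 read/write split, an 80% cache hit rate) are hypothetical, the function names are mine, and I'm ignoring RAID write penalties and the system memory cache entirely.

```python
def disk_read_iops(total_iops, read_fraction, cache_hit_rate):
    """Reads per second that still reach the spindles after cache hits."""
    return total_iops * read_fraction * (1.0 - cache_hit_rate)


def disk_total_iops(total_iops, read_fraction, cache_hit_rate):
    """Writes always go to disk; only the cache misses are added on top."""
    writes = total_iops * (1.0 - read_fraction)
    return writes + disk_read_iops(total_iops, read_fraction, cache_hit_rate)


# Hypothetical pool: 10,000 IOPS, half of them reads, 80% of reads served from cache.
without_cache = disk_total_iops(10_000, 0.5, 0.0)
with_cache = disk_total_iops(10_000, 0.5, 0.8)
print(round(without_cache), round(with_cache))
```

With those made-up figures the spindles go from handling roughly 10,000 IOPS to roughly 6,000, and the difference is headroom for linked-clone writes.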

Now, where do we put the replica so as to best maximize the benefits of FlashCache? I highly doubt you’re only going to have one View desktop pool, so we can assume you’ll need more than one replica (one per linked-clone pool). Here are my thoughts so far, and I welcome contributions:

  • If the replicas are very similar to one another (let’s pretend you have a reason for this, such as max desktops per pool), put the replicas on the same volume and enable deduplication …
  • Building further on this idea, create a volume dedicated to handling replicas, and then enable FlexShare (a prioritization of resources at the volume level) to work a sort of QoS on the replicas or pin them to the FlashCache …
  • However, should the dedicated volume have an aggregate of fast spindles underneath it that can support the read requests should the FlashCache fail? Theoretically, the need for spindles underneath the FlashCache would just be for holding the non-volatile data and for writing new replicas before they get soaked up by FlashCache.
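
To put a rough number on the first idea in the list above, here is a small Python calculation of how much unique data a set of similar replicas would occupy once deduplicated. The replica count, size, and shared fraction are hypothetical, the function name is mine, and whether the cache also holds a deduplicated block only once is an assumption you should confirm for your Data ONTAP version.

```python
def unique_footprint_gb(replicas, replica_gb, shared_fraction):
    """Unique data across replicas if the shared part is stored only once."""
    shared = replica_gb * shared_fraction
    unique_each = replica_gb - shared
    return shared + replicas * unique_each


# Hypothetical: four 20 GB replicas that are 90% identical to one another.
raw_gb = 4 * 20
deduped_gb = unique_footprint_gb(replicas=4, replica_gb=20, shared_fraction=0.9)
print(raw_gb, round(deduped_gb, 1))
```

Under those assumptions, 80 GB of raw replica data shrinks to about 26 GB of unique blocks, which is a much easier target to keep hot.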

There’s something about linked-clones in VMware View that gets me really interested in FlashCache design and optimization. If you’ve gone through some builds, heartache, or success, let me know.

And bonus points to those who noticed that my headline graphic is The Flash.