Has Affordable All-Flash Storage for Virtualization Arrived?

I saw some pretty crazy presentations at Tech Field Day 8 in Silicon Valley, and by crazy I mean “how did they do this?” crazy; I had to hit the B.S.-O-Meter a few times. One vendor I want to review is Pure Storage: they make some really bold statements but seem to have the numbers and hardware to back them up. Heck, even Tony Bourke had some interesting things to say about these guys, and he holds nothing back. In this post, I’ll go over some of my prejudices, then introduce you to some things I liked about Pure Storage, while keeping the writing oriented towards virtualization.

Let’s Get Clear on Price

Today’s storage arrays seem to like expressing capacity in terms of “effective” storage. I’ll explain. Let’s say you buy 10 TB of usable storage. The vendor may say that you can get “up to 50 TB” of data on that array thanks to deduplication and compression, so you effectively just bought 50 TB of storage. The price per GB is then calculated against that effective number. Effective capacity looks great, especially with highly dedupable data such as VDI.

Be aware that effective storage is being used when comparing $ / GB
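To make the effective-versus-usable math concrete, here’s a quick Python sketch. The 10 TB usable and 5:1 (“up to 50 TB”) figures come from the example above; the $100,000 array price and the 2.5:1 “reality” ratio are made-up numbers purely for illustration, not Pure Storage pricing.

```python
def effective_capacity_tb(usable_tb: float, reduction_ratio: float) -> float:
    """Capacity after dedupe/compression, e.g. 10 TB at 5:1 -> 50 TB."""
    return usable_tb * reduction_ratio


def price_per_gb(price_usd: float, capacity_tb: float) -> float:
    """Dollars per GB, using decimal units (1 TB = 1,000 GB)."""
    return price_usd / (capacity_tb * 1000)


PRICE_USD = 100_000.0  # hypothetical array price, for illustration only
USABLE_TB = 10.0       # usable capacity from the example above

for label, ratio in (("vendor claim", 5.0), ("hypothetical reality", 2.5)):
    eff = effective_capacity_tb(USABLE_TB, ratio)
    print(f"{label}: {eff:.0f} TB effective, ${price_per_gb(PRICE_USD, eff):.2f}/GB")

print(f"usable only: ${price_per_gb(PRICE_USD, USABLE_TB):.2f}/GB")
```

Same box, same price: $2.00/GB at the claimed 5:1, $4.00/GB if you only get 2.5:1, and $10.00/GB on usable capacity. That’s why the reduction ratio you actually achieve matters so much.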

In the land of flash storage, effective storage is the gold standard. If vendors quoted capacity as usable storage, spinning disk would clobber flash on dollars per GB and you’d sort of gag on the price. Now hold on a second, that doesn’t mean it’s wrong or evil; it depends on your perspective. At the end of the day, if you really can cram 50 TB of “stuff” onto that 10 TB of storage, then awesome. However, the vendor had better also tell you what the usable storage is and run some sort of analyzer on your current storage to figure out what the reality will most likely be.

The takeaway here is this: keep in mind that most claims are for effective storage, and that your mileage may vary. To their credit, Pure Storage does break down their costs by both usable and effective storage, and offers a tool called the Purity Reduction Estimator (PRE) to calculate how much your data would reduce on their frame.

Ok, we’re done with all the boring cost stuff. Let’s get onto the neat technology!

FlashArray Specs

Their product, the FlashArray, is a beast of hardware. Each controller has 24 CPU cores and 48 GB of DRAM, and uses 40 Gb/s InfiniBand networking for active/active I/O handling. There are two flavors: the FA-310 (10 TB) and the FA-320 (20 TB).

The Game Changers

The Purity goodies

Being able to teleport back in time and morph your liquid metal body into different shapes is cool, but I’m pretty sure even the T-1000 would be envious of some of the concepts that I saw on the Pure Storage box. Let’s look at a few that I feel are beneficial to the virtualization space and how they are trying to change the game. I also invite you to head on over to the site of Jonathan Franconi, a fellow TFD8 delegate and all-around sharp fellow, to read his post digging deep into his impressions of the Pure Storage technology.

RAID-3D

The first is the concept of RAID-3D, which seems to be a huge part of the “meat” of Pure Storage. In a nutshell, the array is aware of the state of the flash drives (what they are reading and writing) and of how many copies of the data being written already exist, and it then intelligently chooses which type of RAID stripe to use. It may write one block of data using a 4+2 RAID 6 stripe, the next using an 11+1 RAID 5 stripe, and so on. I’m kinda boggled by this technology and worry about how “jumbled up” things must be inside the box. However, I’m not a deep-dive-level storage guy and only had a few hours to learn everything about Pure Storage, so I definitely missed some of the technical knowledge required to really grasp RAID-3D.

What I do like about the idea is that RAID is applied on an “as needed” basis, rather than protecting everything with mirrors or parity. On flash, where writes are to be avoided when possible, this sounds like a very clever way to avoid writing more than necessary while still protecting the data that needs it.
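For a feel of why the stripe choice matters, here’s a small Python sketch that just does the arithmetic on the two example stripes. It is not Pure Storage’s RAID-3D logic (I don’t know how Purity actually decides); it only shows the standard parity math for full-stripe writes, and the `Stripe` class is a name I made up for the illustration.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Stripe:
    """Hypothetical stripe geometry: `data` data blocks plus `parity` parity blocks."""
    name: str
    data: int
    parity: int

    @property
    def efficiency(self) -> Fraction:
        # Fraction of the stripe that holds user data.
        return Fraction(self.data, self.data + self.parity)

    @property
    def write_amplification(self) -> Fraction:
        # Physical blocks written per block of user data in a full-stripe write.
        return Fraction(self.data + self.parity, self.data)

    @property
    def failures_tolerated(self) -> int:
        # Single parity survives one lost drive, dual parity survives two.
        return self.parity


for s in (Stripe("RAID 6 (4+2)", 4, 2), Stripe("RAID 5 (11+1)", 11, 1)):
    print(f"{s.name}: efficiency {float(s.efficiency):.1%}, "
          f"write amplification {float(s.write_amplification):.2f}x, "
          f"tolerates {s.failures_tolerated} failure(s)")
```

The trade-off shows up right away: the 4+2 stripe survives two drive failures but writes 50% more than the user data, while the 11+1 stripe writes only about 9% extra but survives just one failure.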

Pure Storage also stated that this type of RAID has let the company add a lot of protection around the data, making Purity an enterprise-grade product ready for Tier 1 applications.

Space Savings

Pure Storage makes some really huge space savings claims, and I’m extremely skeptical of these numbers in a real-world environment. The whitepaper promises 5-20x (80-95%) data reduction, and the presentation we were given stated a 20x (95%) reduction on 1,000 VMs running on VMware. I run a lot of VMs for work and typically see about 65% reduction in a non-VDI environment with similar servers stacked on each other. I suppose I’ll have to run the PRE tool to find out, but it still seems really high.

The presentation highlights a 20x data reduction on a set of 1000 VMware VMs
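Reduction ratios and percentages get mixed freely in these claims, so here’s a tiny Python sketch for converting between them, using percent saved = (1 - 1/ratio) × 100. The function names are just illustrative.

```python
def ratio_to_percent(ratio: float) -> float:
    """Data reduction ratio (e.g. 20 for 20:1) -> percent of space saved."""
    return (1 - 1 / ratio) * 100


def percent_to_ratio(percent: float) -> float:
    """Percent of space saved (e.g. 65) -> equivalent reduction ratio."""
    return 1 / (1 - percent / 100)


for r in (5, 20):
    print(f"{r}x reduction = {ratio_to_percent(r):.0f}% saved")
print(f"65% saved = {percent_to_ratio(65):.2f}x reduction")
```

The gap is bigger than it looks: 65% savings (about 2.86x) leaves 35% of the data on disk, while 95% leaves only 5%, a 7x difference in physical footprint.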

The main difference I can find is that Pure Storage uses a 512-byte dedupe block, whereas my storage uses 4K blocks. If 80%+ is true, however, it really is quite the accomplishment.
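To show why block size matters, here’s a toy fixed-block dedupe sketch in Python. It hashes aligned blocks with SHA-256 and counts the unique ones. The data set is synthetic, and real arrays (Pure included) also use compression and other tricks this ignores, so treat it as an illustration of the principle rather than how Purity works.

```python
import hashlib


def dedupe_ratio(data: bytes, block_size: int) -> float:
    """Logical blocks / unique blocks for fixed-size, aligned block dedupe."""
    if len(data) % block_size != 0:
        raise ValueError("toy example expects block-aligned data")
    digests = [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]
    return len(digests) / len(set(digests))


# Synthetic data: eight 4K "VM blocks" that are identical except for
# their first 512 bytes (think of a per-VM header or timestamp).
shared_tail = b"".join(bytes([i]) * 512 for i in range(1, 8))  # 7 x 512 bytes
vm_blocks = [bytes([100 + k]) * 512 + shared_tail for k in range(8)]
data = b"".join(vm_blocks)  # 8 x 4,096 = 32,768 bytes

for size in (4096, 512):
    print(f"{size:>4}-byte blocks: {dedupe_ratio(data, size):.2f}x")
```

On this contrived data, 4K blocks find no duplicates at all (1.00x), because every 4K block differs somewhere, while 512-byte blocks find 4.27x. Smaller blocks aren’t free, though: for the same data they mean eight times as many hashes to store and look up as 4K blocks.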

One claim in the presentation was a 10-to-1 reduction in servers after moving to a Pure array. I’m not convinced on this point. The reasons given were that a) CPU cycles are reduced because less time is spent waiting on disk, and b) VMs could swap to the storage instead of using memory. My counter-arguments are:

  1. CPU is not typically the contention point for an ESX host; it’s normally bound by physical memory. CPU wait improvements are nice, but they aren’t going to eliminate 90% of the hosts.
  2. Swapping to any kind of disk is bad! It wastes that precious vRAM licensing in vSphere 5 and also hurts VM performance (nothing is faster than the memory bus, where access times are measured in nanoseconds).

End to End Latency

Saving the best for last, the most exciting thing I saw was the ability to run that 1000 VM workload mentioned earlier in this post with zero latency. Yes, zero, nada, zilch. Crazy!

0 latency to 1000 VMs

VMware reports this as zero because it rounds latency down to the nearest whole millisecond. During the presentation, the engineer opened an SSH session to the Purity console to show that the actual total latency to the datastore was somewhere around 0.2 ms (200 microseconds). I have to admit that I’ve never seen that before and was very impressed with the performance.

Thoughts

I really enjoyed the Pure Storage presentation. Their team consists of some super smart people who are passionate about storage and really know their stuff. They all opened up to the Tech Field Day crew and took our questions, which can be quite aggressive at times, in stride. Thanks guys! In addition, Chris Evans (The Storage Architect and fellow Tech Field Day delegate) has also written about Pure Storage – worth a visit, as this guy lives and breathes storage.

The Pure Storage approach to the market

Do I think all-flash arrays are the future? I’m really not sure. Is Pure Storage doing pretty well for not even being GA yet? I would have to say yes. If the Purity OS really can handle all these workloads with near-zero latency, this could be a “single SAN solution” that solves many problems and possibly eliminates the practice of air-gapping different workloads. My gut tells me the price point just isn’t there yet and that selling TCO on effective storage will be a tough nut to crack (especially with some of the big players out there not having realistic dedupe and spreading FUD), but time should solve those problems nicely (fabrication prices typically only go down, not up). I’ll definitely be keeping an eye on Pure Storage as the future unfolds.

Since you’ve made it this far into the post, I’ll reward you with a video of Pure Storage presenting at Tech Field Day 8. 🙂

[vimeo http://vimeo.com/29754057]