It’s no secret that I’m a bit biased towards NFS and against iSCSI, or at least, I don’t try to make that a secret. Most of my encounters with the iSCSI protocol have been littered with poorly implemented stacks from vendors who add iSCSI target support as an afterthought. As such, working with it has required much painful head smashing into a nearby desk. I tend to harp on the protocol mostly in jest these days, but it is difficult to shake the association placed upon me, so I just let it ride.
Nimble Storage, a hybrid-flash storage array vendor that has recently filed registration for its IPO, is looking to change my mind on iSCSI with a rather sexy combination of performance, enterprise features, and support. First spotted at Tech Field Day 3 (2010) and later at Storage Field Day 2 (2012), they are back for more punishment (er, fun) as 11 other delegates and I take them on at Storage Field Day 4 in San Jose on Thursday, November 14th. While I strongly encourage you to tune in for the entire event (or watch the recordings afterwards), I ended up bending the rules a bit and going deep on the product with the Nimble Storage team prior to the event. Don’t tell Mr. Foskett.
Cache Accelerated Sequential Layout (CASL)
Let’s get the physical bits out of the way first. Nimble Storage uses the prevalent 3U SuperMicro chassis that features a pair of storage controllers in an active/passive configuration, meaning one controller handles all of the workload IO while the other is simply there to take over in case of failure. The controller chassis houses a 9+2 RAID 6 of NL-SAS drives with a single hot spare (for a total of 12 NL-SAS disks) and 4 MLC SSDs. The capacity of the drives is variable depending on what was purchased to meet the use case. Up to 3 additional storage shelves can be added to the system by way of 6 Gb SAS connections, and each shelf can house a 12+2 RAID 6 of NL-SAS with only a single MLC SSD required (mainly for metadata). Each controller features a custom-designed PCIe card housing NVRAM and a super capacitor acting as a power backup (instead of a battery).
You won’t be able to throw a rock around the Nimble Storage documentation without hitting mentions of the CASL architecture. They’re pretty proud of the design, and after much geeking out with the team, I finally got a taste for what makes this thing tick. The idea is to place all writes onto the PCIe NVRAM device when they enter the system, which plays to the “Cache Accelerated” portion of CASL. NVRAM is mirrored between the two controllers to ensure that the standby controller can immediately and safely take over in the event of a failure. Once the mirroring has been completed, the write is acknowledged back to the host. This makes for a speedy, low-latency round trip time (RTT).
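To make the ordering concrete, here is a minimal Python sketch of that write path as I understand it: land the write in NVRAM, mirror it to the standby, and only then acknowledge. The `Controller` class and `handle_write` function are names I made up for illustration, and the dictionaries simply stand in for the NVRAM cards. This is not Nimble's code.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical stand-in for one storage controller and its NVRAM card."""
    name: str
    nvram: dict = field(default_factory=dict)  # block address -> payload


def handle_write(active: Controller, standby: Controller,
                 address: int, payload: bytes) -> str:
    """Land a write in NVRAM, mirror it, and only then acknowledge it."""
    # 1. The write lands in the active controller's NVRAM.
    active.nvram[address] = payload
    # 2. The same write is mirrored to the standby controller's NVRAM.
    standby.nvram[address] = payload
    # 3. Only a confirmed mirror allows an acknowledgement to the host.
    if standby.nvram.get(address) != payload:
        raise RuntimeError("mirror failed; write must not be acknowledged")
    return "ACK"


active = Controller("A")
standby = Controller("B")
result = handle_write(active, standby, address=42, payload=b"hello")
```

The point of the sketch is the sequencing: the host never sees an acknowledgement for data that exists on only one controller, and no spinning disk sits in the acknowledgement path.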
The NVRAM card handles variable, per-block, inline data compression. There is no need to select a single block size for your workloads – the array can roll with whatever is thrown at it. This can be handy for mixed workloads – such as a SQL server using 64K blocks alongside a web server using the standard 4K blocks – and avoids partition complexity or having to mess with offsets.
Because the blocks are always compressed inline, Nimble Storage is perfectly fine with using Eager Zeroed Thick (EZT) VMDKs with vSphere. The LZ4 compression algorithm shrinks the zeroed blocks down to next to nothing, so no space is wasted on the array. By using EZT on your virtual machines, you help fight against any overprovisioning pain points caused at the virtualization layer.
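If you want to see why zeroed blocks are essentially free under inline compression, here is a small Python sketch. Python has no LZ4 in its standard library, so I am using `zlib` purely as a stand-in; the exact byte counts will differ from what LZ4 produces on the array, but the behavior is the same in spirit. The `compressed_size` helper is my own name.

```python
import os
import zlib


def compressed_size(block: bytes) -> int:
    """Return the compressed size of one block (zlib stands in for LZ4)."""
    return len(zlib.compress(block))


# A zero-filled 64K block, like the ones an eager-zeroed VMDK writes out.
zeroed_size = compressed_size(bytes(64 * 1024))

# A 64K block of random bytes, which is close to incompressible.
random_size = compressed_size(os.urandom(64 * 1024))

# Each block is compressed on its own, so 4K and 64K blocks can be mixed.
mixed_sizes = {size: compressed_size(bytes(size)) for size in (4 * 1024, 64 * 1024)}

print(f"zeroed 64K block -> {zeroed_size} bytes")
print(f"random 64K block -> {random_size} bytes")
```

The zeroed block collapses to a tiny fraction of its original size, while the random block barely shrinks at all.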
As the compressed data fills up NVRAM, it is serialized into a 4.5 MB stripe to be placed onto the NL-SAS disks – here’s where we hit the “Sequential Layout” portion of CASL. The serialization process actually looks at the origin of the blocks to determine any data locality, such as two blocks arriving from the same datastore. The array tries to locate those “sibling” blocks next to one another when writing to disk, since statistically there is a high chance that the data will be retrieved at the same time (a file is rarely just the size of a few blocks). Once the serialization is completed, the 4.5 MB stripe is written to disk. That would be about 512 KB of data written to each NL-SAS disk (4608 KB / 9 data disks) plus parity data.
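The stripe math and the “sibling” placement can be sketched in a few lines of Python. The numbers come straight from the paragraph above; the assumption that each of the two parity chunks is the same size as a data chunk is mine (that is how a full-stripe RAID 6 write normally works), and the block tuples and datastore names are invented for illustration.

```python
STRIPE_KB = 4608     # 4.5 MB stripe, expressed in KB
DATA_DISKS = 9       # the "9" in the 9+2 RAID 6
PARITY_DISKS = 2     # the "+2" in the 9+2 RAID 6

# Data written to each data disk for one full stripe.
per_disk_kb = STRIPE_KB // DATA_DISKS

# Assumption: each parity chunk matches the data chunk size.
total_written_kb = per_disk_kb * (DATA_DISKS + PARITY_DISKS)

# Hypothetical incoming blocks as (origin, block id) pairs.
incoming = [("datastore-a", 1), ("datastore-b", 2), ("datastore-a", 3)]

# Group siblings by origin; sorted() is stable, so arrival order
# is preserved within each origin.
serialized = sorted(incoming, key=lambda block: block[0])

print(per_disk_kb, total_written_kb, serialized)
```

Under those assumptions, one stripe puts 512 KB on each data disk and 5632 KB on the shelf in total once both parity chunks are counted.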
What About The SSDs?
I thought it was rather clever to skip the SSDs in the write process, although they are involved in an indirect manner. The 4 MLC SSDs are in a JBOD-like (just a bunch of disks) collection, meaning there is no RAID. Nimble Storage asked me to think of the SSDs as a large adaptive flash cache. As writes enter the system, they are analyzed for their “cache worthiness.” Random data is often found worthy, while sequential data is not, although it’s a bit more complex than just that. SSD caching can be enabled on a per-volume basis, allowing a fine-grained configuration for what you deem worthy of consuming cache space.
As “cache worthy” data is identified, it is collected into a smaller sequential stripe the size of an SSD’s page and dumped into one of the flash drives. Using a First In, First Out (FIFO) log structure, data is eventually evicted as it grows cold. When a cache miss occurs, the system pulls the data from the NL-SAS disks and performs a read-ahead and prefetch.
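For anyone who hasn’t bumped into FIFO eviction before, here is a minimal Python sketch of the idea. The `FifoCache` class is hypothetical and ignores everything that makes the real thing interesting (page-sized stripes, cache worthiness scoring, multiple SSDs); it only shows that the oldest entry is evicted first, regardless of how recently it was read.

```python
from collections import OrderedDict


class FifoCache:
    """Hypothetical fixed-size cache that evicts the oldest insert first."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries = OrderedDict()  # insertion order doubles as the log

    def insert(self, key, value) -> None:
        if key in self._entries:
            # Updating a value keeps its original position in the log.
            self._entries[key] = value
            return
        if len(self._entries) >= self.capacity:
            # Evict the oldest entry (the head of the log).
            self._entries.popitem(last=False)
        self._entries[key] = value

    def get(self, key):
        # Reads do not reorder entries; this is FIFO, not LRU.
        return self._entries.get(key)


cache = FifoCache(capacity=2)
cache.insert("a", b"block-a")
cache.insert("b", b"block-b")
cache.get("a")                 # reading "a" does not save it from eviction
cache.insert("c", b"block-c")  # "a" is the oldest insert, so it is evicted
```

Appending to the tail and evicting from the head keeps the flash writes sequential, which lines up with the log structure described above.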
Data that still resides in the NVRAM or DRAM (via a “shadow” cache) can be read, bypassing the need to chat with the SSD layer.
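Putting the read side together, the lookup order can be sketched like this in Python. The three dictionaries are stand-ins I invented for the memory tier (NVRAM or the DRAM shadow cache), the SSD cache, and the NL-SAS disks; the real system also does the read-ahead and prefetch mentioned earlier, which I have left out.

```python
def read_block(address: int, memory: dict, flash: dict, disk: dict):
    """Return (payload, tier) for a block, checking the fastest tier first."""
    if address in memory:
        # Served from NVRAM or DRAM; the SSD layer is never consulted.
        return memory[address], "memory"
    if address in flash:
        # Cache hit on the SSD layer.
        return flash[address], "flash"
    # Cache miss: fall back to the NL-SAS disks.
    return disk[address], "disk"


memory = {1: b"m"}
flash = {2: b"f"}
disk = {1: b"m", 2: b"f", 3: b"d"}

hits = [read_block(address, memory, flash, disk)[1] for address in (1, 2, 3)]
```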