This is a relatively brief post, intended to help those who are starting off with vSphere and are using local storage, whether for a proof of concept (POC) or for reasons of scale (perhaps an SMB).
Throughout the VMware Technology Network forums, one of the most popular questions I see basically states the following:
- The user had Windows Server running on a piece of physical hardware (such as an HP or Dell rack server).
- They then installed the VMware hypervisor and loaded a few VMs.
- Write speeds were recorded as being very sluggish, often in the 3 to 5 MBps range. When the Windows Server was on the box, they could write 50 or more MBps.
Comparing Write Methods
Without getting too entrenched in a technical deep dive, there are two basic ways to write to disk when a RAID controller is involved: write-through and write-back.
With the write-through method, the server sends blocks to the controller to be written. The controller writes the data to the disk as soon as the disks have an opportunity to seek and find a place to write. Once the write is successful, the controller reports back to the server that the operation was successful. The server then sends more data to write. This is a very “synchronous” process of send, wait, write, continue. It is also very slow, but it is the only safe way to ensure that data is actually written before moving on.
The write-back method incorporates an additional piece of hardware: cache memory on the controller. Typically, this is accompanied by a battery on the controller (something that looks like a small 9V battery hooked in using a 2 – 4 pin connector). In some cases, the battery backup is supplied by the UPS further upstream. When the server sends data to the controller, the controller stores the data in cache and immediately reports back to the server that the write was successful. The server will continue sending more data until the cache is filled. Meanwhile, the controller feeds the blocks to the disks whenever the disks are ready to perform writes, which involves the usual seek time to find a place to write. By making the action asynchronous, so that the server is decoupled from the whole “wait on disk” part, writes become much quicker.
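To make the difference concrete, here is a toy model of the two methods in Python. It is only a sketch of the behavior described above, not a benchmark. The function names are my own, the 5 MBps disk rate is borrowed from the sluggish range mentioned earlier, and the 100 MBps bus rate and 95 MB cache are made-up round numbers chosen for easy arithmetic.

```python
def write_through_seconds(total_mb, disk_mbps):
    """Every block waits on the disk, so the server sees the raw disk rate."""
    return total_mb / disk_mbps


def write_back_seconds(total_mb, disk_mbps, cache_mb, bus_mbps):
    """Time until the server holds an acknowledgement for all of total_mb."""
    if bus_mbps <= disk_mbps:
        # The disks keep up with the server, so the cache never fills.
        return total_mb / bus_mbps
    # The cache fills at the difference between the inflow and drain rates.
    fill_seconds = cache_mb / (bus_mbps - disk_mbps)
    # Data acknowledged at bus speed before the cache runs out of room.
    burst_mb = bus_mbps * fill_seconds
    if total_mb <= burst_mb:
        return total_mb / bus_mbps
    # Once full, new writes are accepted only as fast as the disks drain the cache.
    return fill_seconds + (total_mb - burst_mb) / disk_mbps


# Illustrative values only (assumptions, not measurements).
for total_mb in (100, 1000):
    wt = write_through_seconds(total_mb, disk_mbps=5)
    wb = write_back_seconds(total_mb, disk_mbps=5, cache_mb=95, bus_mbps=100)
    print(f"{total_mb} MB: write-through {wt:.0f}s, write-back {wb:.0f}s")
```

In this model a short burst is acknowledged almost immediately with write-back, while a write much larger than the cache eventually falls back toward the disk rate. The model deliberately ignores things a real controller can do with cached data, such as reordering and combining writes, so treat it as an illustration of the decoupling idea only.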
Why Does ESX Seem So Much Slower?
To compensate for write-through scenarios, a server will often use its own RAM to assist in the caching process. When a Windows Server OS is loaded on physical hardware that does not use write-back (typically because the controller has no cache or does not have a battery backup), it can drop large amounts of data into memory and feed that to the controller to be written. Even VM guests can take advantage of this, as they have dedicated amounts of memory (although usually not that much) to use.
The ESX hypervisor does not steal memory to perform this caching, and thus has to wait on disk directly when write-through is used. Remember, the hypervisor is designed to be as lightweight and non-invasive as possible, and caching large writes into memory would require that the hypervisor either 1) have larger amounts of memory assigned to it, or 2) steal from the resources normally reserved for VM workloads.
In my opinion, VMware assumes that the rather minimal expense of a write-back capable controller is offset by the value of virtualization, and that you will either have made this purchase or are simply not using the local disks for any VMs at all, in which case local write performance is usually a non-issue. In a very small or POC environment, the opposite usually holds true; you may have acquired end-of-life or aging equipment to experiment with virtualization and cannot justify the cost of such a controller.
I’ve used controllers that did not have cache on them and were in write-through mode, and I can tell you that it can be very painful to work with. As a sanity check, I ended up loading Windows Server 2008 onto the same server and confirmed that for brief periods of time I could get write speeds to a local RAID 1 (mirror) of 50-60 MBps. However, as the memory filled up, those speeds trickled down into the range of 3-5 MBps. This could also be replicated on Windows guest VMs; they would show great speeds for a brief period of time, then trickle down. Later, I replaced the cards with a model that had 512 MB of cache and enabled write-back mode. The transformation was amazing; ESX (and the Windows guests) finally started performing in the 50+ MBps range.
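If you want to look for the same "fast at first, then a trickle" pattern on your own hardware, a small Python script along these lines can help. This is an illustrative sketch rather than the method behind the numbers above; the function name, default sizes, and file path are placeholders of my own. With `sync=True` each chunk is flushed with `os.fsync`, so the OS cannot hide the wait in its own RAM; with `sync=False` the OS is free to buffer the writes in memory first.

```python
import os
import time


def measure_write_throughput(path, total_mb=256, chunk_mb=8, sync=True):
    """Write total_mb to path in chunk_mb pieces; return the MB/s seen for each chunk."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)  # random, incompressible payload
    rates = []
    with open(path, "wb") as f:
        for _ in range(total_mb // chunk_mb):
            start = time.perf_counter()
            f.write(chunk)
            if sync:
                # Ask the OS to push the data down to the storage device.
                # Whether the controller acknowledges from its cache or from
                # the disks is exactly the write-back vs. write-through question.
                f.flush()
                os.fsync(f.fileno())
            elapsed = max(time.perf_counter() - start, 1e-9)  # avoid dividing by zero
            rates.append(chunk_mb / elapsed)
    return rates
```

Calling `measure_write_throughput("testfile.bin")` inside a guest and printing the returned list chunk by chunk should show whether the rate holds steady or collapses after an initial burst. Make the total size comfortably larger than any cache you expect to be in play, and delete the test file afterwards.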