12 Responses

  1. Matt Liebowitz

    First off, I have to say that I love that you use Prime95 as your “peg the resources” app. I do too and thought I was the only one.

    Good findings, though interestingly enough they are contrary to what VMware themselves found when they were doing some specific testing with vMotion and Exchange 2010 Database Availability Groups (DAGs).

    As a bit of background, during a vMotion operation a DAG could experience a database failover due to the brief drop in network connectivity. Microsoft provides a workaround for this by increasing the cluster heartbeat timeout settings in Windows, essentially increasing the cluster’s tolerance for lost heartbeats. That is the Microsoft-accepted solution and the one that I recommend using.
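    For reference, the tweak is just a couple of failover-cluster properties on a DAG member node. A minimal sketch (Python calling PowerShell), assuming the FailoverClusters module is available on that node; the 2000/10 values are the commonly cited maximums for Windows 2008 R2, so check Microsoft’s current guidance for your version:

    ```python
    # Sketch: raise the cluster heartbeat tolerance on a DAG member so the brief
    # network stun during a vMotion is not counted as a failed node.
    # Assumes PowerShell with the FailoverClusters module is available locally.
    import subprocess

    # SameSubnetDelay = ms between heartbeats; SameSubnetThreshold = missed
    # heartbeats tolerated before a failover. The values here are illustrative.
    for ps in (
        "(Get-Cluster).SameSubnetDelay = 2000",
        "(Get-Cluster).SameSubnetThreshold = 10",
        "Get-Cluster | Format-List SameSubnet*",  # echo the values back to verify
    ):
        subprocess.check_call(
            ["powershell.exe", "-Command", "Import-Module FailoverClusters; " + ps]
        )
    ```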

    As part of that testing they also looked at how jumbo frames impacted vMotion performance. They found that with jumbo frames enabled, they could perform vMotion operations with no DAG database failovers, even without modifying the cluster heartbeat timeout.

    You can read more about the testing here (opens a PDF): http://www.vmware.com/files/pdf/using-vmware-HA-DRS-and-vmotion-with-exchange-2010-dags.pdf

    One of the differences between your test and theirs was the use of multi-NIC vMotion. Their testing was done on vSphere 4.1, before that feature was introduced. I would think that, if anything, multi-NIC vMotion would only improve the situation, but it might be worth running the test again with a single NIC just for giggles.

    Anyway, great stuff as always.

    Matt

  2. Ken Schroeder

    Nice post, Chris. I just recently ran similar tests for multi-NIC vMotion and included test cases with and without jumbo frames for a single vMotion NIC on UCS with 10Gb+ connectivity. My test results with the larger pipe definitely showed an improvement in the 15-20% range on average.

  3. Rurik

    Any thoughts on testing it on a 10Gb network and posting the results (like this one)?

  4. Gabriel Chapman

    Michael Webster did some testing a while back with jumbo frames; his results are here: http://longwhiteclouds.com/2012/03/29/jumbo-frames-on-vsphere-5-update-1/

    He saw around a 10% bump in performance.

  5. Josh Atwell

    Good post, Chris.

    I’d definitely recommend trying it out with multi-NIC 10Gb connections. I’d also like to see what it looks like doing multiple simultaneous vMotions. The use case we were investigating at Cisco was improving the time required to evacuate an extremely busy, high-memory ESXi host for maintenance.

    After all, you can’t utilize the limits of your hardware if you can’t get the thing in maintenance mode for patches/updates.

    1. Josh Coen

      Chris,

      Great question regarding MTU and cost/benefit, and a great post, as usual.

      I’m using your testing baseline and script to perform the same test in a 10GbE environment (some other configs also vary). I’ll send you my results and may post something on my blog as well. I’m also going to test multiple VMs.
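      In case it’s useful to anyone else, here is a rough sketch of that kind of measurement with pyVmomi (trigger a vMotion and time the task). The vCenter address, credentials, and object names are placeholders, and the original test script may well be PowerCLI rather than Python:

      ```python
      # Rough sketch: time a single vMotion with pyVmomi.
      # All names and credentials below are placeholders; error handling is minimal.
      import ssl
      import time

      from pyVim.connect import SmartConnect, Disconnect
      from pyVmomi import vim


      def find_by_name(content, vimtype, name):
          """Return the first managed object of the given type with a matching name."""
          view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
          return next(obj for obj in view.view if obj.name == name)


      def timed_vmotion(si, vm_name, dest_host_name):
          """Start a vMotion and return the elapsed task time in seconds."""
          content = si.RetrieveContent()
          vm = find_by_name(content, vim.VirtualMachine, vm_name)
          dest = find_by_name(content, vim.HostSystem, dest_host_name)

          start = time.time()
          task = vm.MigrateVM_Task(
              host=dest, priority=vim.VirtualMachine.MovePriority.highPriority
          )
          while task.info.state not in (vim.TaskInfo.State.success, vim.TaskInfo.State.error):
              time.sleep(1)
          if task.info.state == vim.TaskInfo.State.error:
              raise RuntimeError(task.info.error.msg)
          return time.time() - start


      if __name__ == "__main__":
          si = SmartConnect(
              host="vcenter.lab.local",                 # placeholder vCenter
              user="administrator@vsphere.local",
              pwd="changeme",
              sslContext=ssl._create_unverified_context(),  # lab-only: skip cert checks
          )
          try:
              secs = timed_vmotion(si, "test-vm-01", "esxi02.lab.local")
              print("vMotion completed in %.1f seconds" % secs)
          finally:
              Disconnect(si)
      ```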

      -Josh

  6. Philip Sellers

    Chris,

    This was a really interesting post. vMotion performance is one of those things I think every administrator is looking to improve. Jumbo Frames is a logical thing to try, but your findings are extremely useful. I’m in the same boat as others – wondering how 10Gb would affect things. Great stuff!

    -Philip

  7. JustinT

    We see an approximately 10% improvement with jumbo frames. In general, though, we struggle to get more than 18 Gb/s of vMotion throughput to or from a host, even with, say, 4 x 10GbE NICs.
    This contrasts with iPerf, where we can spin up a lot of TCP or UDP streams on VMs with large MTUs and large send/receive buffers and max out all of the same 10GbE links when they are allocated to VM traffic instead of vMotion.
    Has anyone else experienced this?
    I’m wondering if it is down to the vMotion TCP stack, e.g. a limited receive buffer size?
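    One way to sanity-check the buffer theory outside of vMotion is a single-stream test where the socket buffers are set explicitly. A standalone sketch (not iPerf; the port number and defaults are arbitrary), run as `serve` on one VM and `send` on another:

    ```python
    # Standalone sketch: single-stream TCP throughput with explicit socket buffers,
    # to see how much a small send/receive buffer caps one stream on a 10GbE path.
    import argparse
    import socket
    import time

    CHUNK = 64 * 1024  # 64 KB per send/recv call


    def serve(port, rcvbuf):
        """Accept one connection, drain it, and report throughput."""
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # Set the receive buffer before accepting so the TCP window is sized from it.
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
        srv.bind(("", port))
        srv.listen(1)
        conn, _ = srv.accept()
        total, start = 0, time.time()
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
        elapsed = time.time() - start
        print("%.1f MB in %.1f s = %.2f Gbit/s (SO_RCVBUF=%d)"
              % (total / 1e6, elapsed, total * 8 / elapsed / 1e9, rcvbuf))


    def send(host, port, sndbuf, seconds):
        """Push zero-filled buffers at the receiver for a fixed duration."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
        sock.connect((host, port))
        payload = b"\0" * CHUNK
        deadline = time.time() + seconds
        while time.time() < deadline:
            sock.sendall(payload)
        sock.close()


    if __name__ == "__main__":
        p = argparse.ArgumentParser(description="single-stream throughput vs. socket buffer size")
        p.add_argument("mode", choices=["serve", "send"])
        p.add_argument("--host", default="127.0.0.1")
        p.add_argument("--port", type=int, default=5201)
        p.add_argument("--buf", type=int, default=4 * 1024 * 1024, help="socket buffer in bytes")
        p.add_argument("--seconds", type=int, default=10)
        args = p.parse_args()
        if args.mode == "serve":
            serve(args.port, args.buf)
        else:
            send(args.host, args.port, args.buf, args.seconds)
    ```

    If a single stream tops out well below line rate while several parallel streams can fill the links, a per-stream window/buffer limit is a reasonable suspect.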

