I’m a bit of a Cisco UCS technology fan, although not nearly as knowledgeable as the famous Colin “UCS Guru” Lynch. Thus, when the folks over at VMTurbo announced the Fabric Control Module that plugs into their Operations Manager product to offer new insights, I was hooked and had to see it for myself.
Over the past few months, I have taken the product for a spin in a reasonably large UCS shop with multiple UCS domains and several dozen blades running bare metal and vSphere ESXi production workloads. This is not a lab! As such, I had to blur out names and IPs 🙂
Let’s dive a bit deeper into what I found.
UCS Fabric Network Utilization Details
UCS has a lot of different interfaces to monitor: logical, host, network, and the list goes on. I’ve found that most shops just keep an eyeball on the Ethernet uplinks that connect upstream into their LAN fabric by way of some sort of port monitoring software. It’s rarer to see solid data gathered on the UCS Server Ports that face downstream toward the UCS IO Modules (Chassis FEX); these are the ports the blade servers consume.
Here’s an image showing off a variety of data for a UCS domain that has 7 chassis and 14 IO Modules:
Let’s break this graphic down:
- The average utilization of all ports (shown in the upper right) is a hair above 1%, so there’s plenty of bandwidth headroom.
- The Southbound Ports described by VMTurbo are the Server Ports going from the Fabric Interconnects (FI) to the IO Modules.
- The Northbound Ports are the Ethernet uplinks going from the Fabric Interconnects to the LAN switches.
- I have Fabric Interconnect B highlighted (shown as sys/switch-B), meaning that all of the data is for that particular FI.
- The IO Module Utilization in the bottom left shows the UI (utilization index) and overall network traffic for each Chassis. I can isolate hot spots and migrate any stateless blades around if necessary or add Server Ports to the FI and IO Module. VMTurbo’s decision engine will provide recommendations, too.
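To make the math behind that “a hair above 1%” figure concrete, here’s a minimal sketch of how an average port utilization and a hot-spot check could be computed from per-port traffic samples. The port names, speeds, and throughput numbers below are all fabricated for illustration; real data would come from UCS Manager, not this snippet.

```python
# Hypothetical sketch: averaging per-port utilization and flagging hot spots.
# All port names and throughput figures are invented sample data.

PORT_SPEED_GBPS = 10  # assumed line rate of each FI server port

# observed throughput per server port, in Gbps (fabricated)
server_ports = {
    "Eth1/1": 0.12, "Eth1/2": 0.09, "Eth1/3": 0.15, "Eth1/4": 0.08,
}

def utilization(gbps: float, speed: float = PORT_SPEED_GBPS) -> float:
    """Return utilization as a percentage of line rate."""
    return 100.0 * gbps / speed

avg = sum(utilization(v) for v in server_ports.values()) / len(server_ports)
print(f"average server-port utilization: {avg:.2f}%")

# flag any port above an arbitrary comfort threshold, similar to spotting
# the hot spots mentioned above
hot = {p: utilization(v) for p, v in server_ports.items() if utilization(v) > 70}
print("hot ports:", hot or "none")
```

With the sample numbers above, the average lands right around 1.1%, which mirrors the kind of headroom shown in the screenshot.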
The data above is typically difficult to acquire and analyze. Although I’m not the biggest fan of the VMTurbo interface (mainly the shiny bar graphs), the data itself is quite valuable.
Power, Cooling, and All That Jazz
The tool is not limited to speeds and feeds. I can also switch over to a UCS chassis to find out the power draw (in watts) and cooling data (temperature in Celsius). Here’s an example chassis below:
Cooling shouldn’t be all that difficult for most environments, but power might be. And the fact that I can find my power hot spots, or places where I might have to shut off a blade if a PSU fails, is pretty snazzy.
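The PSU-failure scenario above boils down to simple arithmetic: can the remaining supplies cover the chassis draw if one fails? Here’s a back-of-the-envelope sketch; the wattage figures and PSU count are assumptions I made up, not readings from any real chassis.

```python
# Back-of-the-envelope check for the PSU-failure scenario.
# Capacity, PSU count, and draw are all fabricated for illustration.

PSU_CAPACITY_W = 2500   # assumed capacity of one power supply
PSUS_INSTALLED = 4      # assumed number of PSUs in the chassis

def survives_psu_failure(draw_w: int,
                         capacity_w: int = PSU_CAPACITY_W,
                         n_psus: int = PSUS_INSTALLED) -> bool:
    """True if the chassis draw fits within capacity after losing one PSU."""
    return draw_w <= capacity_w * (n_psus - 1)

print(survives_psu_failure(5600))  # 5600 W vs. 7500 W remaining -> True
print(survives_psu_failure(8000))  # 8000 W vs. 7500 W remaining -> False
```

If the second case came up in real life, that’s where you’d be deciding which blade to shut off.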
End-to-End Supply Chain
Another feature I found really neat is the Supply Chain view. It shows the complete path between a virtual machine (VM) and the Fabric Interconnects. For example, when a VM sends or receives traffic, it must traverse these high-level components:
Virtual Machine NIC → ESXi vmnic (UCS vNIC) → IO Module Host Interface → IO Module Network Interface → Fabric Interconnect Server Port → Fabric Interconnect Ethernet Uplink
That’s a lot of dynamic points to look at!
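To show why having every hop visible matters, here’s an illustrative sketch (not VMTurbo’s actual data model) of the path above as an ordered list of monitoring points, with fabricated utilization readings, where the busiest hop is the one worth investigating first.

```python
# Illustrative sketch of the VM-to-uplink path as monitoring points.
# The utilization readings are invented sample data.

FABRIC_PATH = [
    "VM NIC",
    "ESXi vmnic (UCS vNIC)",
    "IO Module host interface",
    "IO Module network interface",
    "Fabric Interconnect server port",
    "Fabric Interconnect Ethernet uplink",
]

# fabricated utilization percentage at each hop, in path order
readings = dict(zip(FABRIC_PATH, [2.1, 3.4, 1.8, 5.2, 1.1, 4.0]))

# the hop to investigate first is the most-utilized point on the path
bottleneck = max(readings, key=readings.get)
print(f"busiest hop: {bottleneck} at {readings[bottleneck]}%")
```

Without end-to-end visibility, that busiest hop could sit at any of six points, which is exactly why stitching the whole chain together is valuable.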
Let’s start by looking at the supply chain from the Fabric Interconnect’s point of view. I’ve selected FI-A (sys/switch-A) and can see all of the various connections in the chain. VMTurbo uses the concepts of produces and consumes. In this case, FI-A produces ports for the IO Modules, but consumes the UCS Domain (cluster). Everything relating to FI-A, both the producers and consumers, is shown in this view, along with action items recommended by the decision engine. I can quickly scroll through the dependencies to find weak links where the utilization is too high or too low for my taste.
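The produces/consumes idea can be sketched as a tiny dependency graph that gets scanned for weak links. Everything here, the component names, the utilization figures, and the comfort band, is made up for illustration and is not how VMTurbo represents things internally.

```python
# Hedged sketch of the produces/consumes relationship as graph edges,
# scanned for links outside a comfort band. All values are fabricated.

# edges: (producer, consumer, utilization %)
supply_chain = [
    ("FI-A server ports", "IO Module 1", 1.2),
    ("FI-A server ports", "IO Module 2", 82.0),    # too hot
    ("FI-A Ethernet uplinks", "LAN switch", 0.4),  # arguably too cold
]

LOW, HIGH = 1.0, 70.0  # arbitrary comfort band, in percent

weak_links = [(p, c, u) for p, c, u in supply_chain if not (LOW <= u <= HIGH)]
for producer, consumer, util in weak_links:
    print(f"review {producer} -> {consumer}: {util}% utilization")
```

The decision engine effectively does this kind of scan for you, and then goes a step further by recommending what to do about each weak link.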
I could also look at the relationships from an ESXi host (physical machine) point of view. I can see the physical machine (ESXi host) itself, which IO Module ports are passing traffic, the chassis housing the blade, and any VMs running on the host. In the case below, VMTurbo has suggested that I migrate one of the VMs to a different ESXi host to improve performance.
The recommendation can be set to automatic, meaning VMTurbo will act on the suggestion and remediate the issue, or manual as we have here. This allows an administrator to review and approve the recommendation.
Because VMTurbo can see both the physical world (UCS) and the virtual world (vCenter), looking at a virtual machine is filled with data. Gobs and gobs of delicious data. Here I’ve selected one of the vCenter Ops VMs, named Analytics VM, and decided to look at its details. I can see utilization information, what resources are being consumed from the host and fabric, datastore usage, and the ESXi blade utilization. Key resources, such as ballooning, latency, and CPU Ready, are all factored into the index values and used by the decision engine to make performance-related recommendations.
I look at this like having a co-pilot around to help figure out the big picture among many different virtual and physical components. Yes, you could figure this stuff out yourself given enough time and effort, but I would imagine quite a few folks would put value in a “here is the exact problem” response from VMTurbo.
I’ve only scratched the surface of this tool, as it contains a lot more parts and pieces that I did not have time to play with. But from a UCS perspective, I’m really diggin’ this bad boy. 🙂
If you want to take VMTurbo Operations Manager version 4.5 for a spin, you can do so with this 30-day trial link. It’s a simple deployment (OVA) that you can drop into your lab or work environment. The out-of-box config is 4 vCPUs and 16 GB of RAM, but I’ve found that you can tweak that down if necessary. If you don’t have a Cisco UCS system handy, you can point VMTurbo at a UCS Emulator, but some of the reporting, such as temperature readings, will end up blank because the emulator is a virtual system. The trial download page also has a haiku for your entertainment, which is something I haven’t seen before. 🙂