There’s something fun about building snazzy graphs and charts in which the data points are arbitrary and ultimately decided upon by me. This is why I’ve been having a blast building a few lab graphs using the recently released Grafana 2.0, which is “an open source, feature rich metrics dashboard and graph editor.” It’s certainly much simpler than others I’ve stood up, such as the really slick sysadminboard that I have a lot of respect for. It also includes multiple ways to share your dashboards, such as local snapshots and published snapshots on raintank.io.
— Shane Schnell (@shaneschnell) April 23, 2015
Because Shane asked for it, I’ve written down much of what I’ve been doing with Grafana in this post and tried to explain how I stood up various graphs. If I glossed over something significant, drop a comment and let me know.
Why Not Graphite?
I had originally planned to try out Graphite as the back-end data source, but ended up pivoting over to InfluxDB instead. I think Robin Moffatt over at Rittman Mead has the best reason:
Whilst I have spent many a happy hour using Graphite I’ve spent many a frustrating day and night trying to install the damn thing – every time I want to use it on a new installation. It’s fundamentally long in the tooth, a whilst good for its time is now legacy in my mind. Graphite is also several things – a data store (whisper), a web application (graphite web), and a data collector (carbon). Since we’re using Grafana, the web front end that Graphite provides is redundant, and is where a lot of the installation problems come from. (source)
So, I went with InfluxDB. Up to you, really.
Deploying Grafana and InfluxDB
These deploys are elegantly simple. I default to CentOS 6.6 in the lab, so you can follow the official install guides if you’re in the same boat. You could also drop Grafana on Windows or Docker. There are also modules for Chef, Puppet, Salt, Ansible, and others.
Note that my template image includes the Extra Packages for Enterprise Linux (EPEL) repo because it crops up as a requirement so often. I don’t recall whether EPEL is required for these packages, but I’m throwing that out there in case it matters.
Here are the two installation links I used for CentOS:
- Installing Grafana on RPM-based Linux (CentOS, Fedora, OpenSuse, RedHat)
- Installing InfluxDB on RedHat & CentOS
I suppose you could deploy both packages to the same server, but I ended up cloning a pair of servers in the lab and deploying each package separately. Both the Grafana VM and InfluxDB VM have 1 vCPU, 1 GB of RAM, and 20 GB of thinly provisioned disk space. For a lab environment, this seemed to be more than adequate.
Assuming you’ve stood up the servers per the instructions, you’re almost done.
Browse to the IP or DNS name of the InfluxDB server using port 8083 and a login of root/root. Head to Databases and create a database with whatever name strikes your fancy. I went with spongebob because that’s how I roll. The details and shard space information can be left at defaults.
If you use the Explore Data link, there’s currently nothing in the database to explore. You could manually enter some data just for fun – in fact, I suggest tinkering a bit to understand how to use the SQL-like query language and read up on the required format for the JSON payload. The query format is quite simple and likely something you won’t be using much in this walkthrough – we’re going to mainly focus on Grafana as a front end. However, knowing how to construct the payload is important.
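If you would rather tinker from a script than from the Explore Data page, here is a minimal Python sketch of building a query URL. It assumes the InfluxDB 0.8-style HTTP API, where a query is sent as the q parameter to /db/<database>/series on port 8086. The series name lab_test is hypothetical, and root/root is the default login mentioned above.

```python
from urllib.parse import urlencode


def build_query_url(host, database, query, user, password, port=8086):
    """Return a GET URL for an InfluxDB 0.8-style query (assumed endpoint layout)."""
    # urlencode escapes spaces and the '*' so the query survives as one parameter.
    params = urlencode({"u": user, "p": password, "q": query})
    return f"http://{host}:{port}/db/{database}/series?{params}"


# 'lab_test' is a hypothetical series name; use one that exists in your database.
url = build_query_url(
    "172.16.20.236", "spongebob", "select * from lab_test limit 10", "root", "root"
)
print(url)
```

Issuing a GET against that URL (with a browser, curl, or urllib.request.urlopen) should return the matching points as JSON, assuming the server is reachable and the series exists.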
In my lab, I took the lazy approach and created an admin user with grafana/grafana as the username and password. I then baked the credentials into the URL of the POST. Alternatively, use basic authentication. Here’s an example URL:
$url = "http://172.16.20.236:8086/db/spongebob/series?u=grafana&p=grafana"
Note the following:
- The API port is 8086
- The database name, spongebob, is included in the URL
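The two credential options can be sketched in Python (rather than the PowerShell used in my scripts). The helper names write_url and basic_auth_header are made up for illustration; the host, database, and grafana/grafana login are the lab values from this post. Treat this as a sketch under those assumptions, not the exact code I run.

```python
import base64
from urllib.parse import urlencode


def write_url(host, database, user=None, password=None, port=8086):
    """Build the series endpoint URL, optionally with credentials in the query string."""
    base = f"http://{host}:{port}/db/{database}/series"
    if user is None:
        return base
    return f"{base}?{urlencode({'u': user, 'p': password})}"


def basic_auth_header(user, password):
    """Build a standard HTTP Basic Authorization header for the same credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Option 1: credentials baked into the URL (the lazy lab approach).
url_with_creds = write_url("172.16.20.236", "spongebob", "grafana", "grafana")

# Option 2: a clean URL plus a basic authentication header.
url_plain = write_url("172.16.20.236", "spongebob")
headers = basic_auth_header("grafana", "grafana")

print(url_with_creds)
print(url_plain, headers)
```

Keeping the password out of the URL means it is less likely to end up in logs or browser history, which matters more outside of a lab.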
The POST body uses this structure:
Notice that no work was done beforehand to set up tables and columns; the very act of posting to the API adds data points to the series name specified. Also notice that the points key:value pair uses a nested array, because you can batch data points and send multiple arrays at one time (using your own timestamp value). If you want to rely on the InfluxDB timestamp, send one array at a time and the time at which the server receives the data will be used.
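To make the payload shape concrete, here is a minimal Python sketch. It assumes the InfluxDB 0.8-style JSON body: a list of objects with name, columns, and points keys, where points is the nested array described above. The mem column and the sample values are invented for illustration, and the millisecond timestamps are an assumption, so check the time precision your server expects.

```python
import json
import time


def build_payload(series, columns, rows):
    """Build an InfluxDB 0.8-style write body (assumed format) for one series."""
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("each point needs one value per column")
    # 'points' is a list of lists so several points can be batched in one POST.
    return [{"name": series, "columns": list(columns), "points": [list(r) for r in rows]}]


# One point with no time column: the server stamps it on arrival.
single = build_payload("esx1.glacier.local", ["cpu", "mem"], [[12.5, 48.0]])

# A batch with explicit timestamps (epoch milliseconds is an assumption here).
now_ms = int(time.time() * 1000)
batch = build_payload(
    "esx1.glacier.local",
    ["time", "cpu", "mem"],
    [[now_ms - 60000, 11.0, 47.5], [now_ms, 12.5, 48.0]],
)

body = json.dumps(batch)
print(body)
```

The resulting body string is what gets sent in the POST to the series URL, for example with urllib.request.Request(url, data=body.encode("utf-8")).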
That’s pretty much it for InfluxDB. You now have the back-end stood up and ready to receive data. I’ve written a series of PowerShell scripts to collect data from vSphere Hosts, VMs, SQL Server, and a NAS share used for Veeam Backups. You can view those in my grafana-vsphere-lab project to get started with data collection, use the ps1 scripts as examples, or even improve upon the repo and send me a pull request. I don’t think the project will become anything super polished, but I wanted to share what I’ve written thus far.
The Grafana web interface is available by browsing the IP or DNS name of your Grafana server using port 3000. The default login is admin/admin. There are no dashboards out-of-the-box, so the first screen you see will be rather barren.
Let’s add InfluxDB to the configuration so that Grafana can display some data. Perform the following:
- Start by selecting the Grafana logo in the top-left corner to expand the menu.
- Choose Data Sources.
- Select Add New.
- Enter the information for your InfluxDB server, including the database name (mine is spongebob). Don’t forget that the API port is 8086; don’t use port 8083 (web interface).
- I’d recommend making this data source default, as otherwise Grafana defaults to itself as the data source.
Building a Grafana Dashboard
It’s time to build a dashboard!
- Select the Home button.
- Choose +New to build a new dashboard, which I will walk through a bit below, or …
- Choose Import to load a dashboard from a JSON file. You can load my sample dashboard using my JSON example on GitHub.
Once you have a dashboard created, it’s time to make some graphs.
- Select the green menu button on the left side to edit a Row.
- Choose Add Panel.
- Choose Graph.
- If you want more rows, use the purple Add Row button.
- To save your work, press the Save button.
A new graph will appear with a name of “no title (click here).” Do what it says 🙂
- Click on the Title (that says no title).
- Select Edit.
There’s a lot you can do here, so I’ll focus on a single use case to get your noodle juice flowing.
Building an ESXi CPU Utilization Graph
Graphs use data sources to create visuals. Because my InfluxDB was added, and is being updated with live data from my various scripts, it’s really just a matter of finding the data and displaying it with Grafana. Each series you enter in Grafana will pull one or more series of data from InfluxDB (or other back-end data sources) and use select statements, time grouping, and other query clauses to dynamically build a graph.
Here’s the data structure I’m using for the host performance points:
Because I use a set naming structure for my hosts (the $vmhost var), it seemed easiest to use a regular expression (regex) for the two metric series necessary to pull data as opposed to creating a query for each host individually. The nice thing about a regex is that it will automatically add new hosts to the chart without any help, so long as I continue to use the same name format.
Enclosing a string in forward slashes builds a regex when the back-end data source is InfluxDB. The bracketed [0-9] portion lets the metric pull data from any series whose name includes a single digit at that position, such as esx1 and esx3. The hosts with a “d” in them are marked for dev work.
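The matching behavior can be tried out in Python before typing anything into Grafana. The pattern esx[0-9] and the host names below are hypothetical stand-ins for my naming scheme (esxd1 plays the part of a dev host), and the sketch assumes the regex may match anywhere in the series name rather than needing to cover the whole string.

```python
import re

# Hypothetical pattern: 'esx' followed directly by a single digit.
pattern = re.compile(r"esx[0-9]")

# Hypothetical series names; 'esxd1' stands in for a dev host.
series_names = [
    "esx1.glacier.local",
    "esx3.glacier.local",
    "esxd1.glacier.local",
    "nas1.glacier.local",
]

# search() matches anywhere in the name (an assumption about how the back end matches).
matched = [name for name in series_names if pattern.search(name)]
print(matched)
```

The dev host drops out because the character right after esx is a letter, not a digit, so a new production host joins the chart automatically while dev hosts stay off it.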
For the alias, I’m using a variable and a static string: $0 cpu. The series name can be referenced as a series of strings split by periods. Thus esx1.glacier.local can be referenced by these variables:
- $0 = esx1
- $1 = glacier
- $2 = local
And so on for series names with more periods.
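The alias substitution can be mimicked in a few lines of Python. This is only an illustration of the splitting rule described above, not Grafana’s actual implementation; the function name expand_alias is made up, and leaving an out-of-range variable untouched is my own choice for the sketch.

```python
import re


def expand_alias(alias, series_name):
    """Replace $0, $1, ... in an alias with the period-separated parts of a series name."""
    segments = series_name.split(".")

    def replace(match):
        index = int(match.group(1))
        # Leave the variable as-is when the series name has too few segments.
        return segments[index] if index < len(segments) else match.group(0)

    return re.sub(r"\$(\d+)", replace, alias)


print(expand_alias("$0 cpu", "esx1.glacier.local"))
print(expand_alias("$1.$2", "esx1.glacier.local"))
```

With the alias $0 cpu, every series matched by the regex gets a legend entry built from its own host name.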
Finally, update the select box with the data point for this chart. Because it’s a CPU Utilization chart, I’ve chosen the CPU data point. The result is that each metric pulled by the regex will be esx# cpu. The remaining values can be left default for this example, as Grafana is smart enough to determine time groupings based on the data it receives. The chart now looks like this:
Make sure to save the dashboard when changes are made, or just browse away from the dashboard and discard changes if you don’t like how it looks and want to revert to the last save.
With a little time and work, you can have some pretty amazing graphs built into the dashboard.