Building a Live Mount Across New and Old Backups with Rubrik

The idea of bringing back data quickly is attractive when dealing with a data protection solution. Unfortunately, most of the conversations I’ve had in the past dealt with ingest speeds and storage nerd knobs. While it’s important to be able to protect applications at a frequency that meets the desired Recovery Point Objective (RPO), I’d wager that knowing exactly how quickly data and applications can be returned to an operational state, or spun up to answer “what if” questions, is top of mind for many folks.

In today’s post, I’ll dig a bit deeper into Rubrik’s instant recovery capabilities by talking about the Live Mount feature. In a nutshell, this allows a full virtual machine to be restored from a backup (we call it a snapshot) in a matter of seconds. This can be done across any number of first full / incremental forever snapshots without concern for how many backups exist. Most of the customers I work with leverage this for regression testing, application upgrade validation, service schema or configuration testing, and disaster recovery models. These are the sort of “what if” scenarios that many of us would like to tinker with without applying any pressure (in terms of either capacity or performance) on the production storage array(s).

To demonstrate this, I’ll use my trusty “SE-CWAHL-WIN” virtual machine that runs on Windows Server 2012 R2. This workload has been running in the Rubrik engineering lab for years. My, how time flies! I’m not much of a GUI person, so I’m going to grab information via PowerShell. It also makes measuring the time between requesting a Live Mount and seeing it completed a bit easier. For fairness, though, I’ll include a few screenshots and short animations from the GUI. 🙂

Gathering Snapshot Information

After loading the Rubrik PowerShell module and connecting to one of my engineering clusters, I’ll begin by pulling down information on the snapshots and storing the results to $Snapshot. After that, I’ll display some of the results.

# Gather snapshot details for my virtual machine
$Snapshot = Get-RubrikVM -Name 'SE-CWAHL-WIN' | Get-RubrikSnapshot
$Snapshot | select -Property vmName,id,date,cloudState

This takes only a moment. To avoid making this post super long, a snipped list of snapshot details is below. Data for the past month is held locally on the cluster, while data beyond 30 days has begun to transition into the public cloud as per the SLA Policy assigned to my virtual machine.

vmName       id                                   date                 cloudState
------       --                                   ----                 ----------
SE-CWAHL-WIN 3576c17f-3573-4470-aac0-904c3245fe74 2017-10-13T15:46:29Z
SE-CWAHL-WIN 08066df2-f8e3-49ab-b491-841a3243be80 2017-10-13T11:40:53Z
SE-CWAHL-WIN 87188710-4f43-4307-8646-ecaf09e99235 2017-10-13T07:17:18Z
SE-CWAHL-WIN 5922c8cc-eccf-4c43-a056-776cd0b5e0d4 2017-10-13T03:16:19Z
SE-CWAHL-WIN ef19adc6-5236-4bc1-b6c6-a36a6e94147e 2017-10-12T23:15:05Z
SE-CWAHL-WIN 586dc89b-0328-4a9b-a49a-98aa9680e77b 2017-10-12T19:12:18Z
SE-CWAHL-WIN 53ce3a38-4c7a-4ada-a183-fb353ec96885 2017-10-12T15:11:19Z
<snip>
SE-CWAHL-WIN dfdfe0a1-a95f-45c3-a9d9-280b138edda8 2017-09-16T20:14:26Z
SE-CWAHL-WIN e2eacfce-f477-4958-befe-a4e714d1e0dc 2017-09-15T20:09:08Z
SE-CWAHL-WIN a80ff357-036f-44b9-897a-2f433efdb110 2017-09-14T20:04:53Z
SE-CWAHL-WIN 96677406-cd39-45ab-8f42-191d2e24cb89 2017-09-13T19:59:26Z
SE-CWAHL-WIN f288f3de-c046-475e-b51b-9c391ee31600 2017-09-11T23:49:12Z 6
SE-CWAHL-WIN 0e6a8789-0f81-4c00-b697-cca94630bdb1 2017-08-21T21:52:17Z 2
SE-CWAHL-WIN 84be401b-5a6d-48da-a2c1-95e2ad6ba616 2017-07-22T22:54:22Z 2
<snip>
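The 30-day boundary can be checked against the date column directly. What follows is a minimal sketch rather than part of the original walkthrough: it uses four rows copied by hand from the listing instead of a live cluster, assumes the date values are ISO 8601 strings, and measures age from the time of the first mount request later in this post. The names $Sample, $Reference, and $Cutoff are made up for illustration.

```powershell
# Sample rows copied by hand from the listing above (not a live query)
$Sample = @(
    [pscustomobject]@{ id = '3576c17f-3573-4470-aac0-904c3245fe74'; date = '2017-10-13T15:46:29Z' }
    [pscustomobject]@{ id = '96677406-cd39-45ab-8f42-191d2e24cb89'; date = '2017-09-13T19:59:26Z' }
    [pscustomobject]@{ id = 'f288f3de-c046-475e-b51b-9c391ee31600'; date = '2017-09-11T23:49:12Z' }
    [pscustomobject]@{ id = '0e6a8789-0f81-4c00-b697-cca94630bdb1'; date = '2017-08-21T21:52:17Z' }
)

# Assumption: measure age from a fixed reference time rather than "now"
$Invariant = [cultureinfo]::InvariantCulture
$Reference = [DateTimeOffset]::Parse('2017-10-13T19:12:28Z', $Invariant)
$Cutoff    = $Reference.AddDays(-30)

# Split the sample into snapshots newer and older than the cutoff
$Recent = @($Sample | Where-Object { [DateTimeOffset]::Parse($_.date, $Invariant) -ge $Cutoff })
$Older  = @($Sample | Where-Object { [DateTimeOffset]::Parse($_.date, $Invariant) -lt $Cutoff })

# Show the snapshots that fall outside the 30-day window
$Older | Select-Object -Property id, date
```

With these four rows, the snapshots from 2017-09-11 and 2017-08-21 land in $Older, which lines up with the rows that carry a cloudState value in the listing.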

I’ll pluck out a few different id values and store them to an array named $SnapshotID. Specifically, I’ll take the most recent snapshot along with the ones taken 20 and 40 backups before it (indexes 0, 20, and 40 in the list, which is sorted newest first).

# Pick out the snapshot id values and display them
[Array]$SnapshotID = 0, 20, 40 | foreach {$Snapshot[$_].id}
$SnapshotID
3576c17f-3573-4470-aac0-904c3245fe74
474f0c81-27d3-4fe4-81f9-7685b3ec6ad4
a499949c-2948-4cf3-82a5-fd5794d5c8ab
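One thing to watch for: outside of strict mode, indexing past the end of a PowerShell array returns $null instead of an error, so a virtual machine with fewer than 41 snapshots would quietly produce empty id values. Here is a minimal sketch of a guard, using a stand-in array of fake snapshot objects in place of the real $Snapshot. The names $FakeSnapshot, $Wanted, and $PickedID are hypothetical.

```powershell
# Stand-in for $Snapshot: 25 fake objects, newest first (illustrative only)
$FakeSnapshot = 0..24 | ForEach-Object { [pscustomobject]@{ id = "fake-id-$_" } }

# Indexes to pick; 40 is deliberately out of range for this stand-in
$Wanted = 0, 20, 40

# Keep only the indexes that actually exist before reading the id
[Array]$PickedID = $Wanted |
    Where-Object { $_ -lt $FakeSnapshot.Count } |
    ForEach-Object { $FakeSnapshot[$_].id }

$PickedID
```

Here the out-of-range index 40 is dropped, leaving only the ids at indexes 0 and 20.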

I can now send a request to Rubrik asking for a Live Mount for all of these snapshots.

Creating Live Mounts

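Now that I have a list of id values, I’ll go ahead and request all of the Live Mounts back to back, with the confirmation prompt disabled. Each request returns right away instead of waiting for its mount to finish, so the mounts themselves proceed in parallel.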

# Perform a live mount of the selected snapshot ids in parallel
$SnapshotID | foreach {New-RubrikMount -id $_ -Confirm:$false}

This results in a series of requests being sent asynchronously to the Rubrik CDM software.

progress  :
status    : QUEUED
startTime : 2017-10-13T19:12:28Z
id        : MOUNT_SNAPSHOT_40e54ede-31d5-45e6-9265-473917851421_32fae4ff-afae-497b-adbd-7bc96990bbdd:::0
links     : @{rel=self; href=https://172.17.28.15/api/v1/vmware/vm/request/MOUNT_SNAPSHOT_40e54ede-31d5-45e6-9265-473917851421_32fae4ff-afae-497b-adbd-7bc96990bbdd:::0}

progress  :
status    : QUEUED
startTime : 2017-10-13T19:12:29Z
id        : MOUNT_SNAPSHOT_2d40df31-b3b1-409b-9931-9a3772598c7a_93ab7d65-ca96-4605-af98-855e81460f29:::0
links     : @{rel=self; href=https://172.17.28.15/api/v1/vmware/vm/request/MOUNT_SNAPSHOT_2d40df31-b3b1-409b-9931-9a3772598c7a_93ab7d65-ca96-4605-af98-855e81460f29:::0}

progress  :
status    : QUEUED
startTime : 2017-10-13T19:12:30Z
id        : MOUNT_SNAPSHOT_811272f6-712d-4ccc-8695-132297c97fa7_e054434e-d0c0-4380-883c-b96362d877a1:::0
links     : @{rel=self; href=https://172.17.28.15/api/v1/vmware/vm/request/MOUNT_SNAPSHOT_811272f6-712d-4ccc-8695-132297c97fa7_e054434e-d0c0-4380-883c-b96362d877a1:::0}
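Each reply above is a request object that starts out as QUEUED, so something has to check back to learn when a mount has finished. Below is a rough sketch of a polling loop. Wait-MountRequest is a hypothetical helper, not a cmdlet from the Rubrik module, and the final status names SUCCEEDED and FAILED are my assumption, since only QUEUED appears in the output above. The status lookup is passed in as a script block so the loop can be tried without a cluster.

```powershell
function Wait-MountRequest {
    param(
        # Script block that returns the current status string for the request
        [Parameter(Mandatory)][scriptblock]$GetStatus,
        # Status values treated as final (assumed names, adjust as needed)
        [string[]]$Terminal = @('SUCCEEDED', 'FAILED'),
        [int]$MaxAttempts = 30,
        [int]$DelaySeconds = 1
    )
    for ($Attempt = 1; $Attempt -le $MaxAttempts; $Attempt++) {
        $Status = & $GetStatus
        if ($Terminal -contains $Status) {
            # Report the final status and how many checks it took
            return [pscustomobject]@{ Status = $Status; Attempts = $Attempt }
        }
        Start-Sleep -Seconds $DelaySeconds
    }
    throw "Request did not finish after $MaxAttempts attempts"
}

# Dry run with a canned sequence of statuses instead of a real cluster
$Canned = [System.Collections.Queue]::new(@('QUEUED', 'RUNNING', 'SUCCEEDED'))
$Result = Wait-MountRequest -GetStatus { $Canned.Dequeue() } -DelaySeconds 0
$Result
```

Against a real cluster, the script block would wrap whatever call returns the status of a request. I believe the Rubrik module offers Get-RubrikRequest for this, but check the module’s help for the exact parameters before relying on it.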

Live Mount Validation

For fun, the results can be viewed in the Live Mounts section of the GUI. It’s worth noting that once the status transitions to “Mounting…” it means that the immutable image of the backup has been exposed by Rubrik via NFS to the target ESXi host(s). Since I didn’t specify a host, Rubrik chooses one on my behalf.

The remaining time is spent waiting for the vSphere environment to add the virtual machine to inventory. The Live Mounts complete at roughly the same time, which is groovy considering I’m using snapshots from today, a week ago, and a month ago. The net result is that Rubrik’s Atlas file system isn’t impacted by how long it has been since a backup was taken. Otherwise, it would be a poor design. 🙂

The vSphere Perspective on Live Mounts

For further verification, the vSphere HTML5 interface shows the original virtual machine along with 3 Live Mounts.

Let’s pick on the oldest snapshot from September. I requested this particular Live Mount at 2017-10-13T19:12:30Z (UTC), which is 12:12:30 Pacific (my local time). Here’s the request reply to showcase the start time.

progress  :
status    : QUEUED
startTime : 2017-10-13T19:12:30Z
id        : MOUNT_SNAPSHOT_811272f6-712d-4ccc-8695-132297c97fa7_e054434e-d0c0-4380-883c-b96362d877a1:::0
links     : @{rel=self; href=https://172.17.28.15/api/v1/vmware/vm/request/MOUNT_SNAPSHOT_811272f6-712d-4ccc-8695-132297c97fa7_e054434e-d0c0-4380-883c-b96362d877a1:::0}

The bottom of the vSphere log shows the “Creating VM on host” event at 12:12:31. This is one second later. One second!
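Time zone arithmetic is easy to get wrong by hand, so here is a small sketch that converts the request’s startTime to Pacific time and measures the gap to the vSphere event. The event time is typed in from the log described above, and the time zone lookup tries the Windows id first and falls back to the IANA id. The variable names and the fallback are my own additions.

```powershell
$Invariant = [cultureinfo]::InvariantCulture

# startTime from the request reply (UTC, as indicated by the trailing Z)
$Requested = [DateTimeOffset]::Parse('2017-10-13T19:12:30Z', $Invariant)

# Windows calls the zone 'Pacific Standard Time'; other platforms may only know the IANA id
try   { $Pacific = [TimeZoneInfo]::FindSystemTimeZoneById('Pacific Standard Time') }
catch { $Pacific = [TimeZoneInfo]::FindSystemTimeZoneById('America/Los_Angeles') }

# Convert the request time to Pacific wall-clock time
$RequestedLocal = [TimeZoneInfo]::ConvertTime($Requested, $Pacific)
$RequestedLocal.ToString('HH:mm:ss', $Invariant)

# "Creating VM on host" event time as read from the vSphere log (Pacific, UTC-7 in October)
$Created = [DateTimeOffset]::Parse('2017-10-13T12:12:31-07:00', $Invariant)
$Elapsed = $Created.Subtract($Requested)
$Elapsed.TotalSeconds
```

The first value printed is 12:12:30 and the second is 1, matching the one-second gap between the request and the event.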

The remaining 8 seconds are spent adding the virtual machine to inventory. Considering that this particular snapshot was taken 40 backups ago, I don’t see how we could make it any faster than one second.

Thoughts

The ability to bring back data and applications quickly should be of paramount importance to anyone responsible for service delivery within an organization. This is especially worth digging into when considering the amount of data that is being ingested and the technology a vendor leverages to maintain that data, such as legacy snapshot “chains” that require rebuilding the master image versus an intelligent and content-aware file system designed to make data available both globally and instantly.

While this post covers Live Mounts, which are a form of clone built from backup data to solve what-if use cases, there’s another feature named Instant Recovery that is targeted at recovering in the face of failure. The main difference is that Live Mount presents a clone of the original workload with the network disconnected. Instant Recovery puts the workload back into its original working order (same vSphere and network personality, active network connection, and so forth), with an optional Storage vMotion towards the end to place the workload back onto the production storage array. Both options are based on the same underlying technology when it comes to making snapshot data available in a quick and efficient manner.

Enjoy!