Performing a vCenter Database Failover to a Secondary SQL Server
Now that I’ve covered how to configure SQL Log Shipping for the vCenter database, it’s time to go into the critical segment on performing a failover. After all, a backup is worthless data if it cannot be restored, right? Fortunately, the failover is a relatively simple process that just requires some up front configuration and planning. The actual failover part is reasonably painless.
This post will go over the environment covered in my past Log Shipping post in a failover scenario where the primary database has become unavailable due to a disaster. To simulate the disaster, I am simply taking the primary database offline instead of nuking the entire primary SQL server (mostly because it’s also hosting a number of my other services)
Primary SQL Server Failure!
Let’s just pretend that someone (who will remain nameless) has accidentally dropped some tables from your precious vCenter database, and didn’t use a transaction for the query. Woops! The database is now unusable and it’s time to switch over to the secondary while you figure out a maintenance window to cut back over to the primary.
Sometimes bad things just happen
Before we begin the failover, let’s do a few checks to ensure that it will be successful. It would be extremely wise to have these items done in advance, but never hurts to double check before cutting to the secondary.
Check #1 – VMware VirtualCenter Server Service Account
If you’re using a domain account to authenticate to SQL, chances are you are running the VMware VirtualCenter Server service as that account as well. That account needs ownership of the vCenter database or the service will fail to start. Make sure that the domain service account has ownership of the secondary database by checking the “db_owner” role in the Database Roles folder. In my case, the service account is “svc_vcenter“.
Check #2 – Test Query
I also like to run a top 1000 query on a table just to make sure it has data in it. This is completely optional, but I find it valuable before a failover to be 100% confident that I’m using a valid database. To run this query, expand the Tables folder within the database, right click a table (I used dbo.VPX_VM here as it lists all the VMs), and select “Select Top 1000 Rows”.
The Failover Process
Once you’ve made the decision to failover to the secondary server, you have a number of tasks to perform.
Disable Log Shipping Jobs on the Secondary (Failover) Server
The first thing to do is turn off any Log Shipping jobs that are occurring. At best, they are going to simply fail, but at worst they could actually restore something you don’t want into the database. To disable any Log Shipping jobs, simply expand the SQL Agent section, right click on the jobs, and choose Disable.
Stop the VMware VirtualCenter Server service
Stopping this service will also prompt to stop other dependent services, which is OK. This will force vCenter to halt, letting you making a key change to the ODBC connection.
This step can take a while as the vCenter service takes a bit of time to stop (usually 2 minutes in my lab).
Edit the vCenter ODBC System DSN
Open up the ODBC Data Source Administrator (x64 bit) so that you can edit the setting on the vCenter System DSN. If you used the same name on your secondary server, the only thing you have to change is the SQL server name. Below, I changed the Server name from sql.glacier.local (primary) to sql2.glacier.local (secondary). If you chose a different database name, you’ll need to go further into the settings and choose the DB from the drop down menu.
Once you hit finish, a summary screen appears. I advise using the “Test Data Source” button to ensure connectivity.
Start the VMware VirtualCenter Server services
The final step is to start all of the services that were stopped previously. The order I use is:
- VMware VirtualCenter Server
- VMware VirtualCenter Management Webservices
- VMware vSphere Profile-Driven Storage Service
- vCenter Inventory Service
You have now performed a failover to your secondary server.
This guy never sweats a Sev-1 call
I had initially thought of doing a DNS A record pointing to the secondary server using the primary’s name, but realized that would not work if the SQL server were not completely down (as in this case). So, I tried a host file on the vCenter server, but that seemed to make SQL angry (as it seems to be aware that you are addressing it from an incorrect name). So, I went with the minor DSN server change trick instead, which worked every time I simulated a failover.
Have you worked with a SQL failover for vCenter before? Do you have any thoughts on ways to improve this process? Care to share any battle stories of a similar nature? Comments welcome.