Terraform Upgrade to 0.13 in Production

Hey folks! I recently took the time to upgrade all of my production Terraform code to work with version 0.13, released a few months back. The effort required was much less than expected, largely thanks to DRY, modular code that is applied using continuous integration within GitLab. However, there were numerous “ah ha!” moments along the way. I’m always learning how to write better infrastructure as code but am far from perfect!

In this post, I cover the key milestones and takeaways encountered during the upgrade process to Terraform version 0.13. Keep in mind that this is just my experience and opinions on the matter. I start with remote state, the 0.13upgrade command, tool versions, and dependency mapping. From there, I finish with keeping the code stateless, the migration of production, and a few bonus items. Let’s get to it!

Addressing Remote State

The first (and biggest) chunk of effort came from completely redoing the backend remote state configuration across all projects. I was originally storing state remotely within the GitLab project itself, with a few projects using Amazon S3 and DynamoDB. This works fine, but it comes with a load of caveats and “extra things you have to know” that were a hassle to deal with across a team.

Below is the old method I was using to handle state. Not fun.

terraform init -reconfigure \
  -backend-config="address=${GITLAB_TF_ADDRESS}" \
  -backend-config="lock_address=${GITLAB_TF_ADDRESS}/lock" \
  -backend-config="unlock_address=${GITLAB_TF_ADDRESS}/lock" \
  -backend-config="username=${GITLAB_USER_LOGIN}" \
  -backend-config="password=${GITLAB_TF_PASSWORD}" \
  -backend-config="lock_method=POST" \
  -backend-config="unlock_method=DELETE" \
  -backend-config="retry_wait_min=5"

In an effort to keep things simple, I migrated all state files to Terraform Cloud. I talk about setting up each workspace in the Using Terraform to Manage Git Repositories blog post. Instead of passing along a plethora of backend-config values during CI operations, I just plopped a small remote backend block into each project.
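Here’s a minimal sketch of what that block looks like; the organization and workspace names are placeholders for illustration:

terraform {
  backend "remote" {
    organization = "my-org"

    workspaces {
      name = "tf-cluster-asg"
    }
  }
}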

[Image: Terraform Cloud is great for remote state!]

In a nutshell, each Terraform project was given a unique workspace and then its state was migrated over. I then committed the remote state configuration changes to the project.
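The migration itself is mostly hands-off; re-initializing the project kicks it off. Roughly, it goes like this (the prompt wording is from memory, so treat it as approximate):

terraform init

# Terraform detects the backend change and asks whether to copy the
# existing state to the new backend; answering yes migrates it into
# the Terraform Cloud workspace.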

[Image: Merge, my friend, merge!]

This vastly improved my user experience while dealing with state files. One caveat is ultra important: once a state file is upgraded to version 0.13, that’s it! The only “clean” way to go back is to revert the version, which may cause all sorts of other headaches. If you’re new to Terraform state, here is a quick 10-minute video I made to cover the basics.

Using the Terraform 0.13 Upgrade Command

Terraform comes with a 0.13upgrade command to help with upgrading code. It rewrites configurations to avoid deprecations and caveats while adopting the new syntax requirements. I make it a habit to run this command in a clean working git branch to easily spot any differences. Make sure to read the upgrade guide!
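For reference, the basic flow looks like this; the branch name is just an example:

git checkout -b upgrade-terraform-013   # clean working branch for easy diffing
terraform 0.13upgrade .                 # rewrite the configuration in this directory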

For example, providers received a fairly significant change in version 0.13. The required_providers block now requires a source address for each provider, whereas before it was assumed that all providers came from the Terraform Registry or a local source. Using git diff HEAD on the branch (or any other diff tool) reveals the schema change after executing an upgrade.

git diff HEAD
diff --git a/code/tf-cluster-asg/provider.tf b/code/tf-cluster-asg/provider.tf
index 22f1520..00624db 100644
--- a/code/tf-cluster-asg/provider.tf
+++ b/code/tf-cluster-asg/provider.tf
@@ -1,6 +1,9 @@
 terraform {
   required_providers {
-    aws = "~> 2.66"
+    aws = {
+      source  = "hashicorp/aws"
+      version = "~> 2.66"
+    }
   }
 }

Additionally, the upgrade command creates a versions.tf file to specify the required Terraform CLI version if one does not already exist. I found this handy because I had been controlling the required version within GitLab CI.

terraform {
  required_version = ">= 0.13"
}

Having an explicit callout makes the code more portable across CI tools and makes my desired version clear to the team.

Terraform CLI Upgrade and Versions

For my own workstation tooling purposes, I keep copies of the Terraform 0.12 and 0.13 releases inside of %APPDATA%\bin. This is my “toolbox” of sorts and lets me add a single entry to the Windows PATH environment variable for multiple tools. I store Terraform, Helm, Kubectl, and other goodies in there.

> $env:Path.Split(';') | findstr "Roaming\bin"
C:\Users\chris\AppData\Roaming\bin\

Any time the Terraform CLI is downloaded, it appears as “terraform.exe”. I rename each copy with the version number, e.g. “terraform1229.exe” for version 0.12.29. Here’s a look at what’s stored on my dev box right now:

[Image: I’m a hoarder]

This makes it easy to swap between versions to prototype on version 0.13 while still being able to manage code on version 0.12. I found this helpful when combing through numerous projects as the migration progressed. To improve the experience further, I use PowerShell profile aliases to expose the desired Terraform versions.

New-Alias -Name "tf" -Value "terraform1304.exe"
New-Alias -Name "tf12" -Value "terraform1229.exe"

I’ve made it a habit to refer to the latest 0.13 release as simply tf, and the latest 0.12 release as tf12, to save on keystrokes.

Fixing Dependencies

This felt like a good time to review all of my provider and module dependencies and “clean up” any bad habits. The first place I put eyeballs on was the provider requirements expressed in both parent and child modules. Considering that I host my own private module repository, it didn’t make a whole lot of sense to express provider versions in both the module repository and parent (calling) code.

Instead, I moved all provider version requirements into the parent code to gain tighter control over the version used. Couple this with branch pinning to find specific iterations of the child module code and I have a fairly simple method for ensuring the right version of code is used in the right places and clouds.
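As a sketch, the parent code ends up looking something like this; the provider version, module URL, and tag below are made-up examples:

# provider version requirements now live only in the parent (calling) code
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.11"
    }
  }
}

# the child module is pinned to a specific ref in the private repository
module "cluster" {
  source = "git::https://gitlab.com/example/terraform-modules/cluster.git?ref=v1.4.0"
}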

I also reviewed exactly which downstream providers I was consuming and how they were being pinned. For example, much of my older AWS code was configured with pessimistic constraint operators bound to 2.66, a fairly old version:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 2.66"
    }
  }
}

This seemed like a great time to try out new major versions and to start tracking which providers are being used for which code bases. I’ve begun using a staggered release cadence in which dev uses something a bit more akin to bleeding edge to ensure that prod will not self-destruct during the next apply. This was again aided by using specific private modules with branch pinning configured along with a “blue / green” style deployment for pretty much everything that goes into AWS, Azure, and GCP.
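As an illustration of that staggered cadence, the dev and prod root configurations pin the same provider differently; the version numbers here are just examples:

# dev root configuration: rides closer to the bleeding edge
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.11"
    }
  }
}

# prod root configuration: stays on the proven constraint until dev checks out
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 2.66"
    }
  }
}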

Stateless Avoids Headaches

In a small number of cases, I found that a .terraform folder was hiding away in the source git repositories. This is no good! In the example below, the modules.json file had snuck in and was being updated from my local workstation.

[Image: Oops]

I try hard to avoid carrying the “baggage” of anything persisting in the .terraform folder. Instead, I prefer to let GitLab CI load all of the needed providers, modules, and backend details at runtime on a clean slate.

The culprit was a misconfigured .gitignore file. I think this is something I wrote a long time ago, before I really understood what happens in the .terraform folder with CI, and it was carried over via copy / paste operations. Oops!

Changing the filter from **/.terraform/plugins/** to **/.terraform/** did the trick.
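The relevant .gitignore entry ends up as a single pattern:

# ignore everything Terraform writes locally, not just plugins
**/.terraform/**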

[Image: Set filters to maximum power]

With a proper filter in place, I just needed to go back and hunt down any stragglers and remove them from the clean working git branch.
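For anything already tracked, a minimal cleanup looks like this when run from a project directory in that branch:

git rm -r --cached .terraform
git commit -m "Stop tracking the .terraform folder"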

Migration of Production

Once I had addressed state, dependencies, .gitignore filters, and so forth, it was time to start testing the fruits of my labor. I prefer to select a region in each cloud and do a full deployment of everything defined in code.

Because GitLab CI is doing the work, the process is repeatable: from standing up identity / roles and infrastructure to pushing out applications and network services, the entire flow is automated. The only difference is the region, which has a few caveats (such as available features).

For example, my AWS infrastructure code, lovingly named “site-deploy”, was first run through the pipeline in us-west-1, an empty region that I use only for testing. I started with the old version 0.12 code and then updated it to the newer version 0.13 code. Once the pipeline was fully green, I knew the process worked and could be repeated in other regions with a high level of confidence.

[Image: My commit messages are lame]

Pushing to production is always a little scary when performing an upgrade. To allay this fear, I made sure to perform a few “terraform plan” dry runs locally to ensure that there were no hidden surprises. Each production project was updated in about 15 minutes without anything eventful happening.
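A dry run is as simple as initializing against the upgraded backend and planning; on a clean upgrade the plan should report something along the lines of “no changes”:

terraform init
terraform plan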

The Bonus Round

I almost forgot! Terraform version 0.13 has a few nice bonus features that I snagged for a few projects. Mainly, the ability to use iterators such as count on child modules directly from the parent module. This is handy for pushing multiple copies of an instance, such as the example application below, with a single line of code.
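Here’s a quick sketch of that; the module source URL and the name input are hypothetical:

# stamp out three copies of the application with a single attribute
module "app" {
  source = "git::https://gitlab.com/example/terraform-modules/app.git?ref=v2.0.0"
  count  = 3

  # hypothetical module input, made unique per instance
  name = "app-${count.index}"
}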

[Image: Count is super handy]

In addition to count, modules also gained support for depends_on and for_each. Snazzy!

Overall, I tackled the upgrade of 27 projects from Terraform version 0.12 to 0.13 in the space of about 8 hours. It felt good to address a bunch of nagging inconsistencies and newbie design decisions that I had made in the past.

I hope your upgrade goes super smoothly and you get to enjoy all that the new version has in store!

Next Steps

Please accept a crisp high five for reaching this point in the post!

If you’d like to learn more about Infrastructure as Code, or other modern technology approaches, head over to the Guided Learning page.