Using Private Git Repositories as Terraform Modules

The Terraform Registry hosts thousands of self-contained packages called modules. These modules leverage popular providers from Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), and several others. Each module reduces the time spent delivering cloud resources by allowing consumers to provide a handful of inputs with minimal coding effort. However, there are situations requiring private, custom-crafted modules. For whatever reason, these custom modules are not suitable for public distribution. Fortunately, it is entirely possible to use private git repositories for Terraform modules.

Time to wrangle some private git repositories!

In this post, I start with an overview of Terraform module sources and the various methods for supplying git credentials. From there, I dive into dynamic git configuration, referencing modules in sub-directories, and pinning to specific repository versions or branches. Finally, I showcase how to set up continuous integration using a protected environment variable.

Understanding Sources of Terraform Modules

Every module declared in a Terraform configuration must come from a source. In the post Terraform Plans, Modules, and Remote State, I use local modules in the root configuration. Hence, the source field is given a path to the module folder as shown below:

module "local-module" {
  source = "../local-module"
<snip>
}

It is also possible to use modules from the Terraform Registry. Google’s network module, for example, is sourced as follows:

module "network" {
  source  = "terraform-google-modules/network/google"
<snip>
}

A generic git repository is yet another viable module source and the topic of this post. By feeding in the HTTPS or SSH clone path, Terraform understands where to locate the module code. I have configured an example repository in my GitLab environment named site-deploy. Terraform is able to check out the module code when the source uses the prefix git:: followed by the repository’s clone path, as shown below:

module "site-deploy" {
  source = "git::https://gitlab.com/rubrik-octo/lab/site-deploy.git"
}

If the repository is public, no further action is required. However, private repositories will fail to load when running terraform init without supplying credentials. This makes sense – the repository is private, after all.

Supplying Git Credentials

Git supports a handful of methods for requesting and consuming credentials. Each method has benefits and drawbacks. I’m using HTTPS with the OAuth 2.0 authorization framework for my GitLab environment. It is simple to implement, uses a protected and masked token value, and can be easily automated via continuous integration (CI).

In my scenario, a helper account protected with two-factor authentication (2FA) is used to access the private repository. I use the helper account as a “bot” user to perform various housekeeping and CI activities while maintaining greater isolation from my user account. The use of 2FA for the bot means that I leverage a frequently rotated token for programmatic access. Every git hosting service handles tokens a little differently; GitLab provides details on how to create a personal access token in their documentation.

Token in hand, I now need to dynamically perform a URL substitution that properly provides the token value.

Dynamic Git Configuration

Git needs to know when and where to use the token when checking out code from a private repository. I do not want to provide the token information in the Terraform configuration – that would be a terrible security practice. Instead, I want git to automatically detect when Terraform modules are being loaded from a private repository and insert the token for the duration of the session.

The solution is to use git’s insteadOf option, as shown below:

git config --global url."https://oauth2:<token>@gitlab.com".insteadOf https://gitlab.com

This command adds two lines to the .gitconfig file. The resulting configuration is as follows:

cat ~/.gitconfig

<snip>
[url "https://oauth2:<token>@gitlab.com"]
	insteadOf = https://gitlab.com

Here, <token> stands in for the actual token value. Git will dynamically rewrite any URL beginning with https://gitlab.com so that it begins with https://oauth2:<token>@gitlab.com. The token authenticates the client session, allows the code to be checked out, and returns the module code to Terraform.

> terraform init

Initializing modules...
Downloading git::https://gitlab.com/rubrik-octo/lab/site-deploy.git for site-deploy...
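
To confirm the rewrite independently of Terraform, git can print the URL it would actually contact. The following is a minimal sketch rather than part of my workflow: it assumes a POSIX shell with git installed, points HOME at a throwaway folder so the real .gitconfig stays untouched, and uses dummy-token as a placeholder instead of a real credential.

```shell
#!/bin/sh
set -eu

# Sandbox: point HOME at a temporary folder so the real ~/.gitconfig is not modified.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

# "dummy-token" is a placeholder value, not a real credential.
git config --global url."https://oauth2:dummy-token@gitlab.com".insteadOf "https://gitlab.com"

# --get-url expands insteadOf rules and exits without contacting the remote.
rewritten="$(git ls-remote --get-url "https://gitlab.com/rubrik-octo/lab/site-deploy.git")"
echo "$rewritten"

# Remove the rule again; entries written with --global persist until deleted.
git config --global --unset url."https://oauth2:dummy-token@gitlab.com".insteadOf
restored="$(git ls-remote --get-url "https://gitlab.com/rubrik-octo/lab/site-deploy.git")"
echo "$restored"
```

The first echo should show the clone path with oauth2:dummy-token@ placed in front of gitlab.com, and the second should show the original clone path. The removal step is worth noting because a rule written with --global stays in .gitconfig until it is deleted or the machine is discarded.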

If the token is invalid, an access denied error will terminate the initialization process.

Cloning into '.terraform/modules/site-deploy'...
remote: HTTP Basic: Access denied
fatal: Authentication failed for 'https://gitlab.com/rubrik-octo/lab/site-deploy.git/'

Easy enough! But what if I need to be more specific about the module’s location within the repository?

Referencing Modules in Sub-Directories

The previous git repository hosts a single module. Git checks out the entire repository and returns the contents to Terraform as a module. However, it is also possible to store multiple modules in a single git repository. This is known as a “monorepo.”

In this new scenario, I have a single git repository named source-modules hosting multiple modules inside various folders. I specifically want the module named transit-gateway that is saved inside a folder named site-deploy. By adding a // after the repository’s clone path, followed by the path to the folder, I can instruct Terraform to load the module from that specific folder.

module "transit-gateway" {
  source = "git::https://gitlab.com/rubrik-octo/lab/source-modules.git//site-deploy/transit-gateway"
}
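
To make the // convention concrete, here is a rough sketch of how such a source address breaks apart. The split_module_source function is a hypothetical helper written for this post, not Terraform's own parser, and it assumes the address carries no query string. The one subtlety it illustrates is that the // inside https:// must be skipped before looking for the sub-directory separator.

```shell
#!/bin/sh
# Hypothetical helper for illustration only; Terraform's real parser handles more cases.
split_module_source() {
  src="${1#git::}"   # drop the git:: prefix if present
  scheme=""
  rest="$src"
  # Set the scheme aside first so the "//" in "https://" is not mistaken for the separator.
  case "$src" in
    *"://"*)
      scheme="${src%%://*}://"
      rest="${src#*://}"
      ;;
  esac
  subdir=""
  # Everything after the first remaining "//" is the folder inside the repository.
  case "$rest" in
    *"//"*)
      subdir="${rest#*//}"
      rest="${rest%%//*}"
      ;;
  esac
  printf 'repo=%s\nsubdir=%s\n' "${scheme}${rest}" "$subdir"
}

split_module_source "git::https://gitlab.com/rubrik-octo/lab/source-modules.git//site-deploy/transit-gateway"
```

For the address above, the repo line should hold the clone path ending in source-modules.git and the subdir line should hold site-deploy/transit-gateway. An address without a // separator yields an empty subdir, which corresponds to using the repository root as the module.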

I prefer this model in most situations. It results in fewer repositories but requires increased collaboration and security controls over the source code. However, more can be done to improve upon this design.

Pinning to a Specific Version or Branch

As described in Dependency Pinning with Infrastructure as Code, I make a habit of pinning dependencies to avoid breaking changes. This design pattern holds true for modules in private git repositories, too.

For git-hosted repositories, this means using a protected, non-default branch or a version tag when loading a module. The value of the ref query parameter is passed to git checkout to select that specific branch or tag.

# Branch example - grab the "production" branch
module "transit-gateway" {
  source = "git::https://gitlab.com/rubrik-octo/lab/source-modules.git//site-deploy/transit-gateway?ref=production"
}

# Tag example - grab the code tagged with version 1.0.0
module "transit-gateway" {
  source = "git::https://gitlab.com/rubrik-octo/lab/source-modules.git//site-deploy/transit-gateway?ref=tags/v1.0.0"
}
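
Because the ref value ends up in git checkout, its behavior can be explored with plain git. The sketch below is an illustration under stated assumptions, not something taken from my environment: it builds a throwaway local repository with a made-up commit identity, tags the first commit as v1.0.0, and then adds a later commit. It shows that v1.0.0 and tags/v1.0.0 name the same commit, and that the tag stays put while the branch tip moves on.

```shell
#!/bin/sh
set -eu

# Sandbox: temporary HOME and repository, with a made-up commit identity.
export HOME="$(mktemp -d)"
export GIT_AUTHOR_NAME="demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="demo" GIT_COMMITTER_EMAIL="demo@example.com"
repo="$(mktemp -d)"

git -C "$repo" init --quiet
git -C "$repo" commit --quiet --allow-empty -m "first release"
git -C "$repo" tag v1.0.0
git -C "$repo" commit --quiet --allow-empty -m "later change"

# Both spellings of the tag resolve to the same commit ...
tag_short="$(git -C "$repo" rev-parse "v1.0.0^{commit}")"
tag_long="$(git -C "$repo" rev-parse "tags/v1.0.0^{commit}")"
# ... while the branch tip has already moved past it.
branch_tip="$(git -C "$repo" rev-parse HEAD)"

echo "v1.0.0:      $tag_short"
echo "tags/v1.0.0: $tag_long"
echo "branch tip:  $branch_tip"
```

The first two lines should print the same commit hash and the third a different one. This is the practical difference between the two pinning styles: a branch such as production keeps moving as new commits land, whereas a tag continues to point at the commit it was created on unless someone deliberately moves it.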

Pinning the module reduces the chance of unknowingly ingesting a breaking change. The fine folks at tflint agree.

Adding Continuous Integration

The final step is to load the token into CI and use an environment variable to dynamically configure git when a workflow is triggered. As mentioned earlier in this post, I use a frequently rotated token to authenticate CI activities on behalf of my bot user. This token is loaded into the runner performing workflow jobs as an environment variable named GITLAB_TOKEN. Each time the runner is launched, the code below is executed:

git config --global url."https://oauth2:${GITLAB_TOKEN}@gitlab.com".insteadOf https://gitlab.com

The environment variable is protected and masked, meaning the value of the token is not displayed or stored in the logs. The runner is able to authenticate to other private git repositories and check out the desired Terraform modules. The runner is terminated upon completion, which destroys the session and token.
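
One failure mode worth guarding against is a job that starts without the variable populated, since git would then be configured with an empty token and fail later with a less obvious authentication error. The sketch below is a hypothetical wrapper, not the exact code my runner executes: configure_git_token is a name I made up for illustration, and GITLAB_TOKEN is the variable described above.

```shell
#!/bin/sh
# Hypothetical helper: configure git only when GITLAB_TOKEN is actually present.
configure_git_token() {
  if [ -z "${GITLAB_TOKEN:-}" ]; then
    # Report the problem without ever echoing a token value.
    echo "GITLAB_TOKEN is not set; refusing to configure git" >&2
    return 1
  fi
  git config --global \
    url."https://oauth2:${GITLAB_TOKEN}@gitlab.com".insteadOf \
    "https://gitlab.com"
}
```

Calling configure_git_token at the start of a job, before terraform init, lets the job stop early with a clear message when the variable is missing.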

Next Steps

Please accept a crisp high five for reaching this point in the post!

If you’d like to learn more about Infrastructure as Code, or other modern technology approaches, head over to the Guided Learning page.

If there’s anything I missed, please reach out to me on Twitter. Cheers! 🙂