Using Terraform to Manage Git Repositories

Storing code and other artifacts in a repository backed by a version control system (VCS) is a well understood and widely accepted technique. However, many organizations I work with still create and manage repositories by hand or with one-off scripts. While this method works, and is often good for beginners to get a grasp on the fundamentals, it brings tough challenges with scale, consistency, security, and cleanup. It is far more powerful to use code to deploy, secure, and manage repositories.

In this post, I start with a design overview on how to piece together a repository manager to build and maintain multiple repositories. From there, I switch into “guide” mode and dive into initial setup, using a personal access token (PAT), securing the root repository, and generating a new repository with Terraform. Finally, I showcase how to import an existing repository into Terraform, address any drift concerns, and drop a few handy tips to consider for the future.

Design Overview

I use Terraform to declaratively build all of my repositories across GitHub, GitLab, and Bitbucket. Each service is used for different organizations (work, personal, community) and for different use cases (internal code, external code, examples). My Terraform code is stored in a repository called the Repository Manager.

The Repository Manager configuration describes how each production repository should be created, including name, labels / tags, members, teams, readme details, licensing, CI settings, visibility, and more. Specific configuration settings vary depending on the provider used, such as GitHub accepting a template repository to create new repositories.
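
As a rough illustration of that idea, many repositories can be described as data and stamped out by a single resource. This is only a sketch under stated assumptions, not the actual layout of the Repository Manager: the local.repositories map, its keys, and the example values are hypothetical, while github_repository and its name, description, private, and topics arguments are the same ones used later in this post. It relies on for_each, which needs Terraform 0.12.6 or later.

```hcl
# Hypothetical catalog of repositories; names and values are placeholders.
locals {
  repositories = {
    "example-internal" = {
      description = "Internal code"
      private     = true
      topics      = ["internal"]
    }
    "example-public" = {
      description = "Public examples"
      private     = false
      topics      = ["example", "public"]
    }
  }
}

# One resource block produces one repository per map entry.
resource "github_repository" "managed" {
  for_each = local.repositories

  name        = each.key
  description = each.value.description
  private     = each.value.private
  topics      = each.value.topics
}
```

Adding a repository then becomes a matter of adding one entry to the map in a pull request.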

Colleagues are able to submit a pull request against the Repository Manager configuration to meet specific requirements such as creating, modifying, or archiving a repository. Pull requests are subject to policy, linting, and validation jobs by way of continuous integration (CI).

This system works well for distributed teams managing numerous repositories. It is especially handy when dealing with a variety of hosted and internal services. It allows everyone to focus on writing code instead of worrying over the operational toil of managing repositories.

Initial Setup

There are a few components necessary to begin setup:

  1. An account with the desired VCS. I will use GitHub in this example.
  2. A personal access token (PAT) for the aforementioned account. The documentation from both GitLab and GitHub does a nice job of explaining this step.
  3. A local copy of Terraform CLI.

Set up the root organization and the Repository Manager repository by hand. This avoids circular dependencies and gives the code a place to live during development.

Clone the repository locally.

> git clone git@github.com:WahlNetwork/repository-manager.git

Cloning into 'repository-manager'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (3/3), done.

Set up the usual Terraform suspects for a new project: variables, providers, versions, .gitignore, and so forth. Initialize the Terraform configuration with terraform init. Add the changes, cut a new commit, and push back to the repository as shown with this commit.
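
A minimal versions file might look like the sketch below. The file name, the version constraint, and the integrations/github source address are assumptions for illustration. That address is where current releases of the GitHub provider are published, and those releases document an owner argument as the preferred replacement for organization, so check the provider documentation for the version in use.

```hcl
# versions.tf (assumed file name)
terraform {
  # The source syntax below needs Terraform 0.13 or later.
  required_version = ">= 0.13"

  required_providers {
    github = {
      source = "integrations/github"
    }
  }
}
```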

Using the Personal Access Token (PAT)

Each provider will require the PAT for authentication. In the case of GitHub, the token is passed in the provider section. I advise using a Terraform variable and passing the token value through an environment variable or a tfvars file while working through this guide.

provider "github" {
  organization = "wahlnetwork"
  token        = var.github_token
}
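
The var.github_token reference needs a matching variable declaration. A minimal sketch follows. The description matches the prompt text in the console output further down, while the variables.tf file name and the sensitive flag (available in Terraform 0.14 and later) are my assumptions.

```hcl
# variables.tf (assumed file name)
variable "github_token" {
  description = "Personal access tokens (PATs) for authentication to GitHub."
  type        = string
  sensitive   = true # hides the value in plan and apply output
}
```

Terraform reads an environment variable named TF_VAR_github_token as the value for this variable. A github_token entry in a terraform.tfvars file works as well, provided that file is excluded from version control.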

If the token is not defined, Terraform will request the value during execution.

> terraform plan

var.github_token
  Personal access tokens (PATs) for authentication to GitHub.

  Enter a value: 12345 (I've got the same combination on my luggage!)

Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be persisted to local or remote state storage.

Secure the Repository

Take a moment to lock down the Repository Manager repository:

  1. Protect the main branch. No one should be able to push code without a review and approval by a maintainer or owner.
  2. Require at least one review for pull requests.
  3. Set the repository visibility to private. However, my repository is public so that it can serve as an example for this post.

The repository is now clean and is ready to churn out new repositories. Inception? 🙂

Generate a New Repository with Terraform

I can now add new repositories using a small bit of Terraform configuration. This can be daunting at first; GitHub accepts a large quantity of arguments. However, many arguments are not required and have acceptable defaults.

The creation of a new repository named demo-1 is performed using the code below:

resource "github_repository" "demo-1" {
  name             = "demo-1"
  description      = "A demo GitHub repository created by Terraform"
  private          = false
  homepage_url     = "https://wahlnetwork.com/"
  has_projects     = false
  has_wiki         = false
  has_downloads    = false
  has_issues       = true
  license_template = "mit"
  topics           = ["example", "public", "infrastructure-as-code", "operations", "terraform", "github"]
}

Run terraform plan -out plan.tfplan to validate the configuration meets expectations. Add or modify any arguments that need adjustment and repeat as necessary.

Terraform will perform the following actions:

  # github_repository.demo-1 will be created 
  + resource "github_repository" "demo-1" {  
      + allow_merge_commit     = true
      + allow_rebase_merge     = true
      + allow_squash_merge     = true
      + archived               = false
      + default_branch         = (known after apply)
      + delete_branch_on_merge = false
      + description            = "A demo GitHub repository created by Terraform"
      + etag                   = (known after apply)
      + full_name              = (known after apply)
      + git_clone_url          = (known after apply)
      + has_downloads          = false
      + has_issues             = true
      + has_projects           = false
      + has_wiki               = false
      + homepage_url           = "https://wahlnetwork.com/"
      + html_url               = (known after apply)
      + http_clone_url         = (known after apply)
      + id                     = (known after apply)
      + license_template       = "mit"
      + name                   = "demo-1"
      + node_id                = (known after apply)
      + private                = false
      + ssh_clone_url          = (known after apply)
      + svn_url                = (known after apply)
      + topics                 = [
          + "example",
          + "github",
          + "infrastructure-as-code",
          + "operations",
          + "public",
          + "terraform",
        ]
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Once the configuration looks solid, run terraform apply plan.tfplan to bring the repository to life. Note that the plan.tfplan file contains an encoded version of the token value and should be kept private.

> terraform apply plan.tfplan

github_repository.demo-1: Creating...
github_repository.demo-1: Creation complete after 10s [id=demo-1]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Check out the new repository and bask in the glory of automation.
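
The protections applied by hand to the Repository Manager can also be declared for repositories that Terraform creates. The sketch below assumes a 4.x or later release of the GitHub provider, where github_branch_protection takes repository_id and pattern, and it assumes the default branch is named main. Older releases use different argument names, so check the provider documentation for the version in use.

```hcl
resource "github_branch_protection" "demo-1-main" {
  repository_id = github_repository.demo-1.node_id
  pattern       = "main" # assumed default branch name

  # Apply the rule to administrators as well.
  enforce_admins = true

  # Require at least one approving review on pull requests.
  required_pull_request_reviews {
    required_approving_review_count = 1
  }
}
```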

This is great for new repositories. But what about existing ones?

Import Existing Repositories

It’s rare to find a truly greenfield environment without any existing repositories. Terraform’s import command is great for adding the existing repositories into management. I have manually created a repository named demo-2 that will be imported into Terraform using the steps below.

Define the Repository in Terraform

I’ve started by defining the repository in the Terraform configuration using the demo-1 example from earlier. The name and description have been updated to provide the correct information for demo-2.

resource "github_repository" "demo-2" {
  name             = "demo-2"
  description      = "A demo GitHub repository created by hand and imported into Terraform"
  private          = false
  homepage_url     = "https://wahlnetwork.com/"
  has_projects     = false
  has_wiki         = false
  has_downloads    = false
  has_issues       = true
  license_template = "mit"
  topics           = ["example", "public", "infrastructure-as-code", "operations", "terraform", "github"]
}

Import the Repository and Address Parameter Drift

The next step is to import the repository into Terraform. This is also detailed in the GitHub provider documentation. GitHub only requires the name of the repository – easy!

> terraform import github_repository.demo-2 demo-2

github_repository.demo-2: Importing from ID "demo-2"...
github_repository.demo-2: Import prepared!
  Prepared github_repository for import
github_repository.demo-2: Refreshing state... [id=demo-2]

Import successful!

The resources that were imported are shown above. These resources are now in your Terraform state and will henceforth be managed by Terraform.
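
Terraform 1.5 and later also accept a declarative import block, which lets the import go through the same plan and apply review as any other change. This is an alternative to the command above, not what I ran for this post:

```hcl
# Declarative equivalent of: terraform import github_repository.demo-2 demo-2
import {
  to = github_repository.demo-2
  id = "demo-2"
}
```

With this block in the configuration, terraform plan reports the pending import and terraform apply performs it. The block can be removed once the resource is in state.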

It is now a good idea to run another terraform plan to see if any parameters have drifted from the current configuration. In my example, the license_template parameter would require the destruction of the repository.

This is not desired! I already have a license file in this repository. Thus, I will comment out the license_template parameter to avoid harm and re-run the plan command. It now appears that only a few modifications will be performed to meet the defined configuration, which is desired. Filtering for the “~” (tilde) character is a shortcut to seeing changes.

> terraform plan | findstr "~"

~ update in-place
  ~ resource "github_repository" "demo-2" {
      ~ has_downloads          = true -> false
      ~ has_projects           = true -> false
      ~ has_wiki               = true -> false
      ~ topics                 = [
          + "example",
          + "github",
          + "infrastructure-as-code",
          + "operations",
          + "public",
          + "terraform",
        ]

Plan: 0 to add, 1 to change, 0 to destroy.

That’s better! I then terraform apply the changes. The demo-2 repository is now fully under Terraform’s management.
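
Instead of commenting out license_template, the lifecycle meta-argument can tell Terraform to disregard differences in that one argument. This is a sketch of an alternative I did not use above. The remaining arguments are unchanged from the demo-2 definition and are omitted here for brevity.

```hcl
resource "github_repository" "demo-2" {
  name        = "demo-2"
  description = "A demo GitHub repository created by hand and imported into Terraform"
  # ... remaining arguments as defined earlier ...
  license_template = "mit"

  lifecycle {
    # Do not plan a replacement when only license_template differs.
    ignore_changes = [license_template]
  }
}
```

This keeps the intended license documented in the configuration without putting the existing repository at risk.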

Additional Tips

This guide should be enough to get the creative juices flowing to meet the requirements of a specific use case. However, there is so much more that can be done, including:

  • Use the native CI capabilities of GitHub or GitLab to lint, test, and validate pull requests based on your team’s standards and policies.
  • Deploy a service account or bot user to perform the Terraform work instead of using your own PAT.
  • Store the PAT as a secret in VCS instead of using an environment variable or tfvars file.
  • Add a cron job to CI to check, and potentially remediate, configuration drift.
  • Add the prevent_destroy argument inside a lifecycle block so that Terraform rejects any plan that would destroy the resource. Alternatively, limit the permissions bound to the PAT to exclude destroying resources.
  • Use remote state for the Terraform configuration, such as with Terraform Cloud, instead of a local state file. Yes, there is a provider for this. 🙂
  • Split the Terraform configuration files into small chunks, such as main.tf for data sources and shared definitions and use-case.tf for a specific project or use case.
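
Two of these tips are short enough to sketch. The first fragment adds a destroy guard to the demo-1 repository from earlier, and the second moves state to Terraform Cloud. The organization and workspace names are hypothetical placeholders, and the remaining repository arguments are omitted for brevity.

```hcl
# Guard a repository against accidental deletion.
resource "github_repository" "demo-1" {
  name = "demo-1"
  # ... remaining arguments as defined earlier ...

  lifecycle {
    prevent_destroy = true
  }
}

# Keep state in Terraform Cloud rather than on a local disk.
terraform {
  backend "remote" {
    organization = "example-org" # hypothetical Terraform Cloud organization

    workspaces {
      name = "repository-manager" # hypothetical workspace name
    }
  }
}
```

Keep in mind that prevent_destroy only protects a resource while its block remains in the configuration. Deleting the block removes the guard along with it, which is one reason the limited-permission PAT is a useful second layer.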

Next Steps

Please accept a crisp high five for reaching this point in the post!

If you’d like to learn more about Infrastructure as Code, or other modern technology approaches, head over to the Guided Learning page.

If there’s anything I missed, please reach out to me on Twitter. Cheers! 🙂