There’s something super fun about building, configuring, and deploying systems that run in production and perform useful tasks. The fun factor is taken up a notch when collaborating with peers. Unique perspectives and opinions bring better design elements to the table. However, constructing solutions as a team requires the introduction of collaborative operations, which I’ve talked about in the past.
Continuous Integration (CI) is a powerful practice that is fantastic at absorbing the ideas and proposals of a distributed team. CI provides a clear line of sight into the current and desired state of production resources while providing a programmatic and repeatable approach to making changes. More importantly, CI removes much of the toxic garbage that is found in the traditional Change Approval Board (CAB) and ITIL monoliths.
The example below provides a high-level perspective on how Continuous Integration works:
- In the first column, the team has stored code that builds and manages the current production resources using a tool such as Terraform and a Version Control System (VCS) such as GitLab.
- In the second column, a member of the team is proposing a change. They want to add Resource C to the production environment without making any modifications to Resource A or Resource B.
- The third column focuses on the value of Continuous Integration. The CI tool reviews the proposed code change by running through a set of pre-defined Jobs. Examples include testing, linting, validating, auditing, requesting approval, or checking against a defined style guide.
- The final column shows the results when one or more team members have agreed that the change is necessary, safe, and desired. The CI tool merges the proposed code change into the VCS and triggers the construction of Resource C in production.
What a wonderful way to work with peers and make changes to production!
In this post, I’ll dig into many of the concepts used in Continuous Integration with a focus on an Infrastructure as Code use case. This includes Workflows, Stages, Jobs, Runners, Artifacts, Caching, and Secrets management. By the end of this post, you will have a solid grasp of the major components that comprise CI. Additionally, check out my Getting Started with Continuous Integration 4K video on YouTube!
The diagram shown earlier has a column for Continuous Integration. In this column, I list numerous Jobs that are performed against the proposed code change. This “list of Jobs” is known as a Workflow or Pipeline, depending on the tool.
A Workflow is a file describing everything that a CI tool must know to perform its role. The format is typically YAML with a specific structure and operational syntax required by the CI tool provider. I commonly write Workflow metadata and user-friendly comments at the start of the file and provide more granular instructions further along.
I have crafted a Workflow diagram and inserted the metadata section in the image below. This diagram will evolve as the post progresses.
Workflow files include, but are not limited to:
- When the Workflow should execute, known as a Trigger.
- What Variables and Secrets to use, and where to pull them from.
- What Stages to progress through, and in what order.
- How to run Jobs to test, lint, validate, or audit the code changes.
- What to do in the event of a failure.
- What compute provider, such as a container or virtual machine, will run the defined Jobs.
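To make the list above concrete, here is a minimal sketch of a Workflow file, assuming GitLab CI syntax and a Terraform project. The variable value, the Job name, and the `hashicorp/terraform:light` image tag are illustrative choices rather than requirements, and other CI tools use different keywords for the same ideas.

```yaml
# .gitlab-ci.yml (illustrative skeleton; names and values are placeholders)

# Trigger: run for merge requests and for commits to the main branch.
workflow:
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"

# Variables: non-sensitive values can live in the file.
# Secrets are assumed to be defined in the project's CI/CD settings instead.
variables:
  TF_IN_AUTOMATION: "true"

# Stages: executed in the order listed.
stages:
  - validate

# Compute provider: the container image that runs each Job.
default:
  image:
    name: hashicorp/terraform:light
    entrypoint: [""]   # the image's entrypoint is terraform, so clear it to run scripts

# Job: the commands to run. A non-zero exit code fails the Job.
validate:
  stage: validate
  script:
    - terraform init -backend=false
    - terraform validate
  # Failure handling: a failed Job fails the pipeline (this is the default).
  allow_failure: false
```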
Workflow files are often stored in the same location as the source code. For example:
- GitHub Actions – A YAML file stored in .github/workflows using any name you wish.
- GitLab – A YAML file named .gitlab-ci.yml stored in the root directory.
- CircleCI – A YAML file stored in
The source code is stored in a Version Control System, meaning the Workflow file is versioned along with the source code. Nice! 🙂
Most providers have a template repository showing how to create a custom Workflow to meet your needs. After that, I spend effort building and perfecting the Workflow as my source code progresses and changes.
With that out of the way, I’ll cover the guts of a Workflow in the next sections.
Some vendors provide the ability to group Jobs into logical units called Stages or Milestones. CI tools typically execute the Jobs within a Stage in parallel, which is handy for speeding up a Workflow.
In the example below, I’m writing a sample Terraform Workflow. I’m telling the CI tool to group Jobs into the Validate, Plan, and Apply Stages. Each Stage can have a Trigger, such as only running the Apply Stage after merging code into the main VCS branch.
Each Stage has its own pass or fail criteria that advance or halt the Workflow. For example, if one Job within the Validate Stage fails, the Workflow terminates with an error and returns information about the failure for remediation. Stages do not perform work per se; instead, they help the Workflow understand how work should be performed.
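As a rough sketch of the Validate, Plan, and Apply grouping, assuming GitLab CI syntax, the fragment below puts two Jobs in the validate Stage so they can run side by side. The Job names are hypothetical, and the Terraform commands are only one plausible choice for each Stage.

```yaml
stages:
  - validate
  - plan
  - apply

default:
  image:
    name: hashicorp/terraform:light
    entrypoint: [""]

# Both Jobs below belong to the validate Stage, so they can run in parallel.
# If either one fails, the plan and apply Stages do not start.
fmt:
  stage: validate
  script:
    - terraform fmt -check -recursive

validate:
  stage: validate
  script:
    - terraform init -backend=false
    - terraform validate

# Starts only after every Job in the validate Stage has passed.
plan:
  stage: plan
  script:
    - terraform init
    - terraform plan

# Restricts the apply Stage to pipelines running on the main branch.
apply:
  stage: apply
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  script:
    - terraform init
    - terraform apply -auto-approve
```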
A Job captures the atomic unit of work, which I cover next.
Jobs tell the Workflow to “do something” by providing all of the instructions necessary to execute a script, command, or other sorts of actions. Jobs go by numerous names across CI tools: steps, scripts, tasks, and so forth.
Returning to the earlier Terraform example, I’ve populated example Jobs into each Stage. The CI tool reads the Workflow, determines the order of the Stages, and executes the Jobs when the correct Trigger is invoked. Optionally, each Job can contain Trigger conditions, depending on the CI tool used. There is a ton of flexibility!
In the example below, changes to a Terraform configuration undergo numerous validation, linting, auditing, and planning Jobs before the plan is shared with the team for review and approval.
I use different types of Trigger conditions for a Workflow such as this:
- Validation: Triggered any time a new feature branch appears.
- Plan: Triggered any time a pull request is submitted against the main branch.
- Apply: Triggered any time code is committed to the main branch, which is only possible via a pull request when using branch protection and proper security controls.
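The three Trigger conditions above could be expressed per Job roughly as follows. This is a sketch assuming GitLab CI `rules` and GitLab's predefined variables; GitLab calls a pull request a merge request, and the exact conditions would need adjusting for other tools or branching models.

```yaml
stages:
  - validate
  - plan
  - apply

default:
  image:
    name: hashicorp/terraform:light
    entrypoint: [""]

validate:
  stage: validate
  rules:
    # Validation: any push to a branch other than main (feature branches).
    - if: $CI_COMMIT_BRANCH && $CI_COMMIT_BRANCH != "main"
  script:
    - terraform init -backend=false
    - terraform validate

plan:
  stage: plan
  rules:
    # Plan: a merge request that targets the main branch.
    - if: $CI_PIPELINE_SOURCE == "merge_request_event" && $CI_MERGE_REQUEST_TARGET_BRANCH_NAME == "main"
  script:
    - terraform init
    - terraform plan

apply:
  stage: apply
  rules:
    # Apply: a commit landing on the main branch.
    - if: $CI_COMMIT_BRANCH == "main"
  script:
    - terraform init
    - terraform apply -auto-approve
```

Keeping the condition next to each Job makes it easy to see at a glance which event runs which piece of work.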
This completes what I want to share concerning Workflow files. It’s hard to remain all that excited about YAML files! 🙂 Next, I’ll go into how to put these concepts into action.
Continuous Integration in Action
Armed with a properly linted YAML file, I’ll address the concepts focused on actively operating Continuous Integration. This section will cover Runners, Artifacts, Cache, and Secrets.
Runners are compute resources that perform work on behalf of the CI tool. Container images deployed as Kubernetes pods or stand-alone containers are common candidates to be Runners. Some CI tools refer to Runners as Executors or Instances.
A Runner checks out (downloads) all of the proposed code and other defined dependencies before running the commands contained in the Job. Runners are ephemeral, and their images are often pulled from public registries such as Docker Hub, which is a major advantage over legacy build servers.
For example, I frequently use the HashiCorp Terraform Light build for production Jobs centered around Terraform. I override this value when testing the 0.13 beta 2 release by switching to a different container image, as shown below:
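A minimal sketch of that kind of override, assuming GitLab CI syntax: a default image applies to every Job, and a single Job swaps in a different one. The tag `0.13.0-beta2` is my assumption about how the beta image is named on Docker Hub, so confirm the tag exists before relying on it. The Job names and the `allow_failure` setting are illustrative.

```yaml
# Default Runner image for every Job in the Workflow.
default:
  image:
    name: hashicorp/terraform:light
    entrypoint: [""]

validate:
  script:
    - terraform version
    - terraform init -backend=false
    - terraform validate

# Same checks against a pre-release build; only the image differs.
validate-beta:
  image:
    name: hashicorp/terraform:0.13.0-beta2   # assumed tag name
    entrypoint: [""]
  script:
    - terraform version
    - terraform init -backend=false
    - terraform validate
  allow_failure: true   # a beta failure is reported but does not fail the pipeline
```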
Many CI providers offer shared Runners in addition to allowing private and self-hosted Runners. However, you may wonder how to save data or outputs from an ephemeral Runner. The answer to that is Artifacts.
Any sort of object or “by-product” generated and retained as part of the Workflow is an Artifact. When building a Terraform Workflow, I save the plan file as an Artifact and share the summarized results as a comment in the pull request. Others use Artifacts to build a software release or publish a build.
I also like to save the Runner’s log file as an Artifact for 7 days. This is helpful when troubleshooting without piling up too much data, but overkill for passing state information between Runners.
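Here is one way this might look, assuming GitLab CI syntax. The file names `plan.tfplan`, `plan.txt`, and `terraform.log` are hypothetical, and the log is produced with Terraform's `TF_LOG` and `TF_LOG_PATH` environment variables. Posting the summary as a pull request comment depends on the VCS API and is left out of this sketch.

```yaml
variables:
  TF_LOG: INFO                # write Terraform's own log output...
  TF_LOG_PATH: terraform.log  # ...to this file in the working directory

plan:
  image:
    name: hashicorp/terraform:light
    entrypoint: [""]
  script:
    - terraform init
    - terraform plan -out=plan.tfplan
    # Render the saved plan as plain text for human review.
    - terraform show -no-color plan.tfplan > plan.txt
  artifacts:
    when: always        # keep the log even when the Job fails
    expire_in: 7 days   # retention period for everything listed below
    paths:
      - plan.tfplan
      - plan.txt
      - terraform.log
```

Plan files and debug logs can contain sensitive values, so it is worth limiting who can download these Artifacts.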
Caching saves data for later consumption and is a valuable tool for sharing state. Next stop, Caching!
Runners are ephemeral containers that launch, perform some set of tasks, and disappear. This poses a problem if the Workflow contains a downstream Job that depends on the results of an earlier Job. Caching solves this problem.
CI vendors implement Caching in different ways. The result is the same – as one Job completes, the required state from that Job’s Runner is saved and passed along to another Job’s Runner. Consider this a collaborative relay race with the Cache acting as the baton.
In the example below, I have captured the Terraform plan file and stored it into the CI Cache. Once a team member approves the plan, a Runner loads the Cache and provides it to the Job responsible for applying the plan.
I typically Cache information on plugins, plans, modules, and any other changes in the working directory. Each Runner is loaded with a consistent state across Jobs using this method.
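The relay can be sketched like this, assuming GitLab CI syntax and a manually approved apply Job in the same pipeline. The cache key, paths, and plan file name are illustrative choices; keying on the pipeline ID is simply one way to make sure the apply Job picks up the plan produced earlier in the same run.

```yaml
stages:
  - plan
  - apply

default:
  image:
    name: hashicorp/terraform:light
    entrypoint: [""]
  cache:
    key: "$CI_PIPELINE_ID"   # one Cache per pipeline run
    paths:
      - .terraform/          # providers and modules fetched by terraform init
      - plan.tfplan          # the baton handed to the apply Job

plan:
  stage: plan
  script:
    - terraform init
    - terraform plan -out=plan.tfplan

apply:
  stage: apply
  when: manual               # waits for a team member to approve
  script:
    - terraform init
    - terraform apply plan.tfplan
```

One caveat: as far as I know, GitLab treats its cache as best-effort and suggests artifacts for passing results between Stages, so on that platform the plan file may be safer as an Artifact, with the Cache reserved for plugins and modules.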
Because the Jobs outlined in this Workflow are making changes to production, they need credentials and information about the environment. It’s time to talk about Secrets!
Code should remain as stateless and DRY (Don’t Repeat Yourself) as possible. This improves code re-use and weeds out the risk of statically defined information hidden away in the code. Stateful information should be stored in a Secrets manager and loaded into Variables on-demand.
Secrets are key/value pairs containing the information required to execute the Workflow. This includes, but is not limited to:
- Credentials for accessing, updating, changing, and deleting resources.
- Configuration settings, such as names, network addresses, locations, or versions.
- Conditional information to determine if or when a resource should be mutated.
Building upon the previous example, I have defined a handful of Variables in the Workflow’s metadata section. I have Variables for the Terraform plan name, AWS credentials, and the Cache state path. The CI tool uses its back-end Secrets manager to populate values into the Variables during each Job.
I often mask Variable values to prevent showing them in the console, which is a common feature across CI tools. I recommend using environment variables to avoid accidentally leaking Secrets with echo or other unsafe loading methods.
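A small sketch of this split, assuming GitLab CI syntax: non-sensitive Variables sit in the file, while the AWS credentials are assumed to be defined as masked variables in the project's CI/CD settings and never appear in the YAML. The names `PLAN` and `TF_ROOT` and the region value are hypothetical placeholders.

```yaml
variables:
  PLAN: plan.tfplan              # hypothetical plan file name
  TF_ROOT: "$CI_PROJECT_DIR"     # hypothetical path to the Terraform working directory
  AWS_DEFAULT_REGION: us-west-2  # placeholder, not sensitive

# AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are deliberately absent here.
# They are assumed to be stored as masked variables in the CI tool's settings,
# which exposes them to each Job as environment variables.

plan:
  image:
    name: hashicorp/terraform:light
    entrypoint: [""]
  script:
    - cd "$TF_ROOT"
    - terraform init
    # The AWS provider reads the credentials from the environment,
    # so the script never needs to print or pass them explicitly.
    - terraform plan -out="$PLAN"
```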
Please accept a crisp high five for reaching this point in the post!