GitLab CI Runners on Amazon EKS using Terraform

I have spent a significant chunk of this year using GitLab as a host for internal code and as a continuous integration (CI) platform. GitLab uses the idea of a runner to perform the various jobs outlined in a CI workflow. By default, GitLab online accounts are granted a specific quantity of shared CI runtime minutes for free. This value has recently been adjusted down with a lot of proactive communication. I think that folks may be looking for alternative locations to host private CI runners for cost optimization. In my environment, an Amazon Elastic Kubernetes Service (EKS) cluster has been hosting ephemeral containers for GitLab’s runners for 6+ months at a cost of about $40 USD per month.

In this post, I cover all of the steps necessary to get up and running with EKS to host GitLab CI runners. I start with the inner workings and creation of an EKS cluster service role followed by a GitLab cross-account role. From there, I show how to build both of these roles with a Terraform configuration. Finally, I cover adding an IAM user to Kubernetes using RBAC and a ConfigMap to support daily operations.

Understanding the Amazon EKS Cluster Service Role

A traditional EKS cluster is made up of two types of resources: a cluster resource in EKS and one or more node instance resources in Amazon Elastic Compute Cloud (EC2). Both resource types need an AWS Identity and Access Management (IAM) role in order to function. IAM roles provide a relationship between trusted entities (the who) and policies (the what) for an account (the where). A role applied to a resource allows that resource to inherit the policy without having to fuss around with certificates, secrets, or tokens.

An EKS cluster must have a role applied so that it can manage AWS resources in the account on my behalf. This role does not exist by default. Supplying the role to an EKS cluster is required during creation and is immutable once set. By creating an EKS cluster role, I am explicitly trusting the EKS service to access multiple AWS services on behalf of my account.

Role assumption is handled by another AWS service, the Security Token Service (STS): the role’s trust relationship allows the trusted entity to perform the sts:AssumeRole action. Policy information is supplied by an AWS managed policy, which defines the actions the role is allowed to perform and on which services. Managed policies are “out of the box” and, you guessed it, managed by AWS.

Below is a diagram showing how the EKS cluster service role works:

AWS provides guidance on how to create the role using the console. However, I prefer to use Terraform to create this role (in addition to the next one). Either way, the role’s Amazon Resource Name (ARN) will be required later when setting up the EKS cluster via GitLab.
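
For readers who want a feel for the shape of such a configuration, here is a minimal Terraform sketch of a cluster service role. It is not the code from the video or the repository. The resource labels and the role name eks-cluster-service-role are placeholders chosen for illustration. The sketch assumes the AWS provider is already configured and that AmazonEKSClusterPolicy is the AWS managed policy you want attached.

```hcl
# Trust policy: the EKS service (the "who") may assume this role via STS.
data "aws_iam_policy_document" "eks_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["eks.amazonaws.com"]
    }
  }
}

# The cluster service role. The name is a placeholder, not from the post.
resource "aws_iam_role" "eks_cluster" {
  name               = "eks-cluster-service-role"
  assume_role_policy = data.aws_iam_policy_document.eks_assume_role.json
}

# Attach the AWS managed policy (the "what").
# Assumption: AmazonEKSClusterPolicy is the managed policy you need.
resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
  role       = aws_iam_role.eks_cluster.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}

# Expose the ARN, which is needed later in GitLab.
output "eks_cluster_role_arn" {
  value = aws_iam_role.eks_cluster.arn
}
```

After an apply, running terraform output eks_cluster_role_arn prints the ARN to paste into GitLab.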

Check out this video to see how I create the role using Terraform, including refactoring the code!

Understanding the GitLab Cross-Account Role

GitLab provides a managed Kubernetes experience in which provisioning and operations are largely handled by GitLab. To accomplish this, a Kubernetes cluster is provisioned using a cross-account role. This role permits the GitLab AWS account to assume the role and provision the resources required to support the cluster. More specifically, GitLab creates a CloudFormation Stack that builds all the things. 🚀

Below is a diagram showing how the GitLab cross-account role works:

A few notable points about this role:

  1. The trusted entity, GitLab’s AWS account, uses an External ID to further assist with authentication. This is an additional layer of defense.
  2. The policy is customer managed, meaning created by the customer. GitLab supplies the list of required permissions.
  3. Deletion of resources is prohibited by the policy to prevent potential harm.

The External ID value is part of the role’s conditions. The value is stored in plaintext and is not a substitute for proactive security hygiene. An example is shown below:

The policy summary shows a more extensive list of permissions. The policy does not permit delete operations for any of the defined services.

If the CloudFormation Stack must be destroyed, the task must be performed by the customer. I approve of this approach; deleting the cluster should be a rare occurrence driven by internal customer operations.

GitLab provides guidance on how to create the role using the console. However, it should come as no surprise that I prefer to use Terraform. The repository contains detailed information on the configuration and how it works. My intent is to keep the code simple and beginner-friendly. Feedback, issues, and pull requests are all welcome!
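
As a rough sketch of how the three points above map onto Terraform (and not a copy of the repository’s code), the configuration below builds a cross-account role with an External ID condition and a customer managed policy. The variable names and the policy name are hypothetical. GitLab’s account ID, the External ID, and the permission list are deliberately left as inputs because they come from GitLab and are not reproduced here. The sketch assumes the AWS provider is already configured.

```hcl
# Inputs supplied by GitLab. The variable names are hypothetical.
variable "gitlab_account_id" {
  description = "AWS account ID belonging to GitLab"
  type        = string
}

variable "gitlab_external_id" {
  description = "External ID that GitLab presents when assuming the role"
  type        = string
}

variable "gitlab_policy_json" {
  description = "JSON policy document containing the permissions GitLab lists"
  type        = string
}

# Trust policy: only GitLab's account may assume the role, and only
# when it presents the expected External ID.
data "aws_iam_policy_document" "gitlab_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${var.gitlab_account_id}:root"]
    }

    condition {
      test     = "StringEquals"
      variable = "sts:ExternalId"
      values   = [var.gitlab_external_id]
    }
  }
}

# Customer managed policy. The name is a placeholder.
resource "aws_iam_policy" "gitlab_eks" {
  name   = "gitlab-eks-policy"
  policy = var.gitlab_policy_json
}

resource "aws_iam_role" "gitlab_eks" {
  name               = "gitlab-eks-role"
  assume_role_policy = data.aws_iam_policy_document.gitlab_assume_role.json
}

resource "aws_iam_role_policy_attachment" "gitlab_eks" {
  role       = aws_iam_role.gitlab_eks.name
  policy_arn = aws_iam_policy.gitlab_eks.arn
}
```

Keeping the permission list as an input means the policy can track whatever GitLab currently documents without editing the role definition itself.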

Accessing the Cluster with Kubectl

Welcome to the Bonus Round. In this part of the article, I focus on daily operations – the part typically long forgotten!

In my scenario, one of the GitLab managed applications – the runner – was failing to deploy into the EKS cluster. I realized that I would need access to the pod logs to isolate a root cause.

Something went wrong while installing GitLab Runner
Operation failed. Check pod logs for install-runner for more details.

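However, there was a problem: the GitLab cross-account role is, by default, the only role granted RBAC access to the Kubernetes API, because EKS initially grants cluster access only to the IAM entity that created the cluster. Even for a user with full Administrator permissions in the AWS account, the API, and tools such as kubectl, will deny access with this error message: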

> kubectl get pods

error: You must be logged in to the server (Unauthorized)

My solution to this issue was to temporarily hijack the GitLab cross-account role so that I could add my IAM user to the system:masters group. This required making two changes to the GitLab cross-account role:

  1. Add my user account as a trusted entity
  2. Remove the External ID condition

Once done, I configured the AWS CLI to access the EKS cluster using the GitLab cross-account role as shown below:

aws --region us-west-2 eks update-kubeconfig --name gitlab-eks-cluster --role-arn arn:aws:iam::[account]:role/gitlab-eks-role

This allowed me to use kubectl to dig around in the logs and file an issue.

> kubectl logs install-runner -n gitlab-managed-apps

+ export 'HELM_HOST=localhost:44134'
+ helm init --client-only
+ tiller -listen localhost:44134 -alsologtostderr
+ helm upgrade runner runner/gitlab-runner --install --atomic --cleanup-on-fail --reset-values --version 0.20.1 --set 'rbac.create=true,rbac.enabled=true' --namespace gitlab-managed-apps -f /data/helm/runner/config/values.yaml
Error: failed to download "runner/gitlab-runner" (hint: running `helm repo update` may help)

In this case, the issue was a missing Helm chart. Confident that the EKS cluster was healthy, I went about adding my IAM user with a ConfigMap.

Adding an IAM User to RBAC

Adding an IAM user is accomplished by modifying the aws-auth ConfigMap. By default, a single mapRoles entry exists to form a relationship between the EKS node role (which was created via the CloudFormation Stack with the logical ID NodeInstanceRole) and two Kubernetes system groups. The ConfigMap starts life by allowing nodes to join the cluster and is further used to assign IAM users and roles.

An example is shown below:

  mapRoles: |
    - rolearn: arn:aws:iam::[account]:role/gitlab-eks-cluster-NodeInstanceRole-1234567890ABC
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes

The edit command provides an easy process for editing the aws-auth ConfigMap.

kubectl edit -n kube-system configmap/aws-auth

Adding my IAM user under the data key, in a mapUsers entry with the system:masters group, provides the needed access. Once the file is saved and closed, the effect is immediate.

  mapUsers: |
    - userarn: arn:aws:iam::[account]:user/chris.wahl
      username: chris.wahl
      groups:
        - system:masters

With this done, I reverted the hijacked GitLab cross-account role to its desired state by running the Terraform configuration once more. Finally, I removed the role/gitlab-eks-role statement in my .kube configuration to avoid assuming the GitLab role any further.
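
If you would rather keep the mapUsers entry in code than edit it by hand, one possible approach is sketched below. This is an alternative to the kubectl edit workflow I used, not something I ran for this post. It assumes a Terraform Kubernetes provider that is already authenticated to the cluster and recent enough to offer the kubernetes_config_map_v1_data resource. The variable name and resource label are hypothetical.

```hcl
# Hypothetical input: the AWS account ID that owns the IAM user.
variable "aws_account_id" {
  type = string
}

# Manages only the mapUsers key of the existing aws-auth ConfigMap.
# Other keys, such as mapRoles, are not declared here and are left alone.
resource "kubernetes_config_map_v1_data" "aws_auth_users" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    # yamlencode renders the list as the YAML string that aws-auth expects.
    mapUsers = yamlencode([
      {
        userarn  = "arn:aws:iam::${var.aws_account_id}:user/chris.wahl"
        username = "chris.wahl"
        groups   = ["system:masters"]
      }
    ])
  }

  # Take ownership of the mapUsers field even if it was previously set
  # by hand, for example with kubectl edit.
  force = true
}
```

Note the chicken-and-egg caveat: whichever identity runs this configuration must already have access to the Kubernetes API, so it does not remove the need for the initial access described earlier.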

Next Steps

Please accept a crisp high five for reaching this point in the post!

If you’d like to learn more about Cloud Architecture, or other modern technology approaches, head over to the Guided Learning page.

If there’s anything I missed, please reach out to me on Twitter. Cheers! 🙂