How to Query GraphQL APIs with Python

In the post called How to Create AWS Lambda Layers for Python, I briefly touch upon the need to retrieve workflow and environment data from GitLab’s GraphQL API. I publish this data to an internal dashboard to track the status of workflows (success, failure, skipped) and the state of environments (available, stopped). The dashboard uses InfluxDB as a data source.

Hang on, folks, we’re about to query a GraphQL API

In this post, I take the concepts described in An Introduction to GraphQL Queries and Mutations and apply them to the GitLab GraphQL API. I start by exploring the GraphQL API schema for GitLab and providing an example of the results. Then, I define a Python function that can query any GraphQL API. Finally, I touch on translating the results into a payload compatible with InfluxDB’s line protocol.

Exploring the GraphQL API Schema

GitLab hosts a browser-based integrated development environment (IDE) using GraphiQL (pronounced “graphical”). It is supremely handy for walking through various GraphQL schemas to find desired information.

A few things I like about GraphiQL:

  • Information on queries, mutations, and fields is found in the right-hand panel, using the same hierarchy as the API.
  • Pressing Ctrl + Space lists the fields available at the current cursor position.
  • The “play” button sends a live request to the API with the user’s credentials supplied.
Using Ctrl + Space to search across applicable fields

The two query fields I am interested in are environments and pipelines. These are both found within the projects field along with a membership argument to limit results to projects of which my account is a member. Additionally, the first: 1 argument for pipelines limits results to the most recent pipeline run.

Once I tweak the query in GraphiQL to provide the desired results, I save the query into a Python variable named gitlabQuery for later use.

gitlabQuery = """
{
  projects(membership: true) {
    nodes {
      name
      fullPath
      environments {
        nodes {
          name
          state
        }
      }
      pipelines(first: 1) {
        nodes {
          status
          duration
          finishedAt
        }
      }
    }
  }
}
"""

Example GraphQL Query Results

What do these results look like? I’ve snipped out one object from the query results as an example of what to expect:

{
    "name": "lab-azure-site-deploy-eus",
    "fullPath": "string",
    "environments": {
        "nodes": [
            {
                "name": "lab",
                "state": "stopped"
            }
        ]
    },
    "pipelines": {
        "nodes": [
            {
                "status": "SUCCESS",
                "duration": 1396,
                "finishedAt": "2020-07-17T06:29:31Z"
            }
        ]
    }
}

This lab-azure-site-deploy-eus project provides Infrastructure as a Service (IaaS) cloud resources in the Azure East US region. It is used for on-demand demonstrations, hence the lab environment name, and is stopped when not in use to save cost. Additionally, the last pipeline run took 1396 seconds (about 23 minutes), finished on 17 July 2020, and produced a status of success.
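That reading of the example object can be reproduced in a few lines of Python. This is a minimal sketch, assuming the snipped object has already been loaded into a dictionary (copied inline here as project) and that the project has at least one pipeline; summarize_project is a hypothetical helper, not part of the original script.

```python
from datetime import datetime, timezone

# The example object from above, trimmed to the fields used here.
project = {
    "name": "lab-azure-site-deploy-eus",
    "environments": {"nodes": [{"name": "lab", "state": "stopped"}]},
    "pipelines": {
        "nodes": [
            {"status": "SUCCESS", "duration": 1396, "finishedAt": "2020-07-17T06:29:31Z"}
        ]
    },
}


def summarize_project(project):
    """Return a short, human-readable summary of one project node."""
    env_states = {e["name"]: e["state"] for e in project["environments"]["nodes"]}
    # Assumes at least one pipeline; first: 1 returns only the most recent run.
    pipeline = project["pipelines"]["nodes"][0]
    minutes, seconds = divmod(pipeline["duration"], 60)  # duration is in seconds
    finished = datetime.strptime(pipeline["finishedAt"], "%Y-%m-%dT%H:%M:%SZ")
    finished = finished.replace(tzinfo=timezone.utc)  # the trailing Z means UTC
    return (
        f"{project['name']}: environments {env_states}, last pipeline "
        f"{pipeline['status'].lower()} in {minutes}m {seconds}s, "
        f"finished {finished:%d %B %Y}"
    )


print(summarize_project(project))
```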

Armed with a working GraphQL query, it is time to switch from GraphiQL to Python for further scripted queries.

Defining a Python Function for GraphQL Queries

I use a simple Python function named run_query to send a request to an API. I found a slightly different version of this function on GitHub and altered it to suit my needs – kudos to Andrew Mulholland.

The function accepts the URI, the query (as defined earlier in this post), the expected status code, and the authentication headers. If the response carries any other status code, the function raises an exception.

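import requests

def run_query(uri, query, statusCode, headers):
    # Send the query as a JSON body ({"query": ...}) in an HTTP POST
    request = requests.post(uri, json={'query': query}, headers=headers)
    if request.status_code == statusCode:
        # Parse the JSON response into Python dictionaries and lists
        return request.json()
    else:
        raise Exception(f"Unexpected status code returned: {request.status_code}")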

The next step is defining the variables needed for the request. I define the URI, headers, and status code directly in the Python script, since I don’t expect these values to change very often, if ever. The token, however, is both sensitive and frequently changed. It is stored elsewhere (in Vault) and retrieved as an environment variable when the script is invoked by an AWS Lambda function.

gitlabURI = 'https://gitlab.com/api/graphql'
gitlabToken = 'string'
gitlabHeaders = {"Authorization": "Bearer " + gitlabToken}
gitlabStatusCode = 200
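Instead of the 'string' placeholder, the token can be read from the environment at runtime. Here is a minimal sketch, assuming the token is exposed under a variable named GITLAB_TOKEN. That name and the gitlab_headers helper are both hypothetical, so substitute whatever name your Lambda configuration or Vault integration actually sets.

```python
import os


def gitlab_headers(env_var="GITLAB_TOKEN"):
    """Build the Authorization header from a token stored in the environment.

    GITLAB_TOKEN is a hypothetical variable name; raises KeyError if unset.
    """
    token = os.environ[env_var]
    return {"Authorization": "Bearer " + token}


# Example only: use a throwaway variable and fake token so the sketch runs locally.
os.environ["EXAMPLE_GITLAB_TOKEN"] = "example-token"
headers = gitlab_headers("EXAMPLE_GITLAB_TOKEN")
print(headers)  # {'Authorization': 'Bearer example-token'}
```

Raising KeyError when the variable is missing is a deliberate choice: a request sent with an empty bearer token would otherwise fail later, in a less obvious way.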

Finally, I execute the query while passing along all required information. The results are returned to a variable named result.

result = run_query(gitlabURI, gitlabQuery, gitlabStatusCode, gitlabHeaders)

Working with the Results

The result variable contains the value returned by request.json(): the API’s JSON response, already parsed into Python dictionaries and lists. A structured object is far easier to parse than a large string payload that would need scraping! 🙂

As an example, these rudimentary for loops walk the parsed payload and store each pipeline as a string in InfluxDB’s line protocol format.

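payloadWorkflow = []
# Walk every project, then its most recent pipeline (first: 1 in the query)
for n in result["data"]["projects"]["nodes"]:
    for v in n["pipelines"]["nodes"]:
        # Skip pipelines that report no status
        if v["status"] is not None:
            name = n["name"]
            status = v["status"]
            duration = v["duration"]
            finishedAt = v["finishedAt"]
            # Measurement and tags, then a space, then the status as a string field
            payloadWorkflow.append(f"workflow,project={name},duration={duration},finishedAt={finishedAt} status=\"{status}\"")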

Once the payload has been translated, it is sent over to InfluxDB by way of the influxdb-client for Python. Sensitive information has been replaced with the word string.

influxToken = "string"
influxOrg = "string"
influxBucket = "string"
influxClient = InfluxDBClient(url="https://us-west-2-1.aws.cloud2.influxdata.com", token=influxToken)

write_api = influxClient.write_api(write_options=SYNCHRONOUS)
write_api.write(influxBucket, influxOrg, payloadWorkflow)
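The lines built in the loop above carry no timestamp, so InfluxDB stamps each point with the time it receives the write. If the pipeline’s finish time is the more meaningful moment, finishedAt could be appended as the line’s timestamp instead. Below is a minimal sketch, assuming nanosecond precision (the line protocol default) and whole-second UTC timestamps like the one in the example result. to_epoch_ns and with_timestamp are hypothetical helpers.

```python
import calendar
from datetime import datetime


def to_epoch_ns(finished_at):
    """Convert a whole-second UTC timestamp such as 2020-07-17T06:29:31Z to epoch nanoseconds."""
    parsed = datetime.strptime(finished_at, "%Y-%m-%dT%H:%M:%SZ")
    # calendar.timegm interprets the time tuple as UTC (time.mktime would use local time).
    return calendar.timegm(parsed.timetuple()) * 1_000_000_000


def with_timestamp(line, finished_at):
    """Append the timestamp as the final, space-separated element of a line."""
    return f"{line} {to_epoch_ns(finished_at)}"


line = 'workflow,project=lab-azure-site-deploy-eus status="SUCCESS"'
print(with_timestamp(line, "2020-07-17T06:29:31Z"))
# workflow,project=lab-azure-site-deploy-eus status="SUCCESS" 1594967371000000000
```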

This InfluxDB table displays the stored results.

Simple but effective

The steps up to this point should provide enough information to successfully query the GraphQL API of choice using Python. What is ultimately done with that information will be driven by the use case being solved.

Next Steps

Please accept a crisp high five for reaching this point in the post!

If you’d like to learn more about APIs, or other modern technology approaches, head over to the Guided Learning page.

If there’s anything I missed, please reach out to me on Twitter. Cheers! 🙂