How to Create AWS Lambda Layers for Python

(Note: This post is a bit older; I suggest reading Importing Python Modules in AWS Lambda Functions)

I often use AWS Lambda to execute arbitrary Python glue code for use cases such as scraping API endpoints, rotating API tokens, or sending notifications. One shortcoming of this approach is the lack of pip to satisfy import requirements. While I could add the dependencies to the deployment package, this bloats the function code and increases operational toil.

Lambda layers for Python solve this shortcoming by supplying module dependencies without needing pip in the function environment. Operational toil is reduced by reusing these layers across numerous Lambda functions. But how does one include Python library dependencies in a layer, especially when running on a Windows platform?

In this post, I review how to set up Python for Windows and describe my use case scenario for needing Lambda layers for Python. I then deep dive into installing a Python package to a custom location, creating a zip file with all necessary packages and libraries, publishing the zipped code to an AWS Lambda layer, and associating the layer to an AWS Lambda function.

Python Setup for Windows

Microsoft has a great article named Get started using Python on Windows for beginners. Give this a read if you are new to using Python 3.x on the Windows platform. The article suggests installing Python from the Microsoft Store. I enjoy using this method in addition to the Windows Subsystem for Linux (WSL) for creating Python code. You can use one method, or both, depending on your use case.

I’m a heavy Visual Studio Code (VSCode) consumer. Because of this, I also leverage the Python extension for Visual Studio Code. It provides numerous benefits, including linting, debugging, and testing.

With that out of the way, I’ll dive into the guide in the following sections.

Use Case Scenario

In this scenario, I construct a Python script to scrape the GitLab GraphQL API and upload the formatted results to InfluxDB using their Python client library. The objective is to monitor continuous integration workflows and environments in a live dashboard. The script relies on the following imports:

from datetime import datetime
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

The influxdb_client package and all dependencies must be stored inside a Lambda layer before the script can be run as a Lambda function. Accomplishing this goal is broken into the following steps:

  1. Install Python Packages to a Custom Location
  2. Create the Zip File
  3. Publish an AWS Lambda Layer
  4. Associate the Layer to the Function

On with the show!

Install Python Packages to a Custom Location

A Lambda layer accepts content as a zip file, either uploaded directly or read from an Amazon Simple Storage Service (Amazon S3) bucket. The Amazon S3 bucket option is suitable for archives exceeding 50 MB in size, while a direct upload is handy for smaller packages and libraries. My requirements are relatively small and are easily sent via direct upload. The first step in this process is installing the packages locally into a custom target location.

The environment variable PYTHONUSERBASE defines the user base directory. This is used to determine the installation paths when running python setup.py install --user. I leverage this variable to force a clean installation of the package, and its dependencies, into a custom location using the code below:

$env:PYTHONUSERBASE="\Code\temp"

Next, I invoke pip with the --user parameter and the name of the Python package.

pip install --user influxdb-client
<snip>
Installing collected packages: certifi, six, python-dateutil, rx, pytz, urllib3, influxdb-client
Successfully installed certifi-2020.6.20 influxdb-client-1.9.0 python-dateutil-2.8.1 pytz-2020.1 rx-3.1.1 six-1.15.0 urllib3-1.25.9

A squeaky clean requirements.txt file is generated in the user base directory.

rx >= 3.0.1
certifi >= 14.05.14
six >= 1.10
python_dateutil >= 2.5.3
setuptools >= 21.0.0
urllib3 >= 1.15.1
pytz>=2019.1

The module, and all dependencies, are stored in the root path \Code\temp. Python uses several more nested directories based on the version of Python. In this case, it is Python38 because I’m using Python 3.8. Finally, the packages themselves are stored in the site-packages directory.

The full directory and contents are shown below:

Directory: C:\Code\temp\Python38\site-packages
Mode   Length Name
----   ------ ----
d----         __pycache__
d----         certifi
d----         certifi-2020.6.20.dist-info
d----         dateutil
d----         influxdb_client
d----         influxdb_client-1.9.0.dist-info
d----         python_dateutil-2.8.1.dist-info
d----         pytz
d----         pytz-2020.1.dist-info
d----         rx
d----         Rx-3.1.1-py3.8.egg-info
d----         six-1.15.0.dist-info
d----         tests
d----         urllib3
d----         urllib3-1.25.9.dist-info
-a---   34159 six.py

This provides all of the required content necessary to construct a custom zip file.
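
For anyone who would rather script these two steps than type them into a console, here is a minimal Python sketch of the same idea: set PYTHONUSERBASE for a single pip run and pass --user. The function names build_pip_invocation and install_to_user_base are my own placeholders, not part of pip, and the sketch assumes it runs outside a virtual environment, since pip refuses --user installs inside one.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple


def build_pip_invocation(user_base: str, package: str) -> Tuple[List[str], Dict[str, str]]:
    """Return the pip command and environment for a --user install into user_base.

    The names here are hypothetical helpers for illustration only.
    """
    # Copy the current environment so the calling session is left untouched.
    env = dict(os.environ)
    env["PYTHONUSERBASE"] = user_base
    # Run pip through the current interpreter so the matching Python version is used.
    cmd = [sys.executable, "-m", "pip", "install", "--user", package]
    return cmd, env


def install_to_user_base(user_base: str, package: str) -> None:
    """Install package and its dependencies beneath user_base (needs network access)."""
    cmd, env = build_pip_invocation(user_base, package)
    subprocess.run(cmd, env=env, check=True)
```

Calling install_to_user_base(r"\Code\temp", "influxdb-client") should mirror the console steps above. Because the variable is set only on the copied environment, there is nothing to clear afterwards.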

Alternate Installation Method

There is an alternate method for installing the required packages, although it is not as clean. This section can be skipped when using the previous method.

Install the required package and dependencies using pip.

pip install influxdb-client

Use pip to show details on the package. The details include a Requires: section that lists all libraries and packages needed by the parent package, influxdb-client. This information provides a list of packages to store into the zip file. All packages can be found at the Location: directory.

> pip show influxdb-client
Name: influxdb-client
Version: 1.9.0
Summary: InfluxDB 2.0 Python client library
Home-page: https://github.com/influxdata/influxdb-client-python
Author: None
Author-email: None
License: UNKNOWN
Location: c:\users\chris\appdata\local\packages\pythonsoftwarefoundation.python.3.8_qbz5n2kfra8p0\localcache\local-packages\python38\site-packages
Requires: setuptools, python-dateutil, rx, urllib3, certifi, pytz, six
Required-by:

This provides everything necessary to construct a custom zip file.
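
If the package list is long, reading Requires: and Location: by eye gets tedious. The sketch below parses text in the "Key: value" shape that pip show prints, as seen above. The helper names parse_pip_show and required_packages are hypothetical, and the parsing assumes one field per line.

```python
from typing import Dict, List


def parse_pip_show(output: str) -> Dict[str, str]:
    """Turn the 'Key: value' lines printed by `pip show` into a dictionary."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so URLs and Windows paths stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def required_packages(fields: Dict[str, str]) -> List[str]:
    """Split the comma-separated Requires field into individual package names."""
    requires = fields.get("Requires", "")
    return [name.strip() for name in requires.split(",") if name.strip()]
```

Two caveats worth keeping in mind: Requires: lists only the direct requirements, so nested dependencies may need the same treatment, and the distribution name is not always the folder name in site-packages (python-dateutil installs as dateutil, for example).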

Create the Zip File

AWS Lambda has very specific requirements when loading content from a zip file. Layers are extracted to the /opt directory in the function execution environment. The package contents must be stored in a folder named python in order to load properly.

I move all of the site-packages content to a new folder at \Code\python. The python folder is zipped as python.zip. Thus, every entry in the zip file lives under a top-level folder named python, for example python/influxdb_client and python/six.py.

This meets the requirements for AWS Lambda layers for Python.
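
The move-and-zip step can also be done in one pass. This is a minimal sketch using only the standard library: it walks a site-packages folder and writes each file into the archive beneath a python/ prefix, so nothing has to be copied first. The function name build_layer_zip is a placeholder of mine, and the sketch assumes the zip file is written somewhere outside the folder being archived.

```python
import zipfile
from pathlib import Path
from typing import List


def build_layer_zip(site_packages: Path, zip_path: Path) -> List[str]:
    """Zip every file under site_packages beneath a top-level python/ folder.

    Returns the archive entry names so the layout can be inspected.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(site_packages.rglob("*")):
            if path.is_file():
                # Entry names use forward slashes regardless of the host platform.
                relative = path.relative_to(site_packages)
                arcname = (Path("python") / relative).as_posix()
                archive.write(path, arcname)
                names.append(arcname)
    return names
```

With the paths from this post, the call would look like build_layer_zip(Path(r"C:\Code\temp\Python38\site-packages"), Path("python.zip")). Every returned name should start with python/.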

Publish an AWS Lambda Layer

All further steps are accomplished with the AWS Command Line Interface (CLI). It is a very useful tool that can do almost anything without having to visit the Console. 🙂

I use publish-layer-version to generate a new layer. If the layer already exists, the command creates a new version of the layer. I’m a fan of using --compatible-runtimes to make compatibility clear, although this parameter is not required.

The command below generates a new layer. Note that fileb:// is a required prefix when reading a binary file such as the zip file.

aws lambda publish-layer-version `
--layer-name influxdb-client-python `
--description "InfluxDB Client for Python 3.x" `
--compatible-runtimes python3.6 python3.7 python3.8 `
--zip-file fileb://python.zip

The command returns a JSON payload as the result. Write down the LayerVersionArn for later.

{
    "Content": {
        "Location": "https://awslambda-us-west-2-layers.s3.us-west-2.amazonaws.com/stuff",
        "CodeSha256": "stuff",
        "CodeSize": 2987965
    },
    "LayerArn": "arn:aws:lambda:us-west-2:123456789012:layer:influxdb-client-python",
    "LayerVersionArn": "arn:aws:lambda:us-west-2:123456789012:layer:influxdb-client-python:1",
    "Description": "InfluxDB Client for Python 3.x",
    "CreatedDate": "2020-07-19T02:58:33.033+0000",
    "Version": 1,
    "CompatibleRuntimes": [
        "python3.6",
        "python3.7",
        "python3.8"
    ]
}
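
Rather than writing the ARN down by hand, the payload can be read programmatically. The sketch below assumes the JSON response was captured as text (for example, redirected to a file of your choosing) and pulls out the two fields needed later. The function name layer_version_info is hypothetical.

```python
import json
from typing import Tuple


def layer_version_info(payload: str) -> Tuple[str, int]:
    """Return (LayerVersionArn, Version) from a publish-layer-version JSON response."""
    response = json.loads(payload)
    return response["LayerVersionArn"], int(response["Version"])
```

A missing key raises KeyError, which is usually what I want here: it signals that the text was not a layer response at all.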

A Quick Note on Working with Layers

I’m often interrupted by numerous things. In some cases, I lose the LayerVersionArn or forget the layer information entirely. Using list-layers gives a list of the layers – surprise!

aws lambda list-layers

The --compatible-runtime filter is handy when searching for a specific type of layer.

aws lambda list-layers --compatible-runtime python3.8

Alternatively, list-layer-versions is helpful when the layer name is known but the version history is not.

aws lambda list-layer-versions --layer-name influxdb-client-python

These commands are great when sorting through layers to find the ARN or version.
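
When the list of layers grows, searching the output by eye stops being fun. Here is a small sketch that looks up a layer by name in an already-parsed list-layers response. It assumes the response follows the documented shape, with a Layers list whose items carry LayerName and a LatestMatchingVersion object. The function name latest_version_arn is my own.

```python
from typing import Optional


def latest_version_arn(list_layers_response: dict, layer_name: str) -> Optional[str]:
    """Return the newest LayerVersionArn for layer_name, or None if it is not listed."""
    for layer in list_layers_response.get("Layers", []):
        if layer.get("LayerName") == layer_name:
            # LatestMatchingVersion describes the most recent version of the layer.
            return layer.get("LatestMatchingVersion", {}).get("LayerVersionArn")
    return None
```

The input is the dictionary produced by json.loads on the command output, so the helper works the same whether the JSON came from the CLI or from an SDK call.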

Associate the Layer to the Function

The final step! In my scenario, the function is already created. Thus, I use the update-function-configuration command to associate the new layer with the existing function. The LayerVersionArn recorded earlier becomes the value passed to the --layers parameter.

aws lambda update-function-configuration `
--function-name gitlab-dashboard `
--layers arn:aws:lambda:us-west-2:123456789012:layer:influxdb-client-python:1

The function is now using the new influxdb-client-python layer, version 1, as part of the function code. Even if a new version of the layer is published, the function code will continue to operate based on version 1. To use a new layer version with the function, run the function update command again with the new layer version number appended to the layer’s ARN.
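
Since the version is simply the last colon-separated field of the LayerVersionArn, moving to a newer version is a string edit. This sketch shows one way to do it, with a guard against passing a plain LayerArn by mistake. The function name with_layer_version is a placeholder.

```python
def with_layer_version(layer_version_arn: str, version: int) -> str:
    """Return the same layer ARN pointing at a different version number."""
    parts = layer_version_arn.split(":")
    # A layer version ARN has eight fields: arn, partition, service, region,
    # account, the literal "layer", the layer name, and the version.
    if len(parts) != 8 or parts[5] != "layer":
        raise ValueError("not a layer version ARN: " + layer_version_arn)
    parts[7] = str(version)
    return ":".join(parts)
```

The result is the value to hand to --layers when running the update command again.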

Cleanup Tasks

I destroy the temp folder containing the required modules after verifying that the function code is operational. The PYTHONUSERBASE environment variable is cleared when the console session closes. This maintains a clean working environment.
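
One way to verify the layer before deleting anything is a throwaway handler that reports whether a module can be found and where it would load from. This is only a sketch: the handler name lambda_handler and the modules key in the event are my assumptions, not anything Lambda requires. Inside Lambda, a module supplied by a layer should report a path beneath /opt/python.

```python
import importlib.util
from typing import Any, Dict


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Dict[str, Any]]:
    """Report whether each requested module can be located, without importing it."""
    # "modules" is a hypothetical event key; it defaults to the package from this post.
    modules = event.get("modules", ["influxdb_client"])
    report = {}
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for dotted names whose parent package is missing, or bad names.
            spec = None
        report[name] = {
            "found": spec is not None,
            "origin": spec.origin if spec is not None else None,
        }
    return report
```

Invoking it with a test event such as {"modules": ["influxdb_client", "rx"]} gives a quick yes or no for each dependency.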

Adding Layers in the Future

I later decide to use the requests package. Rather than update my layer, I leverage the work over at Klayers. Run the function update command an additional time while including both layers. The command is not additive – all desired layers must be declared.

aws lambda update-function-configuration `
--function-name gitlab-dashboard `
--layers arn:aws:lambda:us-west-2:123456789012:layer:influxdb-client-python:1 arn:aws:lambda:us-west-2:770693421928:layer:Klayers-python38-requests:8

The function code now uses my influxdb-client packages along with the requests package provided by Klayers. Lambda allows for 5 layers per function, leaving room to perhaps add something from AWSome Lambda Layers for Python in the future!
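
Because the update command replaces the whole list, it helps to compute the full set of layers before calling it. The sketch below merges the layers a function already has with the ones being added, swaps in a newer version when the same layer appears twice, and enforces the limit of 5 mentioned above. The names merge_layers, layer_key, and MAX_LAYERS are hypothetical.

```python
from typing import List

MAX_LAYERS = 5  # per-function layer limit mentioned in this post


def layer_key(layer_version_arn: str) -> str:
    """Strip the trailing version so two versions of one layer compare equal."""
    return layer_version_arn.rsplit(":", 1)[0]


def merge_layers(current: List[str], additions: List[str]) -> List[str]:
    """Return the complete layer list to pass to --layers."""
    merged = list(current)
    for arn in additions:
        keys = [layer_key(existing) for existing in merged]
        if layer_key(arn) in keys:
            # Same layer, different version: replace it in place to keep the order.
            merged[keys.index(layer_key(arn))] = arn
        else:
            merged.append(arn)
    if len(merged) > MAX_LAYERS:
        raise ValueError("a function accepts at most %d layers" % MAX_LAYERS)
    return merged
```

Joining the returned list with spaces gives the argument for --layers. Order is preserved on purpose, since layers are applied in the order they are listed.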

Next Steps

Please accept a crisp high five for reaching this point in the post!

If you’d like to learn more about Cloud Architecture, or other modern technology approaches, head over to the Guided Learning page.