Level up your Lambda Game with Canary Deployments
Have you ever deployed a new version of your Lambda function to production and immediately regretted it? Have you heard people advocating “testing in production” and wondered how that is possible?
In this tutorial, you will learn how to use canary deployments in SST to safely expose a small percentage of users to new versions. If sh*t happens, which it inevitably will, automatic rollbacks have your back.
1. How it works
A canary deployment is a deployment strategy that releases a new application version to a small subset of users. Let’s say that you deploy a new version of your application, and your new version includes a defect. If you send 100% of traffic to this version, you could break the application for all users.
Canary deployments aim to solve this problem by validating your new version against a small fraction of the traffic. You can, for example, opt to send 5% of traffic to the latest version for a set period while monitoring its behavior. If all metrics look good during this period, you can switch over 100% of traffic to the new version. But, if error metrics for the new version start rising, you can automatically roll back to the previous version.
The simplest way to implement canary deployments of Lambda functions is with Lambda aliases and AWS CodeDeploy.
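The promote-or-roll-back decision described above can be modeled in a few lines. This is only a conceptual sketch: the function name evaluateCanary, the per-minute error counts, and the threshold are hypothetical, and in the real setup CloudWatch alarms and CodeDeploy make this decision for you.

```typescript
// Possible outcomes once the canary has been observed for the set period.
type CanaryOutcome = "promote" | "rollback";

/**
 * Decide what to do with the new version.
 * errorsPerMinute: hypothetical error counts observed for the new version.
 * errorThreshold: errors within one minute that count as a failure.
 */
function evaluateCanary(errorsPerMinute: number[], errorThreshold: number): CanaryOutcome {
  const breached = errorsPerMinute.some((errors) => errors >= errorThreshold);
  return breached ? "rollback" : "promote";
}

console.log(evaluateCanary([0, 0, 0, 0, 0], 1)); // promote
console.log(evaluateCanary([0, 2, 0, 0, 0], 1)); // rollback
```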
1.1. Lambda Aliases
AWS Lambda lets you have multiple versions of a Lambda function deployed at the same time. A Lambda alias acts as a pointer to a specific version. An alias can also reference a second version and define the percentage of incoming events to route to it. CodeDeploy uses this feature to control the traffic weighting of the alias during a deployment.
You can configure API Gateways to point to aliases instead of functions. This means that all traffic entering your API Gateway will be routed to the different versions based on the weights in the alias.
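To make the weighting concrete, here is a simplified model of how a weighted alias splits invocations between two versions. The WeightedAlias type and routeInvocation function are hypothetical names for illustration; Lambda performs this routing internally and this is not its actual implementation.

```typescript
// Simplified model of a weighted Lambda alias.
interface WeightedAlias {
  primaryVersion: string; // version that receives the remaining traffic
  secondaryVersion?: string; // optional second version
  secondaryWeight: number; // fraction of traffic (0..1) sent to the second version
}

/**
 * Pick the version that serves one invocation.
 * `roll` is a number in [0, 1), for example Math.random(). It is passed in
 * so the function stays deterministic and easy to test.
 */
function routeInvocation(alias: WeightedAlias, roll: number): string {
  if (alias.secondaryVersion !== undefined && roll < alias.secondaryWeight) {
    return alias.secondaryVersion;
  }
  return alias.primaryVersion;
}

const live: WeightedAlias = { primaryVersion: "1", secondaryVersion: "2", secondaryWeight: 0.1 };
console.log(routeInvocation(live, 0.05)); // 2 (the new version)
console.log(routeInvocation(live, 0.5)); // 1 (the old version)
```

With a weight of 0.1, roughly one in ten invocations lands on the second version when `roll` is drawn uniformly at random.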
1.2. AWS CodeDeploy
AWS CodeDeploy is a service that helps you automate releases to AWS Lambda, Amazon ECS, and Amazon EC2. To use CodeDeploy with AWS Lambda, you need a few resources:
- A CodeDeploy Application
- A CodeDeploy Deployment Group
- A CodeDeploy Deployment Configuration
CodeDeploy Application
An application is simply a container for deployments and deployment groups.
CodeDeploy Deployment Group
A deployment group defines the target of a deployment. For Lambda deployments, the target is a Lambda alias. It also defines the deployment configuration to use, as well as any alarms that should trigger a rollback.
CodeDeploy Deployment Configuration
A deployment configuration defines the deployment strategy. CodeDeploy comes with a few built-in strategies, but you can also create your own. The built-in strategies for Lambda functions are:
- Linear: Shift traffic in equal increments with an equal number of minutes between each increment. For example, 10% every 10 minutes.
- Canary: Shift traffic in two increments. For example, first 10% for 5 minutes and then 100% afterward.
- All at once: Shift all traffic to the new version immediately.
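The three strategy types can be compared by computing the traffic schedule each one produces. The sketch below is a conceptual model only; the Strategy type and trafficSchedule function are hypothetical and do not correspond to any CodeDeploy API.

```typescript
type Strategy =
  | { kind: "linear"; percentage: number; intervalMinutes: number }
  | { kind: "canary"; percentage: number; intervalMinutes: number }
  | { kind: "allAtOnce" };

// Share of traffic on the new version from a given minute onward.
interface Step {
  minute: number;
  newVersionPercent: number;
}

function trafficSchedule(strategy: Strategy): Step[] {
  switch (strategy.kind) {
    case "allAtOnce":
      return [{ minute: 0, newVersionPercent: 100 }];
    case "canary":
      // Two increments: the canary share first, then everything.
      return [
        { minute: 0, newVersionPercent: strategy.percentage },
        { minute: strategy.intervalMinutes, newVersionPercent: 100 },
      ];
    case "linear": {
      if (strategy.percentage <= 0) throw new Error("percentage must be positive");
      // Equal increments with equal waits until 100% is reached.
      const steps: Step[] = [];
      let percent = 0;
      let minute = 0;
      while (percent < 100) {
        percent = Math.min(100, percent + strategy.percentage);
        steps.push({ minute, newVersionPercent: percent });
        minute += strategy.intervalMinutes;
      }
      return steps;
    }
  }
}

// Two steps: 10% at minute 0, then 100% at minute 5.
console.log(trafficSchedule({ kind: "canary", percentage: 10, intervalMinutes: 5 }));
```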
2. Tutorial
This tutorial assumes that your local environment is configured with AWS credentials. The tutorial uses SST to define and deploy infrastructure. You should be able to accomplish the same with any of the other CloudFormation-based tools (SAM, CDK, etc.).
2.1. Create a new SST Project
Initialize a new SST project. We use the standard/api template as a starting point for illustrative purposes.
Open up the generated project in your favorite editor and open the stacks/MyStack.ts file. Start by removing everything from the stack so that you start from a clean slate:
Pick your region
SST defaults to the us-east-1 region. You can specify another region in sst.config.ts.
2.2. Add a simple Lambda-backed API
Let’s add a simple API backed by a Lambda function to the stack. The standard/api template you used comes pre-baked with a simple handler function in packages/functions/src/lambda.ts that you can use for the purpose of this tutorial. This handler will be invoked when a GET request hits the root path (/) of the API Gateway.
In stacks/MyStack.ts, add the following:
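A minimal sketch of what this stack could look like, assuming SST v2 constructs imported from sst/constructs. The construct IDs ("MyFunc", "Api") and the ApiEndpoint output name are assumptions; adjust them to match your project.

```typescript
import { StackContext, Api, Function } from "sst/constructs";

export function API({ stack }: StackContext) {
  // Lambda function backed by the handler that ships with the template.
  const func = new Function(stack, "MyFunc", {
    handler: "packages/functions/src/lambda.handler",
  });

  // API that invokes the function when a GET request hits the root path.
  const api = new Api(stack, "Api", {
    routes: {
      "GET /": func,
    },
  });

  // Print the endpoint URL after a deploy.
  stack.addOutputs({ ApiEndpoint: api.url });
}
```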
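Deploy your application to your AWS account using the SST CLI, for example with npx sst deploy --stage prod.
Test your API to make sure everything is set up correctly, for example by requesting the root path (/) of the deployed API with curl.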
2.3. Create a Lambda Alias
In stacks/MyStack.ts, add a Lambda alias to your Lambda function and update your API Gateway to route traffic to the alias instead:
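One way this could look, again assuming SST v2. The sketch uses the CDK addAlias method on the function and the cdk.function route property of the SST Api construct; treat the latter in particular as an assumption to verify against the SST version you use. The alias name "live" matches the Aliaslive resource that shows up in the deployment output later in this tutorial.

```typescript
import { StackContext, Api, Function } from "sst/constructs";

export function API({ stack }: StackContext) {
  const func = new Function(stack, "MyFunc", {
    handler: "packages/functions/src/lambda.handler",
  });

  // Alias named "live" that points at the version created by this deploy.
  const alias = func.addAlias("live");

  const api = new Api(stack, "Api", {
    routes: {
      // Route to the alias rather than to the function itself.
      "GET /": { cdk: { function: alias } },
    },
  });

  stack.addOutputs({ ApiEndpoint: api.url });
}
```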
Deploy again with npx sst deploy --stage prod to ensure the alias works.
2.4. Create an Alarm
CodeDeploy can automatically roll back a deployment in case one or more specified alarms get triggered during the deployment. To illustrate this, you will add an alarm that triggers when the new Lambda version produces any errors.
It is a bit annoying to have to specify the values of dimensionsMap manually. It would be great if CDK could infer them automatically when you use alias.metricErrors or func.currentVersion.metricErrors. This is reported as a bug in a GitHub issue.
In the stack, add the following:
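A minimal sketch of such an alarm, assuming CDK v2 (aws-cdk-lib). The helper name createCanaryErrorAlarm, the construct ID, the alarm name format, and the threshold of one error per one-minute period are illustrative assumptions; in the tutorial stack you would inline the body rather than call a helper.

```typescript
import { Duration } from "aws-cdk-lib";
import * as cloudwatch from "aws-cdk-lib/aws-cloudwatch";
import * as lambda from "aws-cdk-lib/aws-lambda";
import { Construct } from "constructs";

// Hypothetical helper so the sketch stands on its own.
export function createCanaryErrorAlarm(
  scope: Construct,
  func: lambda.Function,
  alias: lambda.Alias,
): cloudwatch.Alarm {
  // Version created by the current deploy (X+1 in the explanation below).
  const version = func.currentVersion.version;

  return new cloudwatch.Alarm(scope, "CanaryErrorAlarm", {
    // Including the version in the name forces a new alarm on every deploy.
    alarmName: `${func.functionName}-${version}-errors`,
    metric: new cloudwatch.Metric({
      namespace: "AWS/Lambda",
      metricName: "Errors",
      statistic: "Sum",
      period: Duration.minutes(1),
      // Only count errors from the new version when invoked through the alias.
      dimensionsMap: {
        FunctionName: func.functionName,
        Resource: `${func.functionName}:${alias.aliasName}`,
        ExecutedVersion: version,
      },
    }),
    threshold: 1,
    evaluationPeriods: 1,
    comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
    // No data points means no invocations failed, so do not alarm.
    treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
  });
}
```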
This alarm will check for any errors in the newly deployed Lambda version. If your current Lambda version is X, a new deploy will create a version X+1 alongside X and send a percentage of traffic to it. Using the dimensions above ensures only the X+1 version can trigger the alarm.
Alarm Name
The alarm is named after the function name and version, which ensures that the alarm is recreated during a deployment. If the currently deployed version is experiencing errors, the alarm could be in an alarm state when you deploy a new version, resulting in an instant rollback. Updating the underlying metric of the alarm does not reset the alarm status. This workaround creates a new alarm with the correct state and underlying metric for each deployment.
2.5. Add CodeDeploy Configuration
You must first create a CodeDeploy Application. In your stack, add:
Next, add a new Deployment Group to your CodeDeploy Application, referencing the Lambda alias and alarm you created earlier:
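A sketch of both resources, assuming CDK v2 (aws-cdk-lib). The helper name createCanaryDeployment and the construct IDs are hypothetical, and the alias and alarm are passed in as parameters so the snippet stands alone; in the tutorial stack you would inline the body after the alias and alarm definitions.

```typescript
import * as cloudwatch from "aws-cdk-lib/aws-cloudwatch";
import * as codedeploy from "aws-cdk-lib/aws-codedeploy";
import * as lambda from "aws-cdk-lib/aws-lambda";
import { Construct } from "constructs";

// Hypothetical helper so the sketch stands on its own.
export function createCanaryDeployment(
  scope: Construct,
  alias: lambda.Alias,
  alarm: cloudwatch.IAlarm,
): codedeploy.LambdaDeploymentGroup {
  // Container for the deployment group and its deployments.
  const application = new codedeploy.LambdaApplication(scope, "Application");

  return new codedeploy.LambdaDeploymentGroup(scope, "DeploymentGroup", {
    application,
    alias, // deployment target
    alarms: [alarm], // alarms that trigger a rollback
    // Built-in strategy: 10% of traffic for 5 minutes, then 100%.
    deploymentConfig: codedeploy.LambdaDeploymentConfig.CANARY_10PERCENT_5MINUTES,
  });
}
```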
This creates a CodeDeploy Deployment Group that uses a built-in deployment strategy. The strategy sends 10% of the traffic to the new version and holds that weight for five minutes before shifting the remaining traffic. If any specified alarm is triggered during this window, CodeDeploy automatically rolls back the deployment and sends 100% of traffic to the old version. If this happens, CloudFormation will roll back the stack to the previous state.
Deployment Time
CloudFormation will remain in the UPDATE_IN_PROGRESS state until the CodeDeploy deployment is complete.
2.6. Trying it out
Change the return value of your Lambda function (packages/functions/src/lambda.ts):
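For example, the handler could return the new message as shown below. This sketch is a plain async handler without any SST wrapper, which is an assumption; the handler generated by the template may be wrapped differently, in which case only the body string needs to change.

```typescript
// New version of the handler: returns the canary message.
export const handler = async () => {
  return {
    statusCode: 200,
    body: "Hello there. I'm a canary.",
  };
};
```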
Deploy your stack again with npx sst deploy --stage prod. After some time you should notice the deployment getting stuck at API MyFunc/Aliaslive AWS::Lambda::Alias UPDATE_IN_PROGRESS. This means that the CodeDeploy deployment is in progress.
Hit your endpoint a few times with curl or similar. Most of the responses should be "Hello world. The time is ..." but you should see a few "Hello there. I'm a canary." as well. After five minutes have passed, the deployment should be complete, and all future requests will return the new message.
Let’s simulate a failed deployment. Inject an error in your handler:
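A simple way to do this is to throw from the handler, as sketched below. The error message is arbitrary, and the plain async handler shape is an assumption; the template's handler may be wrapped differently. An uncaught error counts toward the Lambda Errors metric, which is what the alarm watches.

```typescript
// Deliberately broken handler: every invocation of this version fails.
export const handler = async () => {
  throw new Error("Simulated canary failure");
};
```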
Deploy again and hit your endpoint while the CodeDeploy deployment is in progress. Some of the requests should hit the canary and return a 500 Internal Server Error. Shortly after this happens, the alarm will be triggered and the deployment will be rolled back. All requests should then return "Hello there. I'm a canary." again. The CloudFormation stack will also be rolled back to its previous state.
2.7. Customizing the Deployment Strategy
CodeDeploy comes with a couple of built-in deployment strategies. If these do not fit your needs, you can create a custom deployment config. Imagine you want to send 50% of the traffic to the new version, and let it “bake” for 10 minutes. Simply create a new deployment configuration and update the deployment group:
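A sketch of such a configuration, assuming a recent CDK v2 release in which LambdaDeploymentConfig accepts a trafficRouting property (older releases used a separate custom config construct instead). The helper name and construct ID are hypothetical. Pass the returned config as deploymentConfig on the deployment group in place of the built-in one.

```typescript
import { Duration } from "aws-cdk-lib";
import * as codedeploy from "aws-cdk-lib/aws-codedeploy";
import { Construct } from "constructs";

// Hypothetical helper so the sketch stands on its own.
export function createCustomCanaryConfig(scope: Construct): codedeploy.LambdaDeploymentConfig {
  return new codedeploy.LambdaDeploymentConfig(scope, "Canary50Percent10Minutes", {
    // Send 50% of traffic to the new version, wait 10 minutes, then shift the rest.
    trafficRouting: codedeploy.TrafficRouting.timeBasedCanary({
      interval: Duration.minutes(10),
      percentage: 50,
    }),
  });
}
```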
2.8. Managing Different Environments
Currently, your application will use the same deployment strategy in all environments. This is probably not what you want. In ephemeral environments and when using SST’s Live Lambda Development you most likely want instant deploys without any canary. You can use the stage parameter to conditionally define the deployment configuration to use:
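The selection logic can be sketched independently of the stack. The function name deploymentConfigNameFor and the choice of "prod" as the only canary stage are assumptions for illustration; the two strings are the names of CodeDeploy's predefined Lambda configurations.

```typescript
// Names of two of CodeDeploy's predefined Lambda deployment configurations.
const CANARY = "CodeDeployDefault.LambdaCanary10Percent5Minutes";
const ALL_AT_ONCE = "CodeDeployDefault.LambdaAllAtOnce";

// Stages that should get a canary rollout. "prod" follows the tutorial's
// deploy command; any other entries would be your own choice.
const CANARY_STAGES = new Set(["prod"]);

function deploymentConfigNameFor(stage: string): string {
  return CANARY_STAGES.has(stage) ? CANARY : ALL_AT_ONCE;
}

console.log(deploymentConfigNameFor("prod")); // CodeDeployDefault.LambdaCanary10Percent5Minutes
console.log(deploymentConfigNameFor("dev")); // CodeDeployDefault.LambdaAllAtOnce
```

In the stack itself, the same condition would pick between the CDK constants LambdaDeploymentConfig.CANARY_10PERCENT_5MINUTES and LambdaDeploymentConfig.ALL_AT_ONCE, using the stage name that SST exposes to the stack.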
3. Conclusion
In this tutorial, you have learned how to use Lambda aliases and CodeDeploy to do canary deployments of your Lambda functions. You have also learned how to create custom deployment strategies and how to use different strategies for different environments.
With this knowledge, you can improve the robustness and resilience of your serverless architectures, and you are one step closer to testing in production.