Observability Best Practices when running FastAPI in a Lambda
Serverless makes it incredibly easy to get an API up and running in record time. But how do you achieve a high level of observability in a distributed architecture? In this post, we will be looking at instrumenting logging, metrics, and tracing capabilities for a FastAPI application using AWS Lambda Powertools for Python.
Note
This guide assumes that you know how to build and deploy a SAM application using the AWS SAM CLI. If you need a refresher on that, please refer to my previous blog post.
What is observability?
The rise of cloud and serverless computing has made a big impact on how we build, scale, and manage applications. Developers can now independently build and deploy applications with unmatched agility and speed. With distributed applications becoming more and more common, we must take care to not lose observability in our systems and services. Before long, we will get to a point where a single request traverses our architecture through any number of functions, containers, queues, and other services.
By implementing best practices when it comes to logging and metrics, as well as being able to trace a request through the entire system, we can gain a better understanding of how our applications are performing. The goal of observability is to instrument our applications so that when an issue occurs somewhere in our distributed system, we can quickly locate the root cause of the issue. Good observability also gives great insights into how our applications and services are being used, valuable insights that can guide future business decisions.
Observability can be achieved in several ways, and in this article you will learn how to instrument a FastAPI application, running inside a Lambda function, using the AWS Lambda Powertools for Python library. We will focus on implementing three different capabilities in our application.
- Structured Logging where we add contextual information to logs, such as request and correlation IDs, Lambda and FastAPI context, service name, exception information, and more.
- Metrics so that we can track how our application is used and how it is performing.
- Tracing so that we can trace requests as they travel through our systems.
Sample FastAPI Application
In this example, we will use a simple FastAPI application: a Pets API that allows you to add, remove, list, fetch, and update pets stored in DynamoDB. We will use a REST API Gateway to act as a proxy in front of the application.
Note
As shown in the AWS Docs, HTTP APIs do not (at the time of writing) support tracing with AWS X-Ray, which is why we have to use a REST API instead.
To start, we will need the following files (they are also available on GitHub):
example/samconfig.toml
example/template.yml
example/src/requirements.txt
example/src/app/__init__.py
example/src/app/dynamo.py
example/src/app/models.py
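To give a rough idea of the starting point, here is a minimal sketch of what the baseline example/src/app/__init__.py might look like (route bodies and helper names such as dynamo.get_pet are my assumptions; the repository has the real code):

```python
from fastapi import FastAPI
from mangum import Mangum

from . import dynamo
from .models import Pet

app = FastAPI()


@app.get("/pets/{pet_id}")
async def get_pet(pet_id: str):
    print(f"Fetching pet {pet_id}")  # Plain print logging; improved later in this post
    return dynamo.get_pet(pet_id)


@app.post("/pets")
async def create_pet(pet: Pet):
    print(f"Creating pet {pet.name}")
    return dynamo.put_pet(pet)


# Mangum translates API Gateway events into ASGI requests and back
handler = Mangum(app)
```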
Checking the baseline
Before we add more advanced features to the application, let’s deploy the API in its current state to establish a baseline. While testing the API, I have used this Python script. The script uses boto3 to fetch the API URL from the CloudFormation Stack outputs. It then runs a sequence of requests towards the API, touching all the endpoints. You could also issue requests with cURL, through the Swagger UI available at https://API_URL/STAGE/docs, or with something like Postman. The choice is yours.
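A minimal sketch of how that URL lookup might work (the stack name and output key are assumptions):

```python
import boto3

cloudformation = boto3.client("cloudformation")

# Read the deployed stack's outputs (stack name and output key are assumptions)
stack = cloudformation.describe_stacks(StackName="fastapi-pets")["Stacks"][0]
api_url = next(
    output["OutputValue"]
    for output in stack["Outputs"]
    if output["OutputKey"] == "ApiUrl"
)
print(api_url)
```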
After making a few requests, navigate to the CloudWatch console and go to Logs Insights. You should see a log group called /aws/lambda/FUNCTION_NAME in the Select log group(s) dropdown.
The resulting log records contain almost no useful information, which is because right now we are only using simple print(...) statements in the code. Let’s change that, shall we?
AWS Lambda Powertools Python
Lambda Powertools Python is a library that comes fully packed with utilities that make it easy to adopt best practices when it comes to the observability of AWS Lambda functions. We will focus on the three core utilities: Logger, Metrics, and Tracer.
First, add the library to requirements.txt.
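Something like this (unpinned here for brevity; pin the version in a real project):

```text
aws-lambda-powertools
```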
In the application code, you will need to import and initialize the three utilities in a new file, example/src/app/utils.py.
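A minimal version might look like this, relying on the environment-variable configuration we will add to the SAM template later (the object names logger, metrics, and tracer are used throughout the rest of this post):

```python
from aws_lambda_powertools import Logger, Metrics, Tracer

# Service name and metrics namespace are read from the
# POWERTOOLS_SERVICE_NAME and POWERTOOLS_METRICS_NAMESPACE environment variables
logger = Logger()
metrics = Metrics()
tracer = Tracer()
```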
Logging
Let’s start by adding some structured logging to the application. We will use the Logger utility from the Lambda Powertools library for this.
Upgrade from print statements
In the dynamo.py file, add from .utils import logger and change all the print(...) statements to logger.info(...).
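A sketch of what one function in dynamo.py might look like afterwards (table, key, and function names are assumptions):

```python
import boto3

from .utils import logger

table = boto3.resource("dynamodb").Table("pets")  # Table name is an assumption


def get_pet(pet_id: str) -> dict:
    # Was: print(f"Fetching pet {pet_id}")
    logger.info(f"Fetching pet {pet_id}")
    response = table.get_item(Key={"id": pet_id})
    return response.get("Item")
```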
After a few requests, the experience in CloudWatch should have gotten a tad better.
Here we can see the power of structured logging. CloudWatch has identified fields such as level, location, message, service, and so on. We can now use these fields to filter queries. If we wanted to see only error logs, we could add a filter to the query.
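For example, a Logs Insights query along these lines:

```text
fields @timestamp, message, location
| filter level = "ERROR"
| sort @timestamp desc
| limit 20
```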
What is that service_undefined value in the service field, you ask? That’s right, we forgot one thing in the SAM template. We can control what service name Lambda Powertools uses by setting environment variables on the function. If you have a lot of different services, being able to easily filter logs by service name will be critical when you are troubleshooting a problem in production.
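In the SAM template, that could look something like this (the service name and namespace values are examples; the namespace variable is used by the Metrics utility later on):

```yaml
Globals:
  Function:
    Environment:
      Variables:
        POWERTOOLS_SERVICE_NAME: pets-api
        POWERTOOLS_METRICS_NAMESPACE: PetsApi
```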
That wasn’t so hard, right? Just a few lines of code, and we now have nicely structured logs in the FastAPI application. Let’s explore some more features in the Lambda Powertools Logger.
Lambda context
Powertools makes it easy to add information from the Lambda context to logs with the inject_lambda_context decorator.
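On a plain Lambda function, the standard usage looks like this:

```python
from aws_lambda_powertools import Logger

logger = Logger()


@logger.inject_lambda_context
def handler(event, context):
    logger.info("Hello from Lambda")
    return {"statusCode": 200}
```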
But we do not have a handler function, do we? We have a Mangum object wrapping the FastAPI application. Luckily, the Mangum object acts as a handler function, so we can decorate it directly in example/src/app/__init__.py.
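Something along these lines:

```python
from mangum import Mangum

from .utils import logger

handler = Mangum(app)
# The Mangum object is callable like a Lambda handler, so it can be decorated too
handler = logger.inject_lambda_context(handler, clear_state=True)
```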
Here, we set the parameter clear_state=True to clear the state on each invocation. This is useful if you want to ensure that the logs are not polluted with state from previous invocations.
So, one more line of code. What did that line give us in the CloudWatch console?
Some information about the Lambda context, such as function name, memory configuration, as well as whether the invocation required a cold start or not. Nice!
Correlation ID
Correlation IDs can be used to uniquely identify a request as it traverses multiple services. This is a key component to being able to correlate logs from different services in distributed systems.
In our example, the API will accept an optional X-Correlation-Id header, and if it is not present, it will use the request ID from the Lambda context. The correlation ID can then be added to requests towards downstream services (if any), to be able to get a complete view of the request flow. The Logger utility includes two helper functions to handle correlation IDs: logger.set_correlation_id("ID") and logger.get_correlation_id().
Since we want to extract the correlation ID from every request, as well as return it in every response, we will implement this using a FastAPI middleware.
In example/src/app/__init__.py, add the middleware.
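A sketch of what it might look like, assuming Mangum’s convention of exposing the Lambda context in the ASGI scope under the aws.context key:

```python
from fastapi import Request

from .utils import logger


@app.middleware("http")
async def add_correlation_id(request: Request, call_next):
    # Prefer the incoming header; fall back to the Lambda request ID
    corr_id = request.headers.get("x-correlation-id")
    if not corr_id:
        # Mangum stores the Lambda context in the ASGI scope
        corr_id = request.scope["aws.context"].aws_request_id

    logger.set_correlation_id(corr_id)

    response = await call_next(request)
    # Return the correlation ID so callers can correlate their own logs
    response.headers["X-Correlation-Id"] = corr_id
    return response
```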
Hit the API with a few requests. If you supply an X-Correlation-Id header in the request, you should see the same X-Correlation-Id in the response. If you do not supply one, one will be generated for you. If the application makes further requests to downstream services, the correlation ID could be retrieved with logger.get_correlation_id() and passed on to the downstream service. Navigate to CloudWatch and take a look at the shiny new correlation_id field.
FastAPI context
Let’s add some information about the FastAPI request context to the logs. Off the top of my head, it would be nice to be able to filter logs by the request path, request route, and request method. Since we have the Request object in the middleware created above, couldn’t we add this functionality there? We can’t, because middlewares are executed before any routing is done in the FastAPI application. So if we want to log the routes as they are declared in the application, such as /pets/{pet_id}, we need to use a custom APIRoute class.
Add a new file, example/src/app/router.py, containing a custom route class.
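A sketch, using the logger’s append_keys to attach the request context to all subsequent log records (the class name LoggerRouteHandler is my choice):

```python
from typing import Callable

from fastapi import Request, Response
from fastapi.routing import APIRoute

from .utils import logger


class LoggerRouteHandler(APIRoute):
    def get_route_handler(self) -> Callable:
        original_route_handler = super().get_route_handler()

        async def route_handler(request: Request) -> Response:
            # Routing has already happened here, so self.path holds the declared
            # route (e.g. /pets/{pet_id}) rather than the actual request path
            logger.append_keys(
                fastapi={
                    "path": request.url.path,
                    "route": self.path,
                    "method": request.method,
                }
            )
            return await original_route_handler(request)

        return route_handler
```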
Here we add a few fields to the logger so that, for example, every future logger.info(...) call includes those fields. If you do not fully understand how the custom APIRoute works, please read the FastAPI documentation. It explains it a lot better than I possibly could.
Now, to use the custom APIRoute class, we need to wire it up in example/src/app/__init__.py.
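Assuming the class name from the sketch above:

```python
from fastapi import FastAPI

from .router import LoggerRouteHandler

app = FastAPI()
# Use the custom route class for all routes registered on the app
app.router.route_class = LoggerRouteHandler
```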
Let’s see what we have in CloudWatch this time.
Look at that beauty. We now have filterable fields for the method, route, and actual path. Want to get all logs that occurred when someone fetched a specific pet with ID 123456? Simply add a filter to the query like | filter fastapi.method = "GET" and fastapi.path = "/pets/123456". Want to see all logs from all calls to the DELETE /pets/{pet_id} route? You know what to do.
Exceptions
The Logger utility can also help us understand and debug errors more easily by logging exceptions with logger.exception(...). To illustrate how Lambda Powertools can help us here, let us first add an endpoint in which we inject an error, in example/src/app/__init__.py.
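For example, an endpoint that deliberately blows up (the exact error does not matter):

```python
@app.get("/fail")
async def fail():
    # Deliberately raise an unhandled exception (KeyError)
    some_dict = {}
    return some_dict["missing_key"]
```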
Call the endpoint, and you should receive a 500 Internal Server Error. Head over to CloudWatch and see how the exception was logged.
Looks like we are missing a lot of information here, such as correlation ID, route, service, and all the other fields we added previously. Let’s fix that by adding a custom exception handler.
Warning
Creating an exception handler that catches all exceptions does not seem to be fully supported. I’ve tried just creating an exception handler for exceptions of type Exception, and it seems to enter the handler fine and the logging works. However, it does not catch the exception fully, so the exception is still propagated and finally caught in the Mangum event loop. It also does not enter the middleware that sets the correlation ID in the response.
The issue seems to be resolved by adding a starlette.exceptions.ExceptionMiddleware, though I’m unsure if this can have any unforeseen side effects. Use at your own risk! :) Further information can be found in the related FastAPI and Starlette GitHub issues.
In example/src/app/__init__.py, add the exception handler and the middleware workaround.
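A sketch combining the two (the response body is my choice):

```python
from fastapi import Request
from fastapi.responses import JSONResponse
from starlette.exceptions import ExceptionMiddleware

from .utils import logger


@app.exception_handler(Exception)
async def unhandled_exception_handler(request: Request, err: Exception):
    # logger.exception logs the stack trace together with all appended keys
    logger.exception("Unhandled exception")
    return JSONResponse(status_code=500, content={"detail": "Internal Server Error"})


# Workaround: route unhandled exceptions through the handler above
app.add_middleware(ExceptionMiddleware, handlers=app.exception_handlers)
```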
Call the /fail endpoint again and head over to CloudWatch.
Now we have a lot of relevant information in the logs whenever an unhandled exception occurs, which makes it easy to find what caused a request to fail if you, for example, have the correlation ID at hand.
The logs are now a lot more useful, and we did not need to add that much code to the application. Most of the code resides in either middleware, exception handlers, or the custom APIRoute class. The changes to the actual business logic have been minimal, which shows that Lambda Powertools can easily be added to existing applications to improve logging practices.
Metrics
Let’s explore the next core utility in Lambda Powertools, the Metrics utility. This utility lets you easily push metrics to CloudWatch by taking care of all the necessary boilerplate. It works asynchronously by using the Amazon CloudWatch Embedded Metrics Format (EMF), logging the metrics to stdout. It also aggregates all metrics from each invocation to save on the number of calls to CloudWatch.
Info
There is some terminology to be aware of here. CloudWatch metrics are grouped in containers called namespaces. If an application comprises multiple services, you could for example use the same namespace for all of them to group all the metrics for that application. Metrics can also have dimensions: key-value pairs added as metadata that allow you to filter and aggregate metrics by dimension values.
The default configuration for the Metrics utility utilizes the two environment variables POWERTOOLS_METRICS_NAMESPACE and POWERTOOLS_SERVICE_NAME, where the former specifies the metric namespace and the latter adds a dimension service=POWERTOOLS_SERVICE_NAME to all metrics.
Instrumenting
If you followed along with the guide so far, you should already have a metrics object in example/src/app/utils.py. There we used the default configuration, which sets the namespace and service dimension from environment variables. You could also specify those explicitly.
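For example (the namespace and service values are placeholders):

```python
from aws_lambda_powertools import Metrics

metrics = Metrics(namespace="PetsApi", service="pets-api")
```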
If you want to add another dimension to all metrics, such as environment, you can do so with the set_default_dimensions method.
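For instance, an environment dimension (the value is a placeholder):

```python
metrics.set_default_dimensions(environment="prod")
```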
Now, to enable the functionality in the application, wrap the handler in example/src/app/__init__.py.
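Continuing the handler-wrapping pattern from the logging section:

```python
from mangum import Mangum

from .utils import logger, metrics

handler = Mangum(app)
handler = logger.inject_lambda_context(handler, clear_state=True)
# log_metrics flushes the aggregated metrics at the end of each invocation;
# capture_cold_start_metric adds an automatic ColdStart metric
handler = metrics.log_metrics(handler, capture_cold_start_metric=True)
```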
Note
Make sure that the log_metrics decorator is added last to the handler function.
Adding metrics
Let’s add the following metrics to the application:
- Number of times a pet is created
- Number of times an unhandled exception occurs
Created pets metric
To add a metric that counts how many times a pet is created, we can add the following to the POST /pets route. We add it after the call to DynamoDB, so that a metric is only recorded if the pet was created successfully.
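A sketch of the route with the metric added (the metric name and route body are assumptions):

```python
from aws_lambda_powertools.metrics import MetricUnit

from .utils import metrics


@app.post("/pets")
async def create_pet(pet: Pet):
    dynamo.put_pet(pet)  # Existing call to DynamoDB
    # Only reached if the DynamoDB call above succeeded
    metrics.add_metric(name="CreatedPets", unit=MetricUnit.Count, value=1)
    return pet
```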
Unhandled exception metric
To count the number of unhandled exceptions, we can extend the exception handler from earlier.
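For example (the metric name is an assumption):

```python
from aws_lambda_powertools.metrics import MetricUnit

from .utils import logger, metrics


@app.exception_handler(Exception)
async def unhandled_exception_handler(request: Request, err: Exception):
    logger.exception("Unhandled exception")
    # Count every unhandled exception
    metrics.add_metric(name="UnhandledExceptions", unit=MetricUnit.Count, value=1)
    return JSONResponse(status_code=500, content={"detail": "Internal Server Error"})
```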
Using different dimensions
When using CloudWatch EMF, all metrics in a document must have the same dimensions. So, all calls to add_metric will generate metrics with identical dimensions: service, plus any additional default dimensions you’ve added. To add different dimensions for specific metrics, we need to use single_metric. We might, for example, want a metric named RequestCount that has both service and route dimensions.
First, in example/src/app/utils.py, update the Powertools import to include single_metric.
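single_metric is exported at the library’s top level, so the import becomes:

```python
from aws_lambda_powertools import Logger, Metrics, Tracer, single_metric
```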
Then, in example/src/app/router.py, emit the metric from the custom route handler.
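A sketch of the updated LoggerRouteHandler (the service dimension value is a placeholder mirroring POWERTOOLS_SERVICE_NAME):

```python
from typing import Callable

from aws_lambda_powertools.metrics import MetricUnit
from fastapi import Request, Response
from fastapi.routing import APIRoute

from .utils import logger, single_metric


class LoggerRouteHandler(APIRoute):
    def get_route_handler(self) -> Callable:
        original_route_handler = super().get_route_handler()

        async def route_handler(request: Request) -> Response:
            logger.append_keys(
                fastapi={
                    "path": request.url.path,
                    "route": self.path,
                    "method": request.method,
                }
            )
            # single_metric flushes on context exit and carries its own dimensions;
            # the namespace falls back to POWERTOOLS_METRICS_NAMESPACE
            with single_metric(name="RequestCount", unit=MetricUnit.Count, value=1) as metric:
                metric.add_dimension(name="route", value=self.path)
                metric.add_dimension(name="service", value="pets-api")
            return await original_route_handler(request)

        return route_handler
```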
CloudWatch showtime
Let’s hit the API a few times on different routes, as well as the /fail endpoint. Then navigate to CloudWatch and click on All metrics on the left-hand side, followed by the namespace you specified. You should see the following.
First, click on function_name, service. Here you should see a metric that counts the number of times a function has been subject to a cold start, which is automatically added by the Lambda Powertools metrics utility when you pass the capture_cold_start_metric=True parameter.
Next, click on service under the namespace. Here you will be able to see the number of pets that have been created, as well as the number of unhandled exceptions.
Finally, check out route, service. Here you should have metrics describing the total amount of requests to each route.
This showcases how easy it is to add metrics to a FastAPI Lambda application when using Lambda Powertools to take care of the heavy lifting.
Tracing
Now, let’s move on to the final core utility, the Tracer. The utility makes it easy to instrument your Lambda function to capture traces and send them to AWS X-Ray. AWS X-Ray is a distributed tracing system that can help you analyze and debug distributed applications, and provides a way to track requests as they travel through your different services.
Powertools also has the convenience of auto-patching modules supported by X-Ray, such as boto3. This will automatically create segments in your traces whenever you use boto3 to call DynamoDB or another AWS service from within your Lambda function, and likewise for other supported modules such as requests.
Instrumenting
First, we need to enable tracing on the Lambda function and API Gateway in the SAM template.
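For example, via the Globals section:

```yaml
Globals:
  Function:
    Tracing: Active
  Api:
    TracingEnabled: true
```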
Then, as with the other utilities, we need to decorate our handler in example/src/app/__init__.py.
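Extending the handler chain once more, keeping log_metrics last as noted earlier:

```python
from mangum import Mangum

from .utils import logger, metrics, tracer

handler = Mangum(app)
handler = tracer.capture_lambda_handler(handler)
handler = logger.inject_lambda_context(handler, clear_state=True)
handler = metrics.log_metrics(handler, capture_cold_start_metric=True)
```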
If you want to measure a specific method call and add it to your traces, you can decorate functions with the capture_method decorator. For illustrative purposes, decorate one of the functions in example/src/app/dynamo.py.
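For example (the table and function names follow the earlier sketch):

```python
import boto3

from .utils import logger, tracer

table = boto3.resource("dynamodb").Table("pets")  # Table name is an assumption


@tracer.capture_method
def get_pet(pet_id: str) -> dict:
    # capture_method records this call as a subsegment in the trace
    logger.info(f"Fetching pet {pet_id}")
    response = table.get_item(Key={"id": pet_id})
    return response.get("Item")
```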
Deploy the API, and send a couple of requests to it. Mix it up a little and hit the /fail endpoint a few times, and try to generate some 4xx responses by accessing non-existent pets or creating pets with invalid payloads. Go to the CloudWatch console and click on Service Map on the left-hand side.
Here you can see that AWS X-Ray generated a service map of our entire architecture. We can see that a client has made a request to the API Gateway, which forwarded the request to the Lambda service, which in turn invoked a Lambda function, which then made further requests to the DynamoDB Table. As the system grows, the service map will be an invaluable tool to get a full picture of the entire system. For each node in the map, it is also possible to see the percentage of successful requests, average latency, and other useful metrics.
You can also drill down into individual traces. Below you can see the timeline for a request to create a pet.
Advanced features
There are a lot of advanced features in the tracer utility, such as adding metadata and annotations, recording responses, and capturing exceptions.
For example, when we raise PetNotFoundError, it will be visible in the relevant segment details.
We could also add the correlation ID as an annotation, allowing us to filter and query traces by correlation ID. In the add_correlation_id middleware, add the following.
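One added line right after setting the correlation ID (tracer comes from our utils module; annotations, unlike metadata, are indexed by X-Ray and usable in filter expressions):

```python
# In add_correlation_id, right after logger.set_correlation_id(corr_id):
tracer.put_annotation(key="correlation_id", value=corr_id)
```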
Now, by drilling down into the ##handler segment, you should see the correlation ID in the correlation_id annotation.
If you want to find traces for a specific ID, you can use the query functionality in the console.
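For example, a filter expression like this matches traces annotated with a given ID (the value is a placeholder):

```text
annotation.correlation_id = "YOUR_CORRELATION_ID"
```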
Conclusion
That’s all for now. Hopefully, you have learned a thing or two about how you can use the Lambda Powertools library to implement best practices when it comes to observability for FastAPI applications (and Lambda functions in general). AWS CloudWatch and AWS X-Ray are wonderful tools to help you analyze and monitor serverless applications. Lambda Powertools makes it incredibly easy to start using those services, allowing you to focus on your application logic while leaving the implementation of metrics, logs, and traces to the library.