Implementing the Transactional Outbox Pattern for Serverless Domain Events
1. Introduction
Events are the key part of an Event-driven Architecture (EDA). They are the mechanism that allows services to communicate with each other in a decoupled way.
For example, an Order Service emits an OrderCreated event when a new order is created. The Shipping Service then consumes this event that starts the shipping process.
But how can you ensure that you can reliably publish events? Take a look at the following code:
This looks straightforward, right? The order service receives a request (or a command) to create a new order. It validates the order, persists it, and publishes a domain event.
When building distributed systems, eight fallacies describe false assumptions that developers often make. One of which is the network is reliable.
Look at the above code again. What happens if the publishDomainEvent
call fails? The order will be persisted, but no domain event will be published. If you don’t handle this case, you will end up with an inconsistent state. The shipping service will not initiate the shipping process.
There are many solutions of varying complexity to this problem. In this article, you will see how to implement the Transactional Outbox Pattern to solve it in a Serverless environment.
2. What is the Transactional Outbox Pattern?
In the transactional outbox pattern, you persist updates to business entities and any domain events related to those updates in the same transaction. A separate process reads the persisted domain events and publishes them to a message broker. The transactional nature ensures that the entity and related events are either all persisted or none of them are.
In the diagram above, when the Order Service receives a request, it:
- Persists any updates to business entities in the Order Table and any related domain events to the Outbox Table in a single transaction.
- The Message relay process gets the domain events from the Outbox Table.
- The Message relay process publishes the domain events to a message broker.
3. Using DynamoDB Streams for Change Data Capture
DynamoDB Streams is a powerful feature of DynamoDB that lets you capture all item-level changes made to a table. When you enable a stream on a table, you can configure a Lambda function to be invoked on every change to the table. Streams are fully managed by AWS and are a perfect fit to implement the Transactional Outbox Pattern.
Looking back at the code in the Introduction, we were worried about what happens if the publishDomainEvent
call fails after the persistOrder
call succeeds. DynamoDB streams can help us solve this.
Using streams, the Lambda function only needs to persist the order to the OrdersTable. This change will be sent through the stream to another Lambda function, which publishes the domain events. If there are any errors when publishing the events, the Lambda function will retry the operation until it succeeds or until the records expire.
4. Implementing the Pattern in Serverless Applications
4.1. Naive approach
Let’s start with the naive approach. We will use the following architecture:
In this pattern, the CreateOrder Lambda function persists the order to the OrdersTable. The table is configured with a DynamoDB stream that triggers the PublishEvents
Lambda function on every change to the table.
The PublishEvents
Lambda must parse the change and determine what kind of domain event to send. The code might look something like this:
This might look like a good and simple solution at first. But it has some drawbacks:
-
Coupling between the PublishEvents Lambda and the OrdersTable
The PublishEvents Lambda function is coupled to the schema of OrdersTable. If you change the schema, you must also change the PublishEvents Lambda function.
-
Complex logic in PublishEvents to handle different events
If you want to publish other events, such as OrderCancelled and OrderUpdated, you must add complex logic to the PublishEvents Lambda function. This means that you will inevitably have to duplicate the core domain logic since you must correctly determine which domain event to publish depending on what the change contains.
-
If you are using Single-Table Design in DynamoDB
If you use a single-table design for your DynamoDB table, the complexity in PublishEvents will be even greater. You might have multiple entities in the OrdersTable, such as Orders and OrderItems. The PublishEvents Lambda function must now determine what kind of entity has changed and what type of event to emit for that entity.
Single-Table Design. A silver bullet?
There are a lot of advocates for single-table design in DynamoDB. It is, however, not a silver bullet that solves all problems. This article by Pete Naylor tells a different story and is well worth a read.
The above drawbacks are not ideal. They force you to move core domain logic to the outer bounds of your application while simultaneously coupling the logic to your database schema.
There’s a better approach.
4.2. Use a Separate Outbox Table and DynamoDB Transactions
This pattern introduces another table, the OutboxTable. It relies on DynamoDB transactions to atomically persist both the entity and domain event in a transaction. This means that either both the entity and domain events are persisted or none of them are.
Now, the PublishEvents Lambda function only relays events from the OutboxTable to EventBridge. The code might look something like this:
The logic to determine what events to publish is kept within the core domain logic inside the CreateOrder function. The outer bounds of the application are responsible for persisting the order and any domain events in an atomic transaction.
This approach has several benefits:
-
The PublishEvents Lambda is decoupled
The PublishEvents Lambda function is now decoupled from the schema of the OrdersTable. It doesn’t need to know how entities are persisted.
-
The PublishEvents Lambda has low complexity
The PublishEvents Lambda function is now straightforward. It only needs to relay events from the OutboxTable to EventBridge.
-
Core domain logic doesn’t leak
The core domain logic is kept inside the domain layer. It doesn’t leak to the outer bounds of the application.
-
The OutboxTable can support different events and entities
The OutboxTable can support different events and entities. You can add new events and entities without changing the PublishEvents Lambda function.
5. Handling failures
The Transactional Outbox Pattern is all about reliably publishing events. Even though most of the services used are fully managed by AWS, errors can still occur. Let’s look at what happens if the PublishEvents Lambda function fails to publish events.
DynamoDB streams send batches of records to the PublishEvents Lambda function. If an invocation of the PublishEvents function fails, the entire batch will be retried until it succeeds or until the records expire.
For example, imagine that a batch of ten records is sent to the PublishEvents Lambda function, and the function fails to publish the 10th event. The entire batch is retried, and this time it succeeds. Events 1-9 will now have been published twice.
There are a couple of different ways to handle this:
-
Making clients idempotent
Make sure that clients that consume the events implement idempotency into their operations. This means that they must be able to handle the same event being published multiple times. I’d recommend always doing this since many AWS services use at-least-once delivery.
Even if you can guarantee that you only publish an event once, the client could still receive the same event more than once from EventBridge itself.
For a deeper dive on idempotency, refer to this excellent article written by Allen Helton.
-
Using a batch size of 1
You can configure the Lambda function to only receive one record at a time. If the function fails, it will only retry that single record. This is the simplest option, but it could increase the number of invocations and the cost of the Lambda function. This could also increase latency if you write many events to the OutboxTable.
-
Bisecting batch failures
A more advanced concept in DynamoDB streams lets you bisect a batch when an invocation fails. When an invocation fails in the middle of a batch, a function can report the last successfully processed record. The batch is then bisected at this record so that only the unprocessed records are retried.
6. Limitations
Since this pattern relies on DynamoDB transactions, it inherits the same limitations. Transactions are limited to 100 items. This means that you cannot publish, say, 150 domain events in a single transaction.
I can’t really see a reason for publishing that many events from a single operation, so this shouldn’t be a problem in practice.
7. Conclusion
The Transactional Outbox Pattern is a great pattern to publish domain events reliably. DynamoDB streams let you easily implement the pattern in your Serverless applications. Combined with DynamoDB transactions, your domain logic can be kept nicely encapsulated in the domain layer, while the outer parts of the application can be kept as simple as possible.