Fix the Biggest Problem of Domain-Driven Design Using Kafka

In the world of DDD, there's one piece that makes everything fit together: Projections.

What are projections?

One of the most common problems when scaling our code is that different teams need access to the same entity. For example, the backoffice team needs access to users, but we don't want to create specific endpoints or give them direct access to our database.

Projections solve this problem by allowing each context to maintain its own version of the data it needs.

For example, in an online course platform, a user has one representation in the platform and a different one in the backoffice.

In the platform, the user generates all the mutations of their entity (registering, changing their name or email, etc.), and each of these mutations publishes a domain event:

📂 platform
└📂 users
  ├📂 application
  │ ├📁 change_email
  │ ├📁 register
  │ ├…
  │ └📁 rename
  ├📂 domain
  │ ├📄 User
  │ ├📄 UserEmailChanged
  │ ├📄 UserRegistered
  │ ├…
  │ ├📄 UserRenamed
  │ └📄 UserRepository
  └📂 infrastructure
    └📄 PostgresUserRepository
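
To make this more concrete, here's a minimal sketch of what the rename use case could look like. The EventBus abstraction, the event shape, and the pullDomainEvents pattern are illustrative assumptions, not the platform's actual code:

interface DomainEvent {
  eventName: string;
  aggregateId: string;
}

interface EventBus {
  publish(events: DomainEvent[]): Promise<void>;
}

interface UserRepository {
  search(id: string): Promise<User | null>;
  save(user: User): Promise<void>;
}

class User {
  private events: DomainEvent[] = [];

  constructor(readonly id: string, private name: string) {}

  rename(newName: string): void {
    this.name = newName;
    // Record the mutation as a domain event (UserRenamed in the tree above).
    this.events.push({ eventName: 'user.renamed', aggregateId: this.id });
  }

  pullDomainEvents(): DomainEvent[] {
    const events = this.events;
    this.events = [];
    return events;
  }
}

class UserRenamer {
  constructor(
    private readonly repository: UserRepository,
    private readonly eventBus: EventBus,
  ) {}

  async run(userId: string, newName: string): Promise<void> {
    const user = await this.repository.search(userId);
    if (!user) throw new Error(`User ${userId} not found`);

    user.rename(newName);
    await this.repository.save(user);
    // Publish after persisting so subscribers never see uncommitted state.
    await this.eventBus.publish(user.pullDomainEvents());
  }
}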

Then, the users module within the backoffice context will listen to these events and update its own projection:

📂 backoffice
└📂 users
  ├📂 application
  │ ├📂 create
  │ │ └📄 CreateUserOnUserRegistered
  │ ├📂 update_email
  │ │ └📄 UpdateBackofficeUserEmailOnUserEmailChanged
  │ ├…
  │ └📂 rename
  │   └📄 RenameBackofficeUserOnUserRenamed
  ├📂 domain
  │ ├📄 BackofficeUser
  │ └📄 BackofficeUserRepository
  └📂 infrastructure
    └📄 PostgresBackofficeUserRepository
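
Each of these handlers is small. A subscriber like RenameBackofficeUserOnUserRenamed could look roughly like this (the event payload and repository method are illustrative assumptions):

interface UserRenamedEvent {
  aggregateId: string;
  name: string;
}

interface BackofficeUserRepository {
  updateName(id: string, name: string): Promise<void>;
}

class RenameBackofficeUserOnUserRenamed {
  constructor(private readonly repository: BackofficeUserRepository) {}

  // Invoked by the event-bus consumer infrastructure whenever the platform
  // publishes a user.renamed event.
  async on(event: UserRenamedEvent): Promise<void> {
    await this.repository.updateName(event.aggregateId, event.name);
  }
}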

The backoffice could also listen to other events like course_progress.finished to maintain a list of courses that each user has completed.

This architecture allows both services to scale independently and have their own representation of the entity. Additionally, it gives teams autonomy to model only the data they really need.

The (big) problem with projections

Assuming we already have the infrastructure to publish and listen to domain events, the biggest problem with projections is the amount of repetitive code needed to create them, maintain them, and perform the first data import.

For example, to project users (a very common case), each team must implement similar code for:

  • Listening to the same domain events.
  • Updating their projection with similar logic.
  • Maintaining consistency when new fields are added.

When a new field is added to the user entity, all teams that project this entity must update their code to incorporate this change (if they're interested in it).

In a medium/large company, this implies many hours of development that don't provide direct value to users. And, besides being very repetitive work, it's also expensive given all the queues that need to be created and maintained.

But fortunately, thanks to Kafka (or other data streaming services), we can greatly simplify this process.

Projections with Kafka

Unlike traditional message queues, Kafka works with data streams. This means that messages are not deleted once they're consumed: they remain in the stream, subject only to the topic's retention policy.

This feature allows us to implement an elegant solution: each time a mutation occurs in the platform, in addition to publishing the specific domain event, we also publish a complete and updated snapshot of the entity in its own topic. This way, we always have the latest version of each entity available.
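
As a minimal sketch using the kafkajs Node client (the client choice, the topic name platform.users.snapshots, and the snapshot shape are all assumptions), publishing a snapshot could look like this:

import { Kafka } from 'kafkajs';

const kafka = new Kafka({ clientId: 'platform', brokers: ['localhost:9092'] });
const producer = kafka.producer(); // call producer.connect() once at startup

interface UserSnapshot {
  id: string;
  name: string;
  email: string;
}

// Called after every mutation, alongside the specific domain event.
async function publishUserSnapshot(snapshot: UserSnapshot): Promise<void> {
  await producer.send({
    topic: 'platform.users.snapshots',
    messages: [
      {
        // Keying by user id groups every version of the same user under one
        // key, which log compaction will rely on later.
        key: snapshot.id,
        value: JSON.stringify(snapshot),
      },
    ],
  });
}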

Now you might ask: isn't this the same as sharing the users table? Yes, but no.

The key difference is that, by having it in a data stream, consumers can react to these changes in real time. With a shared table, we would have to implement some additional mechanism (polling, triggers, change data capture, etc.) to detect changes.

With this approach, the backoffice context is significantly simplified:

📂 backoffice
└📂 users
  ├📂 application
  │ └📂 project
  │   └📄 BackofficeUserSnapshotProjector
  ├📂 domain
  │ ├📄 BackofficeUser
  │ └📄 BackofficeUserRepository
  └📂 infrastructure
    └📄 PostgresBackofficeUserRepository

The BackofficeUserSnapshotProjector is responsible for reading the snapshots from the users topic and updating the projection. What previously required multiple files and event handlers now reduces to a single component.
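
A minimal version of that projector with kafkajs could look like this (topic name, consumer group, and repository interface are assumptions):

import { Kafka } from 'kafkajs';

interface BackofficeUserRepository {
  save(snapshot: unknown): Promise<void>;
  delete(id: string): Promise<void>;
}

const kafka = new Kafka({ clientId: 'backoffice', brokers: ['localhost:9092'] });
const consumer = kafka.consumer({ groupId: 'backoffice-users-projector' });

async function runProjector(repository: BackofficeUserRepository): Promise<void> {
  await consumer.connect();
  // fromBeginning makes a brand-new consumer group start at the earliest
  // offset (more on this in the first-import section below).
  await consumer.subscribe({ topic: 'platform.users.snapshots', fromBeginning: true });

  await consumer.run({
    eachMessage: async ({ message }) => {
      const key = message.key!.toString();
      if (message.value === null) {
        // A null value (a tombstone) means the entity was removed upstream.
        await repository.delete(key);
        return;
      }
      // Upsert: create the projection if it doesn't exist, overwrite it if it does.
      await repository.save(JSON.parse(message.value.toString()));
    },
  });
}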

What about the first import?

This solution also elegantly resolves the problem of the first data import. In the traditional event-based approach, we would need:

  1. Create a new queue to consume the domain events of the mutations we want to project.
  2. Start publishing to that queue.
  3. Manually import (via script, DB dump, etc.) all the historical data. Often this involves setting up a new replica of the source database to avoid bringing down the primary one.
  4. Start consuming events from the queue, discarding those we've already imported previously.

This process requires specific code for the initial import and coordination to create queues and correctly route events.

With Kafka, thanks to having snapshots in the stream, we can do a first import in the following way:

  1. Deploy the new service with the projector.

And that's it. The projector will automatically read all the historical snapshots from the beginning of the stream and update the local projection with all existing data.
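
In kafkajs terms (assuming the projector sketch above), this comes down to a single flag: a newly deployed projector uses a consumer group with no committed offsets, so subscribing with fromBeginning: true makes it start at the earliest offset and replay every historical snapshot before processing live traffic.

await consumer.subscribe({ topic: 'platform.users.snapshots', fromBeginning: true });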

What if I have millions of records?

Keeping the entire history of snapshots in a data stream can be a problem when we have millions of records: these streams can grow to terabytes of data. Also, if we have to process the entire history when we're only interested in the latest version, we're wasting time and resources.

For this, Kafka offers us compacted topics. A compacted topic guarantees that, for each key, only the latest record published is retained.

For example, if we publish multiple snapshots of the user with identifier "123", a compacted topic will keep only the latest one, automatically eliminating previous versions during the compaction process.
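
Compaction is a per-topic setting. A sketch with the kafkajs admin client (topic name and partition count are assumptions):

import { Kafka } from 'kafkajs';

const admin = new Kafka({ clientId: 'ops', brokers: ['localhost:9092'] }).admin();

async function createCompactedSnapshotsTopic(): Promise<void> {
  await admin.connect();
  await admin.createTopics({
    topics: [
      {
        topic: 'platform.users.snapshots',
        numPartitions: 3,
        configEntries: [
          // Keep only the latest record per key instead of deleting by age.
          { name: 'cleanup.policy', value: 'compact' },
        ],
      },
    ],
  });
  await admin.disconnect();
}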

This way, the data stream doesn't grow indefinitely and reads are much more efficient, especially during the first import or when a projection needs to be reconstructed from scratch.

How to apply it in your project?

One of the big challenges of this architecture is deciding how to implement it:

  • Do we publish a snapshot every time an entity is mutated?
  • Do we do it by listening to domain events and reacting to them?
  • Or why not in the repository, since that's ultimately where every mutation happens? (see the sketch below)
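
As an illustration of that last option (names are hypothetical, and each approach has its trade-offs), the repository could refresh the snapshot topic on every save:

interface User {
  id: string;
  name: string;
  email: string;
}

interface SnapshotPublisher {
  publish(topic: string, key: string, payload: unknown): Promise<void>;
}

class PostgresUserRepository {
  constructor(
    private readonly db: { upsertUser(user: User): Promise<void> },
    private readonly snapshots: SnapshotPublisher,
  ) {}

  async save(user: User): Promise<void> {
    await this.db.upsertUser(user);
    // Every persisted mutation also refreshes the snapshot topic.
    await this.snapshots.publish('platform.users.snapshots', user.id, user);
  }
}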

Additionally, it can be complicated to implement in legacy code, since there isn't always a single place where mutations occur. In that case, a Change Data Capture (CDC) system can be very helpful.

We cover all of this, plus how Wallapop runs it in production with terabytes of data in their compacted topics, in the Projections with Kafka course that we've just launched.
