
Yunji Zhong, Amit Gud, Carlos Herrera

Real-time event processing is a critical component of a distributed system's scalability. At DoorDash, we rely on message queue systems based on Kafka to handle billions of real-time events. One of the challenges we face, however, is how to properly validate the system before going live.


Traditionally, an isolated environment such as staging is used to validate new features. But setting up a separate data pipeline in staging to mimic billions of real-time events is difficult and inefficient, and it requires ongoing maintenance to keep the data up to date. To address this challenge, the team at DoorDash embraced testing in production via a multi-tenant architecture, which uses a single microservice stack for all kinds of traffic, including test traffic.

In such a multi-tenant architecture, isolation is implemented at the infrastructure layer. Here we delve into how we set up multi-tenancy with a message queue system based on Kafka.


DoorDash has pioneered testing in production, which uses the production environment for end-to-end testing. This provides a number of advantages, including reduced operational overhead. But it also raises interesting challenges around isolating production and test traffic flowing through the same stack. We solve this with a fully multi-tenant architecture in which data and traffic are isolated at the infrastructure layer, with minimal interference with the application logic.
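To make the idea of infrastructure-level isolation concrete, here is a minimal sketch of one way it could look for queue messages. It is an illustration under stated assumptions, not DoorDash's actual implementation: the `tenant-id` header key, the `prod` and `test` tenant names, and the `Message` and `TenantRouter` types are all hypothetical. The only Kafka fact it leans on is that records can carry headers as key/bytes pairs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical header key and default tenant, chosen for this sketch only.
TENANT_HEADER = "tenant-id"
PROD_TENANT = "prod"


@dataclass
class Message:
    """Stand-in for a Kafka record: a payload plus (key, bytes) headers."""
    value: bytes
    headers: List[Tuple[str, bytes]] = field(default_factory=list)


def tenant_of(message: Message) -> str:
    """Read the tenant from the headers; untagged messages count as production."""
    for key, raw in message.headers:
        if key == TENANT_HEADER:
            return raw.decode("utf-8")
    return PROD_TENANT


def with_tenant(message: Message, tenant: str) -> Message:
    """Return a copy tagged with the given tenant, replacing any existing tag."""
    kept = [(k, v) for k, v in message.headers if k != TENANT_HEADER]
    return Message(message.value, kept + [(TENANT_HEADER, tenant.encode("utf-8"))])


class TenantRouter:
    """Dispatches each message to the handler registered for its tenant.

    Handlers contain business logic only; the tenancy check lives here.
    """

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Message], None]] = {}

    def register(self, tenant: str, handler: Callable[[Message], None]) -> None:
        self._handlers[tenant] = handler

    def dispatch(self, message: Message) -> bool:
        """Run the tenant's handler; return False if no handler is registered."""
        handler = self._handlers.get(tenant_of(message))
        if handler is None:
            return False  # unknown tenant: skip instead of falling back to prod
        handler(message)
        return True


# Usage: production and test messages share one router but never mix.
prod_seen: List[Message] = []
test_seen: List[Message] = []
router = TenantRouter()
router.register(PROD_TENANT, prod_seen.append)
router.register("test", test_seen.append)
router.dispatch(Message(b"order-1"))
router.dispatch(with_tenant(Message(b"order-2"), "test"))
```

Because the routing decision sits in a wrapper around the handlers, the handlers themselves need no knowledge of tenants, which is the "minimal interference with the application logic" property described above. Skipping messages from unknown tenants, rather than treating them as production, is a design choice of this sketch.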


Multi-tenancy involves designing a software application and its supporting infrastructur...


