Architecting backends to serve millions of RPS

1. Architecting backends to serve millions of RPS Conor Gallagher

2. What problem is being solved and why?

3. Problem Statement
Zalando adopted streaming architectures in the mid-2010s. Pushing Product data into an event bus did not make it easy for our various business units to interact with. Would it be possible to serve Product data centrally via an API, and make it so performant that the distributed Product datastores become redundant?

4. Requirements
The new Product Read API will be a tier-one service serving Product data to Zalando’s Fashion Stores and internal systems across Europe.
● Low latency: < 50 ms p99 per single GET
● High throughput: millions of requests per second
● High availability
● Support for batch retrieval
● Handle hot products

5. Hot Product

6. Product Read API (PRAPI)

7.

8. Single GET Performance

9. Batch GET Performance

10. How do we achieve this performance?

11. Load Balancing

12. Hash some part of the incoming request to determine its location on the ring. The LB always sends traffic to the closest pod to the right of that location on the ring.

13. Consistent Hash Load Balancing for Products
Use the product-id from the request URL as input to the LB strategy, so that requests for a given product are consistently routed to the same pod. As Product Catalogs are only ever partially hot, a small bounded cache on each pod with a short TTL would have a huge impact. By hot, we mean popular products: think of basic white t-shirts or new Nike shoes under campaign.

14. Extract the product-id from the path of the incoming request. Hash the product-id to determine its location on the ring and the pod that will get the traffic.
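The two steps above can be sketched in a few lines of Python. This is an illustration only, not Skipper's implementation: the `/products/{id}` path shape, the pod names, and the choice of SHA-256 as the hash are all assumptions made for the example.

```python
import bisect
import hashlib
import re


def ring_position(key: str) -> int:
    """Map a string to a stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def product_id_from_path(path: str) -> str:
    """Pull the product-id out of a hypothetical /products/{id} path."""
    match = re.fullmatch(r"/products/([^/]+)", path)
    if match is None:
        raise ValueError(f"not a single-get path: {path}")
    return match.group(1)


class Ring:
    def __init__(self, pods):
        # Each pod sits at one position; sorted so we can binary-search.
        self._points = sorted((ring_position(p), p) for p in pods)
        self._positions = [pos for pos, _ in self._points]

    def pod_for(self, path: str) -> str:
        pos = ring_position(product_id_from_path(path))
        # First pod at or after the request's position, wrapping to the start.
        idx = bisect.bisect_left(self._positions, pos) % len(self._points)
        return self._points[idx][1]


pods = ["pod-a", "pod-b", "pod-c"]  # hypothetical pod names
ring = Ring(pods)
first = ring.pod_for("/products/sku-123")
second = ring.pod_for("/products/sku-123")  # same product, same pod
```

Because the pod is a pure function of the product-id, repeated requests for one product land on the same pod, which is what lets a small per-pod cache absorb hot products.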

15. Skipper is configured to round-robin batch requests across a dedicated Batch deployment. This deployment makes N parallel, consistently routed requests to the Single Get deployment.
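The fan-out from the Batch deployment can be pictured as follows. This is a minimal sketch: `single_get` is a hypothetical stand-in that simulates the network call to the Single Get deployment, and the real service is not written in Python.

```python
import asyncio


async def single_get(product_id: str) -> dict:
    """Stand-in for a consistently routed call to the Single Get deployment."""
    await asyncio.sleep(0.01)  # simulated network latency
    return {"id": product_id}


async def batch_get(product_ids: list[str]) -> list[dict]:
    # Launch all lookups at once; gather returns results in input order.
    return await asyncio.gather(*(single_get(pid) for pid in product_ids))


results = asyncio.run(batch_get(["sku-1", "sku-2", "sku-3"]))
```

Since every sub-request goes through the same consistent-hash routing as a normal single GET, batch traffic benefits from the same per-pod caches.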

16. Consistent Hash Load Balancing for Products
Scaling activities should not cause mass cache invalidation: https://github.com/zalando/skipper/issues/1712
● Enter each pod at 100 random locations on the ring.
● A scaling event then invalidates roughly 1/N of cached entries, where N is the total number of pods.
Avoid overloading a single pod. Requests should spill over consistently into neighbouring pods: https://github.com/zalando/skipper/issues/1769
● Introduce Bounded Load, with a configurable load factor.
● A single pod can only ever serve up to the load factor (e.g. 1.5) times the average number of requests.
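Both ideas on this slide can be sketched together. The code below is an illustration under stated assumptions, not Skipper's code: ring positions are derived by hashing `pod#i` rather than drawn at random, load is counted as in-flight requests, and the class and method names are invented for the example.

```python
import bisect
import hashlib
import math
from collections import Counter


def ring_position(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class BoundedLoadRing:
    def __init__(self, pods, replicas=100, load_factor=1.5):
        self.pods = list(pods)
        self.load_factor = load_factor
        self.in_flight = Counter()
        # Each pod appears at `replicas` pseudo-random positions on the ring.
        self._points = sorted(
            (ring_position(f"{pod}#{i}"), pod)
            for pod in self.pods
            for i in range(replicas)
        )
        self._positions = [pos for pos, _ in self._points]

    def owner(self, key: str) -> str:
        """Pod that owns the key when load is ignored."""
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._points[idx % len(self._points)][1]

    def acquire(self, key: str) -> str:
        """Pick a pod for a new request, spilling over when the owner is full."""
        total = sum(self.in_flight.values()) + 1
        cap = math.ceil(self.load_factor * total / len(self.pods))
        start = bisect.bisect_left(self._positions, ring_position(key))
        for step in range(len(self._points)):
            pod = self._points[(start + step) % len(self._points)][1]
            if self.in_flight[pod] < cap:
                self.in_flight[pod] += 1
                return pod
        raise RuntimeError("no pod has spare capacity")

    def release(self, pod: str) -> None:
        self.in_flight[pod] -= 1


# Scale from 10 to 11 pods and count how many keys change owner.
keys = [f"sku-{i}" for i in range(10_000)]
before = BoundedLoadRing([f"pod-{i}" for i in range(10)])
after = BoundedLoadRing([f"pod-{i}" for i in range(11)])
moved = [k for k in keys if before.owner(k) != after.owner(k)]

# Send 100 concurrent requests for one hot key to a 4-pod ring.
hot = BoundedLoadRing(["pod-a", "pod-b", "pod-c", "pod-d"])
placements = Counter(hot.acquire("hot-sku") for _ in range(100))
```

In this sketch every moved key ends up on the newly added pod, and the share of moved keys should be in the region of 1/11. For the hot key, no pod takes more than the cap derived from the load factor, and the overflow always walks the ring in the same order, so spill-over pods stay stable between requests.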

17. Async vs Non Blocking What’s the difference?

18. Non Blocking IO
● Product-Sets -> Products API (NIO JAX-RS client)
● DynamoDB Client (Async NIO using Netty)

19. NIO - Resource Utilisation Under Load
10,000 outbound requests per second:
● 4 CPU cores request limit
● 16 active threads
● 20% CPU utilization

20. Caffeine Async Loading Cache https://github.com/ben-manes/caffeine
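The core idea of an async loading cache is that the cache stores the in-flight future rather than the finished value, so concurrent requests for the same hot key share one backend load. The sketch below illustrates that idea in Python with `asyncio`. It is not Caffeine's API: the class, its parameters, and the simple LRU eviction are assumptions made for the example.

```python
import asyncio
import time
from collections import OrderedDict


class AsyncLoadingCache:
    def __init__(self, loader, max_size=1000, ttl_seconds=5.0):
        self._loader = loader
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._entries = OrderedDict()  # key -> (expires_at, task)

    async def get(self, key):
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self._entries.move_to_end(key)  # mark as recently used
            return await entry[1]
        # Store the in-flight task before awaiting so concurrent callers share it.
        task = asyncio.create_task(self._loader(key))
        self._entries[key] = (now + self._ttl, task)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict least recently used
        try:
            return await task
        except Exception:
            # Do not keep failed loads in the cache.
            if self._entries.get(key, (None, None))[1] is task:
                del self._entries[key]
            raise


calls = []


async def load_product(product_id):
    """Hypothetical loader standing in for a datastore read."""
    calls.append(product_id)
    await asyncio.sleep(0.01)
    return {"id": product_id}


async def demo():
    cache = AsyncLoadingCache(load_product, max_size=2, ttl_seconds=60)
    # 50 concurrent requests for the same hot product.
    return await asyncio.gather(*(cache.get("sku-1") for _ in range(50)))


results = asyncio.run(demo())
```

In the demo, the 50 concurrent lookups trigger a single call to the loader. Combined with consistent routing, this keeps a burst of traffic for one hot product from turning into a burst of datastore reads.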

21.

22. Garbage Collection (GC) Tuning

23. GC Tuning - Before

24. GC Tuning - Fix

25. GC Tuning - After

26. Questions?