Architecting backends to serve millions of RPS
1. Architecting backends to serve millions of RPS
Conor Gallagher
2. What problem is being solved and why?
3. Problem Statement
Zalando adopted streaming architectures in the mid-2010s.
Pushing Product data onto an event bus did not make it easy for our various business units to work with.
Would it be possible to serve Product data centrally via an API and make it so performant that the distributed Product datastores become redundant?
4. Requirements
The new Product Read API will be a tier-one service serving Product data to Zalando’s Fashion Stores and internal systems across Europe.
● Low Latency: < 50 ms p99 per single GET
● High Throughput: Millions of requests per second
● High Availability
● Support for Batch Retrieval
● Handle Hot Products
5. Hot Product
6. Product Read API (PRAPI)
7.
8. Single GET Performance
9. Batch GET Performance
10. How do we achieve this performance?
11. Load Balancing
12. Hash some part of the incoming request to determine its location on the ring.
The LB will always send traffic to the closest pod to the right on the ring.
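The ring lookup described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Skipper's implementation: the `HashRing` class, the SHA-256-based `ring_position` helper, and the pod names are all assumptions made for the example.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a stable position on a 2**64-slot ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, pods):
        # One ring entry per pod, sorted by position.
        self._entries = sorted((ring_position(pod), pod) for pod in pods)
        self._positions = [pos for pos, _ in self._entries]

    def route(self, key: str) -> str:
        # "Closest pod to the right": the first pod at or after the key's
        # position, wrapping around to the start of the ring at the end.
        index = bisect.bisect_left(self._positions, ring_position(key))
        return self._entries[index % len(self._entries)][1]


ring = HashRing(["pod-a", "pod-b", "pod-c"])
print(ring.route("some-request-key"))
```

The same key always lands on the same pod as long as the set of pods does not change, which is what makes a per-pod cache effective.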
13. Consistent Hash Load Balancing for Products
Use the product-id from the request URL as input to the LB
strategy to consistently route product requests to particular
pods.
As Product Catalogs are only ever partially hot, a small bounded cache on each pod with a short TTL has a huge impact once requests for the same product always land on the same pod.
By hot, we mean popular products: think of a basic white t-shirt or new Nike shoes in a campaign.
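A small bounded cache with a short TTL could look roughly like the sketch below. The talk itself uses Caffeine for this; the `BoundedTtlCache` class, its parameters, and the plain LRU eviction here are illustrative assumptions, kept deliberately simpler than a production cache library.

```python
import time
from collections import OrderedDict


class BoundedTtlCache:
    """LRU cache with a hard size bound and a fixed time-to-live per entry."""

    def __init__(self, max_entries, ttl_seconds, clock=time.monotonic):
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None  # miss (a cached None is indistinguishable here)
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used


cache = BoundedTtlCache(max_entries=1000, ttl_seconds=5)
cache.put("white-t-shirt", {"name": "Basic white t-shirt"})
print(cache.get("white-t-shirt"))
```

The size bound keeps memory predictable on each pod, while the short TTL limits how long a stale product can be served.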
14. Extract the product-id from the path of the incoming request.
Hash the product-id to determine its location on the ring and the pod that will get the traffic.
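As a rough sketch of the extraction step: the slides do not show the actual URL layout, so the `/products/<product-id>` path shape, the host names, and both helper functions below are assumptions for illustration only.

```python
import hashlib
from urllib.parse import urlsplit


def product_id_from_path(url: str) -> str:
    """Return the path segment that follows "products" (assumed URL shape)."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # Raises ValueError if the path has no "products" segment.
    return segments[segments.index("products") + 1]


def ring_position(key: str) -> int:
    """Map a string to a stable position on a 2**64-slot ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


url = "https://api.example.test/products/sku-123?fields=name"
product_id = product_id_from_path(url)
print(product_id, ring_position(product_id))
```

Hashing only the product-id, rather than the whole URL, means query parameters and host names do not change which pod a product maps to.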
15. Skipper is configured to round-robin batch requests across a dedicated Batch deployment.
This deployment makes N parallel consistently-routed requests to the Single Get deployment.
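The fan-out performed by the Batch deployment might be sketched as follows. Here `fetch_single` is a hypothetical stand-in for an HTTP call to the Single Get deployment (which the load balancer would route by product-id), and de-duplicating ids before fanning out is an assumption of this sketch, not something the slides state.

```python
import asyncio


async def fetch_single(product_id: str) -> dict:
    # Stand-in for a non-blocking GET to the Single Get deployment.
    await asyncio.sleep(0)
    return {"id": product_id}


async def fetch_batch(product_ids):
    # Request each distinct product once, all in parallel.
    unique_ids = list(dict.fromkeys(product_ids))
    results = await asyncio.gather(*(fetch_single(pid) for pid in unique_ids))
    by_id = dict(zip(unique_ids, results))
    # Return results in the order the caller asked for them.
    return [by_id[pid] for pid in product_ids]


print(asyncio.run(fetch_batch(["sku-1", "sku-2", "sku-1"])))
```

Because each single-get is still consistently routed, a batch request benefits from the same per-pod caches as individual requests.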
16. Consistent Hash Load Balancing for Products
Scaling Activities should not cause Mass Cache Invalidation
https://github.com/zalando/skipper/issues/1712
● Enter each pod at 100 random locations on the ring.
● A scaling event then invalidates only about 1/N of the cached entries, where N is the total number of pods.
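The effect of placing each pod at many ring positions can be checked with a small experiment. This sketch is not Skipper's code: it derives the 100 positions per pod by hashing hypothetical labels such as `pod-3#17` (pseudo-random but repeatable), and the pod and product names are made up.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(pods, points_per_pod=100):
    # Each pod appears at `points_per_pod` positions on the ring.
    entries = sorted(
        (ring_position(f"{pod}#{i}"), pod)
        for pod in pods
        for i in range(points_per_pod)
    )
    return [pos for pos, _ in entries], [pod for _, pod in entries]


def route(ring, key):
    positions, owners = ring
    index = bisect.bisect_left(positions, ring_position(key))
    return owners[index % len(owners)]


pods = [f"pod-{n}" for n in range(10)]
before = build_ring(pods)
after = build_ring(pods + ["pod-10"])  # scale up by one pod

keys = [f"product-{n}" for n in range(2000)]
moved = [k for k in keys if route(before, k) != route(after, k)]
print(f"{len(moved) / len(keys):.1%} of keys changed pod")
```

Every key that moves is one that the new pod took over, so the caches on the existing pods stay valid for everything else.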
Avoid overloading a single pod. Requests should spill over consistently into neighbouring pods:
https://github.com/zalando/skipper/issues/1769
● Introduce Bounded Load, with a configurable load factor
● With a load factor of 1.5, a single pod can only ever serve up to 1.5 times the average number of requests
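One way to picture bounded load is the sketch below, which follows the general "consistent hashing with bounded loads" idea rather than Skipper's actual implementation. The `BoundedLoadRing` class, the in-flight counters, and the way the cap is computed are assumptions made for the example.

```python
import bisect
import hashlib
import math


def ring_position(key: str) -> int:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class BoundedLoadRing:
    def __init__(self, pods, load_factor=1.5, points_per_pod=100):
        self._load_factor = load_factor  # must be >= 1
        self.in_flight = {pod: 0 for pod in pods}
        self._entries = sorted(
            (ring_position(f"{pod}#{i}"), pod)
            for pod in pods
            for i in range(points_per_pod)
        )
        self._positions = [pos for pos, _ in self._entries]

    def acquire(self, key: str) -> str:
        # Cap per pod: load factor times the average, counting this request.
        total = sum(self.in_flight.values()) + 1
        cap = math.ceil(self._load_factor * total / len(self.in_flight))
        start = bisect.bisect_left(self._positions, ring_position(key))
        # Walk the ring from the key's position; spill over past full pods.
        for offset in range(len(self._entries)):
            pod = self._entries[(start + offset) % len(self._entries)][1]
            if self.in_flight[pod] < cap:
                self.in_flight[pod] += 1
                return pod
        raise RuntimeError("no pod below the cap (load_factor < 1?)")

    def release(self, pod: str) -> None:
        self.in_flight[pod] -= 1


ring = BoundedLoadRing(["pod-a", "pod-b", "pod-c", "pod-d"])
for _ in range(100):
    ring.acquire("hot-product")  # same key, never released
print(ring.in_flight)
```

Even when every request is for the same hot product, the overflow goes to the next pods on the ring in a fixed order, so the spill-over itself stays cache-friendly.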
17. Async vs Non Blocking
What’s the difference?
18. Non Blocking IO
Product-Sets -> Products API (NIO JAX-RS client)
DynamoDB Client (Async NIO using Netty)
19. NIO - Resource Utilisation Under Load
10,000 outbound requests per second:
● 4 CPU cores request limit
● 16 Active Threads
● 20% CPU Utilization
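The general idea behind these numbers, many outbound calls in flight on very few threads, can be illustrated with a toy example. The service in the talk uses Netty-based Java clients; this Python `asyncio` sketch only shows the principle, with `asyncio.sleep` standing in for a pending network response, and it does not attempt to reproduce the figures above.

```python
import asyncio
import threading
import time


async def outbound_call(request_id: int, seen_threads: set) -> int:
    seen_threads.add(threading.get_ident())
    # Stand-in for a non-blocking network call: the thread is free to
    # start other calls while this "response" is pending.
    await asyncio.sleep(0.05)
    return request_id


async def run_burst(count: int):
    seen_threads = set()
    started = time.monotonic()
    results = await asyncio.gather(
        *(outbound_call(i, seen_threads) for i in range(count))
    )
    return len(results), len(seen_threads), time.monotonic() - started


completed, threads_used, elapsed = asyncio.run(run_burst(1000))
print(completed, threads_used, f"{elapsed:.2f}s")
```

With blocking calls, each of the 1,000 waits would hold a thread for its full duration; here a single thread overlaps all of them.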
20. Caffeine Async Loading Cache
https://github.com/ben-manes/caffeine
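The core idea of an async loading cache is that the cache stores the pending load itself, so concurrent requests for the same key share a single backend call. The sketch below shows that idea in Python `asyncio`; it is not Caffeine's API, and the `AsyncLoadingCache` class here omits the size bound and expiry that a real cache would add.

```python
import asyncio


class AsyncLoadingCache:
    def __init__(self, loader):
        self._loader = loader  # async function: key -> value
        self._tasks = {}       # key -> in-flight or completed load

    async def get(self, key):
        task = self._tasks.get(key)
        if task is None:
            # First caller starts the load; later callers await the same task.
            task = asyncio.ensure_future(self._loader(key))
            self._tasks[key] = task
        try:
            return await task
        except Exception:
            # Do not keep failed loads around; the next get() retries.
            if self._tasks.get(key) is task:
                del self._tasks[key]
            raise


calls = []


async def load_product(product_id: str) -> str:
    calls.append(product_id)  # record each real backend lookup
    await asyncio.sleep(0)    # stand-in for a datastore read
    return product_id.upper()


async def main():
    cache = AsyncLoadingCache(load_product)
    return await asyncio.gather(*(cache.get("sku-1") for _ in range(5)))


values = asyncio.run(main())
print(values, "backend calls:", len(calls))
```

For a hot product this collapses a burst of simultaneous cache misses into one read against the datastore.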
21.
22. Garbage Collection (GC) Tuning
23. GC Tuning - Before
24. GC Tuning - Fix
25. GC Tuning - After
26. Questions?