1. BIGO's Performance Optimization Practice in High-throughput Catch-up Read Scenarios Based on Pulsar
Zhanpeng Wu
2. Contents
• Background
• Measurement Study
• Read-ahead (RA) System Design
• Read Acceleration Architecture
• Evaluation
• Conclusion
3. Background
Difference between TAILING read & CATCHUP read?
What do our catchup read scenarios look like?
Why does catchup read hurt system performance?
4. Tailing Read & Catchup Read
5. Data in Specified Time Range
6. Performance Comparison
7. Performance Loss in Catchup Read
8. Measurement Study
How to build a performance monitoring system?
What is the most time-consuming stage in a read request?
9. Dataflow under Multi-layer Cache
10. Measurement Metrics
BP-44: https://github.com/apache/bookkeeper/issues/2834
11. Results
12. System Design
Why do we need a whole new read-ahead system?
What should an asynchronous read-ahead system look like under ideal conditions?
13. Current Read-ahead Mechanism
14. Principles in Read-ahead Mode
• When should the read-ahead be triggered?
• Sequential read behavior should trigger read-ahead;
• Reading only a single, isolated entry from disk should not trigger read-ahead;
• When should read-ahead locations be recorded?
• When all levels of cache fail to hit the target entry, a disk read must be triggered. Before returning the entry, put the position of `entry+1` into the `pending_ra_map`;
• When the asynchronous read-ahead task completes, put the `pre_ra_pos` position (by default the position of the entry at the 75th percentile of the read-ahead entry list) into the `pending_ra_map`;
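The recording rules above can be sketched as follows. This is a minimal illustration, not the actual BookKeeper implementation: the class, the packed key, and the 75th-percentile arithmetic are all assumptions made for clarity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of when read-ahead trigger positions are
// recorded in pending_ra_map (all names are hypothetical).
public class PendingRaMap {
    // Maps a packed (ledgerId, entryId) key to a planned read-ahead start.
    private final Map<Long, Long> pendingRaMap = new ConcurrentHashMap<>();

    private static long key(long ledgerId, long entryId) {
        return (ledgerId << 32) | (entryId & 0xffffffffL);
    }

    // Rule 1: every level of cache missed and a disk read was issued.
    // Mark entry+1 so a sequential follow-up read triggers read-ahead,
    // while a one-off random read never does.
    public void onCacheMissDiskRead(long ledgerId, long entryId) {
        pendingRaMap.put(key(ledgerId, entryId + 1), entryId + 1);
    }

    // Rule 2: an async read-ahead task finished. Record pre_ra_pos,
    // by default the entry at the 75th percentile of the window just
    // read, so the next window is submitted before this one drains.
    public void onReadAheadCompleted(long ledgerId, long firstEntry, int windowSize) {
        long preRaPos = firstEntry + (long) (windowSize * 0.75);
        pendingRaMap.put(key(ledgerId, preRaPos), preRaPos);
    }

    // A read for this entry should submit a read-ahead task iff the
    // position was previously recorded; remove it so it fires once.
    public boolean shouldTriggerReadAhead(long ledgerId, long entryId) {
        return pendingRaMap.remove(key(ledgerId, entryId)) != null;
    }
}
```

The single-shot `remove` semantics mirror the principle that one recorded position triggers exactly one read-ahead submission.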
15. Principles in Read-ahead Mode
• When should the read-ahead task actually be submitted?
• When the target entry exists in `pending_ra_map`, the read-ahead task is submitted asynchronously in the background.
• How does the read-ahead window change?
• Currently it is a fixed size, and the relevant parameters are configurable.
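A minimal sketch of the submission path, assuming a fixed, configurable window: the foreground read returns immediately while a background executor loads the window. The class name, executor sizing, and the `readIntoCache` helper are illustrative, not the real implementation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of asynchronous read-ahead submission with a fixed window
// (hypothetical names; the real system's window is configurable).
public class ReadAheadSubmitter {
    private final ExecutorService raExecutor = Executors.newFixedThreadPool(2);
    private final int windowSize; // fixed size per task

    public ReadAheadSubmitter(int windowSize) {
        this.windowSize = windowSize;
    }

    // Called when the target entry was found in pending_ra_map: submit
    // the whole window in the background and return a handle to it.
    public Future<Long> submit(long ledgerId, long startEntry) {
        return raExecutor.submit(() -> {
            long last = startEntry;
            for (long e = startEntry; e < startEntry + windowSize; e++) {
                readIntoCache(ledgerId, e); // assumed: disk read -> ReadCache
                last = e;
            }
            return last; // last entry loaded by this window
        });
    }

    // Stand-in for the real disk-to-ReadCache load.
    void readIntoCache(long ledgerId, long entryId) { }

    public void shutdown() {
        raExecutor.shutdown();
    }
}
```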
16. Principles in Read-ahead Mode
• How to read an entry whose corresponding read-ahead task has not yet completed?
• If the target entry belongs to an uncompleted read-ahead task, the read blocks until the task completes, then returns the entry data.
• Sub-question: what is the granularity of blocking?
• At present, the blocking granularity is the window of a read-ahead task, not each individual entry, so as to avoid creating too many locks.
• Where is the read-ahead data stored?
• The data generated by the read-ahead task is stored in `org.apache.bookkeeper.bookie.storage.ldb.ReadCache`. The existing cache structure is reused, and the data lives in off-heap space.
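The per-window blocking described above can be sketched with one future per in-flight window, so a reader waits on the window rather than on a per-entry lock. All names here are illustrative assumptions, not the BP-49 implementation.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of window-granularity blocking: one CompletableFuture per
// in-flight read-ahead window instead of one lock per entry.
public class WindowBlocking {
    private final int windowSize;
    // window start entry -> future completed once the window is cached
    private final Map<Long, CompletableFuture<Void>> inFlight = new ConcurrentHashMap<>();

    public WindowBlocking(int windowSize) {
        this.windowSize = windowSize;
    }

    private long windowStart(long entryId) {
        return entryId - (entryId % windowSize);
    }

    // The read-ahead task registers its window before reading disk...
    public CompletableFuture<Void> beginWindow(long startEntry) {
        CompletableFuture<Void> f = new CompletableFuture<>();
        inFlight.put(startEntry, f);
        return f;
    }

    // ...and completes it once every entry landed in the ReadCache.
    public void completeWindow(long startEntry) {
        CompletableFuture<Void> f = inFlight.remove(startEntry);
        if (f != null) {
            f.complete(null);
        }
    }

    // A reader hitting an entry of an uncompleted window blocks here,
    // then retries the ReadCache, which now holds the entry.
    public void awaitIfInFlight(long entryId) throws Exception {
        CompletableFuture<Void> f = inFlight.get(windowStart(entryId));
        if (f != null) {
            f.get();
        }
    }
}
```

With this shape the number of synchronization objects is bounded by the number of concurrent read-ahead tasks, matching the "avoid too many locks" principle.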
17. Acceleration Architecture
Detailed implementation of the read-ahead system
18. Overview
BP-49: https://github.com/apache/bookkeeper/issues/3085
19. Dataflow in Async Read-ahead
20. Detailed Implementation
21. Detailed Implementation
22. Detailed Implementation
23. Evaluation
Evaluation metrics design
The actual effects shown by the evaluation results
24. Metrics
• Summary
• read-ahead total time
• read-ahead async queue time
• read-ahead async execution time
• read entry blocking time
• Counter
• hit ReadCache count
• miss ReadCache count
• read-ahead entries count
• read-ahead bytes count
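The counters above can be sketched with plain `LongAdder`s wrapped around the read path; in the real system they would be exported through the bookie's metrics framework, and the summaries (total / queue / execution / blocking time) would be latency histograms rather than running totals. This class and its method names are illustrative only.

```java
import java.util.concurrent.atomic.LongAdder;

// Illustrative sketch of the evaluation counters (hypothetical names).
public class ReadAheadStats {
    final LongAdder hitReadCache = new LongAdder();
    final LongAdder missReadCache = new LongAdder();
    final LongAdder readAheadEntries = new LongAdder();
    final LongAdder readAheadBytes = new LongAdder();
    // Summaries would be histograms in practice; a total suffices here.
    final LongAdder readAheadTotalNanos = new LongAdder();

    // Bump hit or miss on every ReadCache lookup.
    void recordCacheLookup(boolean hit) {
        (hit ? hitReadCache : missReadCache).increment();
    }

    // Account one completed read-ahead task.
    void recordReadAhead(int entries, long bytes, long nanos) {
        readAheadEntries.add(entries);
        readAheadBytes.add(bytes);
        readAheadTotalNanos.add(nanos);
    }
}
```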
25. Evaluation Results
test-env: Hit / Miss Count (~50MB/s/node writes)
26. Evaluation Results
test-env: Cluster/Bookie-wide P99 & AVG Read Time (~50MB/s/node writes)
27. Evaluation Results
prod-env: Bookie-wide P50 & P99 & AVG Read Time (2~3GB/s/cluster reads)
28. Evaluation Results
prod-env: Bookie-wide P50 & P99 & AVG Read Time | SSD rocksdb
29. Conclusion
A brief summary of our optimizations
30. Conclusion
• This talk proposes a new asynchronous read-ahead system that effectively improves the efficiency of catchup reads.
• The work adds many performance metrics for the read-ahead system to the original monitoring system, laying a solid foundation for performance analysis of read latency.
• The new system has been running stably within BIGO for several months and fully serves the machine learning platform; the training jobs on it have achieved lower read latency.
31. Thanks