BIGO's Performance Optimization Practice in High-throughput Catch-up Read Scenarios Based on Pulsar
                1. BIGO's Performance Optimization Practice in High-throughput Catch-up Read Scenarios Based on Pulsar
Zhanpeng Wu            
                        
                2. Contents
• Background
• Measurement Study
• Read-ahead System Design
• Read Acceleration Architecture
• Evaluation
• Conclusion            
                        
                3. Background
What is the difference between TAILING read & CATCHUP read?
What do our catchup read scenarios look like?
Why does catchup read hurt system performance?            
                        
                4. Tailing Read & Catchup Read            
                        
                5. Data in Specified Time Range            
                        
                6. Performance Comparison            
                        
                7. Performance Loss in Catchup Read            
                        
                8. Measurement Study
How to build a performance monitoring system?
What is the most time-consuming stage in a read request?            
                        
                9. Dataflow under Multi-layer Cache            
                        
                10. Measurement Metrics
BP-44: https://github.com/apache/bookkeeper/issues/2834            
                        
                11. Results            
                        
                12. System Design
Why do we need a whole new read-ahead system?
What should an asynchronous read-ahead system look like under ideal conditions?
                        
                13. Current Read-ahead Mechanism            
                        
                14. Principles in Read-ahead Mode
• When should read-ahead be triggered?
• Sequential read behavior should trigger read-ahead;
• Reading only a single entry from disk should not trigger read-ahead;
• When should read-ahead locations be recorded?
• When all levels of cache fail to hit the target entry, a disk read must be triggered. Before returning the entry, put the `entry+1` position into the `pending_ra_map`;
• When the asynchronous read-ahead task completes, the `pre_ra_pos` position (by default, the position of the entry at the 75th percentile of the read-ahead entry list) is put into the `pending_ra_map`;
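The two recording rules above can be sketched roughly as follows. This is a hypothetical illustration, not BookKeeper's actual code: `pendingRaMap`, the callback names, and the 75th-percentile formula are assumptions drawn only from the description on this slide.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of when read-ahead locations are recorded (names are illustrative).
class ReadAheadPositions {
    // ledgerId -> entryId at which the next read-ahead task should be submitted
    final ConcurrentHashMap<Long, Long> pendingRaMap = new ConcurrentHashMap<>();

    // All cache levels missed, so a disk read was issued for (ledgerId, entryId):
    // record the `entry+1` position before returning the entry.
    void onCacheMissDiskRead(long ledgerId, long entryId) {
        pendingRaMap.put(ledgerId, entryId + 1);
    }

    // An async read-ahead task covering [startEntry, startEntry + count) finished:
    // record pre_ra_pos, the entry at the 75th percentile of the window, so the
    // next task is submitted before the current window is fully consumed.
    void onReadAheadCompleted(long ledgerId, long startEntry, int count) {
        long preRaPos = startEntry + (long) Math.ceil(count * 0.75) - 1;
        pendingRaMap.put(ledgerId, preRaPos);
    }
}
```

Recording `pre_ra_pos` inside the window (rather than at its end) lets the next task start while earlier entries are still being consumed, keeping sequential readers ahead of the disk.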
                        
                15. Principles in Read-ahead Mode
• When should the read-ahead task actually be submitted?
• When the target entry exists in `pending_ra_map`, the read-ahead task is submitted asynchronously in the background.
• How does the read-ahead window change?
• Currently it has a fixed size, and the related parameters are configurable.
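A minimal sketch of that submission check, assuming a hypothetical `maybeSubmit` hook invoked on every read; the executor, map, and method names are illustrative, not the actual implementation:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: submit the background read-ahead task when a read hits a recorded
// trigger position in pending_ra_map (all names are assumptions).
class ReadAheadSubmitter {
    final ConcurrentHashMap<Long, Long> pendingRaMap = new ConcurrentHashMap<>();
    final ExecutorService raExecutor = Executors.newSingleThreadExecutor();

    // Called on every read: if (ledgerId, entryId) matches the recorded trigger
    // position, clear it atomically and submit the window read in the background.
    boolean maybeSubmit(long ledgerId, long entryId, Runnable readWindowFromDisk) {
        Long trigger = pendingRaMap.get(ledgerId);
        if (trigger != null && trigger == entryId
                && pendingRaMap.remove(ledgerId, trigger)) {
            raExecutor.submit(readWindowFromDisk);
            return true;
        }
        return false;
    }
}
```

The atomic `remove(key, value)` guards against two concurrent readers both submitting a task for the same trigger position.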
                        
                16. Principles in Read-ahead Mode
• How to read an entry whose corresponding read-ahead task has not yet completed?
• If the target entry belongs to an uncompleted read-ahead task, the read blocks until the read-ahead task completes, and then returns the entry data.
• Sub-question: what is the granularity of blocking?
• At present, the blocking granularity is the window of a read-ahead task, not each individual entry, so as to avoid creating too many locks.
• Where is the read-ahead data stored?
• The data generated by the read-ahead task is stored in `org.apache.bookkeeper.bookie.storage.ldb.ReadCache`. The current cache structure is reused, and the data lives in off-heap space.
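The window-granularity blocking could be sketched with one `CompletableFuture` per in-flight window, so all waiting readers share a single synchronization point instead of per-entry locks. The class, the fixed window size, and the method names here are assumptions for illustration only:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of blocking at read-ahead-window granularity (names are illustrative).
class WindowBlocking {
    static final int RA_WINDOW = 8;  // fixed window size; configurable in practice
    // window start entry -> future completed once the whole window is in ReadCache
    final ConcurrentHashMap<Long, CompletableFuture<Void>> inFlight =
            new ConcurrentHashMap<>();

    static long windowStart(long entryId) {
        return entryId - (entryId % RA_WINDOW);
    }

    // A reader whose target entry belongs to an uncompleted read-ahead task
    // waits on the window's single future rather than on a per-entry lock.
    void awaitIfInFlight(long entryId) {
        CompletableFuture<Void> f = inFlight.get(windowStart(entryId));
        if (f != null) {
            f.join();  // block until the read-ahead task fills the cache
        }
    }

    // Called by the read-ahead task once [startEntry, startEntry + RA_WINDOW) is loaded.
    void onWindowLoaded(long startEntry) {
        CompletableFuture<Void> f = inFlight.remove(startEntry);
        if (f != null) {
            f.complete(null);
        }
    }
}
```

One future per window means at most `windows-in-flight` synchronization objects exist at a time, matching the slide's goal of avoiding a lock per entry.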
                        
                17. Acceleration Architecture
Detailed implementation of the read-ahead system
                        
                18. Overview
BP-49: https://github.com/apache/bookkeeper/issues/3085            
                        
                19. Dataflow in Async Read-ahead            
                        
                20. Detailed Implementation            
                        
                21. Detailed Implementation            
                        
                22. Detailed Implementation            
                        
                23. Evaluation
Evaluation metrics design
The actual effect shown by the evaluation results
                        
                24. Metrics
• Summary
• read-ahead total time
• read-ahead async queue time
• read-ahead async execution time
• read entry blocking time
• Counter
• hit ReadCache count
• miss ReadCache count
• read-ahead entries count
• read-ahead bytes count            
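As a rough illustration, the counter-type metrics above can be kept with plain JDK `LongAdder`s; the real implementation exposes them through BookKeeper's stats framework, so the class and field names here are only illustrative:

```java
import java.util.concurrent.atomic.LongAdder;

// Sketch of the counter metrics listed above using JDK types only.
class ReadAheadStats {
    final LongAdder hitReadCacheCount  = new LongAdder();  // hit ReadCache count
    final LongAdder missReadCacheCount = new LongAdder();  // miss ReadCache count
    final LongAdder readAheadEntries   = new LongAdder();  // read-ahead entries count
    final LongAdder readAheadBytes     = new LongAdder();  // read-ahead bytes count

    void recordCacheLookup(boolean hit) {
        (hit ? hitReadCacheCount : missReadCacheCount).increment();
    }

    void recordReadAhead(int entries, long bytes) {
        readAheadEntries.add(entries);
        readAheadBytes.add(bytes);
    }
}
```

The summary-type metrics (total time, queue time, execution time, blocking time) would additionally record latency distributions, which `LongAdder` alone cannot capture.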
                        
                25. Evaluation Results
test-env: Hit / Miss Count (~50MB/s/node writes)            
                        
                26. Evaluation Results
test-env: Cluster/Bookie-wide P99 & AVG Read Time (~50MB/s/node writes)            
                        
                27. Evaluation Results
prod-env: Bookie-wide P50 & P99 & AVG Read Time (2~3GB/s/cluster reads)            
                        
                28. Evaluation Results
prod-env: Bookie-wide P50 & P99 & AVG Read Time | SSD rocksdb            
                        
                29. Conclusion
A brief summary of our optimizations            
                        
                30. Conclusion
• This talk proposes a new asynchronous read-ahead system that effectively improves the efficiency of catch-up reads.
• The work adds many performance metrics for the read-ahead system to the original monitoring system, laying a solid foundation for the analysis of read latency.
• The new system has been running stably at BIGO for several months and fully serves the machine learning platform; the training jobs on it have achieved lower read latency.            
                        
                31. Thanks