How Airbnb Safeguards Changes in Production
By: Mike Lin, Preeti Ramasamy, Toby Mao, Zack Loebel-Begelman
In our first post, we discussed the need for a near real-time Safe Deploys system and some of the statistics that power its decisions. In this post, we cover the architecture and engineering choices behind the various components that Safe Deploys comprises.
Designing a near real-time experimentation system required making explicit tradeoffs among speed, precision, cost, and resiliency. An early decision was to limit near real-time results to the first 24 hours of an experiment: enough time to catch any major issues and then transition to the comprehensive results from the batch pipeline. The reasoning was that once batch results became available, experimenters would no longer need near real-time results. The following sections describe the additional design decisions made in each component of the Safe Deploys system.
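To make the 24-hour cutoff concrete, here is a minimal sketch, assuming the window is measured from the experiment's start time. The helper name `select_result_source` and the `"nrt"`/`"batch"` labels are hypothetical and not part of Safe Deploys; the real system may handle the handoff differently (for example, by waiting until batch results have actually landed).

```python
from datetime import datetime, timedelta, timezone

# Assumption: the near real-time window is exactly 24 hours from experiment start.
NRT_WINDOW = timedelta(hours=24)


def select_result_source(experiment_start: datetime, now: datetime) -> str:
    """Hypothetical helper: pick which pipeline should serve results.

    Returns "nrt" while the experiment is inside its first 24 hours,
    and "batch" once that window has elapsed.
    """
    if now < experiment_start:
        raise ValueError("experiment has not started yet")
    age = now - experiment_start
    return "nrt" if age < NRT_WINDOW else "batch"


if __name__ == "__main__":
    start = datetime(2021, 1, 1, tzinfo=timezone.utc)
    print(select_result_source(start, start + timedelta(hours=3)))   # nrt
    print(select_result_source(start, start + timedelta(hours=30)))  # batch
```

Bounding the NRT window this way keeps the streaming state small: the real-time path only ever needs data for experiments in their first day.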
High-Level Design
There are 3 major components that make up the technical footprint of the Safe Deploys system:
- Ramp Controller, a Flink job that acts as a centralized coordinator, providing experiment configuration to the NRT pipeline via Kafka and invoking statistical computations by calling Measured via HTTP.
- Near Real-Time (NRT) pipeline, another Flink job that extracts measures, joins and enriches those measures with assignment information (treatment and subject information), and stores the enriched measures in S3.
- Measured, a Python library (invoked via a Python HTTP server and worker pool) that consumes enriched measures from S3, aggregates them, and runs stats to determine if...
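The division of labor between the NRT pipeline (join and enrich) and Measured (aggregate) can be illustrated with a small in-memory sketch. This is plain Python standing in for Flink and S3, and every type and function name here (`Assignment`, `Measure`, `enrich`, `aggregate`) is hypothetical. It also assumes, as is common in experimentation but not stated above, that only measures occurring at or after a subject's assignment count toward that experiment.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Assignment:
    subject_id: str
    experiment: str
    treatment: str
    assigned_at: float  # epoch seconds


@dataclass(frozen=True)
class Measure:
    subject_id: str
    metric: str
    value: float
    event_at: float  # epoch seconds


@dataclass(frozen=True)
class EnrichedMeasure:
    experiment: str
    treatment: str
    subject_id: str
    metric: str
    value: float


def enrich(measures: Iterable[Measure],
           assignments: Iterable[Assignment]) -> List[EnrichedMeasure]:
    """NRT-style step: inner-join measures to assignments on subject_id."""
    by_subject: Dict[str, List[Assignment]] = defaultdict(list)
    for a in assignments:
        by_subject[a.subject_id].append(a)  # a subject may be in several experiments
    enriched: List[EnrichedMeasure] = []
    for m in measures:
        for a in by_subject.get(m.subject_id, []):  # unassigned subjects are dropped
            if m.event_at >= a.assigned_at:  # assumption: ignore pre-assignment events
                enriched.append(EnrichedMeasure(
                    a.experiment, a.treatment, m.subject_id, m.metric, m.value))
    return enriched


def aggregate(enriched: Iterable[EnrichedMeasure]
              ) -> Dict[Tuple[str, str, str], Tuple[int, float]]:
    """Measured-style step: (count, sum) per (experiment, treatment, metric)."""
    counts: Dict[Tuple[str, str, str], int] = defaultdict(int)
    sums: Dict[Tuple[str, str, str], float] = defaultdict(float)
    for e in enriched:
        key = (e.experiment, e.treatment, e.metric)
        counts[key] += 1
        sums[key] += e.value
    return {key: (counts[key], sums[key]) for key in counts}
```

These per-treatment aggregates are the kind of input a statistical comparison between treatment and control would then consume.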