With a 30% market share, Lyft is the second-largest ride-hailing company in the United States, behind Uber.
ClickHouse is an open-source, high-performance, column-oriented database for online analytical processing. Lyft decided to scale out ClickHouse and deprecate Druid, migrating its existing Druid use cases to ClickHouse. Compared with Druid, ClickHouse offers simpler infrastructure management, a gentler learning curve, built-in data deduplication, lower cost, and specialized table engines. Lyft evaluated ClickHouse through benchmarking and performance analysis, and the migration went smoothly.

Lyft's ClickHouse architecture runs in high-availability mode on Altinity's Kubernetes Operator, using AWS M5 compute instances with EBS volumes for storage. Data is ingested primarily through Kafka and Kinesis, while read queries go through an internal proxy and visualization tools. Lyft processes large volumes of data on ClickHouse and has optimized query performance using techniques such as sorting keys, skip indexes, and projections. Use cases on ClickHouse include marketplace health, policy reporting, spend tracking, forecasting, and experimentation.

The team also ran into issues along the way, such as query cache performance and problems with the Kafka integration. Going forward, Lyft plans to expand its use of ClickHouse, including stabilizing the batch-processing architecture and adopting streaming Kinesis ingestion. The team also plans to migrate Flink SQL workloads to ClickHouse and is considering replacing ZooKeeper with ClickHouse Keeper to reduce dependence on external components.
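One of the advantages named above, data deduplication, maps to ClickHouse's ReplacingMergeTree engine, which collapses rows that share the same sorting key down to the row with the highest version column during background merges. A minimal Python sketch of that collapse semantics (field names like `ride_id` and `updated_at` are hypothetical, not Lyft's actual schema):

```python
# Sketch of ReplacingMergeTree-style deduplication: for each sorting key,
# keep only the row with the largest version column.

def deduplicate(rows, key_fields, version_field):
    """Return one row per sorting key, preferring the highest version."""
    latest = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        current = latest.get(key)
        if current is None or row[version_field] > current[version_field]:
            latest[key] = row
    return list(latest.values())

# Duplicate events for ride 1 arrive via the ingestion pipeline;
# deduplication keeps only the latest state per ride.
rows = [
    {"ride_id": 1, "status": "requested", "updated_at": 100},
    {"ride_id": 1, "status": "completed", "updated_at": 200},
    {"ride_id": 2, "status": "requested", "updated_at": 150},
]
deduped = deduplicate(rows, key_fields=["ride_id"], version_field="updated_at")
```

In ClickHouse itself this corresponds to declaring the table with `ENGINE = ReplacingMergeTree(updated_at) ORDER BY ride_id`; note that deduplication there is eventual, applied at merge time rather than on insert.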
High-quality data is necessary for the success of every data-driven company. It enables everything from reliable business logic to insightful decision-making and robust machine learning modeling. It is now the norm for tech companies to have a well-developed data platform that makes it easy for engineers to generate, transform, store, and analyze data at petabyte scale. As such, we have reached a point where the quantity of data is no longer a constraint. Yet this has come at the cost of quality.
In this post we will define data quality at a high-level and explore our motivation to achieve better data quality. We will then introduce our in-house product, Verity, and showcase how it serves as a central platform for ensuring data quality in our Hive Data Warehouse. In future posts we will discuss how Verity addresses data quality elsewhere in our data platform.
Note: This publication assumes you have basic familiarity with the service mesh pattern (e.g. Istio, Linkerd, Envoy — created at Lyft!)
For years now, Lyft has not only been a proponent of but also a contributor to Apache Flink. Lyft’s pipelines have evolved drastically over the years, yet, time and time again, we run into unique cases that stretch Flink to its breaking points — this is one of those times.
In early 2022, Lyft already had a comprehensive Machine Learning Platform called LyftLearn composed of model serving, training, CI/CD, feature serving, and model monitoring systems.
On the real-time front, LyftLearn supported real-time inference and input feature validation. However, streaming data was not supported as a first-class citizen across many of the platform’s systems — such as training, complex monitoring, and others.
While several teams were using streaming data in their Machine Learning (ML) workflows, doing so was a laborious process, sometimes requiring weeks or months of engineering effort. On the flip side, there was substantial appetite among developers at Lyft to build real-time ML systems.
Lyft is a real-time marketplace and many teams benefit from enhancing their machine learning models with real-time signals.
To meet the needs of our customers, we kicked off the Real-time Machine Learning with Streaming initiative. Our goal was to develop foundations that would enable the hundreds of ML developers at Lyft to efficiently develop new models and enhance existing models with streaming data.
In this blog post, we will discuss some of what we built in support of that goal and the lessons we learned along the way.
Discover how Lyft identified and fixed performance issues in our streaming pipelines.
Building ML Models with Observability at Scale.
Distributed Profiling of Model Inference Logs.
How Lyft’s ML Platform Saves Time and Money on Big Data/ML Workloads.
Recommendation plays an important role in Lyft’s understanding of its riders and allows for customizing app experiences to better fulfill their needs. At times, recommendations are also leveraged to manage the marketplace, making sure there’s a healthy balance between ride demand and driver supply. This allows ride requests to be fulfilled with more desirable dispatch outcomes such as matching riders with the best driver nearby.
This blog post focuses on the scope and the goals of the recommendation system, and explores some of the most recent changes the Rider team has made to better serve Lyft’s riders.
We know what you’re thinking — testing in production is one of the cardinal sins of software development. However, at Lyft we have come to realize that load testing in production is a powerful tool to prepare systems for unexpected bursty traffic and peak events. We’ll explore why Lyft needed a custom performance testing framework that worked in production, how we built a cross-functional solution, and how we’ve continued to improve this testing platform since its launch in 2016.
What exactly do we mean by “Load Testing”? In the context of this article, we mean any tool that generates traffic to stress-test systems and see how they perform at the limits of their capacity.
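To make that definition concrete, here is a minimal closed-loop load-generator sketch. The target is a stub standing in for a real service endpoint (hypothetical, so the example is self-contained); in a real test each call would be an HTTP request or RPC, and the report would feed capacity planning:

```python
import random
import statistics
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def target_service():
    # Stub for the system under test; simulates 1-5 ms of service latency.
    time.sleep(random.uniform(0.001, 0.005))

def run_load(requests_total, concurrency):
    """Fire requests from `concurrency` workers and record per-call latency."""
    latencies = []
    lock = threading.Lock()

    def one_request(_):
        start = time.perf_counter()
        target_service()
        elapsed = time.perf_counter() - start
        with lock:
            latencies.append(elapsed)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(one_request, range(requests_total)))

    return {
        "count": len(latencies),
        "p50_ms": statistics.median(latencies) * 1000,
        "max_ms": max(latencies) * 1000,
    }

report = run_load(requests_total=200, concurrency=20)
```

This is a closed-loop generator (each worker waits for its request to finish before issuing the next), which is the simpler design; open-loop generators that fire at a fixed rate regardless of response times expose overload behavior more realistically.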
Graph learning methods can reveal interesting insights that capture underlying relational structures. They have many industry applications in areas such as product and content recommender systems and network analysis.
In this post, we discuss how we use graph learning methods at Lyft to generate embeddings — compact vector representations of high-dimensional information. We will share interesting rideshare insights uncovered by embeddings of riders, drivers, locations, and time. As the examples will show, trained embeddings from graphs can represent information and patterns that are hard to capture with traditional, straightforward features.
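One common family of graph embedding methods (DeepWalk/node2vec style) starts by sampling random walks over the graph; the resulting node sequences are then fed to a skip-gram model, which learns vectors for nodes that co-occur on walks. A sketch of that first stage, with a toy graph whose nodes and edges are purely illustrative, not Lyft's actual data:

```python
import random

def random_walks(graph, walks_per_node, walk_length, seed=7):
    """Sample fixed-length random walks starting from every node.

    `graph` is an adjacency dict: node -> list of neighbor nodes.
    The returned walk corpus would be fed to a skip-gram model
    (e.g. word2vec) to train node embeddings.
    """
    rng = random.Random(seed)
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Toy bipartite graph: riders connected to locations they ride in.
graph = {
    "rider_a": ["sfo", "mission"],
    "rider_b": ["sfo"],
    "sfo": ["rider_a", "rider_b"],
    "mission": ["rider_a"],
}
walks = random_walks(graph, walks_per_node=2, walk_length=4)
```

Because riders who frequent the same locations co-occur on walks, their learned vectors end up close together, which is how embeddings surface relational patterns that hand-built features miss.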
Over the past couple of years, different mobile app teams across Lyft have been moving to Server-Driven UI (SDUI) for three main reasons:
- To deal with business complexity
- To increase release velocity
- To be more flexible in how we staff and build features
This post is about Lyft Bikes and Scooters’ journey to SDUI, why we’ve gone down this path, and what’s worked well for us.
The health of Lyft’s marketplace depends on how riders and drivers are distributed across space and time. Within the complex rideshare space, it is not easy to define typical marketplace concepts like “market efficiency” and “supply-demand balance”. A simple question such as “Do we have enough drivers right now?” has different answers depending on context:
- Are there enough drivers in the right places to maintain good service levels?
- Are there enough drivers system-wide, assuming a ride request will be accepted no matter how far away it is?
- Are there enough to maintain an attractive earning rate?
Each question leads in a different direction. Being able to answer such questions is the interesting (and challenging!) part of operating a healthy two-sided marketplace.
Hundreds of millions of real-time decisions are made each day at Lyft by online machine learning models. These model-based decisions include price optimization for rides, incentives allocation for drivers, fraud detection, ETA prediction, and innumerable others that impact how riders move and drivers earn.
Lyft hosts a dynamic marketplace connecting millions of people to a robust transportation network. In order to offer high value and quality service for both riders and drivers we need to make complex optimization decisions in near-real time. The environment can change quickly with traffic, events and weather, making these decisions even more challenging.
We have employed multi-armed bandit (MAB) algorithms, a common machine learning method for decision making that optimizes long-term rewards, to improve our real-time decision-making capability. MABs allow us not only to iterate at a faster cadence and lower cost, but also to deliver dynamic user experiences and responsive marketplace systems. We will walk through some of our most impactful MAB applications in UI optimization and personalized messaging, concluding with applications in our marketplace algorithms.
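For intuition, here is a minimal epsilon-greedy bandit sketch, one of the simplest MAB strategies. The arms could be, say, two candidate UI messages; rewards come from a simulated Bernoulli environment with made-up conversion rates, not from Lyft's production systems:

```python
import random

class EpsilonGreedyBandit:
    """Epsilon-greedy MAB: explore a random arm with probability epsilon,
    otherwise exploit the arm with the highest estimated mean reward."""

    def __init__(self, n_arms, epsilon=0.1, seed=42):
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))  # explore
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Simulated environment: arm 1 truly has the higher conversion rate.
true_rates = [0.05, 0.30]
bandit = EpsilonGreedyBandit(n_arms=2)
env = random.Random(0)
for _ in range(5000):
    arm = bandit.select_arm()
    reward = 1.0 if env.random() < true_rates[arm] else 0.0
    bandit.update(arm, reward)
```

After a few thousand iterations the bandit concentrates its pulls on the better arm while still spending an epsilon fraction of traffic exploring, which is the faster-and-cheaper iteration loop (relative to fixed-split A/B tests) described above.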