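Company: Lyft
Lyft is a transportation network company headquartered in San Francisco, California, USA. It develops a mobile app that connects riders with drivers, providing sharing-economy services for vehicle hire and ride matching. Riders can book a vehicle by sending a text message or by using the mobile app, and the app also lets them track the vehicle's location.
Lyft holds a 30% market share and is the second-largest ride-hailing company in the United States, after Uber.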
Protocol Buffer Design: Principles and Practices for Collaborative Development
At Lyft Media, we’re obsessed with building flexible and highly reliable native ad products. Since our technical stack encompasses mobile clients on both iOS and Android, as well as multiple backend services, it is crucial to ensure robust and efficient communication between all involved entities. For this task we are leveraging Protocol Buffers, and we would like to share the best practices that are helping us achieve this goal. This article focuses on our experience addressing the challenges that come with collaborating on shared protocols in teams where people with different levels of familiarity and historical context, or even people outside the team, get to contribute. The problem of development process quality is prioritized over raw efficiency optimizations.
Building Lyft’s Next Emblem — Glow
Longtime riders might remember the original fuzzy, pink Carstache emblem that made Lyft universally recognizable. Over the years, the emblem dropped the fuzz for pink lights in the Glowstache and later evolved with more colors as the beloved Amp, which has been in active use for over seven years! Recently, Lyft introduced its brighter, bolder next-generation emblem: Glow. Glow provides a daytime-visible, auto-dimmable display showing rider-customizable colors and new animations to help riders find their ride faster. Glow also has enhanced GPS and IMU sensors for improved driver location accuracy.
ETA (Estimated Time of Arrival) Reliability at Lyft
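This article describes how Lyft uses tree-based classification models to provide accurate ride ETAs (estimated times of arrival) and reliability predictions. The models are trained on a range of features, such as nearby available drivers, historical data, and market conditions, to predict ride times and reliability. The team continues to improve the models by integrating more real-time signals that capture dynamic market conditions. Lyft's goal is to make the service more reliable and give riders accurate, dependable information.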
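As a rough illustration of what "tree-based classification" means here, the sketch below fits a single-split decision stump, the simplest possible tree, in plain Python. It is not Lyft's model: the feature names (`nearby_available_drivers`, `historical_on_time_rate`), the toy rows, and the labels are all invented for this example, and a production system would use a full tree ensemble from an ML library.

```python
from typing import Sequence, Tuple

# (feature index, threshold, label predicted when value <= threshold)
Stump = Tuple[int, float, int]


def fit_stump(rows: Sequence[Sequence[float]], labels: Sequence[int]) -> Stump:
    """Exhaustively pick the single split that misclassifies the fewest rows."""
    best: Stump = (0, 0.0, 0)
    best_errors = len(labels) + 1
    for feature in range(len(rows[0])):
        # Candidate thresholds are the distinct values seen for this feature.
        for threshold in sorted({row[feature] for row in rows}):
            for low_label in (0, 1):
                errors = sum(
                    1
                    for row, label in zip(rows, labels)
                    if (low_label if row[feature] <= threshold else 1 - low_label) != label
                )
                if errors < best_errors:
                    best, best_errors = (feature, threshold, low_label), errors
    return best


def predict(stump: Stump, row: Sequence[float]) -> int:
    """Return 1 if the quoted ETA is predicted to be met, else 0."""
    feature, threshold, low_label = stump
    return low_label if row[feature] <= threshold else 1 - low_label


# Toy rows, invented for illustration:
# [nearby_available_drivers, historical_on_time_rate]
rows = [[1, 0.60], [2, 0.70], [3, 0.65], [6, 0.80], [8, 0.90], [9, 0.85]]
# 1 = the quoted ETA was met, 0 = it was not (made-up labels)
labels = [0, 0, 0, 1, 1, 1]

stump = fit_stump(rows, labels)
```

On this toy data the stump splits on the number of nearby drivers. Real models combine many such splits across many features, which is what lets them react to the changing market signals the article mentions.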
Keeping OSM fresh, accurate, and navigation-worthy at Lyft
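Lyft chose OpenStreetMap (OSM) as its map data source and keeps the map up to date using driver telemetry and ground-truth data. Driver feedback and an imagery collection project let the map reflect road changes promptly, which enables accurate navigation, improves customer satisfaction, and optimizes routes to reduce fuel consumption and environmental impact.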
Crafting Seamless Journeys with Live Activities
In this edition, we transition from pixels to code, exploring the technical side that orchestrates Live Activities at Lyft.
Lyft’s Reinforcement Learning Platform
Tackling decision-making problems with a platform for developing and serving Reinforcement Learning models, with a focus on Contextual Bandits
Postgres Aurora DB major version upgrade with minimal downtime
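To upgrade the database with minimal downtime, the following steps are performed: configure GET requests on the downstream pods to return 503 errors; enable the circuit breaker to protect the database; put the PG10 database into read-only mode and verify that write transactions are rejected; terminate all connections to the PG10 database; check replication lag to confirm that the PG10 and PG13 databases are in sync; reset sequences to avoid sequence-number conflicts; update Route53 so that the database connection string points to the PG13 database; verify the Route53 DNS update; verify writes to the PG13 database by running a write script from an application pod; and turn off the circuit breaker to restore ingress traffic to the application. This blue/green deployment approach kept downtime low during the upgrade. Thanks to Shailesh Rangani and Suyog Pagare, whose Postgres expertise helped minimize the downtime of this upgrade.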
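The ordering of these steps matters: traffic is cut before the old database goes read-only, and DNS is switched only after replication has caught up. A minimal sketch of that sequencing follows, using an in-memory stand-in rather than real infrastructure. `FakeEnvironment`, `build_steps`, and `run_cutover` are hypothetical names introduced for illustration; none of them come from the article, and no real Postgres, Route53, or Kubernetes calls are made.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class FakeEnvironment:
    """In-memory stand-in for the real systems; every field is hypothetical."""
    ingress_open: bool = True
    breaker_on: bool = False
    pg10_read_only: bool = False
    pg10_connections: int = 12
    replication_lag_bytes: int = 0
    sequences_reset: bool = False
    dns_target: str = "pg10"


Step = Tuple[str, Callable[[], bool]]


def build_steps(env: FakeEnvironment) -> List[Step]:
    """Return (name, action) pairs in the order the summary lists them."""
    def return_503() -> bool:
        env.ingress_open = False          # downstream pods answer GETs with 503
        return True

    def enable_breaker() -> bool:
        env.breaker_on = True
        return True

    def pg10_read_only() -> bool:
        env.pg10_read_only = True
        return env.pg10_read_only         # stands in for "verify writes are rejected"

    def drop_connections() -> bool:
        env.pg10_connections = 0
        return True

    def check_lag() -> bool:
        return env.replication_lag_bytes == 0

    def reset_sequences() -> bool:
        env.sequences_reset = True
        return True

    def update_dns() -> bool:
        env.dns_target = "pg13"
        return True

    def verify_dns() -> bool:
        return env.dns_target == "pg13"

    def verify_pg13_write() -> bool:
        return env.dns_target == "pg13" and env.sequences_reset

    def restore_ingress() -> bool:
        env.breaker_on = False
        env.ingress_open = True
        return True

    return [
        ("return 503 from pods", return_503),
        ("enable circuit breaker", enable_breaker),
        ("set PG10 read-only", pg10_read_only),
        ("drop PG10 connections", drop_connections),
        ("check replication lag", check_lag),
        ("reset sequences", reset_sequences),
        ("update DNS to PG13", update_dns),
        ("verify DNS update", verify_dns),
        ("verify PG13 writes", verify_pg13_write),
        ("disable breaker, restore ingress", restore_ingress),
    ]


def run_cutover(steps: List[Step]) -> List[str]:
    """Run steps in order and halt at the first one that reports failure."""
    completed: List[str] = []
    for name, action in steps:
        if not action():
            raise RuntimeError(f"cutover halted at step: {name}")
        completed.append(name)
    return completed
```

Halting on the first failed check is the design choice worth noting: if replication lag is nonzero, the sketch stops before DNS is touched, so the connection string still points at PG10 and the operator can decide whether to wait or roll back.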
Python Upgrade Playbook
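This article covers the Lyft team's experience and practices in upgrading Python. The team sent regular update emails and used a Slack channel to share information and answer questions, and it used the fact that new features are available only on newer Python versions as an incentive to upgrade. The team upgraded more than 1,500 repositories without major issues, thanks to strong CI and staging environments. Upgrades have become progressively faster, and the team made progress while also working on other major projects. The work brought other benefits too, such as faster development workflows and standardized datasets. The team plans to roll its tooling out across the entire infrastructure to track all upgrades and promote best practices.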
Druid Deprecation and ClickHouse Adoption at Lyft
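ClickHouse is an open-source, high-performance, column-oriented database for online analytical processing (OLAP). Lyft decided to scale up ClickHouse and deprecate Druid, migrating existing Druid use cases to ClickHouse. Compared with Druid, ClickHouse offers simpler infrastructure management, a lower learning curve, data deduplication, lower cost, and specialized engines. Lyft evaluated ClickHouse through benchmarking and performance analysis and carried out a smooth migration. Lyft's ClickHouse architecture is based on Altinity's Kubernetes Operator, runs in HA mode, and uses AWS M5 compute instances with EBS volumes for storage. Data is ingested mainly through Kafka and Kinesis, and read queries go through an internal proxy and visualization tools. Lyft processes large volumes of data in ClickHouse and has tuned query performance with techniques such as sorting keys, skip indexes, and projections. ClickHouse serves multiple use cases, including market health, policy reporting, spend tracking, forecasting, and experimentation. Lyft also ran into some problems along the way, such as query cache performance and issues with the Kafka integration. Going forward, Lyft plans to expand its use of ClickHouse, including stabilizing the batch architecture and adopting streaming Kinesis ingestion. It also plans to migrate Flink SQL to ClickHouse and is considering replacing ZooKeeper with ClickHouse Keeper to reduce dependencies on external components.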
From Big Data to Better Data: Ensuring Data Quality with Verity
High-quality data is necessary for the success of every data-driven company. It enables everything from reliable business logic to insightful decision-making and robust machine learning modeling. It is now the norm for tech companies to have a well-developed data platform. This makes it easy for engineers to generate, transform, store, and analyze data at the petabyte scale. As such, we have reached a point where the quantity of data is no longer a boundary. Yet this has come at the cost of quality.
In this post we will define data quality at a high level and explore our motivation to achieve better data quality. We will then introduce our in-house product, Verity, and showcase how it serves as a central platform for ensuring data quality in our Hive Data Warehouse. In future posts we will discuss how Verity addresses data quality elsewhere in our data platform.
Building a Control Plane for Lyft’s Shared Development Environment
Note: This publication assumes you have basic familiarity with the service mesh pattern (e.g. Istio, Linkerd, Envoy — created at Lyft!)
Where’s My Data — A Unique Encounter with Flink Streaming’s Kinesis Connector
For years now, Lyft has not only been a proponent of but also a contributor to Apache Flink. Lyft’s pipelines have evolved drastically over the years, yet, time and time again, we run into unique cases that stretch Flink to its breaking points — this is one of those times.
Building Real-time Machine Learning Foundations at Lyft
In early 2022, Lyft already had a comprehensive Machine Learning Platform called LyftLearn composed of model serving, training, CI/CD, feature serving, and model monitoring systems.
On the real-time front, LyftLearn supported real-time inference and input feature validation. However, streaming data was not supported as a first-class citizen across many of the platform’s systems — such as training, complex monitoring, and others.
While several teams were using streaming data in their Machine Learning (ML) workflows, doing so was a laborious process, sometimes requiring weeks or months of engineering effort. On the flip side, there was a substantial appetite to build real-time ML systems from developers at Lyft.
Lyft is a real-time marketplace and many teams benefit from enhancing their machine learning models with real-time signals.
To meet the needs of our customers, we kicked off the Real-time Machine Learning with Streaming initiative. Our goal was to develop foundations that would enable the hundreds of ML developers at Lyft to efficiently develop new models and enhance existing models with streaming data.
In this blog post, we will discuss some of what we built in support of that goal and the lessons we learned along the way.
Gotchas of Streaming Pipelines: Profiling & Performance Improvements
Discover how Lyft identified and fixed performance issues in our streaming pipelines.
Building a large scale unsupervised model anomaly detection system — Part 2
Building ML Models with Observability at Scale.
Building a large scale unsupervised model anomaly detection system — Part 1
Distributed Profiling of Model Inference Logs.