Company: Uber
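Uber (Chinese: 优步; /ˈuːbər/) is a transportation network company headquartered in San Francisco, California, United States. It develops mobile applications that connect passengers with drivers, providing sharing-economy services such as passenger vehicle hire and ride-share matching. Passengers can book rides through the app and track the vehicle's location. Uber operates in 785 metropolitan areas worldwide, and people can access the platform through its website or mobile app.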
The name Uber is widely believed to derive from the German word über, which is cognate with English "over" and means "above".
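Its business model, however, faces legal challenges in some regions. Because the model is atypical, it can amount to illegal operation of unlicensed for-hire vehicles in some places, while some countries and regions have passed laws legalizing it, such as California in the United States and Beijing and Shanghai in China. The root of the issue is that Uber turns the taxi industry into a social platform: through a mobile app, a customer requesting a ride is connected with Uber users who want to drive part-time and with people who have idle vehicles to rent out. Once a transaction is completed, Uber takes a percentage commission and pays out the remaining share, a financial arrangement that sidesteps traditional regulation.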
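On May 10, 2019, Uber became a public company through an initial public offering, but its shares fell below the offering price on the first day of trading.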
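Uber is estimated to have 110 million active users worldwide and a 69% market share in the United States. Uber also operates in Greater China: it has established a mainstream ride-hailing platform in Hong Kong and Taiwan, and in mainland China, through a share swap, it holds a 17.7% economic interest in Xiaoju Technology, the parent company of DiDi Chuxing, the largest ride-hailing platform in that market.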
Balancing HDFS DataNodes in the Uber DataLake
Uber has one of the largest Apache Hadoop® Distributed File System (HDFS) deployments in the world, with exabytes of data across tens of clusters. The HDFS team at Uber had to solve the problem of…
Model Excellence Scores: A Framework for Enhancing the Quality of Machine Learning Systems at Scale
With the introduction of Model Excellence Scores at Uber, we're setting a new standard for measuring, monitoring, and maintaining ML model quality. Read how this innovative approach aims to enhance ML…
Scaling AI/ML Infrastructure at Uber
Machine Learning (ML) is celebrating its 8th year at Uber since we first started using complex rule-based machine learning models for driver-rider matching and pricing teams in 2016. Since then, our…
How LedgerStore Supports Trillions of Indexes at Uber
Uber connects the physical and digital worlds to help make movement happen at the tap of a button. Billions of trips, deliveries, and tens of billions of financial transactions across earners,…
Migrating a Trillion Entries of Uber’s Ledger Data from DynamoDB to LedgerStore
This week, we'll dive into how we migrated Uber’s business-critical ledger data to LedgerStore. We'll detail how we moved more than a trillion entries (making up a few petabytes of data) transparently…
CheckEnv: Fast Detection of RPC Calls Between Environments Powered by Graphs
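CheckEnv is a tool Uber developed to deal with complex call chains and cross-environment communication in its microservice architecture. It uses the Grail and Local Graph platforms to monitor and resolve cross-environment RPC calls, and it provides APIs for anomaly detection, troubleshooting, and optimization. CheckEnv is the first user of the Ballast load-testing platform and keeps the microservice dependency graph up to date through the MazeX data-ingestion pipeline. In the future, CheckEnv will expand its data ingestion to build richer graphs that improve the reliability and efficiency of microservices. Through graph-based analysis, CheckEnv can optimize data flows and improve overall service efficiency, helping with complex microservice challenges such as real-time failure detection, incident prediction, anomaly detection, efficiency improvements, and workflow management.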
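To make the idea of detecting cross-environment RPC calls on a graph concrete, here is a minimal sketch, assuming the dependency graph is available as a simple caller-to-callees mapping and that each service is tagged with one environment. The service names, environment labels, and `find_cross_env_calls` are hypothetical and are not CheckEnv's API; the sketch only shows the core check of walking the graph from an entrypoint and flagging edges whose endpoints live in different environments.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical call graph: caller service -> services it calls over RPC.
CallGraph = Dict[str, List[str]]


def find_cross_env_calls(
    graph: CallGraph, env_of: Dict[str, str], entrypoint: str
) -> List[Tuple[str, str]]:
    """Walk the call graph breadth-first from `entrypoint` and return every
    RPC edge whose caller and callee run in different environments."""
    seen = {entrypoint}
    queue = deque([entrypoint])
    crossings: List[Tuple[str, str]] = []
    while queue:
        caller = queue.popleft()
        for callee in graph.get(caller, []):
            if env_of[caller] != env_of[callee]:
                crossings.append((caller, callee))
            if callee not in seen:  # visit each service once, even with cycles
                seen.add(callee)
                queue.append(callee)
    return crossings


# Example: a staging request path that leaks into production services.
graph = {
    "api-gateway": ["trips", "pricing"],
    "trips": ["payments"],
    "pricing": [],
    "payments": [],
}
env_of = {
    "api-gateway": "staging",
    "trips": "staging",
    "pricing": "production",
    "payments": "production",
}
crossings = find_cross_env_calls(graph, env_of, "api-gateway")
print(crossings)  # [('api-gateway', 'pricing'), ('trips', 'payments')]
```

Breadth-first traversal keeps the check bounded to services actually reachable from the request's entrypoint, which is the scope that matters when asking whether one request path crosses environments.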
uVitals – An Anomaly Detection & Alerting System
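uVitals is an anomaly detection system that offers multiple models and features. Its backend service is written in Go and manages metric configurations through CRUD operations. At the core of the system is the "anomalies" API, which returns the list of anomalies for a given metric and date range. A user-friendly interface lets users navigate easily and self-serve. Beyond detecting anomalies, uVitals helps users understand them through detailed data visualization and in-depth analysis. The system also provides timely anomaly notifications and a user-feedback mechanism to improve accuracy. By analyzing the distribution and characteristics of the data, uVitals reduces false positives in low-traffic cities. uVitals has also been optimized for performance: replacing Pandas® with Apache Spark® improved runtime efficiency by 77% on average. Future work includes real-time anomaly detection and using AI to provide richer content.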
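The point about reducing false positives in low-traffic cities can be illustrated with a minimal sketch. It uses a simple trailing-window z-score detector plus a minimum-volume guard; both the technique and the parameters (`window`, `z_threshold`, `min_volume`) are assumptions chosen for illustration, not the models uVitals actually uses.

```python
import statistics
from typing import List, Sequence


def detect_anomalies(
    values: Sequence[float],
    window: int = 7,
    z_threshold: float = 3.0,
    min_volume: float = 50.0,
) -> List[int]:
    """Flag indices whose value is more than `z_threshold` standard deviations
    from the mean of the preceding `window` points. Windows whose mean is below
    `min_volume` are skipped, a hypothetical guard against noisy low-traffic series."""
    anomalies: List[int] = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        if mean < min_volume:
            continue  # too little traffic for a reliable baseline
        stdev = statistics.pstdev(history)
        if stdev == 0:
            is_anomaly = values[i] != mean  # flat history: any change stands out
        else:
            is_anomaly = abs(values[i] - mean) / stdev > z_threshold
        if is_anomaly:
            anomalies.append(i)
    return anomalies


busy_city = [100, 102, 98, 101, 99, 100, 103, 250, 101]
quiet_city = [2, 3, 1, 2, 2, 3, 2, 9, 1]

print(detect_anomalies(busy_city))                 # [7]
print(detect_anomalies(quiet_city))                # []
print(detect_anomalies(quiet_city, min_volume=0))  # [7]
```

Without the guard, a jump from 2 to 9 trips in a quiet city looks like a huge deviation. With the guard, only series with enough volume to give a stable baseline are evaluated.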
Attribute-Based Access Control at Uber
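Uber uses microservices and an attribute-based access control model to implement flexible authorization policies. An authorization engine evaluates condition expressions and an attribute store supplies attribute values, with Google's CEL (Common Expression Language) chosen as the expression language. Uber combines attribute-based access control (ABAC) with role-based access control (RBAC) to meet different authorization needs, and it uses ownership information from the uOwn service to create generic policies. Charter, the policy management hub, simplifies how policies are managed and distributed. Introducing the attribute store and CEL gave Uber greater flexibility and extensibility, improved authorization efficiency, saved engineering time, and reduced system complexity. In short, by adopting attribute-based access control, Uber provides an efficient and scalable authorization framework for all of its services, which helps protect Uber's data security and user privacy.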
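A minimal sketch of how RBAC and ABAC can be combined in one policy check, assuming a policy carries a set of allowed roles plus a condition evaluated against subject and resource attributes. Plain Python lambdas stand in for CEL expressions here, and `Policy`, `ATTRIBUTE_STORE`, `is_allowed`, and the attribute names are hypothetical; this is not the interface of Uber's authorization engine or of any CEL library.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Set

Attributes = Dict[str, Any]


@dataclass
class Policy:
    action: str
    allowed_roles: Set[str]  # RBAC part: which roles may attempt the action
    # ABAC part: a plain Python predicate standing in for a CEL expression.
    condition: Callable[[Attributes, Attributes], bool]


# Hypothetical attribute store: resource id -> attributes, e.g. the owning
# team as an ownership service might report it.
ATTRIBUTE_STORE: Dict[str, Attributes] = {
    "svc:payments": {"owner_team": "payments"},
    "svc:maps": {"owner_team": "maps"},
}

# One generic policy: engineers may deploy only services their own team owns.
POLICIES: List[Policy] = [
    Policy("deploy", {"engineer"}, lambda s, r: s["team"] == r["owner_team"]),
]


def is_allowed(action: str, subject: Attributes, resource_id: str) -> bool:
    resource = ATTRIBUTE_STORE[resource_id]  # fetch attributes at decision time
    for policy in POLICIES:
        if policy.action != action:
            continue
        has_role = bool(subject["roles"] & policy.allowed_roles)
        if has_role and policy.condition(subject, resource):
            return True
    return False  # default deny


alice = {"roles": {"engineer"}, "team": "payments"}
print(is_allowed("deploy", alice, "svc:payments"))  # True
print(is_allowed("deploy", alice, "svc:maps"))      # False
```

A single ownership-based condition replaces what would otherwise be one role grant per service, which is the kind of generic policy the summary describes.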
Fast Copy-On-Write within Apache Parquet for Data Lakehouse ACID Upserts
With the evolution of the storage table formats Apache Hudi®, Apache Iceberg®, and Delta Lake™, more and more companies are building their lakehouses on top of these formats for many use cases, such as incremental ingestion. But upsert speed can still be a problem as data volumes grow.
In these storage tables, Apache Parquet is the main file format. In this article, we will discuss how we built a row-level secondary index and the innovations we introduced in Apache Parquet to speed up upserts of data inside a Parquet file. We will also present benchmark results showing much faster upserts than traditional copy-on-write in Delta Lake and Hudi.
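To show why a row-level index helps copy-on-write upserts, here is a minimal in-memory sketch. Row groups are modeled as immutable tuples, and the index maps each record key to its row group and row offset, so an upsert rebuilds only the row groups that contain updated keys. This is a simplified model under stated assumptions, not the Parquet format or Uber's implementation: it assumes untouched data can be carried over without decoding, and `build_index`, `upsert`, and the two-column schema are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int]       # (record_key, value): hypothetical two-column schema
RowGroup = Tuple[Row, ...]  # immutable, standing in for an encoded row group


def build_index(row_groups: List[RowGroup]) -> Dict[str, Tuple[int, int]]:
    """Row-level secondary index: record key -> (row group id, row offset)."""
    index: Dict[str, Tuple[int, int]] = {}
    for g, group in enumerate(row_groups):
        for offset, (key, _) in enumerate(group):
            index[key] = (g, offset)
    return index


def upsert(
    row_groups: List[RowGroup],
    index: Dict[str, Tuple[int, int]],
    updates: Dict[str, int],
) -> Tuple[List[RowGroup], List[int]]:
    """Copy-on-write upsert guided by the index. Only row groups holding an
    updated key are rebuilt; the rest are carried over unchanged. Unknown keys
    are appended as a new row group. Returns (new groups, rewritten group ids)."""
    touched: Dict[int, Dict[int, Row]] = {}
    inserts: List[Row] = []
    for key, value in updates.items():
        location = index.get(key)
        if location is None:
            inserts.append((key, value))
        else:
            g, offset = location
            touched.setdefault(g, {})[offset] = (key, value)

    new_groups: List[RowGroup] = []
    for g, group in enumerate(row_groups):
        if g in touched:
            rows = list(group)
            for offset, row in touched[g].items():
                rows[offset] = row
            new_groups.append(tuple(rows))
        else:
            new_groups.append(group)  # reused as-is, no rewrite needed
    if inserts:
        new_groups.append(tuple(inserts))
    return new_groups, sorted(touched)


groups = [(("a", 1), ("b", 2)), (("c", 3), ("d", 4))]
index = build_index(groups)
new_groups, rewritten = upsert(groups, index, {"c": 30, "e": 5})
print(new_groups)  # [(('a', 1), ('b', 2)), (('c', 30), ('d', 4)), (('e', 5),)]
print(rewritten)   # [1]
```

The key design point is that the lookup answers "which row group holds this key" without scanning the file, so the rewrite cost scales with the number of touched row groups rather than with the file size.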
Automated Audit Framework For Internet Scale Financial Transactions
Uber's Financial Computation Service processes various types of money movements from multiple upstream systems, generating journal entries that can be linked to the actual transactions for accuracy. Execution is guided by a rulePlan, a JSON representation of a directed tree. An audit event is created for each transaction, containing all the information needed to trace it back to the business flow and rules. Recomputation and reconciliation make the results reproducible. All upstream events are consumed and processed by the service.
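A minimal sketch of the flow described above, assuming a rulePlan is a JSON tree whose nodes may post a journal entry. It executes the tree, records an audit event holding the exact plan, the inputs, and a digest of the outputs, and then recomputes from the audit event to reconcile. The plan schema, account names, `share_pct`, and all function names are hypothetical, not the service's actual format.

```python
import hashlib
import json
from typing import Any, Dict, List

# Hypothetical rulePlan: a JSON tree whose nodes may post a journal entry
# (debit/credit accounts and a percentage of the input amount).
RULE_PLAN_JSON = """
{"name": "trip_payment", "children": [
  {"name": "rider_charge", "debit": "rider_cash",
   "credit": "clearing", "share_pct": 100},
  {"name": "earner_payout", "debit": "clearing",
   "credit": "earner_balance", "share_pct": 75}
]}
"""


def execute(plan: Dict[str, Any], amount_cents: int, txn_id: str) -> List[Dict[str, Any]]:
    """Walk the rule tree depth-first and emit one journal entry per posting node."""
    entries: List[Dict[str, Any]] = []

    def walk(node: Dict[str, Any]) -> None:
        if "debit" in node:
            entries.append({
                "txn": txn_id,
                "rule": node["name"],
                "debit": node["debit"],
                "credit": node["credit"],
                "amount_cents": amount_cents * node["share_pct"] // 100,
            })
        for child in node.get("children", []):
            walk(child)

    walk(plan)
    return entries


def digest(entries: List[Dict[str, Any]]) -> str:
    return hashlib.sha256(json.dumps(entries, sort_keys=True).encode()).hexdigest()


def audit_event(plan_json: str, amount_cents: int, txn_id: str,
                entries: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Capture the exact plan and inputs, plus a digest of the outputs, so the
    result can be recomputed later."""
    return {"txn": txn_id, "plan": plan_json, "amount_cents": amount_cents,
            "entries_digest": digest(entries)}


def reconcile(event: Dict[str, Any]) -> bool:
    """Recompute from the audit event and check it matches what was recorded."""
    recomputed = execute(json.loads(event["plan"]), event["amount_cents"], event["txn"])
    return digest(recomputed) == event["entries_digest"]


entries = execute(json.loads(RULE_PLAN_JSON), 2000, "txn-1")
event = audit_event(RULE_PLAN_JSON, 2000, "txn-1", entries)
print([e["amount_cents"] for e in entries])  # [2000, 1500]
print(reconcile(event))                       # True
```

Storing the plan itself, not just a reference to it, is what lets a later recomputation reproduce the original entries even after the rules change.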
Identifying Green Vehicles for a Zero-Emission Future
Uber has made a public commitment to phase out carbon emissions in the United States, Canada, and Europe by 2030, and worldwide by 2040. We maintain periodic updates on our progress via our Climate Assessment and Performance Report, which shows both how far we’ve come and how far we have yet to go.
Underlying this report is a large effort to prepare the data presented within, everything from identifying vehicle fuel type across markets, to carbon emissions per vehicle-mile and ultimately to passenger-mile traveled. In this blog post, we’ll take a closer look at the complexities of identifying “green” vehicles onboarded to Uber, and our solutions for managing those data.
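Two of the data steps mentioned above can be sketched briefly: normalizing inconsistent fuel-type labels across markets, and converting emissions per vehicle-mile into emissions per passenger-mile. The alias table, the choice to count only battery-electric vehicles as green, and the example numbers are all assumptions for illustration, not Uber's classification rules or emission factors.

```python
from typing import Optional

# Hypothetical alias table: raw fuel-type strings differ across markets and sources.
FUEL_ALIASES = {
    "bev": "electric", "ev": "electric", "electric": "electric",
    "hev": "hybrid", "hybrid": "hybrid", "phev": "plug_in_hybrid",
    "gasoline": "gasoline", "petrol": "gasoline", "diesel": "diesel",
}
GREEN_FUEL_TYPES = {"electric"}  # assumption: only battery-electric counts as green


def normalize_fuel_type(raw: Optional[str]) -> str:
    if not raw:
        return "unknown"
    return FUEL_ALIASES.get(raw.strip().lower(), "unknown")


def is_green(raw: Optional[str]) -> bool:
    return normalize_fuel_type(raw) in GREEN_FUEL_TYPES


def grams_per_passenger_mile(grams_per_vehicle_mile: float,
                             vehicle_miles: float,
                             passenger_miles: float) -> float:
    """Total emissions (g/vehicle-mile x vehicle-miles) spread over passenger-miles."""
    return grams_per_vehicle_mile * vehicle_miles / passenger_miles


print(is_green(" BEV "), is_green("Petrol"), normalize_fuel_type(None))
# True False unknown
print(grams_per_passenger_mile(300, 10, 8))  # 375.0
```

In the example, the vehicle drives 10 miles but carries a passenger for only 8 (for instance, when driving to a pickup), so the per-passenger-mile figure is higher than the per-vehicle-mile one.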
Spark Analysers: Catching Anti-Patterns In Spark Apps
Apache Spark™ is a widely used open source distributed computing engine. It is one of the main components of Uber’s data stack.
Spark is the primary batch compute engine at Uber. Like any other framework, Spark comes with its own set of tradeoffs.
Optimizing HDFS with DataNode Local Cache for High-Density HDD Adoption
Uber has one of the largest Hadoop® Distributed File System (HDFS) deployments in the world, with exabytes of data across tens of clusters. It is important, but also challenging, to keep scaling our data infrastructure while balancing efficiency, service reliability, and high performance. As a cost-efficiency effort that will save us tens of millions of dollars every year, we aim to adopt higher-density HDD (16+ TB) SKUs to replace the existing SKUs with 4 TB HDDs that are still used by the majority of our HDFS clusters.
One of the biggest challenges in fully adopting high-density disk SKUs comes from disk I/O bandwidth. While the capacity of each HDD increases by 2x to 4x, the I/O bandwidth of each HDD does not increase accordingly. This may cause I/O throttling when DataNodes serve read/write requests. This can be seen in the chart below, which shows the trend in the count of slow packet reads on one DataNode. Given the persistent and sizable number of slow reads, it is important to find new approaches to prevent performance degradation.
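One way a DataNode-local cache can relieve bandwidth-limited disks is to serve repeated reads of hot blocks from a faster local tier. Below is a minimal sketch of an LRU read-through cache. The block-level granularity, LRU eviction, and the names `BlockReadCache` and `fake_disk` are assumptions made for illustration; this summary does not describe the actual cache design.

```python
from collections import OrderedDict
from typing import Callable, List


class BlockReadCache:
    """LRU read-through cache: hot blocks are served from a fast local tier and
    the bandwidth-limited HDD is read only on a miss."""

    def __init__(self, capacity: int, read_from_disk: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.read_from_disk = read_from_disk
        self._blocks: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id: str) -> bytes:
        if block_id in self._blocks:
            self._blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self._blocks[block_id]
        self.misses += 1
        data = self.read_from_disk(block_id)
        self._blocks[block_id] = data
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # evict the least recently used block
        return data


disk_reads: List[str] = []


def fake_disk(block_id: str) -> bytes:
    disk_reads.append(block_id)  # record every read that reaches the HDD
    return f"data-{block_id}".encode()


cache = BlockReadCache(capacity=2, read_from_disk=fake_disk)
for block in ["b1", "b2", "b1", "b3", "b1", "b2"]:
    cache.read(block)
print(cache.hits, cache.misses, disk_reads)  # 2 4 ['b1', 'b2', 'b3', 'b2']
```

Every hit is a read that never touches the HDD, so the fraction of traffic the cache absorbs translates directly into disk bandwidth freed for the remaining requests.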
Cybersecurity Incident Simulation @ Uber
All the best things come in threes: the Three Musketeers, the Three Stooges, and, of course, your favorite three-cheese pizza ordered via the UberEats app. Engineering Security (EngSec) at Uber agrees and we have formed our own trio for how we simulate cybersecurity incidents at Uber to exercise our ability to act decisively should an incident occur. This three-pronged approach consists of tabletop exercises, red team operations, and atomic simulations.
Bootstrapping Uber’s Infrastructure on arm64 with Zig
In November 2021 we decided to evaluate arm64 for Uber. Most of our services are written in either Go or Java, but our build systems only supported compiling to x86_64. Today, thanks to open source collaboration, Uber has a system-independent (hermetic) build toolchain that seamlessly powers multiple architectures. We used this toolchain to bootstrap our arm64 hosts. This post tells the story of how we went about it: our early thinking, problems, some achievements, and next steps.
Measuring Performance for iOS Apps at Uber Scale
At Uber, we obsess over delivering highly performant and reliable experiences to our partners and customers. We treat degradations to app performance the same way as any other functional regressions.…