Company: Uber
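Uber (/ˈuːbər/) is a transportation network company headquartered in San Francisco, California, USA. It develops mobile applications that connect passengers with drivers, offering sharing-economy services such as vehicle-for-hire and matched ridesharing. Passengers can use the app to book these vehicles and track their location. Its operations span 785 metropolitan areas worldwide. People can access the platform through the website or the mobile app.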
The name Uber is generally believed to derive from the German word über, a cognate of the English "over", meaning "above".
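However, its business model faces legal challenges in some regions, where its atypical mode of operation may amount to operating vehicles illegally. Some countries and regions have passed legislation legalizing it, for example California in the United States and Beijing and Shanghai in China. The reason is that Uber turns the taxi industry into a community platform: through a mobile app, customers hailing a ride are connected with Uber users who want to drive part-time and with renters who have idle vehicles. Once a transaction succeeds, Uber takes a proportional commission, shares revenue, and pays rebates, a deregulated financial practice.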
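On May 10, 2019, Uber became a public company through an initial public offering, but its shares fell below the offering price on the first day of trading.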
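Uber is estimated to have 110 million active users worldwide and a 69% market share in the United States. Uber also operates in Greater China: it has become a mainstream ride-hailing platform in Hong Kong and Taiwan, and in mainland China it holds, through a share swap, a 17.7% economic interest in Xiaoju Technology, the parent company of Didi Chuxing, that market's largest ride-hailing platform.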
Cost-Efficient Open Source Big Data Platform at Uber
As Uber’s business has expanded, the underlying pool of data that powers it has grown exponentially and has thus become ever more expensive to process. When Big Data rose to become one of our largest…
Challenges and Opportunities to Dramatically Reduce the Cost of Uber’s Big Data
Big data is at the core of Uber’s business. We continue to innovate and provide better experiences for our earners, riders, and eaters by leveraging big data, machine learning, and artificial intelligence technology. As a result, over the last four years, the scale of our big data platform multiplied from single-digit petabytes to many hundreds of petabytes.
Uber’s big data stack is built on top of the open source ecosystem. We run some of the largest deployments of Hadoop, Hive, Spark, Kafka, Presto, and Flink in the world. Open source software allows us to quickly scale up to meet Uber’s business needs without reinventing the wheel.
The cost of running our big data platform also rose significantly in that same period. The Big Data Platform was one of the most costly among the 3 internal platforms at Uber. That was when we started taking a serious look at our big data platform’s cost, aiming to reduce overhead while preserving the reliability, productivity and the value it provides to the business.
How Uber Achieves Operational Excellence in the Data Quality Experience
Uber delivers efficient and reliable transportation across the global marketplace, which is powered by hundreds of services, machine learning models, and tens of thousands of datasets. While growing rapidly, we’re also committed to maintaining data quality, as it can greatly impact business operations and decisions. Without data quality guarantees, downstream service computation or machine learning model performance quickly degrades, and investigating and backfilling poor data then takes a lot of laborious manual effort. In the worst cases, degradations could go unnoticed, silently resulting in inconsistent behaviors.
This led us to build a consolidated data quality platform (UDQ), with the purpose of monitoring, automatically detecting, and handling data quality issues. With the goal of building and achieving data quality standards across Uber, we have supported over 2,000 critical datasets on this platform, and detected around 90% of data quality incidents. In this blog, we describe how we created data quality standards at Uber and built the integrated workflow to achieve operational excellence.
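To make "automatically detecting data quality issues" concrete, here is a minimal sketch of one kind of check a data quality monitor might run: flagging a dataset whose daily row count deviates sharply from its recent history. The function name, the z-score rule, and the threshold are illustrative assumptions, not a description of how UDQ works.

```python
from statistics import mean, stdev


def row_count_is_anomalous(history, today, z_threshold=3.0):
    """Return True if today's row count is far from the recent history.

    Hypothetical check: 'history' is a list of past daily row counts and
    'z_threshold' is an assumed cutoff in standard deviations.
    """
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # History is perfectly flat, so any change is worth flagging.
        return today != mu
    return abs(today - mu) / sigma > z_threshold


recent_counts = [1000, 1020, 980, 1010, 990]
print(row_count_is_anomalous(recent_counts, 1005))  # small deviation
print(row_count_is_anomalous(recent_counts, 400))   # large drop
```

A production system would likely combine several such checks (freshness, completeness, duplicates) and account for seasonality. This sketch covers only the simplest volume check.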
Uber’s Finance Computation Platform
For a company of our size and scale, robust, accurate, and compliant accounting and analytics are a necessity, ensuring accurate and granular visibility into our financials, across multiple lines of business.
Most standard, off-the-shelf finance engineering solutions cannot support the scale and scope of the transactions on our ever-growing platform. The ride-sharing business alone has over 4 billion trips per year worldwide, which translates to more than 40 billion journal entries (financial microtransactions). Each of these entries has to be produced in accordance with Generally Accepted Accounting Principles (GAAP), and managed in an idempotent, consistent, accurate, and reproducible manner.
To meet these specific requirements, we built Uber’s Finance Computation Platform (FCP) in house. It is designed to accommodate our scale while providing strong guarantees on accuracy and explainability. The same solution also helps us obtain insights into business operations.
There were many challenges in building our financial computation platform, from our architectural choices to the types of controls for accuracy and explainability.
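The figures above imply roughly ten journal entries per trip (40 billion entries for 4 billion trips). The requirement that entries be handled idempotently can be sketched as follows: each entry carries a deterministic key, so replaying the same trip event does not double-count it. The class names, the key layout `(trip_id, line)`, and the in-memory store are hypothetical and only illustrate the property. They are not FCP's actual design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalEntry:
    trip_id: str
    line: str          # hypothetical entry kind, e.g. "fare" or "commission"
    account: str
    amount_cents: int  # integer cents avoid floating-point rounding

    @property
    def key(self):
        # Deterministic identity: the same trip and line always map to one entry.
        return (self.trip_id, self.line)


class Ledger:
    """In-memory ledger that ignores replays of an already-posted entry."""

    def __init__(self):
        self._entries = {}

    def post(self, entry):
        """Record the entry once; return False if it was already posted."""
        if entry.key in self._entries:
            return False
        self._entries[entry.key] = entry
        return True

    def balance(self, account):
        return sum(e.amount_cents for e in self._entries.values()
                   if e.account == account)


ledger = Ledger()
fare = JournalEntry("trip-1", "fare", "rider_receivable", 1250)
ledger.post(fare)
ledger.post(fare)  # replayed event: no effect on the balance
print(ledger.balance("rider_receivable"))
```

Keying on a deterministic identifier rather than on arrival order is also what makes reprocessing reproducible: running the same inputs again yields the same ledger.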
Pinot Real-Time Ingestion with Cloud Segment Storage
Apache Pinot is an open source data analytics engine (OLAP), which allows users to query data ingested from as recently as a few seconds ago to as old as a few years back. Pinot’s ability to ingest real-time data and make it available for low-latency queries is the key reason why it has become an important component of Uber’s data ecosystem. Many products built at Uber require real-time data analytics to operate in our mobile marketplace for shared rides and food delivery. For example, the chart in Figure 1 shows the breakdown of Uber Eats job states over a period of minutes. Our Uber Eats city operators need such insights to balance marketplace supply and demand, and detect ongoing issues.
Uber’s Fulfillment Platform: Ground-up Re-architecture to Accelerate Uber’s Go/Get Strategy
Uber’s mission is to help our consumers effortlessly go anywhere and get anything in thousands of cities worldwide. At its core, we capture a consumer’s intent and fulfill it by matching it with the right set of providers.
Fulfillment is the “act or process of delivering a product or service to a customer.” The Fulfillment organization at Uber develops platforms to orchestrate and manage the lifecycle of ongoing orders and user sessions with millions of active participants.
Containerizing Apache Hadoop Infrastructure at Uber
In 2019, we started a journey to re-architect the Hadoop deployment stack. Fast forward 2 years, over 60% of Hadoop runs in Docker containers, bringing major operational benefits to the team. As a result of the initiative, the team handed off many of their responsibilities to other infrastructure teams, and was able to focus more on core Hadoop development.
This article provides a summary of problems we faced, and how we solved them along the way.
‘Orders Near You’ and User-Facing Analytics on Real-Time Geospatial Data
By its nature, Uber’s business is highly real-time and contingent upon geospatial data. Petabytes of data are continuously being collected from our drivers, riders, restaurants, and eaters. Real-time analytics over this geospatial data could provide powerful insights.
In this blog, we will highlight the Orders near you feature from the Uber Eats app, illustrating one example of how Uber generates insights across our geospatial data.
Orders near you was a recent collaboration between the Data and Uber Eats teams at Uber. The project’s goal was to create an engaging and unique social experience for eaters. We hoped to inspire new food and restaurant discovery by showing what your neighbors are ordering right now. Since this feature is part of our home feed, we needed it to be fast, personalized, and scalable.
Analyzing Customer Issues to Improve User Experience
The primary goal for customer support is to ensure users’ issues are addressed and resolved in a timely and effective manner. The kinds of issues users face and what they say in their support interactions provide a lot of information about the product experience, any technical or operational gaps, and even their general sentiment towards the product or company. At Uber, we don’t stop at just resolving user issues. We also use the issues reported by customers to improve our support experience and our products. This article describes the technology that makes it happen.
Customer Support Automation Platform at Uber
If you’ve used any online/digital service, chances are that you are familiar with what a typical customer service experience entails: you send a message (usually to an email alias) to the company’s support staff, fill out a form, expect some back and forth with a customer service representative (CSR), and hopefully have your issue resolved. This process can often feel inefficient and slow. Typically, this might be attributable to the tooling and processes made available to CSRs for solving your issue. For any given issue, the CSR has to navigate standard operating procedures (SOPs, a.k.a. flows) with proliferating undocumented branches and edge cases, which makes their work mundane, tedious, and imprecise. The manual maintenance and navigation of these SOPs can create a bureaucratic bottleneck, which ultimately leaves the customer dissatisfied.
Tuning Model Performance
Uber uses machine learning (ML) models to power critical business decisions. An ML model goes through many experiment iterations before making it to production. During the experimentation phase, data scientists or machine learning engineers explore adding features, tuning parameters, and running offline analysis or backtesting. We enhanced the platform to reduce the human toil and time in this stage, while ensuring high model quality in production.
Elastic Distributed Training with XGBoost on Ray
In this blog, we discuss how moving to distributed XGBoost on Ray helps address these concerns and how finding the right abstractions allows us to seamlessly incorporate Ray and XGBoost Ray into Uber’s ML ecosystem. Finally, we cover how moving distributed XGBoost onto Ray, in parallel with efforts to move Elastic Horovod onto Ray, serves as a critical step towards a unified distributed compute backend for end-to-end machine learning workflows at Uber.
Continuous Integration and Deployment for Machine Learning Online Serving and Models
At Uber, we have witnessed a significant increase in machine learning adoption across various organizations and use-cases over the last few years. Our machine learning models are empowering a better customer experience, helping prevent safety incidents, and ensuring market efficiency, all in real time. The figure above is a high level view of CI/CD for models and service binary.
One thing to note is that we have continuous integration (CI)/continuous deployment (CD) for both models and services, as shown above in Figure 1. We arrived at this solution after several iterations to address some MLOps challenges, as the number of models trained and deployed grew rapidly. The first challenge was to support a large volume of model deployments on a daily basis, while keeping the Real-time Prediction Service highly available. We will discuss our solution in the Model Deployment section.
Efficient and Reliable Compute Cluster Management at Scale
Uber relies on a containerized microservice architecture. Our need for computational resources has grown significantly over the years, as a consequence of the business’s growth. It is an important goal now to increase the efficiency of our computing resources. Broadly speaking, the efficiency efforts in compute cluster management involve scheduling more workloads on the same number of machines. This approach is based on the observation that the average CPU utilization of a typical cluster is far lower than the CPU resources that have been allocated to it. The approach we have adopted is to overcommit CPU resources, without compromising the reliability of the platform, which is achieved by maintaining a safe headroom at all times. Another possible and complementary approach is to reduce the allocations of services that are overprovisioned, which we also do. The benefit of overcommitment is that we are able to free up machines that can be used to run non-critical, preemptible workloads, without purchasing extra machines.
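The reasoning behind overcommitment with a safe headroom can be illustrated with a small calculation. This is a sketch under simplifying assumptions (a single cluster-wide peak utilization figure, a fixed headroom fraction, and usage that scales linearly with allocation). The function name and parameters are hypothetical and do not describe Uber's scheduler.

```python
def max_safe_allocation(physical_cores, allocated_cores, peak_used_cores,
                        headroom=0.2):
    """Estimate how many cores can be allocated in total while keeping
    expected peak usage below (1 - headroom) of physical capacity.

    Assumes future workloads use their allocation at the same peak ratio
    as current ones, which is a simplification.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    if allocated_cores <= 0 or peak_used_cores <= 0:
        raise ValueError("allocated and peak usage must be positive")
    # Fraction of each allocated core that is actually busy at peak.
    usage_per_allocated_core = peak_used_cores / allocated_cores
    # Capacity we are willing to consume after reserving the headroom.
    usable_cores = physical_cores * (1 - headroom)
    return usable_cores / usage_per_allocated_core


limit = max_safe_allocation(physical_cores=100, allocated_cores=100,
                            peak_used_cores=40, headroom=0.2)
print(f"allocatable cores: {limit:.0f}")
print(f"overcommit ratio: {limit / 100:.2f}")
```

With 100 physical cores fully allocated but only 40 busy at peak, a 20% headroom leaves 80 usable cores, which under these assumptions supports about 200 allocated cores, an overcommit ratio of about 2.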
Handling Flaky Unit Tests in Java
Unit testing forms the bedrock of any Continuous Integration (CI) system. It warns software engineers of bugs in newly-implemented code and regressions in existing code, before it is merged. This ensures increased software reliability. It also improves overall developer productivity, as bugs are caught early in the software development lifecycle. Hence, building a stable and reliable testing system is often a key requirement for software development organizations.
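A flaky test is one that passes and fails on the same code. One common way to spot such tests, sketched below, is to rerun a test several times without changing anything and see whether the outcomes disagree. The example is written in Python for brevity, although the article concerns Java. The helper names and the rerun count are assumptions, not the approach the article describes.

```python
def classify_test(test_fn, runs=5):
    """Run test_fn repeatedly and classify it by the outcomes observed."""
    outcomes = set()
    for _ in range(runs):
        try:
            test_fn()
            outcomes.add("pass")
        except Exception:  # any exception counts as a failed run
            outcomes.add("fail")
    if outcomes == {"pass"}:
        return "stable-pass"
    if outcomes == {"fail"}:
        return "stable-fail"
    return "flaky"


def make_alternating_test():
    """Build a deliberately flaky test that fails on odd-numbered runs."""
    calls = {"n": 0}

    def test():
        calls["n"] += 1
        assert calls["n"] % 2 == 0, "fails on odd-numbered runs"

    return test


def always_passes():
    assert 1 + 1 == 2


print(classify_test(always_passes))
print(classify_test(make_alternating_test()))
```

Rerunning only shows that a test is nondeterministic. Finding the cause (shared state, timing, ordering) still requires investigation.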
Scaling of Uber's API gateway
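As a recap of the previous article, Uber's API gateway provides an interface and serves as the single point of access to all of our back-end services, exposing features and data to mobile clients and third-party partners. A system like an API gateway has two main components: configuration management and the runtime. The runtime component is responsible for authenticating, authorizing, transforming, and routing requests to the appropriate downstream services, and for passing the responses back to mobile clients. The configuration management component manages the developer workflow so that developers can easily configure their endpoints on the gateway. This includes ensuring that configured endpoints are backward compatible and that there are no functional regressions at runtime. Every back-end engineer at Uber relies on this component daily to develop, test, and publish endpoints to the internet.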
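The reliability and efficiency of such a system are extremely important. As you can imagine, a reliable and efficient gateway platform contributes directly to the rider experience (especially the runtime component) and to the developer experience (any problem in the configuration management component negatively affects feature development velocity). The reliability and efficiency of the runtime component are critical because they contribute directly to Uber's top line, and the reliability and efficiency of configuration management are just as critical because they directly affect Uber's bottom line.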
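When a large number of engineers use one platform to develop endpoints, points of contention naturally arise. These slow people down and ultimately reduce the development velocity of the whole company. In this article, we discuss how we scaled this platform so that hundreds of Uber engineers can use it every day. We dive into the code build aspect of our configuration management component, the challenges we faced at rollout, and how we solved them.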
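The backward-compatibility guarantee attributed to the configuration management component can be illustrated with a minimal check that compares two versions of an endpoint's response schema. The schema representation (a mapping from field name to type name) and the rule that only removals and type changes are breaking are assumptions made for this sketch. The article does not specify how the gateway performs this check.

```python
def breaking_changes(old_fields, new_fields):
    """List changes in new_fields that could break existing clients.

    Both arguments are hypothetical {field_name: type_name} mappings.
    Adding a new field is treated as compatible.
    """
    problems = []
    for name, old_type in old_fields.items():
        if name not in new_fields:
            problems.append(f"removed field: {name}")
        elif new_fields[name] != old_type:
            problems.append(
                f"type changed: {name} ({old_type} -> {new_fields[name]})"
            )
    return problems


old_schema = {"rider_id": "string", "fare": "int"}
new_schema = {"rider_id": "string", "fare": "string", "tip": "int"}

for problem in breaking_changes(old_schema, new_schema):
    print(problem)
```

A check like this could run at build time, so that an incompatible endpoint change is rejected before it reaches the runtime.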