
Company: Uber

Related topic: 优步 (Uber)

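Uber (/ˈuːbər/) is a transportation network company headquartered in San Francisco, California, United States. It develops mobile apps that connect passengers with drivers, providing ride-hailing vehicle rental and ride-share matching services as part of the sharing economy. Passengers can use the app to book rides and track the vehicle's location. Uber operates in 785 metropolitan areas worldwide, and people can access the platform through its website or mobile app.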

The name Uber is widely believed to come from the German word über, which is cognate with English "over" and means "above".

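Its business model has run into legal problems in some regions, because its atypical way of operating can amount to unlicensed commercial vehicle operation there. Some countries and regions, however, have passed laws legalizing it, including California in the United States and Beijing and Shanghai in China. The legal friction arises because Uber turns the taxi industry into a social platform. Through the mobile app, it connects three parties: customers hailing a ride, Uber users who want to drive part-time, and owners of idle vehicles. When a transaction succeeds, Uber takes a percentage commission, splits the revenue, and exchanges feedback among the parties. This is a deregulated financial approach.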

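On May 10, 2019, Uber became a public company through an initial public offering, but its shares fell below the IPO price on the first day of trading.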

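By some estimates, Uber has 110 million active users worldwide and a 69% market share in the United States. Uber also operates in Greater China. It has become a mainstream ride-hailing platform in Hong Kong and Taiwan. In mainland China, it acquired through a share swap a 17.7% economic interest in Xiaoju Technology, the parent company of DiDi Chuxing, which is the largest ride-hailing platform in that market.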

uVitals – An Anomaly Detection & Alerting System

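uVitals is an anomaly detection system that offers multiple models and features. Its backend service is written in Go and manages metric configurations through CRUD operations. At the core of the system is the "anomalies" API, which returns the list of anomalies for a given metric and date range. A user-friendly interface makes it easy for users to navigate and self-serve. Beyond detecting anomalies, uVitals helps users understand them by providing detailed data visualization and in-depth analysis. The system also sends timely anomaly notifications and includes a user feedback mechanism to improve accuracy. By analyzing the distribution and characteristics of the data, uVitals reduces false positives in low-traffic cities. It has also been optimized for performance: replacing Pandas® with Apache Spark® improved runtime efficiency by 77% on average. Future plans include real-time anomaly detection and using AI to provide richer content.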
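To make the "anomalies" API more concrete, here is a minimal sketch of how such a query could work. It assumes a simple rolling z-score detector. The function name `detect_anomalies`, its parameters, and the `min_volume` guard for low-traffic cities are hypothetical illustrations. They are not uVitals' actual models or API.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean, pstdev


@dataclass
class Anomaly:
    day: date
    value: float
    expected: float


def detect_anomalies(series, start, end, window=7, z_threshold=3.0, min_volume=100):
    """Hypothetical "anomalies" query: flag days in [start, end] whose value
    deviates from the trailing `window`-day mean by at least `z_threshold`
    standard deviations. `series` maps date -> metric value."""
    days = sorted(series)
    anomalies = []
    for i, day in enumerate(days):
        if not (start <= day <= end) or i < window:
            continue  # outside the requested range, or not enough history yet
        history = [series[d] for d in days[i - window:i]]
        expected = mean(history)
        # Low-traffic guard (assumption): with tiny volumes, ordinary noise
        # produces large z-scores, so these points are skipped entirely.
        if expected < min_volume:
            continue
        spread = pstdev(history) or 1e-9  # avoid dividing by zero on flat history
        if abs(series[day] - expected) / spread >= z_threshold:
            anomalies.append(Anomaly(day, series[day], expected))
    return anomalies


base = date(2024, 1, 1)
trips = {base + timedelta(days=i): 1000 + (i % 3) for i in range(10)}
trips[base + timedelta(days=9)] = 400  # sudden drop on the last day
print(detect_anomalies(trips, base, base + timedelta(days=9)))
```

A fixed `min_volume` cutoff is the simplest possible way to handle low-traffic cities. The summary only says uVitals analyzes data distributions and characteristics for this purpose, so the real technique is likely more nuanced.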

Attribute-Based Access Control at Uber

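Uber uses microservices and an attribute-based access control model to implement flexible authorization policies. An authorization engine evaluates conditional expressions, an attribute store fetches attribute values, and Uber chose Google's CEL (Common Expression Language) as the expression language. Uber combines attribute-based access control (ABAC) with role-based access control (RBAC) to meet different authorization needs. It also uses ownership information from the uOwn service to create generic policies. Charter, the policy management hub, simplifies how policies are managed and distributed. Introducing the attribute store and CEL gave Uber more flexibility and extensibility, improved authorization efficiency, saved engineering time, and reduced system complexity. In short, by adopting attribute-based access control, Uber provides an efficient, scalable authorization framework for all of its services, which helps protect Uber's data security and user privacy.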
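The combination of RBAC and ABAC can be sketched as follows. This is a minimal illustration under stated assumptions. Python lambdas stand in for CEL expressions; a real deployment would evaluate CEL with a CEL library. The in-memory `ATTRIBUTE_STORE`, the `Policy` type, and the `is_authorized` function are hypothetical and do not represent Uber's authorization engine or Charter's policy format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical attribute store: resolves resource attributes (for example,
# ownership data of the kind uOwn provides) that the request does not carry.
ATTRIBUTE_STORE: Dict[str, Dict[str, str]] = {
    "service:payments": {"owner_team": "payments-eng"},
    "service:maps": {"owner_team": "maps-eng"},
}


@dataclass
class Policy:
    action: str
    allowed_roles: List[str]  # RBAC part: who may attempt the action at all
    # ABAC part: a Python predicate standing in for a CEL expression such as
    # `subject.team == resource.owner_team`.
    condition: Callable[[dict, dict], bool]


def is_authorized(policies: List[Policy], subject: dict, action: str, resource_id: str) -> bool:
    # Enrich the resource with attributes fetched from the attribute store.
    resource = {"id": resource_id, **ATTRIBUTE_STORE.get(resource_id, {})}
    for policy in policies:
        if policy.action != action:
            continue
        if not set(subject["roles"]) & set(policy.allowed_roles):
            continue
        if policy.condition(subject, resource):
            return True
    return False  # deny by default when no policy matches


# One generic policy covers every service: engineers may deploy only the
# services their own team owns.
POLICIES = [
    Policy(
        action="deploy",
        allowed_roles=["engineer"],
        condition=lambda s, r: s["team"] == r.get("owner_team"),
    )
]

alice = {"name": "alice", "roles": ["engineer"], "team": "payments-eng"}
print(is_authorized(POLICIES, alice, "deploy", "service:payments"))  # True
print(is_authorized(POLICIES, alice, "deploy", "service:maps"))  # False
```

The sketch keeps the role check and the attribute condition separate. Roles act as a coarse gate, while the ownership-based condition lets a single policy apply to every service.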

Fast Copy-On-Write within Apache Parquet for Data Lakehouse ACID Upserts

With the evolution of the storage table formats Apache Hudi®, Apache Iceberg®, and Delta Lake™, more and more companies are building their lakehouses on top of these formats for use cases such as incremental ingestion. However, upsert speed can still become a problem as data volumes grow.

In these table formats, Apache Parquet is the main file format. In this article, we will discuss how we built a row-level secondary index and the innovations we introduced in Apache Parquet to speed up upserts of data inside a Parquet file. We will also present benchmark results showing much faster upserts than traditional copy-on-write in Delta Lake and Hudi.
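A toy model shows how a row-level index can speed up copy-on-write upserts. In this sketch, a "file" is a Python list of rows, and the index maps each record key to its row position. An upsert can therefore locate the rows to replace directly instead of scanning and joining the whole file. The names `build_row_index` and `copy_on_write_upsert` are hypothetical. The sketch does not model Parquet's columnar layout or encoding, nor the specific Parquet innovations the article describes.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def build_row_index(rows: List[Row], key: str) -> Dict[object, int]:
    """Hypothetical row-level index: record key -> row position in the file."""
    return {row[key]: pos for pos, row in enumerate(rows)}


def copy_on_write_upsert(
    rows: List[Row], index: Dict[object, int], updates: List[Row], key: str
) -> Tuple[List[Row], Dict[object, int]]:
    """Produce a new file version with `updates` applied; `rows` is untouched."""
    new_rows = list(rows)  # copy-on-write: the new version starts as a copy
    new_index = dict(index)
    for update in updates:
        pos = new_index.get(update[key])
        if pos is None:
            # New key: append the row and record its position.
            new_index[update[key]] = len(new_rows)
            new_rows.append(update)
        else:
            # Existing key: the index gives the position directly, no scan needed.
            new_rows[pos] = update
    return new_rows, new_index


v1 = [{"id": 1, "fare": 10.0}, {"id": 2, "fare": 12.5}]
idx = build_row_index(v1, "id")
v2, idx2 = copy_on_write_upsert(v1, idx, [{"id": 2, "fare": 13.0}, {"id": 3, "fare": 8.0}], "id")
print(v2)
```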

Automated Audit Framework For Internet Scale Financial Transactions

Uber's Financial Computation Service processes various types of money movements from multiple upstream systems and consumes all of their events. It generates journal entries that can be linked back to the actual transactions for accuracy. A rulePlan, which is a directed tree represented in JSON, guides execution. An audit event is created for each transaction and contains all the information needed to trace it back to the business flow and rules. Recomputation and reconciliation are possible, which ensures reproducibility.
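A rulePlan tree, audit events, and recompute-based reconciliation could fit together roughly as follows. The sketch assumes a made-up plan format with `rule`, `account`, `share`, and `children` keys, along with hypothetical `execute` and `reconcile` functions. The summary does not give the actual rulePlan schema or audit event fields.

```python
import json
from decimal import Decimal

# Hypothetical rulePlan: a directed tree in JSON. Leaves post journal entries.
RULE_PLAN = json.loads("""
{
  "rule": "split_fare",
  "children": [
    {"rule": "driver_payout", "account": "driver_payable", "share": "0.75", "children": []},
    {"rule": "platform_fee", "account": "platform_revenue", "share": "0.25", "children": []}
  ]
}
""")


def execute(node, txn, path=""):
    """Walk the rule tree; each leaf yields a (journal entry, audit event) pair."""
    path = f"{path}/{node['rule']}"
    results = []
    if not node["children"]:
        amount = Decimal(txn["amount"]) * Decimal(node["share"])
        entry = {"txn_id": txn["id"], "account": node["account"], "amount": amount}
        # The audit event keeps the input and rule path so the entry can be
        # traced back to the business flow and recomputed later.
        audit = {"txn_id": txn["id"], "rule_path": path, "input": dict(txn), "amount": amount}
        results.append((entry, audit))
    for child in node["children"]:
        results.extend(execute(child, txn, path))
    return results


def reconcile(plan, audit_events):
    """Recompute every audited amount from its recorded input; return mismatches."""
    mismatches = []
    for event in audit_events:
        recomputed = {a["rule_path"]: a["amount"] for _, a in execute(plan, event["input"])}
        if recomputed.get(event["rule_path"]) != event["amount"]:
            mismatches.append(event)
    return mismatches


txn = {"id": "t-1", "amount": "20.00"}
results = execute(RULE_PLAN, txn)
entries = [entry for entry, _ in results]
audits = [audit for _, audit in results]
print(entries)
print(reconcile(RULE_PLAN, audits))  # [] when every entry reproduces
```

The sketch uses `Decimal` rather than floats so that recomputed amounts compare exactly, which any reconciliation scheme over money needs.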

Identifying Green Vehicles for a Zero-Emission Future

Uber has made a public commitment to phase out carbon emissions in the United States, Canada, and Europe by 2030, and worldwide by 2040. We publish periodic updates on our progress in our Climate Assessment and Performance Report, which shows both how far we’ve come and how far we have yet to go.

Underlying this report is a large effort to prepare the data presented within, everything from identifying vehicle fuel type across markets, to carbon emissions per vehicle-mile and ultimately to passenger-mile traveled. In this blog post, we’ll take a closer look at the complexities of identifying “green” vehicles onboarded to Uber, and our solutions for managing those data.
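Moving from per-vehicle-mile emissions to per-passenger-mile emissions is a weighted ratio. The numerator is total grams emitted, computed as each vehicle's fuel-type emission factor times the miles it drove. The denominator is total passenger-miles. The sketch below assumes the caller supplies the emission factors. The numbers in the example are made up for illustration and do not reflect Uber's data or methodology.

```python
from typing import Iterable, Mapping, Tuple

Trip = Tuple[str, float, float]  # (fuel_type, vehicle_miles, passenger_miles)


def emissions_per_passenger_mile(trips: Iterable[Trip], factors: Mapping[str, float]) -> float:
    """Grams of CO2 per passenger-mile across a set of trips.

    factors: grams CO2 per vehicle-mile for each fuel type (caller-supplied).
    vehicle_miles may exceed the trip distance (e.g. miles driven to a pickup),
    and passenger_miles counts each rider, so a 2-rider 5-mile trip is 10.
    """
    total_grams = 0.0
    total_passenger_miles = 0.0
    for fuel_type, vehicle_miles, passenger_miles in trips:
        total_grams += factors[fuel_type] * vehicle_miles
        total_passenger_miles += passenger_miles
    if total_passenger_miles == 0:
        raise ValueError("no passenger-miles to divide by")
    return total_grams / total_passenger_miles


# Made-up numbers for illustration only.
factors = {"gasoline": 400.0, "battery_electric": 0.0}
trips = [("gasoline", 12.0, 10.0), ("battery_electric", 6.0, 5.0)]
print(emissions_per_passenger_mile(trips, factors))  # 320.0
```

The example shows why identifying fuel type matters. The electric trip adds passenger-miles to the denominator without adding emissions to the numerator, so misclassifying a vehicle changes the result directly.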

Spark Analysers: Catching Anti-Patterns In Spark Apps

Apache Spark™ is a widely used open source distributed computing engine. It is one of the main components of Uber’s data stack.

Spark is the primary batch compute engine at Uber. Like any other framework, Spark comes with its own set of tradeoffs.

Optimizing HDFS with DataNode Local Cache for High-Density HDD Adoption

Uber has one of the largest Hadoop® Distributed File System (HDFS) deployments in the world, with exabytes of data across tens of clusters. Continuing to scale our data infrastructure while balancing efficiency, service reliability, and high performance is important but challenging. As part of a cost-efficiency effort that will save us tens of millions of dollars every year, we aim to adopt higher-density HDD SKUs (16+ TB) to replace the existing SKUs with 4 TB HDDs, which most of our HDFS clusters still use.

One of the biggest challenges in fully adopting high-density disk SKUs is disk I/O bandwidth. While the capacity of each HDD increases by 2x to 4x, its I/O bandwidth does not increase accordingly. This may cause I/O throttling when DataNodes serve read/write requests. The chart below illustrates this by showing the trend in the slow packet read count on one DataNode. Given the persistent and sizable number of slow reads, it is important to find new approaches to prevent performance degradation.
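The title names a DataNode local cache as the mitigation. The sketch below illustrates the general idea only and does not reflect Uber's actual design. It places a least-recently-used (LRU) read cache in front of a slow disk read, so repeated reads of hot blocks avoid the HDD. `LocalBlockCache` and its byte-based capacity are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class LocalBlockCache:
    """Hypothetical LRU read cache for block data, e.g. on a faster local
    device in front of high-density HDDs."""

    def __init__(self, capacity_bytes: int, read_from_disk: Callable[[str], bytes]):
        self.capacity = capacity_bytes
        self.read_from_disk = read_from_disk
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.used = 0
        self.hits = 0
        self.misses = 0

    def read(self, block_id: str) -> bytes:
        if block_id in self.entries:
            self.entries.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self.entries[block_id]
        self.misses += 1
        data = self.read_from_disk(block_id)  # slow path: hits the HDD
        if len(data) <= self.capacity:
            self.entries[block_id] = data
            self.used += len(data)
            # Evict least recently used blocks until the cache fits again.
            while self.used > self.capacity:
                _, evicted = self.entries.popitem(last=False)
                self.used -= len(evicted)
        return data


disk_reads = []


def fake_disk(block_id: str) -> bytes:
    disk_reads.append(block_id)
    return b"x" * 4  # every block is 4 bytes in this toy example


cache = LocalBlockCache(capacity_bytes=8, read_from_disk=fake_disk)
for block in ["a", "b", "a", "c", "a", "b"]:
    cache.read(block)
print(cache.hits, cache.misses, disk_reads)
```

Every cache hit is a read the HDD no longer has to serve. This is why a cache can offset the per-disk bandwidth that does not grow with capacity.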

Cybersecurity Incident Simulation @ Uber

All the best things come in threes: the Three Musketeers, the Three Stooges, and, of course, your favorite three-cheese pizza ordered via the Uber Eats app. Engineering Security (EngSec) at Uber agrees, and we have formed our own trio for simulating cybersecurity incidents at Uber, which exercises our ability to act decisively should an incident occur. This three-pronged approach consists of tabletop exercises, red team operations, and atomic simulations.

Bootstrapping Uber’s Infrastructure on arm64 with Zig

In November 2021 we decided to evaluate arm64 for Uber. Most of our services are written in either Go or Java, but our build systems only supported compiling to x86_64. Today, thanks to Open Source collaboration, Uber has a system-independent (hermetic) build toolchain that seamlessly supports multiple architectures. We used this toolchain to bootstrap our arm64 hosts. This post tells the story of how we went about it, covering our early thinking, problems, some achievements, and next steps.

Measuring Performance for iOS Apps at Uber Scale

At Uber, we obsess over delivering highly performant and reliable experiences to our partners and customers. We treat degradations to app performance the same way as any other functional regressions.…

InsureTech: Insurance Compliance

At Uber, we put safety first in order to minimize risks for users on the Uber platform. Uber Insurance Tech focuses on three pillars: claims, compliance, and affinity programs.

Demand and ETR Forecasting at Airports

Airports currently hold a significant portion of Uber’s supply and open supply hours (i.e., supply that is not utilized, but open for dispatch) across the globe. At most airports, drivers are obligated to join a “first-in-first-out” (FIFO) queue from which they are dispatched. When the demand for trips is high relative to the supply of drivers in the queue (“undersupply”), this queue moves quickly and wait times for drivers can be quite low. However, when demand is low relative to the amount of available supply (“oversupply”), the queue moves slowly and wait times can be very high. Undersupply creates a poor experience for riders, as they are less likely to get a suitable ride. On the other hand, oversupply creates a poor experience for drivers as they are spending more time waiting for each ride and less time driving. What’s more, drivers don’t currently have a way to see when airports are under- or over-supplied, which perpetuates this problem.

One way to tackle this undersupply/oversupply issue at airports is to forecast supply balance and use this to optimize resource allocation. Our first application of these models is in estimating the time to request (ETR) for the airport driver queue. We estimate the length of time a driver would have to wait before they receive a trip request, thereby giving drivers the information they need to identify and reposition in periods of undersupply (short waits), or to remain in the city during periods of oversupply (long waits).
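A simplified version of the ETR idea treats the airport queue as strictly FIFO. A driver at position k receives a request when the k-th forecast trip request after now arrives. The sketch assumes demand forecasts in fixed 15-minute buckets, requests spread evenly within each bucket, and no drivers leaving the queue. The function name and parameters are hypothetical, and this is not Uber's forecasting model.

```python
from typing import Optional, Sequence


def estimate_time_to_request(
    queue_position: int, forecast_requests: Sequence[float], bucket_minutes: float = 15.0
) -> Optional[float]:
    """Minutes until the driver at `queue_position` (1 = front) gets a request.

    forecast_requests[i] is the forecast number of trip requests in the i-th
    bucket starting now. Returns None if the forecast horizon is too short.
    """
    served = 0.0
    for i, demand in enumerate(forecast_requests):
        if demand > 0 and served + demand >= queue_position:
            # Assume requests arrive evenly within the bucket.
            fraction = (queue_position - served) / demand
            return (i + fraction) * bucket_minutes
        served += demand
    return None


forecast = [10, 20, 40]  # hypothetical requests per 15-minute bucket
print(estimate_time_to_request(5, forecast))    # 7.5
print(estimate_time_to_request(30, forecast))   # 30.0
print(estimate_time_to_request(100, forecast))  # None
```

Short estimates correspond to undersupply and long ones to oversupply. A driver could use this signal to decide whether to reposition to the airport or stay in the city.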

Setting Uber’s Transactional Data Lake in Motion with Incremental ETL Using Apache Hudi

The Global Data Warehouse team at Uber democratizes data for all of Uber with a unified, petabyte-scale, centrally modeled data lake. The data lake consists of foundational fact, dimension, and aggregate tables developed using dimensional data modeling techniques that can be accessed by engineers and data scientists in a self-serve manner to power data engineering, data science, machine learning, and reporting across Uber. The ETL (extract, transform, load) pipelines that compute these tables are thus mission-critical to Uber’s apps and services, powering core platform features like rider safety, ETA predictions, fraud detection, and more. At Uber, data freshness is a key business requirement. Uber invests heavily in engineering efforts that process data as quickly as possible to keep it up to date with the happenings in the physical world.

In order to achieve such data freshness in our ETL pipelines, a key challenge is incrementally updating these modeled tables rather than recomputing all the data with each new ETL run. This is also necessary to operate these pipelines cost-effectively at Uber’s enormous scale. In fact, as early as 2016, Uber introduced a new “transactional data lake” paradigm with powerful incremental data processing capabilities through the Apache Hudi project to address these challenges. We later donated the project to the Apache Software Foundation. Apache Hudi is now a top-level Apache project used industry-wide in a new, emerging technology category called the lakehouse. Since then, we have been excited to see the industry largely move away from bulk data ingestion toward the more incremental ingestion model that Apache Hudi ushered in at Uber. In this blog, we share our work over the past year or so in extending this incremental data processing model to our complex ETL pipelines to unlock true end-to-end incremental data processing.
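In incremental processing, each run reads only the records committed since the previous run's checkpoint and upserts them into the target, rather than recomputing the whole table. The sketch below loosely mirrors Apache Hudi's incremental query, which returns records committed after a given instant time. The `Record` type, integer commit times, and `incremental_run` function are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Record:
    key: str
    commit_time: int  # hypothetical, monotonically increasing commit instant
    value: float


def incremental_run(source: List[Record], target: Dict[str, float], checkpoint: int) -> Tuple[int, int]:
    """Upsert only records committed after `checkpoint` into `target`.

    Returns (new_checkpoint, number_of_records_processed).
    """
    # Filtering the full list stands in for reading only the data written by
    # commits newer than the checkpoint.
    changed = sorted((r for r in source if r.commit_time > checkpoint), key=lambda r: r.commit_time)
    for record in changed:
        target[record.key] = record.value  # upsert: later commits overwrite earlier ones
    new_checkpoint = max((r.commit_time for r in changed), default=checkpoint)
    return new_checkpoint, len(changed)


source = [Record("trip-1", 1, 10.0), Record("trip-2", 1, 7.5)]
target: Dict[str, float] = {}
checkpoint, processed = incremental_run(source, target, checkpoint=0)  # first run: everything
source.append(Record("trip-1", 2, 11.0))  # a late correction arrives in a new commit
checkpoint, processed = incremental_run(source, target, checkpoint)
print(checkpoint, processed, target)
```

The second run touches only the single corrected record. A full recompute would instead reprocess every row, which is the cost the incremental model avoids.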

How We Unified Configuration Distribution Across Systems at Uber

Uber has multiple, domain-specific products to manage and distribute configuration changes at runtime across our many systems. These configuration products cater to different use cases: some have a web UI that can be used by non-engineers to change product configuration for different cities, and others expose a Git-based interface that primarily caters to engineers.

While these domain-specific configuration products have different applications, they share common parts that can be consolidated for simplicity and to reduce the overhead of operations, maintenance, and compliance. This article will cover how we consolidated and streamlined our underlying configuration and rollout mechanisms, including some of the interesting challenges we solved along the way, and the efficiencies we achieved by doing so.

Uber’s Sustainable Engineering Journey

Uber has made a commitment to sustainability by setting several goals across various sectors. By 2030, Uber plans to become a zero-emission mobility platform in Canada, Europe, and the US, and by 2040, worldwide. Uber Green, which offers no- or low-emission rides, has become the most widely available option of its kind globally. However, this commitment covers more than rides. It also includes Uber’s engineering infrastructure, such as its data centers and hardware resources, both on-premise and in public clouds.

As engineers and technology leaders, we nurture and develop the concept of responsible ownership, which is often thought of as maintaining high quality of our products. Responsible ownership also implies building efficient services, of which metrics for energy efficiency and sustainability should be an integral part.

In late 2021, we embarked on a journey to identify the best sustainable engineering practices, tools, and technologies, and began building them into our services, products, and training sessions. In this article, we present our vision and roadmap, walk through Uber Eng best practices for engineering sustainably toward a zero-emission world, and introduce novel, sustainability-oriented services.

D3: An Automated System to Detect Data Drifts

Data powers almost all critical, customer-facing flows at Uber. Bad data quality impacts our ML models, leading to a bad user experience (incorrect fares, ETAs, products, etc.) and revenue loss.

Still, many data issues are manually detected by users weeks or even months after they start. Data regressions are hard to catch because the most impactful ones are generally silent. They do not impact metrics and ML models in an obvious way until someone notices something is off, which finally unearths the data issue. But by that time, bad decisions are already made, and ML models have already underperformed.

This makes it critical to monitor data quality thoroughly so that issues are caught proactively.
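One minimal form of proactive monitoring is a drift check that compares today's value of a column-level metric with its recent history. This sketch assumes the metric is a column's null rate and applies a z-score test with an absolute-change floor. The function names and thresholds are hypothetical and do not describe D3's actual algorithm.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence, Tuple


def null_rate(values: Sequence[Optional[object]]) -> float:
    """Fraction of values that are missing (None)."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def detect_drift(
    history_rates: List[float],
    current_values: Sequence[Optional[object]],
    z_threshold: float = 3.0,
    min_abs_change: float = 0.01,
) -> Tuple[bool, float]:
    """Flag drift when today's null rate departs sharply from recent history.

    Returns (drifted, current_rate). `min_abs_change` suppresses alerts for
    statistically "significant" but practically negligible changes.
    """
    rate = null_rate(current_values)
    baseline = mean(history_rates)
    change = abs(rate - baseline)
    if change < min_abs_change:
        return False, rate
    spread = pstdev(history_rates)
    if spread == 0:
        return True, rate  # any meaningful change from a perfectly flat history
    return change / spread >= z_threshold, rate


history = [0.010, 0.012, 0.011, 0.009, 0.010]  # hypothetical daily null rates
print(detect_drift(history, [None] * 20 + [1] * 80))  # (True, 0.2)
print(detect_drift(history, [None] * 1 + [1] * 99))   # (False, 0.01)
```

A silent regression like a column suddenly going 20% null may not move any dashboard right away. A check of this kind, run on every ingestion, surfaces it on the first day instead of weeks later.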
