Identifying Green Vehicles for a Zero-Emission Future
Uber has made a public commitment to phase out carbon emissions in the United States, Canada, and Europe by 2030, and worldwide by 2040. We maintain periodic updates on our progress via our Climate Assessment and Performance Report, which shows both how far we’ve come and how far we have yet to go.
Underlying this report is a large effort to prepare the data presented within, everything from identifying vehicle fuel type across markets, to carbon emissions per vehicle-mile and ultimately to passenger-mile traveled. In this blog post, we’ll take a closer look at the complexities of identifying “green” vehicles onboarded to Uber, and our solutions for managing those data.
Spark Analysers: Catching Anti-Patterns In Spark Apps
Apache Spark™ is a widely used open source distributed computing engine. It is one of the main components of Uber’s data stack.
Spark is the primary batch compute engine at Uber. Like any other framework, Spark comes with its own set of tradeoffs.
Optimizing HDFS with DataNode Local Cache for High-Density HDD Adoption
Uber has one of the largest Hadoop® Distributed File System (HDFS) deployments in the world, with exabytes of data across tens of clusters. It is important, but also challenging, to keep scaling our data infrastructure with the balance between efficiency, service reliability, and high performance. As a cost efficiency improvement effort that will save us tens of millions dollars every year, we aim to adopt higher density HDD (16+TB) SKUs to replace existing SKUs with 4TB HDDs that are still used by the majority of our HDFS clusters.
One of the biggest challenges when fully adopting high-density disk SKU comes from the disk IO bandwidth. While the capacity of each HDD increases by 2x to 4x, the I/O bandwidth of each HDD does not increase accordingly. This may cause IO throttling when DataNodes serve read/write requests. This can be seen from the chart below, which shows the trend of slow read packet read count from one DataNode. Given the persistent and sizable number of slow read occurrences, it is important to find new approaches to prevent performance degradation.
Cybersecurity Incident Simulation @ Uber
All the best things come in threes: the Three Musketeers, the Three Stooges, and, of course, your favorite three-cheese pizza ordered via the UberEats app. Engineering Security (EngSec) at Uber agrees and we have formed our own trio for how we simulate cybersecurity incidents at Uber to exercise our ability to act decisively should an incident occur. This three-pronged approach consists of tabletop exercises, red team operations, and atomic simulations.
Bootstrapping Uber’s Infrastructure on arm64 with Zig
In November 2021 we decided to evaluate arm64 for Uber. Most of our services are written in either Go or Java, but our build systems only supported compiling to x86_64. Today, thanks to Open Source collaboration, Uber has a system-independent (hermetic) build toolchain that seamlessly powers multiple architectures. We used this toolchain to bootstrap our arm64 hosts. This post is a story with how we went about it, our early thinking, problems, some achievements, and next steps.
Measuring Performance for iOS Apps at Uber Scale
At Uber, we obsess over delivering highly performant and reliable experiences to our partners and customers. We treat degradations to app performance the same way as any other functional regressions.…
InsureTech: Insurance Compliance
At Uber, we put safety first in order to minimize risks for users on the Uber platform. Uber Insurance Tech focuses on three pillars; claims, compliance, and affinity programs.
Demand and ETR Forecasting at Airports
Airports currently hold a significant portion of Uber’s supply and open supply hours (i.e., supply that is not utilized, but open for dispatch) across the globe. At most airports, drivers are obligated to join a “first-in-first-out” (FIFO) queue from which they are dispatched. When the demand for trips is high relative to the supply of drivers in the queue (“undersupply”), this queue moves quickly and wait times for drivers can be quite low. However, when demand is low relative to the amount of available supply (“oversupply”), the queue moves slowly and wait times can be very high. Undersupply creates a poor experience for riders, as they are less likely to get a suitable ride. On the other hand, oversupply creates a poor experience for drivers as they are spending more time waiting for each ride and less time driving. What’s more, drivers don’t currently have a way to see when airports are under- or over-supplied, which perpetuates this problem.
One way to tackle this undersupply/oversupply issue at airports is to forecast supply balance and use this to optimize resource allocation. Our first application of these models is in estimating the time to request (ETR) for the airport driver queue. We estimate the length of time a driver would have to wait before they receive a trip request, thereby giving drivers the information they need to identify and reposition in periods of undersupply (short waits), or to remain in the city during periods of oversupply (long waits).
Setting Uber’s Transactional Data Lake in Motion with Incremental ETL Using Apache Hudi
The Global Data Warehouse team at Uber democratizes data for all of Uber with a unified, petabyte-scale, centrally modeled data lake. The data lake consists of foundational fact, dimension, and aggregate tables developed using dimensional data modeling techniques that can be accessed by engineers and data scientists in a self-serve manner to power data engineering, data science, machine learning, and reporting across Uber. The ETL (extract, transform, load) pipelines that compute these tables are thus mission-critical to Uber’s apps and services, powering core platform features like rider safety, ETA predictions, fraud detection, and more. At Uber, data freshness is a key business requirement. Uber invests heavily in engineering efforts that process data as quickly as possible to keep it up to date with the happenings in the physical world.
In order to achieve such data freshness in our ETL pipelines, a key challenge is incrementally updating these modeled tables rather than recomputing all the data with each new ETL run. This is also necessary to operate these pipelines cost-effectively at Uber’s enormous scale. In fact, as early as 2016, Uber introduced a new “transactional data lake” paradigm with powerful incremental data processing capabilities through the Apache Hudi project to address these challenges. We later donated the project to the Apache Software Foundation. Apache Hudi is now a top-level Apache project used industry wide in a new emerging technology category called the lakehouse. During this time, we are excited to see that the industry has largely moved away from bulk data ingestion towards a more incremental ingestion model that Apache Hudi ushered in at Uber. In this blog, we share our work over the past year or so in extending this incremental data processing model to our complex ETL pipelines to unlock true end-to-end incremental data processing.
How We Unified Configuration Distribution Across Systems at Uber
Uber has multiple, domain-specific products to manage and distribute configuration changes at runtime across our many systems. These configuration products cater to different use cases: some have a web UI that can be used by non-engineers to change product configuration for different cities, and others expose a Git-based interface that primarily caters to engineers.
While these domain-specific configuration products have different applications, they share common parts that can be consolidated for simplicity and to reduce the overhead of operations, maintenance, and compliance. This article will cover how we consolidated and streamlined our underlying configuration and rollout mechanisms, including some of the interesting challenges we solved along the way, and the efficiencies we achieved by doing so.
Uber’s Sustainable Engineering Journey
Uber has made a commitment to sustainability by setting several goals across various sectors. By 2030, Uber plans to become a zero-emission mobility platform in Canada, Europe, and the US – and by 2040, worldwide. Uber Green, which offers no- or low-emission rides, has become the most widely-available option of its kind globally. However, this commitment encompasses more than just rides, as it also includes Uber’s engineering infrastructure such as its data centers and hardware resources, both on-premise and in public clouds.
As engineers and technology leaders, we nurture and develop the concept of responsible ownership, which is often thought of as maintaining high quality of our products. Responsible ownership also implies building efficient services, of which metrics for energy efficiency and sustainability should be an integral part.
In late 2021, we embarked on a journey to find out the best sustainable engineering practices, tools, and technologies, and began building them into our services, products, and training sessions. In this article, we present our vision and roadmap, walk through Uber Eng best practices for engineering sustainably towards a zero-emission world, and introduce novel, sustainability-oriented services.
D3: An Automated System to Detect Data Drifts
Data powers almost all critical, customer-facing flows at Uber. Bad data quality impacts our ML models, leading to a bad user experience (incorrect fares, ETAs, products, etc.) and revenue loss.
Still, many data issues are manually detected by users weeks or even months after they start. Data regressions are hard to catch because the most impactful ones are generally silent. They do not impact metrics and ML models in an obvious way until someone notices something is off, which finally unearths the data issue. But by that time, bad decisions are already made, and ML models have already underperformed.
This makes it critical to monitor data quality thoroughly so that issues are caught proactively.
Fixing Go’s Linker: An Unexpected Journey into ARM64, DWARF, and Linker Internals
We encountered an unusual problem recently at Uber with Golang™ debugging, as our engineers began transitioning to Apple® Silicon hardware, which uses the ARM64 Instruction Set Architecture (ISA), rather than the x86/AMD64 ISA many of us have been using for many years now. This required some rather complex debugging of the toolchain itself by Uber engineers.
uAct - Unified Action Platform
At a company as large and as complex as Uber, the volume of internal communication can easily become overwhelming. Our employees use multiple independent systems to send and receive a variety of notifications every single day. These include:
- Pullo – to raise access requests for software tools and services
- Uber Feedback – to provide and receive co-worker feedback
- Uber Learning – to undertake assigned training
- Workday – to apply for leaves, sign HR contracts, etc.
- ERD-PRD tool – to seek and provide approval of engineering and product documentation
- UberHub – to make requests for IT infra, hardware & software, etc.
Thus employees receive several action items and approval requests from multiple applications on a daily basis. Unified Action Platform or uAct has been built with a view to help employees keep on top of their assigned tasks and action items. uAct aggregates all such requests into one place for employees to easily view and address.
How the Uber Membership Team Developed the ActionCard Design Pattern to Do More with Less
The ActionCard pattern reduces app screen UI, navigation (routing) logic, and other app logic into simple, decoupled elements. The UI elements are called cards, and the associated reusable logical elements are called actions. Together cards and actions are configured to create app screens and features. Every screen is backed by a server-driven feed of card data models.
The ActionCard pattern implementation described here is the result of our team leveraging learnings from across Uber engineering and a lot of our own trial and error. The result is a pattern that allows us to quickly launch new features across multiple screens and apps with a focus on rapid iteration.
The ActionCard pattern has allowed us to reduce complexity and eliminate redundancy. Our hope is that it might be helpful for other teams who want to go fast.
Containerizing the Beast – Hadoop NameNodes in Uber’s Infrastructure
There are several online references on how to run Apache Hadoop® (referred to as “Hadoop” in this article) in Docker for demo and test purposes. In a previous blog post, we described how we containerized Uber’s production Hadoop infrastructure spanning 21,000+ hosts.
HDFS NameNode is the most performance-sensitive component within large multi-tenant Hadoop clusters. We had deferred NameNode containerization to the end of our containerization journey to leverage learnings from containerizing other Hadoop components. In this blog post, we’ll share our experience on how we containerized HDFS NameNodes and architected a zero-downtime migration for 32 clusters.