At Uber, millions of machine learning (ML) predictions are made every second, and hundreds of applied scientists, engineers, product managers, and researchers work on ML solutions daily.
Uber wins by scaling machine learning. We recognize org-wide that a powerful way to scale machine learning adoption is by educating. That’s why we created the Machine Learning Education Program: a program driven by Engineering Principles that provides a framework for delivering Uber-specific ML educational resources to Uber Tech employees.
Like a production system, education resources, contents, and distribution channels need to be continuously measured, evaluated, and improved. Ensuring each component of the ML Education Program is designed on this premise enabled us to quickly deliver new courses and curriculum that are tailored to engineers and scientists of various backgrounds.
This 2-part article will focus on how we have applied engineering principles when designing and scaling this program, and how it has helped us achieve the desired outcome. Part 1 will introduce our design principles and explain the benefits of applying these principles to technical education content design and program frameworks, specifically in the ML domain. Part 2 will take a closer look at critical components of the program and reflect on the outcomes that make ML Education at Uber a success.
In early 2020, we took a deeper look at this ecosystem. We discovered that a large percentage of the experiments had fatal problems and often needed to be rerun. Obtaining high-quality results required an expert-level understanding of experimentation and statistics, and an inordinate amount of toil (custom analysis, pipelining, etc.). This slowed down decision-making, and re-running poorly conducted experiments was common.
After assessing the customer problems and the internals of Morpheus we concluded that the core abstractions supported only a very narrow set of experiment designs correctly, and even minute deviations from such designs resulted in incomparable cohorts of users in control and treatment and compromised experiment results. To give a very simple example, while gradually rolling out an experiment that was split 30/70 between control and treatment, due to peculiarities of rollout and treatment assignment logic it would be ok to roll it out to 10% of users but not to 5%. Furthermore, the system was not able to support advanced experiment configurations needed to support Uber’s diverse use cases, or other advanced functionality at scale such as monitoring/rolling back experiments that were negatively impacting business metrics. So we decided to build a new platform from scratch with correct abstractions.
This blog post describes the implementation of an automated vertical CPU scaling system in which every storage workload running at Uber is allocated the ideal amount of cores. The framework is used today to right-size more than 500,000 Docker containers, and since its inception it has applied a net reduction of allocations of more than 120,000 cores, leading to annual multi-million dollar savings in infrastructure spending.
Uber is a data-driven company that heavily relies on offline and online analytics for decision-making. As Uber’s data grows exponentially every year, it’s crucial to process this data very efficiently and with minimum cost. Over the years, Apache Spark™ has become the primary compute engine at Uber to satisfy such data needs. Spark empowers many business-critical use cases at Uber with its unique features, including Uber rides, Uber Eats, autonomous vehicles, ETAs, Maps, and many more. Spark is the primary engine for data warehousing, data science, and AI/ML. In the last few years, Uber’s Spark usage has grown exponentially year over year, running on more than 10,000 nodes in production. Spark jobs now account for more than 95% of analytics cluster compute resources which process hundreds of petabytes of data every day.
Shadower is a load testing tool that allows us to provide load testing as a service to any microservice at Uber.
Shadower started as a command line application that allowed us to read a local file to load test a local application. At the time, Maps PEs were heavily investing in Java GC tuning. We needed Shadower to be able to do request mirroring to make sure two different applications get about the same load and different types of loads (test multiple endpoints). In summary, we were starting the same application twice: once with the production configuration and once with the new GC tuning. How’s this different from testing on staging? There are a few differences:
- Velocity: Testing on staging means pushing code to a branch, generating a build, and deploying it. This can take ~10 minutes compared to seconds locally, because locally we don’t need to generate a new build for configuration changes.
- Complexity: Our current infrastructure requires extra steps to allow multiple builds to run concurrently in the same environment, the reason for this is to reduce inconsistencies.
- Reliability: The current tools didn’t provide request mirroring, so we are at the mercy of the load balancing algorithm and how well randomized the requests are that we are using for our load test.
- Ease of use: The current tools needed developers to write code to be able to onboard a new service/endpoint.
Shadower stayed as a command line application for about a year until we built Ballast. Ballast required a reliable and easy-to-manage load test generator. Ballast is in charge of tracking resources utilization and deciding what is the right load for any load test. Shadower only focuses on executing those load tests.
Before 2021, Uber engineers would have to take quite a taxing journey to make a code change to the Go Monorepo. First, the engineer would make their changes on a local branch and put up a code revision to our internal code review system, Phabricator. Next, our infrastructure would see the request and initiate a number of validation jobs on our CI. Those jobs would run build and test validation using the Bazel™ build system, check the coverage, do some other work, and report back to the user a red light (i.e., tests failed or some other issues) or green light. Next, the user, after seeing the “green light,” would get their code reviewed and then initiate a “land” request to the Submit Queue. The queue, after receiving their request, would patch their changes on the latest HEAD of the main branch and re-run these associated builds and tests to make sure their change would be valid at the current state of the repository. If everything looked good, the changes would be pushed and the revision would be closed.
This sounds pretty easy, right? Make sure everything is green, reviewed, and then let the queue do the work to push your change!
Well… what if the change is to a fundamental and widely used library? All packages depending on the library will need to be validated, even though the change itself is only a few lines of code. This might slow down the “build and test” part. Validating this change could sometimes take several hours. Internally, we call such changes big changes.
At Uber we use data from user support interactions to identify gaps in our products and create better, more delightful experiences for our users. Support interactions with customers include information about broken product experiences, any technical or operational issues faced, and even their general sentiment towards the product and company. Understanding the root cause of a broken product experience requires additional context, such as details of the trip or the order. For example, the root cause for a customer issue about a delayed order might be due to a bad route given to the courier. In this case, we would want to attribute the poor customer experience to courier routing errors so that the Maps team can fix the same.
Initially, we had manual agents review a statistically significant sample from resolved support interactions. They would manually verify and label the resolved support issues and assign root cause attribution to different categories and subcategories of issue types. We wanted to build a proof-of-concept (POC) that automates and scales this manual process by applying ML and NLP algorithms on the semi-structured or unstructured data from all support interactions, on a daily basis.
This article describes the approach we took and the end-to-end design of our data processing and ML pipelines for our POC, which optimized the ease of building and maintaining such high scale offline inference workflows by engineers and data scientists on the team.
Latin America is a rich cultural region, known for its world-renowned gastronomy, its abundant biodiversity, and its welcoming population. However, socio-economic inequality has been a challenge for the region, and is generally considered a major contributing factor to high levels of violence.
The platform is not immune to the environment in which it operates. Therefore, to be one step ahead of opportunistic users, Uber has created the Rider Identity Team with 3 main goals in mind: reduce rider anonymity by verifying riders, reduce the rate of conflicts caused by riders, and improve driver safety sentiment regarding new riders.
Uber has adopted Golang (Go for short) as a primary programming language for developing microservices. Our Go monorepo consists of about 50 million lines of code (and growing) and contains approximately 2,100 unique Go services (and growing).
Go makes concurrency a first-class citizen; prefixing function calls with the go keyword runs the call asynchronously. These asynchronous function calls in Go are called goroutines. Developers hide latency (e.g., IO or RPC calls to other services) by creating goroutines. Two or more goroutines can communicate data either via message passing (channels) or shared memory. Shared memory happens to be the most commonly used means of data communication in Go.
Uber has operations in over 10,000 cities worldwide and its services include ridesharing, food delivery, package delivery, couriers, freight transportation, electric bicycle and motorized scooter rental, and ferry transport.
Every year we have millions of users going through signup and login on our various apps. Over the years we’ve built independent signup and login experiences for each of our lines of business which allowed us to innovate and move a lot quicker. However, as we scaled and added additional lines of business, our experiences began to diverge leading to some of these inconsistencies being amplified. USL (unified signup and login) enables the vision of “One Uber Identity” through a unified signup and login experience across all Uber apps.
Subsetting is a common technique used in load balancing for large-scale distributed systems. In this blog post, we will briefly introduce Uber’s current service mesh architecture that has been powering thousands of critical microservices in Uber since 2016. We will then discuss the challenges we faced when trying to scale the number of tasks in the mesh and issues with our initial subsetting approach. We’ll finish with how we came up with the real-time dynamic subsetting solution and its results in production.
Uber has extensively adopted Go as a primary programming language for developing microservices. Our Go monorepo consists of about 50 million lines of code and contains approximately 2,100 unique Go services. Go makes concurrency a first-class citizen; prefixing function calls with the go keyword runs the call asynchronously. These asynchronous function calls in Go are called goroutines. Developers hide latency (e.g., IO or RPC calls to other services) by creating goroutines within a single running Go program. Goroutines are considered “lightweight”, and the Go runtime context switches them on the operating-system (OS) threads. Go programmers often use goroutines liberally. Two or more goroutines can communicate data either via message passing (channels) or shared memory. Shared memory happens to be the most commonly used means of data communication in Go.
A data race occurs in Go when two or more goroutines access the same memory location, at least one of them is a write, and there is no ordering between them, as defined by the Go memory model. Outages caused by data races in Go programs are a recurring and painful problem in our microservices. These issues have brought down our critical, customer-facing services for hours in total, causing inconvenience to our customers and impacting our revenue. In this blog, we discuss deploying Go’s default dynamic race detector to continuously detect data races in our Go development environment. This deployment has enabled detection of more than 2,000 races resulting in ~1,000 data races fixed by more than two hundred engineers.
Uber’s goal is to ignite opportunity by setting the world in motion, and big data is a very important part of that. Presto® and Apache Kafka® play critical roles in Uber’s big data stack. Presto is the de facto standard for query federation that has been used for interactive queries, near-real-time data analysis, and large-scale data analysis. Kafka is the backbone for data streaming that supports many use cases such as pub/sub, streaming processing, etc. In the following article we will discuss how we have connected these two important services together to enable a lightweight, interactive SQL query directly over Kafka via Presto at Uber scale.
Uber has one of the largest deployments of Apache Kafka® in the world. It empowers a large number of real-time workflows at Uber, including pub-sub message buses for passing event data from the rider and driver apps, as well as financial transaction events between the backend services. As Kafka forms a critical component of Uber’s core workflows, it is important to secure the data being published and subscribed from the topics to maintain the integrity of the data and to provide an access control mechanism for who can publish/subscribe to a given topic.
Safety has long been a top priority at Uber, as Uber’s CEO Dara Khosrowshahi wrote in ‘Raising the Bar on Safety’ in September 2018. In order to #StandForSafety, the team at Uber has rolled out a set of features, such as Safety Center, Trusted Contacts, and the in-app Emergency Button, among others.
The first version of Emergency Button was rolled out in India in 2015. The original system allowed riders and drivers to contact the local police authority while remaining inside the app, and automatically alerted regional support teams to proactively reach out to the user. In 2018, the team took the opportunity to improve the system with enhanced features, such as surfacing live location information in the app, sharing trip details with authorities, and making Emergency Button available on both the Rider and Driver apps in global markets. Since then, the team has made additional enhancements in select markets by introducing the ability for users to discreetly text local police instead of a call-only option. Also, in select markets we have launched an interactive voice response solution to provide follow-up to our users.
At Uber, all stateful workloads run on a common containerized platform across a large fleet of hosts. Stateful workloads include MySQL®, Apache Cassandra®, ElasticSearch®, Apache Kafka®, Apache HDFS™, Redis™, Docstore, Schemaless, etc., and in many cases these workloads are co-located on the same physical hosts.
With 65,000 physical hosts, 2.4 million cores, and 200,000 containers, increasing utilization to reduce cost is an important and continuous effort. Until recently efforts were blocked due to CPU throttling, which indicates that not enough resources have been allocated.
It turned out that the issue was how the Linux kernel allocates time for processes to run. In this post we will describe how switching from CPU quotas to cpusets (also known as CPU pinning) allowed us to trade a slight increase in P50 latencies for a significant drop in P99 latencies. This in turn allowed us to reduce fleet-wide core allocation by up to 11% due to less variance in resource requirements.