The ActionCard pattern reduces app screen UI, navigation (routing) logic, and other app logic into simple, decoupled elements. The UI elements are called cards, and the associated reusable logical elements are called actions. Together cards and actions are configured to create app screens and features. Every screen is backed by a server-driven feed of card data models.
The ActionCard pattern implementation described here is the result of our team leveraging learnings from across Uber engineering and a lot of our own trial and error. The result is a pattern that allows us to quickly launch new features across multiple screens and apps with a focus on rapid iteration.
The ActionCard pattern has allowed us to reduce complexity and eliminate redundancy. Our hope is that it might be helpful for other teams who want to go fast.
There are several online references on how to run Apache Hadoop® (referred to as “Hadoop” in this article) in Docker for demo and test purposes. In a previous blog post, we described how we containerized Uber’s production Hadoop infrastructure spanning 21,000+ hosts.
HDFS NameNode is the most performance-sensitive component within large multi-tenant Hadoop clusters. We had deferred NameNode containerization to the end of our containerization journey to leverage learnings from containerizing other Hadoop components. In this blog post, we’ll share our experience on how we containerized HDFS NameNodes and architected a zero-downtime migration for 32 clusters.
At Uber, we have been operating an Apache Hadoop® based Data analytics platform since 2015. As adoption picked up exponentially in 2016, we decided to secure our platform with Kerberos authentication. Since then, Kerberos has become a critical component of our security infrastructure supporting not only Uber’s Hadoop ecosystem but also other services that are considered mission critical to our tech stack.
Uber has a large and diverse tech stack deployed on thousands of machines and storing more than 300 PB of analytics data. Systems using Kerberos authentication include YARN, HDFS, Apache Hive™, Apache Kafka®, Apache Zookeeper™, Presto®, Apache Spark™, Apache Flink®, and Apache Pinot™, to name a few. Besides open-source systems, many of the internally developed platforms and services also use Kerberos for authentication. All these services need to securely communicate with each other, as well as provide secure access for all their user clients.
In this blog post, we’ll share some of the key challenges and the solutions that enabled us to scale adoption of Kerberos for authentication at Uber.
In this blog, we share how we improved the daily edit-build-run developer experience using DevPods, our remote development environment. We will start with some of the initial challenges, the pain points we addressed with Devpod, our architecture, and some of our recent successes in terms of adoption and cost reduction. We will finally leave you with some thoughts around the future of remote development at Uber.
Go is a popular programming language used in microservice development, with one of the key features being first-class support for concurrency. Given its burgeoning popularity, it is no surprise that Uber adopted it: the Go monorepo is a centerpiece of its development platform, housing a significant portion of Uber’s codebase in the form of critical business logic, support libraries, or key components of the infrastructure.
Go’s concurrency model is built on goroutines–lightweight threads. Any function call prefixed with the “go” keyword launches the function asynchronously. Goroutines are used abundantly in Go codebases owing to low syntactical overhead and resource requirements, with programs often involving tens to hundreds or thousands of goroutines simultaneously. Two or more goroutines can communicate with each other via message passing over channels, a paradigm inspired by Hoare’s Communicating Sequential Processes. While traditional shared-memory communication is still an option, the development team behind Go encourages users to favor channels, arguing for higher tolerance against data races when used properly.
Push notifications are an integral channel for Uber Eats customers to discover new restaurants, valuable promotions, new offerings such as grocery and alcohol, and the perks of becoming a member, among other things. Push notifications are sent from various teams internally such as Marketing, City Operations, and Product. Since marketing push notifications were introduced in March 2020, not only did the list of teams sending notifications grow quickly, but there was also quick growth in volume to billions of notifications per month by the end of 2020.
At Uber, data informs every decision. Presto is one of the very core engines that powers all sorts of data analytics at Uber. For example, the operations team makes heavy use of Presto for services such as dashboarding; the Uber Eats and marketing teams rely on the results of these queries to make decisions on prices. In addition, Presto is also used in Uber’s compliance department, growth marketing department, and ad-hoc data analytics.
The scale of Presto at Uber is large. Currently, Presto has 9,000 daily active users, processing 500K queries per day and handling over 50PB of data. Uber’s infrastructure encompasses 2 data centers with 7,000 nodes and 20 Presto clusters across 2 regions.
Modern-day technical system deployments generally follow SOA or a microservice-based architecture that allows for clearer separation of concerns, ownership, well-defined dependencies, and abstracts out a single unit of business logic.
Uber has thousands of services coordinating to power up the platform that drives the company at scale. To offer a great experience to customers, developers have to ensure that their service is meeting its functional requirements. Building confidence in meeting functional requirements requires testing of a service.
Introducing WorkflowGuard: The Workflow Governance and Observability System That Oversees over 120,000 Data Workflows
At Uber, over 120,000 production workflows are orchestrated, scheduled, and executed every day. These workflows are owned by over 3,000 users across many teams within Uber, powering critical ETL jobs, business metrics, dashboards, machine learning models, or critical regulatory reports. Internally, the Data Workflow Platform (DWP) team makes this happen by developing Uber’s centralized workflow management system with high infrastructure reliability and minimum scheduling latency. The workflow management system also comes with a user-friendly application that allows users to create, author, and manage both streaming and batch workflows in a self-serve way.
Long, long ago, the amount of data our systems output to logs was small enough that we were able to retain all of the log files. This allowed our engineers to freely analyze the logs, say for troubleshooting our systems or improving applications. But as Uber’s business grew rapidly, the amount of data being logged increased dramatically. And so we were forced to discard log files after just a short period of time, given the prohibitive cost of retaining them–that is, until we integrated CLP into the logging library (Log4j) of our big data platform. In aggregate, CLP achieves a 169x compression ratio on our log data, saving storage, memory, and disk/network bandwidth at every level. As a result, we can now retain all logs at a fraction of the cost, without throwing away any insights, and the compressed logs can be efficiently searched without decompression.
Uber has been on a multi-year journey to reimagine our infrastructure stack for a hybrid, multi-cloud world. The internal code name for this project is Crane. In this post we’ll examine the original motivation behind Crane, requirements we needed to satisfy, and some key features of our implementation. Finally, we’ll wrap up with some forward-looking views for Uber’s infrastructure.
The Uber Eats system handles several hundred million product images and millions of image updates are performed every hour. We have implemented a content-addressable caching layer that very effectively detects duplicates and thereby reduces download times, processing times, and storage costs. This full system was developed and completely rolled out in the course of less than 2 months, which improved the latency and reliability of the image service and unblocked projects on our new catalog API development.
Uber uses MySQL as the underlying database engine for Schemaless and Docstore, our distributed databases. By default, MySQL uses the most popular InnoDB engine, a B+Tree structure for data storage. MyRocks is a MySQL storage engine that integrates with RocksDB, an open source project. The RocksDB store is based on the log-structured merge-tree (or LSM tree) and is optimized for fast storage and combines outstanding space and write efficiency with acceptable read performance.
The Uber Storage Platform team has migrated all Schemaless instances and some Docstore instances to MyRocks since 2019. In this post, we are going to talk about the journey of our migration to MyRocks.
While most people are familiar with Uber, not all are familiar with Uber Freight. Uber Freight has been around since 2016 and is dedicated to provide a platform to seamlessly connect shippers with carriers. We’re simplifying the lives of trucking companies by providing a platform for carriers to browse through all available shipment opportunities with upfront pricing and book with the tap of a button, and making the fulfillment process more scalable and efficient.
Providing reliable service to shippers is critical for Uber Freight in order to gain their trust. Because carriers’ performance could significantly impact reliability of Freight’s service, we need to be transparent with carriers about the level to which we are holding them accountable, providing them with a clear view of how well they are performing and, if needed, where they can improve.
To achieve this, Uber Freight developed the Carrier Scorecard to show the carriers several metrics, including engagement with the app, on-time pickup/delivery, tracking automation, and late cancellations. By showing this information in near real time on the Carrier App, we are able to provide feedback to carriers in real time and set ourselves apart from most of our competitors in the industry.
In our last blog post we talked about how we went from polling for refreshing the app to a push-based flow to build our app experience.
All our apps need to be synced with real-time information, whether it’s through pickup time, arrival time, and route lines on the screen, or nearby drivers when you open the app. We use our push platform to deliver these messages that power the real-time user experiences as described in our previous post, which we strongly recommend that you review to learn about the details of the architecture before proceeding.
This blog post will cover how we changed our protocol from Server Sent Events (HTTP1.1) to gRPC-based bidirectional streaming (QUIC/HTTP3), the challenges we faced, the final results, and some key learnings.
If you have read our previous article, ML Education at Uber: Frameworks Inspired by Engineering Principles, you have seen several examples of how Uber benefits from applying Engineering Principles to drive the ML Education Program’s content design and program frameworks.
In this follow-up, we will dig deeper into what we believe to be other unique aspects of ML Education at Uber: our approach to Content Components, Content Delivery, Observability, and Marketing & Reach.