Streaming Real-Time Analytics with Redis, AWS Fargate, and Dash Framework

摘要

Uber’s GSS (Global Scaled Solutions) team runs scaled programs for diverse products and businesses, including but not limited to Eats, Rides, and Freight. The team transforms Uber’s ideas into agile, global solutions by designing and implementing scalable solutions. One of the areas of expertise within GSS is the Digitization vertical. The Digitization team efficiently converts physical signals into digital assets and provides services in labeling, in-field testing, data curation and validations for maps, product incubation, freight BOL (bill of lading), Eats menu uploads, etc.

All these digitization services are performed by thousands of humans (operators) working on our internal applications across many locations around the globe. While an operator is digitizing data, our backend collects a clickstream of all the user interactions in the form of raw events to the scale of 10 million events per day in AWS (Amazon Web Services) cloud infrastructure. Sometimes this data is also moved to Uber’s own data centers. Our data analytics team performs analysis on this data to improve/tweak the process, augment tooling infrastructure, address operator motivation, and improve operator skills. Analytics is usually performed by querying big data lakes and using different frontend tools for visualisation. Generally, any analytics setup has a latency (source to user) component to it and the latency of our existing (pre-COVID) infrastructure was 1 hour. With the onset of COVID-19 crisis, the digitization process had to be transitioned to work-from-home mode, leading to additional operational complexity of remotely managing a huge workforce of operators. This complexity created a gap in team’s communication, decision making, and collaboration. Where 1-hour latency of our analytics platform was previously acceptable, real-time analytics was needed to fill this gap. This blog describes how we improved latency of our data architecture by building a real-time analytics system.

While we researched approaches used for building real-time dashboards (example), we did not find an end-to-end solution, considering how rich visualization can be achieved at lower cost. We considered different visualization approaches and also looked at commercial solutions to come up with our choices. Another differentiating aspect was that our solution also addresses the need for a “single source of truth” on Amazon S3 (Amazon’s “simple storage service”), from which both streaming and batch processed dashboards would to be sourced, rather than hooking directly into the Amazon Kinesis Data Firehose stream itself. This intermediate storage lets us recover data (for the streaming window) with a replay. We production tested our visualizations with thousands of users for low load times and reliability.

欢迎在评论区写下你对这篇文章的看法。

评论

ホーム - Wiki
Copyright © 2011-2024 iteam. Current version is 2.134.0. UTC+08:00, 2024-09-29 14:23
浙ICP备14020137号-1 $お客様$