Uber’s Highly Scalable and Distributed Shuffle as a Service

Source: www.uber.com

Abstract

Uber is a data-driven company that relies heavily on offline and online analytics for decision-making. As Uber’s data grows exponentially every year, it is crucial to process this data efficiently and at minimum cost. Over the years, Apache Spark™ has become the primary compute engine at Uber to meet these data needs. Spark powers many business-critical use cases at Uber, including Uber rides, Uber Eats, autonomous vehicles, ETAs, Maps, and many more, and it is the primary engine for data warehousing, data science, and AI/ML. In the last few years, Uber’s Spark usage has grown exponentially year over year, and Spark now runs on more than 10,000 nodes in production. Spark jobs account for more than 95% of analytics cluster compute resources and process hundreds of petabytes of data every day.

Shared by 帅气君哥 on 2022-07-08

Related topics: #Uber #Spark
