Uber’ s Highly Scalable and Distributed Shuffle as a Service
摘要
Uber is a data-driven company that heavily relies on offline and online analytics for decision-making. As Uber’s data grows exponentially every year, it’s crucial to process this data very efficiently and with minimum cost. Over the years, Apache Spark™ has become the primary compute engine at Uber to satisfy such data needs. Spark empowers many business-critical use cases at Uber with its unique features, including Uber rides, Uber Eats, autonomous vehicles, ETAs, Maps, and many more. Spark is the primary engine for data warehousing, data science, and AI/ML. In the last few years, Uber’s Spark usage has grown exponentially year over year, running on more than 10,000 nodes in production. Spark jobs now account for more than 95% of analytics cluster compute resources which process hundreds of petabytes of data every day.
欢迎在评论区写下你对这篇文章的看法。


