Machine-learning predictive autoscaling for Flink

As Grab transitions to derive more valuable insights from our wealth of operational data, we are witnessing a steep increase in stream-processing applications. Over the past year, the number of Flink applications grew 2.5 times, driven by interest in real-time stream processing and the improved accessibility of developing such applications with Flink SQL. At this scale, it has become crucial for the internal Flink platform team to provide a cost-effective and self-service offering that supports users of diverse backgrounds.


Flink at Grab is deployed in application mode, so each pipeline has its own isolated resources for JobManager and TaskManager. Flink pipeline creators control both the application logic and the deployment configuration that affect throughput and performance, including OSS configurations:


  • Number of TaskManagers and task slots per TaskManager
  • CPU cores per TaskManager
  • Memory per TaskManager
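
To make the interaction between these settings concrete, here is a minimal Python sketch of how the per-TaskManager values multiply into a pipeline's total footprint. The `TaskManagerSizing` class and its field names are hypothetical and exist only for illustration. In open-source Flink, slots per TaskManager correspond to `taskmanager.numberOfTaskSlots` and total TaskManager process memory to `taskmanager.memory.process.size`; how Grab's platform exposes these settings is not described here, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskManagerSizing:
    """Hypothetical container for the deployment settings listed above."""

    task_managers: int                # number of TaskManager instances
    slots_per_task_manager: int       # cf. taskmanager.numberOfTaskSlots
    cpu_cores_per_task_manager: float
    memory_mb_per_task_manager: int   # cf. taskmanager.memory.process.size

    def total_slots(self) -> int:
        # Task slots available to the job across all TaskManagers.
        return self.task_managers * self.slots_per_task_manager

    def total_cpu_cores(self) -> float:
        # CPU cores reserved for TaskManagers, whether or not they are used.
        return self.task_managers * self.cpu_cores_per_task_manager

    def total_memory_mb(self) -> int:
        # Memory reserved for TaskManagers, in MiB.
        return self.task_managers * self.memory_mb_per_task_manager


# Illustrative values only, not taken from any real pipeline.
sizing = TaskManagerSizing(
    task_managers=4,
    slots_per_task_manager=2,
    cpu_cores_per_task_manager=1.0,
    memory_mb_per_task_manager=4096,
)
print(sizing.total_slots(), sizing.total_cpu_cores(), sizing.total_memory_mb())
```

Because every per-TaskManager value is multiplied by the TaskManager count, a generous guess on any one setting is paid for on every instance.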

As pipeline creation has become more accessible, users of different backgrounds (analysts, data scientists, engineers, etc.) often struggle to choose a set of configurations that works for their applications. Many go through a long process of trial and error and still end up over-provisioning their applications, leading to huge resource waste. Moreover, pipeline behavior changes over time as application logic or data patterns change, which invalidates earlier tuning efforts and forces users to repeat the exercise.


In this article, we focus on addressing the challenge of efficient CPU provisioning for TaskManagers, as CPU...
