From SSH to REST: A Security-Driven Modernization of Slack's EMR Data Pipelines

Mahendran Vasagam, Staff Software Engineer


Excerpt


By 2024, Slack’s data platform had accumulated 700+ SSH-based operators orchestrating critical data pipelines. We’re talking daily search indexing that processed terabytes of data, analytics jobs powering business intelligence, the whole shebang. Every single one of these jobs required direct SSH access to production AWS Elastic MapReduce (EMR) clusters. We had a massive security surface, and we couldn’t move forward on any infrastructure modernization. Not ideal.


We needed to eliminate SSH entirely. The solution? Migrate all 700+ jobs to a REST-based architecture. This is the story of how we killed SSH, across 8 data regions, with zero downtime.


How We Got Here


Slack’s data platform was built around 2017 with a straightforward pattern. Airflow, our data pipeline orchestrator, needed to run jobs on EMR clusters, and SSH was the most direct path. Connect to the EMR master node, execute a command, done. Simple.


# The old way - simple, but problematic
from airflow.providers.ssh.operators.ssh import SSHOperator

task = SSHOperator(
    task_id='run_spark_job',
    ssh_conn_id='emr_master',
    command='spark-submit /path/to/job.py',
)

This pattern proliferated across the platform. Teams built custom SSH-based operators for different use cases (because hey, if SSH works for Spark, why not everything else). By the time we took stock, we had 700+ jobs in production running everything from MapReduce jobs to AWS CLI commands to custom Python scripts.
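For contrast, here is a minimal sketch of what the REST-style replacement for the SSH operator above could look like, using the EMR Steps API via boto3 instead of a shell session on the master node. The helper names (`build_spark_step`, `submit_spark_job`) are illustrative, not Slack's actual implementation.

```python
def build_spark_step(name, args):
    """Wrap a spark-submit command line as an EMR step definition."""
    return {
        'Name': name,
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            # command-runner.jar is EMR's built-in runner for arbitrary commands
            'Jar': 'command-runner.jar',
            'Args': args,
        },
    }

def submit_spark_job(cluster_id, script_path):
    """Submit a Spark job over the EMR REST API instead of SSH."""
    import boto3  # imported here so the step-spec helper stays dependency-free

    emr = boto3.client('emr')
    step = build_spark_step('run_spark_job', ['spark-submit', script_path])
    resp = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return resp['StepIds'][0]
```

The same submission also exists as a first-class Airflow operator (`EmrAddStepsOperator` in the Amazon provider), which is the shape a REST-based migration of these DAG tasks would naturally take.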

