Your Largest Customer May Be Your Biggest Bottleneck

Broccoli queueing illustration

An illustration of messages being bottlenecked.

It was a late night at the Trieve office, and we had just onboarded our largest customer to date onto the platform. That’s when our phones started lighting up with complaints flooding our Slack and Discord channels. Customers were reporting that their documents had been failing to index for hours.

The culprit was our newest customer, who was dumping millions of documents into our service and had completely clogged our ingestion pipeline. We tried throwing more workers at the problem, but it didn’t help. They were all tied up processing this massive job while everyone else’s data sat in an endless queue.

We needed a better way to manage our queues and ensure that all customers were treated fairly.

The Anatomy of a Noisy Neighbor

What we were seeing had a name: the “noisy neighbor” problem. In multi-tenant systems, one greedy tenant can starve out the rest.

Imagine you’re Cursor, indexing millions of codebases. Each repo is broken down into thousands of files, and each file needs to get pushed through an ingestion pipeline—tokenized, vectorized, chunked, and embedded.

Then, a new customer signs up with a massive monorepo containing tens of millions of files. The moment their job hits the pipeline, you can only watch in horror as your other customers’ indexing requests grind to a halt. For the next several hours, your system is held hostage by a single tenant.

FIFO queue illustration

In a traditional FIFO queue, smaller jobs are stuck waiting behind one massive job.
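
To make the head-of-line blocking concrete, here is a minimal simulation sketch. It is not Trieve's pipeline: the function name `fifo_start_times`, the tenant names, and the assumption that every document takes one time unit and that all jobs are already enqueued at time 0 are hypothetical choices made for illustration.

```python
import heapq


def fifo_start_times(jobs, workers):
    """Simulate one shared FIFO queue drained by `workers` identical workers.

    `jobs` is a list of (tenant, duration) tuples in arrival order.
    Assumption for this sketch: every job is already in the queue at time 0.
    Returns {tenant: time at which that tenant's first job starts}.
    """
    # Min-heap holding the time at which each worker next becomes free.
    free_at = [0] * workers
    heapq.heapify(free_at)

    first_start = {}
    for tenant, duration in jobs:
        # Strict FIFO: the next job in line goes to the earliest free worker.
        start = heapq.heappop(free_at)
        first_start.setdefault(tenant, start)
        heapq.heappush(free_at, start + duration)
    return first_start


# Hypothetical workload: one tenant enqueues 1,000 documents just before
# two small tenants enqueue a single document each.
jobs = [("big", 1)] * 1000 + [("small-a", 1), ("small-b", 1)]

for workers in (1, 4, 16):
    print(workers, fifo_start_times(jobs, workers))
```

Under these assumptions, the small tenants start at time 1000 and 1001 with one worker, at 250 with four workers, and at 62 with sixteen. Extra workers shorten the wait, but every worker is still consumed by the large tenant's backlog first, so the small tenants' latency remains proportional to someone else's job size.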

This is a fundamental fl...
