Training Sidekick to Refuse: Automated Data Curation Through LLM Judge Consensus

Training data has always been the hard part of machine learning. Not the architecture, not the compute, but the data itself. You need enough of it, you need it clean and diverse, and you need it to cover the cases that actually matter. Before LLMs, that mostly meant labeling pipelines and feature engineering. With LLMs, the problem doesn't go away; it shifts. When you fine-tune a foundation model for a specific domain, you're teaching behavior: how the model should interpret requests, handle ambiguity, and recognize when a request is genuinely impossible. We’ve learned that the last one is the hardest to teach.

It's hardest because production data has a blind spot. Every example in a production training corpus is a success story: the model did something right, it got evaluated, it shipped. The cases where it should have refused, the edge cases, the impossible requests… those never make it into the logs. So you end up with a model trained entirely on successful queries. When it hits something it can't fulfill, there's no learned behavior to fall back on. It improvises, usually badly.

We ran into this while building Sidekick, our AI assistant for merchants. Sidekick works on two layers: an outer planner that interprets the merchant's overall intent, and a set of specialized skill models that each handle a specific capability. The planner routes "send a discount to my best customers" to segmentation, analytics, email, and so on. As we covered in our article on building production-ready agentic systems, keeping those skills performing well takes continuous work.
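To make the two-layer shape concrete, here is a minimal sketch of a planner fanning one request out to several skills. Everything in it is an assumption for illustration: the `Skill` class, the keyword lists, and the `plan` and `run` functions are hypothetical names, and keyword matching stands in for the planner's actual intent interpretation, which the article does not describe at this level. It is not Sidekick's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Skill:
    """Hypothetical stand-in for one specialized skill model."""
    name: str
    keywords: Tuple[str, ...]       # assumed trigger words, illustration only
    handler: Callable[[str], str]   # placeholder for a call to the skill model


def _stub(name: str) -> Callable[[str], str]:
    # Each handler just reports which skill handled the request.
    return lambda request: f"{name} handled: {request}"


# Assumed skill registry; the three names come from the example in the text.
SKILLS: List[Skill] = [
    Skill("segmentation", ("customers", "segment"), _stub("segmentation")),
    Skill("analytics", ("best", "top", "sales"), _stub("analytics")),
    Skill("email", ("send", "email"), _stub("email")),
]


def plan(request: str) -> List[str]:
    """Outer layer: pick every skill whose keywords appear in the request."""
    words = set(request.lower().split())
    return [skill.name for skill in SKILLS if words & set(skill.keywords)]


def run(request: str) -> List[str]:
    """Route the request to each selected skill, or refuse if none applies."""
    selected = plan(request)
    if not selected:
        # Explicit refusal path for requests no skill can fulfill.
        return ["refused: no skill can fulfill this request"]
    by_name = {skill.name: skill for skill in SKILLS}
    return [by_name[name].handler(request) for name in selected]


if __name__ == "__main__":
    print(plan("send a discount to my best customers"))
    print(run("teleport my inventory to the moon"))
```

In this toy version the refusal path is a hard-coded branch. With learned skill models there is no such branch, so the refusal behavior has to come from training data.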
