Actual LLM agents are coming

Agents are everywhere these days. And yet, the most consequential development in agentic LLM research has gone almost unnoticed.

In January 2025, OpenAI released Deep Research, a specialized variant of o3 for web and document search. Thanks to "reinforcement learning training on these browsing tasks", Deep Research has gained the capacity to plan a search strategy, cross-reference sources, and query niche pieces of knowledge based on intermediate feedback. Claude Sonnet 3.7 seems to apply the same recipe successfully to code. The model alone outperforms existing orchestrations of past models on complex sequences of programming tasks.

In short, as William Brown puts it, "LLM agents can work for long multi-step tasks".

This advancement raises the question of what LLM agents really are. In December, Anthropic unveiled a new definition: "systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks."
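
To make that definition concrete, here is a minimal sketch of my own, not Anthropic's code. It puts a fixed pipeline, where the code decides the order of steps, next to a loop where the model decides which tool to call next and when to stop. Everything in it is a hypothetical stand-in: `search` and `read` are toy tools, and `stub_model` is a hard-coded function playing the part of an LLM.

```python
from typing import Callable, Dict, List

# Hypothetical toy tools; real ones would call a search API or open a document.
def search(query: str) -> str:
    return f"results for {query}"

def read(doc: str) -> str:
    return f"notes on {doc}"

TOOLS: Dict[str, Callable[[str], str]] = {"search": search, "read": read}

def run_workflow(task: str) -> List[str]:
    """Fixed pipeline: the step order is written in code and never changes."""
    found = search(task)
    return [found, read(found)]

def stub_model(task: str, history: List[str]) -> str:
    """Stand-in for an LLM (assumption): names the next action from the history."""
    if not history:
        return "search"
    if len(history) < 3:
        return "read"
    return "stop"

def run_agent(
    task: str,
    choose: Callable[[str, List[str]], str],
    max_steps: int = 10,
) -> List[str]:
    """Model-directed loop: the model picks each tool and decides when to stop."""
    history: List[str] = []
    for _ in range(max_steps):  # hard cap so a confused model cannot loop forever
        action = choose(task, history)
        if action == "stop":
            break
        # Feed the latest intermediate result back in as the next tool input.
        arg = history[-1] if history else task
        history.append(TOOLS[action](arg))
    return history
```

Calling `run_agent("llm agents", stub_model)` runs one search and two reads before the stub decides to stop, while `run_workflow` always runs exactly one search and one read. The only point of the sketch is where the control flow lives: in the code for the first, in the model's choices for the second.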

In contrast, the more common forms of agentic systems are classed as workflows, "where LLMs and tools are orchestrated through predefined code paths". The recently hyped Manus AI fits exactly this definition. All my tests over the weekend show the same fundamental limitations of workflow systems that were already apparent in the time of AutoGPT, and are especially striking for search:

  • They can't plan and frequently get stuck in the middle of nowhere.
  • They can't remember and struggle to stay on a task for more than 5-10 minutes.