ART·E: How We Built an Email Research Agent That Beats o3

OpenAI's version of Deep Research showed how effective reinforcement learning (RL) can be for teaching an agent a specific task. Compared to previous research agents, it was a major step forward in effectiveness. With our new project, "ART·E", we've applied this training recipe to a new, realistic task: answering natural-language questions by searching an email inbox. We've produced a model that is faster, cheaper, and more accurate than o3 on this task.

Accuracy comparison

ART·E outperforms o3 on a realistic agentic research task, answering 60% of the questions that o3 missed.

Cost and latency comparison

ART·E is 5x faster and 64x cheaper to run than o3.

We've open-sourced both the final model and all training code. In this post we'll cover the details of how we built it, and lessons learned you can apply to build your own!

Let's Define Our Task

For this project we wanted a task that was realistic and useful, while still being narrow enough to see quick improvement through RL.

Searching my email inbox felt like the perfect match. I've wanted an agent that can do this for me forever. Today, when I have a question like "how do I RSVP for my daughter's classroom party" or "what time is my brother's flight on Friday" I have to open my inbox, think up keywords, and read through search results. So 2022. Why isn't AI doing this for me? ART·E is designed to do exactly this, with style!

"Synthetic" Data

To train a model and evaluate its performance, we needed a realistic dataset of answerable email queries. Luckily for us, when notorious energy trader Enron was sued fo...
