Agentic Testing: Where Agents Fit in the E2E Testing Stack

Sergii Gorbachov, Staff Software Engineer

Agentic vs. traditional testing paths

Abstract

Agent-driven end-to-end (E2E) tests add a new exploratory layer to testing, but should they replace traditional deterministic tests? We ran more than 200 agentic E2E workflows using the Playwright MCP, Playwright CLI, and agent-generated Playwright tests in test workspaces using non-production data to find out how agentic testing could fit into both our and your testing stacks.

1. From Journeys to Goals

Traditional end-to-end tests validate a specific journey through the UI.

click → click → type → assert
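
As a rough illustration of what "enforcing a journey" means, the sketch below scripts one fixed path against a toy in-memory UI. It does not use Playwright; `FakeChatUI`, its methods, and the element names are hypothetical stand-ins chosen so the example runs on its own.

```python
class FakeChatUI:
    """Toy in-memory stand-in for a chat UI; every name here is hypothetical."""

    def __init__(self):
        self.thread_open = False
        self.reply_focused = False
        self.thread_messages = []

    def click(self, target):
        # Only the exact, expected element is clickable in each state.
        if target == "message":
            self.thread_open = True
        elif target == "reply_box" and self.thread_open:
            self.reply_focused = True
        else:
            raise LookupError(f"no clickable element {target!r} in this state")

    def type_and_submit(self, text):
        if not self.reply_focused:
            raise LookupError("no input is focused")
        self.thread_messages.append(text)


def send_thread_message_journey(ui):
    """Scripted journey: click -> click -> type -> assert."""
    ui.click("message")          # open the thread
    ui.click("reply_box")        # focus the reply input
    ui.type_and_submit("hello")  # type and send
    assert ui.thread_messages == ["hello"]
    return ui.thread_messages


result = send_thread_message_journey(FakeChatUI())
```

Because every step is fixed, any deviation from the scripted order (for example, clicking the reply box before the thread is open) fails immediately, which is exactly what a deterministic test is meant to do.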

Agent-driven tests instead validate whether a goal can be achieved, often expressed as an instruction (e.g. “send a thread message”):

goal → agent adapts → verify result
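
The goal-driven loop can be sketched in the same toy style. In this sketch a breadth-first search stands in for the agent's planning (a real agent would use a language model and a live browser), and the two layouts, their states, and their action names are invented for illustration.

```python
from collections import deque

# Toy UI modelled as a state machine: state -> {action: next_state}.
# Both layouts are hypothetical; they expose the same goal via different controls.
LAYOUT_A = {
    "channel": {"click message": "thread_open"},
    "thread_open": {"click reply box": "reply_focused"},
    "reply_focused": {"type text + press Enter": "message_sent"},
}
LAYOUT_B = {
    "channel": {"hover message": "toolbar"},
    "toolbar": {"click 'Reply in thread'": "reply_focused"},
    "reply_focused": {"type text + click Send": "message_sent"},
}


def run_agent(layout, start, goal):
    """Return any action sequence that reaches `goal`, or None if unreachable.

    Breadth-first search is a stand-in for the agent adapting to the UI.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for action, next_state in layout.get(state, {}).items():
            if next_state not in seen:
                seen.add(next_state)
                queue.append((next_state, path + [action]))
    return None


path_a = run_agent(LAYOUT_A, "channel", "message_sent")
path_b = run_agent(LAYOUT_B, "channel", "message_sent")

# Verify the result, not the route: both runs reach the goal by different paths.
assert path_a is not None and path_b is not None
assert path_a != path_b
```

The only thing checked is whether the goal state was reached; the route is left to the agent, so the same instruction survives a layout change that would break a scripted journey.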

This difference can be summarized simply:

Tests enforce journeys. Agents verify goals.

Across our agentic test runs, the overall workflow remained consistent (e.g. login → search → result → clear), but the exact sequence of actions varied. In practice, agents took different paths to reach the same outcome:

Different input methods (clicking a search suggestion vs pressing Enter)
Different navigation patterns (reopening search vs reusing existing state)
Additional or skipped steps (extra clicks, snapshots, or intermediate actions)
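
One way to check that runs share a workflow even when their actions differ is to collapse each action trace into coarse milestones and compare those. The sketch below assumes a hand-written action-to-milestone mapping; the action names and both traces are made up for illustration and are not taken from our runs.

```python
from itertools import groupby

# Hypothetical mapping from low-level actions to workflow milestones.
# None marks incidental actions (e.g. snapshots) that carry no milestone.
ACTION_TO_MILESTONE = {
    "fill_credentials": "login",
    "click_sign_in": "login",
    "open_search": "search",
    "type_query": "search",
    "press_enter": "search",
    "click_suggestion": "search",
    "open_result": "result",
    "clear_search": "clear",
    "snapshot": None,
}


def milestones(trace):
    """Collapse an action trace into its ordered list of distinct milestones."""
    mapped = [ACTION_TO_MILESTONE[action] for action in trace]
    kept = [m for m in mapped if m is not None]
    # groupby merges consecutive repeats, e.g. search, search -> search
    return [m for m, _ in groupby(kept)]


# Two illustrative runs: one presses Enter, the other clicks a suggestion
# and takes extra snapshots along the way.
run_1 = ["fill_credentials", "click_sign_in", "open_search", "type_query",
         "press_enter", "open_result", "clear_search"]
run_2 = ["fill_credentials", "click_sign_in", "snapshot", "open_search",
         "type_query", "click_suggestion", "snapshot", "open_result",
         "clear_search"]

same_workflow = milestones(run_1) == milestones(run_2)
same_actions = run_1 == run_2
```

Here `same_workflow` is true while `same_actions` is false: the raw action sequences differ, but both collapse to login, search, result, clear.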

Agents can still validate intermediate steps when needed, but this flexibility comes with tradeoffs in reliability, cost, and execution time, which we explore in the next sections.
