Introducing AutoPatchBench: A Benchmark for AI-Powered Security Fixes

We are introducing AutoPatchBench, a benchmark for the automated repair of vulnerabilities identified through fuzzing.
By providing a standardized benchmark, AutoPatchBench enables researchers and practitioners to objectively evaluate and compare the effectiveness of various AI program repair systems.
This initiative facilitates the development of more robust security solutions, and also encourages collaboration within the community to address the critical challenge of software vulnerability repair.
AutoPatchBench is available now on GitHub.

AI is increasingly being applied to solve security challenges, including repairing vulnerabilities identified through fuzzing. However, the lack of a standardized benchmark for objectively assessing AI-driven bug repair agents specific to fuzzing has impeded progress in academia and the broader community. Today, we are publicly releasing AutoPatchBench, a benchmark designed to evaluate AI program repair systems. AutoPatchBench sits within CyberSecEval 4, Meta’s new benchmark suite for evaluating AI capabilities to support defensive use cases. It features 136 fuzzing-identified C/C++ vulnerabilities in real-world code repos along with verified fixes sourced from the ARVO dataset.

AutoPatchBench provides a standardized evaluation framework for assessing the effectiveness of AI-assisted vulnerability repair tools. This benchmark aims to facilitate a comprehensive understanding of the capabilities and limitations of various AI-driven approaches to repairing fuzzing-found bugs. By offering a consistent set of evaluation criteria, Au...