Introducing AutoPatchBench: A Benchmark for AI-Powered Security Fixes
- We are introducing AutoPatchBench, a benchmark for the automated repair of vulnerabilities identified through fuzzing.
- By providing a standardized benchmark, AutoPatchBench enables researchers and practitioners to objectively evaluate and compare the effectiveness of various AI program repair systems.
- This initiative facilitates the development of more robust security solutions, and also encourages collaboration within the community to address the critical challenge of software vulnerability repair.
- AutoPatchBench is available now on GitHub.
AI is increasingly being applied to solve security challenges, including repairing vulnerabilities identified through fuzzing. However, the lack of a standardized benchmark for objectively assessing AI-driven bug repair agents specific to fuzzing has impeded progress in academia and the broader community. Today, we are publicly releasing AutoPatchBench, a benchmark designed to evaluate AI program repair systems. AutoPatchBench sits within CyberSecEval 4, Meta’s new benchmark suite for evaluating AI capabilities to support defensive use cases. It features 136 fuzzing-identified C/C++ vulnerabilities in real-world code repos along with verified fixes sourced from the ARVO dataset.
AutoPatchBench provides a standardized evaluation framework for assessing the effectiveness of AI-assisted vulnerability repair tools. This benchmark aims to facilitate a comprehensive understanding of the capabilities and limitations of various AI-driven approaches to repairing fuzzing-found bugs. By offering a consistent set of evaluation criteria, Au...