Evaluating RAG Applications with RAGAs
RAGAs (Retrieval-Augmented Generation Assessment) is a framework (GitHub, Docs) that provides you with the necessary ingredients to help you evaluate your RAG pipeline on a component level.
Evaluation Data
What’s interesting about RAGAs is that it started out as a framework for “reference-free” evaluation [1]. That means, instead of having to rely on human-annotated ground truth labels in the evaluation dataset, RAGAs leverages LLMs under the hood to conduct the evaluations.
To evaluate the RAG pipeline, RAGAs expects the following information:
- `question`: The user query that is the input of the RAG pipeline. The input.
- `answer`: The generated answer from the RAG pipeline. The output.
- `contexts`: The contexts retrieved from the external knowledge source used to answer the `question`.
- `ground_truths`: The ground truth answer to the `question`. This is the only human-annotated information. This information is only required for the metric `context_recall` (see Evaluation Metrics).
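As a sketch, the four fields above can be assembled into a plain column-oriented dictionary with one entry per evaluation sample. The sample values below are invented for illustration; only the field names and shapes follow the expectations described above.

```python
# Column-oriented evaluation data in the shape RAGAs expects.
# All sample values are invented for illustration.
eval_data = {
    "question": [
        "What did the president say about inflation?",
    ],
    "answer": [
        "The president said inflation is the top economic priority.",
    ],
    # One *list* of retrieved context chunks per question.
    "contexts": [
        [
            "Inflation remains our top economic priority ...",
            "The administration announced new measures ...",
        ],
    ],
    # Human-annotated; only needed for the context_recall metric.
    "ground_truths": [
        ["The president called inflation the top economic priority."],
    ],
}

# Sanity checks: every column must have one entry per evaluation sample,
# and `contexts` must hold a list of chunks for each sample.
n_samples = len(eval_data["question"])
assert all(len(col) == n_samples for col in eval_data.values())
assert all(isinstance(ctx, list) for ctx in eval_data["contexts"])
```

With the Hugging Face `datasets` library installed, such a dictionary can be converted via `Dataset.from_dict(eval_data)` into the dataset object that RAGAs-style evaluation tooling typically consumes.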
Leveraging LLMs for reference-free evaluation is an active research topic. While using as little human-annotated data as possible makes it a cheaper and faster evaluation method, there is still some discussion about its shortcomings, such as bias [3]. However, some papers have already shown promising results [4]. For detailed information, see the “Related Work” section of the RAGAs [1] paper.
Note that the framework has expanded to provide metrics and paradigms that require ground truth labels (e.g., `context_recall` and ...