Evaluating RAG Applications with RAGAs

RAGAs (Retrieval-Augmented Generation Assessment) is a framework (GitHub, Docs) that provides the building blocks you need to evaluate your RAG pipeline at the component level.

Evaluation Data

What’s interesting about RAGAs is that it started out as a framework for “reference-free” evaluation [1]. That means, instead of having to rely on human-annotated ground truth labels in the evaluation dataset, RAGAs leverages LLMs under the hood to conduct the evaluations.

To evaluate the RAG pipeline, RAGAs expects the following information:

  • question: The user query that serves as the input to the RAG pipeline.
  • answer: The answer generated by the RAG pipeline, i.e., its output.
  • contexts: The contexts retrieved from the external knowledge source and used to answer the question.
  • ground_truths: The ground truth answer to the question. This is the only human-annotated information, and it is required only for the context_recall metric (see Evaluation Metrics).
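To make the expected shape of this data concrete, here is a minimal sketch in plain Python. Only the four field names come from the list above. The helper `build_eval_data`, the sample values, and the choice to store `contexts` and `ground_truths` as lists of strings are assumptions made for illustration, not part of the RAGAs API. Converting the result into whatever dataset container your installed RAGAs version expects is deliberately left out.

```python
from typing import Dict, List

# Field names taken from the list above.
REQUIRED_COLUMNS = ("question", "answer", "contexts")
# Assumption: ground_truths is optional because only context_recall needs it.
OPTIONAL_COLUMN = "ground_truths"


def build_eval_data(samples: List[dict]) -> Dict[str, list]:
    """Hypothetical helper: turn per-question records into a column-oriented dict."""
    columns: Dict[str, list] = {name: [] for name in REQUIRED_COLUMNS}
    columns[OPTIONAL_COLUMN] = []
    for i, sample in enumerate(samples):
        missing = [name for name in REQUIRED_COLUMNS if name not in sample]
        if missing:
            raise ValueError(f"sample {i} is missing required fields: {missing}")
        if not isinstance(sample["contexts"], list):
            raise TypeError(f"sample {i}: contexts must be a list of strings")
        for name in REQUIRED_COLUMNS:
            columns[name].append(sample[name])
        # Samples without human annotation get an empty list.
        columns[OPTIONAL_COLUMN].append(sample.get(OPTIONAL_COLUMN, []))
    return columns


# Made-up sample values, for illustration only.
samples = [
    {
        "question": "What is the capital of France?",
        "answer": "The capital of France is Paris.",
        "contexts": ["Paris is the capital and largest city of France."],
        "ground_truths": ["Paris"],
    },
    {
        "question": "Who wrote Hamlet?",
        "answer": "Hamlet was written by William Shakespeare.",
        "contexts": ["Hamlet is a tragedy written by William Shakespeare."],
    },
]

eval_data = build_eval_data(samples)
```

The second sample carries no `ground_truths`, which mirrors the point above: human annotation is only needed when a metric such as context_recall asks for it.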

Leveraging LLMs for reference-free evaluation is an active research topic. While using as little human-annotated data as possible makes it a cheaper and faster evaluation method, there is still some discussion about its shortcomings, such as bias [3]. However, some papers have already shown promising results [4]. For detailed information, see the “Related Work” section of the RAGAs [1] paper.

Note that the framework has expanded to provide metrics and paradigms that require ground truth labels (e.g., context_recall and ...
