openai/evals

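OpenAI Evals is a framework for evaluating large language models (LLMs) and LLM-based systems, together with an open registry of benchmarks. Users can run the existing evals to test different aspects of a model's behavior, or write private evals from their own requirements and data to measure and compare how different LLM versions perform in their actual use cases.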

18,231 2,929 18,231 185
View on GitHub

Tech stack

evals/elsuite/hr_ml_agent_bench/benchmarks/bipedal_walker/scripts python

Dependencies

gymnasium swig

evals/elsuite/hr_ml_agent_bench/benchmarks/cartpole/scripts python

Dependencies

gymnasium

evals/elsuite/hr_ml_agent_bench/benchmarks/cifar10/scripts python

Dependencies

torchvision

evals/elsuite/hr_ml_agent_bench/benchmarks/humanoid/scripts python

Dependencies

gymnasium stable-baselines3

evals/elsuite/hr_ml_agent_bench/benchmarks/imdb/scripts python

Dependencies

accelerate

evals/elsuite/hr_ml_agent_bench/benchmarks/ogbn_arxiv/scripts python

Dependencies

ogb pyg-lib torch-geometric torch-scatter torch-sparse

evals/elsuite/hr_ml_agent_bench/benchmarks/spaceship_titanic/scripts python

Dependencies

xgboost

evals/elsuite/hr_ml_agent_bench python

Dependencies

dacite gymnasium scikit-learn stable-baselines3 torch transformers

evals/elsuite/multistep_web_tasks/docker/homepage python

Framework

Flask

evals/elsuite/steganography/scripts/dataset python

Dependencies

apache-beam datasets jiwer nltk scipy spacy-universal-sentence-encoder

evals/elsuite/text_compression/scripts/dataset python

Dependencies

apache-beam datasets jiwer nltk scipy spacy-universal-sentence-encoder

evals/solvers/providers/google python

Dependencies

google-generativeai
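
Because each benchmark directory in this listing declares its own dependencies, it can help to check which of them are already installed before running a given benchmark. The following is a minimal sketch, not part of openai/evals: the `BENCHMARK_DEPENDENCIES` mapping and the `missing_packages` helper are hypothetical names introduced here for illustration, and only two entries from the listing are copied in. It relies solely on the standard-library `importlib.metadata` module.

```python
from importlib import metadata

# Hypothetical mapping: two entries copied from the dependency listing above.
BENCHMARK_DEPENDENCIES = {
    "evals/elsuite/hr_ml_agent_bench/benchmarks/cartpole/scripts": ["gymnasium"],
    "evals/elsuite/hr_ml_agent_bench/benchmarks/ogbn_arxiv/scripts": [
        "ogb",
        "pyg-lib",
        "torch-geometric",
        "torch-scatter",
        "torch-sparse",
    ],
}


def missing_packages(packages):
    """Return the distribution names in `packages` that are not installed."""
    missing = []
    for name in packages:
        try:
            # Raises PackageNotFoundError when no such distribution is installed.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for path, packages in BENCHMARK_DEPENDENCIES.items():
        missing = missing_packages(packages)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{path}: {status}")
```

The check looks up installed distribution names only; it does not verify versions or whether a package imports successfully in the current environment.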
