openai/evals
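OpenAI Evals is a framework for evaluating large language models (LLMs) and systems built on them, together with an open registry of benchmarks. You can run the existing evals to test different dimensions of model behavior, or write private evals from your own requirements and data to measure and compare how different LLM versions perform in your actual use cases.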
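To make the idea of a private eval concrete, here is a minimal sketch in plain Python. It does not use the openai/evals API: `Sample`, `exact_match_accuracy`, `model_v1`, and `model_v2` are hypothetical names introduced only for illustration, and the two "models" are hard-coded stand-ins for real LLM calls. It scores each model version by exact-match accuracy on a small dataset so the versions can be compared.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Sample:
    """One evaluation item: a prompt and the answer we expect (hypothetical type)."""
    prompt: str
    ideal: str


def exact_match_accuracy(model: Callable[[str], str], samples: Iterable[Sample]) -> float:
    """Return the fraction of samples whose model output equals the ideal answer."""
    items = list(samples)
    if not items:
        raise ValueError("need at least one sample")
    # Compare after stripping surrounding whitespace; anything else counts as a miss.
    correct = sum(model(s.prompt).strip() == s.ideal.strip() for s in items)
    return correct / len(items)


# Hypothetical stand-ins for two model versions; a real eval would call an LLM here.
def model_v1(prompt: str) -> str:
    return "4" if prompt == "2 + 2 =" else "unknown"


def model_v2(prompt: str) -> str:
    answers = {"2 + 2 =": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


SAMPLES = [
    Sample(prompt="2 + 2 =", ideal="4"),
    Sample(prompt="Capital of France?", ideal="Paris"),
]

# Score every version on the same private dataset.
scores = {
    name: exact_match_accuracy(model, SAMPLES)
    for name, model in [("v1", model_v1), ("v2", model_v2)]
}
print(scores)
```

Exact match is only one possible metric; the same loop would work with any scoring function that maps a model output and an ideal answer to a number.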
Tech stack
evals/elsuite/hr_ml_agent_bench/benchmarks/bipedal_walker/scripts python
Dependencies
gymnasium
swig
evals/elsuite/hr_ml_agent_bench/benchmarks/cartpole/scripts python
Dependencies
gymnasium
evals/elsuite/hr_ml_agent_bench/benchmarks/cifar10/scripts python
Dependencies
torchvision
evals/elsuite/hr_ml_agent_bench/benchmarks/humanoid/scripts python
Dependencies
gymnasium
stable-baselines3
evals/elsuite/hr_ml_agent_bench/benchmarks/imdb/scripts python
Dependencies
accelerate
evals/elsuite/hr_ml_agent_bench/benchmarks/ogbn_arxiv/scripts python
Dependencies
ogb
pyg-lib
torch-geometric
torch-scatter
torch-sparse
evals/elsuite/hr_ml_agent_bench/benchmarks/spaceship_titanic/scripts python
Dependencies
xgboost
evals/elsuite/hr_ml_agent_bench python
Dependencies
dacite
gymnasium
scikit-learn
stable-baselines3
torch
transformers
evals/elsuite/multistep_web_tasks/docker/homepage python
Framework
Flask
evals/elsuite/steganography/scripts/dataset python
Dependencies
apache-beam
datasets
jiwer
nltk
scipy
spacy-universal-sentence-encoder
evals/elsuite/text_compression/scripts/dataset python
Dependencies
apache-beam
datasets
jiwer
nltk
scipy
spacy-universal-sentence-encoder
evals/solvers/providers/google python
Dependencies
google-generativeai
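Because each module above carries its own dependency list, it can help to check which packages are present before running a given benchmark. The sketch below uses only the standard library. It assumes the names in the listing are installable Python distribution names, which this page does not confirm (for example, `swig` may instead refer to a system tool). `missing_distributions` and `CARTPOLE_DEPS` are hypothetical names introduced for illustration.

```python
from importlib import metadata
from typing import Iterable, List


def missing_distributions(names: Iterable[str]) -> List[str]:
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            # metadata.version raises PackageNotFoundError for unknown distributions.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Assumption: taken from the cartpole listing above and treated as a distribution name.
CARTPOLE_DEPS = ["gymnasium"]

if __name__ == "__main__":
    absent = missing_distributions(CARTPOLE_DEPS)
    if absent:
        print("Missing:", ", ".join(absent))
    else:
        print("All listed dependencies are installed.")
```

This only confirms that a distribution is installed, not that its version is compatible with the benchmark scripts.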