ZJU-REAL

ZJU-REAL/SDAR

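SDAR is the official code for a self-distillation agent reinforcement learning method that aims to improve RL performance in environments such as ALFWorld and WebShop. The repository provides a complete installation guide and experiment scripts, for researchers who want to reproduce the paper's results or develop related algorithms.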

Tech stack

agent_system/environments/env_package/webshop/webshop/baseline_models python

Dependencies (5)

accelerate datasets faiss-gpu transformers wandb

agent_system/environments/env_package/webshop/webshop python

Testing

pytest

Networking

Requests

Dependencies (25)

Flask NumPy Pandas PyYAML Pydantic Werkzeug beautifulsoup4 cleantext env gdown gradio gym peft pyserini rank_bm25 requests_mock rich scikit_learn selenium spacy thefuzz torch tqdm train transformers

Root directory python

Framework

FastAPI

Dependencies (49)

NumPy Pandas _libgcc_mutex 0.1=conda_forge _openmp_mutex 4.5=2_gnu accelerate bzip2 1.0.8=hd590300_5 ca-certificates 2025.1.31=hbcca054_0 codetiming datasets dill flash-attn hydra-core icu 73.2=h59595ed_0 ld_impl_linux-64 2.40=h41732ed_0 libexpat 2.6.2=h59595ed_0 libffi 3.4.2=h7f98852_5 libgcc-ng 13.2.0=h807b86a_5 libgomp 13.2.0=h807b86a_5 libnsl 2.0.1=hd590300_0 libsqlite 3.45.3=h2797004_0 libstdcxx-ng 13.2.0=h7e041cc_5 libuuid 2.38.1=h0b41bf4_0 libuv 1.46.0=hd590300_0 libzlib 1.2.13=hd590300_5 liger-kernel ncurses 6.4.20240210=h59595ed_0 nodejs 16.20.2=h1990674_2 openssl 3.2.1=hd590300_1 packaging peft pip 24.0=pyhd8ed1ab_0 pre-commit pyarrow pybind11 pylatexenc python 3.12.0=hab00c5b_0_cpython qwen-vl-utils ray readline 8.2=h8228510_1 tensordict tk 8.6.13=noxft_h4845f30_101 torchdata transformers tzdata 2024a=h0c530f3_0 uvicorn wandb wheel 0.43.0=pyhd8ed1ab_1 xz 5.2.6=h166bdaf_0 zlib 1.2.13=hd590300_5
