ZJU-REAL/SDAR
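SDAR is the official code for a self-distillation agentic reinforcement learning method, aimed at improving RL performance in environments such as ALFWorld and WebShop. The repository provides a complete installation guide and experiment scripts, and is suited to researchers reproducing the paper's results or developing related algorithms.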
Tech stack
agent_system/environments/env_package/webshop/webshop/baseline_models python
Dependencies
accelerate
datasets
faiss-gpu
transformers
wandb
agent_system/environments/env_package/webshop/webshop python
Testing
pytest
Networking
Requests
Dependencies
Flask
NumPy
Pandas
PyYAML
Pydantic
Werkzeug
beautifulsoup4
cleantext
env
gdown
gradio
gym
peft
pyserini
rank_bm25
requests_mock
rich
scikit_learn
selenium
spacy
thefuzz
torch
tqdm
train
transformers
Root directory python
Frameworks
FastAPI
Dependencies
NumPy
Pandas
_libgcc_mutex 0.1=conda_forge
_openmp_mutex 4.5=2_gnu
accelerate
bzip2 1.0.8=hd590300_5
ca-certificates 2025.1.31=hbcca054_0
codetiming
datasets
dill
flash-attn
hydra-core
icu 73.2=h59595ed_0
ld_impl_linux-64 2.40=h41732ed_0
libexpat 2.6.2=h59595ed_0
libffi 3.4.2=h7f98852_5
libgcc-ng 13.2.0=h807b86a_5
libgomp 13.2.0=h807b86a_5
libnsl 2.0.1=hd590300_0
libsqlite 3.45.3=h2797004_0
libstdcxx-ng 13.2.0=h7e041cc_5
libuuid 2.38.1=h0b41bf4_0
libuv 1.46.0=hd590300_0
libzlib 1.2.13=hd590300_5
liger-kernel
ncurses 6.4.20240210=h59595ed_0
nodejs 16.20.2=h1990674_2
openssl 3.2.1=hd590300_1
packaging
peft
pip 24.0=pyhd8ed1ab_0
pre-commit
pyarrow
pybind11
pylatexenc
python 3.12.0=hab00c5b_0_cpython
qwen-vl-utils
ray
readline 8.2=h8228510_1
tensordict
tk 8.6.13=noxft_h4845f30_101
torchdata
transformers
tzdata 2024a=h0c530f3_0
uvicorn
wandb
wheel 0.43.0=pyhd8ed1ab_1
xz 5.2.6=h166bdaf_0
zlib 1.2.13=hd590300_5
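
When reproducing the environment, it can help to check which of the listed Python packages are actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of the SDAR repository, and the package names passed to it are taken from the lists above.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration; not part of the SDAR repository.
    """
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # A few names from the dependency lists above.
    for pkg, ver in installed_versions(
        ["accelerate", "datasets", "transformers", "wandb"]
    ).items():
        print(f"{pkg}: {ver if ver is not None else 'not installed'}")
```

The lookup is by distribution name rather than import name, so it does not import the packages themselves.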