使用 gpt-oss 和 Hugging Face Transformers 进行微调
Authored by: Edward Beeching, Quentin Gallouédec, and Lewis Tunstall
作者:Edward Beeching、Quentin Gallouédec 和 Lewis Tunstall
Large reasoning models like OpenAI o3 generate a chain-of-thought to improve the accuracy and quality of their responses. However, most of these models reason in English, even when a question is asked in another language.
像 OpenAI o3 这样的大型推理模型会生成思维链,以提高其回答的准确性和质量。然而,这些模型中的大多数都用英语进行推理,即使问题是用另一种语言提出的。
In this notebook, we show how OpenAI's open-weight reasoning model OpenAI gpt-oss-20b can be fine-tuned to reason effectively in multiple languages. We'll do this by adding a new "reasoning language" option to the model's system prompt, and applying supervised fine-tuning with Hugging Face's TRL library on a multilingual reasoning dataset.
在本笔记本中,我们展示了如何对 OpenAI 的开源推理模型 OpenAI gpt-oss-20b 进行微调,使其能够在多种语言中有效推理。我们将通过在模型的系统提示中添加新的 “推理语言” 选项,并使用 Hugging Face 的 TRL 库 在多语言推理数据集上进行 监督微调 来实现这一目标。
We'll cover the following steps:
我们将涵盖以下步骤:
- Setup: Install the required libraries.
- 设置: 安装所需的库。
- Prepare the dataset: Download and format the dataset for fine-tuning.
- 准备数据集:下载并格式化数据集,以便进行微调。
- Prepare the model: Loading the base model and configure it for fine-tuning LoRA, a memory efficient technique.
- 准备模型: 加载基础模型并配置为使用 LoRA 进行微调,这是一种节省内存的技术。
- Fine-tuning: Train the model with our multilingual reasoning data.
- 微调: 使用我们的多语言推理数据训练模型。
- Inference: Generate reasoning responses in different languages using the fine-tuned model.
- 推理:使用微调后的模型,以多种语言生成推理回复。
The end result is a multilingual reasoning model that can generate a chain-of-thought in English, Spanish, French, Italian, or German. You can even mix languages—for example, ask a question in Spanish, request reasoning in German, and receive the final response in Spanish:
最终得到的是一个多语言推理模型,它可以用英语、西班牙语、法语、意大利语或德语生成思维链。你甚至可以混合语言——例如,用西班牙语提问,要求用德语推理,最终用西班牙语获得回答:
User:
¿Cuál es el capital de Australia?
Assistant reasoning:
Okay, der Benutzer fragt nach der Hauptstadt Australiens. Ich erinnere mich, dass Canberra die Hauptstadt ist. Ich
sollte das bestätigen. Lass mich sehen...