使用 gpt-oss 和 Hugging Face Transformers 进行微调

Authored by: Edward Beeching, Quentin Gallouédec, and Lewis Tunstall

作者:Edward BeechingQuentin GallouédecLewis Tunstall

Large reasoning models like OpenAI o3 generate a chain-of-thought to improve the accuracy and quality of their responses. However, most of these models reason in English, even when a question is asked in another language.

OpenAI o3 这样的大型推理模型会生成思维链,以提高其回答的准确性和质量。然而,这些模型中的大多数都用英语进行推理,即使问题是用另一种语言提出的。

In this notebook, we show how OpenAI's open-weight reasoning model OpenAI gpt-oss-20b can be fine-tuned to reason effectively in multiple languages. We'll do this by adding a new "reasoning language" option to the model's system prompt, and applying supervised fine-tuning with Hugging Face's TRL library on a multilingual reasoning dataset.

在本笔记本中,我们展示了如何对 OpenAI 的开源推理模型 OpenAI gpt-oss-20b 进行微调,使其能够在多种语言中有效推理。我们将通过在模型的系统提示中添加新的 “推理语言” 选项,并使用 Hugging Face 的 TRL 库 在多语言推理数据集上进行 监督微调 来实现这一目标。

We'll cover the following steps:

我们将涵盖以下步骤:

  1. Setup: Install the required libraries.
  2. 设置: 安装所需的库。
  3. Prepare the dataset: Download and format the dataset for fine-tuning.
  4. 准备数据集:下载并格式化数据集,以便进行微调。
  5. Prepare the model: Loading the base model and configure it for fine-tuning LoRA, a memory efficient technique.
  6. 准备模型: 加载基础模型并配置为使用 LoRA 进行微调,这是一种节省内存的技术。
  7. Fine-tuning: Train the model with our multilingual reasoning data.
  8. 微调: 使用我们的多语言推理数据训练模型。
  9. Inference: Generate reasoning responses in different languages using the fine-tuned model.
  10. 推理:使用微调后的模型,以多种语言生成推理回复。

The end result is a multilingual reasoning model that can generate a chain-of-thought in English, Spanish, French, Italian, or German. You can even mix languages—for example, ask a question in Spanish, request reasoning in German, and receive the final response in Spanish:

最终得到的是一个多语言推理模型,它可以用英语、西班牙语、法语、意大利语或德语生成思维链。你甚至可以混合语言——例如,用西班牙语提问,要求用德语推理,最终用西班牙语获得回答:

User:
 ¿Cuál es el capital de Australia?
Assistant reasoning:
 Okay, der Benutzer fragt nach der Hauptstadt Australiens. Ich erinnere mich, dass Canberra die Hauptstadt ist. Ich
 sollte das bestätigen. Lass mich sehen...
开通本站会员,查看完整译文。

Home - Wiki
Copyright © 2011-2025 iteam. Current version is 2.144.3. UTC+08:00, 2025-08-08 22:03
浙ICP备14020137号-1 $Map of visitor$