Deep Dive into LLMs like ChatGPT, by Andrej Karpathy (TL;DR)
A few days ago, Andrej Karpathy released a video titled "Deep dive into LLMs like ChatGPT." It’s a goldmine of information, but it’s also 3 hours and 31 minutes long. I watched the whole thing and took a bunch of notes, so I figured why not put together a TL;DR version for anyone who wants the essential takeaways without the large time commitment.
If any of this sounds like you, this post (and the original video) is worth checking out:
- You want to understand how LLMs actually work, not just at the surface level.
- You want to understand confusing fine-tuning terms like `chat_template` and ChatML (especially if you're using Axolotl).
- You want to get better at prompt engineering by understanding why some prompts work better than others.
- You’re trying to reduce hallucinations and want to know how to keep LLMs from making things up.
- You want to understand why DeepSeek-R1 is such a big deal right now.
I won’t be covering everything in the video, so if you have time, definitely watch the whole thing. But if you don’t, this post will give you the key takeaways.
Note: If you are looking for the excalidraw diagram that Andrej made for the video, you can download it here. He shared it through Google Drive, but that link expires after a certain time, which is why I've decided to host it on my CDN as well.
Pretraining Data
Internet
LLMs start by crawling the internet to build a massive text dataset. The problem? Raw data is noisy and full of duplicate content, low-quality text, and irrelevant information. Before training, it needs to be cleaned and filtered.
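To make the cleaning step concrete, here is a toy sketch of a filtering pass over raw documents. The function name, thresholds, and heuristics are all illustrative assumptions on my part — real pretraining pipelines use far more sophisticated quality filters and fuzzy deduplication — but it shows the basic idea of dropping low-quality and duplicate text:

```python
import hashlib

def clean_corpus(documents, min_length=200):
    """Toy cleaning pass: drop very short documents and exact duplicates.

    Real pipelines layer many more heuristics (language ID, quality
    classifiers, near-duplicate detection) on top of steps like these.
    """
    seen = set()
    kept = []
    for doc in documents:
        text = doc.strip()
        # Heuristic: very short documents are usually low-quality noise.
        if len(text) < min_length:
            continue
        # Exact deduplication via a content hash.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(text)
    return kept

# Example: two identical long docs and one short fragment -> one survivor.
corpus = ["An example article. " * 20, "An example article. " * 20, "too short"]
print(len(clean_corpus(corpus)))  # → 1
```

The hash-based exact dedup is the simplest possible choice; production systems typically also catch near-duplicates (e.g. with MinHash) since scraped pages rarely repeat byte-for-byte.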