The GPT-3 Architecture, on a Napkin

There are so many brilliant posts on GPT-3, demonstrating what it can do, pondering its consequences, and visualizing how it works. Even with all of these out there, it still took a crawl through several papers and blogs before I was confident that I had grasped the architecture.

So the goal for this page is humble but simple: to help others build as detailed an understanding of the GPT-3 architecture as possible.

Or if you're impatient, jump straight to the full-architecture sketch.

Original Diagrams

As a starting point, the original transformer and GPT papers[1][2][3] provide us with the following diagrams:

Not bad as far as diagrams go, but if you're like me, not enough to understand the full picture. So let's dig in!

In / Out

Before we can understand anything else, we need to know: what are the inputs and outputs of GPT?

The input is a sequence of N words (a.k.a. tokens). The output is a guess for the word most likely to be put at the end of the input sequence.
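
As a rough sketch of that interface, the snippet below maps a sequence of words to a single guessed word. The function name `predict_next_word` and the scores in `TOY_SCORES` are made up for illustration; a real GPT computes such scores with its network, which is not modeled here.

```python
from typing import Dict, List, Tuple

# Hypothetical, hand-written scores standing in for a model's output.
# Keys are input sequences; values map candidate next words to a score.
TOY_SCORES: Dict[Tuple[str, ...], Dict[str, float]] = {
    ("Not", "all", "heroes", "wear"): {"capes": 0.7, "masks": 0.2, "hats": 0.1},
}


def predict_next_word(words: List[str]) -> str:
    """Return the highest-scoring candidate for the word that follows `words`."""
    candidates = TOY_SCORES[tuple(words)]
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    print(predict_next_word("Not all heroes wear".split()))
```

The only point here is the shape of the call: N words go in, one word comes out.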

That's it! All the impressive GPT dialogues, stories and examples you see posted around are made with this simple input-output scheme: give it an input sequence – get the next word.

Not all heroes wear -> capes

Of course, we often want to get more than one word, but that's not a problem: after we get the next word, we add it to the sequence, and get the following word.

Not all heroes wear capes -> but
Not all heroes wear capes but -> all
Not all heroes wear capes but all -> villains
Not all heroes wear capes but all villains -> do
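
The feed-the-output-back-in loop can be sketched in a few lines. This is only an illustration: `toy_next_word` is a hypothetical stand-in that reads the next word off one fixed sentence, where a real model would compute its guess from the whole input sequence.

```python
from typing import Callable, List

# Fixed sentence used by the toy stand-in below (an assumption for this sketch).
TARGET: List[str] = "Not all heroes wear capes but all villains do".split()


def toy_next_word(words: List[str]) -> str:
    """Stand-in for the model: return the word that follows `words` in TARGET."""
    return TARGET[len(words)]


def generate(words: List[str], next_word: Callable[[List[str]], str], steps: int) -> List[str]:
    """Repeatedly predict one word and append it to the sequence."""
    words = list(words)  # copy so the caller's list is not modified
    for _ in range(steps):
        words.append(next_word(words))
    return words


if __name__ == "__main__":
    print(" ".join(generate("Not all heroes wear".split(), toy_next_word, 5)))
```

Each pass through the loop corresponds to one `->` line above: the sequence grows by exactly one word per step.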

repeat as much as ...