How ChatGPT Works: A Deep Dive

How does a large language model like ChatGPT actually work? Well, these models are both amazingly simple and exceedingly complex at the same time. Hold on to your butts, this is a deep dive ↓

You can think of a model as calculating the probabilities of an output based on some input. In language models, this means that given a sequence of words, they calculate the probabilities for the next word in the sequence. Like a fancy autocomplete.
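To make the "fancy autocomplete" idea concrete, here is a minimal sketch that estimates next-word probabilities by counting which word follows which in a tiny made-up corpus. This is only a stand-in: a model like ChatGPT gets its probabilities from a neural network, not from counting. The function name `next_word_probabilities` and the sample sentence are assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def next_word_probabilities(corpus, context_word):
    """Estimate P(next word | previous word) from raw word-pair counts."""
    follow_counts = defaultdict(Counter)
    words = corpus.lower().split()
    # Count every (previous word, following word) pair in the corpus.
    for previous, following in zip(words, words[1:]):
        follow_counts[previous][following] += 1
    counts = follow_counts[context_word]
    total = sum(counts.values())
    if total == 0:
        return {}  # the context word was never followed by anything
    # Turn the counts into probabilities that sum to 1.
    return {word: count / total for word, count in counts.items()}


# Hypothetical toy corpus, purely for illustration.
corpus = "the cat sat on the mat and the cat slept"
probabilities = next_word_probabilities(corpus, "the")
print(probabilities)
```

Given "the" as the input, the sketch ranks "cat" as more likely than "mat", because "cat" follows "the" more often in the sample text.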

To understand where these probabilities come from, we need to talk about something called a neural network. This is a network-like structure where numbers are fed into one side and probabilities are spat out the other. Neural networks are simpler than you might think.

Imagine we wanted to train a computer to solve the simple problem of recognising symbols on a 3x3 pixel display. We would need a neural net like this:
- an input layer
- two hidden layers
- an output layer

Our input layer consists of 9 nodes called neurons - one for each pixel. Each neuron would hold a number from 1 (white) to -1 (black). Our output layer consists of 4 neurons, one for each of the possible symbols. Their value will eventually be a probability between 0 and 1.

In between these, we have rows of neurons, called "hidden" layers. For our simple use case we only need two. Each neuron is connected to the neurons in the adjacent layers by a weight, which can have a value between -1 and 1.
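One way to picture the wiring is as a list of weights per layer, one weight for every connection between neighbouring layers. The sketch below assumes each hidden layer has 6 neurons (the original does not give a width) and fills the weights with random values between -1 and 1, as an untrained network might start out. The names `random_weights` and `LAYER_SIZES` are made up for this example.

```python
import random


def random_weights(n_inputs, n_outputs, rng):
    """Build one weight per connection, each drawn uniformly from [-1, 1]."""
    # weights[j][i] is the weight from neuron i in one layer to neuron j in the next.
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
            for _ in range(n_outputs)]


# 9 inputs, two hidden layers, 4 outputs. The hidden width of 6 is an assumption.
LAYER_SIZES = [9, 6, 6, 4]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
weights = [random_weights(n_in, n_out, rng)
           for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:])]

# Every neuron connects to every neuron in the next layer: 9*6 + 6*6 + 6*4 weights.
connection_count = sum(len(layer) * len(layer[0]) for layer in weights)
print(connection_count)
```

Even this toy network has over a hundred weights to tune, which hints at why much larger models need so much training.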

When a value is passed from an input neuron to the next layer, it's multiplied by the weight of that connection. The receiving neuron then simply adds up all the values it receives, squashes the value...