## ChatGPT是如何工作的：深入探讨

How does a Large Language Model like ChatGPT actually work? Well, they are both amazingly simple and exceedingly complex at the same time. Hold on to your butts, this is a deep dive ↓

You can think of a model as calculating the probabilities of an output based on some input. In language models, this means that given a sequences of words they calculate the probabilities for the next word in the sequence. Like a fancy autocomplete.

To understand where these probabilities come from, we need to talk about something called a neural network. This is a network like structure where numbers are fed into one side and probabilities are spat out the other. They are simpler than you might think.

Imagine we wanted to train a computer to solve the simple problem of recognising symbols on a 3x3 pixel display. We would need a neural net like this: - an input layer - two hidden layers - an output layer

Our input layer consists of 9 nodes called neurons - one for each pixel. Each neuron would hold a number from 1 (white) to -1 (black). Our output layer consists of 4 neurons, one for each of the possible symbols. Their value will eventually be a probability between 0 and 1.

In between these, we have rows of neurons, called "hidden" layers. For our simple use case we only need two. Each neuron is connected to the neurons in the adjacent layers by a weight, which can have a value between -1 and 1.

When a value is passed from the input neuron to the next layer its multiplied by the weight. That neuron then simply adds up all the values it receives, squashes the value...