The Random Transformer

Understand how transformers work by demystifying all the math behind them

In this blog post, we’ll work through an end-to-end example of the math within a transformer model. The goal is to get a good understanding of how the model works. To make this manageable, we’ll make lots of simplifications. As we’ll be doing quite a bit of the math by hand, we’ll reduce the dimensions of the model. For example, rather than using embeddings of 512 values, we’ll use embeddings of 4 values. This will make the math easier to follow! We’ll use random vectors and matrices, but you can use your own values if you want to follow along.

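For example, a minimal sketch of this setup in NumPy might look like the block below (the seed, the tiny two-token vocabulary, and the variable names are illustrative choices of mine, not values from the walkthrough):

```python
import numpy as np

np.random.seed(42)  # fix the seed so the "random" values are reproducible if you follow along

d_model = 4                 # toy model dimension, instead of the usual 512
vocab = ["Hello", "World"]  # hypothetical two-token vocabulary, just for illustration

# One random vector per token stands in for a learned embedding matrix:
# rows = tokens, columns = embedding dimensions.
embeddings = np.random.randn(len(vocab), d_model)
print(embeddings.shape)  # (2, 4)
print(embeddings)
```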

As you’ll see, the math is not that complicated. The complexity comes from the number of steps and the number of parameters. I recommend reading The Illustrated Transformer blog post before reading this one (or reading them in parallel). It’s a great post that explains the transformer model in a very intuitive (and illustrated!) way, and I don’t intend to re-explain what’s already explained there. My goal is to explain the “how” of the transformer model, not the “what”. If you want to dive even deeper, check out the famous original paper: Attention is all you need.

Prerequisites

A basic understanding of linear algebra is required - we’ll mostly do simple matrix multiplications, so there’s no need to be an expert. Apart from that, a basic understanding of Machine Learning and Deep Learning will be useful.

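As a quick refresher, the kind of matrix multiplication we’ll rely on looks like this (the matrices are arbitrary values I picked for illustration):

```python
import numpy as np

# Multiplying a (2x3) matrix by a (3x2) matrix yields a (2x2) matrix:
# each output entry is the dot product of a row of A with a column of B.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

print(A @ B)
# [[ 4.  5.]
#  [10. 11.]]
```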

What is covered here?

  • An end-to-end example of the math within a transformer model during inference