How does Stable Diffusion work?

Stable Diffusion is a latent diffusion model that generates images from text. Instead of operating in the high-dimensional image space, it first compresses the image into a lower-dimensional latent space and runs the diffusion process there.
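
To get a feel for how much smaller the latent space is, here is a minimal arithmetic sketch. The shapes are assumptions taken from Stable Diffusion v1 at 512×512 resolution (a 3×512×512 image and a 4×64×64 latent); other model versions and resolutions use different sizes. The helper name `num_values` is made up for this illustration.

```python
# Assumed shapes for Stable Diffusion v1 at 512x512; other versions differ.
IMAGE_SHAPE = (3, 512, 512)   # channels, height, width in pixel space
LATENT_SHAPE = (4, 64, 64)    # channels, height, width in latent space


def num_values(shape):
    """Return the number of scalar values in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


image_values = num_values(IMAGE_SHAPE)
latent_values = num_values(LATENT_SHAPE)
compression_ratio = image_values / latent_values

print(f"pixel space:  {image_values} values")
print(f"latent space: {latent_values} values")
print(f"the latent is {compression_ratio:.0f}x smaller")
```

Under these assumed shapes, the model handles far fewer numbers per image than it would in pixel space, which is why working in the latent space is cheaper.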

This article digs into how it works under the hood.

Why do you need to know? Apart from being a fascinating subject in its own right, some understanding of the inner mechanics will make you a better artist. You can use the tool correctly to achieve results with higher precision.

How does text-to-image differ from image-to-image? What’s the CFG scale? What’s denoising strength? You will find the answer in this article.

Let’s dive in.

In its simplest form, Stable Diffusion is a text-to-image model. Give it a text prompt, and it returns an image matching the text.

Stable Diffusion turns text prompts into images.

Stable Diffusion belongs to a class of deep learning models called diffusion models. They are generative models, meaning they are designed to generate new data similar to what they have seen in training. In the case of Stable Diffusion, the data are images.

Why is it called a diffusion model? Because its math looks very much like diffusion in physics. Let’s go through the idea.
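
As a rough numerical picture of the physical analogy, the sketch below starts with points packed into two tight clusters and repeatedly nudges each one with a little random noise, much like ink spreading in water. The update rule is an assumption for illustration: a variance-preserving noising step of the kind used in DDPM-style diffusion models. The constants (`BETA`, the cluster centers, the step count) and the helper names are hypothetical and are not taken from Stable Diffusion itself.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

NUM_SAMPLES = 1000
NUM_STEPS = 300
BETA = 0.02  # hypothetical per-step noise level, chosen only for illustration


def noising_step(x, beta):
    """Shrink the sample slightly and add a small amount of Gaussian noise."""
    return math.sqrt(1.0 - beta) * x + math.sqrt(beta) * random.gauss(0.0, 1.0)


def fraction_near_zero(samples, radius=1.0):
    """Fraction of samples whose absolute value is below `radius`."""
    return sum(abs(x) < radius for x in samples) / len(samples)


# Start with two tight clusters, one around -2 and one around +2.
start = [random.gauss(random.choice([-2.0, 2.0]), 0.1) for _ in range(NUM_SAMPLES)]

# Apply the noising step repeatedly to every sample.
end = list(start)
for _ in range(NUM_STEPS):
    end = [noising_step(x, BETA) for x in end]

mean_end = sum(end) / len(end)
var_end = sum((x - mean_end) ** 2 for x in end) / len(end)

print(f"near zero before: {fraction_near_zero(start):.2f}")
print(f"near zero after:  {fraction_near_zero(end):.2f}")
print(f"final mean {mean_end:.2f}, final variance {var_end:.2f}")
```

After enough steps the two clusters are no longer distinguishable: the points have spread into a single blob centered near zero, and where a point ends up says almost nothing about which cluster it started in.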

Let’s say I trained a diffusion model with only two kinds of images: cats and dogs. In the figure below, the two peaks on the left represent the groups of cat and dog images.

假设我只用两类图像训练 diffusion...