How to Design Large-Scale AI Systems

[Fareed Khan](https://medium.com/@fareedkhandev?source=post_page---byline--6cf6831990e1---------------------------------------)

It is one thing to train a machine learning model and perhaps achieve state-of-the-art accuracy on a benchmark dataset. Deploying that model so that it serves millions of users, processes terabytes of data, and runs reliably 24/7 is a very different challenge.

From the start, every stage of training and deploying a machine learning model requires careful planning and the right tools.

Building and running an AI system, from early development to full deployment, is where strong software development skills become important, and it is an area where many AI engineers fall short.

In this blog, we will explore each development stage required to build a large-scale AI system capable of creating LLMs, multimodal models, and various other AI products. We will also look at how the stages relate to one another and what each one is responsible for.

Special thanks to … from Meta for the guidance provided in his GitHub repo.