Ten Architecture Lessons from the Evolution of AI Platforms
1. Ten Architecture Lessons
from the Evolution of AI Platforms
Google Cloud / Wang Shun, AI/ML Specialist
2.
3. Wang Shun
Google Cloud AI/ML Specialist
Joined Google in July 2018
Helps customers with AI/ML training and inference
Tagline: "…'s AI technologies to work"
4.
5. 1. What changes and what doesn't:
training and inference remain AI's two core tasks
6.
7. AI Accelerators
8. 2. Two become one:
AutoML and the custom-training SDK are unified
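To make "two become one" concrete: in the google-cloud-aiplatform (Vertex AI) Python SDK, an AutoML training job and a custom training job follow the same create-then-run pattern. This is a hedged sketch, not code from the talk; the project, dataset, script, and container values are placeholders, and parameter names can differ between SDK versions.

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")        # placeholders
ds = aiplatform.TabularDataset.create(
    display_name="sales", gcs_source="gs://my-bucket/sales.csv")      # placeholder data

# AutoML: the service searches model architectures and hyperparameters.
automl_job = aiplatform.AutoMLTabularTrainingJob(
    display_name="automl-sales",
    optimization_prediction_type="classification",
)
automl_model = automl_job.run(dataset=ds, target_column="churned")

# Custom training: bring your own script, but the job/run pattern is the same.
custom_job = aiplatform.CustomTrainingJob(
    display_name="custom-sales",
    script_path="train.py",                                           # placeholder script
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest",
)
custom_model = custom_job.run(dataset=ds, replica_count=1, machine_type="n1-standard-4")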
9.
10.
11. 3. Agile development (CI/CT/CD/CM):
continuous integration / training / deployment / monitoring
12. MLOps workflow diagram: ML Development (Code & Config) → Training Operationalization (Training Pipeline) → Continuous Training → Data & Model Management (Registered Model) → Model Deployment (Serving Package) → Prediction Serving (Serving Logs) → Continuous Monitoring
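The CI/CT/CD/CM loop above maps naturally onto a pipeline definition. Below is a minimal sketch, assuming the Kubeflow Pipelines (kfp) v2 SDK, whose compiled pipelines Vertex AI Pipelines can run; the component bodies, bucket paths, and quality threshold are placeholders rather than values from the talk.

from kfp import compiler, dsl

@dsl.component
def train(train_data: str) -> str:
    # Stand-in for real training; returns a (hypothetical) model URI.
    return "gs://my-bucket/model"

@dsl.component
def evaluate(model_uri: str) -> float:
    # Stand-in for evaluation; returns a validation metric.
    return 0.9

@dsl.component
def deploy(model_uri: str):
    # Stand-in for pushing the registered model to a serving endpoint.
    print("deploying", model_uri)

@dsl.pipeline(name="continuous-training")
def continuous_training(train_data: str = "gs://my-bucket/train.csv"):
    trained = train(train_data=train_data)
    metric = evaluate(model_uri=trained.output)
    # Continuous training with a gate: only deploy when the metric clears a bar.
    with dsl.Condition(metric.output > 0.8):
        deploy(model_uri=trained.output)

compiler.Compiler().compile(
    pipeline_func=continuous_training, package_path="continuous_training.json")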
13.
14. 4. User-driven:
managed ScaNN meets enterprise customers' needs
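Managed ScaNN wraps the open-source ScaNN library for approximate nearest-neighbor search. Below is a minimal sketch based on ScaNN's published Python builder API; the corpus is random data and the tree/score_ah/reorder settings are illustrative, not tuned values.

import numpy as np
import scann  # pip install scann

# Toy corpus of 10k unit-normalized 128-d embeddings (illustrative sizes).
db = np.random.randn(10000, 128).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)

# Partition the corpus into leaves, score candidates with asymmetric hashing,
# then exactly rerank the top candidates.
searcher = (
    scann.scann_ops_pybind.builder(db, 10, "dot_product")
    .tree(num_leaves=1000, num_leaves_to_search=100, training_sample_size=10000)
    .score_ah(2, anisotropic_quantization_threshold=0.2)
    .reorder(100)
    .build()
)

queries = np.random.randn(5, 128).astype(np.float32)
queries /= np.linalg.norm(queries, axis=1, keepdims=True)
neighbors, distances = searcher.search_batched(queries)  # top-10 ids and scores per query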
15.
16.
17. 5. All rivers run to the sea:
PyTorch and TensorFlow are treated with equal priority
18.
19. PyTorch on Google Cloud
2018: official PyTorch support in Deep Learning VM (DLVM) images
2020: PyTorch/XLA support on Cloud TPU
2021: Vertex AI officially provides prebuilt container options with PyTorch preinstalled
20.
21. 6. Best in class:
NAS searches out SOTA network architectures
22. https://paperswithcode.com/sota/image-classification-on-imagenet
23.
24. Image recognition
25.
26.
27. pyglove
Open sourced: https://github.com/google/pyglove
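PyGlove expresses NAS search spaces symbolically; the sketch below deliberately avoids PyGlove's API and uses plain-Python random search over a toy architecture space, just to make the sample-then-score structure of NAS concrete. score() stands in for training a candidate and measuring validation accuracy.

import random

# Hypothetical, tiny search space: depth, width, and activation of a network.
SEARCH_SPACE = {
    "num_layers": [2, 4, 8, 16],
    "width": [64, 128, 256, 512],
    "activation": ["relu", "swish", "gelu"],
}

def sample_architecture(rng):
    # Draw one candidate architecture from the search space.
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def score(arch):
    # Stand-in for "train the candidate and return validation accuracy".
    return random.Random(str(sorted(arch.items()))).random()

def random_search(trials=50, seed=0):
    rng = random.Random(seed)
    best = max((sample_architecture(rng) for _ in range(trials)), key=score)
    return best, score(best)

best_arch, best_score = random_search()
print(best_arch, round(best_score, 3))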
28. 7. Standing out:
Reduction Server improves distributed training efficiency
29. Ring All-Reduce
Diagram: workers arranged in a ring exchange gradient chunks (ring all-reduce).
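A small single-process simulation of ring all-reduce (reduce-scatter followed by all-gather) makes the communication pattern concrete; this is an illustration, not a distributed implementation.

import numpy as np

def ring_allreduce(shards):
    # Simulate ring all-reduce over n workers.
    # shards: list of n equal-length 1-D gradient arrays, one per worker.
    # Returns n arrays; afterwards every worker holds the elementwise sum.
    n = len(shards)
    chunks = [list(np.array_split(s.astype(np.float64), n)) for s in shards]

    # Phase 1: reduce-scatter. In step t, worker i sends chunk (i - t) % n to
    # worker (i + 1) % n, which accumulates it. After n-1 steps, worker i owns
    # the fully reduced chunk (i + 1) % n.
    for t in range(n - 1):
        for i in range(n):
            c = (i - t) % n
            chunks[(i + 1) % n][c] = chunks[(i + 1) % n][c] + chunks[i][c]

    # Phase 2: all-gather. In step t, worker i forwards its completed chunk
    # (i + 1 - t) % n to worker (i + 1) % n, which overwrites its stale copy.
    for t in range(n - 1):
        for i in range(n):
            c = (i + 1 - t) % n
            chunks[(i + 1) % n][c] = chunks[i][c]

    return [np.concatenate(c) for c in chunks]

# Quick check against a direct elementwise sum.
grads = [np.random.randn(8) for _ in range(4)]
out = ring_allreduce(grads)
assert all(np.allclose(o, sum(grads)) for o in out)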
30. GPUs on GCP
31. Parameter Server
● Heterogeneous
● GPU workers + CPU servers
● Push gradients
● Pull parameters
32. Revisiting the Parameter Server architecture:
each worker transfers only the same amount of data over the network, regardless of the number of workers.
Reduction Servers: high-bandwidth, low-cost CPU-only VMs dedicated to gradient reduction.
Higher perf/TCO: trading extra CPU cost for higher performance.
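A back-of-envelope comparison under the standard all-reduce cost model illustrates the claim: with ring all-reduce each worker sends about 2(N-1)/N times the gradient size, while with reduction servers each worker pushes its gradient once and pulls the reduced result once, independent of N. The gradient size below is illustrative.

def ring_allreduce_bytes_sent_per_worker(grad_bytes, n_workers):
    # Reduce-scatter + all-gather: each worker sends 2 * (n-1)/n of the gradient.
    return 2 * (n_workers - 1) / n_workers * grad_bytes

def reduction_server_bytes_sent_per_worker(grad_bytes, n_workers):
    # Each worker pushes its full gradient once (and pulls the reduced result once),
    # independent of the number of workers.
    return grad_bytes

G = 1.3e9  # roughly BERT-large gradients in fp32, in bytes (illustrative)
for n in (8, 32, 64):
    ratio = (ring_allreduce_bytes_sent_per_worker(G, n)
             / reduction_server_bytes_sent_per_worker(G, n))
    print(f"{n} workers: ring all-reduce sends {ratio:.2f}x more data per worker")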
33. Adding 20 reduction-server nodes increased training throughput by 75% and reduced the cost per step, despite the additional nodes.
Benchmark: TensorFlow Model Garden, BERT-large MNLI fine-tuning.
Workers: a2-highgpu-8g (NVIDIA A100) x 8; Reducers: n1-highcpu-16 x 20.
https://cloud.google.com/blog/topics/developers-practitioners/optimize-training-performance-reduction-server-vertex-ai
34. 8. Versatile on all fronts:
customers such as Twitter and Spotify span many industries
35. source link: https://www.youtube.com/watch?v=N9ufw8uP_8s
36.
37. source: https://engineering.atspotify.com/2022/03/introducing-natural-language-search-for-podcast-episodes/
38. 9. The Nine Swords of Dugu:
covering the full AI/ML lifecycle
39. AI Accelerators
40.
41. Collaboration across all skill levels:
flexible tools for collaboration across all levels of technical expertise
42. 10. Carrying the past into the future:
JAX and Pathways define the next generation of frameworks and platforms
43. What is JAX
import jax.numpy as np
from jax import jit, grad, vmap

def predict(params, inputs):
    # Forward pass of a small MLP; params is a list of (W, b) layer pairs.
    for W, b in params:
        outputs = np.dot(inputs, W) + b
        inputs = np.tanh(outputs)
    return outputs

def loss(params, batch):
    # Sum-of-squares loss over an (inputs, targets) batch.
    inputs, targets = batch
    preds = predict(params, inputs)
    return np.sum((preds - targets) ** 2)

# Compose transformations: a jit-compiled gradient function, and
# per-example gradients by vmapping grad over the batch axis only.
gradient_fun = jit(grad(loss))
perexample_grads = jit(vmap(grad(loss), in_axes=(None, 0)))

JAX is an extensible system for composable function transformations of Python+NumPy code.
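A short usage sketch for the code above, with made-up layer shapes and a toy batch; the transformed functions return gradients in the same pytree structure as params, and the vmapped version adds a leading per-example axis.

import numpy.random as npr

params = [(npr.randn(3, 4), npr.randn(4)),    # layer 1: W (3x4), b (4,)
          (npr.randn(4, 2), npr.randn(2))]    # layer 2: W (4x2), b (2,)
batch = (npr.randn(8, 3), npr.randn(8, 2))    # 8 examples of (inputs, targets)

grads = gradient_fun(params, batch)           # same structure as params
per_ex = perexample_grads(params, batch)      # each leaf gains a leading axis of size 8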
44.
45. Model Parallelism
Diagram: a data-parallel split across gpu:0 to gpu:7 with an All-Reduce gradient update, contrasted with spatial partitioning / model decomposition of a single model across devices.
Model code needs to be model-parallel aware, and these schemes are difficult to implement.
Pro tip #1: scale up before scaling out (A100).
Pro tip #2: use reduced precision (FP16, TF32, BF16).
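In JAX, the data-parallel half of this picture takes only a few lines: pmap runs the training step on every local device and lax.pmean performs the all-reduce shown in the diagram. A minimal sketch with made-up model and batch sizes.

import functools
import jax
import jax.numpy as jnp

def loss(params, x, y):
    # Tiny two-layer model; shapes are illustrative.
    pred = jnp.tanh(x @ params["W"]) @ params["V"]
    return jnp.mean((pred - y) ** 2)

@functools.partial(jax.pmap, axis_name="batch")
def train_step(params, x, y):
    grads = jax.grad(loss)(params, x, y)
    grads = jax.lax.pmean(grads, axis_name="batch")   # all-reduce across devices
    return jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, params, grads)

n_dev = jax.local_device_count()
params = {"W": jnp.zeros((16, 32)), "V": jnp.zeros((32, 4))}
params = jax.tree_util.tree_map(lambda p: jnp.stack([p] * n_dev), params)  # replicate
x = jnp.ones((n_dev, 8, 16))   # one shard of 8 examples per device
y = jnp.ones((n_dev, 8, 4))
params = train_step(params, x, y)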
46. GPipe: Pipeline Parallelism
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism: 1811.06965
https://github.com/tensorflow/lingvo/blob/master/lingvo/core/gpipe.py
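GPipe's core idea is to split each mini-batch into micro-batches and stream them through the pipeline stages. The tiny schedule printer below (illustrative sizes, not the lingvo implementation) shows that stage s processes micro-batch m at step s + m, so the pipeline bubble costs S - 1 idle steps.

S, M = 4, 8   # pipeline stages and micro-batches (hypothetical sizes)
for t in range(S + M - 1):
    active = [(s, t - s) for s in range(S) if 0 <= t - s < M]
    print(f"step {t:2d}: " + ", ".join(f"stage{s}<-mb{m}" for s, m in active))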
47. GShard/GSPMD
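GSPMD-style sharding is what JAX exposes through jax.sharding: you annotate how arrays are laid out over a device mesh, and the XLA compiler partitions the computation and inserts the necessary collectives. A minimal sketch, assuming a recent JAX version; the mesh layout and array sizes are illustrative.

import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec

devices = np.array(jax.devices()).reshape(-1, 1)          # (data, model) mesh
mesh = Mesh(devices, axis_names=("data", "model"))

x = jnp.ones((64, 512))    # activations
W = jnp.ones((512, 256))   # weights

# Shard activations over the "data" axis and weights over the "model" axis.
x = jax.device_put(x, NamedSharding(mesh, PartitionSpec("data", None)))
W = jax.device_put(W, NamedSharding(mesh, PartitionSpec(None, "model")))

@jax.jit
def layer(x, W):
    return jnp.tanh(x @ W)   # the compiler adds the needed collectives

y = layer(x, W)
print(y.sharding)            # inspect the propagated output sharding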
48.
49.
50.
51.