Ten Architecture Lessons from the Evolution of AI Platforms
1. Ten Architecture Lessons
from the Evolution of AI Platforms
Google Cloud / Wang Shun, AI/ML Specialist
2.
3. Wang Shun
Google Cloud AI/ML Specialist
Joined Google in July 2018
Helps customers with AI/ML training and inference
Tagline: "…'s AI technologies to work"
4.
5. 1. What changes and what doesn't:
training and inference remain AI's two core tasks
6.
7. AI Accelerators
8. 2. Two become one:
AutoML and the custom-training SDK are unified
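To make "two become one" concrete: in the google-cloud-aiplatform (Vertex AI) Python SDK, an AutoML training job and a custom training job follow the same create-then-run pattern. This is a hedged sketch, not code from the talk; the project, dataset, script, and container values are placeholders, and parameter names can differ between SDK versions.

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")        # placeholders
ds = aiplatform.TabularDataset.create(
    display_name="sales", gcs_source="gs://my-bucket/sales.csv")      # placeholder data

# AutoML: the service searches model architectures and hyperparameters.
automl_job = aiplatform.AutoMLTabularTrainingJob(
    display_name="automl-sales",
    optimization_prediction_type="classification",
)
automl_model = automl_job.run(dataset=ds, target_column="churned")

# Custom training: bring your own script, but the job/run pattern is the same.
custom_job = aiplatform.CustomTrainingJob(
    display_name="custom-sales",
    script_path="train.py",                                           # placeholder script
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest",
)
custom_model = custom_job.run(dataset=ds, replica_count=1, machine_type="n1-standard-4")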
9.
10.
11. 3. Agile development (CI/CT/CD/CM):
continuous integration / training / deployment / monitoring
12. MLOps workflow diagram: ML Development (Code & Config) → Training Operationalization (Training Pipeline) → Continuous Training → Data & Model Management (Registered Model) → Model Deployment (Serving Package) → Prediction Serving (Serving Logs) → Continuous Monitoring
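The CI/CT/CD/CM loop above maps naturally onto a pipeline definition. Below is a minimal sketch, assuming the Kubeflow Pipelines (kfp) v2 SDK, whose compiled pipelines Vertex AI Pipelines can run; the component bodies, bucket paths, and quality threshold are placeholders rather than values from the talk.

from kfp import compiler, dsl

@dsl.component
def train(train_data: str) -> str:
    # Stand-in for real training; returns a (hypothetical) model URI.
    return "gs://my-bucket/model"

@dsl.component
def evaluate(model_uri: str) -> float:
    # Stand-in for evaluation; returns a validation metric.
    return 0.9

@dsl.component
def deploy(model_uri: str):
    # Stand-in for pushing the registered model to a serving endpoint.
    print("deploying", model_uri)

@dsl.pipeline(name="continuous-training")
def continuous_training(train_data: str = "gs://my-bucket/train.csv"):
    trained = train(train_data=train_data)
    metric = evaluate(model_uri=trained.output)
    # Continuous training with a gate: only deploy when the metric clears a bar.
    with dsl.Condition(metric.output > 0.8):
        deploy(model_uri=trained.output)

compiler.Compiler().compile(
    pipeline_func=continuous_training, package_path="continuous_training.json")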
13.
14. 4. User-driven:
managed ScaNN meets enterprise customers' needs
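Managed ScaNN wraps the open-source ScaNN library for approximate nearest-neighbor search. Below is a minimal sketch based on ScaNN's published Python builder API; the corpus is random data and the tree/score_ah/reorder settings are illustrative, not tuned values.

import numpy as np
import scann  # pip install scann

# Toy corpus of 10k unit-normalized 128-d embeddings (illustrative sizes).
db = np.random.randn(10000, 128).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)

# Partition the corpus into leaves, score candidates with asymmetric hashing,
# then exactly rerank the top candidates.
searcher = (
    scann.scann_ops_pybind.builder(db, 10, "dot_product")
    .tree(num_leaves=1000, num_leaves_to_search=100, training_sample_size=10000)
    .score_ah(2, anisotropic_quantization_threshold=0.2)
    .reorder(100)
    .build()
)

queries = np.random.randn(5, 128).astype(np.float32)
queries /= np.linalg.norm(queries, axis=1, keepdims=True)
neighbors, distances = searcher.search_batched(queries)  # top-10 ids and scores per query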
15.
16.
17. 5. All rivers run to the sea:
PyTorch and TensorFlow are treated with equal priority
18.
19. PyTorch on Google Cloud
2018: official PyTorch support in Deep Learning VM (DLVM) images
2020: PyTorch/XLA support on Cloud TPU
2021: Vertex AI officially provides prebuilt container options with PyTorch preinstalled
20.
21. 6. Best in class:
NAS searches out SOTA network architectures
22. https://paperswithcode.com/sota/image-classification-on-imagenet
23.
24. Image recognition
25.
26.
27. pyglove
Open sourced: https://github.com/google/pyglove
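PyGlove expresses NAS search spaces symbolically; the sketch below deliberately avoids PyGlove's API and uses plain-Python random search over a toy architecture space, just to make the sample-then-score structure of NAS concrete. score() stands in for training a candidate and measuring validation accuracy.

import random

# Hypothetical, tiny search space: depth, width, and activation of a network.
SEARCH_SPACE = {
    "num_layers": [2, 4, 8, 16],
    "width": [64, 128, 256, 512],
    "activation": ["relu", "swish", "gelu"],
}

def sample_architecture(rng):
    # Draw one candidate architecture from the search space.
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def score(arch):
    # Stand-in for "train the candidate and return validation accuracy".
    return random.Random(str(sorted(arch.items()))).random()

def random_search(trials=50, seed=0):
    rng = random.Random(seed)
    best = max((sample_architecture(rng) for _ in range(trials)), key=score)
    return best, score(best)

best_arch, best_score = random_search()
print(best_arch, round(best_score, 3))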
28. 7. Standing out:
Reduction Server improves distributed training efficiency
29. Ring All-Reduce
Diagram: workers arranged in a ring exchange gradient chunks (ring all-reduce).
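A small single-process simulation of ring all-reduce (reduce-scatter followed by all-gather) makes the communication pattern concrete; this is an illustration, not a distributed implementation.

import numpy as np

def ring_allreduce(shards):
    # Simulate ring all-reduce over n workers.
    # shards: list of n equal-length 1-D gradient arrays, one per worker.
    # Returns n arrays; afterwards every worker holds the elementwise sum.
    n = len(shards)
    chunks = [list(np.array_split(s.astype(np.float64), n)) for s in shards]

    # Phase 1: reduce-scatter. In step t, worker i sends chunk (i - t) % n to
    # worker (i + 1) % n, which accumulates it. After n-1 steps, worker i owns
    # the fully reduced chunk (i + 1) % n.
    for t in range(n - 1):
        for i in range(n):
            c = (i - t) % n
            chunks[(i + 1) % n][c] = chunks[(i + 1) % n][c] + chunks[i][c]

    # Phase 2: all-gather. In step t, worker i forwards its completed chunk
    # (i + 1 - t) % n to worker (i + 1) % n, which overwrites its stale copy.
    for t in range(n - 1):
        for i in range(n):
            c = (i + 1 - t) % n
            chunks[(i + 1) % n][c] = chunks[i][c]

    return [np.concatenate(c) for c in chunks]

# Quick check against a direct elementwise sum.
grads = [np.random.randn(8) for _ in range(4)]
out = ring_allreduce(grads)
assert all(np.allclose(o, sum(grads)) for o in out)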
30. GPUs on GCP
31. Parameter Server
● Heterogeneous
● GPU workers + CPU servers
● Push gradients
● Pull parameters
32. Revisiting the Parameter Server architecture:
each worker transfers only the same amount of data over the network, regardless of the number of workers.
Reduction Servers: high-bandwidth, low-cost CPU-only VMs dedicated to gradient reduction.
Higher perf/TCO: trading extra CPU cost for higher performance.
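A back-of-envelope comparison under the standard all-reduce cost model illustrates the claim: with ring all-reduce each worker sends about 2(N-1)/N times the gradient size, while with reduction servers each worker pushes its gradient once and pulls the reduced result once, independent of N. The gradient size below is illustrative.

def ring_allreduce_bytes_sent_per_worker(grad_bytes, n_workers):
    # Reduce-scatter + all-gather: each worker sends 2 * (n-1)/n of the gradient.
    return 2 * (n_workers - 1) / n_workers * grad_bytes

def reduction_server_bytes_sent_per_worker(grad_bytes, n_workers):
    # Each worker pushes its full gradient once (and pulls the reduced result once),
    # independent of the number of workers.
    return grad_bytes

G = 1.3e9  # roughly BERT-large gradients in fp32, in bytes (illustrative)
for n in (8, 32, 64):
    ratio = (ring_allreduce_bytes_sent_per_worker(G, n)
             / reduction_server_bytes_sent_per_worker(G, n))
    print(f"{n} workers: ring all-reduce sends {ratio:.2f}x more data per worker")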
33. Adding 20 reduction-server nodes increased training throughput by 75% and reduced the cost per step, despite the additional nodes.
Benchmark: TensorFlow Model Garden, BERT-large MNLI fine-tuning.
Workers: a2-highgpu-8g (NVIDIA A100) x 8; Reducers: n1-highcpu-16 x 20.
https://cloud.google.com/blog/topics/developers-practitioners/optimize-training-performance-reduction-server-vertex-ai
34. 8. Versatile on all fronts:
customers such as Twitter and Spotify span many industries
35. source link: https://www.youtube.com/watch?v=N9ufw8uP_8s
36.
37. source: https://engineering.atspotify.com/2022/03/introducing-natural-language-search-for-podcast-episodes/
38. 9. The Nine Swords of Dugu:
covering the full AI/ML lifecycle
39. AI Accelerators
40.
41. Collaboration across all skill levels:
flexible tools for collaboration across all levels of technical expertise
42. 10. Carrying the past into the future:
JAX and Pathways define the next generation of frameworks and platforms
43. What is JAX
import jax.numpy as np
from jax import jit, grad, vmap

def predict(params, inputs):
    # Forward pass of a small MLP; params is a list of (W, b) layer pairs.
    for W, b in params:
        outputs = np.dot(inputs, W) + b
        inputs = np.tanh(outputs)
    return outputs

def loss(params, batch):
    # Sum-of-squares loss over an (inputs, targets) batch.
    inputs, targets = batch
    preds = predict(params, inputs)
    return np.sum((preds - targets) ** 2)

# Compose transformations: a jit-compiled gradient function, and
# per-example gradients by vmapping grad over the batch axis only.
gradient_fun = jit(grad(loss))
perexample_grads = jit(vmap(grad(loss), in_axes=(None, 0)))

JAX is an extensible system for composable function transformations of Python+NumPy code.
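A short usage sketch for the code above, with made-up layer shapes and a toy batch; the transformed functions return gradients in the same pytree structure as params, and the vmapped version adds a leading per-example axis.

import numpy.random as npr

params = [(npr.randn(3, 4), npr.randn(4)),    # layer 1: W (3x4), b (4,)
          (npr.randn(4, 2), npr.randn(2))]    # layer 2: W (4x2), b (2,)
batch = (npr.randn(8, 3), npr.randn(8, 2))    # 8 examples of (inputs, targets)

grads = gradient_fun(params, batch)           # same structure as params
per_ex = perexample_grads(params, batch)      # each leaf gains a leading axis of size 8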
44.
45. Model Parallelism
Diagram: a data-parallel split across gpu:0 to gpu:7 with an All-Reduce gradient update, contrasted with spatial partitioning / model decomposition of a single model across devices.
Model code needs to be model-parallel aware, and these schemes are difficult to implement.
Pro tip #1: scale up before scaling out (A100).
Pro tip #2: use reduced precision (FP16, TF32, BF16).
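In JAX, the data-parallel half of this picture takes only a few lines: pmap runs the training step on every local device and lax.pmean performs the all-reduce shown in the diagram. A minimal sketch with made-up model and batch sizes.

import functools
import jax
import jax.numpy as jnp

def loss(params, x, y):
    # Tiny two-layer model; shapes are illustrative.
    pred = jnp.tanh(x @ params["W"]) @ params["V"]
    return jnp.mean((pred - y) ** 2)

@functools.partial(jax.pmap, axis_name="batch")
def train_step(params, x, y):
    grads = jax.grad(loss)(params, x, y)
    grads = jax.lax.pmean(grads, axis_name="batch")   # all-reduce across devices
    return jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, params, grads)

n_dev = jax.local_device_count()
params = {"W": jnp.zeros((16, 32)), "V": jnp.zeros((32, 4))}
params = jax.tree_util.tree_map(lambda p: jnp.stack([p] * n_dev), params)  # replicate
x = jnp.ones((n_dev, 8, 16))   # one shard of 8 examples per device
y = jnp.ones((n_dev, 8, 4))
params = train_step(params, x, y)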
46. GPipe: Pipeline Parallelism
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism: 1811.06965
https://github.com/tensorflow/lingvo/blob/master/lingvo/core/gpipe.py
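GPipe's core idea is to split each mini-batch into micro-batches and stream them through the pipeline stages. The tiny schedule printer below (illustrative sizes, not the lingvo implementation) shows that stage s processes micro-batch m at step s + m, so the pipeline bubble costs S - 1 idle steps.

S, M = 4, 8   # pipeline stages and micro-batches (hypothetical sizes)
for t in range(S + M - 1):
    active = [(s, t - s) for s in range(S) if 0 <= t - s < M]
    print(f"step {t:2d}: " + ", ".join(f"stage{s}<-mb{m}" for s, m in active))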
47. GShard/GSPMD
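GSPMD-style sharding is what JAX exposes through jax.sharding: you annotate how arrays are laid out over a device mesh, and the XLA compiler partitions the computation and inserts the necessary collectives. A minimal sketch, assuming a recent JAX version; the mesh layout and array sizes are illustrative.

import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec

devices = np.array(jax.devices()).reshape(-1, 1)          # (data, model) mesh
mesh = Mesh(devices, axis_names=("data", "model"))

x = jnp.ones((64, 512))    # activations
W = jnp.ones((512, 256))   # weights

# Shard activations over the "data" axis and weights over the "model" axis.
x = jax.device_put(x, NamedSharding(mesh, PartitionSpec("data", None)))
W = jax.device_put(W, NamedSharding(mesh, PartitionSpec(None, "model")))

@jax.jit
def layer(x, W):
    return jnp.tanh(x @ W)   # the compiler adds the needed collectives

y = layer(x, W)
print(y.sharding)            # inspect the propagated output sharding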
48.
49.
50.
51.