Training, Inference, Agents, beyond Apps in the AI-Native World
1. Training, Inference, Agents:
Beyond Apps in the AI-Native World
Mark Collier, GM of AI & Infrastructure, Linux Foundation
Co-founder of OpenStack & OpenInfra Foundation
2.
3. About me
• Co-founder, OpenStack & OpenInfra Foundation,
raised $150M to create a multi-billion-dollar market
• Now GM, AI & Infrastructure, Linux Foundation
• Mission: keep the intelligence layer as open and
interoperable as the cloud
Mark Collier
4. 15 years of OpenInfra in China
2010
First OpenStack Trip to China
Today - 2025
All over China, OpenStack is managing millions of
cores of compute for public & private clouds, and Kata
Containers secures critical infrastructure like AliPay
5. Linux Foundation + OpenInfra Foundation
Ensuring every computing era is open
OPENINFRA.ORG/BLUEPRINT
6. THE WORLD RUNS ON OPENINFRA
7. THE WORLD RUNS ON OPENINFRA
8. YOU CAN’T SEPARATE AI FROM INFRASTRUCTURE
9. AI is putting tremendous demands on infrastructure
Google’s Pichai told I/O that the company now processes 480 trillion tokens a month – 50× more than a year ago
10. Some of the biggest challenges in AI are infrastructure
11. Let’s talk open source AI
12. Jan 27, 2025: “Nvidia drops nearly 17% as China’s cheaper AI model DeepSeek sparks global tech sell-off”
Jan 27, 2025: “DeepSeek’s R1 Launch Shows There Are No Moats Among Large Language Models”
13. Open Source is the right side of history
“[OpenAI] needs a new open-source strategy”
Sam Altman, CEO of OpenAI
14. July 12, 2024
“In the face of disruptive technology, the moat formed by closed-source systems is short-lived.”
Liang Wenfeng, DeepSeek
15.
16. Open drives innovation, adoption, access
17. [Stack diagram] Applications; Fine-Tuned Specialized Models; Foundation Models; Closed Source / API; Proprietary + Public Data; Cloud Platform; Software; Hardware; Model Hub; AI Safety; Tooling / LLMOps; Services. Side labels: OSS quickly pushes cost down; value and innovation go up.
18. Let’s talk AI-Native Computing
19. AI-Native Computing: a simple definition
AI-Native Computing is infrastructure designed
for a world where models, not humans, do most of the thinking and interaction¹
Built to:
• Run continuously-learning systems, not static code
• Handle token streams and agent workflows, not just web requests
• Optimize for inference at scale, not just data storage or container orchestration
¹ Like AI-Native Computing itself, this definition will require updating, preferably by humans.
20. AI-native trend: The software is writing itself
21. 3 pillars of AI-Native Computing
1. Training
2. Inference
3. Agents
22. Feed data in → adjust the algorithm’s parameters → output a
model that can make predictions.
Methods (full pre-training, fine-tuning, reinforcement learning,
etc.) vary and keep evolving.
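As a rough illustration of that loop, here is a minimal PyTorch sketch; the model, data, and hyperparameters are placeholders, not anything from the talk:

```python
# Minimal sketch of the loop above: feed data in, adjust the parameters,
# output a model that can make predictions.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder dataset: random features and targets stand in for real data.
X, y = torch.randn(256, 16), torch.randn(256, 1)

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)   # forward pass: predict, measure error
    loss.backward()               # backward pass: compute gradients
    optimizer.step()              # adjust the parameters

torch.save(model.state_dict(), "model.pt")  # the trained checkpoint
```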
23. Category / Representative Projects
Frameworks / libraries: PyTorch
Infrastructure glue: DeepSpeed, Megatron-LM, Ray
Pre-trained checkpoints & data: DeepSeek-V3, Mistral-7B, RedPajama
Model & export compatibility: ONNX, Hugging Face Transformers
24. • The stack emerging around PyTorch + ONNX, Transformers, DeepSpeed, Ray
• Open checkpoints & datasets enable rapid iteration
• Open-sourcing weights accelerates downstream innovation
• Open, neutral governance is winning
• 80% of researchers use PyTorch
• Transformers is focusing on PyTorch
• Neutral foundation governance builds ecosystems based on trust
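As a hedged illustration of the ONNX piece of that stack, here is a sketch of exporting a small PyTorch module to a portable graph; the module, file name, and tensor names are made up for the example:

```python
# Hedged sketch: exporting a PyTorch module to ONNX so other runtimes can load it.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1)).eval()
example_input = torch.randn(1, 16)

torch.onnx.export(
    model,
    example_input,
    "model.onnx",                               # portable graph, loadable by ONNX Runtime
    input_names=["features"],
    output_names=["prediction"],
    dynamic_axes={"features": {0: "batch"}},    # allow variable batch size
)
```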
25. Load a trained model → send it real-time requests →
return predictions.
Runs anywhere—from tiny edge chips to GPU
datacenters—often with specialized hardware or software
for speed and efficiency.
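A minimal sketch of that load-model, send-request, return-prediction pattern, using the Hugging Face Transformers pipeline API; the model name is only an example:

```python
# Minimal sketch: load a trained model, send it a request, return a prediction.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

result = generator("Explain AI-Native Computing in one sentence.", max_new_tokens=64)
print(result[0]["generated_text"])
```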
26. Why it matters
● Inference is the path to market for AI
● Inference is the largest workload in human history, 50× training
● Must be more reliable than training for commercial systems
● We’re heading toward self-improving computing = a new flywheel
● Efficiency is everything
27. Category / Representative Projects
Runtimes / graph execution: ONNX Runtime, TensorRT-LLM
Distributed engines: vLLM, AIBrix, llm-d
Serving frameworks: KServe, Triton Inference Server
Optimization / caching: LMCache
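Continuing the hypothetical export from the training section, here is a hedged sketch of portable graph execution with ONNX Runtime; the input name and shapes assume the earlier example:

```python
# Hedged sketch: running the exported graph with ONNX Runtime.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")        # portable graph execution
batch = np.random.randn(4, 16).astype(np.float32)   # placeholder request batch

outputs = session.run(None, {"features": batch})    # None = return all outputs
print(outputs[0].shape)                              # (4, 1) predictions
```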
28. vLLM: The De Facto Open GenAI Inference Platform
Models: Llama, Qwen, DeepSeek, Gemma, Mistral, Phi, Molmo, Nemotron, Granite
Accelerators: GPU, Instinct, Neuron, TPU, Gaudi, CPU
Environments: Physical, Virtual, Private Cloud, Public Cloud, Edge
Single platform to run any model, on any accelerator, on any cloud
Source: Brian Stevens, former CEO of Neural Magic and current CTO of AI at Red Hat. Thanks, Brian!
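For a feel of the developer experience, a hedged sketch of vLLM’s offline batch API; the model name and sampling settings are illustrative, and the same engine also exposes an OpenAI-compatible server:

```python
# Hedged sketch: offline batch generation with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["What is AI-Native Computing?"], params)
for out in outputs:
    print(out.outputs[0].text)
```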
29. Emerging kernel? llm-d: Kubernetes distributed inference at scale
Why? A distributed architecture is needed for maximum efficiency and for meeting varying SLOs
Core Features:
● Prefill/decode disaggregation
● KV cache distribution, offloading, and storage hierarchy
● AI-aware router
● Operational telemetry for production
● Kubernetes-based
● NIXL inference transfer library
Source: Brian Stevens, former CEO of Neural Magic and current CTO of AI at Red Hat. Thanks, Brian!
30. • Kernel → inference runtime (ONNX Runtime, vLLM, llm-d) schedules tokens like an OS schedules threads.
• Syscalls → agent protocols (MCP, A2A, Agent Protocol) standardize calls like 'function-call', 'spawn-agent', 'share-memory'.
• User-land apps → agentic workflows (LangChain, CrewAI, AutoGen), akin to bash, cron, libc.
• Peripherals → tools / skills (retrieval plugins, code interpreters, databases) are today’s I/O drivers.
— Inspired by Andrej Karpathy, Software 3.0 (2025)
31. Key Takeaways
● The stack is changing rapidly, with some key components emerging:
● ONNX Runtime provides portable graph execution
● vLLM offers distributed engines for massive QPS
● KServe standardizes serving on Kubernetes/OpenStack
● Triton Inference Server
● Multi-accelerator scheduling abstractions are critical
● llm-d is a new entrant and could be “the kernel for AI” if it succeeds
● Efficiency is king, and there are many breakthroughs to come here
● This stuff is too hard to assemble today: we need a community
32. Agents: a simple definition
Wrap a model in code that can call tools, talk to other agents or humans, and turn inference results into actions via open protocols (e.g. MCP), so any system can programmatically use the model’s output.
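A framework-free sketch of that wrapper, under the assumption that the model returns either a tool request or a final answer as JSON; `call_model` and the calculator tool are hypothetical placeholders:

```python
# Sketch of the agent loop described above: the model proposes an action,
# the wrapper executes the matching tool, and the result feeds back in.
import json

def calculator(expression: str) -> str:
    return str(eval(expression))  # demo only; never eval untrusted input in production

TOOLS = {"calculator": calculator}

def call_model(messages: list[dict]) -> dict:
    # Placeholder: a real agent would call an LLM that returns either a tool
    # request, e.g. {"tool": "calculator", "args": {"expression": "2+2"}},
    # or a final answer, e.g. {"answer": "..."}.
    raise NotImplementedError

def run_agent(task: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = call_model(messages)
        if "answer" in decision:                  # model is done: act on the result
            return decision["answer"]
        tool = TOOLS[decision["tool"]]            # model asked to call a tool
        observation = tool(**decision["args"])
        messages.append({"role": "tool", "content": json.dumps(observation)})
    return "step limit reached"
```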
33. • Automate repetitive workflows (code, docs, ops)
• New latency & reliability constraints on infra
• Standard protocols avoid vendor lock-in
34. “AI Agents flip this on its head. The new
paradigm of software will soon be about AI
Agents doing work for you, with humans used
to plan, review, and orchestration.” — Aaron
Levie
“Context engineering is the art and science of
filling the window with just the right
information. Do it wrong, and your agent gets
confused or expensive. Do it right, and it feels
like magic.”
— Andrej Karpathy
• Unbounded throughput: agent pools scale horizontally, not keystroke-by-keystroke.
• Design shift: product UX optimizes review and orchestration, not typing speed.
• Infrastructure implication: reliable, low-latency agent calls become the new SLA.
35. “(SLMs) are sufficiently powerful, inherently more suitable, and necessarily more economical for many invocations in agentic systems, and are therefore the future of agentic AI” – researchers from NVIDIA & Georgia Tech
Source: https://arxiv.org/abs/2506.02153
36. [Agent stack diagram]
Agent Experience/UX (conversational, action interfaces)
Agent Orchestrators (Azure AI Foundry, Google Vertex, Amazon Bedrock Agents, LangChain, CrewAI)
Protocols & Standards (MCP, A2A)
Registries (secure discovery and orchestration)
Skills/Tools Plugins (API Mgmt to Enterprise Resources)
Side labels: OSS quickly pushes cost down; value and innovation go up.
37. Category / Representative Projects
Orchestration protocols: MCP, A2A
Agent frameworks: LangChain, CrewAI, IDEA ChatDev
Tool registries / APIs: OpenAI Function Calling, BentoML Tool Hub
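As a small illustration of the protocol row, a hedged sketch of exposing one tool over MCP with the official Python SDK’s FastMCP helper; the tool name and logic are made up:

```python
# Hedged sketch: an MCP server exposing one tool, so any MCP-capable agent
# or IDE can discover and call it.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def word_count(text: str) -> int:
    """Count whitespace-separated words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```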
38. Experience / UX: Chat UI, Voice, CLI
Orchestrator: LangChain, CrewAI, AutoGen
Protocols: MCP, A2A, Agent Protocol, OpenAI Func-Call
Skills / Tool Plugins: OpenAPI, Bash, Retrieval, Enterprise Data
Memory / Vector Store: Chroma, Weaviate, pgvector, Redis-Vector
Registries & Ops:
• Tool Registry
• Observability
• Policy / Guardrails
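To illustrate the memory / vector store layer, a hedged sketch of agent memory backed by Chroma’s in-process client; the documents and query are placeholders:

```python
# Hedged sketch: storing and retrieving agent memory with a vector store.
import chromadb

client = chromadb.Client()
memory = client.create_collection("agent_memory")

memory.add(
    ids=["note-1", "note-2"],
    documents=[
        "The user prefers responses in Chinese.",
        "The deployment target is a Kubernetes cluster with GPUs.",
    ],
)

# Retrieve the most relevant memory to put back into the context window.
hits = memory.query(query_texts=["Where are we deploying?"], n_results=1)
print(hits["documents"][0][0])
```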
39. Agents: Breaking News
Google Cloud moves A2A to Linux Foundation
“Linux Foundation announced the formation of the Agent2Agent
project with Amazon Web Services, Cisco, Google,
Microsoft, Salesforce, SAP, and ServiceNow. With the formation
of this new, independent entity, the companies will collaborate
closely on fostering an open and interoperable ecosystem for AI
agents with the Agent2Agent (A2A) protocol and other
interoperability technology.” - Google Cloud June 24, 2025
Source: https://developers.googleblog.com/en/google-cloud-donates-a2a-to-linux-foundation/
40. Agents: Key Takeaways
● It’s early days for agents and no one really knows what will happen
● The potential is unlimited if we create the right protocols and stacks
● We know standards matter. CNCF didn’t succeed due to Kubernetes alone; it also defined standards like OCI. In the agent world, protocols / standards, e.g. MCP and A2A, will be essential
● A2A has taken an important step by embracing neutral governance at the Linux Foundation
● “Context engineering” is the emerging skill behind effective agents, much more than just prompting
41. Let’s talk Community
42. Opening Your Code or Model Weights Is Not Enough to Grow an Ecosystem
● Project Governance & Standards
○ Governance and neutral ownership are essential for collaboration
○ Interoperability and standards amplify ecosystem value
○ Release cadence and support build enterprise trust
○ Maturity models (sandbox → graduation) signal long-term viability
● Community & Adoption
○ Contributor experience matters
○ Community-driven marketing drives organic adoption
○ Ecosystem health is measurable and must be tracked
○ Open data and evaluation are key to building trust in AI
● Use the “4 Opens Way” to build the strongest community
○ Open Source, Open Design, Open Development, Open Community
Learn more: https://openinfra.org/four-opens/
43. Open Platforms Require Collective Investment
● Over half a billion dollars was
invested in creating an open cloud
era with OpenStack & Kubernetes
● What investment will be required
to ensure the AI-Native era is open?
Rough estimates, directionally correct
44. • 2010: VMs + OpenStack
• 2015: Containers + Kubernetes
• 2025: Tokens + AI-Native stack
45. Summary
● Training, Inference, Agents are the 3 pillars of AI-Native Computing
● There is a lot of open source out there, but we need a center of gravity
● If we coordinate, it will be powerful, and the future will be open!
● Community is that center of gravity
46. Let’s build the future of AI in the open!
Let’s build the AI-Native Computing Stack,
and ensure that the AI-Native era is truly
open, innovative, and thriving
Join me in Amsterdam
August 28-29, 2025
for AI_DEV
Want to connect? Links at markcollier.me
47.
48.