Training, Inference, Agents, beyond Apps in the AI-Native World
1. Training, Inference, Agents:
Beyond Apps in the AI-Native World
Mark Collier, GM of AI & Infrastructure, Linux Foundation
Co-founder of OpenStack & OpenInfra Foundation
2.
3. About me
• Co-founder, OpenStack & OpenInfra Foundation,
raised $150M to create a multi-billion-dollar market
• Now GM, AI & Infrastructure, Linux Foundation
• Mission: keep the intelligence layer as open and
interoperable as the cloud
Mark Collier
4. 15 years of OpenInfra in China
2010
First OpenStack Trip to China
Today - 2025
All over China, OpenStack is managing millions of
cores of compute for public & private clouds, and Kata
Containers secures critical infrastructure like AliPay
5. Linux Foundation + OpenInfra Foundation
Ensuring every computing era is open
OPENINFRA.ORG/BLUEPRINT
6. THE WORLD RUNS ON OPENINFRA
7. THE WORLD RUNS ON OPENINFRA
8. YOU CAN’T SEPARATE AI FROM INFRASTRUCTURE
9. AI is putting tremendous demands on infrastructure
Google’s Pichai told I/O that the company now processes 480 trillion tokens a month – 50× more than a year ago
10. Some of the biggest challenges in AI are infrastructure
11. Let’s talk open source AI
12. Jan 27, 2025: “Nvidia drops nearly 17% as China’s cheaper AI model DeepSeek sparks global tech sell-off”
Jan 27, 2025: “DeepSeek’s R1 Launch Shows There Are No Moats Among Large Language Models”
13. Open Source is the right side of history
“[OpenAI] needs a new open-source strategy”
Sam Altman, CEO of OpenAI
14. July 12, 2024
“In the face of disruptive technology, the moat formed by closed-source systems is short-lived.”
Liang Wenfeng, DeepSeek
15.
16. Open drives innovation, adoption, access
17. [Stack diagram] Applications; Fine-Tuned Specialized Models; Foundation Models; Closed Source / API; Proprietary + Public Data; Cloud Platform; Software; Hardware; Model Hub; AI Safety; Tooling / LLMOps; Services. Side labels: OSS quickly pushes cost down; value and innovation go up.
18. Let’s talk AI-Native Computing
19. AI-Native Computing: a simple definition
AI-Native Computing is infrastructure designed
for a world where models, not humans, do most of the thinking and interaction¹
Built to:
• Run continuously-learning systems, not static code
• Handle token streams and agent workflows, not just web requests
• Optimize for inference at scale, not just data storage or container orchestration
¹ Like AI-Native Computing itself, this definition will require updating, preferably by humans.
20. AI-native trend: The software is writing itself
21. 3 pillars of AI-Native Computing
1. Training
2. Inference
3. Agents
22. Feed data in → adjust the algorithm’s parameters → output a
model that can make predictions.
Methods (full pre-training, fine-tuning, reinforcement learning,
etc.) vary and keep evolving.
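As a rough illustration of that loop, here is a minimal PyTorch sketch; the model, data, and hyperparameters are placeholders, not anything from the talk:

```python
# Minimal sketch of the loop above: feed data in, adjust the parameters,
# output a model that can make predictions.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder dataset: random features and targets stand in for real data.
X, y = torch.randn(256, 16), torch.randn(256, 1)

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)   # forward pass: predict, measure error
    loss.backward()               # backward pass: compute gradients
    optimizer.step()              # adjust the parameters

torch.save(model.state_dict(), "model.pt")  # the trained checkpoint
```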
23. Category / Representative Projects
Frameworks / libraries: PyTorch
Infrastructure glue: DeepSpeed, Megatron-LM, Ray
Pre-trained checkpoints & data: DeepSeek-V3, Mistral-7B, RedPajama
Model & export compatibility: ONNX, Hugging Face Transformers
24. • The stack emerging around PyTorch + ONNX, Transformers, DeepSpeed, Ray
• Open checkpoints & datasets enable rapid iteration
• Open-sourcing weights accelerates downstream innovation
• Open, neutral governance is winning
• 80% of researchers use PyTorch
• Transformers is focusing on PyTorch
• Neutral foundation governance builds ecosystems based on trust
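As a hedged illustration of the ONNX piece of that stack, here is a sketch of exporting a small PyTorch module to a portable graph; the module, file name, and tensor names are made up for the example:

```python
# Hedged sketch: exporting a PyTorch module to ONNX so other runtimes can load it.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1)).eval()
example_input = torch.randn(1, 16)

torch.onnx.export(
    model,
    example_input,
    "model.onnx",                               # portable graph, loadable by ONNX Runtime
    input_names=["features"],
    output_names=["prediction"],
    dynamic_axes={"features": {0: "batch"}},    # allow variable batch size
)
```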
25. Load a trained model → send it real-time requests →
return predictions.
Runs anywhere—from tiny edge chips to GPU
datacenters—often with specialized hardware or software
for speed and efficiency.
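A minimal sketch of that load-model, send-request, return-prediction pattern, using the Hugging Face Transformers pipeline API; the model name is only an example:

```python
# Minimal sketch: load a trained model, send it a request, return a prediction.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

result = generator("Explain AI-Native Computing in one sentence.", max_new_tokens=64)
print(result[0]["generated_text"])
```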
26. Why it matters
● Inference is the path to market for AI
● Inference is the largest workload in human history, 50× training
● Must be more reliable than training for commercial systems
● We’re heading toward self-improving computing = a new flywheel
● Efficiency is everything
27. Category / Representative Projects
Runtimes / graph execution: ONNX Runtime, TensorRT-LLM
Distributed engines: vLLM, AIBrix, llm-d
Serving frameworks: KServe, Triton Inference Server
Optimization / caching: LMCache
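Continuing the hypothetical export from the training section, here is a hedged sketch of portable graph execution with ONNX Runtime; the input name and shapes assume the earlier example:

```python
# Hedged sketch: running the exported graph with ONNX Runtime.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")        # portable graph execution
batch = np.random.randn(4, 16).astype(np.float32)   # placeholder request batch

outputs = session.run(None, {"features": batch})    # None = return all outputs
print(outputs[0].shape)                              # (4, 1) predictions
```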
28. vLLM: The De Facto Open GenAI Inference Platform
Models: Llama, Qwen, DeepSeek, Gemma, Mistral, Phi, Molmo, Nemotron, Granite
Accelerators: GPU, Instinct, Neuron, TPU, Gaudi, CPU
Environments: Physical, Virtual, Private Cloud, Public Cloud, Edge
Single platform to run any model, on any accelerator, on any cloud
Source: Brian Stevens, former CEO of Neural Magic and current CTO of AI at Red Hat. Thanks, Brian!
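For a feel of the developer experience, a hedged sketch of vLLM’s offline batch API; the model name and sampling settings are illustrative, and the same engine also exposes an OpenAI-compatible server:

```python
# Hedged sketch: offline batch generation with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["What is AI-Native Computing?"], params)
for out in outputs:
    print(out.outputs[0].text)
```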
29. Emerging kernel? llm-d: Kubernetes distributed inference at scale
Why? A distributed architecture is needed for maximum efficiency and for meeting varying SLOs
Core Features:
● Prefill/decode disaggregation
● KV cache distribution, offloading, and storage hierarchy
● AI-aware router
● Operational telemetry for production
● Kubernetes-based
● NIXL inference transfer library
Source: Brian Stevens, former CEO of Neural Magic and current CTO of AI at Red Hat. Thanks, Brian!
30. • Kernel → inference runtime (ONNX Runtime, vLLM, llm-d) schedules tokens like an OS schedules threads.
• Syscalls → agent protocols (MCP, A2A, Agent Protocol) standardize calls like 'function-call', 'spawn-agent', 'share-memory'.
• User-land apps → agentic workflows (LangChain, CrewAI, AutoGen), akin to bash, cron, libc.
• Peripherals → tools / skills (retrieval plugins, code interpreters, databases) are today’s I/O drivers.
— Inspired by Andrej Karpathy, Software 3.0 (2025)
31. Key Takeaways
● The stack is changing rapidly, with some key components emerging:
● ONNX Runtime provides portable graph execution
● vLLM offers distributed engines for massive QPS
● KServe standardizes serving on Kubernetes/OpenStack
● Triton Inference Server
● Multi-accelerator scheduling abstractions are critical
● llm-d is a new entrant and could be “the kernel for AI” if it succeeds
● Efficiency is king, and there are many breakthroughs to come here
● This stuff is too hard to assemble today: we need a community
32. Agents: a simple definition
Wrap a model in code that can call tools, talk to other agents or humans, and turn inference results into actions via open protocols (e.g. MCP), so any system can programmatically use the model’s output.
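A framework-free sketch of that wrapper, under the assumption that the model returns either a tool request or a final answer as JSON; `call_model` and the calculator tool are hypothetical placeholders:

```python
# Sketch of the agent loop described above: the model proposes an action,
# the wrapper executes the matching tool, and the result feeds back in.
import json

def calculator(expression: str) -> str:
    return str(eval(expression))  # demo only; never eval untrusted input in production

TOOLS = {"calculator": calculator}

def call_model(messages: list[dict]) -> dict:
    # Placeholder: a real agent would call an LLM that returns either a tool
    # request, e.g. {"tool": "calculator", "args": {"expression": "2+2"}},
    # or a final answer, e.g. {"answer": "..."}.
    raise NotImplementedError

def run_agent(task: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = call_model(messages)
        if "answer" in decision:                  # model is done: act on the result
            return decision["answer"]
        tool = TOOLS[decision["tool"]]            # model asked to call a tool
        observation = tool(**decision["args"])
        messages.append({"role": "tool", "content": json.dumps(observation)})
    return "step limit reached"
```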
33. • Automate repetitive workflows (code, docs, ops)
• New latency & reliability constraints on infra
• Standard protocols avoid vendor lock-in
34. “AI Agents flip this on its head. The new
paradigm of software will soon be about AI
Agents doing work for you, with humans used
to plan, review, and orchestration.” — Aaron
Levie
“Context engineering is the art and science of
filling the window with just the right
information. Do it wrong, and your agent gets
confused or expensive. Do it right, and it feels
like magic.”
— Andrej Karpathy
• Unbounded throughput: agent pools scale horizontally, not keystroke-by-keystroke.
• Design shift: product UX optimizes review and orchestration, not typing speed.
• Infrastructure implication: reliable, low-latency agent calls become the new SLA.
35. “(SLMs) are sufficiently powerful, inherently more suitable, and necessarily more economical for many invocations in agentic systems, and are therefore the future of agentic AI” – researchers from NVIDIA & Georgia Tech
Source: https://arxiv.org/abs/2506.02153
36. [Agent stack diagram]
Agent Experience/UX (conversational, action interfaces)
Agent Orchestrators (Azure AI Foundry, Google Vertex, Amazon Bedrock Agents, LangChain, CrewAI)
Protocols & Standards (MCP, A2A)
Registries (secure discovery and orchestration)
Skills/Tools Plugins (API Mgmt to Enterprise Resources)
Side labels: OSS quickly pushes cost down; value and innovation go up.
37. Category / Representative Projects
Orchestration protocols: MCP, A2A
Agent frameworks: LangChain, CrewAI, IDEA ChatDev
Tool registries / APIs: OpenAI Function Calling, BentoML Tool Hub
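As a small illustration of the protocol row, a hedged sketch of exposing one tool over MCP with the official Python SDK’s FastMCP helper; the tool name and logic are made up:

```python
# Hedged sketch: an MCP server exposing one tool, so any MCP-capable agent
# or IDE can discover and call it.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def word_count(text: str) -> int:
    """Count whitespace-separated words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```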
38. Experience / UX: Chat UI, Voice, CLI
Orchestrator: LangChain, CrewAI, AutoGen
Protocols: MCP, A2A, Agent Protocol, OpenAI Func-Call
Skills / Tool Plugins: OpenAPI, Bash, Retrieval, Enterprise Data
Memory / Vector Store: Chroma, Weaviate, pgvector, Redis-Vector
Registries & Ops:
• Tool Registry
• Observability
• Policy / Guardrails
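To illustrate the memory / vector store layer, a hedged sketch of agent memory backed by Chroma’s in-process client; the documents and query are placeholders:

```python
# Hedged sketch: storing and retrieving agent memory with a vector store.
import chromadb

client = chromadb.Client()
memory = client.create_collection("agent_memory")

memory.add(
    ids=["note-1", "note-2"],
    documents=[
        "The user prefers responses in Chinese.",
        "The deployment target is a Kubernetes cluster with GPUs.",
    ],
)

# Retrieve the most relevant memory to put back into the context window.
hits = memory.query(query_texts=["Where are we deploying?"], n_results=1)
print(hits["documents"][0][0])
```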
39. Agents: Breaking News
Google Cloud moves A2A to Linux Foundation
“Linux Foundation announced the formation of the Agent2Agent
project with Amazon Web Services, Cisco, Google,
Microsoft, Salesforce, SAP, and ServiceNow. With the formation
of this new, independent entity, the companies will collaborate
closely on fostering an open and interoperable ecosystem for AI
agents with the Agent2Agent (A2A) protocol and other
interoperability technology.” - Google Cloud June 24, 2025
Source: https://developers.googleblog.com/en/google-cloud-donates-a2a-to-linux-foundation/
40. Agents: Key Takeaways
● It’s early days for agents and no one really knows what will happen
● The potential is unlimited if we create the right protocols and stacks
● We know standards matter. CNCF didn’t succeed due to Kubernetes alone; it also defined standards like OCI. In the agent world, protocols / standards, e.g. MCP and A2A, will be essential
● A2A has taken an important step by embracing neutral governance at the Linux Foundation
● “Context engineering” is the emerging skill behind effective agents, much more than just prompting
41. Let’s talk Community
42. Opening Your Code or Model Weights Is Not Enough to Grow an Ecosystem
● Project Governance & Standards
○ Governance and neutral ownership are essential for collaboration
○ Interoperability and standards amplify ecosystem value
○ Release cadence and support build enterprise trust
○ Maturity models (sandbox → graduation) signal long-term viability
● Community & Adoption
○ Contributor experience matters
○ Community-driven marketing drives organic adoption
○ Ecosystem health is measurable and must be tracked
○ Open data and evaluation are key to building trust in AI
● Use the “4 Opens Way” to build the strongest community
○ Open Source, Open Design, Open Development, Open Community
Learn more: https://openinfra.org/four-opens/
43. Open Platforms Require Collective Investment
● Over half a billion dollars was
invested in creating an open cloud
era with OpenStack & Kubernetes
● What investment will be required
to ensure the AI-Native era is open?
Rough estimates, directionally correct
44. • 2010: VMs + OpenStack
• 2015: Containers + Kubernetes
• 2025: Tokens + AI-Native stack
45. Summary
● Training, Inference, Agents are the 3 pillars of AI-Native Computing
● There is a lot of open source out there, but we need a center of gravity
● If we coordinate, it will be powerful, and the future will be open!
● Community is that center of gravity
46. Let’s build the future of AI in the open!
Let’s build the AI-Native Computing Stack,
and ensure that the AI-Native era is truly
open, innovative, and thriving
Join me in Amsterdam
August 28-29, 2025
for AI_DEV
Want to connect? Links at markcollier.me
47.
48.