Large Models in Finance, From Innovation to Real-World Impact
1. Large Models in Finance: From Innovation to Real-World Impact
Speaker: Weiqing Liu (刘炜清)
2.
3. 9 years of deep collaboration with financial industry partners.
2017, Intelligent Investment: Together with our partners, we launched the exploration of applying AI techniques to quantitative investment.
2021, Anti-Money Laundering: We expanded our industrial partnerships into the areas of RegTech, anomaly detection, and fraud detection. We applied AI techniques to real-world anti-money laundering scenarios and achieved high performance.
4. 9 years of deep collaboration with financial industry partners.
Research topics from real tasks. Research results into real products.
2017, Intelligent Investment: Together with our partners, we launched the exploration of applying AI techniques to quantitative investment.
2018, Techniques Applied: Our partners started to implement our research and techniques in real-world production with great success, reaching the highest level of performance thus far.
2019, Key Research Challenges of AI for Finance: We summarized and addressed several key research challenges in the field of AI for Finance. We also keep updating our techniques in practice with our latest research findings.
2021, Anti-Money Laundering: We expanded our industrial partnerships into the areas of RegTech, anomaly detection, and fraud detection. We applied AI techniques to real-world anti-money laundering scenarios and achieved high performance.
6. 9 years of deep collaboration with financial industry partners.
Research topics from real tasks. Research results into real products.
Research
Product
7. 9 years of deep collaboration with financial industry partners.
Research topics from real tasks. Research results into real products.
Popular open-source quant investment framework: Qlib.
2017, Intelligent Investment: Together with our partners, we launched the exploration of applying AI techniques to quantitative investment.
2018, Techniques Applied: Our partners started to implement our research and techniques in real-world production with great success, reaching the highest level of performance thus far.
2019, Key Research Challenges of AI for Finance: We summarized and addressed several key research challenges in the field of AI for Finance. We also keep updating our techniques in practice with our latest research findings.
2020, Release of Qlib: We open-sourced our research toolset and platform, Qlib, on GitHub. It supports multiple learning paradigms and covers the entire process of quantitative investment. It has received more than 28k stars so far.
2021, Anti-Money Laundering: We expanded our industrial partnerships into the areas of RegTech, anomaly detection, and fraud detection. We applied AI techniques to real-world anti-money laundering scenarios and achieved high performance.
8. https://github.com/microsoft/qlib
Design Goal: Bridging the gap between research and product in quant investment
Quant Research Automation Powered by Qlib and R&D-Agent
Qlib is an AI-oriented quant investment platform that aims to use AI technology to empower quant research, from exploring ideas to implementing them in production. Qlib supports diverse ML modeling paradigms, including supervised learning (2020), reinforcement learning (2022), and meta learning (2024), and is now equipped with the LLM-based R&D-Agent (2025) to automate the R&D process.
9. Financial Domain Agent
Advantages
• Easily fuses multi-modality data sources
Concerns and Limitations
1. Non-deterministic results
2. Hard to evaluate
3. Hard to improve further
4. Hard to regulate
Financial Domain Foundation Model
Advantages
• Higher intelligence within the financial domain
Concerns and Limitations
• Hard to handle non-natural-language financial data
Our Vision
Finance AI 2.0: deterministic agents with self-evolution in a market-in-the-loop system.
Our Approach
• Code-Oriented Agentic Automation
• Domain-Native Foundation Models
10. [Diagram labels: R&D Automation Agent; Code-based Solutions #1, #2, #3; Model Training; Trained Model; Domain-Specific Tools and Frameworks; Digital-twin-level Simulation; World-model-level Domain-Native Foundation Models]
❑ Foundation models go domain-native → Train on order-level market data (not just text) to capture microstructure and show scaling behavior → realistic, controllable generation (LMM).
❑ Evaluation goes simulation-centric → Move from offline benchmarks to market-in-the-loop stress tests and what-ifs → policy, risk, and compliance measured before capital is at risk (MarS).
❑ Agentic R&D goes deterministic → LLM agents generate code, run backtests, and auto-iterate with objective metrics → powerful, repeatable, reviewable decisions (Qlib + R&D-Agent).
12. 01
Quant Research Automation powered by Qlib and R&D-Agent
Weiqing Liu (刘炜清)
Microsoft Research Asia
In collaboration with Xu Yang, Xiao Yang, Shikai Fang, Yuge Zhang, Zehua Wang, Yelong Shen, Weizhu Chen, Jiang Bian
13. ❑ Building mainstream ML models on specially designed tasks and datasets.
❑ Continuous expert-led refinement through trial, exploration, and adaptation.
Intelligence
Solution
14. Research Agent: Ideation & hypothesis crafting. Generates high-quality ideas & experiment plans.
Development Agent: Implementation & validation. Produces robust, production-ready code.
15. [Diagram labels: Research Agent; Development Agent; External Environment; Implemented Solution; Feedback on Training and Execution; Feedback on Quality]
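One plausible reading of the two-agent loop can be sketched in a few lines of Python. This is a toy illustration and not R&D-Agent's actual code: both agents are deterministic stubs standing in for LLM calls, and the names `research_agent`, `development_agent`, `run_loop`, the momentum `lookback` hypothesis, and the synthetic price series are all assumptions made for this sketch.

```python
import random
from dataclasses import dataclass


@dataclass
class Experiment:
    lookback: int       # hypothetical idea: momentum lookback window
    score: float = 0.0  # objective metric, filled in by the backtest


def make_prices(n=300, seed=7):
    """Synthetic price series (assumption: stands in for real market data)."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n - 1):
        prices.append(prices[-1] * (1.0 + rng.gauss(0.0003, 0.01)))
    return prices


def research_agent(history):
    """Stub Research Agent: propose an untried idea near the best one so far."""
    if not history:
        return Experiment(lookback=5)
    tried = {e.lookback for e in history}
    best = max(history, key=lambda e: e.score)
    step = 1
    while True:
        for candidate in (best.lookback + step, best.lookback - step):
            if candidate >= 2 and candidate not in tried:
                return Experiment(lookback=candidate)
        step += 1


def development_agent(exp, prices):
    """Stub Development Agent: implement the idea and backtest it."""
    pnl = 0.0
    for t in range(exp.lookback, len(prices) - 1):
        signal = 1.0 if prices[t] > prices[t - exp.lookback] else -1.0
        pnl += signal * (prices[t + 1] / prices[t] - 1.0)
    exp.score = pnl  # feedback from execution
    return exp


def run_loop(rounds=10):
    """Iterate propose -> implement -> evaluate, keeping every result as feedback."""
    prices = make_prices()
    history = []
    for _ in range(rounds):
        history.append(development_agent(research_agent(history), prices))
    return max(history, key=lambda e: e.score), history


if __name__ == "__main__":
    best, history = run_loop()
    print(best.lookback, round(best.score, 4))
```

Because every proposal is turned into code and scored by the same backtest on the same data, rerunning the loop reproduces the same history, which is the sense in which a code-oriented loop is repeatable and reviewable.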
16.
17.
• The gray dashed lines represent the top metrics achieved by previously open-sourced solutions on Qlib.
• Through 52 rounds of automated evolution over 18 hours, a comprehensively enhanced intelligent quant investment solution was developed, outperforming those solutions on all four key metrics.
19. 02
Large Market Model (LMM) & Its Universal Financial Market Simulation Engine (MarS)
Weiqing Liu (刘炜清)
Microsoft Research Asia
In collaboration with Junjie Li, Yang Liu, Chang Xu, Shikai Fang, Lewen Wang, and Jiang Bian
20. ❑ Building ML models on market indicators, but rarely going inside the market itself.
[Diagram labels: Investor / Regulator; ML Model from MSRA; Indicators of the market; Financial Market]
21. • Challenge for traditional ML methods
• Opportunity for Large Foundation Model
22. ➢ Individual perspective (microscopic): tokenization of a single order
➢ Market perspective (macroscopic): tokenization of an order-batch
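The slide does not spell out the tokenization scheme, so the following is only a minimal sketch of what tokenizing a single order could look like. The chosen fields (side, price offset from the mid price in ticks, volume, time since the previous order), the bucket boundaries, and the names `Order`, `tokenize`, and `detokenize` are all assumptions for illustration, not the LMM's actual vocabulary.

```python
from dataclasses import dataclass

SIDES = ("buy", "sell")
PRICE_LEVELS = 21                    # offsets -10..+10 ticks from mid (assumed)
VOLUME_BUCKETS = (1, 10, 100, 1000)  # bucket lower bounds (assumed)
INTERVAL_BUCKETS = (0, 1, 10, 100)   # ms since previous order, lower bounds (assumed)
VOCAB_SIZE = len(SIDES) * PRICE_LEVELS * len(VOLUME_BUCKETS) * len(INTERVAL_BUCKETS)


@dataclass(frozen=True)
class Order:
    side: str
    price_offset_ticks: int
    volume: int
    interval_ms: int


def bucket(value, lower_bounds):
    """Index of the last bucket whose lower bound is <= value (0 if none)."""
    idx = 0
    for i, bound in enumerate(lower_bounds):
        if value >= bound:
            idx = i
    return idx


def tokenize(order):
    """Pack the discretized fields into one integer token (mixed-radix code)."""
    half = PRICE_LEVELS // 2
    side = SIDES.index(order.side)
    price = max(-half, min(half, order.price_offset_ticks)) + half
    vol = bucket(order.volume, VOLUME_BUCKETS)
    gap = bucket(order.interval_ms, INTERVAL_BUCKETS)
    token = side * PRICE_LEVELS + price
    token = token * len(VOLUME_BUCKETS) + vol
    token = token * len(INTERVAL_BUCKETS) + gap
    return token


def detokenize(token):
    """Recover (side, price offset, volume bucket, interval bucket) from a token."""
    token, gap = divmod(token, len(INTERVAL_BUCKETS))
    token, vol = divmod(token, len(VOLUME_BUCKETS))
    side, price = divmod(token, PRICE_LEVELS)
    return SIDES[side], price - PRICE_LEVELS // 2, vol, gap


if __name__ == "__main__":
    order = Order(side="sell", price_offset_ticks=3, volume=250, interval_ms=5)
    print(tokenize(order), detokenize(tokenize(order)))
```

Under these assumptions each order maps to one id in a fixed vocabulary, so an order stream becomes a token sequence that an auto-regressive model can be trained on. An order-batch token would need a different, aggregate encoding that this sketch does not attempt.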
23. Transformer-based Auto-regressive Training on Two Kinds of Token Series
Order-level historical market data → Training → Large Market Model
• Order Model: individual perspective (microscopic)
• Order-batch Model: market perspective (macroscopic)
24. Tradition
New Paradigm
LMM
25. Traditional Methods
• Need redesign for new targets.
New Approach
• Uses recent real data as input to LMM.
• Generates future order flows and corresponding market trajectories over multiple rounds of generation.
• Derives any indicator’s point / distributional forecast across diverse horizons from multiple future market trajectories.
Enhanced Performance
• Outperforms traditional algorithms in predictions.
• Evidence of LMM’s powerful modelling capabilities.
[Figure: 3-class classification of price prediction, LMM (1.02B) vs. DeepLOB (5 different models); horizontal axis: horizons from 1-min to 5-min; vertical axis ticks from 0.4 to 0.7.]
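How a forecast can be read off multiple generated trajectories is sketched below. This is a toy illustration under stated assumptions: `rollout` is a random-walk stand-in for one LMM generation (the real model generates order flows), the threshold used for the three classes is arbitrary, one step is taken to be one minute, and 16 rollouts is simply the figure mentioned in the rollout footnote elsewhere in the deck.

```python
import random
from collections import Counter

CLASSES = ("down", "flat", "up")


def rollout(last_price, steps, rng):
    """Stand-in for one generated trajectory: a random-walk mid-price path."""
    path = [last_price]
    for _ in range(steps):
        path.append(path[-1] * (1.0 + rng.gauss(0.0, 0.001)))
    return path


def classify(start, end, threshold=0.0005):
    """Map a relative price change to one of three classes (threshold assumed)."""
    change = end / start - 1.0
    if change > threshold:
        return "up"
    if change < -threshold:
        return "down"
    return "flat"


def forecast(last_price, horizons=(1, 2, 3, 4, 5), n_rollouts=16, seed=0):
    """Derive a distributional and a point forecast per horizon from rollouts."""
    rng = random.Random(seed)
    steps = max(horizons)
    paths = [rollout(last_price, steps, rng) for _ in range(n_rollouts)]
    result = {}
    for h in horizons:
        votes = Counter(classify(p[0], p[h]) for p in paths)
        dist = {c: votes[c] / n_rollouts for c in CLASSES}  # distributional forecast
        point = max(CLASSES, key=dist.get)                  # point forecast by majority
        result[h] = (point, dist)
    return result


if __name__ == "__main__":
    for horizon, (point, dist) in forecast(100.0).items():
        print(horizon, point, dist)
```

The same set of trajectories serves every horizon and every indicator: a new target only needs a different `classify`-style function applied to the paths, with no retraining, which is the contrast with methods that need redesign for new targets.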
26. It’s all about speed, speed, speed!
A ~15x speed-up in response time turns paper gains into real-world promise.
*Accuracy drops significantly with fewer than 16 rollouts.
27. MarS: Controllable and Interactable Financial Market Simulation Powered by LMM
[Diagram labels: Train; Deploy; LMM; MarS; Digital Twin; Control (Vague Description of Target Scenario); Interact (User-Injected Orders)]
28. “Shaping the Future Based on Realized Realities”
“Blending Next Round of Future”
“Reflecting Immediate Impact of User Interaction”
“Electing the Best from Every Possible Future”
29.
30. * Rama Cont. Empirical properties of asset returns: stylized facts and statistical issues. Quantitative finance, 1(2):223, 2001.
31. Simulation with Market Impact of Injected Orders
• Traditional Methods
• Use real transactions for feedback; costly & risky.
• MarS Benefits
• Simulates accurate market impact curves.
• Cost-effective alternative to data collection.
• Enables data-driven research on collected data.
• Overall Advantage
• Evidence that MarS effectively represents a digital twin of the real market.
• Defines a new way for financial experts to understand their area.
[Figure: market impact of injected orders vs. order size. Simulation results align well with the empirical formula from real transactions (≈ observing g = 9.8 m/s² in Sora).]
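The slide does not name the empirical formula it compares against. One commonly cited form is the square-root law, in which impact grows roughly with the square root of order size. The sketch below assumes that form and shows how an impact curve could be checked against it by fitting a power law in log-log space. The synthetic data, the constant 0.02, the noise level, and the name `fit_power_law` are illustrative assumptions, not MarS outputs.

```python
import math
import random


def fit_power_law(sizes, impacts):
    """Least-squares fit of log(impact) = log(c) + delta * log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(i) for i in impacts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    delta = sxy / sxx
    c = math.exp(mean_y - delta * mean_x)
    return c, delta


# Synthetic "simulated" impact curve (assumption: square-root law plus noise).
rng = random.Random(1)
sizes = [10 * k for k in range(1, 51)]
impacts = [0.02 * math.sqrt(s) * math.exp(rng.gauss(0.0, 0.05)) for s in sizes]

c, delta = fit_power_law(sizes, impacts)

if __name__ == "__main__":
    print(f"fitted exponent: {delta:.3f}, fitted scale: {c:.4f}")
```

A fitted exponent close to 0.5 on simulated impacts would be the kind of agreement with real-transaction regularities that the slide describes. With real simulator output, `sizes` and `impacts` would be replaced by the injected order sizes and the measured price moves.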
32. One Sentence Takeaway: We are creating (1) a universal Market Simulation Engine that serves as an interaction layer for investors, risk managers, and regulators, and (2) enabling agents to propose and evolve strategies, make decisions, and simulate outcomes through the LMM.
Weiqing Liu (刘炜清): Weiqing.Liu@microsoft.com