Large Models in Finance: From Innovation to Real-World Impact


1. Large Models in Finance: From Innovation to Real-World Impact
Speaker: 刘炜清 (Weiqing Liu)

7. 9 years of deep collaboration with financial industry partners. Research topics from real tasks. Research results into real products.
• 2017, Intelligent Investment: Together with our partners, we launched the exploration of applying AI techniques to quantitative investment.
• 2018, Techniques applied: Our partners started implementing our research and techniques in their real-world production systems with great success and reached the highest level of performance thus far.
• 2019, Key Research Challenges of AI for Finance: We summarized and addressed several key research challenges in the field of AI for Finance, and we keep updating our techniques in practice with our latest research findings.
• 2020, Release of Qlib: We open-sourced our research toolset and platform, Qlib, a popular open-source quant investment framework, on GitHub. It supports multiple learning paradigms, covers the entire process of quantitative investment, and has received more than 28k stars so far.
• 2021, Anti-Money Laundering: We expanded our industrial partnership into the areas of Regtech, Anomaly Detection, and Fraud Detection, and applied AI techniques to a real-world anti-money-laundering scenario with high performance.

8. Qlib: https://github.com/microsoft/qlib
Design goal: bridging the gap between research and production in quant investment.
Qlib is an AI-oriented quant investment platform that aims to use AI technology to empower quant research, from exploring ideas to implementing production systems. Qlib supports diverse ML modeling paradigms, including supervised learning (2020), reinforcement learning (2022), and meta-learning (2024), and is now equipped with LLM-based agents (2025) to automate the R&D process.

9. Financial Domain Agent
• Advantages: easily fuses multi-modality data sources.
• Concerns and limitations: (1) non-deterministic results; (2) hard to evaluate; (3) hard to improve further; (4) hard to regulate.
Financial Domain Foundation Model
• Advantages: higher intelligence within the financial domain.
• Concerns and limitations: hard to handle non-natural-language financial data.
Our vision (Finance AI 2.0): deterministic agents with self-evolution in a market-in-the-loop system.
Our approach: Code-Oriented Agentic Automation and Domain-Native Foundation Models.

10. [Figure: R&D Automation Agent; Code-based Solutions #1, #2, #3; Model Training; Trained Model; Domain-Specific Tools and Frameworks; Digital-twin-level Simulation; World-model-level Domain-Native Foundation Models]
❑ Foundation models go domain-native → Train on order-level market data (not just text) to capture microstructure and show scaling behavior → realistic, controllable generation (LMM).
❑ Evaluation goes simulation-centric → Move from offline benchmarks to market-in-the-loop stress tests and what-ifs → policy, risk, and compliance measured before capital is at risk (MarS).
❑ Agentic R&D goes deterministic → LLM agents generate code, run backtests, and auto-iterate with objective metrics → powerful, repeatable, reviewable decisions (Qlib + R&D-Agent).

12. 01 Quant Research Automation powered by Qlib and R&D-Agent
刘炜清 (Weiqing Liu), Microsoft Research Asia
In collaboration with Xu Yang, Xiao Yang, Shikai Fang, Yuge Zhang, Zehua Wang, Yelong Shen, Weizhu Chen, and Jiang Bian

13. ❑ Building mainstream ML models on specially designed tasks and datasets.
❑ Continuous expert-led refinement through trial, exploration, and adaptation.
[Figure: Intelligence; Solution]

14. • Research Agent: ideation & hypothesis crafting; generates high-quality ideas & experiment plans.
• Development Agent: implementation & validation; produces robust, production-ready code.

15. [Figure: Research Agent; Feedback on Quality; Development Agent; Feedback on Training and Execution; Implemented Solution; External Environment]
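The Research Agent and Development Agent loop on slides 14 and 15 can be sketched as a propose, implement, evaluate cycle. This is a minimal, hypothetical Python illustration, not R&D-Agent's actual API: `RDLoop`, `propose_hypothesis`, `implement`, and the `evaluate` callback are invented names, a "hypothesis" is reduced to a single lookback-window length, and the external environment is a plain scoring function standing in for a backtest.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Trial:
    hypothesis: int         # hypothetical: a lookback window proposed by the Research Agent
    code_ok: bool           # feedback on training and execution (did the solution run?)
    score: Optional[float]  # feedback on quality from the external environment


@dataclass
class RDLoop:
    evaluate: Callable[[int], float]  # stand-in for the external environment (e.g. a backtest)
    history: List[Trial] = field(default_factory=list)

    def propose_hypothesis(self) -> int:
        """Research Agent (toy): explore around the best-scoring window so far, or start at 5."""
        scored = [t for t in self.history if t.score is not None]
        if not scored:
            return 5
        best = max(scored, key=lambda t: t.score)
        tried = {t.hypothesis for t in self.history}
        for step in (1, -1, 2, -2, 3, -3):
            candidate = best.hypothesis + step
            if candidate > 0 and candidate not in tried:
                return candidate
        return best.hypothesis  # nothing new nearby: re-run the best idea

    def implement(self, window: int) -> bool:
        """Development Agent (toy): the 'implementation' fails for invalid windows."""
        return window > 0

    def run(self, rounds: int) -> Trial:
        for _ in range(rounds):
            hypothesis = self.propose_hypothesis()
            ok = self.implement(hypothesis)
            score = self.evaluate(hypothesis) if ok else None
            self.history.append(Trial(hypothesis, ok, score))
        return max((t for t in self.history if t.score is not None), key=lambda t: t.score)
```

Here the quality feedback (`score`) steers the next proposal, while the execution feedback (`code_ok`) is recorded separately, mirroring the two feedback labels on slide 15; a real agent would replace both toy methods with LLM calls and actual code execution.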

17. • The gray dashed lines represent the top metrics achieved by previously open-sourced solutions on Qlib.
• Through 52 rounds of automated evolution over 18 hours, a comprehensively enhanced intelligent quant investment solution was developed that outperforms those solutions on all four key metrics.
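"Outperforming on all four key metrics" is a per-metric dominance check against the baselines drawn as gray dashed lines. A minimal sketch; the slide does not name the four metrics, so the metric names and their higher-or-lower-is-better directions below are hypothetical placeholders.

```python
from typing import Dict

# Hypothetical metric directions: True if higher is better.
HIGHER_IS_BETTER: Dict[str, bool] = {
    "ic": True,
    "icir": True,
    "annualized_return": True,
    "max_drawdown": False,  # drawdown magnitude: lower is better
}


def outperforms_all(candidate: Dict[str, float], baseline: Dict[str, float]) -> bool:
    """Return True only if the candidate strictly beats the baseline on every metric."""
    for name, higher_better in HIGHER_IS_BETTER.items():
        c, b = candidate[name], baseline[name]
        # A tie or a loss on any single metric disqualifies the candidate.
        if (c <= b) if higher_better else (c >= b):
            return False
    return True
```

Requiring a strict win on every metric is deliberately stricter than improving an average score: a candidate that trades higher return for a deeper drawdown does not count.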

19. 02 Large Market Model (LMM) & Its Universal Financial Market Simulation Engine (MarS)
刘炜清 (Weiqing Liu), Microsoft Research Asia
In collaboration with Junjie Li, Yang Liu, Chang Xu, Shikai Fang, Lewen Wang, and Jiang Bian

20. ❑ Building ML models on market indicators, but rarely going inside the market itself.
[Figure: Investor / Regulator; ML Model from MSRA; Indicators of the market; Financial Market]

21. • Challenge for traditional ML methods
• Opportunity for large foundation models

22. ➢ Individual perspective (microscopic): tokenization of a single order
➢ Market perspective (macroscopic): tokenization of an order batch
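The two tokenization perspectives can be illustrated with a toy scheme. This is a hypothetical sketch, not the LMM's actual tokenizer: the order fields (side, price offset from mid in ticks, volume, inter-arrival time), the bucket edges, and the batch summary statistics (volume imbalance and order count) are all assumptions made for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Order:
    side: int         # 0 = buy, 1 = sell (assumed encoding)
    price_ticks: int  # price offset from the mid price, in ticks
    volume: int       # shares
    dt_ms: int        # time since the previous order, in milliseconds


# Hypothetical bucket edges: each field maps to a small discrete vocabulary.
PRICE_EDGES = [-5, -2, 0, 1, 3, 6]     # 7 price buckets
VOLUME_EDGES = [100, 500, 1000, 5000]  # 5 volume buckets
DT_EDGES = [1, 10, 100, 1000]          # 5 time buckets


def order_token(o: Order) -> int:
    """Microscopic view: map ONE order to a single integer token id in [0, 350)."""
    p = bisect_right(PRICE_EDGES, o.price_ticks)
    v = bisect_right(VOLUME_EDGES, o.volume)
    t = bisect_right(DT_EDGES, o.dt_ms)
    # Mixed-radix encoding: side(2) x price(7) x volume(5) x time(5) = 350 ids.
    return ((o.side * 7 + p) * 5 + v) * 5 + t


def batch_token(orders: List[Order]) -> int:
    """Macroscopic view: summarize a batch (e.g. one time window) as one token in [0, 20)."""
    buy = sum(o.volume for o in orders if o.side == 0)
    sell = sum(o.volume for o in orders if o.side == 1)
    imbalance = (buy - sell) / max(buy + sell, 1)
    imbalance_bucket = bisect_right([-0.5, -0.1, 0.1, 0.5], imbalance)  # 5 buckets
    count_bucket = bisect_right([5, 20, 100], len(orders))              # 4 buckets
    return imbalance_bucket * 4 + count_bucket
```

The microscopic token keeps per-order detail, while the batch token deliberately discards it in favor of market-level aggregates; the two vocabularies are independent.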

23. Transformer-based auto-regressive training on two kinds of token series
[Figure: Order-level Historical Market Data; Order Model (individual perspective, microscopic); Order-batch Model (market perspective, macroscopic); Training; Large Market Model]
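Auto-regressive training on a token series means learning to predict each token from the tokens before it. The sketch below shows only the data side of that objective plus a toy stand-in model: it builds (context, next-token) pairs and fits a bigram count model. The LMM itself is a Transformer; the bigram model (`BigramModel`, a hypothetical name) is a deliberately simple substitute used only to make the next-token objective concrete.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def next_token_pairs(tokens: Sequence[int], context: int) -> List[Tuple[Tuple[int, ...], int]]:
    """Slide a window over the series: each target is predicted from the `context` tokens before it."""
    return [(tuple(tokens[i - context:i]), tokens[i]) for i in range(context, len(tokens))]


class BigramModel:
    """Toy auto-regressive model: P(next | previous token) estimated by counting."""

    def __init__(self) -> None:
        self.counts: Dict[int, Counter] = defaultdict(Counter)

    def fit(self, series: List[Sequence[int]]) -> "BigramModel":
        for tokens in series:
            for ctx, target in next_token_pairs(tokens, context=1):
                self.counts[ctx[0]][target] += 1
        return self

    def prob(self, prev: int, nxt: int) -> float:
        total = sum(self.counts[prev].values())
        return self.counts[prev][nxt] / total if total else 0.0

    def generate(self, start: int, length: int) -> List[int]:
        """Greedy decoding: repeatedly append the most likely next token."""
        out = [start]
        for _ in range(length):
            followers = self.counts.get(out[-1])
            if not followers:
                break  # unseen context: stop generating
            out.append(followers.most_common(1)[0][0])
        return out
```

Fitting one such model on order tokens and another on order-batch tokens mirrors the Order Model and Order-batch Model on slide 23, though at a vastly smaller scale and with a far weaker model.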

24. [Figure: Traditional paradigm vs. new paradigm (LMM)]

25. Traditional methods
• Need to be redesigned for each new target.
New approach
• Uses recent real data as input to the LMM.
• Generates future order flows and the corresponding market trajectories.
• Derives point or distributional forecasts of any indicator, across diverse horizons, from multiple generated future market trajectories (multiple rounds of generation).
Enhanced performance
• Outperforms traditional algorithms in prediction.
• Evidence of the LMM's powerful modeling capabilities.
[Figure: 3-class classification of price prediction at 1-min to 5-min horizons; LMM (1.02B) vs. DeepLOB (5 different models); y-axis ticks 0.4 to 0.7]
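The new approach turns generation into forecasting: sample many future trajectories, compute an indicator on each, and aggregate. A minimal sketch, assuming a hypothetical `generate_trajectory` stand-in (a seeded Gaussian random walk of mid prices, NOT the LMM) and an assumed flat band of plus or minus `threshold` for the 3-class up/flat/down label.

```python
import random
import statistics
from typing import Callable, Dict, List


def generate_trajectory(last_price: float, steps: int, rng: random.Random) -> List[float]:
    """Hypothetical stand-in for LMM generation: a Gaussian random walk of mid prices.

    The path starts at `last_price`, so path[0] is the realized present.
    """
    path = [last_price]
    for _ in range(steps):
        path.append(path[-1] + rng.gauss(0.0, 0.05))
    return path


def price_change(path: List[float]) -> float:
    """Example indicator: mid-price change over the whole horizon."""
    return path[-1] - path[0]


def forecast(last_price: float, horizon: int, n_rollouts: int,
             indicator: Callable[[List[float]], float],
             threshold: float = 0.02, seed: int = 0) -> Dict[str, object]:
    """Point, distributional, and 3-class forecasts from many sampled futures."""
    rng = random.Random(seed)
    values = [indicator(generate_trajectory(last_price, horizon, rng))
              for _ in range(n_rollouts)]
    up = sum(v > threshold for v in values) / n_rollouts
    down = sum(v < -threshold for v in values) / n_rollouts
    deciles = statistics.quantiles(values, n=10)  # 9 cut points
    return {
        "point": statistics.fmean(values),
        "p10": deciles[0],
        "p90": deciles[-1],
        "class_probs": {"up": up, "down": down, "flat": 1.0 - up - down},
    }
```

The design point is that the generator knows nothing about the target: a new indicator or horizon only changes the `indicator` function or `horizon` argument, whereas the traditional methods on this slide need redesigning for each new target.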

26. It’s all about speed, speed, speed! A ~15x speed-up in response time turns the paper gains into real-world promise.
*Accuracy drops significantly with fewer than 16 rollouts.
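One standard way to see why the rollout count matters (a general Monte Carlo fact, offered as a plausible reading of the footnote rather than the authors' analysis): if an indicator forecast is the mean of N independent rollouts whose per-rollout standard deviation is σ, its standard error shrinks only with the square root of N.

```latex
\mathrm{SE}(\hat{\mu}_N) = \frac{\sigma}{\sqrt{N}},
\qquad
\frac{\mathrm{SE}(\hat{\mu}_{4})}{\mathrm{SE}(\hat{\mu}_{16})} = \sqrt{\frac{16}{4}} = 2
```

So cutting from 16 rollouts to 4 doubles the noise in each forecast, while every extra rollout costs generation time; this is one reason faster generation matters for practical use.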

27. MarS: Controllable and Interactable Financial Market Simulation Powered by LMM
[Figure: Train; Deploy; LMM; MarS Digital Twin; Control: Vague Description of Target Scenario; Interact: User-Injected Orders]

28. • “Shaping the Future Based on Realized Realities”
• “Blending Next Round of Future”
• “Reflecting Immediate Impact of User Interaction”
• “Electing the Best from Every Possible Future”
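One possible reading of the four phrases on slide 28 is a generate-and-select loop: condition on realized history, sample several possible next rounds, blend user-injected orders into the chosen round so their impact feeds into later rounds, and elect the best candidate against a control objective. The sketch below is that interpretation only, not MarS's published algorithm; `simulate`, `toy_propose`, `target_flow`, the signed-integer order batches, and the net-flow objective are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional

Batch = List[int]  # hypothetical: one batch of signed order sizes (+ buy, - sell)


def simulate(history: List[Batch], steps: int, n_candidates: int,
             propose: Callable[[List[Batch], random.Random], Batch],
             score: Callable[[List[Batch], Batch], float],
             user_orders: Optional[Dict[int, Batch]] = None,
             seed: int = 0) -> List[Batch]:
    """Generate-and-select loop; returns only the simulated (future) batches."""
    rng = random.Random(seed)
    user_orders = user_orders or {}
    state = list(history)  # start from realized market history
    for t in range(steps):
        # Sample several possible next batches conditioned on everything so far.
        candidates = [propose(state, rng) for _ in range(n_candidates)]
        # Elect the candidate that best matches the control objective.
        best = max(candidates, key=lambda c: score(state, c))
        # Blend user-injected orders into the elected batch; because they enter
        # `state`, later proposals react to them.
        state.append(best + user_orders.get(t, []))
    return state[len(history):]


def toy_propose(state: List[Batch], rng: random.Random) -> Batch:
    """Toy generator: random orders with a drift that follows the last batch's net flow."""
    drift = 0 if not state else (1 if sum(state[-1]) > 0 else -1)
    return [rng.choice([-1, 1]) * rng.randint(1, 5) + drift for _ in range(3)]


def target_flow(target: int) -> Callable[[List[Batch], Batch], float]:
    """Toy control signal: prefer batches whose net flow is close to `target`."""
    return lambda state, batch: -abs(sum(batch) - target)
```

Seeding the generator keeps every run reproducible, which makes what-if comparisons (same seed, with and without injected orders) straightforward.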

30. *Rama Cont. Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance, 1(2):223, 2001.
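Cont's stylized facts are a common checklist for judging whether simulated returns look realistic. A minimal sketch of three of them computed from a return series: heavy tails (positive excess kurtosis), near-zero autocorrelation of raw returns, and volatility clustering (positive autocorrelation of absolute returns). The slide does not say which statistics MarS is checked against, so choosing these three is an assumption, and the function names are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def autocorr(x: Sequence[float], lag: int = 1) -> float:
    """Sample autocorrelation at the given lag (assumes x is not constant)."""
    mean = statistics.fmean(x)
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(len(x) - lag))
    return cov / var


def excess_kurtosis(x: Sequence[float]) -> float:
    """Fourth standardized moment minus 3: 0 for a Gaussian, > 0 means heavy tails."""
    mean = statistics.fmean(x)
    m2 = statistics.fmean([(v - mean) ** 2 for v in x])
    m4 = statistics.fmean([(v - mean) ** 4 for v in x])
    return m4 / m2 ** 2 - 3.0


def stylized_facts(returns: Sequence[float], lag: int = 1) -> Dict[str, float]:
    return {
        "excess_kurtosis": excess_kurtosis(returns),
        "acf_returns": autocorr(returns, lag),                        # expected near 0
        "acf_abs_returns": autocorr([abs(r) for r in returns], lag),  # expected > 0
    }
```

Comparing these numbers for real and simulated returns side by side is a cheap first realism test before more detailed checks.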

31. Injected Orders
Traditional methods
• Use real transactions for feedback; costly and risky.
MarS benefits
• Simulates accurate market impact curves.
• Cost-effective alternative to data collection.
• Enables data-driven research on the collected data.
Simulating the market impact of injected orders: simulation results align well with the empirical formula from real transactions.
• Evidence that MarS effectively represents a digital twin of the real market.
• Defines a new way for financial experts to understand their area.
• Overall advantage ≈ observing g = 9.8 m/s² in Sora.
[Figure: simulated market impact of injected orders; x-axis: Order Size]
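The slide does not name the empirical formula. A widely cited candidate is the square-root law of market impact, I(Q) ≈ Y·σ·sqrt(Q/V), where Q is the order size, V the daily volume, σ the daily volatility, and Y a constant of order one. Assuming that law, one way to check whether a simulated impact curve aligns with it is to fit the exponent δ in I ∝ Q^δ by least squares in log-log space and compare it with 0.5. The function names are hypothetical and the test data is synthetic.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], impacts: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(impact) = log(a) + delta * log(size); returns (a, delta)."""
    xs = [math.log(q) for q in sizes]
    ys = [math.log(i) for i in impacts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    delta = sxy / sxx
    return math.exp(my - delta * mx), delta


def sqrt_law_impact(q: float, daily_volume: float, sigma: float, y: float = 1.0) -> float:
    """Square-root law: expected relative price impact of a metaorder of size q."""
    return y * sigma * math.sqrt(q / daily_volume)
```

Applied to (order size, simulated impact) pairs from a simulator, a fitted δ close to 0.5 would indicate agreement with the square-root law; a δ near 1 would instead suggest linear impact.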

32. One-sentence takeaway: We are creating a universal Market Simulation Engine that (1) serves as an interaction layer for investors, risk managers, and regulators, and (2) enables agents to propose and evolve strategies, make decisions, and simulate outcomes through the LMM.
Contact: 刘炜清 (Weiqing Liu), Weiqing.Liu@microsoft.com
