AI Agents: From the Lab to the Enterprise

1. 章毅俞舟

3. AI Agents: Labs -> World. Zhou (Jo) Yu, Columbia University & Arklex AI

4. Arklex.AI Founder, CEO: Dr. Zhou (Jo) Yu
  - Columbia University CS Professor (CMU PhD)
  - Open-source AI models (1M+ downloads)
  - AI Consultant, Microsoft Research
  - Forbes 30 Under 30
  [Logo: amazon prize]
  © 2025 Arklex.AI - All rights reserved

5. What are AI agents?
  - Perception: multimodal inputs, including text, image, audio, video, touch, etc.
  - Planning (Inner Monologue): Chain-of-Thought reasoning
  - Reflection: meta-reasoning at every step
  - Actions: function/tool calling, embodied actions
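The four components on this slide can be read as one control loop. The following is a minimal sketch in Python, assuming a hypothetical `run_agent` helper with toy `toy_plan` and `toy_reflect` functions and a single `upper` tool. None of these names come from Arklex or any other library; a real agent would back each step with an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, str]  # (tool name, tool argument); a hypothetical format


@dataclass
class AgentState:
    observations: List[str] = field(default_factory=list)
    reflections: List[str] = field(default_factory=list)


def run_agent(
    observation: str,
    tools: Dict[str, Callable[[str], str]],
    plan: Callable[[AgentState], Optional[Action]],
    reflect: Callable[[AgentState, str], str],
    max_steps: int = 5,
) -> AgentState:
    """Run a perceive -> plan -> act -> reflect loop until the planner stops."""
    state = AgentState()
    for _ in range(max_steps):
        state.observations.append(observation)    # Perception
        action = plan(state)                      # Planning
        if action is None:                        # planner decides the task is done
            break
        tool_name, argument = action
        observation = tools[tool_name](argument)  # Action: tool call
        state.reflections.append(reflect(state, observation))  # Reflection
    return state


# Toy stand-ins for the LLM-backed planner and reflector (assumptions).
def toy_plan(state: AgentState) -> Optional[Action]:
    latest = state.observations[-1]
    return None if latest.isupper() else ("upper", latest)


def toy_reflect(state: AgentState, result: str) -> str:
    return f"step {len(state.observations)}: tool returned {result!r}"


final_state = run_agent("hello", {"upper": str.upper}, toy_plan, toy_reflect)
```

In this toy run the planner calls the tool once, sees the upper-cased observation on the next pass, and stops.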

6. The field will continue to push the frontier of capability
  “Levels of AI” (Sam Altman on Levels of AI)
  - Level 1: Chatbots (2022 onwards)
  - Level 2: Reasoners (2024 onwards)
  - Level 3: Agents (2025 onwards)
  - Level 4: Innovators (202?)
  - Level 5: Organizations (20??)
  Currently: tasks that take seconds to hours (OpenAI Operator, deep research models, Claude Code, Manus)
  Eventually: hours to days of work
  Tasks that are at the edge of human performance, or are totally new:
  - New scientific discoveries (Alpha*)
  - Prize-winning writing
  - Unsolved mysteries
  Huge demand for specialists in other fields

8. Current Agent Ecosystem
  - AI Agent Vertical Applications: leverage the entire agentic infrastructure to further enhance agents’ usability through UI/UX, verifiers, human intervention, human teaching, etc., to provide real ROI to enterprises and individuals
  - Agent Orchestration Framework: leverage search, planning algorithms, tool use, scaffolds, and guardrails to improve agent tasks’ success rate, efficiency, security, etc.
  - Arklex Agent Foundation Models: continued training and reinforcement learning to adapt foundation models for agent tasks
  - Foundation Models (OpenAI, Anthropic, etc.)

11. How does Arklex Work? TaskGraph and continual learning
  1. Task Graph Generation: language instruction + API → TaskGraph
  2. Human Expert Review: the business team adjusts and verifies the TaskGraph with an interactive UI.
  3. Control and Compliance: human hand-over for security and safety
  4. Continuous Learning: agents automatically update the TaskGraph based on human-agent interactions (offline agent-human simulation + human teaching interactions)
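To make the four steps concrete, here is a minimal sketch of a task graph with a human-review flag and a hand-over fallback. It is an illustration only: `TaskGraph`, `TaskNode`, `HANDOVER`, the intent names, and the tool names are hypothetical and do not reflect the actual Arklex data model or API. In the real system the graph would be generated from the instruction and APIs rather than built by hand.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

HANDOVER = "human_handover"  # hypothetical sentinel for step 3 (control and compliance)


@dataclass
class TaskNode:
    name: str
    tool: Optional[str] = None   # API the node would call (names are made up)
    approved: bool = False       # set by a human reviewer (step 2)
    edges: Dict[str, str] = field(default_factory=dict)  # intent -> next node name


@dataclass
class TaskGraph:
    nodes: Dict[str, TaskNode] = field(default_factory=dict)

    def add_node(self, name: str, tool: Optional[str] = None) -> None:
        self.nodes[name] = TaskNode(name, tool)

    def add_edge(self, src: str, intent: str, dst: str) -> None:
        self.nodes[src].edges[intent] = dst

    def approve(self, name: str) -> None:
        self.nodes[name].approved = True

    def next_node(self, current: str, intent: str) -> str:
        """Follow the edge for this intent, or hand over to a human."""
        target = self.nodes[current].edges.get(intent)
        if target is None or not self.nodes[target].approved:
            return HANDOVER
        return target


# Step 1: in this sketch the graph is written by hand.
graph = TaskGraph()
graph.add_node("start")
graph.add_node("recommend_product", tool="shopify_recommendation")
graph.add_node("check_delivery", tool="shopify_orders")
graph.add_edge("start", "browse_products", "recommend_product")
graph.add_edge("start", "track_order", "check_delivery")

# Step 2: a reviewer approves only one node at first.
graph.approve("recommend_product")

first = graph.next_node("start", "browse_products")  # approved path
second = graph.next_node("start", "track_order")     # not yet approved
third = graph.next_node("start", "ask_refund")       # unknown intent

# Step 4: the graph is updated after human feedback.
graph.approve("check_delivery")
fourth = graph.next_node("start", "track_order")
```

The design choice illustrated here is that anything the graph does not cover, or that a human has not yet verified, falls through to hand-over instead of being improvised by the model.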

12. 1) Task Graph Generation
  Instruction: Your name is Elle, and you are a Shopify sales agent of #Store. Make sure you are cheerful and responsive. Your responsibilities include: recommend products, answer product questions, give discounts, check delivery, …
  Available tools:
  - Shopify product database
  - Shopify inventory management
  - Shopify recommendation
  - etc.

14. Sample Customer Conversation 1: Product Discovery
  [Screenshot: a User/Agent chat in which the agent returns its recommendation results]
  (Intent: the user wants to see a list of products based on some keywords)

15. Sample Customer Conversation 2: Proactively Offer Promotion
  Agent: We currently have a promotion on this item: get a second one for 50% off! (Intent: the user is exploring products)
  User: That’s great! Why don’t you add another yellow one to my cart?
  Agent: Got it! I just added the yellow Cool boy baseball hat to your cart. You’re only $5 away from qualifying for free shipping. Want to add anything else?

17. Arklex Agents Get Smarter Over Time

18. Arklex VS. Other Frameworks
  Framework     | Open-Source | Mixed-Control | Action Graph | NLU | Task Composition | Human Intervention | Continual Learning
  Traditional Dialog System:
  DialogFlow    | ✘           | ✘             | rule-based   | ✓   | ✓                | -                  | ✘
  Amazon Lex    | ✘           | ✘             | rule-based   | ✓   | ✓                | -                  | ✘
  RASA          | ✓           | ✘             | rule-based   | ✓   | ✓                | -                  | ✘
  LLM-Based Agent Framework:
  LangChain     | ✓           | ✘             | LangGraph    | ✘   | ✘                | LangGraph          | ✘
  LlamaIndex    | ✓           | ✘             | Workflows    | ✘   | ✘                | Workflows          | ✘
  AG2           | ✓           | ✘             | ✘            | ✘   | ✘                | Always/Never       | ✓
  CrewAI        | ✓           | ✘             | Flows        | ✘   | ✘                | Always/Never       | ✘
  AgentForce    | ✘           | ✓             | -            | -   | ✓                | -                  | -
  OpenAI Swarm  | ✓           | ✘             | ✘            | ✘   | ✘                | Swarm              | ✘
  Arklex        | ✓           | ✓             | ✓            | ✓   | ✓                | ✓                  | ✓

20. Use Case 1: Car Loan Application
  Challenge: The application form is long, so potential customers often cannot complete it in one try, which results in a low completion rate and a high rate of lead loss.
  Solution: The Arklex AI framework enables customers to create a solution that provides 24/7 responses with clarification questions, multi-language support, and a highly personalized experience that scales.
  Result: 40% increase in qualified leads. The company is positioned as a leader in customer conversion and engagement.

21. Use Case 2: E-commerce Shopping Agents for Product Education
  Challenge: GPT-wrapper agents often provide wrong or inaccurate answers, which creates customer frustration and loses customer interest or potential sales.
  Solution: The Arklex AI framework enables customers to create a solution that maintains brand voice, ensures a natural flow of conversation, and guarantees high accuracy of responses.
  Result:
  - 50% more accurate responses, no AI hallucination
  - 30% higher customer satisfaction
  - 50% fewer returns
  - Sales increased by 50%

22. ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning
  Xiao Yu¹, Baolin Peng²†, Vineeth Vajipey¹, Hao Cheng², Michel Galley², Jianfeng Gao²* & Zhou Yu¹*
  †Project Lead; *Equal Advisory Contribution
  ¹Columbia University, NY; ²Microsoft Research, Redmond

23. Background: VLM on Computer Tasks
  - VQA Tasks: Q: “What is he doing?” A: “He is performing a skateboard trick…”
  - Computer Tasks: Q: “Can you help me clear my shopping cart?” Action: click button [shopping cart] …

24. Challenge: computer tasks are extremely difficult because interacting with a computer was not part of VLM (pre-)training

25. Two directions:
  1. Scale test-time compute to improve agent performance
  2. Transfer search knowledge back to the VLM via training

26. Introduce R-MCTS
  R-MCTS = explore decision space and self-improve on-the-fly
  [Talk outline: Introduction, Scaling test-time compute, Transferring search knowledge, Conclusion]

28. Introduce R-MCTS
  R-MCTS = (1) MCTS with contrastive self-reflection

30. Introduce R-MCTS
  R-MCTS = (1) MCTS with contrastive self-reflection and (2) a multi-agent-debate value function
  [Figure: a search tree annotated with Q, N, and V values (Q=0.07, N=1, V=0.07; Q=0.15, N=1, V=0.15; V=0.38), with debate arguments “Good action, because…” and “Bad action, because…” feeding a Judge]

32. Introduce R-MCTS
  - Within each task, R-MCTS performs a tree search to find the best trajectory
  - After each task, R-MCTS performs contrastive self-reflection to improve its future execution
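The "tree search within a task, reflection after a task" pattern can be sketched with a generic MCTS loop using UCT selection. This is not the authors' implementation: `value_fn` is a toy stand-in for the multi-agent-debate value function, `reflect` is a placeholder for contrastive self-reflection, and the string-based environment (`step`, actions `"a"` and `"b"`) is invented for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    state: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: Dict[str, "Node"] = field(default_factory=dict)
    N: int = 0      # visit count
    Q: float = 0.0  # running mean of values backed up through this node


def uct(child: Node, parent_visits: int, c: float) -> float:
    if child.N == 0:
        return math.inf  # always try unvisited children first
    return child.Q + c * math.sqrt(math.log(parent_visits) / child.N)


def search(root_state: str, actions: List[str], step: Callable[[str, str], str],
           value_fn: Callable[[str], float], iterations: int = 20,
           max_depth: int = 2, c: float = 0.5) -> Node:
    root = Node(root_state)
    for _ in range(iterations):
        node, depth = root, 0
        while depth < max_depth:  # selection and expansion
            if not node.children:
                node.children = {a: Node(step(node.state, a), parent=node)
                                 for a in actions}
            parent = node
            best = max(parent.children,
                       key=lambda a: uct(parent.children[a], max(parent.N, 1), c))
            node = parent.children[best]
            depth += 1
            if node.N == 0:
                break
        value = value_fn(node.state)  # evaluation (stub for the debate-based V)
        while node is not None:       # backpropagation
            node.N += 1
            node.Q += (value - node.Q) / node.N
            node = node.parent
    return root


def best_trajectory(root: Node) -> List[str]:
    """Follow the most-visited child at each level."""
    path, node = [], root
    while node.children:
        action, child = max(node.children.items(), key=lambda kv: kv[1].N)
        if child.N == 0:
            break
        path.append(action)
        node = child
    return path


def reflect(memory: List[str], task: str, trajectory: List[str]) -> None:
    # Placeholder: a real system would have a model write a lesson here.
    memory.append(f"{task}: best trajectory was {trajectory}")


# Toy task: states are action strings; more "b" actions means higher value.
memory: List[str] = []
root = search("", ["a", "b"], step=lambda s, a: s + a,
              value_fn=lambda s: s.count("b") / 2)
trajectory = best_trajectory(root)
reflect(memory, "toy task", trajectory)
```

In this toy setup the search concentrates its visits on the branch with the higher value estimates, and the stored reflection would be passed to the agent on later tasks.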

33. R-MCTS Results
  Benchmarks: VisualWebArena and OSWorld
  - Realistic and reproducible
  - Tasks span multiple domains
  [Figure panels: VisualWebArena, OSWorld]

34. R-MCTS Results
  R-MCTS outperforms other search algorithms (ToT, A*, or MCTS)

35. R-MCTS Results
  R-MCTS achieves new SOTA on VisualWebArena, and is highly competitive on OSWorld!
  [Figure panels: VisualWebArena Leaderboard, OSWorld Leaderboard]

36. Two directions:
  1. Scale test-time compute to improve agent performance
  2. Transfer search knowledge back to the VLM via training

37. Introduce Exploratory Learning
  Exploratory Learning = learn to explore, evaluate, and backtrack by training on tree traversals!
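One way to picture "training on tree traversals" is to flatten a search tree into a single sequence that keeps the dead ends, so the model sees exploration, evaluation, and backtracking instead of only the final best path. The sketch below assumes a hypothetical nested-dict tree format and made-up step labels (`act`, `evaluate`, `backtrack`); it is not the authors' data pipeline.

```python
from typing import Any, Dict, List, Tuple

Tree = Dict[str, Any]  # {"value": float, "children": {action: Tree}} (assumed format)
Step = Tuple[str, Any]


def linearize(tree: Tree, best_path: List[str]) -> List[Step]:
    """Flatten a search tree into one training sequence with explicit backtracks."""
    steps: List[Step] = []
    node = tree
    for target in best_path:
        # Replay children in the order the search tried them.
        for action, child in node["children"].items():
            steps.append(("act", action))
            steps.append(("evaluate", child["value"]))
            if action == target:
                node = child
                break
            steps.append(("backtrack", action))  # abandoned branch
        else:
            raise ValueError(f"action {target!r} not found in tree")
    return steps


# Toy tree for the shopping-cart task; values are invented.
toy_tree: Tree = {
    "value": 0.5,
    "children": {
        "click [account]": {"value": 0.2, "children": {}},
        "click [shopping cart]": {
            "value": 0.7,
            "children": {"click [clear cart]": {"value": 0.9, "children": {}}},
        },
    },
}
sequence = linearize(toy_tree, ["click [shopping cart]", "click [clear cart]"])
```

For the toy tree, the resulting sequence contains one abandoned branch followed by the two actions on the best path, each paired with its value estimate.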

39. Exploratory Learning Results
  GPT-4o after exploratory learning on R-MCTS trees exhibits compute-scaling properties without being augmented with a search algorithm!
  [Figure callout: “Eval and backtrack!”]

41. Summary
  ❖ Inference/training methods to improve agent performance
  - R-MCTS to improve agent performance at inference-time
  - Exploratory Learning to improve agent performance at training-time
  ❖ Future work
  - RL methods to reduce reliance on tree search
  - Model predictive control (MPC) methods to reduce expensive environment interactions

42. Arklex: Agent-First Organization Framework
  Join the Arklex open-source community: GitHub, Documentation, Discord

43. DAPLab on AI Agents
  Advisory Board: Richard Zemel, ML, Columbia University, Director of NSF AI Institute
  - Eugene Wu: Data Systems
  - Zhou Yu: NLP
  - Kostis Kaffes: Computer Vision
  - David Blei: ML, Causal Inference
  - Lydia Chilton: Human-AI Interaction
  - Adam Elmachtoub: Operations Research
  - Junfeng Yang: Secure Systems
  - Baishakhi Ray: Software Eng
  - Daniel Hsu: ML Theory
  - Shipra Agrawal: RL
  - Carl Vondrick: Computer Vision
  - Yunzhu Li: Robotics, Agents
  Michael Franklin, CS, University of Chicago, Founder of AMPLab, Co-Director of Chicago’s Data Science Institute
