Building Applications with LLMs From traditional ML engineering to AI engineering
如果无法正常显示,请先停止浏览器的去广告插件。
相关话题:
#zalando
1. Building Applications with
LLMs
From traditional ML
engineering to AI engineering
Sinan Tang @Zalando
Women+ in Data and AI Festival
27.09.2024
2. Agenda
● Demystifying AI engineering
● Meet the Zalando Assistant
● AI engineering techniques:
adapting models
●
AI engineering techniques:
evaluation
● Summary
● Q&A
3. 01
Introduction to AI Engineering
The AI tech stack
4. Demystifying AI Engineering
Welcome to this year’s
Women+ in Data and AI __
5. Demystifying AI Engineering
…
Welcome to this year’s Collaboration
Women+ in Data and AI Conference
Talk
Event
…
6. AI Engineering Stack
7. AI Engineering
vs.
ML Engineering
“it’s less about model
development and
more about adapting
and evaluating
foundation models”
8. 02
Meet the Zalando Assistant
9. Zalando Assistant (ZA)
ZA is an AI-powered experience
allowing Zalando customers to
discover fashion items, style tips and
more by using their own language.
It’s multilingual and able to interact
with customers in any European
language.
10. SE CC 2024
The core ZA experience is
completely dynamic: A fluid
conversation between the customer
and the assistant (powered by
ChatGPT and Zalando’s own
semantic search model).
“I'm looking for a wardrobe refresh. Bright colours,
fun patterns, unusual cuts.”
“I need sport fashion ideas”
“Need help to buy my fiancé a birthday present!”
“I want Stockholm style”
11. Integrating LLM into Zalando Assistant
Main Flow
12. Integrating LLM into Zalando Assistant
Actions
13. Challenges building Zalando Assistant
& working with LLM
14. 03
AI Engineering Techniques
Adapting AI Models
15. How to make AI
work for you
Prompting
What is a good prompt?
The TELeR framework <Turn,
Expression, Level of details, Role>
16. How to make AI
work for you
Few-shot learning
Prompt
Prompting
Image from Language Models are Few-Shot Learners
17. How to make AI
work for you
Few-shot learning
Performance benchmark
Prompting
Image from Language Models are Few-Shot Learners
18. How to make AI
work for you
RAG
Retrieval-Augmented Generation
19. How to make AI
work for you
Limitations
Limitations of prompt engineering
& RAG
❖ Reliance on prompt quality
❖ Complexity and iteration
❖ Domain specificity
❖ Potential bias
❖ Limited control on the output
❖ Limited context window
20. 04
AI Engineering Techniques
Evaluation
21. How do you know
it’s good (enough)
The problem of
evaluation
“Lack of evaluations has
been a key challenge for
deploying to production.”
— OpenAI Dev Day 2023
22. How do you know
it’s good (enough)
Challenges
23. How do you know
it’s good (enough)
Evaluating
open-ended
models
24. ZA: Scalable Evaluation
Evaluation Strategy
● Offline evaluation
○ Objective metrics (punchcard
evaluation)
○ AI-as-a-Judge (LLM-based evaluation)
○ Human & machine annotations
● Comprehensive testing suite
○ Regression tests
○ Unit tests
○ Smoke tests
○ Conversation replay
25. ZA: Scalable Evaluation
Fix!
Monitor Find Understand & improve
We can monitor if a change or a
bugfix has the desired effect on a
granular level. We can more easily find examples
of conversations with specific
issues. We can better understand how
customers are using the agent on
average and improve the experience to
meet their expectations.
26. ZA: Scalable Evaluation
Evaluate conversations
Improve
evaluations
Deploy new version
Improve Zalando
Assistant
Monitor, find and
understand conversations
27. 05
Summary
28.
29. Thank you