Building Applications with LLMs From traditional ML engineering to AI engineering

1. Building Applications with LLMs: From traditional ML engineering to AI engineering. Sinan Tang @Zalando. Women+ in Data and AI Festival, 27.09.2024

2. Agenda ● Demystifying AI engineering ● Meet the Zalando Assistant ● AI engineering techniques: adapting models ● AI engineering techniques: evaluation ● Summary ● Q&A

3. 01 Introduction to AI Engineering The AI tech stack

4. Demystifying AI Engineering Welcome to this year’s Women+ in Data and AI __

5. Demystifying AI Engineering “Welcome to this year’s Women+ in Data and AI __” Candidate next words: Collaboration, Conference, Talk, Event, …

6. AI Engineering Stack

7. AI Engineering vs. ML Engineering “it’s less about model development and more about adapting and evaluating foundation models”

8. 02 Meet the Zalando Assistant

9. Zalando Assistant (ZA) ZA is an AI-powered experience allowing Zalando customers to discover fashion items, style tips and more by using their own language. It’s multilingual and able to interact with customers in any European language.

10. SE CC 2024 The core ZA experience is completely dynamic: a fluid conversation between the customer and the assistant (powered by ChatGPT and Zalando’s own semantic search model). “I'm looking for a wardrobe refresh. Bright colours, fun patterns, unusual cuts.” “I need sport fashion ideas” “Need help to buy my fiancé a birthday present!” “I want Stockholm style”

11. Integrating LLM into Zalando Assistant Main Flow

12. Integrating LLM into Zalando Assistant Actions

13. Challenges building Zalando Assistant & working with LLM

14. 03 AI Engineering Techniques Adapting AI Models

15. How to make AI work for you Prompting What is a good prompt? The TELeR framework <Turn, Expression, Level of details, Role>
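
A minimal sketch of how the four TELeR dimensions named on this slide could be turned into a prompt template. The `TelerPrompt` class, the `build_prompt` function, and the wording of the generated lines are hypothetical and only illustrate the idea; in particular, `level_of_details` is treated here simply as the number of detail bullets to include, which is a simplification and not the taxonomy's own level definitions.

```python
from dataclasses import dataclass


@dataclass
class TelerPrompt:
    """Hypothetical container for the four dimensions on the slide."""
    turn: str              # "single" or "multi" turn interaction
    expression: str        # "question" or "instruction" style
    level_of_details: int  # here: how many detail bullets to include
    role: str              # role given to the model, "" if undefined


def build_prompt(spec: TelerPrompt, task: str, details: list[str]) -> str:
    """Assemble a plain-text prompt from the spec (illustrative only)."""
    lines = []
    if spec.role:
        lines.append(f"You are {spec.role}.")
    if spec.expression == "question":
        lines.append(f"Can you {task}?")
    else:
        # Instruction style: capitalise the task and end with a period.
        lines.append(f"{task[0].upper()}{task[1:]}.")
    # Include only as many detail bullets as the chosen level allows.
    for detail in details[: spec.level_of_details]:
        lines.append(f"- {detail}")
    if spec.turn == "multi":
        lines.append("Ask a follow-up question if the request is unclear.")
    return "\n".join(lines)


spec = TelerPrompt(turn="single", expression="instruction",
                   level_of_details=1, role="a fashion assistant")
print(build_prompt(spec, "suggest three outfits",
                   ["Use bright colours", "Keep it under 50 words"]))
```

Varying one dimension at a time (for example switching `expression` to `"question"`) makes it easier to compare prompts systematically instead of editing free text.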

16. How to make AI work for you Prompting Few-shot learning: prompt (image from “Language Models are Few-Shot Learners”)
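
Few-shot prompting places a handful of worked examples in the prompt before the actual query. The sketch below shows one way to assemble such a prompt as plain text; the function name `few_shot_prompt`, the `Input:`/`Output:` layout, and the intent labels are assumptions made for illustration, not the format used by the Zalando Assistant.

```python
def few_shot_prompt(instruction: str,
                    examples: list[tuple[str, str]],
                    query: str) -> str:
    """Build a prompt with an instruction, k solved examples, and the query."""
    parts = [instruction, ""]
    for text, label in examples:
        parts.append(f"Input: {text}")
        parts.append(f"Output: {label}")
        parts.append("")  # blank line between examples
    # The final item is left unanswered so the model completes it.
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


# Hypothetical intent labels, for illustration only.
examples = [
    ("I need sport fashion ideas", "discover_products"),
    ("What goes well with a denim jacket?", "style_advice"),
]
prompt = few_shot_prompt("Classify the customer's intent.",
                         examples,
                         "I want Stockholm style")
print(prompt)
```

With an empty `examples` list the same function produces a zero-shot prompt, which makes it straightforward to compare zero-, one-, and few-shot variants on the same queries.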

17. How to make AI work for you Prompting Few-shot learning: performance benchmark (image from “Language Models are Few-Shot Learners”)

18. How to make AI work for you RAG Retrieval-Augmented Generation
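
The core RAG loop is: retrieve the documents most relevant to the query, then place them in the prompt as context for generation. The sketch below illustrates this with a toy bag-of-words cosine similarity standing in for a real retriever; the function names, the sample catalogue snippets, and the prompt wording are all hypothetical, and a production system would typically use an embedding or semantic search model instead.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(documents,
                    key=lambda d: cosine(q, Counter(tokenize(d))),
                    reverse=True)
    return ranked[:k]


def build_rag_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Insert the retrieved documents into the prompt as context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {query}\nAnswer:")


# Hypothetical catalogue snippets, for illustration only.
docs = [
    "Red floral summer dress with a wrap cut",
    "Black leather ankle boots",
    "Waterproof running jacket for sport",
]
print(build_rag_prompt("running jacket for sport", docs, k=1))
```

The resulting string would then be sent to the language model; that call is left out here because it depends on the provider.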

19. How to make AI work for you Limitations Limitations of prompt engineering & RAG ❖ Reliance on prompt quality ❖ Complexity and iteration ❖ Domain specificity ❖ Potential bias ❖ Limited control on the output ❖ Limited context window

20. 04 AI Engineering Techniques Evaluation

21. How do you know it’s good (enough) The problem of evaluation “Lack of evaluations has been a key challenge for deploying to production.” — OpenAI Dev Day 2023

22. How do you know it’s good (enough) Challenges

23. How do you know it’s good (enough) Evaluating open-ended models

24. ZA: Scalable Evaluation Evaluation Strategy ● Offline evaluation ○ Objective metrics (punchcard evaluation) ○ AI-as-a-Judge (LLM-based evaluation) ○ Human & machine annotations ● Comprehensive testing suite ○ Regression tests ○ Unit tests ○ Smoke tests ○ Conversation replay
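
A rough sketch of how three of the items above (conversation replay, an objective metric, and an AI-as-a-Judge score) could fit into one offline evaluation run. Everything here is hypothetical: the assistant and the judge are stubbed as plain Python callables so the example runs without any API, and the length-based objective check is only a placeholder, not the punchcard evaluation mentioned on the slide.

```python
from typing import Callable

Assistant = Callable[[str], str]   # user message -> assistant reply
Judge = Callable[[str, str], int]  # (message, reply) -> score from 1 to 5


def replay(conversations: list[list[str]],
           assistant: Assistant) -> list[tuple[str, str]]:
    """Re-send recorded user messages and collect (message, reply) pairs."""
    return [(msg, assistant(msg)) for conv in conversations for msg in conv]


def objective_pass_rate(pairs: list[tuple[str, str]],
                        max_words: int = 50) -> float:
    """Share of replies that are non-empty and within a length limit."""
    ok = [bool(r.strip()) and len(r.split()) <= max_words for _, r in pairs]
    return sum(ok) / len(ok) if ok else 0.0


def mean_judge_score(pairs: list[tuple[str, str]], judge: Judge) -> float:
    """Average score assigned by the judge over all pairs."""
    scores = [judge(m, r) for m, r in pairs]
    return sum(scores) / len(scores) if scores else 0.0


def stub_assistant(message: str) -> str:
    # Stand-in for the real assistant.
    return f"Here are some ideas for: {message}"


def stub_judge(message: str, reply: str) -> int:
    # Stand-in for an LLM judge: rewards replies that echo the request.
    return 5 if message.lower() in reply.lower() else 1


recorded = [["I want Stockholm style"], ["I need sport fashion ideas"]]
pairs = replay(recorded, stub_assistant)
print(objective_pass_rate(pairs), mean_judge_score(pairs, stub_judge))
```

Running the same recorded conversations against each new version and comparing the two numbers is one simple way to use replay as a regression test.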

25. ZA: Scalable Evaluation Fix! ● Monitor: We can monitor if a change or a bugfix has the desired effect on a granular level. ● Find: We can more easily find examples of conversations with specific issues. ● Understand & improve: We can better understand how customers are using the agent on average and improve the experience to meet their expectations.

26. ZA: Scalable Evaluation ● Evaluate conversations ● Improve evaluations ● Deploy new version ● Improve Zalando Assistant ● Monitor, find and understand conversations

27. 05 Summary

29. Thank you