Machine learned search: setting up a production pipeline
1. Machine learned search: setting up a production pipeline
Zalando SE
Maximilian Werk
Senior Research Engineer
20-01-2020
2.
Picture from Pexels.com
4. Programming vs. Software Engineering
Who is a Software Engineer?
Who does programming in their day-to-day work?
Who has a machine learning background?
5.
6. Our Information Retrieval Pipeline
Full text query → spell-correction → NER (named entity recognition) → synonym & acronym recognition → disambiguation → query-builder → Elasticsearch → Articles
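The stages above run in a fixed order, each enriching the output of the previous one. A minimal sketch of that chaining, assuming toy stand-ins for every stage: the dictionaries, the stage function names, and the index fields `category`, `pattern` and `text` are all hypothetical. The final stage emits a body in the shape of an Elasticsearch `bool`/`match` query; actually sending it to Elasticsearch is left out.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

# Hypothetical toy resources; a production system would learn or curate these.
SPELLING = {"pulover": "pullover", "patchwrok": "patchwork"}
ENTITY_CANDIDATES = {  # token -> possible entity types, most likely first
    "pullover": ["category"],
    "jumper": ["category"],
    "patchwork": ["pattern"],
    "top": ["category", "attribute"],
}
SYNONYMS = {"jumper": "pullover"}  # synonym or acronym -> canonical term

Query = Dict[str, Any]  # structured query, enriched by each stage


def spell_correct(q: Query) -> Query:
    return {**q, "tokens": [SPELLING.get(t, t) for t in q["tokens"]]}


def recognize_entities(q: Query) -> Query:
    # NER: attach candidate entity types to every known token
    found = {t: ENTITY_CANDIDATES[t] for t in q["tokens"] if t in ENTITY_CANDIDATES}
    return {**q, "candidates": found}


def recognize_synonyms(q: Query) -> Query:
    # Map synonyms/acronyms to their canonical term, in tokens and entities alike
    tokens = [SYNONYMS.get(t, t) for t in q["tokens"]]
    candidates = {SYNONYMS.get(t, t): types for t, types in q["candidates"].items()}
    return {**q, "tokens": tokens, "candidates": candidates}


def disambiguate(q: Query) -> Query:
    # Toy rule: keep the most likely entity type for each token
    return {**q, "entities": {t: types[0] for t, types in q["candidates"].items()}}


def build_query(q: Query) -> Query:
    # Recognized entities become field matches; other tokens search free text
    must = [{"match": {q["entities"].get(t, "text"): t}} for t in q["tokens"]]
    return {**q, "es_query": {"query": {"bool": {"must": must}}}}


STAGES: List[Callable[[Query], Query]] = [
    spell_correct, recognize_entities, recognize_synonyms, disambiguate, build_query,
]


def run_pipeline(text: str) -> Dict[str, Any]:
    """Run all stages in order and return the query body for the search engine."""
    initial: Query = {"tokens": text.lower().split()}
    return reduce(lambda q, stage: stage(q), STAGES, initial)["es_query"]
```

Keeping every stage a function from query to query makes the order explicit in one list and lets each stage be tested in isolation.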
7. Failing classical information retrieval
Example queries:
“pullover patchwork”
“top figurumspielend” (German: figure-skimming top)
“abendkleid tattoospitze” (German: evening dress with tattoo lace)
8. Adding ML based solution
Full text query → spell-correction → NER → synonym & acronym recognition → disambiguation → query-builder → Elasticsearch → Articles
Added: ML based lookup table
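One plausible reading of this diagram, based on the batch-serving step of the ML pipeline later in the talk, is that the model's results are precomputed into a lookup table keyed by query, and the classical pipeline still answers every query the table does not cover. A minimal sketch under that assumption; the normalization rule, `make_search`, and the table contents are hypothetical.

```python
from typing import Callable, Dict, List


def normalize(query: str) -> str:
    # Collapse case and whitespace so equivalent queries share one table key
    return " ".join(query.lower().split())


def make_search(
    lookup_table: Dict[str, List[str]],
    classical_search: Callable[[str], List[str]],
) -> Callable[[str], List[str]]:
    """Serve precomputed ML results when available, else fall back."""

    def search(query: str) -> List[str]:
        hit = lookup_table.get(normalize(query))
        if hit:  # non-empty precomputed result for this query
            return hit
        return classical_search(query)  # rule-based IR pipeline

    return search
```

Serving from a precomputed table keeps online latency close to a dictionary lookup; the trade-off is that only queries selected ahead of time benefit from the model.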
9. Classical system vs. end-to-end product search system
Classical system:
① indexing (offline): Product → symbolic representation
② parsing: Query → symbolic representation
③ matching of the symbolic representations
End-to-end system:
Product → deep learning → latent representation (offline)
Query → deep learning → latent representation
matching of the latent representations
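In the end-to-end variant, query and product are mapped into the same vector space and matching becomes a similarity ranking. The sketch below shows only that matching step. The `encode` function is a hypothetical stand-in for a trained deep-learning encoder: it hashes character trigrams into a fixed-size unit vector so the example runs without a model. The product titles are made up.

```python
import math
import zlib
from typing import Dict, List, Tuple

DIM = 512  # size of the toy latent space (assumption)


def encode(text: str) -> List[float]:
    """Stand-in for a trained encoder: hashed character trigrams, L2-normalized."""
    vec = [0.0] * DIM
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        vec[zlib.crc32(padded[i:i + 3].encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def rank(query: str, product_vectors: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Score every product by cosine similarity (dot product of unit vectors)."""
    q = encode(query)
    scores = {pid: sum(a * b for a, b in zip(q, vec)) for pid, vec in product_vectors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Offline: encode the catalogue once (hypothetical product titles).
products = {"p1": "evening dress with tattoo lace", "p2": "patchwork pullover", "p3": "running shoes"}
product_vectors = {pid: encode(title) for pid, title in products.items()}
```

Product vectors are computed offline, mirroring the indexing step; only the query is encoded at request time.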
10. History
2016/2017: First idea (research)
2017: Complex, slow prototype
End of 2018: Almost there
Live: version 1 + 2, simple & fast
11. Model degradation
12. Building a pipeline
Picture from Pexels.com
13. ML pipeline
Data preparation: query selection, article selection, statistics generation
→ Training
→ Batch serving
→ Sanity check
→ ML based lookup table
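The pipeline steps above can be sketched as a chain of small functions with explicit inputs and outputs. Everything here is an assumption made for illustration: the input is a hypothetical click log of (query, article id) pairs, "training" is reduced to counting clicks per query as a stand-in for the real model, and the coverage threshold in the sanity check is arbitrary.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

ClickLog = List[Tuple[str, str]]  # hypothetical (query, clicked article id) pairs


def select_queries(log: ClickLog, min_count: int = 2) -> Set[str]:
    """Data preparation: keep queries frequent enough to precompute."""
    counts = Counter(query for query, _ in log)
    return {query for query, n in counts.items() if n >= min_count}


def select_articles(log: ClickLog, available: Set[str]) -> Set[str]:
    """Data preparation: keep only articles that can still be shown."""
    return {article for _, article in log if article in available}


def generate_statistics(log: ClickLog, queries: Set[str], articles: Set[str]) -> Counter:
    """Data preparation: click counts per (query, article) pair."""
    return Counter((q, a) for q, a in log if q in queries and a in articles)


def train(stats: Counter) -> Dict[str, Counter]:
    """Stand-in for model training: per-query article scores from click counts."""
    model: Dict[str, Counter] = {}
    for (query, article), n in stats.items():
        model.setdefault(query, Counter())[article] += n
    return model


def batch_serve(model: Dict[str, Counter], queries: Set[str], k: int = 2) -> Dict[str, List[str]]:
    """Batch serving: precompute the top-k articles for every selected query."""
    return {q: [a for a, _ in model.get(q, Counter()).most_common(k)] for q in sorted(queries)}


def sanity_check(table: Dict[str, List[str]], min_coverage: float = 0.9) -> Dict[str, List[str]]:
    """Refuse to publish a lookup table with too many empty entries."""
    if not table:
        raise ValueError("lookup table is empty")
    coverage = sum(1 for articles in table.values() if articles) / len(table)
    if coverage < min_coverage:
        raise ValueError(f"coverage {coverage:.0%} is below {min_coverage:.0%}")
    return table


def run(log: ClickLog, available: Set[str]) -> Dict[str, List[str]]:
    queries = select_queries(log)
    articles = select_articles(log, available)
    stats = generate_statistics(log, queries, articles)
    model = train(stats)
    return sanity_check(batch_serve(model, queries))
```

The sanity check sits between batch serving and publishing, so a broken run fails loudly instead of replacing a good lookup table with a bad one.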
14. Our Technology Stack
Machine learning pipeline
BigQuery
GitHub
CDP (CI/CD)
SageMaker
Python
Airflow
PyTorch
zflow (orchestration)
15.
16. One-time job vs. continuous development
“Obvious decisions”
Complex model vs. simple model
Batch vs. live
Manual vs. automated
Training cadence (daily, weekly, monthly, irregularly)
17. One-time job vs. continuous development
“Hidden decisions”
Fast now vs. fast, maintainable, robust later
Scripting vs. Software Development
18. Talk, talk, talk!
Picture from Pexels.com
19. Testing is hard
Picture from Pexels.com
20. Configuration is complex
Picture from Pexels.com
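One way to keep configuration manageable is to gather the "obvious decisions" of slide 16 (model complexity, batch vs. live, manual vs. automated, training cadence) into a single typed object that validates itself when loaded, instead of scattered flags. A minimal sketch; the field names, the allowed values and the cross-field rule are hypothetical.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Serving(Enum):
    BATCH = "batch"
    LIVE = "live"


class Cadence(Enum):
    DAILY = "daily"
    WEEKLY = "weekly"
    MONTHLY = "monthly"
    IRREGULAR = "irregular"


@dataclass(frozen=True)
class PipelineConfig:
    model: str = "simple"  # "simple" or "complex"
    serving: Serving = Serving.BATCH
    automated: bool = True
    cadence: Cadence = Cadence.WEEKLY
    top_k: int = 50

    def __post_init__(self) -> None:
        # Fail at load time rather than halfway through a training run
        if self.model not in {"simple", "complex"}:
            raise ValueError(f"unknown model type: {self.model!r}")
        if self.top_k <= 0:
            raise ValueError("top_k must be positive")
        # Hypothetical cross-field rule: automation implies a fixed schedule
        if self.automated and self.cadence is Cadence.IRREGULAR:
            raise ValueError("an automated pipeline needs a fixed training cadence")

    @classmethod
    def from_json(cls, text: str) -> "PipelineConfig":
        raw = json.loads(text)
        return cls(
            model=raw.get("model", "simple"),
            serving=Serving(raw.get("serving", "batch")),  # invalid value -> ValueError
            automated=bool(raw.get("automated", True)),
            cadence=Cadence(raw.get("cadence", "weekly")),
            top_k=int(raw.get("top_k", 50)),
        )
```

Making the object frozen means a pipeline run cannot silently change its own configuration halfway through.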
21. Follow standards & best practices
Picture from Pexels.com
22. Write clean code
BÄÄÄH (yuck!)
Nice
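The slide contrasts a "yuck" and a "nice" version of code. A made-up illustration of the same idea: both functions below return the most clicked articles per query, but the second names its inputs, splits the work into steps, and exposes the business rule (how many articles to keep) as a named parameter.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


# Before: works, but hard to read, test or change.
def f(d, n=3):
    r = {}
    for x in d:
        if x[0] not in r:
            r[x[0]] = {}
        if x[1] not in r[x[0]]:
            r[x[0]][x[1]] = 0
        r[x[0]][x[1]] += 1
    o = {}
    for k in r:
        o[k] = [a for a, c in sorted(r[k].items(), key=lambda t: -t[1])][:n]
    return o


# After: same result, with names and types that explain the business logic.
Click = Tuple[str, str]  # (query, article id)


def count_clicks_per_query(clicks: List[Click]) -> Dict[str, Counter]:
    counts: Dict[str, Counter] = defaultdict(Counter)
    for query, article in clicks:
        counts[query][article] += 1
    return counts


def top_articles_per_query(clicks: List[Click], max_articles: int = 3) -> Dict[str, List[str]]:
    counts = count_clicks_per_query(clicks)
    return {
        query: [article for article, _ in per_query.most_common(max_articles)]
        for query, per_query in counts.items()
    }
```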
23. About good (ML) code
Correct
Simple functions
Written for others to read
Accessible business logic
Pipeline steps are independently executable
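"Pipeline steps are independently executable" can be as simple as giving each step its own command-line entry point, so a single step can be rerun or debugged without the orchestrator. A minimal sketch with the standard library's `argparse`; the step names are hypothetical, and passing input as an inline JSON string (instead of a file or table reference) is a simplification for illustration.

```python
import argparse
import json
from typing import Dict, List, Optional


def select_queries(queries: List[str], min_length: int) -> List[str]:
    """Hypothetical step: deduplicate queries and drop very short ones."""
    return sorted({q.strip().lower() for q in queries if len(q.strip()) >= min_length})


def sanity_check(table: Dict[str, List[str]], min_results: int) -> List[str]:
    """Hypothetical step: list the queries whose entry has too few articles."""
    return sorted(q for q, articles in table.items() if len(articles) < min_results)


def main(argv: Optional[List[str]] = None) -> str:
    parser = argparse.ArgumentParser(description="Run one pipeline step on its own.")
    steps = parser.add_subparsers(dest="step", required=True)

    select = steps.add_parser("select-queries")
    select.add_argument("queries_json", help="JSON list of raw queries")
    select.add_argument("--min-length", type=int, default=3)

    check = steps.add_parser("sanity-check")
    check.add_argument("table_json", help="JSON object mapping query -> article ids")
    check.add_argument("--min-results", type=int, default=1)

    args = parser.parse_args(argv)
    if args.step == "select-queries":
        result = select_queries(json.loads(args.queries_json), args.min_length)
    else:
        result = sanity_check(json.loads(args.table_json), args.min_results)
    output = json.dumps(result)
    print(output)
    return output


if __name__ == "__main__":
    main()
```

Saved as, say, `steps.py`, a single step runs as `python steps.py sanity-check '{"pullover": ["A1"], "dress": []}'`, which prints `["dress"]`.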
24. Producing good code
1) Feature
2) Correct
3) Readable
4) Simple
5) Readable
6) Feature is still correct?
   a) No => go to 1)
   b) Yes => Happy days!
Be a scout: Leave the code cleaner than you found it.
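Step 6 asks whether the feature is still correct after the readability and simplicity passes, and that check is cheap and repeatable when it is automated. A minimal pytest-style sketch, assuming a hypothetical `normalize_query` preprocessing step: a few handcrafted examples with known answers, plus an invariant (idempotence) that must survive every refactoring. The test functions run under pytest or as a plain script.

```python
import unicodedata


def normalize_query(query: str) -> str:
    """Hypothetical preprocessing step: Unicode NFC, lower-case, single spaces."""
    return " ".join(unicodedata.normalize("NFC", query).lower().split())


# Handcrafted inputs with known answers ...
def test_normalize_examples() -> None:
    assert normalize_query("  Abendkleid   TATTOOSPITZE ") == "abendkleid tattoospitze"
    assert normalize_query("U\u0308berzieher") == "überzieher"  # decomposed umlaut
    assert normalize_query("") == ""


# ... plus an invariant that must hold for any input.
def test_normalize_is_idempotent() -> None:
    for q in ["Top  figurumspielend", "PULLOVER patchwork", "  "]:
        once = normalize_query(q)
        assert normalize_query(once) == once


if __name__ == "__main__":
    test_normalize_examples()
    test_normalize_is_idempotent()
    print("all tests passed")
```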
25. Future work
Improve Model
Monitoring
Model influences training data
Add more use cases
Replace traditional IR search
26. Takeaways
A simple model does the job
Pipeline building takes a lot of time
Train code craftsmanship
27. Maximilian Werk
Search - Team Lens
Senior Research Engineer
maximilian@zalando.de
Twitter: @maintainable_ds
We're hiring:
Senior Search Engineer
Principal Research Engineer - Search
Principal Product Manager - Search
20-01-2020