Metadata- and Configuration-Driven AI Model Management and Deployment for eBay Transaction Risk Control in Practice
1. Model Spec Driven AI Model Management & Deployment at eBay Payments Risk
Bing Wang
eBay Payments & Risk
3. Agenda
1 eBay Payments Risk AI Model Lifecycle and Model Spec
2 Unified Context for Model Training & Model Serving
3 Model Integration & Deployment by Model Spec
4 Model Serving Observability & Monitoring
4. eBay Payments Risk AI Model Lifecycle and Model Spec
5. Payments Risk AI Model Lifecycle
Offline: Feature Engineering → Model Training
Online: Model Deployment → Performance Validation → Business Usage
The cycle repeats through Model Refresh (Refit).
6. Metadata in AI Model Lifecycle
Offline (Feature Engineering, Model Training):
- Raw Features Metadata
- Training Dataset Metadata, Pipeline Metadata, Model Object Metadata, …
Online (Model Deployment, Performance Validation, Business Usage):
- Model Service API Metadata, Feature Fetching Metadata, Feature Preprocessing Metadata, Model Prediction Metadata, Model Output Post-processing Metadata
7. Metadata in AI Model Lifecycle
The offline metadata (Raw Features Metadata; Training Dataset Metadata, Pipeline Metadata, Model Object Metadata, …) and the online metadata (Model Service API, Feature Fetching, Feature Preprocessing, Model Prediction, and Model Output Post-processing Metadata) are grouped together into the Model Spec (Model Specification).
8. Metadata Group - Model Spec
Model Spec (Model Specification):
- Basic Model Information: owners, model type, scenario, refresh frequency, ...
- Feature Fetching: feature name, data source, value type, default value, ...
- Inference Preprocessing & Post-processing: dependent raw features, feature preprocessing expression, model output mapping logic, …
- Model Object: model type, framework and version, parameters, target SLA, …
- Multi-model Inference: pipeline definition, model routing definition
- Monitoring and Logging: schema definition, metrics, event/table information, …
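To make the grouping concrete, here is a minimal sketch of a Model Spec as plain Python dataclasses. The slide names the groups but not a schema, so every class name, field name, and value below (`BasicInfo`, `FeatureFetching`, `target_sla_ms`, "payments_db", and so on) is hypothetical and only illustrates how one object could carry everything the training, integration, and deployment steps need.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class BasicInfo:
    owners: List[str]
    model_type: str
    scenario: str
    refresh_frequency: str


@dataclass
class FeatureFetching:
    name: str
    data_source: str
    value_type: str
    default_value: Any = None


@dataclass
class ModelObject:
    model_type: str
    framework: str
    framework_version: str
    target_sla_ms: int
    parameters: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ModelSpec:
    basic: BasicInfo
    features: List[FeatureFetching]
    # Output feature name -> preprocessing expression over raw features.
    preprocessing: Dict[str, str]
    model_object: ModelObject
    output_mapping: Dict[str, Any] = field(default_factory=dict)
    routing: Optional[Dict[str, Any]] = None  # multi-model pipeline / routing
    monitoring: Dict[str, Any] = field(default_factory=dict)  # schema, metrics, ...

    def raw_feature_names(self) -> List[str]:
        return [f.name for f in self.features]


# Illustrative values only (hypothetical model and data sources).
spec = ModelSpec(
    basic=BasicInfo(["risk-team"], "gbdt", "checkout", "weekly"),
    features=[
        FeatureFetching("txn_amount", "payments_db", "float", 0.0),
        FeatureFetching("account_age_days", "user_profile", "int", 0),
    ],
    preprocessing={"log_amount": "log1p(txn_amount)"},
    model_object=ModelObject("gbdt", "example-framework", "1.0", target_sla_ms=50),
)
print(spec.raw_feature_names())  # ['txn_amount', 'account_age_days']
```

Because the same object is serializable metadata rather than code, each consumer (training pipeline, domain service, model service) can read only the groups it needs.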
9. Unified Context for Model Training & Model Serving
10. Model Deployment
The Model Training Pipeline produces training outputs: training codes and model files (pkl, txt, json, bin).
To deploy, the training codes are translated into model application codes, and the model files are copied into the Model Service.
11. Model Deployment
• Much manual effort
• Vulnerable to discrepancies between model training and inference
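12. Model Integration
A request reaches the Business Domain Service, which fetches features and makes a model API call, sending an inference request to the Model Service. Features and inference results are sent for monitoring. The integration logic in both services is produced by translation and then deployed.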
13. Model Integration
• Much manual effort
• Vulnerable to discrepancies between model training and inference
14. Different Context
- Model Training (Data Scientists): own context
- Model Integrating (Domain Engineers): own context
- Model Deploying (Data Engineers): own context
15. Unified Context – Model Spec
Model Training (Data Scientists), Model Integrating (Domain Engineers), and Model Deploying (Data Engineers) share a single context: the Model Spec.
16. Model Integration & Deployment by Model Spec
17. Feature Preprocessing
The traditional way to move feature preprocessing logic from model training to inference is to serialize the fitted object with pickle and deserialize it in the serving environment.
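A minimal sketch of that traditional approach, using only the standard `pickle` module. `Standardizer` is a hypothetical preprocessing step, not something from the slides. The point to notice is that `pickle` stores a reference to the class, not its code, so the serving side must be able to import the same class and every library it uses.

```python
import pickle


class Standardizer:
    """Hypothetical preprocessing step fitted during training."""

    def fit(self, values):
        n = len(values)
        self.mean = sum(values) / n
        var = sum((v - self.mean) ** 2 for v in values) / n
        self.std = var ** 0.5 or 1.0  # avoid division by zero for constant input
        return self

    def transform(self, value):
        return (value - self.mean) / self.std


# Training side: fit the step and serialize the whole object.
blob = pickle.dumps(Standardizer().fit([10.0, 20.0, 30.0]))

# Inference side: this only works if `Standardizer` (and anything it depends on)
# is importable here, at a compatible version.
restored = pickle.loads(blob)
print(restored.transform(20.0))  # 0.0
```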
18. Feature Preprocessing
Problem I: Pickle forces a dependency on the same libraries in every environment that loads the object.
Solution in Model Spec: reproduce preprocessing from a logic representation.
- Pickle: object → dumping & saving → *.pkl → loading → object
- Model Spec: object → dumping → logic representation → loading & parsing → logic representation reproduced at inference
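The logic-representation idea can be sketched as follows, assuming a hypothetical JSON format with `op`, `input`, and `params` keys and a small set of hypothetical operations. The slides do not specify the actual format. Training dumps plain data, and the inference side rebuilds the step with a small interpreter that knows only the supported operations, so no training-time library is needed.

```python
import json
import math


# Training side: dump the fitted logic as plain data instead of a pickled object.
def dump_standardize(feature, mean, std):
    return json.dumps({"op": "standardize", "input": feature,
                       "params": {"mean": mean, "std": std}})


# Inference side: the interpreter only needs to know the supported operations.
OPS = {
    "standardize": lambda x, p: (x - p["mean"]) / p["std"],
    "log1p": lambda x, p: math.log1p(x),
    "clip": lambda x, p: min(max(x, p["low"]), p["high"]),
}


def parse(representation):
    """Turn one dumped step back into a callable over a raw-feature dict."""
    step = json.loads(representation)
    fn, feature, params = OPS[step["op"]], step["input"], step.get("params", {})
    return lambda raw: fn(raw[feature], params)


step = parse(dump_standardize("txn_amount", 20.0, 5.0))
print(step({"txn_amount": 30.0}))  # 2.0
```

An unknown `op` raises `KeyError` at parse time, which is where a real deployment would want to reject the configuration rather than fail on live traffic.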
19. Feature Preprocessing
Problem II: Singleton (one record per request) inference does not get the data processing performance that batch processing achieves.
Optimization in Model Spec: concurrency via multiprocessing (multi-processing) and std::thread (multi-threading).
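For a single request, independent preprocessing steps can run concurrently instead of one after another. This sketch uses a thread pool from the standard library; the step names and expressions are hypothetical. In CPython, threads only overlap work that releases the GIL (I/O, native code), so for CPU-bound pure-Python steps the multi-processing or native-thread (`std::thread`) options named on the slide are the relevant ones.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, mutually independent preprocessing steps for one request.
STEPS = {
    "log_amount": lambda raw: math.log1p(raw["txn_amount"]),
    "is_new_account": lambda raw: 1 if raw["account_age_days"] < 30 else 0,
    "amount_per_item": lambda raw: raw["txn_amount"] / max(raw["item_count"], 1),
}


def preprocess_one(raw, pool):
    # Fan the independent steps out to the pool, then gather the results.
    futures = {name: pool.submit(fn, raw) for name, fn in STEPS.items()}
    return {name: fut.result() for name, fut in futures.items()}


# The pool is created once and reused across requests to avoid startup cost.
with ThreadPoolExecutor(max_workers=4) as pool:
    features = preprocess_one(
        {"txn_amount": 99.0, "account_age_days": 12, "item_count": 3}, pool)
print(features["is_new_account"], features["amount_per_item"])  # 1 33.0
```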
20. Feature Preprocessing
Move feature preprocessing logic from model training to inference through the Model Spec.
21. Code to Representations in Model Spec
Code is dumped into a logic representation stored in the Model Spec, and the representation is parsed back at serving time, for:
- Feature Fetching
- Model Feature Preprocessing
- Model Object
- Model Output Post-processing
- Model Inference Routing
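Routing and output post-processing can be represented the same way. Below is a minimal sketch assuming a hypothetical rule format (`if`/`in`/`gte` conditions, score thresholds) and hypothetical model names; the slides do not define these. The serving code stays generic, and changing which model handles which traffic, or how a score maps to a decision, becomes a data change in the Model Spec.

```python
# Hypothetical representations, as they might be parsed from a Model Spec.
ROUTING = {
    "rules": [
        {"if": {"field": "country", "in": ["US", "CA"]}, "model": "risk_na_v3"},
        {"if": {"field": "amount", "gte": 1000}, "model": "risk_high_value_v1"},
    ],
    "default": "risk_global_v2",
}
POST = {"thresholds": [[0.9, "DECLINE"], [0.6, "REVIEW"]], "default": "ACCEPT"}


def matches(cond, request):
    value = request.get(cond["field"])
    if "in" in cond:
        return value in cond["in"]
    if "gte" in cond:
        return value is not None and value >= cond["gte"]
    return False


def route(request, routing=ROUTING):
    # First matching rule wins; otherwise fall back to the default model.
    for rule in routing["rules"]:
        if matches(rule["if"], request):
            return rule["model"]
    return routing["default"]


def post_process(score, post=POST):
    # Thresholds are checked in listed order, so the highest is listed first.
    for threshold, decision in post["thresholds"]:
        if score >= threshold:
            return decision
    return post["default"]


print(route({"country": "DE", "amount": 1500}))  # risk_high_value_v1
print(post_process(0.72))                        # REVIEW
```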
22. Configuration Snapshotting from Model Spec
The Model Spec Store snapshots and versions each configuration derived from the Model Spec:
- Feature Fetching Configuration
- Feature Preprocessing Configuration
- Model Object Configuration
- Model Output Post-processing Configuration
- ……
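One way to snapshot and version such configurations is content hashing: serialize each configuration canonically, hash it, and assign a new version only when the hash changes. This is a minimal in-memory sketch; `ModelSpecStore`, its methods, and the model and section names are hypothetical stand-ins for the real store.

```python
import hashlib
import json


class ModelSpecStore:
    """Hypothetical in-memory store of versioned configuration snapshots."""

    def __init__(self):
        # (model, section) -> list of (version, digest, config)
        self._history = {}

    def snapshot(self, model, section, config):
        # Canonical JSON so that key order does not change the digest.
        canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        history = self._history.setdefault((model, section), [])
        if history and history[-1][1] == digest:
            return history[-1][0]  # unchanged config: keep the current version
        version = len(history) + 1
        # Store a decoded copy so later edits to `config` cannot alter the snapshot.
        history.append((version, digest, json.loads(canonical)))
        return version

    def get(self, model, section, version=None):
        history = self._history[(model, section)]
        entry = history[-1] if version is None else history[version - 1]
        return entry[2]


store = ModelSpecStore()
cfg = {"txn_amount": {"source": "payments_db", "default": 0.0}}
v1 = store.snapshot("risk_model_a", "feature_fetching", cfg)
v_same = store.snapshot("risk_model_a", "feature_fetching", dict(cfg))
cfg["txn_amount"]["default"] = 1.0
v2 = store.snapshot("risk_model_a", "feature_fetching", cfg)
print(v1, v_same, v2)  # 1 1 2
print(store.get("risk_model_a", "feature_fetching", version=1)["txn_amount"]["default"])  # 0.0
```

Immutable, numbered snapshots are what make canary rollout and rollback in the next step a matter of switching version numbers.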
23. Configuration Deployment
Configuration Deployment in Business Domain Service:
Configuration Sync → Configuration Validation → Feature Fetcher Object Building → Canary Change → Event Dropping
Configuration Deployment in Model Service:
Configuration Sync → Configuration Validation → Model Inference Session Building → Canary Change → Event Dropping
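The Business Domain Service flow can be sketched for validation, object building, and canary change. This is a minimal illustration under stated assumptions: Configuration Sync is represented by passing an already-synced configuration in directly, the required keys and the `FeatureFetcher`/`CanaryDeployer` classes are hypothetical, and Event Dropping is not modeled because the slide does not specify its semantics.

```python
import random

# Hypothetical required keys for each feature entry.
REQUIRED_KEYS = {"name", "data_source", "value_type", "default_value"}


def validate(config):
    """Configuration Validation: collect problems before any traffic is affected."""
    errors = []
    for i, feature in enumerate(config.get("features", [])):
        missing = REQUIRED_KEYS - set(feature)
        if missing:
            errors.append(f"feature[{i}] missing {sorted(missing)}")
    return errors


class FeatureFetcher:
    """Hypothetical object built from a validated configuration."""

    def __init__(self, config):
        self.defaults = {f["name"]: f["default_value"] for f in config["features"]}

    def fetch(self, available):
        # Fall back to the configured default when a value is unavailable.
        return {name: available.get(name, d) for name, d in self.defaults.items()}


class CanaryDeployer:
    def __init__(self, stable_config):
        self.stable = FeatureFetcher(stable_config)
        self.canary = None
        self.canary_ratio = 0.0

    def deploy(self, synced_config, ratio=0.05):
        errors = validate(synced_config)
        if errors:
            raise ValueError("; ".join(errors))  # the stable version keeps serving
        self.canary = FeatureFetcher(synced_config)
        self.canary_ratio = ratio

    def promote(self):
        if self.canary is None:
            raise RuntimeError("no canary to promote")
        self.stable, self.canary, self.canary_ratio = self.canary, None, 0.0

    def pick(self, rng=random):
        # Route roughly `canary_ratio` of requests to the canary object.
        if self.canary is not None and rng.random() < self.canary_ratio:
            return self.canary
        return self.stable


stable_cfg = {"features": [{"name": "txn_amount", "data_source": "payments_db",
                            "value_type": "float", "default_value": 0.0}]}
deployer = CanaryDeployer(stable_cfg)
try:
    deployer.deploy({"features": [{"name": "account_age_days"}]})
except ValueError as err:
    print(err)  # feature[0] missing ['data_source', 'default_value', 'value_type']
print(deployer.pick().fetch({}))  # {'txn_amount': 0.0}
```

The same shape would apply to the Model Service, with a model inference session built in place of the feature fetcher.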
24. Configuration Deployment
Metadata- and configuration-driven: few code changes needed
25. Model Integration & Deployment by Model Spec
The Model Training Pipeline, the Domain Service, and the Model Service each embed the Model Spec Library, which reads and updates the Model Spec in the shared Model Spec Store. At serving time, a request reaches the Domain Service, which sends a model inference request to the Model Service.
26. Model Serving Observability & Monitoring
27. ML System Monitoring
Reference: Eric Breck, Shanqing Cai, Eric Nielsen, Michael Salib, D. Sculley. "The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction." Proceedings of IEEE Big Data, 2017.
28. ML System Observability & Monitoring
Model Output Monitoring:
- default result rate
- score distribution
…
Model Features Monitoring:
- null / empty rate
- value distribution
…
Model System Monitoring:
- latency
- error rate
…
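The output and feature metrics listed above reduce to simple aggregations over logged inference events. This is a minimal sketch assuming a hypothetical log schema in which each result has a `score` and a `source` field, where `source == "DEFAULT"` marks a fallback result instead of a real model prediction. The bucket edges are also illustrative.

```python
from collections import Counter


def output_metrics(results, default_marker="DEFAULT", edges=(0.2, 0.4, 0.6, 0.8)):
    """Default result rate and a coarse score histogram."""
    n = len(results)
    default_rate = sum(r["source"] == default_marker for r in results) / n
    # Bucket index = number of edges the score has reached.
    buckets = Counter(sum(r["score"] >= e for e in edges) for r in results)
    histogram = [buckets.get(i, 0) for i in range(len(edges) + 1)]
    return {"default_result_rate": default_rate, "score_histogram": histogram}


def feature_metrics(rows, feature):
    """Share of rows where the feature is missing, None, or an empty string."""
    missing = sum(row.get(feature) in (None, "") for row in rows)
    return {"null_or_empty_rate": missing / len(rows)}


results = [{"score": 0.1, "source": "MODEL"}, {"score": 0.5, "source": "MODEL"},
           {"score": 0.95, "source": "MODEL"}, {"score": 0.0, "source": "DEFAULT"}]
out = output_metrics(results)
print(out)  # {'default_result_rate': 0.25, 'score_histogram': [2, 0, 1, 0, 1]}
rows = [{"txn_amount": 10}, {"txn_amount": None}, {}, {"txn_amount": ""}]
print(feature_metrics(rows, "txn_amount"))  # {'null_or_empty_rate': 0.75}
```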
29. Model Outputs & Features Observability & Monitoring
- Near-real-time path: Log Events → Kafka Cluster → Apache Flink (events consumer & processor) → Aggregated Events → NRT Metrics
- Offline path: Log Events → Hadoop HDFS → Hadoop → Offline Metrics
- Monitoring Metadata is read from the Model Spec Store to drive event processing.
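The near-real-time step can be illustrated with a tumbling-window aggregation whose behavior comes from monitoring metadata. This is written in plain Python, not against the Flink API, and the metadata format (`window_seconds`, `features`), event fields, and model name are hypothetical. In production, keying, windowing, and event-time handling would be done by the stream processor.

```python
from collections import defaultdict

# Hypothetical monitoring metadata, as it might be read from the Model Spec Store.
MONITORING_METADATA = {"risk_model_a": {"window_seconds": 60, "features": ["txn_amount"]}}


def aggregate(events, metadata=MONITORING_METADATA):
    """Tumbling-window event counts and null-feature counts per model."""
    windows = defaultdict(lambda: {"events": 0, "nulls": defaultdict(int)})
    for event in events:
        meta = metadata.get(event["model"])
        if meta is None:
            continue  # no monitoring metadata for this model: skip it
        start = event["ts"] - event["ts"] % meta["window_seconds"]
        bucket = windows[(event["model"], start)]
        bucket["events"] += 1
        for feature in meta["features"]:
            if event["features"].get(feature) is None:
                bucket["nulls"][feature] += 1
    return {key: {"events": b["events"], "nulls": dict(b["nulls"])}
            for key, b in windows.items()}


events = [
    {"model": "risk_model_a", "ts": 5, "features": {"txn_amount": 10.0}},
    {"model": "risk_model_a", "ts": 59, "features": {}},
    {"model": "risk_model_a", "ts": 61, "features": {"txn_amount": None}},
    {"model": "unmonitored_model", "ts": 1, "features": {}},
]
print(aggregate(events))
```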
30. Thank You