eBay Risk Control Real-Time Feature Platform: Construction and Application Case Study

1. eBay Risk Real Time Feature Store
Jie Li, Senior Manager, eBay Payments & Risk
3. Sync/Async Risk Check: AI models (tree + DNN) and risk rules evaluate each check and return one of three actions: BLOCK, DO NOTHING, or REMEDY.
4. eBay Risk Real Time Feature Store: one store serves both online and offline consumers, namely AI model real-time inference, risk rule real-time inference, AI model batch inference, AI model training, AI model simulation, and risk rule simulation.
5. Agenda: Requirements from AI Models; Requirements from Risk Rules; Technical Highlights
6. Requirements from AI Models: Point-in-Time Value Generation for New Features (Offline); Low-Latency Feature Update and Batch Fetching (Online); Online/Offline Parity
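7. Point-in-Time Value Generation for New Features (Offline)
New features 1, 2, and 3 must be evaluated at historical points in time (PiT), such as T1 = 2024-01-01 11:01:01, T2 = 2024-02-02 12:02:02, and T3 = 2024-03-03 13:03:03, continuing up to T10,000,000. The offline simulation in the eBay Risk Real Time Feature Store produces one value per feature per PiT. These values feed AI model training and AI model simulation:

| PiT      | T1       | T2       | T3       | … |
|----------|----------|----------|----------|---|
| Feature1 | Value 11 | Value 12 | Value 13 | … |
| Feature2 | Value 21 | Value 22 | Value 23 | … |
| Feature3 | Value 31 | Value 32 | Value 33 | … |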
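A point-in-time value may only use events that happened before that PiT. Otherwise the training data would leak information from the future. The minimal Python sketch below shows the idea for a single sum feature; it is not eBay's implementation. The `(event_time, amount)` tuples, the `pit_sums` helper, and the half-open window `[PiT - window, PiT)` are assumptions for illustration.

```python
import bisect
from itertools import accumulate

def pit_sums(events, pits, window):
    """Sum of event amounts in [pit - window, pit) for each pit.

    events: list of (event_time, amount) tuples (hypothetical format).
    pits:   points in time at which the feature is evaluated.
    window: look-back length, in the same unit as event_time.
    """
    events = sorted(events)
    times = [t for t, _ in events]
    # prefix[i] = sum of the first i amounts, so each range sum is one subtraction
    prefix = [0.0] + list(accumulate(a for _, a in events))
    result = []
    for pit in pits:
        lo = bisect.bisect_left(times, pit - window)  # first event at or after window start
        hi = bisect.bisect_left(times, pit)           # events at or after pit are excluded
        result.append(prefix[hi] - prefix[lo])
    return result
```

For example, `pit_sums([(10, 5.0), (20, 7.0), (30, 1.0)], [20, 31, 100], 15)` returns `[5.0, 8.0, 0.0]`. The event at time 20 is not counted for PiT 20, because it had not yet happened "before" that point in time.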
8. Low-Latency Feature Update and Batch Fetching (Online): incoming events must update feature values in online KV storage with low latency and data accuracy. AI model real-time inference also needs low-latency bulk fetching of many features at once.
9. Online/Offline Parity: the feature store computes feature values from events for online use (AI model real-time inference and batch inference) and for offline use (the training set for AI model training and AI model simulation). The values must be the same on both sides.
10. Requirements from Risk Rules: Self Service; Offline-to-Online Auto Backfill
11. Self Service: users define new features themselves. The same definitions produce both the online feature values, which the feature store computes from events, and the offline training set.
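12. Offline-to-Online Auto Backfill: when a new feature such as GMV_by_sellerId_in_last_90day is defined, its history is automatically backfilled from offline data into the online feature values. The feature can then be used online without waiting for 90 days of live events to accumulate.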
13. Technical Highlights: Overview; Data Storage Model & DSL; Flink-based Online Stream Processing Pipeline; Spark-based Offline Simulation Pipeline; Offline-to-Online Auto Backfill; Online/Offline Match Rate Report
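14. Two Possible Solutions
- Replicate offline to online: features are first written as offline SQL over the data warehouse and then reimplemented in an online pipeline that writes to KV storage. This raises open questions. What is the online data source? What is the online implementation (SQL, Python, Java, or config)? Will feature values shift between the two implementations?
- Persist online snapshots to offline: snapshots of the online KV storage are saved into the training set for model training. Drawback: a long accumulation time before the snapshots cover enough history.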
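15. Overview
Raw events go through event enrichment and become enriched events, which serve as the same data source for online and offline. Online, the stream processing pipeline turns enriched events into feature values in KV storage, and the query service serves those values for online decisions. Offline, the enriched events are kept as an online/offline event snapshot, from which the offline simulation pipeline builds the training set. Both paths run the same feature DSL, so they share the same computation logic.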
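The core of "same data source, same computation logic" is that one feature definition is executed by both paths. The Python sketch below illustrates that principle only; it is not the platform's DSL runtime. `apply_order`, `OnlinePipeline`, and `offline_replay` are hypothetical names, and plain dicts stand in for KV storage and the event snapshot.

```python
from collections import defaultdict

def apply_order(state, event):
    """The single shared definition of the feature update (GMV per seller)."""
    state[event["sellerId"]] += event["amount"]

class OnlinePipeline:
    """Stand-in for online stream processing: applies events as they arrive."""

    def __init__(self):
        self.kv = defaultdict(float)  # stand-in for KV storage

    def on_event(self, event):
        apply_order(self.kv, event)

def offline_replay(snapshot, pit):
    """Stand-in for offline simulation: replays the event snapshot up to pit."""
    state = defaultdict(float)
    for event in sorted(snapshot, key=lambda e: e["evtCrtTime"]):
        if event["evtCrtTime"] < pit:
            apply_order(state, event)
    return state
```

Both paths call `apply_order`, so any change to the feature logic reaches them together. This is what prevents the "feature value shift" risk noted for the replicate-offline-to-online approach.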
16. Storage Data Model & DSL
17. Storage Data Model & DSL: Sliding Window

Enriched event:
    Order {
      sellerId   : 1001,
      buyerId    : 2001,
      itemId     : 123,
      amount     : 23.00,
      buyerIP    : 12.23.34.45,
      evtCrtTime : 16850500
    }

Writing DSL (online stream processing):
    def var GMV_by_seller_sw {
      process Order {
        @swDelta(@evt.evtCrtTime, @evt.sellerId, @evt.amount)
      }
    }

Storage data model (KV storage):
    Key:   GMV_by_seller_sw:1001
    Value: daily buckets, hourly buckets, minutely buckets, …
           aggregates kept: COUNT, SUM, SQUARED_SUM, MIN, MAX, Last MDF Time

Reading DSL (query service):
    def var total_GMV_by_seller(key sellerId, String timeWindow) as double {
      local sw = @swLoad(sellerId, "GMV_by_seller_sw");
      return sw.aggregate(Enum.SUM, timeWindow);
    }
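The Python sketch below models one sliding-window KV value with minutely, hourly, and daily buckets, each keeping COUNT, SUM, SQUARED_SUM, MIN, and MAX. It is a sketch under stated assumptions, not the platform's storage code. It assumes `evtCrtTime` is in seconds and takes the window in seconds rather than as a `timeWindow` string. It also assumes a simple rule for picking the bucket granularity from the window size. `SlidingWindow`, `sw_delta`, and `aggregate_sum` are hypothetical stand-ins for `@swDelta` and `sw.aggregate(Enum.SUM, …)`.

```python
from collections import defaultdict
from dataclasses import dataclass

# Bucket widths in seconds (assumption: evtCrtTime is in seconds).
WIDTHS = {"minute": 60, "hour": 3600, "day": 86400}

@dataclass
class Bucket:
    count: int = 0
    sum: float = 0.0
    squared_sum: float = 0.0
    min: float = float("inf")
    max: float = float("-inf")

    def add(self, amount: float) -> None:
        self.count += 1
        self.sum += amount
        self.squared_sum += amount * amount
        self.min = min(self.min, amount)
        self.max = max(self.max, amount)

class SlidingWindow:
    """One KV value, e.g. key 'GMV_by_seller_sw:1001' (hypothetical layout)."""

    def __init__(self):
        # granularity -> bucket index -> Bucket
        self.buckets = {g: defaultdict(Bucket) for g in WIDTHS}
        self.last_mdf_time = 0

    def sw_delta(self, evt_time: int, amount: float) -> None:
        # Write path: each event lands in one bucket per granularity.
        for g, width in WIDTHS.items():
            self.buckets[g][evt_time // width].add(amount)
        self.last_mdf_time = max(self.last_mdf_time, evt_time)

    def aggregate_sum(self, now: int, window: int) -> float:
        # Read path: pick a granularity from the window size, then sum the
        # buckets whose start lies in [now - window, now]. If the window
        # start is not bucket-aligned, the bucket containing it is skipped,
        # so the result is approximate at that edge.
        g = "minute" if window <= 3600 else "hour" if window <= 86400 else "day"
        width = WIDTHS[g]
        lo = -(-(now - window) // width)  # ceiling division
        hi = now // width
        return sum(b.sum for i, b in self.buckets[g].items() if lo <= i <= hi)

kv = defaultdict(SlidingWindow)  # stand-in for KV storage
```

For example, `kv["GMV_by_seller_sw:1001"].sw_delta(16850500, 23.00)` records the sample Order event. Pre-aggregated buckets keep a read at a fixed cost per bucket, however many events a seller has. The COUNT, SUM, and SQUARED_SUM fields are also enough to derive a mean and variance.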
18. Storage Data Model & DSL: LastK

Enriched event:
    SignIn {
      userId      : 1001,
      deviceId    : 8001,
      loginTimeMs : 16847200,
      evtCrtTime  : …
    }

Writing DSL (online stream processing):
    def var signin_by_usr_device_lk {
      process SignIn {
        item = map { "ltm" : @evt.loginTimeMs }
        @lastKDelta(@evt.evtCrtTime, @evt.userId, @evt.deviceId, item);
      }
    }

Storage data model (KV storage):
    Key:   signin_by_usr_device_lk:1001:8001
    Value:
      0    PiT: 16848200, ltm: 16847200
      1    PiT: 16849700, ltm: 16849100
      2    PiT: 16850500, ltm: 16850000
      …
      K-1  PiT: 16899300, ltm: 16899100

Reading DSL (query service):
    def var age_of_first_signin(key userId, key deviceId) as long {
      local lastK = @lastKLoad(userId, deviceId, "signin_by_usr_device_lk");
      return ::lastK.ageOf(Enum.FIRST);
    }
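The Python sketch below shows the LastK model as a bounded buffer per key that keeps the K most recent records, each stamped with its PiT. The class and method names are hypothetical. Events are assumed to arrive in PiT order. `age_of_first` measures from the oldest retained record's PiT, because the slide does not say whether `ageOf(Enum.FIRST)` uses PiT or `ltm`.

```python
from collections import deque

class LastK:
    """The K most recent records for one key, e.g.
    'signin_by_usr_device_lk:1001:8001' (hypothetical layout)."""

    def __init__(self, k: int):
        # deque(maxlen=k) evicts the oldest record once k records are held.
        self.records = deque(maxlen=k)

    def last_k_delta(self, pit: int, record: dict) -> None:
        # Assumes events arrive in PiT order, so records stay oldest first.
        self.records.append({"PiT": pit, **record})

    def age_of_first(self, now: int) -> int:
        # Age of the oldest retained record, measured from its PiT
        # (assumption: the slide does not say whether PiT or ltm is used).
        return now - self.records[0]["PiT"]
```

Storing a PiT with each record lets the offline pipeline reconstruct the buffer exactly as it stood at any historical point in time.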
19. Flink-based Online Stream Processing Pipeline
20. Flink-based Online Stream Processing Pipeline: Low Latency & Data Accuracy
- Flink's at-least-once semantics (low latency).
- Kafka offsets stored in the feature value model to make updates idempotent (at most once).
- Flink checkpoint contents:
  1. Kafka offsets (resume from failure)
  2. Unique delta ID list (dedup)
  3. Unapplied delta list (at least once)

At-least-once delivery combined with idempotent updates means each delta affects the stored value exactly once.

Data flow: enriched events arrive through Kafka at the delta generator. The delta generator publishes deltas to Kafka, and the Flink pipeline applies them to KV storage. Example: Order(X) with EventId abc, SellerId 123, and Amount $5 produces a delta with ID abc-1 for seller 123 and amount $5, at Kafka position T0:P0:0 (topic T0, partition P0, offset 0). The same delta may be delivered twice (Delta1 and Delta2, both with ID abc-1). It is applied only once, so Total_gmv_by_seller for seller 123 moves from $100 to $105.
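The Python sketch below shows the offset-based idempotent update described on this slide. It assumes, beyond what the talk states, that each stored value keeps the highest applied offset per (topic, partition) and that deltas for a key arrive in offset order within a partition. `FeatureValue` and `apply_delta` are hypothetical names. In a real KV store the read, check, and write would also have to be atomic.

```python
from dataclasses import dataclass, field

@dataclass
class FeatureValue:
    """A feature value plus the Kafka offsets already folded into it."""
    total: float = 0.0
    # (topic, partition) -> highest applied offset (hypothetical layout)
    offsets: dict = field(default_factory=dict)

def apply_delta(value: FeatureValue, topic: str, partition: int,
                offset: int, amount: float) -> bool:
    """Apply a delta unless its offset has already been applied.

    Under at-least-once delivery the same delta can arrive twice.
    Comparing against the stored offset turns the duplicate into a no-op.
    """
    last = value.offsets.get((topic, partition), -1)
    if offset <= last:
        return False  # duplicate: already reflected in value.total
    value.total += amount
    value.offsets[(topic, partition)] = offset
    return True
```

Replaying the slide's example, a value starting at 100.0 receives the $5 delta at T0:P0:0 twice. The first call returns `True` and the second returns `False`, leaving the total at 105.0.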
21. Spark-based Offline Simulation Pipeline
22. Spark-based Offline Simulation Pipeline

Inputs:
1. Feature set (hundreds of features), e.g. total_gmv_by_seller_last60D (f1), total_gmv_by_seller_last24H (f2), total_gmv_by_seller_last5Min (f3).
2. Driver set (millions of rows), each row a point in time and a key:
   PiT1  2024-04-01 04:04:04  SellerId1
   PiT2  2024-03-01 03:03:03  SellerId2
   PiT3  2024-02-01 02:02:02  SellerId3
3. Event snapshot.

Event snapshot dependencies: reading DSL (total_gmv_by_seller) → writing DSL (gmv_by_seller_sw) → events (Order) → fields (evtCrtTime, sellerId, amount).

Spark jobs, time range and keys:
- Load event snapshot data with a time range > PiT3 - 60D and < PiT1.
- Query keys (SellerId1, SellerId2, SellerId3).

Output:

| PiT  | Key       | f1       | f2       | f3       |
|------|-----------|----------|----------|----------|
| PiT1 | SellerId1 | Value 11 | Value 12 | Value 13 |
| PiT2 | SellerId2 | Value 21 | Value 22 | Value 23 |
| PiT3 | SellerId3 | Value 31 | Value 32 | Value 33 |
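The Python sketch below derives the event-loading time range and key list from the driver set, assuming each feature declares its look-back window. `snapshot_scan_plan` is a hypothetical helper. In the real pipeline the resulting range and keys would drive Spark jobs rather than plain Python.

```python
from datetime import datetime, timedelta

def snapshot_scan_plan(driver_set, feature_windows):
    """Derive the event-snapshot time range and key list for one run.

    driver_set: list of (pit, key) pairs.
    feature_windows: feature name -> look-back window (timedelta).
    Returns (start, end, keys): events with start < evtCrtTime < end are
    loaded, and only the listed keys are queried.
    """
    pits = [pit for pit, _ in driver_set]
    longest = max(feature_windows.values())
    start = min(pits) - longest  # earliest PiT minus the longest window
    end = max(pits)              # no event after the latest PiT is needed
    keys = sorted({key for _, key in driver_set})
    return start, end, keys
```

With the slide's driver set and windows of 60 days, 24 hours, and 5 minutes, the plan starts 60 days before PiT3 (2023-12-03 02:02:02) and ends at PiT1. This matches the "> PiT3 - 60D and < PiT1" range above.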
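23. Offline-to-Online Auto Backfill Pipeline
- Real time: raw events → event enrichment → enriched events → online stream processing → KV storage.
- Backfill: the online/offline event snapshot feeds a backfill pipeline. The pipeline computes 90 days of history for a new feature such as GMV_by_sellerId_in_last_90day and writes it to historical KV storage.
- The query service reads historical KV storage for the period before a threshold time and real-time KV storage for the period after it.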
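The Python sketch below shows one way the query service could combine the two stores for GMV_by_sellerId_in_last_90day. It relies on assumptions the slide does not spell out. Values are kept as daily GMV, and the threshold falls on a day boundary. Days before the threshold come from historical KV storage; the threshold day and later days come from real-time KV storage. `gmv_last_n_days` is a hypothetical name.

```python
from datetime import date, timedelta

def gmv_last_n_days(historical, realtime, threshold, today, n=90):
    """Combine backfilled and real-time daily GMV for one seller.

    historical: date -> GMV from the backfill pipeline (used for days < threshold).
    realtime:   date -> GMV from online stream processing (days >= threshold).
    Each day is read from exactly one store, so no day is counted twice.
    """
    total = 0.0
    for i in range(n):  # today and the n - 1 days before it
        day = today - timedelta(days=i)
        source = historical if day < threshold else realtime
        total += source.get(day, 0.0)
    return total
```

The threshold keeps the two stores from double counting. Even if the backfill pipeline also wrote values for days after the threshold, the query would ignore them.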
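24. Online/Offline Match Rate Report
Mirrored production traffic runs through online stream processing. The feature values returned by the query service are recorded as a feature value list, together with the feature set and the driver set (PiT and keys) of each query. The offline simulation pipeline recomputes the same feature set for the same driver set from the event snapshot. The two results are then compared in a match rate report.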
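The slide does not define how values are compared. The Python sketch below therefore assumes that a match means the online and offline values agree within a relative tolerance, and it reports the share of matching (PiT, key, feature) cells. `match_rate` and `rel_tol` are hypothetical names.

```python
import math

def match_rate(online, offline, rel_tol=1e-6):
    """Share of (pit, key, feature) cells where online and offline agree.

    online / offline: dict mapping (pit, key, feature) -> value.
    Only cells present on both sides are compared; returns NaN if none are.
    """
    common = online.keys() & offline.keys()
    if not common:
        return float("nan")
    matched = sum(1 for c in common
                  if math.isclose(online[c], offline[c], rel_tol=rel_tol))
    return matched / len(common)
```

A tolerance rather than exact equality avoids flagging tiny floating-point differences that come from summing the same values in a different order.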
25. Real-Time Graph Features; More Feature Auto-Generation and Recommendation; RAG-Based Feature Engineering
26. Thank You
