eBay Risk Real-Time Feature Platform: Construction and Application Case Study
1. eBay Risk Real Time Feature Store
Jie Li, Senior Manager, eBay Payments & Risk
2.
3. Sync / Async Risk Check
[Diagram: AI Models (Tree + DNN); Risk Rules; decisions: BLOCK, DO NOTHING, REMEDY]
4. eBay Risk Real Time Feature Store
• AI Model Real Time Inference
• Risk Rule Real Time Inference
• AI Model Batch Inference
• AI Model Training
• AI Model Simulation
• Risk Rule Simulation
Online / Offline
5. Agenda
• Requirements from AI Models
• Requirements from Risk Rules
• Technical Highlights
6. Requirements from AI Models
• Point in Time Value Generation for New Features (Offline)
• Low Latency Feature Update and Batch Fetching (Online)
• Online Offline Parity
7. Point in Time Value Generation for New Features (Offline)
Timeline: T1 (2024-01-01 11:01:01), T2 (2024-02-02 12:02:02), T3 (2024-03-03 13:03:03), Today
New features: Feature 1, Feature 2, Feature 3
Points in time: T1, T2, T3, … T10,000,000
eBay Risk Real Time Feature Store: Offline Simulation

PiT       T1        T2        T3        …
Feature1  Value 11  Value 12  Value 13  …
Feature2  Value 21  Value 22  Value 23  …
Feature3  Value 31  Value 32  Value 33  …

Consumers: AI Model Training, AI Model Simulation
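A minimal sketch of point-in-time value generation, under stated assumptions: the event list, the `gmv_as_of` helper and the 30-day window are hypothetical and only illustrate the idea that a feature value at T1, T2 or T3 is computed from events strictly before that moment. It is not the feature store's actual implementation.

```python
from datetime import datetime, timedelta

# Hypothetical order events: (event time, seller id, amount).
EVENTS = [
    (datetime(2023, 12, 20, 9, 0, 0), "seller-1", 10.0),
    (datetime(2024, 1, 15, 9, 0, 0), "seller-1", 20.0),
    (datetime(2024, 2, 20, 9, 0, 0), "seller-1", 40.0),
]

# The three points in time shown on the slide.
POINTS_IN_TIME = [
    datetime(2024, 1, 1, 11, 1, 1),
    datetime(2024, 2, 2, 12, 2, 2),
    datetime(2024, 3, 3, 13, 3, 3),
]


def gmv_as_of(events, seller_id, pit, window):
    """Sum a seller's amounts over events in [pit - window, pit).

    Events at or after the point in time are excluded, so the value is
    what an online lookup could have seen at that moment.
    """
    start = pit - window
    return sum(
        amount
        for event_time, seller, amount in events
        if seller == seller_id and start <= event_time < pit
    )


# One value per point in time, for an assumed 30-day window.
values = [
    gmv_as_of(EVENTS, "seller-1", pit, timedelta(days=30))
    for pit in POINTS_IN_TIME
]
```

Excluding events at or after the point in time is what keeps a training set free of future information.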
8. Low Latency Feature Update and Batch Fetching (Online)
[Diagram: Events; eBay Risk Real Time Feature Store (Online); Online KV Storage; AI Model Real Time Inference]
• Low Latency & Data Accuracy
• Low Latency of Feature Bulk Fetching
9. Online Offline Parity
[Diagram: Events; eBay Risk Real Time Feature Store; Online: Online Feature Values, AI Model Real Time Inference, AI Model Batch Inference; Offline: Training Set, AI Model Training, AI Model Simulation; Parity between Online and Offline]
10. Requirements from Risk Rules
• Self Service
• Offline to Online Auto Backfill
11. Self Service
[Diagram: Events; New Features; eBay Risk Real Time Feature Store; Online: Online Feature Values; Offline: Training Set]
12. Offline to Online Auto Backfill
[Diagram: Events; eBay Risk Real Time Feature Store; Online: Online Feature Values; Offline: Training Set; Auto Backfill]
Example feature: GMV_by_sellerId_in_last_90day
13. Technical Highlights
• Overview
• Data Storage Model & DSL
• Flink-based Online Stream Processing Pipeline
• Spark-based Offline Simulation Pipeline
• Offline to Online Auto Backfill
• Online Offline Match Rate Report
14. Two Possible Solutions
Solution 1: Replicate Offline to Online
• Offline: Data Warehouse → SQL → Training Set → Model Training
• Online: Online Pipeline → KV Storage
• What data source?
• SQL? Python? Java? Config?
• Feature value shift?

Solution 2: Persist Online Snapshot to Offline
• Online: KV Storage → Snapshot
• Offline: Snapshot → Training Set
• Long accumulation time
15. Overview
eBay Risk Real Time Feature Store
• Raw Events → Event Enrichment → Enriched Events (Same Data Source)
• Online: Online Stream Processing → KV Storage → Query Service → Online Decisions
• Offline: Event Snapshot → Offline Simulation Pipeline → Training Set
• Feature DSL (Same Computation Logic)
16. Storage Data Model & DSL
17. Storage Data Model & DSL – Sliding Window
Enriched event:
    Order {
        sellerId   : 1001,
        buyerId    : 2001,
        itemId     : 123,
        amount     : 23.00,
        buyerIP    : 12.23.34.45,
        evtCrtTime : 16850500
    }

Writing DSL (Enriched Events → Online Stream Processing):
    def var GMV_by_seller_sw {
        process Order {
            @swDelta(
                @evt.evtCrtTime,
                @evt.sellerId,
                @evt.amount
            )
        }
    }

Storage Data Model (KV Storage):
    Key:   GMV_by_seller_sw:1001
    Value: Daily buckets …, Hourly buckets …, Minutely buckets …
    Stored fields: COUNT, SUM, SQUARED_SUM, MIN, MAX, Last MDF Time

Reading DSL (Query Service):
    def var total_GMV_by_seller
    (key sellerId, String timeWindow) as double {
        local sw = @swLoad(sellerId, "GMV_by_seller_sw");
        return sw.aggregate(Enum.SUM, timeWindow);
    }
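The bucketed sliding-window value above can be approximated in a few lines of Python. This is a sketch under assumptions, not the store's real data model: the `Bucket` and `SlidingWindow` classes, the use of epoch seconds for `evtCrtTime`, and the bucket-aligned window in `sum_over` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field

# Assumed bucket widths in seconds for the three granularities.
GRANULARITIES = {"minute": 60, "hour": 3600, "day": 86400}


@dataclass
class Bucket:
    """Aggregates kept per bucket, mirroring the fields named on the slide."""
    count: int = 0
    total: float = 0.0
    squared_sum: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value):
        self.count += 1
        self.total += value
        self.squared_sum += value * value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)


@dataclass
class SlidingWindow:
    # granularity name -> {bucket start time -> Bucket}
    buckets: dict = field(default_factory=lambda: {g: {} for g in GRANULARITIES})
    last_modified: int = 0

    def apply_delta(self, event_time, value):
        """Write path: fold one event into every granularity."""
        for name, width in GRANULARITIES.items():
            start = event_time - event_time % width
            self.buckets[name].setdefault(start, Bucket()).add(value)
        self.last_modified = max(self.last_modified, event_time)

    def sum_over(self, granularity, bucket_count, now):
        """Read path: SUM over the newest `bucket_count` buckets up to `now`."""
        width = GRANULARITIES[granularity]
        newest = now - now % width
        oldest = newest - (bucket_count - 1) * width
        return sum(
            bucket.total
            for start, bucket in self.buckets[granularity].items()
            if oldest <= start <= newest
        )


sw = SlidingWindow()
sw.apply_delta(16850500, 23.00)  # the Order event from the slide
sw.apply_delta(16850560, 7.00)   # hypothetical later order
sw.apply_delta(16843300, 5.00)   # hypothetical order two hours earlier
```

Pre-aggregating into buckets keeps the read path cheap: a long window is answered by adding a few coarse buckets rather than rescanning events. The window here is aligned to bucket boundaries, which is a simplification.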
18. Storage Data Model & DSL – LastK
Enriched event:
    SignIn {
        userId      : 1001,
        deviceId    : 8001,
        loginTimeMs : 16847200,
        evtCrtTime  : …
    }

Writing DSL (Enriched Events → Online Stream Processing):
    def var signin_by_usr_device_lk {
        process SignIn {
            item = map {
                "ltm" : @evt.loginTimeMs
            }
            @lastKDelta(
                @evt.evtCrtTime,
                @evt.userId,
                @evt.deviceId, record);
        }
    }

Storage Data Model (KV Storage):
    Key: signin_by_usr_device_lk:1001:8001
    Value:
        0    PiT: 16848200, ltm: 16847200
        1    PiT: 16849700, ltm: 16849100
        2    PiT: 16850500, ltm: 16850000
        …
        K-1  PiT: 16899300, ltm: 16899100

Reading DSL (Query Service):
    def var age_of_first_signin
    (key userId, key deviceId) as long {
        local lastK = @lastKLoad(
            userId,
            deviceId,
            "signin_by_usr_device_lk"
        );
        return ::lastK.ageOf(Enum.FIRST);
    }
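A LastK value can be pictured as a bounded list of the K most recent records per key. The sketch below is an assumption-laden illustration: the `LastK` class is hypothetical, events are assumed to arrive in time order, and `age_of_first` is assumed to mean "query time minus the PiT of the oldest retained record", which the slides do not spell out.

```python
from collections import deque


class LastK:
    """Keeps the K most recent (pit, payload) records for one key."""

    def __init__(self, k):
        # A deque with maxlen drops the oldest record once K is exceeded.
        self.records = deque(maxlen=k)

    def apply_delta(self, pit, payload):
        """Write path: append one record (assumes in-order arrival)."""
        self.records.append((pit, payload))

    def age_of_first(self, now):
        """Read path: age of the oldest retained record, or None if empty."""
        if not self.records:
            return None
        return now - self.records[0][0]


# Hypothetical store keyed the same way as the slide's storage model.
store = {"signin_by_usr_device_lk:1001:8001": LastK(k=3)}
last_k = store["signin_by_usr_device_lk:1001:8001"]
last_k.apply_delta(16848200, {"ltm": 16847200})
last_k.apply_delta(16849700, {"ltm": 16849100})
last_k.apply_delta(16850500, {"ltm": 16850000})
last_k.apply_delta(16899300, {"ltm": 16899100})  # evicts the first record

age = last_k.age_of_first(now=16900000)
```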
19. Flink-based Online Stream Processing Pipeline
20. Flink-based Online Stream Processing Pipeline
Low Latency & Data Accuracy
• Flink's at-least-once semantics (low latency)
• Store Kafka offsets in the feature value model to achieve idempotent updates (at most once)
• Flink checkpoint:
  1. Kafka offsets (resume from failure)
  2. Unique delta ID list (dedup)
  3. Unapplied delta list (at least once)

Online Stream Processing
Enriched Events → Kafka → Delta Generator → Kafka → Flink Pipeline → KV Storage

Example (Total_gmv_by_seller)
• Order(X), shown twice: EventId: abc, SellerId: 123, Amount: $5
• Delta1 and Delta2, both with ID: abc-1, SellerId: 123, Amount: $5
• KV Storage shows two states for seller 123: $100 (T0:P0:0) and $105 (T0:P0:0)
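One way to read "store Kafka offsets in the feature value model" is that each stored value remembers the highest offset already applied, so a replayed delta becomes a no-op. The following is a minimal sketch of that idea only; the `FeatureValue` class and `apply_delta` function are hypothetical, and it relies on deltas for a key arriving in offset order within a partition.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureValue:
    total: float = 0.0
    # Highest applied offset per (topic, partition), stored with the value.
    applied_offsets: dict = field(default_factory=dict)


def apply_delta(store, key, amount, topic, partition, offset):
    """Apply a delta once; a replay at or below the stored offset is skipped.

    Returns True if the value changed, False if the delta was a duplicate.
    """
    value = store.setdefault(key, FeatureValue())
    last_applied = value.applied_offsets.get((topic, partition), -1)
    if offset <= last_applied:
        return False
    value.total += amount
    value.applied_offsets[(topic, partition)] = offset
    return True


# Seller 123 starts at $100; the same $5 delta is delivered twice.
store = {"Total_gmv_by_seller:123": FeatureValue(total=100.0)}
first = apply_delta(store, "Total_gmv_by_seller:123", 5.0, "T0", "P0", 0)
second = apply_delta(store, "Total_gmv_by_seller:123", 5.0, "T0", "P0", 0)
```

With at-least-once delivery upstream and an idempotent write like this downstream, a duplicate delivery leaves the total at $105 rather than $110.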
21. Spark-based Offline Simulation Pipeline
22. Spark-based Offline Simulation Pipeline
Inputs
1. Feature Set (hundreds of)
   total_gmv_by_seller_last60D (f1)
   total_gmv_by_seller_last24H (f2)
   total_gmv_by_seller_last5Min (f3)
2. Driver Set (millions of)
   PiT1  20240401 04:04:04  SellerId1
   PiT2  20240301 03:03:03  SellerId2
   PiT3  20240201 02:02:02  SellerId3
3. Event Snapshot

Dependencies
• Reading DSL (total_gmv_by_seller)
• Writing DSL (gmv_by_seller_sw)
• Events (Order)
• Fields (evtCrtTime, sellerId, amount)

Time range & Keys
• Load event snapshot data with time range > PiT3 - 60D & < PiT1
• Query keys (SellerId1, SellerId2, SellerId3)

Spark jobs: output
PiT   Key        f1        f2        f3
PiT1  SellerId1  Value 11  Value 12  Value 13
PiT2  SellerId2  Value 21  Value 22  Value 23
PiT3  SellerId3  Value 31  Value 32  Value 33
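The "Time range & Keys" step can be sketched as a small planning function: the scan starts at the earliest PiT minus the longest feature window and ends at the latest PiT. This is plain Python rather than Spark, and the `scan_plan` helper and the window table are hypothetical names used only to show the arithmetic.

```python
from datetime import datetime, timedelta

# Assumed look-back windows for the three features on the slide.
FEATURE_WINDOWS = {
    "total_gmv_by_seller_last60D": timedelta(days=60),
    "total_gmv_by_seller_last24H": timedelta(hours=24),
    "total_gmv_by_seller_last5Min": timedelta(minutes=5),
}

# Driver set rows from the slide: (point in time, key).
DRIVER_SET = [
    (datetime(2024, 4, 1, 4, 4, 4), "SellerId1"),
    (datetime(2024, 3, 1, 3, 3, 3), "SellerId2"),
    (datetime(2024, 2, 1, 2, 2, 2), "SellerId3"),
]


def scan_plan(driver_set, feature_windows):
    """Return (start, end, keys) bounding the snapshot data worth loading."""
    longest_window = max(feature_windows.values())
    points_in_time = [pit for pit, _ in driver_set]
    start = min(points_in_time) - longest_window  # PiT3 - 60D on the slide
    end = max(points_in_time)                     # PiT1 on the slide
    keys = sorted({key for _, key in driver_set})
    return start, end, keys


start, end, keys = scan_plan(DRIVER_SET, FEATURE_WINDOWS)
```

Restricting the load to this range and these keys avoids scanning the full event snapshot for every simulation run.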
23. Offline to Online Auto Backfill Pipeline
[Diagram: Real Time: Raw Events, Event Enrichment, Enriched Events, Online Stream Processing, KV Storage, Query Service; Offline: Event Snapshot, Backfill Pipeline, Historical KV Storage; Threshold Time]
Backfill: 90 Days
Real Time
Example feature: GMV_by_sellerId_in_last_90day
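One plausible reading of the threshold time is that history before it is served from backfilled data and everything from it onward comes from the real-time pipeline, so no event is counted twice. The sketch below illustrates that reading only; the `gmv_last_90_days` function and the raw-event representation are hypothetical, and the real system presumably merges stored aggregates rather than raw events.

```python
DAY = 86_400  # seconds


def gmv_last_90_days(historical, realtime, seller_id, threshold, now):
    """Sum a seller's amounts over the last 90 days across both sources.

    historical: (event_time, seller_id, amount) rows backfilled offline
    realtime:   rows written by the online stream
    threshold:  events before it are taken from `historical`,
                events at or after it from `realtime`
    """
    start = now - 90 * DAY
    backfilled = sum(
        amount
        for event_time, seller, amount in historical
        if seller == seller_id and start <= event_time < threshold
    )
    live = sum(
        amount
        for event_time, seller, amount in realtime
        if seller == seller_id and max(start, threshold) <= event_time <= now
    )
    return backfilled + live


historical = [
    (20 * DAY, "s1", 1.0),     # older than 90 days at query time
    (50 * DAY, "s1", 10.0),
    (99 * DAY, "s2", 4.0),     # different seller
    (101 * DAY, "s1", 100.0),  # after the threshold: ignored here
]
realtime = [
    (101 * DAY, "s1", 100.0),
    (110 * DAY, "s1", 5.0),
]
total = gmv_last_90_days(historical, realtime, "s1", threshold=100 * DAY, now=120 * DAY)
```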
24. Online Offline Match Rate Report
[Diagram: Traffic Mirror; Online Stream Processing; KV Storage; Query Service; Feature Value List; Event Snapshot; Feature Set; Driver Set; Offline Simulation Pipeline; Training Set; Match Rate Report]
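A match rate report can be reduced to a per-feature comparison of the value served online against the value recomputed offline for the same key and point in time. The sketch below assumes that pairing has already been done; the `match_rate_report` function, the tolerance and the sample values are hypothetical.

```python
import math
from collections import defaultdict


def match_rate_report(pairs, rel_tol=1e-6):
    """Return {feature: fraction of rows where online and offline agree}.

    pairs: iterable of (feature name, online value, offline value),
           one entry per (key, point in time) already joined upstream.
    """
    matched = defaultdict(int)
    total = defaultdict(int)
    for feature, online_value, offline_value in pairs:
        total[feature] += 1
        if math.isclose(online_value, offline_value, rel_tol=rel_tol):
            matched[feature] += 1
    return {feature: matched[feature] / total[feature] for feature in total}


pairs = [
    ("f1", 105.0, 105.0),
    ("f1", 23.0, 23.0),
    ("f2", 5.0, 5.0),
    ("f2", 7.0, 9.0),  # online and offline disagree
]
report = match_rate_report(pairs)
```

A relative tolerance is used instead of exact equality because floating-point aggregates computed by two different engines rarely agree bit for bit.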
25. More
• Real Time Graph Feature
• Feature Auto Generation and Recommendation
• RAG based Feature Engineering
26. Thank You