Want a simple, easy-to-use real-time ML platform? Some complex details you need to know

1. Invisible Interfaces: Considerations for Abstracting Complexities of a Real-time ML Platform. Zhenzhong Xu, Cofounder & CTO @ claypot.ai. July 2023

2. The discovery of something invisible

3. The Invisible Interface
● Ubiquitous
● Easy and responsive
● Just works!
The endeavor to make things useful

4. Real-time decisions that power your business: fraud prevention, personalization, trending products, ETA, customer support, dynamic pricing/discounting, risk assessment, account takeover, ads, network analysis, sentiment analysis, object detection, …

5. The world is moving towards real-time
● Instacart: The Journey to Real-Time Machine Learning (2022)
  ○ Directly reduces fraud-related costs by millions annually.
● LinkedIn’s Real-time Anti-abuse (2022)
  ○ LinkedIn moved from an offline pipeline (hours) to a real-time pipeline (minutes), and saw a 30% increase in bad actors caught online and a 21% improvement in fake account detection.
● How WhatsApp catches and fights abuse (2022 | slides)
  ○ A delay of a few hundred milliseconds can increase spam by 20-30%.
● How Pinterest Leverages Realtime User Actions in Recommendation to Boost Engagement (2022)
  ○ According to Pinterest, this “has been one of our most impactful innovations recently, increasing Home feed engagement by 11% while reducing Pinner hide volume by 10%.”
● Airbnb: Real-time Personalization using Embeddings for Search Ranking (2018)
  ○ Moving from offline scoring to online scoring grows bookings by +5.1%.

6. Real-time Decisions. Diagram labels: Exploration & Research; Model Architecture & Tuning; Model Analysis & Selection; LLM Prompt Engineering; Data Fabric for Real-time AI; Data Infrastructure; Data Sources; Ingestion & Transport; Storage; Query & Compute; Workflow Orchestration; Analytics / Visualization; Multi-tenancy Isolation; Security & Governance

7. Diagram labels: Prediction Input; Data Flow; Model Serving; Product Ecosystem; Data; Model Evaluation; Model Flow; Model Training; Training Input; Model Monitoring; Data Monitoring; Analytics Ecosystem

8. The hard things towards real-time decisions
● Data silo and staleness
● Collaboration overhead
● Tech complexity

9.

10. Challenge 1: From Experimentation to Production
● Slow prototyping
● Local vs. remote execution
● Divergent language & runtime

11. Local Experimentation with Traditional Models

12. Local Experimentation with LLMs

13.

14. Feature API. Diagram labels: Data scientists; Central repo; Create, experiment, & deploy features; Prediction service; Feature catalog; Computation engines; Feature store (online + offline); Sources; Training service

15. Need an invisible interface to plug into compute ecosystems: Local/Single Machine and Remote/Distributed

16. Declare features with familiar APIs

    @transformation
    def average_transaction_amount_by_merchant(tx: Transactions, wspec: WindowSpec):
        return tx.groupby(["cc_num", "merchant"])["amt"].window(wspec).mean()
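
To make the semantics of a grouped, windowed mean concrete, here is a minimal pure-Python sketch of a trailing time-window mean per (cc_num, merchant) key. It is not Claypot's implementation: the function name `windowed_mean`, the tuple layout of the events, and the window convention (the window covers the last `window_s` seconds and includes the current event) are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def windowed_mean(events, window_s):
    """For each event, return the mean amount over the trailing window of its key.

    events: iterable of (ts_seconds, cc_num, merchant, amt), sorted by ts.
    The window is (ts - window_s, ts], so it always contains the current event.
    """
    buffers = defaultdict(deque)  # key -> deque of (ts, amt) still inside the window
    sums = defaultdict(float)     # key -> running sum of the amounts in the window
    out = []
    for ts, cc_num, merchant, amt in events:
        key = (cc_num, merchant)
        buf = buffers[key]
        buf.append((ts, amt))
        sums[key] += amt
        # Evict events that have fallen out of the trailing window.
        while buf[0][0] <= ts - window_s:
            _, old_amt = buf.popleft()
            sums[key] -= old_amt
        out.append((ts, key, sums[key] / len(buf)))
    return out


# Hypothetical sample data: three events for one card/merchant pair, one for another.
events = [
    (0, "c1", "m1", 10.0),
    (30, "c1", "m1", 20.0),
    (50, "c2", "m1", 5.0),
    (100, "c1", "m1", 40.0),
]
result = windowed_mean(events, window_s=60)
```

With a 60-second window, the last event for ("c1", "m1") sees only itself, because the two earlier events are more than 60 seconds old by then.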

17. Data Science Friendly: Python <> SQL

    @transformation
    def transaction_count(tx: Transactions, wspec: WindowSpec):
        return tx[tx.status == "failed"].groupby("account_id").window(wspec).count()

Pipeline stages: Relational Expression; Workload Compiler / Optimizer; Deployment

18. Same code can run on different computation engines

    @transformation
    def transaction_count(tx: Transactions, wspec: WindowSpec):
        return tx[tx.status == "failed"].groupby("account_id").window(wspec).count()

Intermediate representation pipeline:
● Relational Expression: compile into a relational expression (RE), which is SQL equivalent
● Workload Compiler/Optimizer: compile & optimize the RE for the computation engine (e.g., pandas, DuckDB, Flink, Spark) best suited for the job
● Deployment: spin up and manage computation jobs
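
The idea of one expression targeting several engines can be sketched with a toy relational expression and two backends. Everything here is hypothetical and much simpler than a real compiler: the `CountBy` class, `to_sql`, and `run_local` are illustrative names, windows are omitted, and the SQL rendering does no escaping.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CountBy:
    """Toy relational expression:
    SELECT group_key, COUNT(*) FROM table WHERE filter_col = filter_val GROUP BY group_key
    """
    table: str
    filter_col: str
    filter_val: str
    group_key: str


def to_sql(expr: CountBy) -> str:
    """Backend 1: render the expression as SQL text (toy rendering, no escaping)."""
    return (
        f"SELECT {expr.group_key}, COUNT(*) AS cnt FROM {expr.table} "
        f"WHERE {expr.filter_col} = '{expr.filter_val}' GROUP BY {expr.group_key}"
    )


def run_local(expr: CountBy, rows: list) -> dict:
    """Backend 2: evaluate the same expression in-process over a list of dicts."""
    counts = Counter(
        row[expr.group_key] for row in rows if row[expr.filter_col] == expr.filter_val
    )
    return dict(counts)


# One expression, two execution targets.
expr = CountBy("transactions", "status", "failed", "account_id")
rows = [
    {"account_id": "a1", "status": "failed"},
    {"account_id": "a1", "status": "ok"},
    {"account_id": "a2", "status": "failed"},
    {"account_id": "a1", "status": "failed"},
]
sql = to_sql(expr)
local = run_local(expr, rows)
```

Because both backends consume the same expression object, the local result used while prototyping and the SQL shipped to a remote engine cannot drift apart through hand translation.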

19. Solution 1: Relational Expression based Compilation
● Unified yet familiar API
● Pluggable to many compute engines
● Minimize human error
● Prototype in minutes

20. Challenge 2: Streaming and Batch Divided
● Evolving architecture
● Difficult to backfill
● Train-predict inconsistencies

21. Lambda Architecture. Diagram labels: Online Query (serving); In-motion Compute; Online Storage; Mixed Query (backfill); Data Source; At-rest Compute; Offline Storage; Offline Query (training)

22. Kappa (Streaming) Architecture. Diagram labels: Online Query (serving); streaming transformation; Data Source; In-motion Compute (backfill from historical log); Materialized Views; batch transformation; Offline Query (training)

23. Unified Architecture. Diagram labels: Online Query (serving); streaming transformation; In-motion Compute (intelligent backfill from dual sources); Data Source; Materialized Views; Backing; batch transformation; DWH backed logs; Offline Query (training)

24. Batch and streaming source unified to simplify backfill. Diagram labels: Dual source cutover; Stream; DWH; Time
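
One way to picture a dual source cutover is a backfill that reads history from the warehouse up to a cutover timestamp and switches to the stream from that point on. The sketch below assumes that both sources carry the same events with an event timestamp `ts` and that they overlap around the cutover; the function name `backfill_with_cutover` and the record layout are hypothetical.

```python
def backfill_with_cutover(dwh_rows, stream_rows, cutover_ts):
    """Merge two sources into one ordered sequence without duplicating events.

    Events strictly before cutover_ts come from the warehouse (DWH);
    events at or after cutover_ts come from the stream.
    """
    historical = [r for r in dwh_rows if r["ts"] < cutover_ts]
    live = [r for r in stream_rows if r["ts"] >= cutover_ts]
    return sorted(historical + live, key=lambda r: r["ts"])


# Hypothetical data: the event at ts=3 exists in both sources.
dwh = [{"ts": 1, "amt": 10}, {"ts": 2, "amt": 20}, {"ts": 3, "amt": 30}]
stream = [{"ts": 3, "amt": 30}, {"ts": 4, "amt": 40}]
merged = backfill_with_cutover(dwh, stream, cutover_ts=3)
```

Splitting on a single timestamp means the overlapping event at `ts=3` is taken from exactly one source, so a downstream aggregation sees it once.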

25. Need an invisible interface to plug into storage ecosystems: Streaming Leaning and Batch Leaning

26. Data Fabric for a Streaming Pipeline

27. Data Fabric for a Unified Backfill Pipeline

28. Training dataset backfill requires point-in-time correctness. Diagram labels: Prediction events; Feature data; Time

29. Point-in-time joins to generate training data
Given a spine (entity keys + timestamp + label), join features to generate training data.

spine_df
inference_ts  tid   cc_num  user_id  is_fraud
21:30         0122  2       1        0
21:40         0298  4       1        0
21:55         7539  6       3        1

cc_num_tx_max_1h
ts     cc_num  tx_max_1h
9:20   2       …
10:24  2       …
20:00  4       …

user_unique_ip_30d
ts    user_id  unique_ip_30d
6:00  1        …
6:00  3        …
6:00  5        …

    train_df = pitc_join_features(
        spine_df,
        features=[
            "tx_max_1h",
            "user_unique_ip_30d",
        ],
    )

train_df
inference_ts  tid   cc_num  user_id  is_fraud  tx_max_1h  user_unique_ip_30d
21:30         0122  2       1        1         …          …
21:40         0298  4       1        1         …          …
21:55         7539  6       3        3         …          …
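
A point-in-time join can be sketched in pure Python: for each spine row, pick the latest feature value whose timestamp is not later than the inference timestamp, so that no future data leaks into training. This is not the `pitc_join_features` API from the slide; `pit_join` is a hypothetical helper, timestamps are minutes since midnight, and the feature values are invented for illustration because the slide elides them.

```python
from bisect import bisect_right


def pit_join(spine, feature_rows, key, feature):
    """Attach to each spine row the latest feature value with ts <= inference_ts."""
    # Index feature rows by entity key, each history sorted by timestamp.
    by_key = {}
    for row in sorted(feature_rows, key=lambda r: r["ts"]):
        by_key.setdefault(row[key], []).append((row["ts"], row[feature]))

    out = []
    for s in spine:
        history = by_key.get(s[key], [])
        timestamps = [ts for ts, _ in history]
        # Index of the first feature row strictly after the inference time.
        i = bisect_right(timestamps, s["inference_ts"])
        value = history[i - 1][1] if i > 0 else None  # None: no feature known yet
        out.append({**s, feature: value})
    return out


# Hypothetical data; 21:30 is 1290 minutes since midnight, and so on.
spine = [
    {"inference_ts": 1290, "cc_num": 2, "is_fraud": 0},
    {"inference_ts": 1300, "cc_num": 4, "is_fraud": 0},
    {"inference_ts": 1315, "cc_num": 6, "is_fraud": 1},
]
features = [
    {"ts": 560, "cc_num": 2, "tx_max_1h": 100.0},
    {"ts": 624, "cc_num": 2, "tx_max_1h": 250.0},
    {"ts": 1320, "cc_num": 2, "tx_max_1h": 999.0},  # after inference time: must be ignored
    {"ts": 1200, "cc_num": 4, "tx_max_1h": 75.0},
]
train = pit_join(spine, features, key="cc_num", feature="tx_max_1h")
```

The row for `cc_num` 2 receives 250.0 rather than 999.0, because the later value was written after the prediction was made. `cc_num` 6 has no feature history, so its value is `None`.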

30. Solution 2: Abstract streaming and batch data storage
● Unified streaming & batch source
● Unified online & offline feature stores
● Pluggable to most storage technologies

31. Challenge 3: It should just work!
● Cost, latency, correctness surprises!
● Lack of optimization knobs

32. Diagram axes: Cost; Latency; Correctness
● Stream processing without consistency (fast and cheap)
● Batch processing (cheap and correct)
● Stream processing with consistency enforced (fast and correct)

33. Optimization

    @transformation
    def transaction_count(tx: Transactions, wspec: WindowSpec):
        return tx[tx.status == "failed"].groupby("account_id").window(wspec).count()

Pipeline stages: Relational Expression; Workload Compilation; Optimization; Deployment
Various intelligent optimizations can be made to strike appropriate tradeoffs across storage and compute systems.

34. Claypot. Diagram labels: Feature SDK (Python); Feature Catalog; Guardrail for schema changes; Tunable workload optimization; Unified Processing; Feature Serving; Offline store; Online store; Filter; Union; Join; Scan; Customer managed in your own cloud

35. Solution 3: Optimization knobs
● Abstract optimization complexity
● User controls with high level knobs
● Trust, no surprises!
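
A high-level knob can be as simple as asking users which properties they need and mapping the answer to one of the three processing modes from the cost, latency, and correctness tradeoff shown earlier. The sketch below is only an illustration of that mapping: `choose_mode` and its two boolean parameters are hypothetical, and a real platform would expose richer controls.

```python
def choose_mode(low_latency: bool, strict_correctness: bool) -> str:
    """Map two user-facing knobs to a processing mode.

    Each mode gives up one corner of the cost/latency/correctness tradeoff.
    """
    if low_latency and strict_correctness:
        # Fast and correct, at a higher cost.
        return "stream processing with consistency enforced"
    if low_latency:
        # Fast and cheap, with weaker correctness guarantees.
        return "stream processing without consistency"
    # Cheap and correct, at a higher latency.
    return "batch processing"


# Hypothetical use: a fraud feature needs fresh and correct values.
fraud_mode = choose_mode(low_latency=True, strict_correctness=True)
# Hypothetical use: a nightly report feature tolerates latency.
report_mode = choose_mode(low_latency=False, strict_correctness=True)
```

Keeping the knobs in terms of outcomes (latency, correctness) rather than engine settings lets the platform change the underlying engine without surprising the user.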

36. Make the invisible interface possible!
● Ubiquitous
● Easy and responsive
● Just works!
https://zhenzhongxu.com/
zhenzhong@claypot.ai
the invisible interface