Data-Centric Machine Learning: Building Shopify Inbox’ s Message Classification Model
摘要
Shopify Inbox is a single business chat app that manages all Shopify merchants’ customer communications in one place, and turns chats into conversions. As we were building the product it was essential for us to understand how our merchants’ customers were using chat applications. Were they reaching out looking for product recommendations? Wondering if an item would ship to their destination? Or were they just saying hello? With this information we could help merchants prioritize responses that would convert into sales and guide our product team on what functionality to build next. However, with millions of unique messages exchanged in Shopify Inbox per month, this was going to be a challenging natural language processing (NLP) task.
Our team didn’t need to start from scratch, though: off-the-shelf NLP models are widely available to everyone. With this in mind, we decided to apply a newly popular machine learning process—the data-centric approach. We wanted to focus on fine-tuning these pre-trained models on our own data to yield the highest model accuracy, and deliver the best experience for our merchants.
We’ll share our journey of building a message classification model for Shopify Inbox by applying the data-centric approach. From defining our classification taxonomy to carefully training our annotators on labeling, we dive into how a data-centric approach, coupled with a state-of-the-art pre-trained model, led to a very accurate prediction service we’re now running in production.
欢迎在评论区写下你对这篇文章的看法。