Data-Centric Machine Learning: Building Shopify Inbox’s Message Classification Model

By Eric Fung and Diego Castañeda

Shopify Inbox is a single business chat app that manages all Shopify merchants’ customer communications in one place and turns chats into conversions. As we were building the product, it was essential for us to understand how our merchants’ customers were using chat applications. Were they reaching out for product recommendations? Wondering whether an item would ship to their destination? Or were they just saying hello? With this information, we could help merchants prioritize responses that would convert into sales and guide our product team on what functionality to build next. However, with millions of unique messages exchanged in Shopify Inbox per month, this was going to be a challenging natural language processing (NLP) task.

Our team didn’t need to start from scratch, though: off-the-shelf NLP models are widely available. With this in mind, we decided to apply a newly popular machine learning process: the data-centric approach. We wanted to focus on fine-tuning these pre-trained models on our own data to yield the highest model accuracy and deliver the best experience for our merchants.

A merchant’s Shopify Inbox screen, titled Customers, that displays snippets of customer messages labelled with topics such as product details, checkout, and edit order for easy identification.

Message Classification in Shopify Inbox

We’ll share our journey of building a message classification model for Shopify Inbox by applying the data-centric approach. From defining our classification taxonomy to carefully training our annotators on labeling, we dive into how a data-centric approach, coupled with a state-of-the-art pre-trained model, led to a very accurate prediction service we’re now running in production.
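To make the ideas of a classification taxonomy and annotator labeling more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the pipeline we describe: the three topic names are borrowed from the labels visible in the screenshot above, while the `Topic` enum, the `resolve_label` helper, and the majority-vote rule with its `min_agreement` threshold are all hypothetical.

```python
from collections import Counter
from enum import Enum
from typing import List, Optional


class Topic(Enum):
    """Hypothetical slice of a message taxonomy (names taken from the screenshot)."""
    PRODUCT_DETAILS = "product details"
    CHECKOUT = "checkout"
    EDIT_ORDER = "edit order"


def resolve_label(votes: List[Topic], min_agreement: float = 0.5) -> Optional[Topic]:
    """Return the label most annotators chose, or None if agreement is too low.

    Assumption: a label is accepted only when its share of the votes is
    strictly greater than `min_agreement`.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > min_agreement else None


# Three annotators label the same message; two of them agree.
votes = [Topic.CHECKOUT, Topic.CHECKOUT, Topic.EDIT_ORDER]
print(resolve_label(votes))
```

Messages for which the helper returns `None` could be sent back for another round of review, which is one simple way to keep ambiguous examples out of a training set.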
