Semantic Question Matching with Deep Learning

Authors: Lili Jiang, Shuo Chang, and Nikhil Dandekar

In order to build a high-quality knowledge base, it's important that we ensure each unique question exists on Quora only once. Writers shouldn't have to write the same answer to multiple versions of the same question, and readers should be able to find a single canonical page with the question they're looking for. For example, we'd consider questions like “What are the best ways to lose weight?”, “How can a person reduce weight?”, and “What are effective weight loss plans?” to be duplicate questions because they all have the same intent. To prevent duplicate questions from existing on Quora, we've developed machine learning and natural language processing systems to automatically identify when questions with the same intent have been asked multiple times.
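
The three example questions share an intent but very little wording, which is why simple string matching is not enough. The sketch below is a hypothetical naive baseline, not Quora's method: it scores question pairs by the Jaccard overlap of their lowercase word sets. The same-intent pairs score low (about 0.08, 0.27 and 0.09), while an invented non-duplicate pair ("lose weight" vs. "gain weight") scores about 0.78. That gap is the motivation for learning semantic representations instead of comparing surface tokens.

```python
import re
from itertools import combinations


def tokens(question: str) -> set[str]:
    """Lowercase the question and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", question.lower()))


def jaccard(q1: str, q2: str) -> float:
    """Jaccard similarity of the two questions' token sets (0.0 to 1.0)."""
    a, b = tokens(q1), tokens(q2)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# The three same-intent questions from the paragraph above.
same_intent = [
    "What are the best ways to lose weight?",
    "How can a person reduce weight?",
    "What are effective weight loss plans?",
]

if __name__ == "__main__":
    # Duplicates by intent, yet the word overlap is small.
    for q1, q2 in combinations(same_intent, 2):
        print(f"{jaccard(q1, q2):.2f}  {q1!r} vs {q2!r}")

    # Hypothetical non-duplicate pair: opposite intent, nearly identical wording.
    other = "What are the best ways to gain weight?"
    print(f"{jaccard(same_intent[0], other):.2f}  {same_intent[0]!r} vs {other!r}")
```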

We recently released a public dataset of duplicate questions that can be used to train duplicate question detection models like the one we use at Quora. In this post, we'll give you a sense of what's possible with our duplicate question dataset by outlining a few deep learning explorations we pursued in a recent hack week.

We're also excited to announce a meetup event for NLP and Machine Learning enthusiasts that we'll be hosting at the Quora office in Mountain View, CA on the evening of February 27. We have a couple of exciting speakers lined up: Ben Hamner, co-founder and CTO of Kaggle will talk about “Kaggle Competitions and Reproducible Machine Learning”, and Xavier Amatriain, VP of Engineering at Quora, will be giving a talk entitled “Machine Learning and NLP at Quora”. If you're interested in attending, please apply to join here.