Semantic Question Matching with Deep Learning
In order to build a high-quality knowledge base, it's important that we ensure each unique question exists on Quora only once. Writers shouldn't have to write the same answer to multiple versions of the same question, and readers should be able to find a single canonical page with the question they're looking for. For example, we'd consider questions like “What are the best ways to lose weight?”, “How can a person reduce weight?”, and “What are effective weight loss plans?” to be duplicate questions because they all have the same intent. To prevent duplicate questions from existing on Quora, we've developed machine learning and natural language processing systems to automatically identify when questions with the same intent have been asked multiple times.
We recently released a public dataset of duplicate questions that can be used to train duplicate question detection models like the one we use at Quora. In this post, we'll give you a sense of what's possible with our duplicate question dataset by outlining a few deep learning explorations we pursued in a recent hack week.