Modernizing Nextdoor Search Stack — Part 2
In the last post of our Modernizing Nextdoor Search Stack series, we explained Query Understanding and the ML models that power our Query Understanding Engine. We also covered the nuances of search at Nextdoor and what it takes to understand customer intent. This time, we will focus on the retrieval and ranking of search results.
Retrieval
Retrieval of information can take many forms. Users can express their information needs as a text query, either by typing into a search bar or by selecting a suggestion from autocomplete; in some cases, the query may not even be explicit. Retrieval can involve ranking existing pieces of content, such as documents or short-text answers, or composing new responses that incorporate retrieved information. At Nextdoor, we perform information retrieval using the features that we capture or infer during the Query Understanding stage.
The Query Understanding stage provides us with rich data about customer intent and context. Query Understanding metadata consists of raw information we get from the user, such as device, location, time of day, and day of the week, together with derived signals such as the query itself, expanded queries, an embedding of the query, the query intent, the predicted vertical, and the predicted topic of the query, to name a few.
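As a rough illustration, the output of Query Understanding can be thought of as a single metadata record attached to the search request. The field names and values below are hypothetical, not our actual schema:

```python
# Hypothetical shape of the Query Understanding output attached to a request.
query_understanding = {
    "raw_query": "plumber recommendations",
    "expanded_queries": ["plumber", "plumbing service", "leak repair"],
    "query_embedding": [0.12, -0.53, 0.08],    # truncated embedding vector
    "intent": "local_business",
    "predicted_vertical": "businesses",
    "predicted_topic": "home_services",
    "device": "ios",
    "location": {"lat": 37.77, "lon": -122.42},
    "time_of_day": "evening",
    "day_of_week": "saturday",
}
```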
We combine all of this information in the form of a query that we use for recall. Our underlying retrieval engine is Elasticsearch. Given the scale of Nextdoor and the amount of data that we produce each day, we need to ensure that the system we build meets latency requirements. For that purpose, data for our verticals is split into multiple indices and distributed across Elasticsearch clusters. A high-level overview of our search indices is shown in the following diagram.
Nextdoor Search Recall
All of our reads and writes to Elasticsearch go through the homegrown Nextdoor Elasticsearch Service (NESS). Our engine uses the rich Query Understanding metadata mentioned above, and for each vertical that we support it constructs a comprehensive Elasticsearch query responsible for retrieval. These queries are fanned out to the respective clusters to get the candidates that we will use for ranking. In order to scale operations of the Elasticsearch clusters, at this phase we operate only with entity ids. In other words, retrieval returns only entity ids.
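A simplified sketch of what a per-vertical recall query might look like, using the hypothetical metadata record above (the fields, boosts, and distances are illustrative, not our production schema): it matches the raw and expanded queries, filters by distance from the user, and asks Elasticsearch to return only ids.

```python
def build_recall_query(qu, size=500):
    """Build a per-vertical Elasticsearch recall query from Query Understanding
    metadata. Field names, boosts, and the distance filter are illustrative only."""
    return {
        "size": size,
        "_source": False,          # we only need the entity ids at this stage
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": qu["raw_query"],
                            "fields": ["title^2", "body"],
                        }
                    }
                ],
                "should": [
                    # Expanded queries broaden recall without being required.
                    {"match": {"body": expanded}}
                    for expanded in qu["expanded_queries"]
                ],
                "filter": [
                    {
                        "geo_distance": {
                            "distance": "25km",
                            "location": qu["location"],
                        }
                    }
                ],
            }
        },
    }

# NESS fans the resulting query body out to the cluster that owns the index
# for the predicted vertical, and collects the matching entity ids.
recall_query = build_recall_query(query_understanding)
```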
Ranking
Once we get the results from the retrieval engine, the ordering of the candidates is still suboptimal. Hence, we introduce a ranking stage to optimize search results based on document features and customer preferences.
When engineers introduce ML ranking stages into existing systems, the complexity of the system increases. The newly introduced stages require that fetching features and running inference be extremely performant: the ranking features for the entities need to be at our disposal with minimal impact on search latency. For this, we leverage the real-time version of our Feature Store. Feature retrieval happens immediately before we run inference: we call the Feature Store to get raw features, statistical features, or embedding representations of the user, neighborhoods, documents, and user-document interactions.
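A minimal sketch of how this stage can be wired together, assuming a hypothetical feature store client and model interface (the client, feature group names, and method signatures below are illustrative, not our internal APIs):

```python
def rank_candidates(user_id, candidate_ids, feature_store, model):
    """Fetch real-time features for each (user, document) pair and score them.
    `feature_store` and `model` are hypothetical interfaces; `get_features`
    is assumed to return one feature list per requested key."""
    # One batched call per feature group keeps the latency overhead minimal.
    user_features = feature_store.get_features("user", keys=[user_id])[0]
    doc_features = feature_store.get_features("document", keys=candidate_ids)
    interaction_features = feature_store.get_features(
        "user_document", keys=[(user_id, doc_id) for doc_id in candidate_ids]
    )

    # Assemble one feature vector per candidate and run a single batch inference.
    feature_matrix = [
        user_features + doc_feats + inter_feats
        for doc_feats, inter_feats in zip(doc_features, interaction_features)
    ]
    scores = model.predict(feature_matrix)

    # Return candidate ids ordered by descending model score.
    ranked = sorted(zip(scores, candidate_ids), key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in ranked]
```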
The Nextdoor platform was built more than 10 years ago and has accumulated a large amount of data since then, which means the search team has plenty of data to train our models on. When it comes to ranking, our team A/B tested ML ranking by leveraging various features and ranking algorithms.
To order the search results, we must find the best order for the list of documents for a given user, otherwise known as the ranking problem. The ranking problem is defined as the derivation of an ordering over a list of examples that maximizes the utility of the entire list.
For this purpose, we used Learning to Rank (LTR). LTR attempts to learn, from labeled data, a scoring function that maps example feature vectors to real-valued scores, given documents D = {d1, d2, …, dn} and their respective input feature vectors xi, where i is between 1 and n.
Given the list, the aim of LTR is to find an optimal scoring function F such that the loss over the objective function is minimal. The document features are both traditional and deep.
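Written out, one common (pointwise) formulation of this objective is the following, where yi is the relevance label of document di; the loss we actually optimize is the listwise LambdaMART-style objective described below.

```latex
% Pointwise form of the LTR objective: learn a scoring function F that maps a
% feature vector x_i to a real-valued score while minimizing a loss \ell
% against the relevance labels y_i.
F^{*} \;=\; \arg\min_{F} \sum_{i=1}^{n} \ell\bigl(F(x_i),\, y_i\bigr),
\qquad F : \mathbb{R}^{d} \to \mathbb{R}
```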
To find the right discrimination function between positive and negative samples, we use a surrogate problem: fitting a hyperplane between positive and negative training samples. However, the surrogate loss function and real-world evaluation metrics are not easily comparable. Bridging the gap between evaluation metrics and loss functions has been studied actively in the past. LambdaRank, or its tree-based variant LambdaMART, has been one of the most effective algorithms for incorporating ranking metrics into the learning procedure.
The basic idea is to dynamically adjust the loss during training based on ranking metrics.
Image from appliedmachinelearning.blog
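Concretely, in the standard LambdaRank formulation (Burges et al.), the pairwise gradient ("lambda") for a pair of documents i and j, where i is more relevant than j, is the RankNet gradient scaled by how much the ranking metric would change if the two documents swapped positions. Using NDCG as the metric, the weight looks like this:

```latex
% LambdaRank gradient for a pair (i, j) where document i is more relevant than j:
% the RankNet gradient is scaled by the NDCG change from swapping the two documents.
\lambda_{ij} \;=\; \frac{-\sigma}{1 + e^{\sigma (s_i - s_j)}} \,\bigl|\Delta \mathrm{NDCG}_{ij}\bigr|
```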
At Nextdoor, we leverage the LGBMRanker implementation from LightGBM. During offline and online evaluation, this model proved to be a great candidate given the tradeoff between accuracy and latency.
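For illustration, a minimal training sketch with LightGBM's LGBMRanker using the LambdaRank objective; the synthetic data, hyperparameters, and group sizes below are placeholders, not our production configuration:

```python
import numpy as np
import lightgbm as lgb

# Synthetic placeholder data: 2 queries with 5 candidate documents each,
# 8 features per document, and graded relevance labels in {0, 1, 2}.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(10, 8))
y_train = rng.integers(0, 3, size=10)
group_train = [5, 5]  # number of documents belonging to each query

ranker = lgb.LGBMRanker(
    objective="lambdarank",   # LambdaRank / LambdaMART-style listwise objective
    metric="ndcg",
    n_estimators=100,
    learning_rate=0.05,
)
ranker.fit(X_train, y_train, group=group_train)

# At serving time, score the retrieved candidates and sort by descending score.
candidate_scores = ranker.predict(X_train[:5])
ranked_positions = np.argsort(-candidate_scores)
```

The `group` argument tells LightGBM where each query's candidate list begins and ends, which is what makes the loss listwise rather than pointwise.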
Final thoughts
Working on search is very exciting, and there are many challenges: from scaling the retrieval infrastructure and building ML model inference, to the ever-changing nature of search at Nextdoor. During this work, we managed to bring an amazing group of people together to build a better search experience for our neighbors, with the overall goal of building a kinder world.
At the same time, there are many interesting challenges ahead of us with regard to scaling our infrastructure, understanding our customers better, and introducing new stages of search ranking, such as multi-objective optimization and reinforcement learning, to name a few.
If you find any of these topics interesting and would like to join our team, please visit our careers page to find our open roles.