Transforming Location Retrieval at Airbnb: A Journey from Heuristics to Reinforcement Learning

How Airbnb uses machine learning and reinforcement learning to solve a distinctive information retrieval task and provide guests with unique, affordable, and differentiated accommodations around the world.

By: Dillon Davis, Huiji Gao, Thomas Legrand, Weiwei Guo, Malay Haldar, Alex Deng, Han Zhao, Liwei He, Sanjeev Katariya

Introduction

Airbnb has transformed the way people travel around the globe. As Airbnb’s inventory spans diverse locations and property types, providing guests with relevant options in their search results has become increasingly complex. In this blog post, we’ll discuss shifting from using simple heuristics to advanced machine learning and reinforcement learning techniques to transform what we call location retrieval in order to address this challenge.

The Challenge of Location Retrieval

Guests typically start searching by entering a destination in the search bar and expect the most relevant results to be surfaced. These destinations can be countries, states, cities, neighborhoods, streets, addresses, or points of interest. Unlike traditional travel accommodations, Airbnb listings are spread across different neighborhoods and surrounding areas. For example, a family searching for a vacation rental in San Francisco might find better options in nearby cities like Daly City, where there are larger single-family homes. Thus, the system needs to account for not just the searched location but also nearby areas that might offer better options for the guest. This is evidenced by the locations of booked listings when searching for San Francisco shown below.

Given Airbnb’s scale, we cannot rank every listing for every search. The challenge was therefore to create a system that dynamically infers a relevant map area for a query. This system, known as location retrieval, needed to include a wide enough variety of listings to meet all guests’ needs while staying relevant to the query. Our search ranking models can then efficiently rank the subset of our inventory that falls within the relevant map area and surface the most relevant listings to our guests. This system and its evolution are outlined below.

Starting with Heuristics: The Cold Start Problem

Initially, Airbnb relied on heuristics to define map areas based on the type of search. For example, if a guest searched for a country, the system would use administrative boundaries to filter listings within that country. If they searched for a city, the system would create a 25-mile radius around the city center to retrieve listings.
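As a rough illustration of the city heuristic (not Airbnb's production code), the sketch below turns a center point and a 25-mile radius into a rectangular map area. The function names, the bounding-box approximation of the circle, and the Earth-radius constant are assumptions made for the example; the sketch also ignores edge cases near the poles and the antimeridian.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (assumed constant for this sketch)

def radius_to_bounding_box(center_lat, center_lng, radius_miles=25.0):
    """Return (south, west, north, east) in degrees for a box enclosing the circle."""
    # Angular size of the radius, expressed in degrees of latitude.
    lat_delta = math.degrees(radius_miles / EARTH_RADIUS_MILES)
    # A degree of longitude covers less ground away from the equator.
    lng_delta = lat_delta / math.cos(math.radians(center_lat))
    return (center_lat - lat_delta, center_lng - lng_delta,
            center_lat + lat_delta, center_lng + lng_delta)

def contains(box, lat, lng):
    """True if the point lies inside the (south, west, north, east) box."""
    south, west, north, east = box
    return south <= lat <= north and west <= lng <= east

# Example: a box around the center of San Francisco.
sf_box = radius_to_bounding_box(37.7749, -122.4194)
daly_city_in_box = contains(sf_box, 37.6879, -122.4702)
```

With these inputs, a nearby city such as Daly City falls inside the box, while a listing hundreds of miles away does not.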

Improving these heuristics proved to be highly impactful. One example is the introduction of a parameterized, log-scale smooth function that computes an expansion factor for the diagonal size of the administrative bounds of the searched destination. We applied this to very precise locations like addresses, buildings, and POIs, resulting in a 0.35% increase in uncanceled bookers on the platform when tested in an online A/B experiment against the baseline heuristics. The figures below demonstrate how search results for a building in Ibiza, Spain improved dramatically with this heuristic by surfacing significantly more and higher-quality inventory.
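The post does not give the actual function or its parameters, so the following is only a hypothetical sketch of what a smooth, log-scale expansion factor could look like. The functional form and the `max_factor`, `scale_km`, and `decay` parameters are all invented for illustration: tiny bounds (a building or address) get a large expansion, and the factor decays smoothly toward 1 as the bounds grow.

```python
import math

def expansion_factor(diagonal_km, max_factor=20.0, scale_km=1.0, decay=1.0):
    """Hypothetical smooth expansion factor based on the log of the bounds' diagonal.

    Returns max_factor for a zero-size bound and approaches 1.0 for very large bounds.
    """
    log_size = math.log1p(diagonal_km / scale_km)  # log scale, 0 when diagonal is 0
    return 1.0 + (max_factor - 1.0) / (1.0 + decay * log_size) ** 2

def expanded_diagonal_km(diagonal_km, **params):
    """Diagonal of the retrieval area after applying the expansion factor."""
    return diagonal_km * expansion_factor(diagonal_km, **params)

building = expansion_factor(0.05)   # very precise location: large expansion
city = expansion_factor(50.0)       # broad location: little expansion
```

Because the factor is continuous in the diagonal size, two destinations of nearly the same size receive nearly the same expansion, which a set of hard per-type thresholds would not guarantee.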

These heuristics were simple and worked well enough to start, but they had limitations. They couldn’t differentiate between different types of searches (e.g., a family looking for a large home versus a solo traveler looking for a small apartment), and they didn’t adapt well to new data as Airbnb’s inventory and guest preferences evolved.

Exploring Statistics to Help Improve Location Retrieval

With more data available over time from these intuition-based heuristics, we thought there might be a way to use this historical booking behavior to improve location retrieval. We built a dataset for each travel destination that recorded where guests booked listings when searching for that destination. Based on this data, the system could create retrieval map areas that included 96% of the nearest booked listings for a given destination.
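One way to read the statistical approach is sketched below: rank a destination's historical bookings by distance from the destination center, keep the nearest 96%, and take their bounding box. The haversine distance, the rounding rule, and the function names are assumptions for this sketch; the post does not describe how the areas were actually constructed.

```python
import math

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two lat/lng points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def coverage_box(center, booked, coverage=0.96):
    """Bounding box (south, west, north, east) of the nearest `coverage` share of bookings."""
    if not booked:
        raise ValueError("need at least one booked listing")
    ranked = sorted(booked, key=lambda p: haversine_km(center[0], center[1], p[0], p[1]))
    keep = max(1, round(coverage * len(ranked)))  # number of nearest bookings to keep
    kept = ranked[:keep]
    lats = [p[0] for p in kept]
    lngs = [p[1] for p in kept]
    return (min(lats), min(lngs), max(lats), max(lngs))

# Toy data: 24 bookings near the center plus one far-away outlier.
center = (37.7749, -122.4194)
booked = [(37.7749 + 0.01 * i, -122.4194 - 0.01 * i) for i in range(24)]
booked.append((34.0522, -118.2437))
box = coverage_box(center, booked)
```

In the toy data the single distant booking is dropped, so the box stays tight around the destination instead of stretching to cover the outlier.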

We tested these newly constructed retrieval map areas in place of the intuition-based heuristics outlined above, on the hypothesis that they would give guests a more bookable selection of inventory. While this statistical approach was more aligned with guest booking behavior, it still had limitations. It treated all searches for a location the same, regardless of specific search parameters like group size or travel dates. This uniform approach meant that some guests might not see the best listings for their particular needs. As a result, this statistics-based method produced no detectable increase in uncanceled bookers on the platform when tested against the heuristics outlined above in an online A/B experiment. This led us to believe that location retrieval may require more advanced techniques such as machine learning.

Advancing to Machine Learning

Instead of only relying on past booking data, the new system could learn from various search parameters, such as the number of guests and stay duration. By analyzing this data, a model could predict more relevant map areas for each search, rather than applying a one-size-fits-all approach.

For example, a group of ten travelers searching for a San Francisco vacation rental might prefer larger homes in the suburbs, while solo travelers might prioritize central locations. The machine learning model could distinguish between these different preferences and adjust the retrieval map areas accordingly, providing more tailored results.

We constructed our machine learning model in the following manner. This is a result of three iterations that introduced the machine learning model, expanded its feature set, and expanded search attribution. The architecture is depicted in the figure below.

Training Examples: Searches that a booker issued, either by entering a destination in the search bar or by manipulating the map, on the day of the booking or the day before, and whose results contained the booked listing. We discard any bookings that are canceled 7 days after booking.
Training Features: We derive features directly from the search request such as location name, stay length, number of guests, price filters, location country, etc. There are 9 continuous features and 19 categorical features in total.
Training Labels: The latitude and longitude coordinates of the booked listing attributed to the search.
Architecture: A two-layer neural network of size 256, chosen for its greater flexibility in loss formulation compared to traditional regression and decision-tree-based approaches.
Model Output: 4 floats that define the latitude and longitude offsets from the center latitude and longitude coordinates of the searched destination that represent the relevant map area.
Loss: Trained to predict map areas that contain their associated booked listing while minimizing the size of the predicted map area and the occurrence of predictions that cannot construct a valid rectangular map area.
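The model output and loss described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the published formulation: the order of the four offsets (south, west, north, east), the hinge-style penalties, and the `size_weight` and `validity_weight` values are all hypothetical. A real training setup would express the same terms in an automatic-differentiation framework.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def offsets_to_box(center, offsets):
    """center: (N, 2) lat/lng; offsets: (N, 4) signed offsets for the
    south, west, north, and east edges (assumed ordering)."""
    lat, lng = center[:, 0], center[:, 1]
    return (lat + offsets[:, 0], lng + offsets[:, 1],
            lat + offsets[:, 2], lng + offsets[:, 3])

def retrieval_loss(center, offsets, booked, size_weight=0.1, validity_weight=10.0):
    """Penalize missing the booked listing, large boxes, and inverted boxes."""
    south, west, north, east = offsets_to_box(center, offsets)
    b_lat, b_lng = booked[:, 0], booked[:, 1]
    # How far the booked listing falls outside each edge (0 when it is inside).
    miss = relu(south - b_lat) + relu(b_lat - north) + relu(west - b_lng) + relu(b_lng - east)
    # Box extent in degrees, a simple stand-in for map-area size.
    size = relu(north - south) + relu(east - west)
    # Inverted edges (north below south, east left of west) cannot form a rectangle.
    invalid = relu(south - north) + relu(west - east)
    return float(np.mean(miss + size_weight * size + validity_weight * invalid))

center = np.array([[37.77, -122.42]])
booked = np.array([[37.69, -122.47]])
loss_cover = retrieval_loss(center, np.array([[-0.1, -0.1, 0.1, 0.1]]), booked)
loss_small = retrieval_loss(center, np.array([[-0.01, -0.01, 0.01, 0.01]]), booked)
loss_invalid = retrieval_loss(center, np.array([[0.1, 0.1, -0.1, -0.1]]), booked)
```

In this toy setup a box that covers the booked listing scores lowest, a box that is too small to contain it scores higher, and an inverted box scores highest, which mirrors the three objectives named in the loss description.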

The machine learning system increased the recall of booked listings (i.e., how often the system retrieved a listing that was eventually booked) by 7.12% and reduced the size of the retrieval map area by 40.83%. It had a cumulative impact of +1.8% in uncanceled bookers on the platform. The initial model was evaluated against the baseline, and each subsequent model iteration was evaluated against the preceding outgoing model.
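For readers who want to reproduce this kind of offline comparison, the sketch below shows one plausible way to compute booked-listing recall and map-area size for a set of predicted boxes. The helper names and the use of squared degrees as the area unit are assumptions; the post does not state how either metric was measured.

```python
def booked_listing_recall(boxes, booked):
    """Share of searches whose (south, west, north, east) box contains the booked (lat, lng)."""
    hits = sum(1 for (s, w, n, e), (lat, lng) in zip(boxes, booked)
               if s <= lat <= n and w <= lng <= e)
    return hits / len(boxes)

def mean_box_area(boxes):
    """Mean box area in squared degrees (a rough proxy for map-area size)."""
    return sum((n - s) * (e - w) for s, w, n, e in boxes) / len(boxes)

def relative_change(new, old):
    """Relative change of a metric, e.g. -0.4 for a 40% reduction."""
    return (new - old) / old

# Toy comparison between a baseline and a candidate on two searches.
booked = [(0.5, 0.5), (2.0, 2.0)]
baseline_boxes = [(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0)]
candidate_boxes = [(0.25, 0.25, 0.75, 0.75), (1.5, 1.5, 2.5, 2.5)]
recall_change = relative_change(booked_listing_recall(candidate_boxes, booked),
                                booked_listing_recall(baseline_boxes, booked))
area_change = relative_change(mean_box_area(candidate_boxes),
                              mean_box_area(baseline_boxes))
```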

Figures below demonstrate how search results for a specific street in Lima, Peru improved dramatically with the model by surfacing results that are much closer to the searched street.

Before

After

Exploring New Frontiers with Reinforcement Learning

While machine learning improved the system’s ability to differentiate search results, there was still room for improvement, particularly in learning whether locations that had never been surfaced before were relevant to guests for a search. To address this, Airbnb introduced reinforcement learning to the location retrieval process.

Reinforcement learning allowed the system to continuously learn from guest interactions by surfacing new areas for a given destination and adjusting the retrieval map area based on guest booking behavior. This setup, framed as a contextual multi-armed bandit problem, involves balancing exploration (surfacing new locations) with exploitation (surfacing previously successful locations). The system could actively experiment with different retrieval map areas, learning from guest bookings to refine its predictions.

Applying a contextual multi-armed bandit traditionally requires defining an active contextual estimator, a method for uncertainty estimation, and an exploration strategy. We took the following approach given product constraints, system constraints, and the nature of our model formulation. The architecture is depicted in the figure below.

Active contextual estimation: We employed our existing machine learning model for location retrieval retrained on a daily basis to regularly learn from any new bookings data that we collect while surfacing previously unshown locations.
Uncertainty estimation: We modified our model architecture with a random dropout layer to generate 32 unique predictions for a given search (Monte Carlo Dropout). This allows us to measure the mean and standard deviation of our prediction while minimizing negative impact to system performance and changes to our existing model formulation.
Exploration Strategy: We compute an upper confidence bound using the mean and standard deviation of our prediction in order to construct larger retrieval map areas based on the model’s confidence in its prediction for the search.
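The uncertainty estimation and exploration steps above can be sketched with a toy network. The weights below are random and untrained, and the feature count, dropout rate, offset ordering (south, west, north, east), and confidence multiplier `k` are assumptions chosen for illustration; only the 32 stochastic passes, the hidden size of 256, and the mean-plus-standard-deviation upper confidence bound come from the description.

```python
import numpy as np

def mc_dropout_predict(x, w1, b1, w2, b2, rng, n_samples=32, drop_rate=0.1):
    """Run n_samples stochastic forward passes; return (mean, std) of the 4 edge offsets."""
    samples = []
    for _ in range(n_samples):
        hidden = np.maximum(x @ w1 + b1, 0.0)
        # Dropout stays active at inference: zero random units, rescale the rest.
        mask = rng.random(hidden.shape) >= drop_rate
        hidden = hidden * mask / (1.0 - drop_rate)
        samples.append(hidden @ w2 + b2)
    samples = np.stack(samples)
    return samples.mean(axis=0), samples.std(axis=0)

# Outward direction of each signed edge offset: south, west, north, east.
OUTWARD = np.array([-1.0, -1.0, 1.0, 1.0])

def upper_confidence_offsets(mean, std, k=1.0):
    """Push every edge outward by k standard deviations."""
    return mean + k * std * OUTWARD

rng = np.random.default_rng(0)
n_features, hidden_size = 8, 256
w1 = rng.normal(0.0, 0.1, (n_features, hidden_size))  # untrained toy weights
b1 = np.zeros(hidden_size)
w2 = rng.normal(0.0, 0.1, (hidden_size, 4))
b2 = np.array([-0.1, -0.1, 0.1, 0.1])
x = rng.normal(size=n_features)  # stand-in for encoded search features

mean, std = mc_dropout_predict(x, w1, b1, w2, b2, rng)
ucb = upper_confidence_offsets(mean, std)
```

The larger the spread across the 32 passes, the further the upper-confidence box extends beyond the mean box, so searches the model is less sure about receive wider retrieval areas.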

This system successfully explored more for less-traveled locations where it was less confident and explored less for locations that are often searched and booked. For example, pictured below are the mean (inner) and upper confidence bound (outer) estimates of retrieval map areas for San Francisco, CA (left) and Smith Mountain Lake, Virginia (right). San Francisco is searched almost 25x more than Smith Mountain Lake with proportionately more bookings as well. As a result, the model is more confident in its retrieval map area estimate for San Francisco vs Smith Mountain Lake resulting in 2–3x less exploration for San Francisco queries vs Smith Mountain Lake.

The reinforcement learning system was also tested against the outgoing machine learning model in online A/B experiments, showing a cumulative 0.51% increase in uncanceled bookers and a 0.71% increase in 5-star trip rate over two iterations, which introduced reinforcement learning and optimized scoring of the more complex model.

Conclusion: A Transformative Journey

Airbnb’s journey from simple heuristics to sophisticated machine learning and reinforcement learning models demonstrates the power of data-driven approaches in transforming complex systems. By continually iterating and improving its location retrieval process, Airbnb has not only enhanced the relevance of its search results but also helped guests experience more 5-star trips.

This transformation cumulatively results in a 2.66% increase in uncanceled bookers, a major achievement for a company operating at Airbnb’s scale. More details can be found in our technical paper. As Airbnb continues to innovate, we are continuously evaluating and introducing more advanced features and retrieval mechanisms, like retrieving with complex polygons. These will further refine and enhance the search experience for millions of guests worldwide.

If this type of work interests you, check out some of our related positions and more at Careers at Airbnb!

****************

All product names, logos, and brands are property of their respective owners. All company, product and service names used in this website are for identification purposes only. Use of these names, logos, and brands does not imply endorsement.