
Elasticsearch percolation to match new real estate listings against saved searches

David Kemp

26th Sep 2022 — Read in 6 mins

The realestate.com.au website and mobile applications are used by 12.7 million people on average each month in their search for property. Users can save their search filters so that they may easily repeat searches later. Once a user has saved a search, they have the option to receive daily notifications of new listings that match their search criteria. Supporting this involves matching the thousands of new listings created every day against millions of saved searches. In this article I explain how we use Elasticsearch’s “percolation” feature to help us do this.

But first, here are some numbers that give an indication of the scale of the problem:

- Our users have millions of saved searches, and that number is constantly growing.
- During the peak hours of most weekdays, real estate agencies create hundreds of new listings per hour.
- On a typical weekday we generate millions of notifications.
- Notifications are collated at 4pm every day and are all dispatched within two hours.

Implementation options

We considered two main approaches to this problem:

Execute all the searches once per day at 4pm. That is, execute each of the 4.6 million saved searches restricted to the listings created since 4pm the previous day. The main reason we didn’t choose this approach is that we anticipate that one day we will want to give users the option of being notified within five minutes of the creation of a new listing that matches their search. If this feature proved to be popular, then we could end up having to execute an infeasibly large number of saved searches every five minutes.
Or, the approach we chose: after each new listing is created, find all the saved searches that match the new listing. For every new listing, we need to:
- perform a search across all 4.6 million saved searches to find those saved searches that match the new listing
- record the matches somewhere (we use tables in a Postgres database)
- and at 4pm each day collate the matches and send them out to the corresponding users.

We could have stored the saved searches as regular JSON objects in an Elasticsearch index and used ordinary Elasticsearch queries to find saved searches that match new listings. However, this would have required us to implement matching logic that stays consistent with the user search experience, with a significant risk of the two drifting apart. Also, every time we add support for a new filter in our search experience, we would need to implement the corresponding logic in the matching system. For example, at the time of writing, a team is implementing the ability for user searches to exclude listings that are to be auctioned. If our saved search matching system were using a conventional approach, then we would need to implement the logic for excluding auctions in two systems: the search system supporting our search experience and our saved search matching system.

These issues convinced us to use Elasticsearch’s percolation feature. As explained below, the Elasticsearch query representing the saved search can be identical to, and generated by the same system as, the one that performs the actual listings search.

What is Percolation?

Most readers familiar with Elasticsearch will be familiar with its conventional usage:

- store JSON documents in an Elasticsearch index
- use an Elasticsearch query to find matching saved documents

The percolation feature of Elasticsearch allows you to do the reverse:

- store Elasticsearch queries in an Elasticsearch index
- use a JSON document to find matching saved queries
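
To make the reversed roles concrete, here is a toy, in-memory sketch in Python. It is only an illustration under stated assumptions: it understands a tiny subset of the query DSL (bool/filter, term and range), the stored query ids are made up, and the linear scan says nothing about how Elasticsearch implements percolation internally.

```python
from typing import Any, Dict, List

Query = Dict[str, Any]
Document = Dict[str, Any]


def matches(query: Query, doc: Document) -> bool:
    """Evaluate a tiny subset of the query DSL against a single document."""
    if "bool" in query:
        # Every clause listed under "filter" must match.
        return all(matches(clause, doc) for clause in query["bool"].get("filter", []))
    if "term" in query:
        field, expected = next(iter(query["term"].items()))
        return doc.get(field) == expected
    if "range" in query:
        field, bounds = next(iter(query["range"].items()))
        value = doc.get(field)
        if value is None:
            return False
        if "gte" in bounds and value < bounds["gte"]:
            return False
        if "lte" in bounds and value > bounds["lte"]:
            return False
        return True
    raise ValueError(f"unsupported query clause: {sorted(query)}")


def percolate(stored_queries: Dict[str, Query], doc: Document) -> List[str]:
    """Return the ids of all stored queries that match the document."""
    return [query_id for query_id, query in stored_queries.items() if matches(query, doc)]


# Hypothetical stored queries, keyed by made-up ids.
stored_queries = {
    "melbourne-1-2-bed": {"bool": {"filter": [
        {"term": {"suburb": "Melbourne VIC 3000"}},
        {"range": {"bedrooms": {"gte": 1, "lte": 2}}},
    ]}},
    "sydney-3-plus-bed": {"bool": {"filter": [
        {"term": {"suburb": "Sydney NSW 2000"}},
        {"range": {"bedrooms": {"gte": 3}}},
    ]}},
}

new_listing = {"suburb": "Sydney NSW 2000", "bedrooms": 3, "bathrooms": 2}
print(percolate(stored_queries, new_listing))
```

The document is the input and the queries are the stored data, which is the opposite of a conventional search.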

For example, suppose a user saves a search for listings in Melbourne that have between 1 and 2 bedrooms, and they give the search a name like “My Melbourne Search”. Assuming the user has a user-id of “1234”, we create a document that looks a bit like the following and save it to an Elasticsearch index called “saved-search”:

POST /saved-search/_doc
{
  "userId": "1234",
  "savedSearchName": "My Melbourne Search",
  "query": {
    "bool": {
      "filter": [
        {"term": {"suburb": "Melbourne VIC 3000"}},
        {"range": {"bedrooms": {"gte": 1, "lte": 2}}}
      ]
    }
…

Notice how the content of the “query” field is a query expressed using the Elasticsearch query DSL. In our architecture (described further below), the Elasticsearch query that we store is generated by the same service that is used to perform the actual listings search, which allows us to save the exact same Elasticsearch query that is used when the user performs the search.

For this to work, the index mapping for “saved-search” must declare the “query” field to be of type “percolator”. So, the index mapping for saved searches looks something like this:

PUT /saved-search
{
 "mappings": {
     "properties": {
         "userId": {"type": "keyword"},
         "savedSearchName": {"type": "text"},
         "query": {"type": "percolator"}, ⬅
  …
}

What isn’t shown here is that the index mapping must include the mappings for all the fields to which the query refers. This means that the mappings we use in our listings index to support listings search need to also be applied to the saved search index. So, the index mapping for saved searches looks more like this:

{
 "mappings": {
     "properties": {
         "userId": {"type": "keyword"},
         "savedSearchName": {"type": "text"},
         "query": {"type": "percolator"},
         "suburb": {"type": "keyword"},
         "bedrooms": {"type": "integer"},
         "bathrooms": {"type": "integer"},
   …
}
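
One way to keep the two mappings aligned is to derive the saved search mapping from the listing field mappings instead of maintaining it by hand. The Python sketch below is an assumption about how that could be done, not a description of our tooling; `build_saved_search_mapping` is a hypothetical helper, and the field names come from the example above.

```python
import json
from typing import Any, Dict


def build_saved_search_mapping(listing_properties: Dict[str, Any]) -> Dict[str, Any]:
    """Combine the saved-search fields with the listing field mappings."""
    saved_search_properties = {
        "userId": {"type": "keyword"},
        "savedSearchName": {"type": "text"},
        "query": {"type": "percolator"},
    }
    # A listing field with the same name as a saved-search field would be
    # silently overwritten by the merge below, so fail loudly instead.
    clashes = saved_search_properties.keys() & listing_properties.keys()
    if clashes:
        raise ValueError(f"listing fields clash with saved-search fields: {sorted(clashes)}")
    return {"mappings": {"properties": {**saved_search_properties, **listing_properties}}}


# Field mappings copied from the listing index (abbreviated, as in the example).
listing_properties = {
    "suburb": {"type": "keyword"},
    "bedrooms": {"type": "integer"},
    "bathrooms": {"type": "integer"},
}

print(json.dumps(build_saved_search_mapping(listing_properties), indent=2))
```

The printed JSON has the same shape as the request body for creating the “saved-search” index.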

When a real estate agent creates a new listing, we not only save the listing in a listing index (so that it can be found by regular searches), but soon afterwards we also perform a search to find all the saved searches that match the new listing. The search is quite simple as it involves taking the document representing the listing and using it in a percolate query.

So, for example, if a real estate agent creates a listing in Sydney that has three bedrooms and two bathrooms, then the listing document might look like this:

{
    "suburb": "Sydney NSW 2000",
    "bedrooms": 3,
    "bathrooms": 2,
    "price": 2100000,
…
}

To find the saved searches that match the new listing, the search looks a bit like this:

GET /saved-search/_search
{
   "query": {
       "percolate": {
           "field": "query",
           "document": {
               "suburb": "Sydney NSW 2000",
               "bedrooms": 3,
               "bathrooms": 2,
               "price": 2100000,
               …
           }
       }
…
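
In application code, the same request body can be built directly from the listing document. The following Python sketch assumes two hypothetical helpers, `build_percolate_request` and `matched_saved_searches`, and uses a hand-written response in the standard search response shape rather than real output.

```python
from typing import Any, Dict, List


def build_percolate_request(listing: Dict[str, Any], field: str = "query") -> Dict[str, Any]:
    """Wrap a listing document in a percolate query against the given percolator field."""
    return {"query": {"percolate": {"field": field, "document": listing}}}


def matched_saved_searches(response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Extract the owner and name of each saved search returned as a hit."""
    return [
        {
            "id": hit["_id"],
            "userId": hit["_source"]["userId"],
            "savedSearchName": hit["_source"]["savedSearchName"],
        }
        for hit in response["hits"]["hits"]
    ]


listing = {"suburb": "Sydney NSW 2000", "bedrooms": 3, "bathrooms": 2, "price": 2100000}
request_body = build_percolate_request(listing)

# Hand-written sample response; the id, user and search name are made up.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "abc", "_source": {"userId": "5678", "savedSearchName": "Sydney family home"}},
        ]
    }
}
print(matched_saved_searches(sample_response))
```

Because a search returns only the first 10 hits by default, a matcher built this way would also need to page through the results to see every matching saved search.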

Architecture

Here is a simplified view of the architecture.

It shows how a user (far left) can search for listings and save a search.

When the user saves a search, that search makes its way through the Saved Search system to the Saved Search Feeder, which in turn uses a special endpoint to request that the Listing Search API provide it with the exact same Elasticsearch query that the Listing Search API would have used to perform the listings search. That Elasticsearch query is then stored in the Saved Search Elasticsearch index.

The Saved Search Matcher polls the Listings Search System for new listings and percolates each new listing against the Saved Search Elasticsearch index. Any matches it finds are stored in the Match Store. At 4pm every day the matches are collated and sorted before being sent as daily notifications to the corresponding users (not shown).
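
The daily collation step can be sketched as grouping the stored matches by user and saved search. This Python example is a simplification under stated assumptions: it uses an in-memory list instead of the Postgres tables of the Match Store, and the `Match` record and `collate` function are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class Match:
    user_id: str
    saved_search_name: str
    listing_id: str
    matched_at: datetime


def collate(matches: List[Match], cutoff: datetime) -> Dict[str, Dict[str, List[str]]]:
    """Group listing ids by user and saved search for the 24 hours ending at cutoff."""
    window_start = cutoff - timedelta(days=1)
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    # Oldest matches first, so each notification lists listings in match order.
    for match in sorted(matches, key=lambda m: m.matched_at):
        if window_start < match.matched_at <= cutoff:
            grouped[match.user_id][match.saved_search_name].append(match.listing_id)
    return {user: dict(searches) for user, searches in grouped.items()}


matches = [
    Match("1234", "My Melbourne Search", "listing-a", datetime(2022, 9, 26, 9, 30)),
    Match("1234", "My Melbourne Search", "listing-b", datetime(2022, 9, 25, 15, 0)),
    Match("5678", "Sydney", "listing-c", datetime(2022, 9, 26, 15, 59)),
    Match("1234", "My Melbourne Search", "listing-d", datetime(2022, 9, 26, 8, 0)),
]

# 4pm cutoff; listing-b falls before the window and is left out.
print(collate(matches, cutoff=datetime(2022, 9, 26, 16, 0)))
```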

Pros and cons

The big benefit of using percolation is that the various business rules that are encoded in the Elasticsearch query used when a user performs a search do not need to be re-implemented when matching listings against saved searches. As a consequence of this:

- we avoid the risk of the business rules used when performing live searches diverging from those used when matching listings against saved searches.
- whenever a new filter is supported by our search interface, very little effort is required to ensure that saved searches can use the new filter.

Possibly the biggest flaw in our approach is that the saved search matching system is tightly coupled to the main listings search system:

- if a new type of filter requires a new attribute mapping to be added to the listing Elasticsearch index, then the same attribute mapping needs to be added to the saved search Elasticsearch index before any searches using the new filter are saved.
- whenever fields are renamed, or the query structure is changed, the change needs to be rolled out in multiple steps so that existing saved searches continue to work until they can be refreshed.
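
One way to guard against the first of these problems is to check, before saving a search, that every field its query refers to is declared in the saved search mapping. The Python sketch below is an assumption rather than part of our system: it only understands term, range and bool clauses, and the `isAuction` field is a hypothetical name for the auction filter.

```python
from typing import Any, Dict, List, Set


def referenced_fields(query: Dict[str, Any]) -> Set[str]:
    """Collect field names used by term and range clauses, recursing through bool clauses."""
    fields: Set[str] = set()
    for section in query.get("bool", {}).values():
        # A bool section holds one clause or a list of clauses; settings such
        # as minimum_should_match are not clauses and are skipped.
        clauses = section if isinstance(section, list) else [section]
        for clause in clauses:
            if isinstance(clause, dict):
                fields |= referenced_fields(clause)
    for leaf in ("term", "range"):
        fields |= set(query.get(leaf, {}))
    return fields


def unmapped_fields(query: Dict[str, Any], mapping: Dict[str, Any]) -> List[str]:
    """Return, sorted, the fields the query uses that the mapping does not declare."""
    declared = set(mapping["mappings"]["properties"])
    return sorted(referenced_fields(query) - declared)


saved_search_mapping = {"mappings": {"properties": {
    "query": {"type": "percolator"},
    "suburb": {"type": "keyword"},
    "bedrooms": {"type": "integer"},
}}}

# "isAuction" is a hypothetical field that has not yet been added to the mapping.
query_with_new_filter = {"bool": {
    "filter": [
        {"term": {"suburb": "Melbourne VIC 3000"}},
        {"range": {"bedrooms": {"gte": 1, "lte": 2}}},
    ],
    "must_not": [{"term": {"isAuction": True}}],
}}

print(unmapped_fields(query_with_new_filter, saved_search_mapping))
```

A non-empty result signals that the saved search index mapping needs updating before the query is stored.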

Another drawback to our approach is that there are some limitations to the use of the Percolation feature, such as the lack of support for parent/child queries. Fortunately, none of the limitations apply to our queries.

Closing comments

Our saved search match and notification system has been successfully using the Elasticsearch percolation feature for several months now. It is performing well and is an improvement on the old system – previously, users were prevented from enabling notifications for certain types of searches (e.g. map searches and searches with keyword filters). With the new system this restriction no longer exists; you can now enable notifications for any listings search you can perform.

We have already benefited from the way that no additional work is required to support matching on new filters supported by the search – a new filter is currently being added to support users that don’t want to see listings going to auction, and no additional coding will be required for the saved search matching system to support this filter.

We have also extended the system to support REA’s “Coming Soon” product: users get separate notifications for listings that agents have indicated are soon to go to market.