Elasticsearch percolation: matching new real estate listings against saved searches
The realestate.com.au website and mobile applications are used by an average of 12.7 million people each month to search for property. Users can save their search filters so that they can easily repeat those searches later. Once a user has saved a search, they can choose to receive daily notifications of new listings that match their search criteria. Supporting this means matching the thousands of new listings created every day against millions of saved searches. In this article I explain how we use Elasticsearch's "percolation" feature to help us do this.
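The core idea of percolation is to invert the usual direction of a search. Instead of running each saved search over the listings, each new listing is run against the stored saved searches. The Python sketch below shows both directions side by side using plain in-memory objects. The `Listing` and `SavedSearch` fields, the filter logic, and the function names are hypothetical illustrations, not realestate.com.au's schema or code. In Elasticsearch itself, saved searches would be stored as documents in a field mapped with the `percolator` type and matched against a new listing with a `percolate` query.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Listing:
    # Hypothetical listing fields, for illustration only.
    id: str
    suburb: str
    price: int
    bedrooms: int


@dataclass
class SavedSearch:
    # Hypothetical saved-search filters; an empty set or None means "any".
    id: str
    suburbs: Set[str] = field(default_factory=set)
    max_price: Optional[int] = None
    min_bedrooms: Optional[int] = None

    def matches(self, listing: Listing) -> bool:
        """Return True if the listing satisfies every filter in this search."""
        if self.suburbs and listing.suburb not in self.suburbs:
            return False
        if self.max_price is not None and listing.price > self.max_price:
            return False
        if self.min_bedrooms is not None and listing.bedrooms < self.min_bedrooms:
            return False
        return True


def batch_match(searches: List[SavedSearch], new_listings: List[Listing]) -> Dict[str, List[str]]:
    """Daily batch: run every saved search over all listings created since the last run."""
    matches: Dict[str, List[str]] = {}
    for search in searches:
        hits = [listing.id for listing in new_listings if search.matches(listing)]
        if hits:
            matches[search.id] = hits
    return matches


def percolate(searches: List[SavedSearch], listing: Listing) -> List[str]:
    """Percolation: run one new listing against every stored saved search."""
    return [search.id for search in searches if search.matches(listing)]


def percolate_stream(searches: List[SavedSearch], listings: List[Listing]) -> Dict[str, List[str]]:
    """Percolate each listing as it arrives, accumulating matches per saved search."""
    matches: Dict[str, List[str]] = {}
    for listing in listings:
        for search_id in percolate(searches, listing):
            matches.setdefault(search_id, []).append(listing.id)
    return matches


# Illustrative data (hypothetical).
SEARCHES = [
    SavedSearch("s1", suburbs={"Richmond"}, max_price=800_000),
    SavedSearch("s2", min_bedrooms=3),
]
NEW_LISTINGS = [
    Listing("l1", "Richmond", 750_000, 2),
    Listing("l2", "Carlton", 900_000, 3),
    Listing("l3", "Richmond", 850_000, 4),
]

print(batch_match(SEARCHES, NEW_LISTINGS))       # {'s1': ['l1'], 's2': ['l2', 'l3']}
print(percolate_stream(SEARCHES, NEW_LISTINGS))  # {'s1': ['l1'], 's2': ['l2', 'l3']}
```

Both approaches produce the same matches. The difference is when the work happens. The batch version has to wait until the whole day's listings exist, whereas the percolation version can process each listing as soon as it is created. The linear scan over `searches` in `percolate` is only for clarity; Elasticsearch indexes the stored queries themselves, which lets it avoid evaluating every one of them for each document.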
But first, here are some numbers that give an indication of the scale of the problem:
- Our users have millions of saved searches, and it is constantly growing.
- During the peak hours of most weekdays, real estate agencies are creating hundreds of new listings per hour.
- On a typical weekday we generate millions of notifications.
- Notifications are collated at 4pm every day and are all dispatched within two hours.
Implementation options
We considered two main approaches to this problem:
- Execute all the saved searches once per day at 4pm. That is, run each of the 4.6 million saved searches against only the listings created since 4pm the previous day. The main reason we didn't choose this approach is that we anticipate one day offering users the option of being notified within five minutes of a matching listing being created. If that feature proved popular, we could end up having to execute an infeasibly large number of saved searches every five minutes: in the worst case, running all 4.6 million searches 288 times a day would mean more than 1.3 billion search executions per day.