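Middleware and Databases: Elasticsearch Resources

Lessons from a production issue: how can Elasticsearch 8.X deliver more precise search?

Search is only "precise" when it meets the user's own definition of precise.

Elasticsearch data synchronization based on the MySQL binlog in practice

A hands-on account of adapting go-mysql-elasticsearch to our own environment for data synchronization.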

Mafengwo Tech (马蜂窝技术)

Elasticsearch percolation to match new real estate listings against saved searches

The realestate.com.au website and mobile applications are used by 12.7 million people on average each month in their search for property. Users can save their search filters so that they may easily repeat searches later. Once a user has saved a search, they have the option to receive daily notifications of new listings that match their search criteria. Supporting this involves matching the thousands of new listings created every day against millions of saved searches. In this article I explain how we use Elasticsearch’s “percolation” feature to help us do this.
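As a minimal sketch of the percolation idea described above (not realestate.com.au's actual implementation), saved searches can be stored as documents with a `percolator`-typed field, and each new listing can be sent in a `percolate` query to find the saved searches it matches. The field names (`query`, `suburb`, `bedrooms`) and the helper function names are hypothetical; only the `percolator` field type and the `percolate` query come from Elasticsearch itself. The functions only build request bodies and do not contact a cluster.

```python
def saved_search_mapping():
    """Mapping for a hypothetical index that stores saved searches."""
    return {
        "mappings": {
            "properties": {
                # The saved search itself, stored as a query.
                "query": {"type": "percolator"},
                # Fields that saved searches refer to must be mapped as well.
                "suburb": {"type": "keyword"},
                "bedrooms": {"type": "integer"},
            }
        }
    }


def saved_search_doc(suburb, min_bedrooms):
    """A saved search: listings in `suburb` with at least `min_bedrooms`."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"suburb": suburb}},
                    {"range": {"bedrooms": {"gte": min_bedrooms}}},
                ]
            }
        }
    }


def percolate_body(listing):
    """Search body asking which saved searches match one new listing."""
    return {"query": {"percolate": {"field": "query", "document": listing}}}
```

The direction of matching is reversed compared with a normal search: the stored documents are queries, and the incoming listing is the thing being matched against them.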

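Enterprise-grade approaches to path search in Elasticsearch 8.X

What can you do when path search in Elasticsearch 8.X does not work as expected?

Why do document version conflicts occur in Elasticsearch, and how can they be avoided?

A single article that explains Elasticsearch document version conflicts in depth.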
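To make the idea of a version conflict concrete, here is a toy in-memory model of optimistic concurrency control. It is not Elasticsearch code and not taken from the linked article: `TinyStore`, `VersionConflict`, and `decrement_stock` are hypothetical names. Elasticsearch itself checks both `if_seq_no` and `if_primary_term` on a write and answers a stale write with HTTP 409; this sketch models only the sequence number.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale copy of the document."""


class TinyStore:
    """Toy document store that tracks one sequence number per document."""

    def __init__(self):
        self._docs = {}      # doc_id -> (seq_no, source)
        self._next_seq = 0

    def get(self, doc_id):
        seq_no, source = self._docs[doc_id]
        return seq_no, dict(source)  # return a copy, like a fetched document

    def index(self, doc_id, source, if_seq_no=None):
        current = self._docs.get(doc_id)
        # Reject the write if the caller's copy is no longer the latest one.
        if if_seq_no is not None and (current is None or current[0] != if_seq_no):
            raise VersionConflict(f"stale write to {doc_id!r}")
        seq_no = self._next_seq
        self._next_seq += 1
        self._docs[doc_id] = (seq_no, dict(source))
        return seq_no


def decrement_stock(store, doc_id, max_retries=3):
    """Read-modify-write that re-reads the document after a conflict."""
    for _ in range(max_retries):
        seq_no, source = store.get(doc_id)
        source["stock"] -= 1
        try:
            return store.index(doc_id, source, if_seq_no=seq_no)
        except VersionConflict:
            continue  # someone else wrote first; fetch the new copy and retry
    raise VersionConflict(f"gave up on {doc_id!r} after {max_retries} attempts")
```

If two clients both read the document at the same sequence number, the first conditional write succeeds and the second raises `VersionConflict`, so the second client has to re-read before writing again.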

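Elasticsearch 8.X practical search tuning tips 001

What can you do when Elasticsearch search responses are slow?

What is the difference between filter and post_filter in Elasticsearch?

Can you explain the difference between filter and post_filter?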
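The core difference can be illustrated with a toy in-memory model, assuming the usual Elasticsearch semantics: a filter inside the query restricts both the hits and the aggregations, while `post_filter` is applied to the hits only after the aggregations have been computed. The code below is not Elasticsearch code; `toy_search`, `DOCS`, and `is_red` are hypothetical names used for illustration.

```python
from collections import Counter


def toy_search(docs, query_filter=None, post_filter=None, agg_field="color"):
    """Return (hits, aggregation counts) for a list of dict documents."""
    # Step 1: the query-level filter decides which documents match at all.
    matched = [d for d in docs if query_filter is None or query_filter(d)]
    # Step 2: aggregations are computed over everything that matched.
    aggs = dict(Counter(d[agg_field] for d in matched))
    # Step 3: post_filter narrows the returned hits, but not the aggregations.
    hits = [d for d in matched if post_filter is None or post_filter(d)]
    return hits, aggs


DOCS = [
    {"id": 1, "brand": "gucci", "color": "red"},
    {"id": 2, "brand": "gucci", "color": "blue"},
    {"id": 3, "brand": "prada", "color": "red"},
]


def is_red(doc):
    return doc["color"] == "red"
```

Calling `toy_search(DOCS, query_filter=is_red)` and `toy_search(DOCS, post_filter=is_red)` returns the same two hits, but only the second call still counts the blue document in the aggregation, which is the behavior a faceted navigation UI typically needs.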

How many ways of removing duplicate documents from Elasticsearch do you know?

A Python implementation of duplicate document removal for Elasticsearch 8.X.
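One common way to approach deduplication, sketched here under stated assumptions rather than as the linked article's method, is to hash the fields that define "the same document" and keep only the first document seen for each hash. The function name `duplicate_ids` and the `(doc_id, source)` input shape are hypothetical; fetching the documents from the cluster and issuing the deletes are left out.

```python
import hashlib
import json


def duplicate_ids(docs, key_fields):
    """Return the ids of documents whose key fields repeat an earlier document.

    docs: iterable of (doc_id, source_dict) pairs, in the order to be kept.
    key_fields: names of the fields that define a duplicate.
    """
    seen = set()
    to_delete = []
    for doc_id, source in docs:
        key_values = [source.get(field) for field in key_fields]
        # Hash the key values so long field contents stay cheap to compare.
        digest = hashlib.sha256(
            json.dumps(key_values, default=str).encode("utf-8")
        ).hexdigest()
        if digest in seen:
            to_delete.append(doc_id)   # an earlier document has the same key
        else:
            seen.add(digest)
    return to_delete
```

The returned ids could then be passed to whatever delete mechanism is in use; keeping the whole set of hashes in memory is only practical when the number of distinct keys is moderate.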

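Notes on troubleshooting an Elasticsearch problem

Our team built a tool on top of Elasticsearch, called the search platform, that synchronizes data from databases to Elasticsearch in real time. It uses Flink to import the existing data from the database into Elasticsearch, and it subscribes to the tables' binlog so that real-time changes are synchronized to Elasticsearch as well.

The AIoT team maintains a fairly large index on the search platform, with an average write load of 2k to 3k TPS and several hundred QPS of queries. Because this index is important and resource-intensive, it is deployed on dedicated machines by means of Elasticsearch's template feature.

Starting at the end of May, the Flink real-time job that writes to this index occasionally failed and restarted. Investigation showed that write requests to Elasticsearch were timing out. Based on the machines' CPU usage and other metrics at the time, we concluded that the write TPS was higher than Elasticsearch could handle. We therefore expanded the index from 2 machines to 3 and ran a round of write load tests with business data, which showed that the cluster could sustain the business write rate. For a long time after the expansion the index had no further problems, so we considered the issue resolved.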

Hello Tech (哈啰技术)

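Four ways to find the differences between two Elasticsearch indices

Suppose there are two indices, index1 and index2, that share a large amount of identical data. The goal is an operation similar to the Linux diff command: find the data that exists in one index but not in the other.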
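A minimal client-side sketch of such a diff follows. It is not one of the article's four approaches as such, and it assumes that matching documents share the same `_id` in both indices and that both sets of documents fit in memory. The function name `diff_indices` and the `(doc_id, source)` input shape are hypothetical; fetching the documents from index1 and index2 is left out.

```python
def diff_indices(docs_a, docs_b):
    """Compare two collections of (doc_id, source_dict) pairs.

    Returns three sorted lists of ids: present only in A, present only in B,
    and present in both but with different content.
    """
    a = dict(docs_a)
    b = dict(docs_b)
    only_in_a = sorted(a.keys() - b.keys())
    only_in_b = sorted(b.keys() - a.keys())
    changed = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return only_in_a, only_in_b, changed
```

For indices too large to hold in memory, the same comparison would have to be done in sorted batches or pushed into the cluster, which is where server-side approaches become more attractive.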

How can you query by time difference in Elasticsearch?

Can Elasticsearch query on the difference between two fields, something like select * from myindex where endtimes - starttime > 10?
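One possible answer is a `script` query written in Painless. The sketch below only builds the request body and makes several assumptions: both fields are numeric, every document contains both fields, and the threshold is in the same unit as the fields. The function name `time_diff_query` is hypothetical. Date-typed fields would need their values converted to a number inside the script first.

```python
def time_diff_query(start_field, end_field, threshold):
    """Build a search body matching docs where end - start > threshold.

    Assumes both fields are numeric and present in every document.
    """
    source = (
        f"doc['{end_field}'].value - doc['{start_field}'].value "
        "> params.threshold"
    )
    return {
        "query": {
            "bool": {
                "filter": {
                    "script": {
                        "script": {
                            "lang": "painless",
                            "source": source,
                            # Passing the threshold as a parameter keeps the
                            # script text identical across different values.
                            "params": {"threshold": threshold},
                        }
                    }
                }
            }
        }
    }
```

For the SQL-like example above, the call would be `time_diff_query("starttime", "endtimes", 10)`. Script queries are evaluated per document, so for frequently used conditions it is usually cheaper to store the difference in its own field at index time and run a plain `range` query on it.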

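The evolution of Elasticsearch Java clients and a selection guide

This article covers how the Elasticsearch Java clients have evolved across versions, which one to choose, and how to make that choice.

Building an Elasticsearch Querystring parser with Goyacc: domain-specific language parsing in practice

Domain-specific languages (DSLs), such as SQL and Elasticsearch Querystring, are usually designed for a specialized purpose. For a specific task, a DSL trades expressive power for efficiency within its domain.

While developing the on-premises version of the Feishu suite's logging system, we wanted to match the way developers are used to querying logs, so we tried using Elasticsearch Querystring (hereafter Querystring) as the query condition language for filters. This required a usable Querystring parser in Golang. Since no complete implementation could be found in the open-source community, we tried building one ourselves with Goyacc.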

ByteDance Tech (字节跳动技术)

How Netflix Content Engineering makes a federated graph searchable (Part 2)

In a previous post, we described the indexing architecture of Studio Search and how we scaled the architecture by building a config-driven self-service platform that allowed teams in Content Engineering to spin up search indices easily.

This post will discuss how Studio Search supports querying the data available in these indices.

Netflix Tech

Does Elasticsearch offer a more lightweight way to change a field type than reindex?

A combined approach: preprocessing with the convert ingest processor, plus reindex.
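The two request bodies involved in such a combination might look like the sketch below. It is not taken from the linked article. The helper names and the example field, index, and pipeline names are hypothetical, while the `convert` processor and the `pipeline` option of the reindex destination are standard Elasticsearch features. The functions only build the bodies and do not contact a cluster.

```python
def convert_pipeline(field, target_type):
    """Body for an ingest pipeline that converts one field's type.

    target_type is a type accepted by the convert processor,
    for example "integer", "long", "float", "string", or "boolean".
    """
    return {
        "description": f"Convert {field} to {target_type}",
        "processors": [
            {"convert": {"field": field, "type": target_type}}
        ],
    }


def reindex_body(source_index, dest_index, pipeline_id):
    """Body for a reindex request that runs documents through the pipeline."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index, "pipeline": pipeline_id},
    }
```

In this sketch the destination index is assumed to have been created beforehand with the new field type in its mapping, so that the converted values are indexed with the intended type rather than a dynamically guessed one.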

Modernizing Nextdoor Search Stack — Part 2

In our last blog post of the Modernizing Nextdoor Search Stack series, we explained the Query Understanding and the ML models that power our Query Understanding Engine. We also covered the nuances of the Search at Nextdoor and what it takes to understand the customer intent. This time, we will be focusing on the retrieval of the search results and ranking.