Search indexing optimisation

Modern applications commonly use several database engines, each serving a specific need. At Grab Deliveries, a MySQL database (DB) stores the canonical form of the data, and Elasticsearch provides advanced search capabilities. MySQL is the primary storage for raw data, and Elasticsearch is the derived storage.

Search data flow

Efforts have been made to keep data synchronised between MySQL and Elasticsearch. This post introduces a series of techniques for optimising incremental search data indexing.

Background

The synchronisation of data from the primary data storage to the derived data storage is handled by Food-Puxian, a Data Synchronisation Platform (DSP). In a search service context, it is the synchronisation of data between MySQL and Elasticsearch.

The data synchronisation process is triggered on every real-time data update to MySQL, and the updated data is streamed to Kafka. DSP consumes the list of Kafka streams and incrementally updates the respective search indexes in Elasticsearch. This process is also known as Incremental Sync.
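
The Incremental Sync flow described above can be sketched in a few lines of Python. This is a minimal illustration, not DSP's implementation: the `ChangeEvent` shape, the `incremental_sync` function and the sample records are all hypothetical, a plain list stands in for a Kafka stream, and a dictionary stands in for an Elasticsearch index.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass
class ChangeEvent:
    """Hypothetical shape of one MySQL row change carried on a stream."""
    op: str                                   # "insert", "update" or "delete"
    doc_id: str                               # primary key of the changed row
    fields: Optional[Dict[str, Any]] = None   # changed columns; None for a delete


def incremental_sync(events: Iterable[ChangeEvent],
                     index: Dict[str, Dict[str, Any]]) -> None:
    """Apply each change event, in order, to an in-memory stand-in for a search index."""
    for event in events:
        if event.op == "delete":
            # Remove the document if it exists; ignore it otherwise.
            index.pop(event.doc_id, None)
        elif event.op in ("insert", "update"):
            # Merge only the changed fields into the existing document.
            index.setdefault(event.doc_id, {}).update(event.fields or {})
        else:
            raise ValueError(f"unknown operation: {event.op}")


# A list stands in for the stream; a dict stands in for the index.
index: Dict[str, Dict[str, Any]] = {}
incremental_sync(
    [
        ChangeEvent("insert", "42", {"name": "Nasi Lemak", "price": 5}),
        ChangeEvent("update", "42", {"price": 6}),
        ChangeEvent("insert", "43", {"name": "Laksa", "price": 7}),
        ChangeEvent("delete", "43"),
    ],
    index,
)
print(index)
```

Only the documents touched by a change are modified, which is what distinguishes an incremental update from rebuilding the whole index.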

Kafka to DSP

DSP uses Kafka streams to implement Incremental Sync. A stream represents an unbounded, continuously updating data set, which is ordered, replayable and fault-tolerant.
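
To make "ordered" and "replayable" concrete, here is a toy append-only log in Python. It is an illustrative sketch only and does not use Kafka: the `Stream` class and its `append` and `read_from` methods are hypothetical names, and fault tolerance is not modelled.

```python
from typing import Any, Iterator, List, Tuple


class Stream:
    """Toy append-only log: records keep the order in which they were appended."""

    def __init__(self) -> None:
        self._records: List[Any] = []

    def append(self, record: Any) -> int:
        """Add a record at the end and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int) -> Iterator[Tuple[int, Any]]:
        """Yield (offset, record) pairs starting at `offset`; reading removes nothing."""
        for i in range(offset, len(self._records)):
            yield i, self._records[i]


stream = Stream()
for change in ["create item 1", "update item 1", "delete item 1"]:
    stream.append(change)

first_pass = list(stream.read_from(0))
# A consumer that saved offset 1 before stopping can resume from there.
replayed = list(stream.read_from(1))
```

Because reading never removes records, a consumer that remembers its last processed offset can resume or reprocess from that point, and it always sees changes in the order they were written.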

Data synchronisation process using Kafka

The above diagram depicts the process of data synchronisation using Kafka. The Data Producer creates a Kafka stream for every operation done on MySQL and sends it to Kafka in real-time. DSP creates a stream consumer for each Kafka stream and the...
