Failing to Auto-Scale Elasticsearch in Kubernetes
Introduction
At Lounge by Zalando, we run an Elasticsearch cluster in Kubernetes to store user-facing article descriptions. Our business model means that we receive about three times the normal load during the busy hour in the morning, so we use schedules to scale applications in and out automatically to handle that peak. If scaling out in the morning fails, we face a potential catastrophe. This is the story of one such case.
First anomaly
Early on a Tuesday morning, our on-call engineer received an alert about too few running Elasticsearch nodes. We started executing the playbook for such a case, but before we had time to go through all the steps, the missing nodes popped up and the alert closed on its own. Catastrophe was avoided for now, but after a cup of coffee came the root cause analysis.
Investigating the logs, we found that the cluster had failed to fully scale down for the night. It was configured to run 6 nodes during the night, but it got stuck running 7.
To understand why that happened and why it is interesting, a little context is required. We run Elasticsearch in Kubernetes using es-operator. Es-operator defines a Kubernetes custom resource, ElasticsearchDataSet (EDS), that describes the Elasticsearch cluster. The operator watches for changes to that resource and maintains a StatefulSet whose pods and volumes implement the Elasticsearch nodes. We have configured our cluster so that its pods are spread across all AWS availability zones, and Elasticsearch is configured to spread the shards across the zones.
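To make the zone layout concrete, here is a minimal sketch of how 6 and 7 nodes might land across three availability zones. It is a toy model, not how es-operator or the Kubernetes scheduler actually places pods: the zone names, the `es-data-N` pod names, and the round-robin assignment are all assumptions made for illustration. The settings dictionary shows the Elasticsearch shard allocation awareness setting that is commonly used to spread shards across zones; the article does not say which mechanism was used, and the node attribute name `zone` is likewise an assumption.

```python
from collections import Counter
from itertools import cycle

# Hypothetical zone names; the article only says "all AWS availability zones".
ZONES = ["eu-central-1a", "eu-central-1b", "eu-central-1c"]


def spread_pods(replicas, zones=ZONES):
    """Assign pod ordinals 0..replicas-1 to zones in round-robin order.

    Toy model only: real placement is decided by the Kubernetes scheduler
    and by where each pod's volume lives.
    """
    zone_cycle = cycle(zones)
    return {f"es-data-{i}": next(zone_cycle) for i in range(replicas)}


def pods_per_zone(placement):
    """Count how many pods ended up in each zone."""
    return Counter(placement.values())


# Request body for the cluster settings API (PUT _cluster/settings) that
# turns on shard allocation awareness. The attribute name "zone" is an
# assumption; it must match a custom node attribute set on every node.
AWARENESS_SETTINGS = {
    "persistent": {
        "cluster.routing.allocation.awareness.attributes": "zone"
    }
}

if __name__ == "__main__":
    for replicas in (6, 7):
        print(replicas, dict(pods_per_zone(spread_pods(replicas))))
```

With three zones, 6 nodes divide evenly while 7 nodes leave one zone holding an extra node, so which zone that last node sits in matters when the cluster scales down.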