Operating Elasticsearch in Kubernetes
        
                1. Operating Elasticsearch in Kubernetes
MIKKEL LARSEN
@mikkeloscar
OLIVER TROSIEN
@otrosien
2019-04-02            
                        
                2. WHO ARE WE?
Oliver
Core Search Services
Elasticsearch
Mikkel
Cloud Infrastructure
Kubernetes            
                        
                3. “EUROPE’S LEADING ONLINE FASHION PLATFORM”
                        
                4. WE BRING FASHION TO PEOPLE IN 17 COUNTRIES
17 markets
7 fulfillment centers
26 million active customers
5.4 billion € revenue 2018
250 million visits per month
15,000 employees in Europe
                        
                5. SEARCH @ ZALANDO
● 300k+ products per country
● 45% mobile traffic
● ~2000 brands
● 12K QPS
● ~700 categories
● 8K updates/s
                        
                6. DATA INGESTION
Data sources feeding Elasticsearch via Kafka / Nakadi *:
● Stock availability
● Fashion Tags
● Scoring
● Article Content
● Pricing & Campaigns
● Image Tags
● Color Enrichment
* nakadi.io
                        
                7.             
                        
                8. KUBERNETES @ ZALANDO
● Default deployment target
● 114 clusters
● ~1400 nodes
● Node autoscaling
● Since Oct 2016
● From v1.4 to v1.12
                        
                9. Elasticsearch
2,500 vCPUs
1 TB RAM
Elasticsearch in Kubernetes            
                        
                10. RUNNING ELASTICSEARCH IN KUBERNETES
1. Safe automatic updates
(Including Kubernetes cluster updates)
2. Advanced auto-scaling for cost efficiency
                        
                11. UPDATING ELASTICSEARCH (STATEFULSET)
1) PreStop Hook (bash script)
● Exclude node in ES
● Wait for node to drain (up to 1h)
● Data is moved to existing nodes
2) PostStart Hook (bash script)
● Remove all excludes
● Let ES rebalance from existing nodes
[Diagram: two ES Pods on two Nodes; first both Pods are ready, then one Pod is terminating while it is draining and the other stays ready]
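The hook steps above can be sketched in a few lines. The deck's hooks are bash scripts that are not shown, so this is only a Python illustration of the same idea, under stated assumptions: it uses the Elasticsearch allocation-filtering setting `cluster.routing.allocation.exclude._ip` and the row shape returned by `GET _cat/shards?format=json`. The function names and the IP addresses are hypothetical, and no HTTP calls are made.

```python
# Real Elasticsearch cluster setting for allocation filtering by IP.
EXCLUDE_KEY = "cluster.routing.allocation.exclude._ip"


def exclude_body(current_excludes, node_ip):
    """Body for PUT /_cluster/settings that adds node_ip to the excluded IPs.

    current_excludes is the existing comma-separated value ("" if unset).
    """
    ips = [ip for ip in current_excludes.split(",") if ip]
    if node_ip not in ips:
        ips.append(node_ip)
    return {"transient": {EXCLUDE_KEY: ",".join(ips)}}


def clear_excludes_body():
    """Body that removes all excludes (a null value resets the setting)."""
    return {"transient": {EXCLUDE_KEY: None}}


def is_drained(cat_shards_rows, node_ip):
    """True when no row from _cat/shards still places a shard on node_ip."""
    return all(row.get("ip") != node_ip for row in cat_shards_rows)
```

A PreStop hook along these lines would send `exclude_body(...)`, poll until `is_drained(...)` is true or the one-hour limit is reached, and the PostStart hook would send `clear_excludes_body()`.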
                        
                12. ELASTICSEARCH DATA SETS
apiVersion: zalando.org/v1
kind: ElasticsearchDataSet
metadata:
  name: test-cluster
spec:
  scaling:
    {...}
  replicas: 3
  template: # PodTemplate
    {...}
  volumeClaimTemplates:
    {...}
github.com/zalando-incubator/es-operator            
                        
                13. ELASTICSEARCH DATA SETS
[Diagram: the ES Operator next to an ES Cluster consisting of 3 ES Master pods and 9 ES Data pods]
                        
                14. UPDATING ELASTICSEARCH (OPERATOR)
1) Scale out by 1
2) Drain node
3) Delete Pod
[Diagram: the ES Operator acting on ES Pods behind the ES Service; the Pods are ready on their Nodes and one is marked draining]
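One way to see why the operator scales out before draining is to track the number of ready Pods through the three steps. The sketch below is a toy model, not the operator's code: the function name and the Pod names are hypothetical, and it assumes the Pod being drained stays ready until it is deleted.

```python
def replace_pod(ready_pods, old_pod, new_pod):
    """Yield (step, ready Pods) while replacing old_pod with new_pod."""
    pods = list(ready_pods)

    # 1) Scale out by 1: the replacement joins before anything is removed.
    pods.append(new_pod)
    yield "scale out by 1", list(pods)

    # 2) Drain: old_pod keeps running while its shards move away
    #    (assumption of this sketch), so the ready set is unchanged.
    yield "drain " + old_pod, list(pods)

    # 3) Delete the drained Pod.
    pods.remove(old_pod)
    yield "delete " + old_pod, list(pods)


if __name__ == "__main__":
    for step, pods in replace_pod(["es-data-0", "es-data-1"], "es-data-0", "es-data-2"):
        print(step, pods)
```

In this model the count of ready Pods never drops below the original replica count, which is the property the StatefulSet hooks could not offer, since they moved data onto the existing nodes only.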
                        
                15. SCALING UP ELASTICSEARCH (1)
METRICS
Thresholds
● CPU
● Duration
● Cooldown
Boundaries
● Max # Pod replicas
Increase pod replicas
[Diagram: shards spread over more Pods as Pod replicas increase: 1 Pod with 6 shards; 2 Pods with 3 shards each; 3 Pods with 2 shards each; 6 Pods with 2, 2, 2, 2, 2 and 1 shards]
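The thresholds and the boundary listed on this slide combine into a single yes/no decision. The following is a minimal sketch of such a check, assuming CPU is sampled as (timestamp, utilisation) pairs; the class, field, and function names are hypothetical and are not taken from the es-operator.

```python
from dataclasses import dataclass


@dataclass
class ScaleUpPolicy:
    cpu_threshold: float  # e.g. 0.7 means 70% CPU
    duration_s: int       # CPU must stay above the threshold this long
    cooldown_s: int       # minimum time between two scaling actions
    max_replicas: int     # boundary: never exceed this many Pods


def should_scale_up(policy, cpu_samples, replicas, now_s, last_scaled_s):
    """cpu_samples is a list of (timestamp_s, cpu) pairs, oldest first."""
    if replicas >= policy.max_replicas:
        return False  # boundary reached
    if now_s - last_scaled_s < policy.cooldown_s:
        return False  # still cooling down from the previous action
    window = [cpu for ts, cpu in cpu_samples if now_s - ts <= policy.duration_s]
    # The samples must span the whole duration, not just the last moment.
    covers = bool(cpu_samples) and now_s - cpu_samples[0][0] >= policy.duration_s
    return covers and bool(window) and all(cpu > policy.cpu_threshold for cpu in window)
```

Requiring every sample in the window to exceed the threshold is one of several reasonable readings of "Duration"; an average over the window would be another.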
                        
                16. SCALING DOWN ELASTICSEARCH
METRICS
Thresholds
● CPU
● Duration
● Cooldown
Boundaries
● Min # Replica
● Max # Shards per node
● Max disk usage (%)
Decrease Pod replicas
DON’T OPERATE WHEN CLUSTER IS NOT GREEN!
[Diagram: shards packed onto fewer Pods as Pod replicas decrease: 6 Pods with 2, 2, 2, 2, 2 and 1 shards; 3 Pods with 2 shards each; 2 Pods with 3 shards each; 1 Pod with 6 shards]
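The scale-down boundaries and the "cluster must be green" rule can be read as a chain of guards. This is a sketch under assumptions, not the operator's implementation: the function and parameter names are hypothetical, one Pod is removed per step, shards are assumed to spread evenly, and the disk check compares current usage against the limit.

```python
import math


def can_scale_down(health, replicas, total_shards, disk_usage_pct,
                   min_replicas, max_shards_per_node, max_disk_usage_pct):
    """Return True if removing one Pod respects every boundary."""
    if health != "green":
        return False  # never operate on a cluster that is not green
    target = replicas - 1
    if target < max(min_replicas, 1):
        return False  # boundary: minimum number of replicas
    # Assumption: shards are distributed evenly across the remaining Pods.
    if math.ceil(total_shards / target) > max_shards_per_node:
        return False  # boundary: maximum shards per node
    if disk_usage_pct > max_disk_usage_pct:
        return False  # boundary: maximum disk usage (%)
    return True
```

For the 6-shard example in the diagram, going from 3 Pods to 2 means 3 shards per Pod, so the step is allowed only when the shards-per-node limit is at least 3.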
                        
                17. SCALING UP ELASTICSEARCH (2)
METRICS
Thresholds
● CPU
● Duration
● Cooldown
Boundaries
● Min # Shards per node
● Max # Pod replicas
Increase index replicas
[Diagram: ES Pods on Nodes holding 1, 2 or 3 shards each, illustrating an increase in index replicas as Pods are added]
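The interplay between the two boundaries on this slide can be shown with a little arithmetic: once one more Pod would hold fewer shards than the minimum, adding Pods alone no longer helps, so the number of shard copies has to grow. The sketch below is a hedged illustration; the function name, its parameters, and the rule of keeping the shards-per-Pod ratio roughly constant are assumptions, not confirmed details of the es-operator.

```python
import math


def plan_scale_up(primaries, index_replicas, pods, min_shards_per_node, max_pods):
    """Return (index_replicas, pods) after one scale-up step, or None at the Pod limit."""
    if pods >= max_pods:
        return None  # boundary: maximum Pod replicas
    total = primaries * (1 + index_replicas)  # primaries plus all replica copies
    if total / (pods + 1) >= min_shards_per_node:
        # Enough shards to keep one more Pod busy: just add a Pod.
        return index_replicas, pods + 1
    # One more Pod would hold too few shards, so add a copy of every shard
    # and grow the Pod count to keep the shards-per-Pod ratio roughly constant.
    new_total = primaries * (2 + index_replicas)
    wanted = math.ceil(new_total * pods / total)
    return index_replicas + 1, min(max_pods, max(wanted, pods + 1))
```

For example, with 2 primaries and 2 index replicas (6 shards) on 3 Pods and a minimum of 2 shards per node, a fourth Pod would leave 1.5 shards per Pod, so the sketch raises index replicas to 3 (8 shards) and moves to 4 Pods.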
                        
                18. DEMO
                        
                19. OPEN SOURCE
ES Operator
github.com/zalando-incubator/es-operator
microXchg demo
github.com/otrosien/microxchg19-demo
Nakadi
nakadi.io
Kubernetes on AWS
github.com/zalando-incubator/kubernetes-on-aws
                        
                20. MIKKEL LARSEN
mikkel.larsen@zalando.de
@mikkeloscar
OLIVER TROSIEN
oliver.trosien@zalando.de
@otrosien
2019-04-02