How Zalando runs Kubernetes clusters at scale on AWS
2. OPN211
How Zalando runs Kubernetes clusters at scale on AWS
Henning Jacobs
Senior Principal
Zalando SE
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
3. THE EUROPEAN ONLINE PLATFORM FOR FASHION
4. ZALANDO AT A GLANCE (as of June 2019)
• ~ 5.4 billion EUR revenue 2018
• > 300 million visits per month
• > 80% of visits via mobile devices
• > 28 million active customers
• ~ 14,000 employees in Europe
• > 400,000 product choices
• > 2,000 brands
• 17 countries
5. 2015: JOURNEY INTO THE CLOUD
• STUPS on AWS: Docker deploy, SSH access, audit reports
• Full AWS access: teams have admin access & full responsibility
6. 2015: ISOLATED AWS ACCOUNTS
• Team ABC: Internet → *.abc.example.org → ELB → EC2
• Team XYZ: Internet → *.xyz.example.org → ELB → EC2
7. INFRASTRUCTURE @ ZALANDO
STUPS (toolset around AWS) | Kubernetes
AWS accounts per team | Clusters per product (multiple teams)
All instances must run the same AMI | Instances are not managed by teams
PowerUser access to Production | Hands off approach
You build it, you run EVERYTHING | A lot of stuff out of the box
8. 2019: SCALE
• 396 Accounts
• 140 Clusters
9. 2019: DEVELOPERS USING KUBERNETES
10. > 200
development teams
> 1100
developers
Platform
11. YOU BUILD IT, YOU RUN IT
The traditional model is that you take your software to the
wall that separates development and operations, and
throw it over and then forget about it. Not at Amazon.
You build it, you run it. This brings developers into
contact with the day-to-day operation of their software. It
also brings them into day-to-day contact with the
customer.
- A Conversation with Werner Vogels, ACM Queue, 2006
12. ON-CALL: YOU OWN IT, YOU RUN IT
When things are broken,
we want people with the best
context trying to fix things.
- Blake Scrivener, Netflix SRE Manager
13. GOALS
• No manual operations
• No pet clusters
• Reliability
• Autoscaling
• Latest Kubernetes
• Cost efficient
14. ARCHITECTURE
Pairs of clusters, each cluster in isolated account
• AWS Acc. foobar-test: Cluster foobar-test
• AWS Acc. foobar: Cluster foobar
15. ARCHITECTURE
CloudFormation stacks, node pools w/ self-baked Ubuntu AMI
• Master Nodes
• etcd
• Worker Nodes
16. ARCHITECTURE
https://cluster-id.example.org → AWS ELB → Master Nodes
Master Nodes and Worker Nodes run in AZ a, AZ b, and AZ c
17. CLUSTER METADATA (CLUSTER-REGISTRY)
clusters:
- id: "cluster-id"
  api_server_url: "https://cluster-id.example.org"
  config_items:
    Key: "value"
  environment: "test"
  region: "eu-central-1"
  lifecycle_status: "ready"
  node_pools:
  - name: "worker-pool"
    instance_type: "m5.large"
    min_size: 3
    max_size: 20
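A minimal Python sketch of how a consumer might sanity-check such a registry entry. The field names follow the example above; the helpers `validate_node_pool` and `ready_cluster_ids` are hypothetical and not part of the Cluster Registry API.

```python
from typing import Dict, List


def validate_node_pool(pool: Dict) -> List[str]:
    """Return human-readable problems found in one node pool entry."""
    problems = []
    if pool["min_size"] < 0:
        problems.append(f"{pool['name']}: min_size must not be negative")
    if pool["max_size"] < pool["min_size"]:
        problems.append(f"{pool['name']}: max_size is smaller than min_size")
    return problems


def ready_cluster_ids(registry: Dict, environment: str) -> List[str]:
    """List IDs of clusters in the given environment with lifecycle_status 'ready'."""
    return [
        cluster["id"]
        for cluster in registry["clusters"]
        if cluster["environment"] == environment
        and cluster["lifecycle_status"] == "ready"
    ]


# The registry entry from the slide, written as a Python dict.
REGISTRY = {
    "clusters": [
        {
            "id": "cluster-id",
            "api_server_url": "https://cluster-id.example.org",
            "environment": "test",
            "region": "eu-central-1",
            "lifecycle_status": "ready",
            "node_pools": [
                {
                    "name": "worker-pool",
                    "instance_type": "m5.large",
                    "min_size": 3,
                    "max_size": 20,
                }
            ],
        }
    ]
}
```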
18. CLUSTER CONFIGURATION
cluster
├── cluster.yaml        # Kubernetes cluster stack
├── etcd-cluster.yaml   # etcd cluster stack
├── manifests
│   ├── ...
└── node-pools          # master/worker nodes
    ├── ...
github.com/zalando-incubator/kubernetes-on-aws
19. KUBERNETES CLUSTER MANIFESTS
github.com/zalando-incubator/kubernetes-on-aws
20. CLUSTER LIFECYCLE MANAGER (CLM)
github.com/zalando-incubator/cluster-lifecycle-manager
21. CLUSTER UPGRADE FLOW
22. CLUSTER CHANNELS
Channel | Description | Clusters
dev | Development and playground clusters | 3
alpha | Main infrastructure cluster (important to us) | 1
beta | Non-prod clusters for the rest of the org | 65+
stable | Production clusters. | 65+
github.com/zalando-incubator/kubernetes-on-aws
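A small Python sketch of channel promotion, assuming changes move through the channels in the order dev → alpha → beta → stable. Only the dev → alpha step is stated explicitly in this deck; the remaining order and the helper `next_channel` are assumptions for illustration.

```python
from typing import Optional

# Assumed promotion order, from least to most critical clusters.
CHANNEL_ORDER = ["dev", "alpha", "beta", "stable"]


def next_channel(channel: str) -> Optional[str]:
    """Return the channel a change is promoted to next, or None after stable."""
    index = CHANNEL_ORDER.index(channel)  # raises ValueError for unknown channels
    if index + 1 < len(CHANNEL_ORDER):
        return CHANNEL_ORDER[index + 1]
    return None
```

With this ordering a change reaches the 65+ production clusters only after it has run on every earlier channel.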
23. E2E TESTS ON EVERY PR
github.com/zalando-incubator/kubernetes-on-aws
24. E2E TESTS
✓ Conformance Tests (159): Upstream Kubernetes e2e conformance tests
✓ StatefulSet Tests (2): Rolling update of stateful sets including volume mounting
✓ Zalando Tests (custom) (17): Custom tests for ingress, external-dns, PSP etc.
25. RUNNING E2E TESTS
Testing dev to alpha upgrade
• branch: alpha (base)
• branch: dev (head)
Flow: Create Cluster → Update Cluster → Run e2e tests → Delete Cluster (each stage shows the control plane and nodes)
26. UPGRADING NODES
27. NAÏVE NODE UPGRADE STRATEGY
Auto Scaling Group
Min: 3
Max: 9
Current: 5
Desired: 5
28. NAÏVE NODE UPGRADE STRATEGY
Auto Scaling Group
Min: 6
Max: 6
Current: 5
Desired: 6
Set ASG size to current + 1
29. NAÏVE NODE UPGRADE STRATEGY
Auto Scaling Group
Min: 6
Max: 6
Current: 6
Desired: 6
Get a new instance
Diagram label: "drain" (on nodes being drained)
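The naïve strategy can be sketched in Python as an in-memory simulation. It makes no AWS calls; the `AutoScalingGroup` class and `replace_one_node` helper are hypothetical, and removing the old instance and restoring the original bounds afterwards are assumptions the slides do not spell out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AutoScalingGroup:
    min_size: int
    max_size: int
    desired: int
    instances: List[str] = field(default_factory=list)


def replace_one_node(asg: AutoScalingGroup, old: str, new: str) -> List[str]:
    """Replace one old instance with a new one and return the steps taken."""
    steps = []
    original_bounds = (asg.min_size, asg.max_size, asg.desired)

    # Step 1: set ASG size to current + 1 by pinning min, max and desired.
    target = len(asg.instances) + 1
    asg.min_size = asg.max_size = asg.desired = target
    steps.append(f"resize to {target}")

    # Step 2: get a new instance.
    asg.instances.append(new)
    steps.append(f"launch {new}")

    # Step 3: drain the old node, then remove it (removal is an assumption).
    asg.instances.remove(old)
    steps.append(f"drain {old}")

    # Restore the original bounds once the node is replaced (assumption).
    asg.min_size, asg.max_size, asg.desired = original_bounds
    return steps
```

Nothing in this loop looks at what runs on the drained node, which is the weakness the next slides address.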
30. PROBLEMS WITH THE NAÏVE STRATEGY
What about stateful applications like Postgres?
Diagram: three Nodes hosting one Postgres master and two replicas, with a "drain" in progress
Postgres cluster unavailable :(
31. STATEFUL WORKLOADS (POSTGRES)
32. POSTGRES OPERATOR
Diagram labels: drain (node), evict (pod), promote (postgres operator)
pg pods carry role=master or role=replica; ✓ and ✘ mark the eviction outcomes
github.com/zalando-incubator/postgres-operator
33. POSTGRES OPERATOR
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: "postgres-cluster"
spec:
  minAvailable: 1
  selector:
    matchLabels:
      application: "postgres-cluster"
      role: "master"
github.com/zalando-incubator/postgres-operator
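A minimal Python sketch of why this budget protects the master: with `minAvailable: 1` and exactly one healthy pod matching the selector, no voluntary disruption is allowed. The helper names are hypothetical; the arithmetic (healthy pods minus `minAvailable`) mirrors how Kubernetes computes allowed disruptions for a `minAvailable` budget.

```python
from typing import Dict


def matches(labels: Dict[str, str], selector: Dict[str, str]) -> bool:
    """True if the pod labels satisfy every matchLabels entry."""
    return all(labels.get(key) == value for key, value in selector.items())


def disruptions_allowed(healthy_pods: int, min_available: int) -> int:
    """Number of matching pods that may be evicted right now."""
    return max(0, healthy_pods - min_available)


def eviction_allowed(pod_labels: Dict[str, str], selector: Dict[str, str],
                     healthy_matching_pods: int, min_available: int) -> bool:
    """Pods outside the selector are not covered by the budget."""
    if not matches(pod_labels, selector):
        return True
    return disruptions_allowed(healthy_matching_pods, min_available) > 0


SELECTOR = {"application": "postgres-cluster", "role": "master"}
```

A replica pod does not match `role: "master"`, so this budget leaves it evictable, while the single master stays put until another pod has taken over the master role.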
34. ROLLING UPGRADE OF NODES
Diagram: Node Pool across az-1a, az-1b, az-1c, each AZ with its own PVs
Nodes are marked PreferNoSchedule; one node is marked "drain"
35. POSTGRES OPERATOR
Application to manage PostgreSQL clusters on Kubernetes
>500 clusters running on Kubernetes
github.com/zalando/postgres-operator
36. Elasticsearch
Elasticsearch in Kubernetes
• 2,500 vCPUs
• 1 TB RAM
github.com/zalando-incubator/es-operator/
37. SLAS FOR CLUSTER UPDATES
• Respect PodDisruptionBudgets
• Force-terminate Pods after 3 days (or 8h on test)
• Cluster updates can be blocked anytime!
zkubectl cluster-update block [+ REASON]
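A minimal Python sketch of the force-termination deadline, assuming the timer starts when eviction of the pod is first attempted. The slides do not say when it starts; the environment keys and helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Grace periods from the slide: 3 days in production, 8 hours on test clusters.
FORCE_TERMINATE_AFTER = {
    "production": timedelta(days=3),
    "test": timedelta(hours=8),
}


def force_terminate_deadline(eviction_started: datetime, environment: str) -> datetime:
    """Point in time after which a pod blocking the update is force-terminated."""
    return eviction_started + FORCE_TERMINATE_AFTER[environment]


def should_force_terminate(now: datetime, eviction_started: datetime,
                           environment: str) -> bool:
    return now >= force_terminate_deadline(eviction_started, environment)
```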
38. DEPLOY & USER INTERFACE
39. APP DEPLOYMENT CONFIGURATION
├── deploy/apply
│   ├── deployment.yaml
│   ├── credentials.yaml  # Zalando IAM
│   ├── ingress.yaml
│   └── service.yaml
└── delivery.yaml         # Zalando CI/CD
40. APP INGRESS.YAML
kind: Ingress
metadata:
  name: "..."
spec:
  rules:
  # DNS name your application should be exposed on
  - host: "myapp.foo.example.org"
    http:
      paths:
      - backend:
          serviceName: "myapp"
          servicePort: 80
41. CONTINUOUS DELIVERY PLATFORM
42. CDP: DEPLOY
"glorified kubectl apply"
43. EMERGENCY ACCESS SERVICE
Emergency access by referencing Incident
zkubectl cluster-access request \
--emergency -i INC REASON
Privileged production access via 4-eyes
zkubectl cluster-access request REASON
zkubectl cluster-access approve USERNAME
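The two access paths can be sketched in Python as follows. This illustrates the 4-eyes rule only and is not the Emergency Access Service itself; the `AccessRequest` class and both helpers are hypothetical, and treating an incident reference as sufficient without a second approver is an assumption drawn from the slide wording.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessRequest:
    requester: str
    reason: str
    incident: Optional[str] = None      # set for emergency access
    approved_by: Optional[str] = None   # set once a colleague approves


def approve(request: AccessRequest, approver: str) -> None:
    """Record an approval; the requester may not approve their own request."""
    if approver == request.requester:
        raise ValueError("4-eyes principle: requester cannot approve own request")
    request.approved_by = approver


def is_granted(request: AccessRequest) -> bool:
    """Emergency requests need an incident; all others need a second person."""
    if request.incident is not None:
        return True
    return request.approved_by is not None
```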
44. KUBERNETES WEB VIEW
kubectl get pods,stacks,deploys,..
45. SEARCHING ACROSS 140+ CLUSTERS
codeberg.org/hjacobs/kube-web-view
46. codeberg.org/hjacobs/kube-web-view
47. UPGRADE TO KUBERNETES 1.14
"Found 1223 rows for 1 resource type in 148 clusters in 3.301 seconds."
48. SOME USE CASES
All Pending Pods across all clusters
49. AVOIDING CONFIGURATION DRIFT
50. CLUSTER CONFIGURATION
Clusters look mostly the same, except:
• secrets, e.g. credentials for external logging provider
• node pools and their instance sizes
Cluster-specific config items are stored in Cluster Registry
51. CLUSTER AUTOSCALER
52. VERTICAL POD AUTOSCALER
• External DNS
• Heapster / Metrics Server
• our ALB Ingress Controller
• Prometheus
53. VERTICAL POD AUTOSCALER
54. MONITORING & COST EFFICIENCY
55. MONITORING SYSTEM - ZMON
• Dynamic entity registration (clusters, pods, ..)
• Generic checks on entity attributes, e.g. for all production clusters: "Less than 60% of worker nodes are ready"
• OpsGenie alerts
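The example check can be written as a plain Python predicate. This is a sketch and not ZMON's check or alert syntax; `worker_nodes_alert` is a hypothetical name, and alerting when a cluster reports no worker nodes at all is an assumption.

```python
def worker_nodes_alert(ready_nodes: int, total_nodes: int,
                       threshold: float = 0.6) -> bool:
    """Return True when less than `threshold` of the worker nodes are ready."""
    if total_nodes == 0:
        # Assumption: a cluster without any worker nodes should also alert.
        return True
    return ready_nodes / total_nodes < threshold
```

Because the check is generic over entity attributes, the same predicate can run against every production cluster entity rather than being configured per cluster.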
56. OPENTRACING
57. KUBERNETES RESOURCE REPORT
github.com/hjacobs/kube-resource-report
58. RESOURCE REPORT: TEAMS
Sorting teams by Slack Costs
github.com/hjacobs/kube-resource-report
59. KUBERNETES APPLICATION DASHBOARD
60. VERTICAL POD AUTOSCALER
limit/requests adapted by VPA
61. DOWNSCALING DURING OFF-HOURS
Weekend
github.com/hjacobs/kube-downscaler
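A minimal Python sketch of the off-hours idea, not kube-downscaler's actual implementation or configuration format. The working hours (08:00 to 20:00, Monday to Friday), scaling to zero replicas, and both helper names are assumptions for illustration.

```python
from datetime import datetime


def is_off_hours(now: datetime, workday_start_hour: int = 8,
                 workday_end_hour: int = 20) -> bool:
    """Weekends and hours outside the working day count as off-hours."""
    if now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    return not (workday_start_hour <= now.hour < workday_end_hour)


def desired_replicas(original_replicas: int, now: datetime) -> int:
    """Scale to zero during off-hours, otherwise keep the original count."""
    return 0 if is_off_hours(now) else original_replicas
```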
62. KUBERNETES JANITOR
● TTL and expiry date annotations, e.g.
○ set time-to-live for your test deployment
● Custom rules, e.g.
○ delete everything without "app" label after 7 days
github.com/hjacobs/kube-janitor
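A small Python sketch of TTL handling. It assumes TTL values are written as a number plus a unit suffix such as `24h` or `7d`; the helper names are hypothetical and the parsing is a simplification rather than kube-janitor's own code.

```python
import re
from datetime import datetime, timedelta

# Assumed unit suffixes mapped to timedelta keyword arguments.
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_ttl(ttl: str) -> timedelta:
    """Parse values like '30m', '24h' or '7d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhdw])", ttl)
    if match is None:
        raise ValueError(f"unsupported TTL value: {ttl!r}")
    value, unit = match.groups()
    return timedelta(**{UNITS[unit]: int(value)})


def is_expired(created: datetime, ttl: str, now: datetime) -> bool:
    """A resource expires once its age reaches the TTL."""
    return now >= created + parse_ttl(ttl)
```

The custom rule from the slide ("delete everything without "app" label after 7 days") then amounts to applying a `7d` TTL to every resource that lacks that label.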
63. EC2 SPOT NODES
72% savings
64. OUR SETUP VS VANILLA KUBERNETES
65. HOW MUCH DO WE DIVERGE?
• API access via Zalando OAuth
• CPU throttling disabled via Kubelet flag
• No memory overcommit (requests == limits)
• Ingress: External DNS, Skipper, AWS ALB
• Custom CRDs: Zalando OAuth, Postgres, StackSet
• Kubernetes Downscaler
• DNS setup (CoreDNS DaemonSet, ndots: 2)
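The "no memory overcommit" rule can be sketched in Python as a check on a container spec. The helper is hypothetical and compares the quantity strings literally, so equivalent values written differently (for example `1Gi` and `1024Mi`) would be reported as a mismatch.

```python
from typing import Dict


def memory_overcommitted(container: Dict) -> bool:
    """True unless the memory request is set and equals the memory limit."""
    resources = container.get("resources", {})
    request = resources.get("requests", {}).get("memory")
    limit = resources.get("limits", {}).get("memory")
    return request is None or request != limit
```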
66. INGRESS: ALB + SKIPPER
ALB (TLS; :443, :80 - redirect) → HTTP → Skipper :9999 on each NODE (EC2 network)
Skipper → Service (list of pod IPs - endpoints) → MyApp pods 10.2.0.2:8080, 10.2.0.3:8080, 10.2.1.2:8080 (K8S network)
github.com/zalando/skipper
github.com/zalando-incubator/kube-ingress-aws-controller
67. DNS: COREDNS AS DAEMONSET
github.com/zalando-incubator/kubernetes-on-aws/blob/dev/docs/postmortems/jan-2019-dns-outage.md
68. NON-PROD VS PROD
• Non-production similar to plain hosted Kubernetes
• Production:
  • No write access (only via CI/CD)
  • Compliance webhooks
  • Require production-ready Docker images
69. COMPLIANCE FOR PRODUCTION
• Pods require application label pointing to application registry
⇒ establishes link to owning team
• Docker images must be built from master via CDP
NOTE: teams can freely choose their namespace(s)
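A minimal Python sketch of the label rule. It assumes the label key is `application` (the key used in the PodDisruptionBudget example earlier) and models the application registry as a plain mapping from application ID to owning team; both helpers are hypothetical and not the actual compliance webhook.

```python
from typing import Dict, List, Optional


def compliance_violations(pod_labels: Dict[str, str],
                          application_registry: Dict[str, str]) -> List[str]:
    """Return the reasons a pod would be rejected, or an empty list."""
    application = pod_labels.get("application")
    if application is None:
        return ["missing 'application' label"]
    if application not in application_registry:
        return [f"application {application!r} not found in application registry"]
    return []


def owning_team(pod_labels: Dict[str, str],
                application_registry: Dict[str, str]) -> Optional[str]:
    """Follow the label to the registry entry to find the owning team."""
    return application_registry.get(pod_labels.get("application", ""))
```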
71. MONTHLY DEVELOPER NEWSLETTER
72. SUMMARY
• Seamless updates
• Avoid pet clusters
• Small disruptions are normal
• Automated cluster e2e tests
• Documentation & communication
73. FUTURE
• API version updates (1.16+)
• Improved Autoscaling
• Improved StackSet, Gradual Rollout
• Migrations
• Cost efficiency
• Looking at VPC CNI, AWS IAM, EKS, ...
74. KUBERNETES FAILURE STORIES
• Zalando's Failure Stories - KubeCon EU 2019
• Build Errors of Continuous Delivery Platform
• Total DNS outage in Kubernetes cluster
https://k8s.af
75. COMMON PITFALLS
• Insufficient e2e tests
• Readiness & Liveness Probes
• Resource Requests & Limits
• DNS
76. OPEN SOURCE & MORE
Cluster Config
github.com/zalando-incubator/kubernetes-on-aws
Skipper HTTP Router & Ingress controller
github.com/zalando/skipper
Ingress Controller for AWS
github.com/zalando-incubator/kube-ingress-aws-controller
Kubernetes Web View
codeberg.org/hjacobs/kube-web-view
More Zalando Tech Talks
github.com/zalando/public-presentations
77. Thank you!
Henning Jacobs
@try_except_