Why Kubernetes?
1. Why Kubernetes?
OWL TECH &
INNOVATION DAY
2019-09-26
HENNING JACOBS
@try_except_
2. ROLLING OUT KUBERNETES?
"We are rolling out Kubernetes to production next month
and I'm interested to hear from people who made that
step already."
3. DON'T USE IT !!!!!
4. DON'T USE IT !!!!!
5.
6. KUBERNETES FAILURE STORIES
7. ZALANDO AT A GLANCE
~ 5.4 billion EUR revenue 2018
> 250 million visits per month
> 15.000 employees in Europe
> 79% of visits via mobile devices
> 26 million active customers
> 300.000 product choices
~ 2.000 brands
17 countries
8. A BRIEF HISTORY OF ZALANDO TECH
9. 2010
"Sysop-Test"
"QA-Test"
10. 2013: SELF SERVICE
11. 2015: RADICAL AGILITY
STUPS: Docker deploy, SSH access, audit reports
AWS
Full AWS access: teams have admin access & full responsibility
12. 2015: ISOLATED AWS ACCOUNTS
Team ABC: Internet → *.abc.example.org → ELB → EC2
Team XYZ: Internet → *.xyz.example.org → ELB → EC2
13. 2019: SCALE
396 Accounts
140 Clusters
14. 2019: DEVELOPERS USING KUBERNETES
15. > 200 development teams
> 1100 developers
Platform
16. YOU BUILD IT, YOU RUN IT
The traditional model is that you take your software to the
wall that separates development and operations, and
throw it over and then forget about it. Not at Amazon.
You build it, you run it. This brings developers into
contact with the day-to-day operation of their software. It
also brings them into day-to-day contact with the
customer.
- A Conversation with Werner Vogels, ACM Queue, 2006
17. ON-CALL: YOU OWN IT, YOU RUN IT
When things are broken,
we want people with the best
context trying to fix things.
- Blake Scrivener, Netflix SRE Manager
18. DEVELOPER JOURNEY
Consistent story
that models
all aspects of SW dev
19. Developer
Journey
20. Correctness
Compliance
GDPR
Security
Cost Efficiency
24x7 On Call
Governance
Resilience
Capacity
...
Developer
Journey
21. DEVELOPER PRODUCTIVITY
Setup
Code
Build
Test
Deploy
Cloud Native Application Runtime
Operate
22. CLOUD NATIVE
.. uses an open source software stack to deploy
applications as microservices, packaging each part into
its own container, and dynamically orchestrating those
containers to optimize resource utilization.
Cloud native technologies enable software developers to
build great products faster.
- https://www.cncf.io/
23. CONTAINERS END-TO-END
Setup
Code
Build
Test
Deploy
Cloud Native Application Runtime
Operate
24. CONTAINERS
25. CONTAINERS
26.
27. PLAN & SETUP
28. Plan
Stories
Rules of Play
Tech Radar
29.
30. Setup
Application
Bootstrapping
31.
32.
33. BUILD & TEST
34. CONTINUOUS DELIVERY PLATFORM: BUILD
Push code → Git → CDP
35.
36. DEPLOY
37. Kubernetes
Deploy
38. DEPLOYMENT CONFIGURATION
├── deploy/apply
│   ├── deployment.yaml
│   ├── credentials.yaml   # Zalando IAM
│   ├── ingress.yaml
│   └── service.yaml
└── delivery.yaml          # Zalando CI/CD
39. INGRESS.YAML
kind: Ingress
metadata:
  name: "..."
spec:
  rules:
    # DNS name your application should be exposed on
    - host: "myapp.foo.example.org"
      http:
        paths:
          - backend:
              serviceName: "myapp"
              servicePort: 80
40. TEMPLATING: MUSTACHE
kind: Ingress
metadata:
  name: "..."
spec:
  rules:
    # DNS name your application should be exposed on
    - host: "{{{APPLICATION}}}.example.org"
      http:
        paths:
          - backend:
              serviceName: "{{{APPLICATION}}}"
              servicePort: 80
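The templating step above can be illustrated with a minimal Python sketch. It only handles the triple-brace placeholders shown on the slide and is not a full Mustache implementation; the `render` function and the sample template are hypothetical and not part of CDP.

```python
import re

# Matches triple-mustache placeholders such as {{{APPLICATION}}}.
PLACEHOLDER = re.compile(r"\{\{\{\s*(\w+)\s*\}\}\}")


def render(template: str, variables: dict) -> str:
    """Replace every {{{NAME}}} with variables["NAME"], without escaping."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            # Fail loudly instead of silently emitting an empty value.
            raise KeyError(f"no value for placeholder {name}")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


template = (
    'host: "{{{APPLICATION}}}.example.org"\n'
    'serviceName: "{{{APPLICATION}}}"'
)
rendered = render(template, {"APPLICATION": "myapp"})
print(rendered)
```

Failing on an unknown placeholder is a design choice of this sketch: a missing variable in a manifest is easier to debug at render time than after deployment.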
41. CONTINUOUS DELIVERY PLATFORM
42. CDP: DEPLOY
"glorified kubectl apply"
43. CDP: OPTIONAL APPROVAL
44. STACKSET: TRAFFIC SWITCHING
github.com/zalando-incubator/stackset-controller
45. STACKSET CRD
kind: StackSet
...
spec:
  ingress:
    hosts: ["foo.example.org"]
    backendPort: 8080
  stackLifecycle:
    scaledownTTLSeconds: 1800
    limit: 5
  stackTemplate:
    spec:
      podTemplate:
        ...
github.com/zalando-incubator/stackset-controller
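One plausible reading of the two `stackLifecycle` fields can be sketched in Python. This is an assumption, not the controller's actual logic: the sketch treats `scaledownTTLSeconds` as the grace period before a stack without traffic is scaled down, and `limit` as the maximum number of stacks to keep. The `Stack` class and `plan_cleanup` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Stack:
    name: str
    traffic_weight: float         # share of ingress traffic, 0..100
    seconds_without_traffic: int  # how long the stack has received no traffic
    created: int                  # creation order, higher means newer


def plan_cleanup(stacks, scaledown_ttl_seconds=1800, limit=5):
    """Return (scale_down, delete) lists of stack names (assumed semantics)."""
    idle = [s for s in stacks if s.traffic_weight == 0]
    # Idle stacks past the TTL are candidates for scaling down.
    scale_down = [
        s.name for s in idle
        if s.seconds_without_traffic >= scaledown_ttl_seconds
    ]
    # Keep at most `limit` stacks; remove the oldest idle stacks first.
    excess = max(0, len(stacks) - limit)
    oldest_idle = sorted(idle, key=lambda s: s.created)
    delete = [s.name for s in oldest_idle[:excess]]
    return scale_down, delete


stacks = [Stack(f"v{i}", 0.0, 3600, i) for i in range(1, 5)]
stacks += [Stack("v5", 0.0, 60, 5), Stack("v6", 100.0, 0, 6)]
scale_down, delete = plan_cleanup(stacks)
print(scale_down, delete)
```

Stacks that still receive traffic are never selected here, which matches the intent of switching traffic first and cleaning up afterwards.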
46. TRAFFIC SWITCHING STEPS IN CDP
github.com/zalando-incubator/stackset-controller
47. You build it, you run it!
Deploy
48. EMERGENCY ACCESS SERVICE
Emergency access by referencing an incident:
    zkubectl cluster-access request \
        --emergency -i INC REASON
Privileged production access via the 4-eyes principle:
    zkubectl cluster-access request REASON
    zkubectl cluster-access approve USERNAME
49. KUBERNETES WEB VIEW
kubectl get pods,stacks,deploys,..
50. SEARCHING ACROSS 140+ CLUSTERS
codeberg.org/hjacobs/kube-web-view
51. INTEGRATIONS
52. CLOUD FORMATION VIA CI/CD
"Infrastructure as Code"
├── deploy/apply
│   ├── deployment.yaml     # Kubernetes
│   ├── cf-iam-role.yaml    # AWS IAM Role
│   ├── cf-rds.yaml         # AWS RDS Database
│   ├── kube-ingress.yaml
│   ├── kube-secret.yaml
│   └── kube-service.yaml
└── delivery.yaml           # CI/CD config
53. POSTGRES OPERATOR
Application to manage PostgreSQL clusters on Kubernetes
> 500 clusters running on Kubernetes
github.com/zalando/postgres-operator
54. Elasticsearch
2.500 vCPUs
1 TB RAM
Elasticsearch in Kubernetes
github.com/zalando-incubator/es-operator/
55. SUMMARY
• Application Bootstrapping
• Git as source of truth and UI
• 4-eyes principle for master/production
• Extensible Kubernetes API as primary interface
• OAuth/IAM credentials
• PostgreSQL, Elasticsearch
• CloudFormation for proprietary AWS services
56. MONITORING & COST EFFICIENCY
57. KUBERNETES RESOURCE REPORT
github.com/hjacobs/kube-resource-report
58. RESOURCE REPORT: TEAMS
Sorting teams by slack costs
github.com/hjacobs/kube-resource-report
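As a rough illustration of sorting teams by slack costs, the sketch below treats slack as requested minus used CPU and prices it with a flat rate. The price, the field names, and the function are assumptions made for this example and do not reflect how kube-resource-report calculates costs.

```python
from collections import defaultdict

CPU_CORE_MONTHLY_COST = 30.0  # hypothetical price per requested core


def slack_cost_by_team(apps, core_cost=CPU_CORE_MONTHLY_COST):
    """Sum the cost of requested-but-unused CPU per team, highest first."""
    totals = defaultdict(float)
    for app in apps:
        # Usage above the request produces no slack, so clamp at zero.
        slack_cores = max(0.0, app["cpu_requests"] - app["cpu_usage"])
        totals[app["team"]] += slack_cores * core_cost
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


apps = [
    {"team": "checkout", "cpu_requests": 8.0, "cpu_usage": 2.0},
    {"team": "search", "cpu_requests": 4.0, "cpu_usage": 3.5},
    {"team": "checkout", "cpu_requests": 2.0, "cpu_usage": 2.5},
]
ranking = slack_cost_by_team(apps)
print(ranking)
```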
59. KUBERNETES APPLICATION DASHBOARD
60. https://github.com/hjacobs/kube-ops-view
61. VERTICAL POD AUTOSCALER
limit/requests adapted by VPA
62. DOWNSCALING DURING OFF-HOURS
Weekend
github.com/hjacobs/kube-downscaler
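The idea of scaling to zero outside working hours can be sketched as follows. The `"Mon-Fri 07:30-20:30"` spec format is a simplification used for this sketch (no time zone handling), and both functions are hypothetical; see the kube-downscaler repository for the tool's real configuration.

```python
from datetime import datetime, time

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def is_uptime(now: datetime, spec: str) -> bool:
    """Check whether `now` falls inside a spec like 'Mon-Fri 07:30-20:30'."""
    days, hours = spec.split()
    first_day, last_day = (WEEKDAYS.index(d) for d in days.split("-"))
    start, end = (time.fromisoformat(t) for t in hours.split("-"))
    in_days = first_day <= now.weekday() <= last_day
    return in_days and start <= now.time() < end


def desired_replicas(now: datetime, spec: str, original_replicas: int) -> int:
    """Keep the original replica count during uptime, otherwise scale to 0."""
    return original_replicas if is_uptime(now, spec) else 0


spec = "Mon-Fri 07:30-20:30"
thursday_morning = datetime(2019, 9, 26, 10, 0)
saturday_morning = datetime(2019, 9, 28, 10, 0)
print(desired_replicas(thursday_morning, spec, 3))
print(desired_replicas(saturday_morning, spec, 3))
```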
63. KUBERNETES JANITOR
● TTL and expiry date annotations, e.g.
○ set time-to-live for your test deployment
● Custom rules, e.g.
○ delete everything without "app" label after 7 days
github.com/hjacobs/kube-janitor
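The two cleanup mechanisms above (TTL annotation and a custom rule) can be sketched in a few lines of Python. The annotation key `janitor/ttl`, the TTL units, and the resource dictionary layout are assumptions for illustration; the functions are hypothetical and not kube-janitor's code.

```python
import re
from datetime import datetime, timedelta

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_ttl(ttl: str) -> timedelta:
    """Parse a TTL such as '30m', '24h' or '7d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhdw])", ttl)
    if not match:
        raise ValueError(f"invalid TTL: {ttl!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(seconds=amount * UNIT_SECONDS[unit])


def should_delete(resource: dict, now: datetime) -> bool:
    """Apply the TTL annotation first, then the 'no app label' rule."""
    age = now - resource["created"]
    ttl = resource.get("annotations", {}).get("janitor/ttl")
    if ttl is not None and age > parse_ttl(ttl):
        return True
    # Custom rule: delete everything without an "app" label after 7 days.
    return "app" not in resource.get("labels", {}) and age > timedelta(days=7)


now = datetime(2019, 9, 26, 12, 0)
test_deployment = {
    "created": now - timedelta(hours=30),
    "labels": {"app": "demo"},
    "annotations": {"janitor/ttl": "24h"},
}
print(should_delete(test_deployment, now))
```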
64. EC2 SPOT NODES
72% savings
65. STABILITY ↔ EFFICIENCY
Stability: Slack, Autoscaling Buffer, Disable Overcommit, Cluster Overhead
Efficiency: Resource Report, HPA, VPA, Downscaler, Janitor, EC2 Spot
66. DELIVERY PERFORMANCE METRICS
• Lead Time
• Release Frequency
• Time to Restore Service
• Change Fail Rate
srcco.de/posts/accelerate-software-delivery-performance.html
67. CONTAINERS
From "Accelerate: The Science of Lean Software and DevOps"
68. DELIVERY PERFORMANCE METRICS
• Lead Time ≙ Commit to Prod
• Release Frequency ≙ Deploys/week/dev
• Time to Restore Service ≙ MTRS from incidents
• Change Fail Rate ≙ n/a
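Two of the mappings above lend themselves to a small worked example. The sketch below computes lead time as the median commit-to-production duration and release frequency as deploys per week per developer; the record layout and function names are hypothetical, and the choice of the median is an assumption of this sketch.

```python
from datetime import datetime, timedelta
from statistics import median


def median_lead_time(deployments):
    """Median time from commit to production deployment."""
    return median(d["deployed"] - d["committed"] for d in deployments)


def release_frequency(deployments, developers, weeks):
    """Deploys per week per developer."""
    return len(deployments) / weeks / developers


start = datetime(2019, 9, 23, 9, 0)
deployments = [
    {"committed": start, "deployed": start + timedelta(hours=1)},
    {"committed": start, "deployed": start + timedelta(hours=2)},
    {"committed": start, "deployed": start + timedelta(hours=6)},
]
print(median_lead_time(deployments))
print(release_frequency(deployments, developers=2, weeks=1))
```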
69. “.. means establishing empathy with internal
consumers (read: developers) and collaborating
with them on the design. Platform product managers
establish roadmaps and ensure the platform delivers
value to the business and enhances the developer
experience.”
- ThoughtWorks Technology Radar
70.
71. DEVELOPER SATISFACTION
72. DOCUMENTATION
"Documentation is hard to find"
"Documentation is not comprehensive enough"
"Remove unnecessary complexity and obstacles."
"Get the documentation up to date and prepare
use cases"
"More and more clear documentation"
"More detailed docs, example repos with more
complicated deployments."
73. DOCUMENTATION
• Restructure following
https://www.divio.com/en/blog/documentation/
• Concepts
• How Tos
• Tutorials
• Reference
• Global Search
• Weekly Health Check: Support → Documentation
74.
75. WHY KUBERNETES?
76. WHY KUBERNETES?
• provides enough abstractions (StatefulSet, CronJob, ..)
• provides consistency (API spec/status)
• is extensible (annotations, CRDs, API aggreg.)
• certain compatibility guarantee (versioning)
• widely adopted (all cloud providers)
• works across environments and implementations
srcco.de/posts/why-kubernetes.html
77. WHY KUBERNETES?
(for Zalando)
• Efficiency
• Common Operational Model
• Developer Experience
• Cloud Provider Independent
• Compliance and Security
• Talent
78. KUBERNETES FAILURE STORIES
• Learning about production pitfalls!
• Availability bias?
https://k8s.af
79. FACTFULNESS
Things can be both better and bad!
What would failure stories for
your non-K8s infra look like?
https://k8s.af
80. COMPLEXITY FOR GOOGLE-SCALE INFRA?
• Managed DO cluster: 4 minutes
• K3s single node: 2 minutes
demo.j-serv.de
81. DE-FACTO STANDARD, EXTENSIBLE API
82.
83. MAYBE THAT'S GOOD?
84. OPEN SOURCE & MORE
Kubernetes on AWS
github.com/zalando-incubator/kubernetes-on-aws
Skipper HTTP Router & Ingress controller
github.com/zalando/skipper
External DNS
github.com/kubernetes-incubator/external-dns
Postgres Operator
github.com/zalando/postgres-operator
More Zalando Tech Talks
github.com/zalando/public-presentations
85. QUESTIONS?
HENNING JACOBS
SENIOR PRINCIPAL
henning@zalando.de
@try_except_
Illustrations by @01k