Why Kubernetes?

1. Why Kubernetes? OWL TECH & INNOVATION DAY 2019-09-26 HENNING JACOBS @try_except_
2. ROLLING OUT KUBERNETES? "We are rolling out Kubernetes to production next month and I'm interested to hear from people who made that step already."
3. DON'T USE IT !!!!!
6. KUBERNETES FAILURE STORIES
7. ZALANDO AT A GLANCE
    ~ 5.4 billion EUR revenue 2018
    > 250 million visits per month
    > 15,000 employees in Europe
    > 79% of visits via mobile devices
    > 300,000 product choices
    > 26 million active customers
    ~ 2,000 brands
    17 countries
8. A BRIEF HISTORY OF ZALANDO TECH
9. 2010 "Sysop-Test" "QA-Test"
10. 2013: SELF SERVICE
11. 2015: RADICAL AGILITY DOCKER, DEPLOY, SSH ACCESS, AUDIT REPORTS, STUPS, AWS. FULL AWS ACCESS: Teams have admin access & full responsibility
12. 2015: ISOLATED AWS ACCOUNTS Internet → *.abc.example.org → ELB → EC2 (Team ABC); Internet → *.xyz.example.org → ELB → EC2 (Team XYZ)
13. 2019: SCALE 396 Accounts, 140 Clusters
14. 2019: DEVELOPERS USING KUBERNETES
15. Platform: > 200 development teams, > 1100 developers
16. YOU BUILD IT, YOU RUN IT "The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer." - A Conversation with Werner Vogels, ACM Queue, 2006
17. ON-CALL: YOU OWN IT, YOU RUN IT "When things are broken, we want people with the best context trying to fix things." - Blake Scrivener, Netflix SRE Manager
18. DEVELOPER JOURNEY Consistent story that models all aspects of SW dev
19. Developer Journey
20. Developer Journey: Correctness, Compliance, GDPR, Security, Cost Efficiency, 24x7 On Call, Governance, Resilience, Capacity, ...
21. DEVELOPER PRODUCTIVITY Setup, Code, Build, Test, Deploy, Operate (Cloud Native Application Runtime)
22. CLOUD NATIVE ".. uses an open source software stack to deploy applications as microservices, packaging each part into its own container, and dynamically orchestrating those containers to optimize resource utilization. Cloud native technologies enable software developers to build great products faster." - https://www.cncf.io/
23. CONTAINERS END-TO-END Setup, Code, Build, Test, Deploy, Operate (Cloud Native Application Runtime)
24. CONTAINERS
27. PLAN & SETUP
28. Plan: Stories, Rules of Play, Tech Radar
30. Setup: Application Bootstrapping
33. BUILD & TEST
34. CONTINUOUS DELIVERY PLATFORM: BUILD (diagram: push code to Git, CDP)
36. DEPLOY
37. Kubernetes Deploy
38. DEPLOYMENT CONFIGURATION
    ├── deploy/apply
    │   ├── deployment.yaml
    │   ├── credentials.yaml   # Zalando IAM
    │   ├── ingress.yaml
    │   └── service.yaml
    └── delivery.yaml          # Zalando CI/CD
39. INGRESS.YAML
    kind: Ingress
    metadata:
      name: "..."
    spec:
      rules:
        # DNS name your application should be exposed on
        - host: "myapp.foo.example.org"
          http:
            paths:
              - backend:
                  serviceName: "myapp"
                  servicePort: 80
40. TEMPLATING: MUSTACHE
    kind: Ingress
    metadata:
      name: "..."
    spec:
      rules:
        # DNS name your application should be exposed on
        - host: "{{{APPLICATION}}}.example.org"
          http:
            paths:
              - backend:
                  serviceName: "{{{APPLICATION}}}"
                  servicePort: 80
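The triple-brace placeholders in the Mustache template are substituted without HTML escaping. The sketch below only illustrates that substitution step for simple variables; it is not the CDP's actual templating code, and the `render` helper is a hypothetical name introduced here. A real setup would use a full Mustache library.

```python
import re

# Matches {{{NAME}}} placeholders (Mustache's unescaped variable syntax).
PLACEHOLDER = re.compile(r"\{\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}\}")


def render(template: str, values: dict) -> str:
    """Replace every {{{NAME}}} with values[NAME]; unknown names raise KeyError."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder {name!r}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


template = 'host: "{{{APPLICATION}}}.example.org"\nserviceName: "{{{APPLICATION}}}"'
rendered = render(template, {"APPLICATION": "myapp"})
print(rendered)
```

Failing loudly on an unknown placeholder is a deliberate choice in this sketch: a silently empty host name would be harder to spot after deployment.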
41. CONTINUOUS DELIVERY PLATFORM
42. CDP: DEPLOY "glorified kubectl apply"
43. CDP: OPTIONAL APPROVAL
44. STACKSET: TRAFFIC SWITCHING github.com/zalando-incubator/stackset-controller
45. STACKSET CRD github.com/zalando-incubator/stackset-controller
    kind: StackSet
    ...
    spec:
      ingress:
        hosts: ["foo.example.org"]
        backendPort: 8080
      stackLifecycle:
        scaledownTTLSeconds: 1800
        limit: 5
      stackTemplate:
        spec:
          podTemplate:
            ...
46. TRAFFIC SWITCHING STEPS IN CDP github.com/zalando-incubator/stackset-controller
47. You build it, you run it! Deploy
48. EMERGENCY ACCESS SERVICE
    Emergency access by referencing Incident:
      zkubectl cluster-access request \
        --emergency -i INC REASON
    Privileged production access via 4-eyes:
      zkubectl cluster-access request REASON
      zkubectl cluster-access approve USERNAME
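The two access paths on the slide (emergency access tied to an incident, and privileged access that a second person must approve) can be modelled roughly as follows. This is a sketch of the 4-eyes rule only, not the implementation behind `zkubectl`; the `AccessRequest` class and its fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AccessRequest:
    requester: str
    reason: str
    incident: Optional[str] = None  # set only for emergency access
    approvals: set = field(default_factory=set)

    def approve(self, approver: str) -> None:
        # 4-eyes principle: the approver must be a different person.
        if approver == self.requester:
            raise PermissionError("requester cannot approve their own request")
        self.approvals.add(approver)

    @property
    def granted(self) -> bool:
        # Assumption: an incident reference grants emergency access directly;
        # otherwise at least one approval from someone else is required.
        return self.incident is not None or bool(self.approvals)


request = AccessRequest(requester="alice", reason="debug failing rollout")
request.approve("bob")
emergency = AccessRequest(requester="alice", reason="outage", incident="INC")
print(request.granted, emergency.granted)
```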
49. KUBERNETES WEB VIEW kubectl get pods,stacks,deploys,..
50. SEARCHING ACROSS 140+ CLUSTERS codeberg.org/hjacobs/kube-web-view
51. INTEGRATIONS
52. CLOUD FORMATION VIA CI/CD "Infrastructure as Code"
    ├── deploy/apply
    │   ├── deployment.yaml    # Kubernetes
    │   ├── cf-iam-role.yaml   # AWS IAM Role
    │   ├── cf-rds.yaml        # AWS RDS Database
    │   ├── kube-ingress.yaml
    │   ├── kube-secret.yaml
    │   └── kube-service.yaml
    └── delivery.yaml          # CI/CD config
53. POSTGRES OPERATOR Application to manage PostgreSQL clusters on Kubernetes. > 500 clusters running on Kubernetes. github.com/zalando/postgres-operator
54. Elasticsearch in Kubernetes: 2,500 vCPUs, 1 TB RAM github.com/zalando-incubator/es-operator/
55. SUMMARY
    • Application Bootstrapping
    • Git as source of truth and UI
    • 4-eyes principle for master/production
    • Extensible Kubernetes API as primary interface
    • OAuth/IAM credentials
    • PostgreSQL, Elasticsearch
    • CloudFormation for proprietary AWS services
56. MONITORING & COST EFFICIENCY
57. KUBERNETES RESOURCE REPORT github.com/hjacobs/kube-resource-report
58. RESOURCE REPORT: TEAMS Sorting teams by Slack Costs github.com/hjacobs/kube-resource-report
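Sorting teams by slack cost can be sketched as below. The sketch assumes that "slack" means the cost of requested resources minus the cost of resources actually used; the function name, the input tuple layout, and the sample figures are all hypothetical and are not taken from kube-resource-report.

```python
from collections import defaultdict


def slack_cost_by_team(apps):
    """Sum slack cost per team and sort descending.

    apps: iterable of (team, requested_cost, used_cost) tuples.
    Assumption: slack = requested cost - used cost, never below zero.
    """
    totals = defaultdict(float)
    for team, requested, used in apps:
        totals[team] += max(requested - used, 0.0)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data, for illustration only.
apps = [
    ("team-a", 100.0, 40.0),
    ("team-b", 50.0, 45.0),
    ("team-a", 30.0, 30.0),
    ("team-c", 200.0, 80.0),
]
ranking = slack_cost_by_team(apps)
for team, slack in ranking:
    print(team, slack)
```

Teams at the top of such a ranking are the first candidates for lowering resource requests.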
59. KUBERNETES APPLICATION DASHBOARD
60. https://github.com/hjacobs/kube-ops-view
61. VERTICAL POD AUTOSCALER limit/requests adapted by VPA
62. DOWNSCALING DURING OFF-HOURS (chart annotation: Weekend) github.com/hjacobs/kube-downscaler
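The idea behind off-hours downscaling is a time-window check: outside the configured uptime window, a workload is scaled to a lower replica count. The following is a simplified sketch under assumed defaults (Monday to Friday, 07:30 to 20:30, scale to zero); the function names and parameters are hypothetical and do not reproduce kube-downscaler's configuration format or time zone handling.

```python
from datetime import datetime, time


def should_be_up(now, uptime_days=range(0, 5), start=time(7, 30), end=time(20, 30)):
    """Return True inside the uptime window (assumed default: Mon-Fri 07:30-20:30)."""
    return now.weekday() in uptime_days and start <= now.time() < end


def target_replicas(now, original_replicas, downtime_replicas=0):
    """Keep the original replica count during uptime, otherwise scale down."""
    return original_replicas if should_be_up(now) else downtime_replicas


thursday_morning = datetime(2019, 9, 26, 10, 0)
saturday_morning = datetime(2019, 9, 28, 10, 0)
print(target_replicas(thursday_morning, 3))
print(target_replicas(saturday_morning, 3))
```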
63. KUBERNETES JANITOR github.com/hjacobs/kube-janitor
    ● TTL and expiry date annotations, e.g.
      ○ set time-to-live for your test deployment
    ● Custom rules, e.g.
      ○ delete everything without "app" label after 7 days
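Both clean-up mechanisms from the slide, a per-resource TTL annotation and a custom rule for unlabelled resources, come down to comparing a resource's age with a limit. The sketch below assumes a `janitor/ttl` annotation with values such as `12h` or `7d` and a plain-dict resource representation; both are illustrative assumptions, not kube-janitor's actual code or rule syntax.

```python
import re
from datetime import datetime, timedelta

# Assumed duration suffixes for this sketch.
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_ttl(ttl: str) -> timedelta:
    """Parse values such as '30m', '12h' or '7d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhdw])", ttl)
    if not match:
        raise ValueError(f"invalid TTL: {ttl!r}")
    return timedelta(**{UNITS[match.group(2)]: int(match.group(1))})


def should_delete(resource: dict, now: datetime) -> bool:
    """Decide whether a resource has outlived its TTL or violates the custom rule."""
    age = now - resource["created"]
    ttl = resource.get("annotations", {}).get("janitor/ttl")  # assumed annotation name
    if ttl is not None:
        return age > parse_ttl(ttl)
    # Custom rule from the slide: no "app" label means deletion after 7 days.
    if "app" not in resource.get("labels", {}):
        return age > timedelta(days=7)
    return False


now = datetime(2019, 9, 26)
test_deployment = {
    "created": datetime(2019, 9, 25),
    "labels": {"app": "test"},
    "annotations": {"janitor/ttl": "12h"},
}
print(should_delete(test_deployment, now))
```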
64. EC2 SPOT NODES 72% savings
65. STABILITY ↔ EFFICIENCY Stability: Slack, Autoscaling Buffer, Disable Overcommit, Cluster Overhead. Efficiency: Resource Report, HPA, VPA, Downscaler, Janitor, EC2 Spot
66. DELIVERY PERFORMANCE METRICS srcco.de/posts/accelerate-software-delivery-performance.html
    • Lead Time
    • Release Frequency
    • Time to Restore Service
    • Change Fail Rate
67. CONTAINERS From "Accelerate: The Science of Lean Software and DevOps"
68. DELIVERY PERFORMANCE METRICS
    • Lead Time ≙ Commit to Prod
    • Release Frequency ≙ Deploys/week/dev
    • Time to Restore Service ≙ MTRS from incidents
    • Change Fail Rate ≙ n/a
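Two of the mappings above, lead time as commit-to-prod and release frequency as deploys per week per developer, are simple to compute once commit and deployment timestamps are available. The sketch below uses the median for lead time, which is an assumption (the slides do not say which aggregate is used); the function names and sample timestamps are made up.

```python
from datetime import datetime
from statistics import median


def lead_time_hours(deploys):
    """Median commit-to-prod time in hours.

    deploys: list of (commit_time, prod_deploy_time) pairs.
    """
    return median(
        (deployed - committed).total_seconds() / 3600
        for committed, deployed in deploys
    )


def deploys_per_week_per_dev(deploy_count, weeks, developers):
    """Release frequency normalised by time span and team size."""
    return deploy_count / weeks / developers


# Made-up sample data, for illustration only.
deploys = [
    (datetime(2019, 9, 23, 9, 0), datetime(2019, 9, 23, 11, 0)),   # 2 hours
    (datetime(2019, 9, 24, 9, 0), datetime(2019, 9, 24, 15, 0)),   # 6 hours
    (datetime(2019, 9, 25, 9, 0), datetime(2019, 9, 26, 9, 0)),    # 24 hours
]
print(lead_time_hours(deploys))
print(deploys_per_week_per_dev(deploy_count=30, weeks=2, developers=5))
```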
69. “.. means establishing empathy with internal consumers (read: developers) and collaborating with them on the design. Platform product managers establish roadmaps and ensure the platform delivers value to the business and enhances the developer experience.” - ThoughtWorks Technology Radar
71. DEVELOPER SATISFACTION
72. DOCUMENTATION
    "Documentation is hard to find"
    "Documentation is not comprehensive enough"
    "Remove unnecessary complexity and obstacles."
    "Get the documentation up to date and prepare use cases"
    "More and more clear documentation"
    "More detailed docs, example repos with more complicated deployments."
73. DOCUMENTATION
    • Restructure following https://www.divio.com/en/blog/documentation/
      • Concepts
      • How Tos
      • Tutorials
      • Reference
    • Global Search
    • Weekly Health Check: Support → Documentation
75. WHY KUBERNETES?
76. WHY KUBERNETES? srcco.de/posts/why-kubernetes.html
    • provides enough abstractions (StatefulSet, CronJob, ..)
    • provides consistency (API spec/status)
    • is extensible (annotations, CRDs, API aggregation)
    • certain compatibility guarantee (versioning)
    • widely adopted (all cloud providers)
    • works across environments and implementations
77. WHY KUBERNETES? (for Zalando)
    • Efficiency
    • Common Operational Model
    • Developer Experience
    • Cloud Provider Independent
    • Compliance and Security
    • Talent
78. KUBERNETES FAILURE STORIES https://k8s.af
    • Learning about production pitfalls!
    • Availability bias?
79. FACTFULNESS Things can be both better and bad! What would failure stories for your non-K8s infra look like? https://k8s.af
80. COMPLEXITY FOR GOOGLE-SCALE INFRA? demo.j-serv.de
    • Managed DO cluster: 4 minutes
    • K3s single node: 2 minutes
81. DE-FACTO STANDARD, EXTENSIBLE API
83. MAYBE THAT'S GOOD?
84. OPEN SOURCE & MORE
    Kubernetes on AWS: github.com/zalando-incubator/kubernetes-on-aws
    Skipper HTTP Router & Ingress controller: github.com/zalando/skipper
    External DNS: github.com/kubernetes-incubator/external-dns
    Postgres Operator: github.com/zalando-incubator/postgres-operator
    More Zalando Tech Talks: github.com/zalando/public-presentations
85. QUESTIONS? HENNING JACOBS SENIOR PRINCIPAL henning@zalando.de @try_except_ Illustrations by @01k
