Developer Experience at Zalando

1. Developer Experience at Zalando. CNCF END USER SIG-DX, 2019-04-18. HENNING JACOBS, @try_except_
2. EUROPE’S LEADING ONLINE FASHION PLATFORM
3. ZALANDO AT A GLANCE
    • ~ 5.4 billion EUR revenue 2018
    • > 250 million visits per month
    • > 15.000 employees in Europe
    • > 79% of visits via mobile devices
    • > 300.000 product choices
    • > 26 million active customers
    • ~ 2.000 brands
    • 17 countries
4. Platform: > 200 development teams, > 1100 developers
5. YOU BUILD IT, YOU RUN IT "The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer." - A Conversation with Werner Vogels, ACM Queue, 2006
6. ON-CALL: YOU OWN IT, YOU RUN IT "When things are broken, we want people with the best context trying to fix things." - Blake Scrivener, Netflix SRE Manager
7. KUBERNETES @ ZALANDO
    • Default Deployment Target
    • 114 clusters
    • ~1400 nodes
    • Node Autoscaling
    • Since Oct 2016
    • From v1.4 to v1.12
8. DEVELOPERS USING KUBERNETES
9. DEVELOPER JOURNEY: a consistent story that models all aspects of software development
10. Developer Journey
11. Developer Journey: Correctness, Compliance, GDPR, Security, Cost Efficiency, 24x7 On Call, Governance, Resilience, Capacity, ...
12. DEVELOPER PRODUCTIVITY: Setup, Code, Build, Test, Deploy, Operate. Cloud Native Application Runtime
13.
14. PLAN & SETUP
15. Plan: Stories, Rules of Play, Tech Radar
16.
17. Setup: Application Bootstrapping
18.
19.
20. BUILD & TEST
21. CONTINUOUS DELIVERY PLATFORM: BUILD. Diagram labels: code, Git push, CDP
22.
23. DEPLOY
24. Kubernetes Deploy
25. DEPLOYMENT CONFIGURATION
    ├── deploy/apply
    │   ├── deployment.yaml
    │   ├── credentials.yaml   # Zalando IAM
    │   ├── ingress.yaml
    │   └── service.yaml
    └── delivery.yaml          # Zalando CI/CD
26. INGRESS.YAML
    kind: Ingress
    metadata:
      name: "..."
    spec:
      rules:
      # DNS name your application should be exposed on
      - host: "myapp.foo.example.org"
        http:
          paths:
          - backend:
              serviceName: "myapp"
              servicePort: 80
27. TEMPLATING: MUSTACHE
    kind: Ingress
    metadata:
      name: "..."
    spec:
      rules:
      # DNS name your application should be exposed on
      - host: "{{{APPLICATION}}}.example.org"
        http:
          paths:
          - backend:
              serviceName: "{{{APPLICATION}}}"
              servicePort: 80
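A minimal sketch of how the triple-brace placeholders on the templating slide could be filled in, assuming plain string substitution. The `render` helper and the variable map are hypothetical names for illustration; this is not CDP's actual implementation, and a full Mustache library supports far more than this.

```python
import re

# Matches {{{NAME}}} placeholders (triple braces mean "insert unescaped" in Mustache).
PLACEHOLDER = re.compile(r"\{\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}\}")


def render(template: str, variables: dict) -> str:
    """Replace every {{{NAME}}} with variables[NAME]; fail loudly on unknown names."""

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value for placeholder {name!r}")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


template = 'host: "{{{APPLICATION}}}.example.org"\nserviceName: "{{{APPLICATION}}}"'
print(render(template, {"APPLICATION": "myapp"}))
```

Failing on an unknown placeholder, rather than rendering an empty string, is a design choice of this sketch: a typo in a variable name then breaks the deployment early instead of producing a host such as `.example.org`.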
28. CONTINUOUS DELIVERY PLATFORM
29. CDP: DEPLOY "glorified kubectl apply"
30. CDP: OPTIONAL APPROVAL
31. STACKSET: TRAFFIC SWITCHING github.com/zalando-incubator/stackset-controller
32. STACKSET CRD github.com/zalando-incubator/stackset-controller
    apiVersion: zalando.org/v1
    kind: StackSet
    ...
    spec:
      ingress:
        hosts: ["foo.example.org"]
        backendPort: 8080
      stackLifecycle:
        scaledownTTLSeconds: 1800
        limit: 5
      stackTemplate:
        spec:
          podTemplate:
            ...
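The `stackLifecycle` settings can be read as a cleanup policy. The sketch below assumes that `scaledownTTLSeconds` is how long a stack may sit without traffic before it is scaled down, and that `limit` is how many traffic-less stacks are kept before older ones are removed. Both readings, and the `Stack` and `plan_lifecycle` names, are assumptions made for illustration and do not describe the controller's real code.

```python
from dataclasses import dataclass


@dataclass
class Stack:
    name: str
    created: int            # creation time, seconds since epoch
    traffic_weight: float   # share of ingress traffic, 0..100
    no_traffic_since: int   # when the stack stopped receiving traffic


def plan_lifecycle(stacks, now, scaledown_ttl_seconds=1800, limit=5):
    """Return (names to scale down, names to delete) among stacks without traffic."""
    idle = [s for s in stacks if s.traffic_weight == 0]
    # Assumed rule 1: scale down once a stack has been idle for the TTL.
    scale_down = [s.name for s in idle
                  if now - s.no_traffic_since >= scaledown_ttl_seconds]
    # Assumed rule 2: keep only the `limit` newest idle stacks.
    newest_first = sorted(idle, key=lambda s: s.created, reverse=True)
    delete = [s.name for s in newest_first[limit:]]
    return scale_down, delete


stacks = [Stack(f"v{i}", created=i, traffic_weight=0, no_traffic_since=i) for i in range(7)]
stacks.append(Stack("live", created=100, traffic_weight=100.0, no_traffic_since=0))
print(plan_lifecycle(stacks, now=2000))
```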
33. TRAFFIC SWITCHING STEPS IN CDP github.com/zalando-incubator/stackset-controller
34. EMERGENCY ACCESS SERVICE
    Get emergency access by referencing an existing incident ticket:
        zkubectl cluster-access request --emergency -i INC REASON
    Get privileged production access via 4-eyes:
        zkubectl cluster-access request REASON
        zkubectl cluster-access approve USERNAME
35. INTEGRATIONS
36. CLOUD FORMATION VIA CI/CD "Infrastructure as Code"
    ├── deploy/apply
    │   ├── deployment.yaml     # Kubernetes
    │   ├── cf-iam-role.yaml    # AWS IAM Role
    │   ├── cf-rds.yaml         # AWS RDS Database
    │   ├── kube-ingress.yaml
    │   ├── kube-secret.yaml
    │   └── kube-service.yaml
    └── delivery.yaml           # CI/CD config
37. ZALANDO IAM/OAUTH VIA CRD
    apiVersion: zalando.org/v1
    kind: PlatformCredentialsSet
    ..
    spec:
      application: my-app
      tokens:
        read-only:
          privileges:
          - com.zalando::foobar.read
      clients:
        employee:
          grant: authorization-code
          realm: users
          redirectUri: https://example.org/auth/callback
38. POSTGRES OPERATOR Application to manage PostgreSQL clusters on Kubernetes. >700 clusters running on Kubernetes. github.com/zalando/postgres-operator
39. Elasticsearch in Kubernetes: 2.500 vCPUs, 1 TB RAM. github.com/zalando-incubator/es-operator/
40. SUMMARY
    • Application Bootstrapping
    • Git as source of truth and UI
    • 4-eyes principle for master/production
    • Extensible Kubernetes API as primary interface
        • OAuth/IAM credentials
        • PostgreSQL
    • CloudFormation for proprietary AWS services
41. DELIVERY PERFORMANCE METRICS https://srcco.de/posts/accelerate-software-delivery-performance.html
    • Lead Time
    • Release Frequency
    • Time to Restore Service
    • Change Fail Rate
42. CONTAINERS From "Accelerate: The Science of Lean Software and DevOps"
43. DELIVERY PERFORMANCE METRICS
    • Lead Time ≙ Commit to Prod
    • Release Frequency ≙ Deploys/week/dev
    • Time to Restore Service ≙ MTRS from incidents
    • Change Fail Rate ≙ n/a
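The three measurable mappings on the metrics slide can be sketched in a few lines. The record layout (`committed`, `deployed`, `started`, `resolved`), the use of a median for lead time, and the sample data are all assumptions made for illustration; the slides do not say how Zalando aggregates these numbers.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_time(deploys):
    """Median time from commit to production deploy (median is an assumption)."""
    seconds = [(d["deployed"] - d["committed"]).total_seconds() for d in deploys]
    return timedelta(seconds=median(seconds))


def release_frequency(deploys, weeks, developers):
    """Deploys per week per developer."""
    return len(deploys) / weeks / developers


def mean_time_to_restore(incidents):
    """Mean time from incident start to resolution (MTRS)."""
    seconds = [(i["resolved"] - i["started"]).total_seconds() for i in incidents]
    return timedelta(seconds=sum(seconds) / len(seconds))


# Made-up sample data, not real measurements.
t0 = datetime(2019, 4, 1, 9, 0)
deploys = [
    {"committed": t0, "deployed": t0 + timedelta(hours=1)},
    {"committed": t0, "deployed": t0 + timedelta(hours=2)},
    {"committed": t0, "deployed": t0 + timedelta(hours=6)},
]
incidents = [
    {"started": t0, "resolved": t0 + timedelta(minutes=30)},
    {"started": t0, "resolved": t0 + timedelta(minutes=90)},
]
print(lead_time(deploys))
print(release_frequency(deploys, weeks=1, developers=2))
print(mean_time_to_restore(incidents))
```

Change Fail Rate is left out because the slide marks it as n/a.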
44. “.. means establishing empathy with internal consumers (read: developers) and collaborating with them on the design. Platform product managers establish roadmaps and ensure the platform delivers value to the business and enhances the developer experience.” - ThoughtWorks Technology Radar
45.
46. DEVELOPER SATISFACTION
47. DOCUMENTATION
    "Documentation is hard to find"
    "Documentation is not comprehensive enough"
    "Remove unnecessary complexity and obstacles."
    "Get the documentation up to date and prepare use cases"
    "More and more clear documentation"
    "More detailed docs, example repos with more complicated deployments."
48. DOCUMENTATION
    • Restructure following https://www.divio.com/en/blog/documentation/
        • Concepts
        • How Tos
        • Tutorials
        • Reference
    • Global Search
    • Weekly Health Check: Support → Documentation
49.
50. NEWSLETTER "You can now.."
    • You can now benefit from the most recent Kubernetes 1.12 features, e.g. ..
    • You can now analyse your Kotlin project with SonarQube and upload your Scala code coverage report to SonarQube
51. SIGNAL: ISSUE UPVOTES
52. TESTIMONIALS “So, thank you, Team Automata, for listening to our community, taking our upvotes in consideration when developing new solutions and building every day 'the first CI that doesn't suck'.” - a user, October 2018
53. MONITORING
54. ZMON DASHBOARD github.com/zalando/zmon
55. GRAFANA APPLICATION DASHBOARD
56. KUBERNETES RESOURCE REPORT github.com/hjacobs/kube-resource-report
57. RESOURCE REPORT: TEAMS Sorting teams by Slack Costs github.com/hjacobs/kube-resource-report
58. RESOURCE REPORT: APPLICATIONS "Slack"
59. RESOURCE REPORT: CLUSTERS "Slack" github.com/hjacobs/kube-resource-report
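A rough sketch of "sorting teams by slack costs", assuming slack means requested resources minus actually used resources, priced per unit. The `TeamUsage` type, the two price constants, and the sample numbers are hypothetical; the real report derives its costs from cluster and pricing data that the slides do not show.

```python
from dataclasses import dataclass


@dataclass
class TeamUsage:
    team: str
    cpu_requested: float  # cores
    cpu_used: float       # cores
    mem_requested: float  # GiB
    mem_used: float       # GiB


CPU_COST = 20.0  # hypothetical monthly cost per core
MEM_COST = 5.0   # hypothetical monthly cost per GiB


def slack_cost(usage: TeamUsage) -> float:
    """Cost of resources that were requested but not used (never negative)."""
    cpu_slack = max(usage.cpu_requested - usage.cpu_used, 0.0)
    mem_slack = max(usage.mem_requested - usage.mem_used, 0.0)
    return cpu_slack * CPU_COST + mem_slack * MEM_COST


def teams_by_slack_cost(usages):
    """Teams with the most expensive slack first."""
    return sorted(usages, key=slack_cost, reverse=True)


usages = [
    TeamUsage("b", cpu_requested=2, cpu_used=1, mem_requested=4, mem_used=4),
    TeamUsage("a", cpu_requested=10, cpu_used=4, mem_requested=20, mem_used=10),
]
for usage in teams_by_slack_cost(usages):
    print(usage.team, slack_cost(usage))
```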
60. UNDER THE HOOD
61. ZALANDO: DECISION
    1. Forbid Memory Overcommit
        • Implement mutating admission webhook
        • Set requests = limits
    2. Disable CPU CFS Quota in all clusters
        • --cpu-cfs-quota=false
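The "set requests = limits" step can be sketched as the core of a mutating admission webhook: given a pod object, compute JSON Patch operations that align each container's memory request with its memory limit. The direction (raising the request to the limit), the `memory_patch` name, and the sample pod are assumptions; the HTTP server and AdmissionReview envelope of a real webhook are left out.

```python
def memory_patch(pod: dict) -> list:
    """Return JSON Patch ops that make each container's memory request equal its limit."""
    ops = []
    for index, container in enumerate(pod["spec"]["containers"]):
        resources = container.get("resources", {})
        limit = resources.get("limits", {}).get("memory")
        request = resources.get("requests", {}).get("memory")
        if limit is None or limit == request:
            continue  # nothing to align
        base = f"/spec/containers/{index}/resources/requests"
        if "requests" not in resources:
            # The parent object is missing, so it is added as a whole.
            ops.append({"op": "add", "path": base, "value": {"memory": limit}})
        else:
            # JSON Patch "add" on an existing member replaces its value.
            ops.append({"op": "add", "path": base + "/memory", "value": limit})
    return ops


pod = {"spec": {"containers": [
    {"name": "app", "resources": {"limits": {"memory": "1Gi"},
                                  "requests": {"memory": "512Mi"}}},
    {"name": "sidecar", "resources": {"limits": {"memory": "256Mi"}}},
    {"name": "no-resources"},
]}}
print(memory_patch(pod))
```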
62. KUBERNETES CLUSTER SETUP. Diagram labels: Config, CloudFormation Stacks, EC2 Instances, Master, Worker. github.com/zalando-incubator/kubernetes-on-aws
63. CLUSTER PROVISIONING: CLUSTER LIFECYCLE MANAGER (CLM). Diagram labels: Admin, create, Cluster Registry, API, CLM, create CF stack (CloudFormation), provision resources, apply manifests. github.com/zalando-incubator/cluster-lifecycle-manager github.com/zalando-incubator/kubernetes-on-aws
64. INGRESS https://github.com/zalando-incubator/kube-ingress-aws-controller
65. VPA FOR PROMETHEUS
    apiVersion: poc.autoscaling.k8s.io/v1alpha1
    kind: VerticalPodAutoscaler
    metadata:
      name: prometheus-vpa
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          application: prometheus
      updatePolicy:
        updateMode: Auto
66. VERTICAL POD AUTOSCALER limit/requests adapted by VPA
67. HORIZONTAL POD AUTOSCALING (CUSTOM METRICS) github.com/zalando-incubator/kube-metrics-adapter
    • Queue Length
    • Ingress Req/s
    • Prometheus Query
    • ZMON Check
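Whatever the metric source (queue length, ingress requests per second, and so on), the Horizontal Pod Autoscaler scales by the ratio of the current metric value to the target value. The sketch below shows that ratio rule with the result clamped to a replica range. The function name, default bounds, and sample numbers are illustrative, and the real controller's tolerance band and stabilization behaviour are omitted.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """ceil(current_replicas * current / target), clamped to [min, max]."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))


# 4 pods each seeing 300 req/s against a target of 100 req/s per pod.
print(desired_replicas(4, current_metric=300, target_metric=100, max_replicas=20))
```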
68. DOWNSCALING DURING OFF-HOURS (chart label: Weekend) github.com/hjacobs/kube-downscaler
69. DOWNSCALING DURING OFF-HOURS github.com/hjacobs/kube-downscaler
    DEFAULT_UPTIME="Mon-Fri 07:30-20:30 CET"
    annotations:
      downscaler/exclude: "true"
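A minimal sketch of how an uptime specification such as `Mon-Fri 07:30-20:30 CET` could be evaluated. It assumes the caller passes a naive datetime that is already expressed in the spec's timezone, and it does not handle weekday ranges that wrap around the week (for example `Sat-Mon`). The `is_uptime` helper is hypothetical and is not kube-downscaler's own parser.

```python
import re
from datetime import datetime, time

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
SPEC = re.compile(
    r"^([A-Z][a-z]{2})-([A-Z][a-z]{2}) (\d{2}):(\d{2})-(\d{2}):(\d{2}) (\S+)$"
)


def is_uptime(spec: str, local_now: datetime) -> bool:
    """True if local_now falls inside the weekday and time range of spec.

    The timezone field of the spec is parsed but not applied: local_now is
    assumed to be in that timezone already.
    """
    match = SPEC.match(spec)
    if not match:
        raise ValueError(f"unsupported uptime spec: {spec!r}")
    first = WEEKDAYS.index(match.group(1))
    last = WEEKDAYS.index(match.group(2))
    start = time(int(match.group(3)), int(match.group(4)))
    end = time(int(match.group(5)), int(match.group(6)))
    in_days = first <= local_now.weekday() <= last
    in_hours = start <= local_now.time() < end
    return in_days and in_hours


spec = "Mon-Fri 07:30-20:30 CET"
print(is_uptime(spec, datetime(2019, 4, 18, 12, 0)))  # a Thursday at noon
print(is_uptime(spec, datetime(2019, 4, 20, 12, 0)))  # a Saturday at noon
```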
70. KUBERNETES JANITOR github.com/hjacobs/kube-janitor
    ● TTL and expiry date annotations, e.g.
        ○ set time-to-live for your test deployment
    ● Custom rules, e.g.
        ○ delete everything without "app" label after 7 days
71. JANITOR TTL ANNOTATION github.com/hjacobs/kube-janitor
    # let's try out nginx, but only for 1 hour
    kubectl run nginx --image=nginx
    kubectl annotate deploy nginx janitor/ttl=1h
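A short sketch of how a TTL value such as `1h` or `7d` could be turned into an expiry check. The set of unit suffixes (`s`, `m`, `h`, `d`, `w`) and the helper names are assumptions for illustration, not kube-janitor's own code.

```python
import re
from datetime import datetime, timedelta

# Assumed unit suffixes and their length in seconds.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_ttl(ttl: str) -> timedelta:
    """Parse values like '1h' or '7d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhdw])", ttl)
    if not match:
        raise ValueError(f"unsupported TTL: {ttl!r}")
    return timedelta(seconds=int(match.group(1)) * UNITS[match.group(2)])


def is_expired(created: datetime, ttl: str, now: datetime) -> bool:
    """True once the resource has lived for at least its TTL."""
    return now >= created + parse_ttl(ttl)


created = datetime(2019, 4, 18, 10, 0)
print(is_expired(created, "1h", now=datetime(2019, 4, 18, 10, 30)))
print(is_expired(created, "1h", now=datetime(2019, 4, 18, 11, 0)))
```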
72. CUSTOM JANITOR RULES github.com/hjacobs/kube-janitor
    # require "app" label for new pods starting April 2019
    - id: require-app-label-april-2019
      resources:
      - deployments
      - statefulsets
      jmespath: "!(spec.template.metadata.labels.app) && metadata.creationTimestamp > '2019-04-01'"
      ttl: 7d
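To make the rule's JMESPath expression concrete, here is an equivalent check written in plain Python. It is a sketch of what the expression tests, not how the janitor evaluates rules; the function name and sample resources are invented. It relies on ISO 8601 timestamps comparing correctly as strings.

```python
def matches_app_label_rule(resource: dict) -> bool:
    """True if the pod template has no 'app' label and the resource was created after 2019-04-01."""
    template = resource.get("spec", {}).get("template", {})
    labels = template.get("metadata", {}).get("labels") or {}
    created = resource.get("metadata", {}).get("creationTimestamp", "")
    # ISO 8601 strings sort chronologically, so a string comparison is enough.
    return not labels.get("app") and created > "2019-04-01"


unlabeled = {
    "metadata": {"creationTimestamp": "2019-04-18T10:00:00Z"},
    "spec": {"template": {"metadata": {"labels": {"team": "foo"}}}},
}
labeled = {
    "metadata": {"creationTimestamp": "2019-04-18T10:00:00Z"},
    "spec": {"template": {"metadata": {"labels": {"app": "myapp"}}}},
}
print(matches_app_label_rule(unlabeled), matches_app_label_rule(labeled))
```

A resource that matches would then receive the rule's `ttl` of 7 days.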
73. EC2 SPOT NODES: 72% savings
74. SPOT ASG / LAUNCH TEMPLATE Not upstream in cluster-autoscaler (yet)
75. OPEN SOURCE
    • Kubernetes on AWS: github.com/zalando-incubator/kubernetes-on-aws
    • AWS ALB Ingress controller: github.com/zalando-incubator/kube-ingress-aws-controller
    • External DNS: github.com/kubernetes-incubator/external-dns
    • Postgres Operator: github.com/zalando/postgres-operator
    • Kubernetes Resource Report: github.com/hjacobs/kube-resource-report
    • Kubernetes Downscaler: github.com/hjacobs/kube-downscaler
    • Kubernetes Janitor: github.com/hjacobs/kube-janitor
76. MORE INFO kubernetes-on-aws.readthedocs.io/en/latest/admin-guide/public-presentations.html
    ● DevOps Gathering 2019: Ensuring Kubernetes Cost Efficiency across (many) Clusters (slides)
    ● DevOpsCon Munich 2018: Running Kubernetes in Production: A Million Ways to Crash Your Cluster
    ● HighLoad++ Moscow 2018: Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency (slides)
    ● DevOps Lisbon Meetup 2018: Kubernetes at Zalando
77. QUESTIONS? HENNING JACOBS, HEAD OF DEVELOPER PRODUCTIVITY. henning@zalando.de, @try_except_. Illustrations by @01k
