POSTGRESQL MONITORING IN ZALANDO

1. POSTGRESQL MONITORING IN ZALANDO Helsinki PostgreSQL Meetup October 2019
2. POSTGRESQL IN ZALANDO
3. POSTGRESQL IN ZALANDO
Zalando has worked with PostgreSQL since approx. 2010 (running in DC).
● 2015: migration to AWS (using RDS); Patroni and Spilo were started;
● 2016-2017: using provided PostgreSQL clusters (based on Spilo/STUPS);
● 2018-...: using PostgreSQL operator.
4. POSTGRESQL OPERATOR
When a new postgresql custom resource appears, the operator creates:
  1. StatefulSet for PostgreSQL/Patroni cluster
  2. Service for master node (ClusterIP or LB)
  3. Service for replica nodes (ClusterIP or LB)
  4. Extra DNS names for the services if needed
5. ZMON - ZALANDO MONITORING SYSTEM
6. A few facts about ZMON
● Created in 2013 during a HackWeek, in production since 2014;
● Optimized for our use-case: autonomous teams, multiple Kubernetes clusters, common central storage;
● Open-sourced (Apache License 2.0) and available from GitHub;
● Stores time-series data in KairosDB (built on top of Cassandra);
● Stores infrastructure data in PostgreSQL.
7. What’s important for us?
● ZMON uses the “pull” model: its workers fetch the metrics;
● ZMON is a distributed monitoring system: the workers run in every K8s cluster;
● Autodiscovery gives the most up-to-date view of our apps;
● Checks are centralized and may be shared between teams;
● PostgreSQL Patroni/Spilo clusters are fully supported.
8. INFRASTRUCTURE MONITORING
9. What are the metrics we need?
● Kubernetes
  ○ Pods CPU
  ○ Network I/O
  ○ Free space in Persistent Volumes
  ○ Open TCP connections for PostgreSQL processes
● AWS
  ○ EBS I/O
  ○ ELB throughput (rarely)
  ○ Backup S3 bucket
10. Prometheus
11. POSTGRESQL INTERNAL METRICS
12. How to collect internal metrics
● The ZMON worker runs in the same K8s cluster as the PostgreSQL nodes;
● It supports running SQL queries natively;
● We only need to figure out the proper tables/views to select from.
13. What are the metrics we need?
● Server: pg_stat_activity
  ○ idle transactions
  ○ failed login attempts
● Tables: pg_stat_user_tables
  ○ size / seq_scans / inserts / updates / deletes
● Indexes: pg_stat_user_indexes
  ○ size / scans
● Backups: pg_stat_archiver
  ○ WAL archiver status
  ○ age of last backup: S3 bucket check
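As an illustration of the WAL archiver item above, here is a minimal sketch of what such a check could look like. The query uses the standard pg_stat_archiver columns; the function name `archiver_problem` and the 900-second threshold are assumptions made for this example, not part of ZMON or of the talk.

```python
# Query against the standard pg_stat_archiver view (a single-row view).
ARCHIVER_SQL = """
SELECT failed_count,
       EXTRACT(EPOCH FROM now() - last_archived_time) AS seconds_since_last_archive,
       COALESCE(last_failed_time > last_archived_time,
                last_failed_time IS NOT NULL) AS last_attempt_failed
FROM pg_stat_archiver
"""


def archiver_problem(seconds_since_last_archive, last_attempt_failed,
                     max_lag_seconds=900.0):
    """Return a short problem description, or None if the archiver looks healthy.

    Hypothetical helper: the arguments are the two computed columns of
    ARCHIVER_SQL. The 900-second default is an arbitrary example value; on a
    quiet database without archive_timeout, long gaps can be normal.
    """
    if last_attempt_failed:
        return "last archive attempt failed"
    if seconds_since_last_archive is None:
        return "no WAL segment archived yet"
    if float(seconds_since_last_archive) > max_lag_seconds:
        return "last archived WAL is too old"
    return None
```

The threshold logic lives outside the query so the same SQL can be reused with different alerting rules per cluster.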
14. Idle transactions
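The chart from this slide is not preserved. As a rough sketch of how idle transactions might be collected, the query below reads pg_stat_activity and a small helper filters by idle time. The helper name `idle_transactions`, the 60-second default, and the use of a generic DB-API connection (for example one from psycopg2) are assumptions for illustration only.

```python
# Sessions that are inside a transaction but currently doing nothing.
# state_change is the time the session entered its current state.
IDLE_TX_SQL = """
SELECT pid, usename, datname,
       EXTRACT(EPOCH FROM now() - state_change) AS idle_seconds
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
ORDER BY idle_seconds DESC
"""


def idle_transactions(conn, min_idle_seconds=60.0):
    """Return rows (pid, usename, datname, idle_seconds) idle for at least
    min_idle_seconds.

    conn is any DB-API 2.0 connection; the threshold is an example value.
    """
    cur = conn.cursor()
    try:
        cur.execute(IDLE_TX_SQL)
        rows = cur.fetchall()
    finally:
        cur.close()
    # float() copes with both float and Decimal results from EXTRACT.
    return [row for row in rows if float(row[3]) >= min_idle_seconds]
```

A monitoring check would typically alert on the number of returned rows or on the largest idle_seconds value.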
15. Index scans
It looks like a problem!
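The chart behind this slide is not preserved, so the exact problem it showed is unknown. One common use of index scan data is spotting large indexes that are never scanned; the sketch below assumes that use. The helper name `unused_indexes` and the 10 MiB size cut-off are hypothetical.

```python
# Scan counters and on-disk size per index, from the standard statistics view.
INDEX_SQL = """
SELECT schemaname, relname, indexrelname, idx_scan,
       pg_relation_size(indexrelid) AS size_bytes
FROM pg_stat_user_indexes
ORDER BY size_bytes DESC
"""


def unused_indexes(rows, min_size_bytes=10 * 1024 * 1024):
    """Return 'schema.index' names with zero scans and a non-trivial size.

    rows are tuples in the column order of INDEX_SQL. Caveats: idx_scan counts
    only since the last statistics reset, and unique or primary-key indexes
    may show zero scans while still enforcing a constraint.
    """
    return [
        f"{schema}.{index}"
        for schema, _table, index, scans, size in rows
        if scans == 0 and size >= min_size_bytes
    ]
```

For example, given one heavily used index, one large unscanned index, and one tiny unscanned index, only the large unscanned one is reported.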
16. But what about authentication?
● The ZMON worker uses its own credentials to connect to all the databases in the cluster. The credentials are separate from the application credentials;
● The ZMON worker user is unique in every K8s cluster: robot_cluster_name;
● The PostgreSQL role robot_zmon is created during the database setup to authorize the ZMON user to access the database objects;
● ZMON may also be authorized to run queries against the application tables or views, but the permissions for the ZMON role should be granted explicitly in the DB:
  GRANT SELECT ON ALL TABLES IN SCHEMA public TO robot_zmon;
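The explicit grant above could be generated per schema with a small helper like the one below. This is only a sketch: the function name `grant_statements` is hypothetical, and the extra USAGE and ALTER DEFAULT PRIVILEGES statements are additions for illustration that the slide does not mention.

```python
import re

# Accept only plain lowercase identifiers so they can be embedded in SQL
# without quoting (an intentionally strict rule for this sketch).
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def grant_statements(schema="public", role="robot_zmon"):
    """Build read-only GRANT statements for a monitoring role."""
    for name in (schema, role):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return [
        # The role must be able to look up objects in the schema at all.
        f"GRANT USAGE ON SCHEMA {schema} TO {role};",
        # The statement from the slide: read access to the existing tables.
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {role};",
        # Covers tables created later, but only those created by the role
        # that runs this statement.
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {schema} GRANT SELECT ON TABLES TO {role};",
    ]
```

The statements are returned as strings so that they can be reviewed before being run by a database owner, which keeps the monitoring role itself unprivileged.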
17. Application metrics
18. Key Takeaways
● Get useful metrics from your infrastructure (K8s, AWS, …)
● Collect PostgreSQL metrics from the pg_stat_… views
● Get your application metrics by querying your tables or views directly (but beware of the performance impact)
● Treat your monitoring tool as another application, not as the superuser
19. Feel free to reach me: Uri Savelchev <uri.savelchev@zalando.fi>
Find out more about our Culture, People & Jobs:
● ON SOCIAL MEDIA: Linkedin @Zalando SE, Facebook @ Inside Zalando, Instagram @insidezalando, Twitter @ZalandoTech
● CAREER WEBSITE: jobs.zalando.com
● CORPORATE WEBSITE: corporate.zalando.com
