PostgreSQL Berlin March Meetup
1. PostgreSQL Berlin March
Meetup
Tuesday, March 5, 2024
2. Agenda
● 19:00 - 19:10 - "Introduction to PostgreSQL at Zalando"
by Matthias Adler, Team Lead Database as a Service,
Zalando SE
● 19:15 - 19:45 "PostgreSQL worst practices" by Ilya
Kosmodemiansky, CEO of dataegret.com
● 19:50 - 20:20 "Peculiarities of Logical Decoding in
PostgreSQL" by Polina Bungina, Senior Database
Engineer, Zalando SE
● 20:20 - Q&A, networking, and Pizza & Drinks
● 22:00 - The End
3. Introduction to PostgreSQL at
Zalando
Tuesday, March 5, 2024
Matthias Adler
Team Lead Database as a Service,
Zalando SE
4. PostgreSQL at Zalando
● More than 3000 database clusters on Kubernetes
● Support for PostgreSQL 12 - 16
● Backups: wal-e (archiving), wal-g (restoring)
● Notable extensions
○ Monitoring: bg_mon
○ Query statistics: pg_stat_statements
○ Partitioning: pg_partman vs. Timescale
○ LLM embeddings: pgvector
● Core Contributions
○ Submitting and reviewing patches for
PostgreSQL and maintenance of FOSS projects
5. Patroni
● The industry-standard high-availability solution for PostgreSQL
● Management of PostgreSQL fleets
○ Leader election
○ Manages PostgreSQL configuration
○ Bootstraps PostgreSQL instances
○ Handles K8s API server outages
● Recent improvements in code quality
○ Dropped Python 2.x support entirely
○ Added type hints
○ Attracted new contributors
https://github.com/zalando/patroni
6. Spilo
● The complete bundle: PostgreSQL and Patroni
● Important utility scripts
○ WAL and log shipping to AWS S3
○ User management for employees
○ Metric helpers for pgView and monitoring
○ Automatic major version upgrades
● Other container images
○ PostgreSQL Docker image for local development
○ Flyway migration template
○ SQL export to AWS S3
https://github.com/zalando/spilo
7. Postgres-Operator
● K8s operator that provisions PostgreSQL (“Spilo”) clusters
● Designed to help engineers and teams
○ Tuning capabilities (CPU, Memory, IOPS etc.)
○ Automation for database roles and privileges
○ Easy access for owning teams
○ UI to inspect manifest, backups & operator logs
● Clone functionality to easily test in production
○ Testing major version upgrade
○ Query performance analysis
https://github.com/zalando/postgres-operator
8. PG-View
● Live monitoring of PostgreSQL clusters
● Users can observe
○ High CPU load and OOM exceptions
○ Running out of free disk space
○ Replication delay and stale WAL consumers
○ Long running and/or blocking queries
● Database browser
○ Schema and table view (to check indexes)
○ Statement statistics (and its history)
○ User logins
9. Self-Service Database on K8s
apiVersion: acid.zalan.do/v1     # Kubernetes Custom Resource (CRD) definition
kind: postgresql
metadata:
  name: teapot-test              # pick a PostgreSQL cluster name
  namespace: default             # and namespace
spec:
  numberOfInstances: 2           # 2 = 1 primary, 1 streaming replica
  enableConnectionPooler: true   # enable connection pooler (using PgBouncer)
  postgresql:
    version: "15"                # major version, upgradable by you
  teamId: teapot                 # team members automatically added to database
  users:
    app_user: []                 # application user with privileges to own database
  databases:
    app_db: app_user
  volume:
    size: 50Gi                   # volume size, increasable
10. Continuous Delivery Platform
11. What you get?
● Manifest validation and delete protection
● In-place major version upgrade
● HA with automatic leader election and failover
● Continuous backups with point-in-time recovery (up to the last 5 days)
● User provisioning, OAuth2 access for employees with SSO integration
● Databases with pre-configured role setup
● And … built-in monitoring with PG-View & Grafana, etc.
12. Break
13. PostgreSQL worst practices
Tuesday, March 5, 2024
Ilya Kosmodemiansky
CEO of dataegret.com
14. Questions?
15. Break
16. Hazards of
logical decoding
Polina Bungina, PostgreSQL Berlin Meetup, 05-03-2024
17. Agenda
1. Intro
   a. CDC, approaches for PostgreSQL
2. Logical decoding
   a. The hazards
   b. Specific problems
18. CDC - Change Data Capture
● A set of design patterns used to track when and what changes occur in data, then notify other systems and services that must respond to those changes
● Maintains consistency across all systems that rely on the data
19. CDC in PostgreSQL
● Queries on Timestamp Column
● Triggers
● Logical decoding
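Of the three, triggers are the classic in-database approach. A minimal sketch of what that looks like (audit_log, audit_trg, and orders are illustrative names, not from the talk):

```sql
-- Minimal trigger-based CDC sketch: every change is copied into an audit table.
CREATE TABLE audit_log (
    id       bigserial PRIMARY KEY,
    tbl      text,
    op       text,          -- INSERT / UPDATE / DELETE
    row_data jsonb,
    at       timestamptz DEFAULT now()
);

CREATE FUNCTION audit_trg() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO audit_log (tbl, op, row_data)
    VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(COALESCE(NEW, OLD)));
    RETURN COALESCE(NEW, OLD);
END $$;

CREATE TRIGGER orders_cdc AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH ROW EXECUTE FUNCTION audit_trg();
```

The downside the talk moves past: the audit writes happen inside the same transaction, adding write amplification, which is what makes logical decoding attractive.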
20. Logical decoding
Process of extracting all persistent changes to a database’s tables into
a coherent, easy to understand format which can be interpreted
without detailed knowledge of the database’s internal state.
Decoding the contents of the WAL (changes on the storage level) into
the app-specific form (e.g. tuples, SQL statements)
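As a rough sketch of what this looks like in practice, using the built-in test_decoding output plugin ('demo_slot' is an illustrative name; requires wal_level = logical):

```sql
SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding');

-- ...after some writes, peek at the decoded stream without consuming it:
SELECT lsn, xid, data FROM pg_logical_slot_peek_changes('demo_slot', NULL, NULL);

-- Clean up the slot once done:
SELECT pg_drop_replication_slot('demo_slot');
```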
21. The hazards of logical decoding
22. The hazards
Replication slots!
● All resources required by the consumer are defined by the slot’s position, expressed as an LSN (log sequence number)
● A resource is not removed while it is still needed by at least one consumer
  ○ WAL
  ○ System catalog rows
● Crash-safe - state is persisted to disk
  ○ but only on checkpoint
  ○ independently of the connection using it
23. The hazards - replication slots
● confirmed_flush_lsn - LSN up to which the logical slot’s consumer has confirmed receiving data
● restart_lsn - LSN of the oldest WAL that is still required by the consumer => it won’t be removed during checkpoints
● catalog_xmin - the oldest transaction affecting the system catalogs that this slot needs the database to retain => VACUUM cannot remove catalog tuples deleted by any later transaction
● All exposed in the pg_replication_slots view
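These columns can be inspected directly; a query along these lines also shows how much WAL each slot is currently pinning (the retained-WAL calculation is an assumption of this sketch, not from the slides):

```sql
SELECT slot_name,
       active,
       restart_lsn,
       confirmed_flush_lsn,
       catalog_xmin,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots;
```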
24. (diagram)
25. The hazards - replication slots
26. The hazards - replication slots
27. The hazards
Important things to consider
● Single WAL for the whole cluster, not per-database
● Repeated message handling (after a crash/failover)
● Decoded transactions are sent on commit
  ○ [PG14] streaming of in-progress transactions
28. (diagram)
29. Logical decoding
30. Logical decoding
31. The hazards
“Big” transactions
● logical_decoding_work_mem - maximum amount of memory to be used by logical decoding before some of the decoded changes are spilled to local disk
● Spilled to: $PGDATA/pg_replslot/<slot_name>/*
● [PG14] pg_stat_replication_slots view
  ○ spill_txns, spill_count, spill_bytes
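A quick way to see whether a slot's transactions are spilling to disk (sketch; the view exists since PG14):

```sql
SELECT slot_name,
       spill_txns,   -- transactions spilled to disk
       spill_count,  -- number of spills (one transaction may spill repeatedly)
       pg_size_pretty(spill_bytes) AS spilled
FROM pg_stat_replication_slots;
```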
32. The hazards - “big” transactions
33. Specific problems
34. Specific problems
Slot is still there
● Consumer downscaling
  ○ testing a new approach: “we go live later and don’t expect new events”
  ○ misconfigured setup: “decoding doesn’t work as expected… Oh, but it is already 6pm, will handle it tomorrow.”
  ○ test environments over the weekends
35. Specific problems
But wait, I…
● do not produce writes for my publication
  ○ see “WAL is a single stream for the whole cluster” above
● do not produce any write activity in the db
  ○ scheduled vacuum/reindex/…?
  ○ autovacuum
  ○ WAL segment switches (archive_timeout) on any database activity
    ■ segment file size is always the same (16MB by default)
36. Specific problems
Slot is still there
● Cleanup
  ○ not just stopping consumption - the slot should be deleted once it is no longer needed
  ○ Patroni config: also delete it from the permanent slots section
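Cleanup itself is a single call; checking for forgotten slots first is a good habit ('my_slot' is an illustrative name):

```sql
-- Inactive slots are the usual suspects:
SELECT slot_name, active, restart_lsn
FROM pg_replication_slots
WHERE NOT active;

-- Drop a slot that is no longer needed:
SELECT pg_drop_replication_slot('my_slot');
```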
37. Specific problems
Consumer performance
● A failing or underperforming consumer == risk of WAL files piling up
● Make use of publications (filtering):
○ FOR TABLE / FOR TABLES IN SCHEMA
○ with (publish = 'insert, update, delete, truncate')
○ unrelated events are not sent to the consumer
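A filtered publication along these lines keeps unrelated traffic away from the consumer (table and publication names are illustrative; FOR TABLES IN SCHEMA requires PG15+):

```sql
CREATE PUBLICATION outbox_pub
    FOR TABLE outbox
    WITH (publish = 'insert, update');
```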
38. Specific problems - pgjdbc oversight
Pgjdbc oversight
39. Specific problems - pgjdbc oversight
40. Specific problems - pgjdbc oversight
● Used by Debezium
● Keepalive messages were not handled properly
  ○ Keepalive message arrives with the requestReply flag
  ○ local flushPtr is not updated with the received sendPtr position
  ○ Standby Status Update message is sent with the old flushPtr
  ○ slot is not advanced unless there is a write produced for the respective publication
  ○ problem of endless shutdown
    ■ a clean shutdown == all changes are received by consumers
● Fixed in pgjdbc v42.7.0
41. Specific problems
“Big” transaction
“Transactional outbox” pattern:
● Atomically update the database and send messages to a message broker
● Messages emitted via an “outbox table” in the same transaction
● Improved way (without an additional table):
  pg_logical_emit_message(transactional boolean, prefix text, content text) → pg_lsn
  pg_logical_emit_message(transactional boolean, prefix text, content bytea) → pg_lsn
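Used inside the business transaction, the message travels through the WAL without any outbox table (table name and payload are illustrative):

```sql
BEGIN;
UPDATE orders SET status = 'paid' WHERE id = 42;
SELECT pg_logical_emit_message(
    true,                                -- transactional: decoded only on commit
    'outbox',                            -- prefix, used by the consumer to filter
    '{"order_id": 42, "event": "paid"}'  -- payload
);
COMMIT;
```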
42. Specific problems - “Big” transaction
43. Specific problems - “Big” transaction
Initial situation:
● Outbox table grew to 300GB (no immediate deletion)
● Deletion cron job had been running for several days, as the table was already too big
● Table and indexes became bloated
● Table is actively used for reads - latency spikes observed
44. Specific problems - “Big” transaction
Let’s run pg_repack!
● pg_repack builds a new table containing all the rows of the original one
  ○ INSERT INTO <temp_table> SELECT … FROM <origin_table>
● For 300GB this is a long, write-intensive transaction
● full_page_writes = 'on'
● Huge amount of WAL generated
45. 15:45
~/pgdata/pgroot/data$ du -sh pg_replslot/*
70GB    pg_replslot/my_slot
15:53
~/pgdata/pgroot/data$ du -sh pg_replslot/*
152GB   pg_replslot/my_slot
16:03
~/pgdata/pgroot/data$ du -sh pg_replslot/*
185GB   pg_replslot/my_slot
16:09 - slot dropped
~/pgdata/pgroot/data$ du -sh pg_replslot/*
8.0K    pg_replslot/my_slot
46. Thank you!
47. 🍕Pizza & Drinks 🍻
48. Thank You!
See you again soon …