POSTGRESQL AT LOW LEVEL

1. POSTGRESQL AT LOW LEVEL. STAY CURIOUS! Dmitry Dolgov, 17-05-2019
2. Patroni & postgres-operator
3. [Diagram, built up step by step: pg_stat_* and CPU/IO metrics over a stack of layers labeled PG, OS, CG, VM and K8S, with the unknown parts marked ???]
9. Plan?
10. A bit chaotic (image: dailymail.co.uk)
11. Info sources: source code, strace/GDB/perf, procfs/sysfs, BPF/eBPF/BCC
12. Shared memory
    ERROR: could not resize shared memory segment "/PostgreSQL.699663942" to 50438144 bytes: No space left on device
13. # strace -k -p PID
    openat(AT_FDCWD, "/dev/shm/PostgreSQL.62223175"
    ftruncate(176, 50438144) = 0
    fallocate(176, 0, 0, 50438144) = -1 ENOSPC
     > libc-2.27.so(posix_fallocate+0x16) [0x114f76]
     > postgres(dsm_create+0x67) [0x377067]
     ...
     > postgres(ExecInitParallelPlan+0x360) [0x254a80]
     > postgres(ExecGather+0x495) [0x269115]
     > postgres(standard_ExecutorRun+0xfd) [0x25099d]
     ...
     > postgres(exec_simple_query+0x19f) [0x39afdf]
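The trace shows why the error appears only at posix_fallocate: ftruncate merely sets the file size (the file stays sparse), while fallocate has to reserve real blocks in the tmpfs behind /dev/shm. A minimal Python sketch of the same sequence, assuming a Linux host; `try_reserve` is a hypothetical helper, not part of PostgreSQL:

```python
import errno
import os


def try_reserve(path: str, size: int) -> bool:
    """Repeat the ftruncate + posix_fallocate sequence from the strace output.

    Returns True if `size` bytes were actually reserved, False on ENOSPC.
    The file is always closed and removed afterwards.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        # Only changes the file size; a sparse file needs no space yet.
        os.ftruncate(fd, size)
        try:
            # Forces the filesystem to allocate every block up front.
            os.posix_fallocate(fd, 0, size)
        except OSError as exc:
            if exc.errno == errno.ENOSPC:
                return False
            raise
        return True
    finally:
        os.close(fd)
        os.unlink(path)
```

For example, `try_reserve("/dev/shm/sketch.test", 50438144)` asks for the same size as in the error message; on a container with a small /dev/shm it is expected to return False.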
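15. vDSO
    # strace -k -p PID (on Xen)
    gettimeofday({tv_sec=1550586520, tv_usec=313499}, NULL) = 0
     > [vdso]() [0xef0]
    With the xen clocksource the vDSO falls back to a real system call, which is why gettimeofday shows up in strace at all.
    "Two frequently used system calls are 77% slower on AWS EC2"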
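Whether the vDSO fast path is usable depends on the current clocksource. Following the EC2 article cited on the slide, this sketch treats only `xen` as forcing a real system call; that is an assumption for the kernels the article examined, not an exhaustive rule, and newer kernels may behave differently. The helper names are hypothetical:

```python
from pathlib import Path

CLOCKSOURCE = Path("/sys/devices/system/clocksource/clocksource0/current_clocksource")


def current_clocksource(path: Path = CLOCKSOURCE) -> str:
    """Return the active clocksource name, e.g. 'tsc', 'kvm-clock' or 'xen'."""
    return path.read_text().strip()


def likely_syscall_fallback(clocksource: str) -> bool:
    """Assumption based on the EC2 article: only 'xen' is flagged here."""
    return clocksource == "xen"
```

The same sysfs directory also contains available_clocksource, which lists the alternatives the kernel could switch to.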
16. Scheduling [diagram: T2, T3]
18. Andres Freund: "New Intel MDS vulnerability mitigations cause measurable slowdown"
19. MDS
    # Children    Self   Symbol
      71.06%     0.00%   [.] __libc_start_main
      71.06%     0.00%   [.] PostmasterMain
      56.82%     0.14%   [.] exec_simple_query
      25.19%     0.06%   [k] entry_SYSCALL_64_after_hwframe
      25.14%     0.29%   [k] do_syscall_64
      23.60%     0.14%   [.] standard_ExecutorRun
21. MDS
    # Percent   Disassembly of kcore for cycles
       0.01% :  nopl 0x0(%rax,%rax,1)
      28.94% :  verw 0xffe9e1(%rip)
       0.55% :  pop %rbx
       3.24% :  pop %rbp
    (verw is the instruction the MDS mitigation uses to clear CPU buffers.)
23. MDS
    # Overhead   Symbol
      25.19%     [k] native_safe_halt
24. MDS
    static inline __cpuidle void native_safe_halt(void)
    {
        mds_idle_clear_cpu_buffers();
        asm volatile("sti; hlt": : :"memory");
    }
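Whether the MDS mitigation is active can be checked in sysfs: kernels that ship it report the state in /sys/devices/system/cpu/vulnerabilities/mds (older kernels have no such file). A sketch that reads and splits that status line; the parsing assumes the usual `Mitigation: Clear CPU buffers; SMT ...` layout and is not an official format specification:

```python
from pathlib import Path
from typing import Optional

MDS_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/mds")


def parse_mds_status(line: str) -> dict:
    """Split a line like 'Mitigation: Clear CPU buffers; SMT vulnerable'."""
    parts = [p.strip() for p in line.strip().split(";")]
    status = parts[0]
    return {
        "status": status,
        # The verw-based buffer clearing seen in the profile above.
        "buffers_cleared": "Clear CPU buffers" in status,
        "details": parts[1:],
    }


def read_mds_status(path: Path = MDS_STATUS) -> Optional[dict]:
    """Return the parsed status, or None if the kernel has no such file."""
    if not path.exists():
        return None
    return parse_mds_status(path.read_text())
```

The mitigation can be turned off with the `mds=off` kernel parameter, trading the security fix for the overhead shown in the profiles.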
26. Huge pages: transparent vs. classic
    With huge pages, TLB misses are less frequent and cheaper to resolve.
27. Huge pages
    # perf record -e dTLB-load-misses,dTLB-store-misses -p PID
    # huge_pages on
    Samples: 832K of event 'dTLB-load-misses'
    Event count (approx.): 640614445    : ~19% less
    Samples: 736K of event 'dTLB-store-misses'
    Event count (approx.): 72447300     : ~29% less
    # huge_pages off
    Samples: 894K of event 'dTLB-load-misses'
    Event count (approx.): 784439650
    Samples: 822K of event 'dTLB-store-misses'
    Event count (approx.): 101471557
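PostgreSQL's huge_pages setting requests classic (hugetlbfs) pages for its shared memory, while transparent huge pages are controlled by the kernel. The THP mode is visible in /sys/kernel/mm/transparent_hugepage/enabled, where the active choice is shown in brackets. A minimal parser, assuming that file format:

```python
from pathlib import Path

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_choice(line: str) -> str:
    """Return the bracketed option from a line like 'always [madvise] never'."""
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active option in {line!r}")


def thp_mode(path: Path = THP_ENABLED) -> str:
    """Current transparent huge page mode: 'always', 'madvise' or 'never'."""
    return active_choice(path.read_text())
```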
29. VM
    Lock holder preemption problem
    Lock waiter preemption problem
    Intel PLE (pause loop exiting): PLE_Gap, PLE_Window
    (Intel® 64 and IA-32 Architectures Software Developer's Manual, Vol. 3)
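PLE_Gap and PLE_Window work together: PAUSE instructions closer than PLE_Gap apart count as one spin loop, and once such a loop has run longer than PLE_Window the CPU forces a VM exit, so the hypervisor can run another vCPU (ideally the lock holder). A simplified model of that rule, with times in arbitrary ticks rather than real TSC cycles; it ignores everything else the SDM specifies, and `ple_exit_time` is a hypothetical name:

```python
from typing import Iterable, Optional


def ple_exit_time(pause_times: Iterable[int], ple_gap: int,
                  ple_window: int) -> Optional[int]:
    """Return the time of the PAUSE that would cause a VM exit, or None.

    A PAUSE more than `ple_gap` after the previous one starts a new spin
    loop; a PAUSE more than `ple_window` after the first PAUSE of the
    current loop triggers the exit.
    """
    loop_start: Optional[int] = None
    previous: Optional[int] = None
    for t in pause_times:
        if previous is None or t - previous > ple_gap:
            loop_start = t          # treated as a new spin loop
        elif t - loop_start > ple_window:
            return t                # spun for too long: VM exit
        previous = t
    return None
```

With `ple_gap=0` every PAUSE starts a new loop, so the model never exits; in KVM, `ple_gap=0` disables PLE altogether, consistent with the zero PAUSE_INSTRUCTION exits on the next slides.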
30. vCPU [diagram: vCPUs vC1, vC2, vC3, vC4 and the hypervisor]
34. # latency average = 17.782 ms
    => modprobe kvm-intel ple_gap=128
    => perf record -e kvm:kvm_exit
    reason PAUSE_INSTRUCTION 306795
    # latency average = 16.858 ms
    => modprobe kvm-intel ple_gap=0
    => perf record -e kvm:kvm_exit
    reason PAUSE_INSTRUCTION 0
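Per-reason counts like PAUSE_INSTRUCTION 306795 can be obtained by post-processing `perf script` output for the kvm:kvm_exit tracepoint. A sketch, assuming each event line contains a `reason <NAME>` field (the rest of the line layout varies between kernel versions):

```python
import re
from collections import Counter
from typing import Iterable

REASON = re.compile(r"\breason\s+(\w+)")


def count_exit_reasons(lines: Iterable[str]) -> Counter:
    """Count kvm_exit events per exit reason; lines without one are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = REASON.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the lines of `perf script` after `perf record -e kvm:kvm_exit` gives a table comparable to the one on the slide.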
37. [Diagram: userspace, vfs_read, BPF bytecode with its registers, stack and maps]
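The flow on the diagram (userspace loads BPF bytecode that runs on every vfs_read and keeps its state in a map) looks roughly like this with BCC's Python front end. This is a sketch assuming bcc is installed and the code runs as root; the map name `reads` and the helpers `top_readers` and `main` are made up for the example:

```python
import time
from typing import Dict, List, Tuple

# BPF C program: runs in the kernel on each vfs_read() call.
BPF_PROGRAM = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(reads, u32, u64);  // map: process id -> number of vfs_read calls

int trace_vfs_read(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;  // upper half is the process id
    reads.increment(pid);
    return 0;
}
"""


def top_readers(counts: Dict[int, int], n: int = 5) -> List[Tuple[int, int]]:
    """Return the n processes with the most vfs_read calls."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]


def main(seconds: float = 5.0) -> None:
    from bcc import BPF  # needs root and the bcc Python bindings

    bpf = BPF(text=BPF_PROGRAM)
    bpf.attach_kprobe(event="vfs_read", fn_name="trace_vfs_read")
    time.sleep(seconds)
    counts = {k.value: v.value for k, v in bpf["reads"].items()}
    for pid, calls in top_readers(counts):
        print(f"{pid:>8} {calls:>10}")
```

Calling `main()` as root prints the five busiest readers over five seconds; the BCC import is kept inside `main` so the rest of the module can be loaded without bcc.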
40. Tunables
    # from /proc/sys/kernel/
    sched_wakeup_granularity_ns
    # default = 1 msec * (1 + ilog(ncpus))
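The default in the comment scales with the number of CPUs. A sketch of that formula, assuming the default logarithmic tunable scaling, under which the kernel also caps the CPU count at 8 (so the default stops growing at 4 ms). Newer kernels no longer expose this knob under /proc/sys/kernel/, so the reader function may fail there:

```python
def default_wakeup_granularity_ns(ncpus: int) -> int:
    """1 msec * (1 + ilog2(min(ncpus, 8))), in nanoseconds."""
    if ncpus < 1:
        raise ValueError("need at least one CPU")
    capped = min(ncpus, 8)
    ilog2 = capped.bit_length() - 1  # floor(log2(capped))
    return 1_000_000 * (1 + ilog2)


def read_wakeup_granularity_ns(
        path: str = "/proc/sys/kernel/sched_wakeup_granularity_ns") -> int:
    """Current value on kernels that still expose it under /proc/sys/kernel/."""
    with open(path) as f:
        return int(f.read())
```

For example, a 4-CPU machine gets 3 ms by default, and anything with 8 or more CPUs gets 4 ms.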
41. pgbench and pg_dump
         usecs       : count   distribution
        0 -> 1       : 16      |
        2 -> 3       : 4604    |**
        4 -> 7       : 6812    |****
        8 -> 15      : 14888   |*********
       16 -> 31      : 19267   |***********
       32 -> 63      : 65795   |****************************************
       64 -> 127     : 50454   |******************************
      128 -> 255     : 16393   |*********
      256 -> 511     : 5981    |***
      512 -> 1023    : 12300   |*******
     1024 -> 2047    : 48      |
     2048 -> 4095    : 0       |
    user 1m9.127s
    sys  0m2.066s
    real 1m38.990s
43. pgbench and pg_dump
         usecs       : count   distribution
        0 -> 1       : 1       |
        2 -> 3       : 8       |
        4 -> 7       : 25      |
        8 -> 15      : 46      |*
       16 -> 31      : 189     |*******
       32 -> 63      : 119     |****
       64 -> 127     : 96      |***
      128 -> 255     : 93      |***
      256 -> 511     : 238     |*********
      512 -> 1023    : 323     |************
     1024 -> 2047    : 1012    |****************************************
     2048 -> 4095    : 47      |*
    user 1m8.559s
    sys  0m1.641s
    real 1m32.030s
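Both histograms use power-of-two buckets: a value of v usecs lands in bucket floor(log2 v), and each bar is scaled so that the largest bucket gets 40 stars. A small re-implementation of that layout (not BCC's own code) that reproduces the bars from the counts:

```python
from typing import List, Tuple


def bucket_index(value: int) -> int:
    """Power-of-two bucket for a non-negative value (0 and 1 share bucket 0)."""
    return max(value.bit_length() - 1, 0)


def bucket_bounds(k: int) -> Tuple[int, int]:
    """Inclusive value range covered by bucket k: 0->1, 2->3, 4->7, ..."""
    low = 0 if k == 0 else 1 << k
    return low, (1 << (k + 1)) - 1


def stars(count: int, max_count: int, width: int = 40) -> str:
    """Bar scaled so that max_count gets `width` stars (rounded down)."""
    return "" if max_count == 0 else "*" * (count * width // max_count)


def render(counts: List[int], unit: str = "usecs", width: int = 40) -> str:
    max_count = max(counts, default=0)
    rows = [f"{unit:>22} : count     distribution"]
    for k, count in enumerate(counts):
        low, high = bucket_bounds(k)
        bar = stars(count, max_count, width)
        rows.append(f"{low:>10} -> {high:<10}: {count:<9} |{bar:<{width}}|")
    return "\n".join(rows)


# Counts from the first histogram.
FIRST = [16, 4604, 6812, 14888, 19267, 65795, 50454, 16393, 5981, 12300, 48, 0]
```

`print(render(FIRST))` prints the first histogram again, with the same number of stars per row.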
45. github.com/iovisor/bcc/
    github.com/erthalion/postgres-bcc
46. Cache
    => llcache_per_query.py bin/postgres
    PID    QUERY                       CPU   REFERENCE   MISS   HIT%
    9720   UPDATE pgbench_tellers ...  0     2000        1000   50.00%
    9720   SELECT abalance FROM ...    2     2000        100    95.00%
    ...
    Total References: 3303100
    Total Misses: 599100
    Hit Rate: 81.86%
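HIT% is the share of cache references that did not miss, (REFERENCE - MISS) / REFERENCE, both per row and for the totals. A sketch with the numbers from the output (the (references, misses) tuples are just a convenient container, not the tool's data format):

```python
from typing import Iterable, Tuple


def hit_rate(references: int, misses: int) -> float:
    """Fraction of LLC references that were hits."""
    if references == 0:
        return 0.0
    return (references - misses) / references


def total_hit_rate(rows: Iterable[Tuple[int, int]]) -> float:
    """Combine (references, misses) rows by summing before dividing."""
    refs = misses = 0
    for r, m in rows:
        refs += r
        misses += m
    return hit_rate(refs, misses)
```

`hit_rate(3303100, 599100)` gives about 0.8186, the 81.86% total. Summing before dividing weights each query by its number of references; averaging the per-row percentages would not reproduce the total.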
47. Remember?
48. Shared memory
    => shmem.py bin/postgres
    mmap: [20439]: 142M anon
    shm:  [20439]: 56B
    shm:  [postmaster.opts]: 0B
          [PostgreSQL.57332071]: 7K
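With dynamic_shared_memory_type = posix, each dynamic shared memory segment appears as a /dev/shm/PostgreSQL.<number> file, the same naming as in the error on slide 12. A sketch that lists them with their sizes; `list_dsm_segments` is a hypothetical helper:

```python
import os
from typing import Dict


def list_dsm_segments(directory: str = "/dev/shm") -> Dict[str, int]:
    """Map PostgreSQL.* segment names in `directory` to their sizes in bytes."""
    segments = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.name.startswith("PostgreSQL.") and entry.is_file():
                # Apparent size: larger than the memory used if the file is sparse.
                segments[entry.name] = entry.stat().st_size
    return segments
```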
49. Dirty pages [diagram: bgw (background writer), chkp (checkpointer) and Linux writeback; OS cache and storage]
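How many dirty pages sit in the OS cache, and how many are being written back right now, is visible system-wide in /proc/meminfo as the Dirty and Writeback lines (in kB). A small parser, assuming the usual `Name: value kB` layout:

```python
from typing import Dict

FIELDS = ("Dirty", "Writeback")


def parse_meminfo(text: str) -> Dict[str, int]:
    """Return the numeric value of each meminfo line (most are in kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def dirty_and_writeback(path: str = "/proc/meminfo") -> Dict[str, int]:
    """Dirty and Writeback amounts in kB."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return {name: info[name] for name in FIELDS}
```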
53. Writeback (cgroup v1)
    /* vmscan.c */
    /* The normal page dirty throttling mechanism
     * in balance_dirty_pages() is completely broken
     * with the legacy memcg and direct stalling in
     * shrink_page_list() is used for throttling instead,
     * which lacks all the niceties such as fairness,
     * adaptive pausing, bandwidth proportional
     * allocation and configurability. */
    static bool sane_reclaim(struct scan_control *sc)
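Whether a container runs under cgroup v1 (where the comment above applies) or v2 can be read from /proc/self/mountinfo: the filesystem type, which follows the ` - ` separator, is `cgroup` for v1 hierarchies and `cgroup2` for the unified one. A sketch under that assumption; hybrid setups mount both:

```python
from typing import Set


def cgroup_versions(mountinfo: str) -> Set[str]:
    """Return {'v1'}, {'v2'} or both, based on mounted cgroup filesystems."""
    versions = set()
    for line in mountinfo.splitlines():
        _, sep, tail = line.partition(" - ")
        fields = tail.split()
        if not sep or not fields:
            continue
        if fields[0] == "cgroup":
            versions.add("v1")
        elif fields[0] == "cgroup2":
            versions.add("v2")
    return versions


def current_cgroup_versions(path: str = "/proc/self/mountinfo") -> Set[str]:
    with open(path) as f:
        return cgroup_versions(f.read())
```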
54. Pages written, kernel
55. Writeback
    => perf record -e writeback:writeback_written
    kworker/u8:1 reason=periodic nr_pages=101429
    kworker/u8:1 reason=background nr_pages=MAX_ULONG
    kworker/u8:3 reason=periodic nr_pages=101457
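To see which kind of writeback dominates, writeback:writeback_written events can be grouped by reason. A sketch over `perf script`-style lines, assuming `reason=` and `nr_pages=` key=value fields as shown on the slide; each field is matched separately because their order differs between kernel versions, and non-numeric values such as MAX_ULONG are counted as events but not summed:

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

REASON = re.compile(r"\breason=(\w+)")
NR_PAGES = re.compile(r"\bnr_pages=(\w+)")


def summarize_writeback(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Return {reason: (events, pages)}; non-numeric nr_pages are not summed."""
    events: Dict[str, int] = defaultdict(int)
    pages: Dict[str, int] = defaultdict(int)
    for line in lines:
        reason = REASON.search(line)
        if not reason:
            continue
        key = reason.group(1)
        events[key] += 1
        nr = NR_PAGES.search(line)
        if nr and nr.group(1).isdigit():
            pages[key] += int(nr.group(1))
    return {key: (events[key], pages[key]) for key in events}
```

On the three lines above this yields two periodic events with 202886 pages in total and one background event.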
56. Writeback
    # pgbench insert workload
    => io_timeouts.py bin/postgres
    [18335] END: MAX_SCHEDULE_TIMEOUT
    [18333] END: MAX_SCHEDULE_TIMEOUT
    [18331] END: MAX_SCHEDULE_TIMEOUT
    [18318] truncate pgbench_history: MAX_SCHEDULE_TIMEOUT
    (MAX_SCHEDULE_TIMEOUT is the kernel's value for sleeping with no timeout at all.)
58. Kubernetes
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
    (cgroup v1: soft_limit_in_bytes, limit_in_bytes)
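Roughly how such values end up in cgroup v1 files, assuming the usual kubelet conversion (a sketch, not kubelet code): the memory limit becomes memory.limit_in_bytes, the CPU request becomes cpu.shares at 1024 shares per CPU, and the CPU limit becomes a CFS quota per 100 ms period. The memory request is not covered here, and the quantity parsers handle only the suffixes used on the slide plus a few binary ones:

```python
MEBI = 1024 * 1024
CFS_PERIOD_US = 100_000  # default CFS period: 100 ms


def memory_bytes(quantity: str) -> int:
    """Parse memory quantities such as '64Mi' or '1Gi' (binary suffixes only)."""
    suffixes = {"Ki": 1024, "Mi": MEBI, "Gi": 1024 * MEBI}
    for suffix, factor in suffixes.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def millicpu(quantity: str) -> int:
    """Parse CPU quantities such as '250m' or '2' into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def cpu_shares(milli: int) -> int:
    """cpu.shares for a CPU request: 1024 per full CPU, at least 2."""
    return max(milli * 1024 // 1000, 2)


def cfs_quota_us(milli: int, period_us: int = CFS_PERIOD_US) -> int:
    """cpu.cfs_quota_us for a CPU limit, at least 1000 us."""
    return max(milli * period_us // 1000, 1000)
```

For the pod above this gives memory.limit_in_bytes = 134217728, cpu.shares = 256 and cpu.cfs_quota_us = 50000.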
61. Memory reclaim
    # only under memory pressure
    => page_reclaim.py --container 89c33bb3133f
    [7382] postgres: 928K
    [7138] postgres: 152K
    [7136] postgres: 180K
    [7468] postgres: 72M
    [7464] postgres: 57M
    [5451] postgres: 1M
62. How to run?
    # bcc + postgres-bcc
    CONFIG_BPF=y
    CONFIG_BPF_SYSCALL=y
    CONFIG_NET_CLS_BPF=m
    CONFIG_NET_ACT_BPF=m
    CONFIG_BPF_JIT=y
    CONFIG_BPF_EVENTS=y
    debugfs on /sys/kernel/debug type debugfs (rw)
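One way to check these requirements on a running host is to scan the kernel config, which many distributions install as /boot/config-<release> (or expose as /proc/config.gz when CONFIG_IKCONFIG_PROC is set), and /proc/mounts for debugfs. A sketch; both `=y` and `=m` count as present:

```python
import gzip
import os
from typing import Dict, Iterable, List

REQUIRED = [
    "CONFIG_BPF", "CONFIG_BPF_SYSCALL", "CONFIG_NET_CLS_BPF",
    "CONFIG_NET_ACT_BPF", "CONFIG_BPF_JIT", "CONFIG_BPF_EVENTS",
]


def parse_config(text: str) -> Dict[str, str]:
    """Return {option: value} for lines like 'CONFIG_BPF=y'; comments are skipped."""
    options = {}
    for line in text.splitlines():
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def missing_options(options: Dict[str, str],
                    required: Iterable[str] = REQUIRED) -> List[str]:
    return [name for name in required if options.get(name) not in ("y", "m")]


def debugfs_mounted(mounts: str) -> bool:
    """True if /proc/mounts lists a debugfs at /sys/kernel/debug."""
    return any(line.split()[1:3] == ["/sys/kernel/debug", "debugfs"]
               for line in mounts.splitlines() if line.strip())


def read_kernel_config() -> str:
    path = f"/boot/config-{os.uname().release}"
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    with gzip.open("/proc/config.gz", "rt") as f:
        return f.read()
```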
63. How to run: container?
    # sometimes you also need to let perf know
    # where to find debugging symbols, e.g. copy
    # from /usr/lib/.debug/
    docker run --privileged --net=container:<container-id> --ipc=container:<container-id>
64. How to run: K8S?
    spec:
      serviceAccountName: "bcc"
      hostPID: true
      containers:
      - name: "bcc"
        securityContext:
          privileged: true
    # 4 * 65536 + 14 * 256 + 96
    => export BCC_LINUX_VERSION_CODE=265824
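The number in the comment is the kernel's LINUX_VERSION_CODE, KERNEL_VERSION(a, b, c) = (a << 16) + (b << 8) + c, here for 4.14.96. A sketch that derives it from a release string like the one `uname -r` prints; clamping the last component to 255 follows the macro in newer kernels and makes no difference for 4.14.96:

```python
import re


def linux_version_code(release: str) -> int:
    """KERNEL_VERSION(major, minor, patch) for strings like '4.14.96-coreos'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognized release: {release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    patch = int(match.group(3) or 0)
    return (major << 16) + (minor << 8) + min(patch, 255)
```

This is presumably useful when the kernel headers inside the container do not match the host kernel, which would explain setting BCC_LINUX_VERSION_CODE by hand.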
65. How to break?
    # unsafe access
    => perf probe -x bin/postgres --funcs
    => perf probe -x bin/postgres 'ExecCallTriggerFunc trigdata->?'
    => perf record -e probe_postgres:ExecCallTriggerFunc
66. How to break?
    # non-interruptible sleep
    => perf probe -x bin/postgres --funcs
    => perf probe -x bin/postgres 'XLogInsertRecord fpw_lsn'
67. How to break?
68. Questions?
    github.com/erthalion
    github.com/erthalion/postgres-bcc
    @erthalion
    dmitrii.dolgov at zalando dot de
    9erthalion6 at gmail dot com
