Automatic failover for PostgreSQL with repmgr

qq_40921573 · published 2024-07-14 16:24:37

1. Loading the repmgr shared library

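Automatic failover in a high-availability cluster requires repmgrd, the repmgr daemon. It monitors and records the state of the replication cluster, detects replication failures, and selects the best candidate server to promote to primary. For repmgrd to run, the repmgr shared library must also be added to the PostgreSQL configuration.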

$ vim /home/postgres/pg15.5/data/postgresql.conf 
shared_preload_libraries = 'repmgr'
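
As a quick sanity check before restarting, the setting above can be verified from the shell. The following is a minimal sketch, not part of repmgr: the function name `has_repmgr_preload` is hypothetical, and it only inspects the single file it is given (it ignores included files and later overriding lines).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given postgresql.conf contains an
# active (uncommented) shared_preload_libraries line that mentions repmgr.
# Assumption: the setting lives in this file and is not overridden elsewhere.
has_repmgr_preload() {
    conf_file="$1"
    grep -Eq '^[[:space:]]*shared_preload_libraries[[:space:]]*=.*repmgr' "$conf_file"
}

# Example call using the path from this article (left commented out):
# has_repmgr_preload /home/postgres/pg15.5/data/postgresql.conf \
#     && echo "repmgr is listed in shared_preload_libraries"
```

Once PostgreSQL has been restarted, running `SHOW shared_preload_libraries;` in psql shows the value the server actually loaded.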

Start the repmgrd daemon:

repmgrd -f /home/postgres/repmgr/repmgr.conf --pid-file /home/postgres/repmgr/repmgrd.pid --daemonize
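
repmgrd only performs an automatic failover when repmgr.conf sets `failover=automatic` together with a `promote_command` and a `follow_command`. This article does not show its repmgr.conf, so the sketch below simply checks that these three keys are present. The function name `missing_failover_settings` is hypothetical, and it checks key presence only, not the values.

```shell
#!/bin/sh
# Hypothetical helper: print each automatic-failover key that has no active
# (uncommented) line in the given repmgr.conf. Prints nothing if all three
# keys are present. It does not validate the values themselves.
missing_failover_settings() {
    conf_file="$1"
    for key in failover promote_command follow_command; do
        grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$conf_file" || echo "$key"
    done
}

# Example call using the path from this article (left commented out):
# missing_failover_settings /home/postgres/repmgr/repmgr.conf
```

Any key it prints should be added to repmgr.conf before relying on repmgrd for failover.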

Stop the repmgrd daemon:

[postgres@fl-prod-worker01 ~]$ ps -ef | grep repmgr
postgres  7153     1  0 22:06 ?        00:00:01 repmgrd -f /home/postgres/repmgr/repmgr.conf --pid-file /home/postgres/repmgr/repmgrd.pid --daemonize
kill 7153
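
Because repmgrd was started with `--pid-file`, the PID can also be read from that file instead of being looked up with `ps`. A minimal sketch, assuming the PID file holds a single PID; the function name `stop_repmgrd` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: send SIGTERM to the process whose PID is stored in
# the given PID file. Returns non-zero if the file does not exist.
stop_repmgrd() {
    pid_file="$1"
    if [ ! -f "$pid_file" ]; then
        echo "PID file not found: $pid_file" >&2
        return 1
    fi
    pid=$(cat "$pid_file")
    kill "$pid"
}

# Example call using the path from this article (left commented out):
# stop_repmgrd /home/postgres/repmgr/repmgrd.pid
```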

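Restart PostgreSQL after editing postgresql.conf. A change to shared_preload_libraries only takes effect after a restart, so do this before starting repmgrd: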

[postgres@fl-prod-pg02 .ssh]$ pg_ctl restart
waiting for server to shut down.... done
server stopped
waiting for server to start....2024-03-09 10:47:44.297 CST [30772] LOG:  starting PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2024-03-09 10:47:44.297 CST [30772] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-03-09 10:47:44.297 CST [30772] LOG:  listening on IPv6 address "::", port 5432
2024-03-09 10:47:44.298 CST [30772] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-03-09 10:47:44.301 CST [30775] LOG:  database system was shut down in recovery at 2024-03-09 10:47:44 CST
2024-03-09 10:47:44.301 CST [30775] LOG:  entering standby mode
2024-03-09 10:47:44.304 CST [30775] LOG:  consistent recovery state reached at 0/3060E40
2024-03-09 10:47:44.304 CST [30775] LOG:  invalid record length at 0/3060E40: wanted 24, got 0
2024-03-09 10:47:44.304 CST [30772] LOG:  database system is ready to accept read-only connections
2024-03-09 10:47:44.309 CST [30776] LOG:  started streaming WAL from primary at 0/3000000 on timeline 5
 done
server started

2. Simulating a primary outage

On the master (134), stop PostgreSQL:

[postgres@fl-prod-pg01 .ssh]$ pg_ctl stop
waiting for server to shut down...2024-03-09 10:48:31.532 CST [7633] LOG:  received fast shutdown request
.2024-03-09 10:48:31.533 CST [7633] LOG:  aborting any active transactions
2024-03-09 10:48:31.535 CST [7633] LOG:  background worker "logical replication launcher" (PID 7639) exited with exit code 1
2024-03-09 10:48:31.537 CST [7634] LOG:  shutting down
2024-03-09 10:48:31.547 CST [7634] LOG:  checkpoint starting: shutdown immediate
2024-03-09 10:48:31.551 CST [7634] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.005 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB
2024-03-09 10:48:31.561 CST [7633] LOG:  database system is shut down
 done
server stopped

3. Checking the cluster status

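Right after the primary stops, it is reported as unreachable while the standby and the witness are still running:

[postgres@fl-stg-worker03 pg]$ repmgr -f /home/postgres/repmgr/repmgr.conf cluster show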
 ID  | Name    | Role    | Status        | Upstream  | Location  | Priority | Timeline | Connection string
-----+---------+---------+---------------+-----------+-----------+----------+----------+------------------------------------------------------------------------
 132 | node132 | standby |   running     | ? node135 | location1 | 100      | 9        | host=10.51.3.132 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 134 | node134 | witness | * running     | ? node135 | default   | 0        | n/a      | host=10.51.3.134 port=5433 user=repmgr dbname=repmgr connect_timeout=2
 135 | node135 | primary | ? unreachable | ?         | location1 | 100      |          | host=10.51.3.135 port=5432 user=repmgr dbname=repmgr connect_timeout=2

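Once repmgrd has completed the failover, node132 has been promoted to primary and the witness follows it:

[postgres@fl-stg-worker03 pg]$ repmgr -f /home/postgres/packet/repmgr-conf/repmgr.conf cluster show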
 ID  | Name    | Role    | Status        | Upstream  | Location  | Priority | Timeline | Connection string
-----+---------+---------+---------------+-----------+-----------+----------+----------+------------------------------------------------------------------------
 132 | node132 | primary |   running     |           | location1 | 100      | 9        | host=10.51.3.132 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 134 | node134 | witness | * running     | node132   | default   | 0        | n/a      | host=10.51.3.134 port=5433 user=repmgr dbname=repmgr connect_timeout=2
 135 | node135 | primary | ? unreachable | ?         | location1 | 100      |          | host=10.51.3.135 port=5432 user=repmgr dbname=repmgr connect_timeout=2
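
To check the outcome from a script rather than by eye, the table above can be filtered for the node that is both a primary and running. This is only a sketch that parses the human-readable table layout shown in this article, which may differ between repmgr versions. The function name `running_primary` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `repmgr cluster show` table output on stdin and
# print the Name of every row whose Role is "primary" and whose Status
# contains "running". Rows marked "unreachable" are skipped.
running_primary() {
    awk -F'|' '
        $3 ~ /primary/ && $4 ~ /running/ {
            gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim padding around the name
            print $2
        }'
}

# Example call using the command from this article (left commented out):
# repmgr -f /home/postgres/repmgr/repmgr.conf cluster show | running_primary
```

Fed the second table above, it prints node132. Fed the first table, it prints nothing, because the only primary there is unreachable.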