repmgr教程
简介
repmgr是一个开源工具套件,用于管理PostgreSQL服务器集群中的复制和故障转移。它使用工具来增强PostgreSQL的内置热备份功能,以设置备用服务器,监控复制以及执行管理任务,例如故障转移或手动切换操作。
前期准备
两台装有 postgresql 9.5 的服务器。
1. 安装
1.1 下载源码包https://repmgr.org/download/repmgr-5.0.0.tar.gz
1.2 解压
1 | tar -zxvf repmgr-5.0.0.tar.gz |
1.3 进入目录并编译
安装编译所需库
- flex
- libedit-dev
- libkrb5-dev
- libpam0g-dev
- libreadline-dev
- libselinux1-dev
- libssl-dev
- libxml2-dev
- libxslt1-dev
1
apt install flex libedit-dev libkrb5-dev libpam0g-dev libreadline-dev libselinux1-dev libssl-dev libxml2-dev libxslt1-dev
1
./configure && make install
2. 配置
可复制 sample, sample 位于解压的根目录 repmgr.conf.sample。修改以下关键信息1
cp repmgr.conf.sample /etc/postgresql/repmgr.conf
主从1
2
3
4
5
6node_id=1
node_name='138'
conninfo='host=db1 user=postgres dbname=postgres password=d connect_timeout=2'
data_directory='/var/lib/postgresql/9.5/main'
pg_bindir='/usr/lib/postgresql/9.5/bin'
repmgr_bindir='/usr/lib/postgresql/9.5/bin'1
2
3
4
5
6node_id=2
node_name='139'
conninfo='host=db2 user=postgres dbname=postgres password=d connect_timeout=2'
data_directory='/var/lib/postgresql/9.5/main'
pg_bindir='/usr/lib/postgresql/9.5/bin'
repmgr_bindir='/usr/lib/postgresql/9.5/bin'3. 从库复制主库
主库中创建extension,psql 连接数据库后输入创建流复制角色1
create extension repmgr
1
CREATE ROLE repl login replication encrypted password 'd'
3.1 修改主库配置
postgresql.conf, 和一般的流复制配置无异
1 | listen_addresses = '*' # 监听来自任何地址的请求 |
pg_hba.conf
1 | # local DATABASE USER METHOD [OPTIONS] |
3.2 主库注册到 repmgr,并查看
1 | repmgr -f /etc/postgresql/repmgr.conf primary register |
1 | repmgr cluster show -f /etc/postgresql/repmgr.conf |
3.3 从库配置
将密码连接信息存储在.pgpass文件
1 | vim ~/.pgpass |
输入以下信息,格式为 host:port:DBName:username:password
1 | db1:5432:postgres:postgres:d |
然后设置权限chmod 600 ~/.pgpass
clone 执行,执行前需要清空 pgdata 目录
1 | repmgr -h db1 -U postgres -d postgres -f /etc/postgresql/repmgr.conf standby clone |
你会发现目录生成了,且 recovery.conf 也生成了。
启动从库
1 | pg_ctl -D . start |
然后注册从库
1 | repmgr standby register -f /etc/postgresql/repmgr.conf |
查看节点
1 | repmgr cluster show -f /etc/postgresql/repmgr.conf |
4.配置自动failover(如果使用了 pgpool failover,则不要使用 repmgr 的failover)
- 因为自动 failover 基于repmgrd,首先需要包含其关联库,postgresql.conf,配置下面一行,需要重启主从都需要重启:
1
shared_preload_libraries ='repmgr'
1
pg_ctl -D . restart
- 配置repmgr.conf,配置在上面配置的基础上添加(主从都需要)
1
2
3
4failover=automatic # 开启自动故障转移
promote_command='/usr/lib/postgresql/9.5/bin/repmgr standby promote -f /etc/postgresql/repmgr.conf --log-to-file'
follow_command='/usr/lib/postgresql/9.5/bin/repmgr standby follow -f /etc/postgresql/repmgr.conf --log-to-file --upstream-node-id=%n'
#上面这两个命令必须绝对路径 - 主备均要启动 repmgrd 服务
配置如果改了,则需要 kill 掉,然后重新启动.1
repmgrd -f /etc/postgresql/repmgr.conf
- 测试
- 主库挂掉
1
pg_ctl -D . stop
- 主库挂掉
- 2 副库则看到可以看到副库已经变成了主库
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26[2020-02-13 20:14:49] [INFO] checking state of node 1, 6 of 6 attempts
[2020-02-13 20:14:49] [WARNING] unable to ping "user=postgres password=d connect_timeout=2 dbname=postgres host=db1 fallback_application_name=repmgr"
[2020-02-13 20:14:49] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2020-02-13 20:14:49] [WARNING] unable to reconnect to node 1 after 6 attempts
[2020-02-13 20:14:49] [INFO] 0 active sibling nodes registered
[2020-02-13 20:14:49] [INFO] primary and this node have the same location ("default")
[2020-02-13 20:14:49] [INFO] no other sibling nodes - we win by default
[2020-02-13 20:14:49] [NOTICE] this node is the only available candidate and will now promote itself
[2020-02-13 20:14:49] [INFO] promote_command is:
"/usr/lib/postgresql/9.5/bin/repmgr standby promote -f /etc/postgresql/repmgr.conf --log-to-file"
[2020-02-13 20:14:49] [NOTICE] promoting standby to primary
[2020-02-13 20:14:49] [DETAIL] promoting server "139" (ID: 2) using "/usr/lib/postgresql/9.5/bin/pg_ctl -w -D '/var/lib/postgresql/9.5/main' promote"
LOG: received promote request
[2020-02-13 20:14:49] [NOTICE] LOG: redo done at 0/5000028
waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
LOG: last completed transaction was at log time 2020-02-13 04:13:29.417036-08
LOG: selected new timeline ID: 2
LOG: archive recovery complete
LOG: MultiXact member wraparound protections are now enabled
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
[2020-02-13 20:14:50] [NOTICE] STANDBY PROMOTE successful
[2020-02-13 20:14:50] [DETAIL] server "139" (ID: 2) was successfully promoted to primary
[2020-02-13 20:14:50] [INFO] 0 followers to notify
[2020-02-13 20:14:50] [INFO] switching to primary monitoring mode
[2020-02-13 20:14:50] [NOTICE] monitoring cluster primary "139" (ID: 2)1
2
3
4
5
6
7
8postgres@ubuntu:~/9.5/main$ repmgr cluster show -f /etc/postgresql/repmgr.conf
ID | Name | Role | Status | Upstream | Location | Priority | Connection string
----+------+---------+-----------+----------+----------+----------+---------------------------------------------------------------------
1 | 138 | primary | - failed | | default | 100 | host=db1 user=postgres dbname=postgres password=d connect_timeout=2
2 | 139 | primary | * running | | default | 100 | host=db2 user=postgres dbname=postgres password=d connect_timeout=2
WARNING: following issues were detected
- unable to connect to node "138" (ID: 1) - 3 恢复
也就是将主库设置为新的从库
创建 recovery.conf 文件。
内部配置和备用库基本一样启动1
2
3standby_mode = 'on' # 热备份mode
primary_conninfo = 'user=repl password=d host=192.168.56.139 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres' # 连接信息
recovery_target_timeline = 'latest' # 复制最新数据可以看到1
pg_ctl start
成功切换1
2
3
4
5postgres@ubuntu:~/9.5/main$ repmgr cluster show -f /etc/postgresql/repmgr.conf
ID | Name | Role | Status | Upstream | Location | Priority | Connection string
----+------+---------+-----------+----------+----------+----------+-------------------------------------------------------------------- -
1 | 138 | standby | running | | default | 100 | host=db1 user=postgres dbname=postgres password=d connect_timeout=2
2 | 139 | primary | * running | | default | 100 | host=db2 user=postgres dbname=postgres password=d connect_timeout=2
问题
apt 安装后并配置数据库会导致启动数据库时could not access file “repmgr”: No such file or directory
It means the repmgr extension code is not installed in the PostgreSQL application directory. This typically happens when using PostgreSQL packages provided by a third-party vendor, which often have different filesystem layouts.Either use PostgreSQL packages provided by the community or 2ndQuadrant; if this is not possible, contact your vendor for assistance.
简单就是说用的可能时第三方的包,还是从官网下载包并编译安装吧。提示未输入密码
1
2
3
4
5
6ERROR: connection to database failed
DETAIL:
fe_sendauth: no password supplied
ERROR: unable to establish necessary replication connections
HINT: check replication permissions on the source server检查 pg_hba.conf,以及 ~/.pgpass 是否输入正确的用户密码。没找到问题,然后重新初始化数据库部署了一次,就好了。
常用命令
参考:
http://www.cslingjun.com/2019/11/27/PostgreSQL%E9%AB%98%E5%8F%AF%E7%94%A8%E9%9B%86%E7%BE%A4repmgr/