Linux下PostgreSQL主备环境搭建和切换

2023-10-27

1. 概念

数据库热备：数据库热备是指为主数据库的创建、维护和监控一个或多个备用数据库，它们时刻处于开机状态，同主机保持同步。当主机失灵时，可以随时启用热备数据库来代替，以保护数据不受故障、灾难、错误和崩溃的影响。

流复制（streaming replication）：PostgreSQL提供的一种服务器间的数据复制方式。这种方式下，后备服务器连接到主服务器，主服务器则在 WAL 记录产生时即将它们以流式传送给后备服务器而不必等到 WAL 文件被填充。

2. 实验环境

服务器两台。

操作系统：Linux CentOS 6.6

软件：PostgreSQL 10.0

主服务器的IP地址为10.40.239.101，后备服务器IP 地址为 10.40.239.102。

3. 实验过程

3.1 实验准备

两台虚拟机上都安装了PostgreSQL 10.0，它们的安装目录是：/opt/postgresql10/

在/etc/profile 中添加如下几行：

export PATH=/opt/postgresql10/bin:$PATH

export LD_LIBRARY_PATH=/opt/postgresql10/lib:$LD_LIBRARY_PATH

export PGDATA=/opt/postgresql10/data

随后加载使之生效：

source /etc/profile

3.2 搭建PostgreSQL 主备环境。

3.2.1 主节点上的操作

1. 确保服务已经启动。执行命令：

 su postgres

切换用户，并执行：

pg_ctl start -D $PGDATA

启动服务。

2. 创建用于流复制的用户。执行命令：

psql -h 127.0.0.1 -p 5432 -U postgres

进入控制台，并执行如下语句创建用户：

create user repuser with login replication password '123456';

3. 修改pg_hba.conf 文件，添加如下内容，允许两台计算机上的复制用户和超级用户登录：

host    replication     repuser         10.33.45.101/32         md5

host    replication     repuser         10.33.45.102/32         md5

host      all           postgres           10.33.45.101/32              trust

host      all           postgres           10.33.45.102/32              trust

4. 在主节点的 postgresql.conf 中设置这些参数：

max_wal_senders = 10
wal_level = replica
wal_log_hints = on
wal_keep_segments = 10
wal_receiver_status_interval = 5s
hot_standby_feedback = on

这些参数中的含义如下：

max_wal_senders表示来自后备服务器或流式基础备份客户端的并发连接的最大数量；

wal_level 表示日志级别，对于流复制，它的值应设置为replica；

wal_log_hints = on表示，在PostgreSQL服务器一个检查点之后页面被第一次修改期间，把该磁盘页面的整个内容都写入 WAL，即使对所谓的提示位做非关键修改也会这样做；

wal_keep_segments 指定在后备服务器需要为流复制获取日志段文件的情况下，pg_wal（PostgreSQL 9.6 以下版本的是pg_xlog）目录下所能保留的过去日志文件段的最小数目。

log_connections 表示是否在日志中记录客户端对服务器的连接；

wal_receiver_status_interval 指定在后备机上的 WAL 接收者进程向主服务器或上游后备机发送有关复制进度的信息的最小周期；

hot_standby_feedback 指定一个热后备机是否将会向主服务器或上游后备机发送有关于后备机上当前正被执行的查询的反馈，这里设置为on。

关于详细内容，可以参考postgresql官方文档。

5. 重启主节点：

pg_ctl restart -D $PGDATA

6. 重启之后，为主服务器和后备服务器创建复制槽，这一步不是必须的。

select * from pg_create_physical_replication_slot('postgresql_node101');

select * from pg_create_physical_replication_slot(' postgresql_node102');

复制槽（replication slot）的作用是：

1. 在流复制中，当一个备节点断开连接是时，备节点通过hot_standby_feedback 提供反馈数据数据会丢失。当备节点重新连接时，它可能因为被主节点发送清理记录而引发查询冲突。复制槽即使在备节点断开时仍然会记录下备节点的xmin（复制槽要需要数据库保留的最旧事务ID）值，从而确保不会有清理冲突。

2. 当一个备节点断开连接时，备节点需要的WAL文件信息也丢失了。如果没有复制槽，当备节点重连时，我们可能已经丢弃了所需要的WAL文件，因此需要完全重建备节点。而复制槽确保这个节点保留所有下游节点需要的wal文件。

3.2.2 备节点上的操作

1. 确保服务是停止的：

su postgres

切换用户，并执行：

pg_ctl stop -D $PGDATA

关闭服务。

2. 首先删除备节点中的数据目录 $PGDATA 中的文件：

cd $PGDATA

rm –rf *

然后执行：

pg_basebackup -Xs -d "hostaddr=10.33.45.101 port=5432 user=repuser password=123456" -D $PGDATA -v -Fp

这里,-Xs 表示复制方式是流式的（stream），这种方式不会复制在此次备份开始前，已经归档完成的WAL文件；-d 后面是一个连接字符串，其中“hostaddr=10.40.239.101”表示主服务器的ip地址是10.40.239.101，“port=5432”表示数据库的端口是5432，“user=repuser”表示用于流复制的用户是repuser， “password=123456”表示密码是123456；“-D $PGDATA”表示将备份内容输入到本地的 $PGDATA 目录；“-v”表示打印详细信息，–Fp 表示复制结果输出位普通（plain）文件。

3. 基础备份完成后，修改备节点的 postgresql.conf 文件，设置：

hot_standby = on

4. 将 /opt/postgresql10/share/ 中的 recovery.conf.sample 拷贝到 $PGDATA 下，重命名为 recovery.conf：

cp /opt/postgresql10/share/recovery.conf.sample $PGDATA/recovery.conf

并设置如下参数：

recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=10.40.239.101 port=5432 user=repuser password=repuser123'
primary_slot_name = 'postgresql_node102'
trigger_file = 'tgfile'

这些参数的含义如下：

recovery_target_timeline 表示恢复到数据库时间线的上的什么时间点，这里设置为latest，即最新。

standby_mode 表示是否将PostgreSQL服务器作为一个后备服务器启动，这里设置为on，即后备模式。

primary_conninfo指定后备服务器用来连接主服务器的连接字符串，其中“host=10.40.239.101”表示主服务器的ip地址是10.40.239.101，“port=5432”表示数据库的端口是5432，“user=repuser”表示用于流复制的用户是repuser， “password=123456”表示密码是123456。

primary_slot_name 指定通过流复制连接到主服务器时使用一个现有的复制槽来控制上游节点上的资源移除。这里我们指定3.2.1节创建的postgresql_node102。如果没有在主服务器上创建复制槽，则不配置此参数。

trigger_file指定一个触发器文件，该文件的存在会结束后备机中的恢复，使它成为主机。

5. 启动备节点服务：

pg_ctl start -D $PGDATA

3.3 主备环境检测

1. 在主节点上创建一个表,并插入数据：

postgres=# create table student  (id int, name text);

CREATE TABLE

postgres=# insert into student  (id, name)  values  (1,'tom');

INSERT 0 1

2. 在备节点上检测：

postgres=# select * from student;

 id | name

----+------

  1 | tom

主节点数据同步到了备机。

同时，在备节点上写数据会失败：

postgres=# insert into student (id, name)  values  (2,'amy');

ERROR:  cannot execute INSERT in a read-only transaction

3.4 主备环境的切换

1. 激活备节点，使之成为新的主结点：

pg_ctl promote -D $PGDATA

结果是：

waiting for server to promote........ done

server promoted

2. 查看新主节点的状态：

postgres=#  pg_controldata | grep cluster

Database cluster state:               in production

插入一条数据：

postgres=#  insert into student (id, name)  values  (2,'amy');

INSERT 0 1

3. 停止旧的主结点：

pg_ctl stop -m fast -D $PGDATA

结果：

waiting for server to shut down.... done

server stopped

4. 在停止的旧主结点上执行恢复数据的操作：

pg_rewind --target-pgdata $PGDATA --source-server='host=10.33.45.102 port=5432 user=postgres dbname=postgres' -P

结果如下：

connected to server

servers diverged at WAL location 0/2B000230 on timeline 4

rewinding from last common checkpoint at 0/2A000098 on timeline 4

reading source file list

reading target file list

reading WAL in target

need to copy 57 MB (total source directory size is 143 MB)

58749/58749 kB (100%) copied

creating backup label and updating control file

syncing target data directory

Done!

表示从新主节点上成功获取WAL日志。

5. 重新配置旧主结点的 recovery.conf：

recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'hostaddr=10.40.239.102 port=5432 user=repuser password=repuser123'
primary_slot_name = 'postgresql_node101'

6. 在旧主结点上执行下面的命令，重新启动该节点：

pg_ctl start -D $PGDATA

7. 在旧主结点上验证：

postgres=# insert into student (id, name)  values  (3,'lily');

ERROR:  cannot execute INSERT in a read-only transaction

现在，它成为了新的备节点。

这样，我们就实现了linux下的主备节点的切换。

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系:hwhale#tublm.com(使用前将#替换为@)