Fault tolerant PostgreSQL cluster. Part 3. Auto-failover. Repmgr daemon.

Part 1. Introduction.
Part 2. Replication. Repmgr.
Part 3. Auto-failover. Repmgr daemons.
Part 4. HA. Backup. Repmgr daemons. Barman.
Part 5. Testing. Ansible. Final thoughts.

At this part, we will create two repmgr daemons on backup server and setup auto-failover.

Why not place daemons on db servers? Because in this case we don’t have voting majority, which dispose to split-brain. If db1 losing connection with db2, we get unnecessary switchover and two masters. Since we are using backup server as the only entry point, we can switch servers only if we actually need it. It’s not a split-brainless solution, but much better.

First, we need to create passwordless ssh connection to db1 and db2 servers, letting repmgr daemons automatically switchover them. ssh-copy-id will ask you for a postgres user password we changed at Part 1.
Execute this on backup server.

su - postgres
ssh-keygen -b 2048 -t rsa -N "" -C "postgres@10.8.3.1"
ssh-copy-id postgres@10.8.1.1
ssh-copy-id postgres@10.8.2.1

Let’s allow repmgr to connect to our PostgreSQL instances without password. Add this line to beginning of pg_hba.conf on db1 and db2.
/etc/postgresql/11/main/pg_hba.conf

host repmgr repmgr 10.8.3.1/32 md5

Restart PostgreSQL.

systemctl restart postgresql

Add pgpass file on backup server, so repmgr daemons can connect to database without a password.

su - postgres
echo "*:*:*:repmgr:hdf456hw" > ~/.pgpass
chmod 600 ~/.pgpass

Disable OpenVPN and PostgreSQL autostart on db1 and db2 servers. Why? If, after reboot, OpenVPN will start automatically, repmgr daemon will wait forever until PostgreSQL start, which means no auto switchover. If PostgreSQL will start automatically, you will get an instant split-brain. We will learn how to properly return PostgreSQL instance back to cluster in Part 5.

update-rc.d openvpn disable
update-rc.d postgresql disable

Let’s create two repmgr configs on backup server.
/etc/repmgrd_db1.conf

node_id=1
node_name=db1
conninfo='host=10.8.1.1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/11/main'

failover=automatic
promote_command='ssh postgres@10.8.1.1 /usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='ssh postgres@10.8.1.1 /usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
service_promote_command='ssh postgres@10.8.1.1 /usr/lib/postgresql/11/bin/pg_ctl -w -D /var/lib/postgresql/11/main promote'
monitoring_history=yes
monitor_interval_secs=2

log_facility=LOCAL0

/etc/repmgrd_db2.conf

node_id=2
node_name=db2
conninfo='host=10.8.2.1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/11/main'

failover=automatic
promote_command='ssh postgres@10.8.2.1 /usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='ssh postgres@10.8.2.1 /usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
service_promote_command='ssh postgres@10.8.2.1 /usr/lib/postgresql/11/bin/pg_ctl -w -D /var/lib/postgresql/11/main promote'
monitoring_history=yes
monitor_interval_secs=2

log_facility=LOCAL0

Create two systemd daemon configs.
/etc/systemd/system/repmgrd_db1.service

[Unit]
Description=repmgrd_db1
After=syslog.target
After=network.target

[Service]
Type=forking
Restart=on-failure
RestartSec=10
User=postgres
Group=postgres
TimeoutStopSec=5
ExecStart=/usr/lib/postgresql/11/bin/repmgrd -d --config-file /etc/repmgrd_db1.conf --pid-file=/var/run/repmgr/repmgrd_db1.pid
StandardOutput=syslog
StandardError=syslog
PIDFile=/var/run/repmgr/repmgrd_db1.pid

[Install]
WantedBy=multi-user.target

/etc/systemd/system/repmgrd_db2.service

[Unit]
Description=repmgrd db2
After=syslog.target
After=network.target

[Service]
Type=forking
Restart=on-failure
RestartSec=10
User=postgres
Group=postgres
TimeoutStopSec=5
ExecStart=/usr/lib/postgresql/11/bin/repmgrd -d --config-file /etc/repmgrd_db2.conf --pid-file=/var/run/repmgr/repmgrd_db2.pid
StandardOutput=syslog
StandardError=syslog
PIDFile=/var/run/repmgr/repmgrd_db2.pid

[Install]
WantedBy=multi-user.target

Now, create PID directory config.
/usr/lib/tmpfiles.d/repmgr.conf

d /run/repmgr 2775 postgres postgres -

Create PID directory.

systemd-tmpfiles --create

Enable and restart both daemons.

systemctl enable repmgrd_db1
systemctl restart repmgrd_db1
systemctl enable repmgrd_db2
systemctl restart repmgrd_db2

Check daemons running.

ps aux | grep repmgrd

Check cluster is running using postgres user.

su - postgres
repmgr -f /etc/repmgrd_db1.conf cluster show
repmgr -f /etc/repmgrd_db2.conf cluster show

At the Part 4 we will set up barman and make cluster high available.

Part 1. Introduction.Part 2. Replication. Repmgr.Part 3. Auto-failover. Repmgr daemons.Part 4. HA. Backup. Repmgr daemons. Barman.Part 5. Testing. Ansible. Final thoughts.

Leave a Comment Cancel Reply

Part 1. Introduction.
Part 2. Replication. Repmgr.
Part 3. Auto-failover. Repmgr daemons.
Part 4. HA. Backup. Repmgr daemons. Barman.
Part 5. Testing. Ansible. Final thoughts.