Fault tolerant PostgreSQL cluster. Part 4. High availability. Backup. Repmgr daemon. Barman.


Part 1. Introduction.
Part 2. Replication. Repmgr.
Part 3. Auto-failover. Repmgr daemons.
Part 4. HA. Backup. Repmgr daemons. Barman.
Part 5. Testing. Ansible. Final thoughts.

At this part, we will:

  • make PostgreSQL high available using repmgr daemons
  • use barman for streaming backup
  • combine HA feature and barman to store master and standby backups in the same barman place and save disk space

By default, barman can stream data from master and standby at the same time, storing backups at separate places. It’s the only option for an asynchronous replication. If master server will fall, your standby server will be inconsistent.

In case of synhronous replication, we can continue retrieving WAL from new master(ex-standby). It means, we can store backups in the same place saving disk space. To achieve this in transparent way, we will make our cluster high available.

Repmgr daemons

First, let’s turn our DB cluster to high available one. We will use repmgr events to hook promotion and following. Iptables rules will help us to redirect barman and app server traffic from ports 54321 and 54322 to a master server.
Create a script with custom iptables chains on backup server.
/usr/local/bin/repmgr_iptables_chains.sh

#!/bin/bash
#!/bin/bash
/sbin/sysctl -w net.ipv4.ip_forward=1
/sbin/iptables -t nat -N repmgr_dnat
/sbin/iptables -t nat -A OUTPUT -j repmgr_dnat
/sbin/iptables -t nat -A PREROUTING -j repmgr_dnat
/sbin/iptables -t nat -N repmgr_snat
/sbin/iptables -t nat -A POSTROUTING -j repmgr_snat
if [[ -f /usr/local/bin/repmgr_master ]]; then
        master=$(cat /usr/local/bin/repmgr_master)
        /usr/bin/sudo -u postgres /usr/local/bin/repmgr_handler.sh $master standby_promote
fi

And ensure it will be executed at every boot.

crontab -e
@reboot /usr/local/bin/repmgr_iptables_chains.sh

Create failover helper script. Repmgrd will execute it at every promote event and make our cluster high available.
/usr/local/bin/repmgr_iptables.sh

#!/bin/bash
readonly DB="$1"
readonly BACKUP="$2"
readonly PG_PORT="$3"
readonly IN_PORT="$4"

if [ -z "$5" ]; then
        iptables -t nat --flush repmgr_dnat
        iptables -t nat --flush repmgr_snat
        iptables -t nat -A repmgr_snat -p tcp -d $DB --dport $PG_PORT -j SNAT --to-source $BACKUP
fi
iptables -t nat -A repmgr_dnat -p tcp -d $BACKUP --dport $IN_PORT -j DNAT --to-destination $DB:$PG_PORT

Create repmgrd event handler script.
/usr/local/bin/repmgr_handler.sh

#!/bin/bash
readonly DB1="10.8.1.1"
readonly DB2="10.8.2.1"
readonly BACKUP="10.8.3.1"
readonly SLOT_NAME="barman"
readonly BARMAN_NAME="main_backup"
readonly PG_PORT="5432"
readonly BARMAN_PORT="54321"
readonly MASTER_PORT="54322"
readonly NODE_ID="$1"
EVENT="$2"

if [[ "$NODE_ID" == "1" ]]; then
        IP="$DB1"
else
        IP="$DB2"
fi

if [[ "$EVENT" =~ ^(primary_register|standby_promote|repmgrd_failover_promote)$ ]]; then
        sudo /usr/local/bin/repmgr_iptables.sh $IP $BACKUP $PG_PORT $BARMAN_PORT
        echo "$NODE_ID" > /usr/local/bin/repmgr_master
        psql -h $IP -p $PG_PORT -U repmgr -d repmgr -c "select pg_create_physical_replication_slot('$SLOT_NAME') where not exists(select 1 from pg_replication_slots where slot_name = '$SLOT_NAME');"
        sudo -u barman barman receive-wal --stop $BARMAN_NAME 2>/dev/null
        sudo -u barman barman switch-wal --force --archive --archive-timeout=90 $BARMAN_NAME
        sudo /usr/local/bin/repmgr_iptables.sh $IP $BACKUP $PG_PORT $MASTER_PORT "app"
fi

if [[ "$EVENT" =~ ^(standby_register|standby_follow|repmgrd_failover_follow|repmgrd_standby_reconnect)$ ]]; then
        psql -h $IP -p $PG_PORT -U repmgr -d repmgr -c "select pg_drop_replication_slot('$SLOT_NAME') from pg_replication_slots where slot_name = '$SLOT_NAME';"
fi

Let’s take a look at two interesting things.
#1. Following classic barman backup scenario, you need to create two replication slots – one on master and one on standby. In our case we should drop slot on each ex-master and create one on a new master after promotion. If we keep replication slot on ex-master server, it will grow infinitely – you will just get out of db server space fast, thanks to WAL files waiting for streaming.
#2. Before opening server for app connections using repmgr_iptables.sh script, we forcefully perform WAL switching and archieving old WAL. It resets barman and guaranteed barman not miss any WALs before independently reset.

Add execute permissions to all files.

chmod +x /usr/local/bin/repmgr_iptables_chains.sh
chmod +x /usr/local/bin/repmgr_iptables.sh
chmod +x /usr/local/bin/repmgr_handler.sh

Let repmgr daemons execute barman and helper script without a password.
/etc/sudoers.d/pg_sudoers

postgres ALL = (barman) NOPASSWD: /usr/bin/barman
postgres ALL = NOPASSWD: /usr/local/bin/repmgr_iptables.sh

Add the following line to repmgrd configs.
/etc/repmgrd_db1.conf
/etc/repmgrd_db2.conf

event_notification_command='/usr/local/bin/repmgr_handler.sh %n %e'

Restart daemons.

systemctl restart repmgrd_db1
systemctl restart repmgrd_db2

Open barman and app ports manually.

/usr/local/bin/repmgr_iptables_chains.sh
/usr/local/bin/repmgr_iptables.sh 10.8.1.1 10.8.3.1 5432 54321
/usr/local/bin/repmgr_iptables.sh 10.8.1.1 10.8.3.1 5432 54322 "app"

Add currently master node id to file.

echo 1 > /usr/local/bin/repmgr_master
chown postgres:postgres /usr/local/bin/repmgr_master

Check you can connect to both ports.

su - postgres
psql -U repmgr -d repmgr -h 10.8.3.1 -p 54321
psql -U repmgr -d repmgr -h 10.8.3.1 -p 54322

Barman

Now, barman.
Create barman users on db1.

su - postgres
createuser -s barman
psql -c "alter user barman with password 'ast84kd';"
createuser --replication streaming_barman
psql -c "alter user streaming_barman with password 'nxs62jk';"

Create pgpass for barman user on backup server.

su - barman
echo "*:*:*:barman:ast84kd" > ~/.pgpass
echo "*:*:*:streaming_barman:nxs62jk" >> ~/.pgpass
chmod 600 ~/.pgpass

Add the following lines to beginning of pg_hba.conf on db1 and db2 servers.
/etc/postgresql/11/main/pg_hba.conf

local replication streaming_barman md5
host replication streaming_barman 127.0.0.1/32 md5
host replication streaming_barman 10.8.3.1/32 md5

local postgres barman md5
host postgres barman 127.0.0.1/32 md5
host postgres barman 10.8.3.1/32 md5

Let’s do replication process synchronous.
Add the following lines to postgresql.conf on db1 server.
/etc/postgresql/11/main/postgresql.conf

synchronous_commit = on
synchronous_standby_names = 'FIRST 1 (db2, barman_receive_wal)'

Add the following lines to postgresql.conf on db2 server.
/etc/postgresql/11/main/postgresql.conf

synchronous_commit = on
synchronous_standby_names = 'FIRST 1 (db1, barman_receive_wal)'

By specifying “FIRST 1” parameter, we let server know to use synchronous replication for one server only. Database server has #1 priority.
If one server fall, cluster will continue to work synchronously with backup server.

Restart PostgreSQL on db1 and db2 servers to apply chages.

systemctl restart postgresql

Edit barman global config file on backup server.
/etc/barman.conf

[barman]
barman_user = barman
configuration_files_directory = /etc/barman.d
barman_home = /var/lib/barman
log_file = /var/log/barman/barman.log
log_level = INFO
compression = pygzip ; compress WALs, base backup compress is not implemented by barman
backup_options = concurrent_backup ; use modern 9.6+ concurrent backup
recovery_options = 'get-wal' ; recover, retrieving WALs from barman using get-wal or barman-wal-restore utils, rather than copying WALs directly to pg_wal 
minimum_redundancy = 2 ; keep at least two full backups at any time
retention_policy = RECOVERY WINDOW OF 2 WEEKS ; keep backups for at least two weeks

Create backup config file.
/etc/barman.d/main_backup.conf

[main_backup]
description =  "Main backup"

conninfo = host=10.8.3.1 port=54321 user=barman dbname=postgres
backup_method = postgres

streaming_conninfo = host=10.8.3.1 port=54321 user=streaming_barman
streaming_archiver = on
slot_name = barman
;streaming_archiver_name = barman_receive_wal

Check connection to barman users.

su - barman
psql -c 'SELECT version()' -U barman -h 10.8.1.1 postgres
psql -U streaming_barman -h 10.8.1.1 -c "IDENTIFY_SYSTEM" replication=1

Create two cron jobs.

crontab -e
* * * * * /usr/bin/barman cron
0 2 * * 1 /usr/bin/barman backup main_backup

barman cron – executes every minute, deleting old backups, running WAL archiving, etc.
barman backup main_backup – executes every monday at 2:00 a.m, performs base backup.

To perform streaming backup, barman use replication slots. Create one manually if it not exists. Next, run barman cron manually to start WAL archiving. Force WAL switching. Check barman running well. Do our first two backups to respect minimum redundancy value.

barman receive-wal --create-slot main_backup
barman cron
barman switch-wal --force --archive main_backup
barman check main_backup
barman backup main_backup
barman backup main_backup

(optionally) Tell repmgr it can use barman for cloning. It will help us to recover PostgreSQL instances without affecting master server. Add the following lines to repmgr.conf on db1 and db2 servers.
/etc/repmgr.conf

barman_host=barman@10.8.3.1
barman_server=main_backup
restore_command=/usr/bin/barman-wal-restore -U barman 10.8.3.1 main_backup %f %p

On db1 server run.

su - postgres
ssh-keygen -b 2048 -t rsa -N "" -C "postgres@10.8.1.1"
ssh-copy-id barman@10.8.3.1

On db2 server run.

su - postgres
ssh-keygen -b 2048 -t rsa -N "" -C "postgres@10.8.2.1"
ssh-copy-id barman@10.8.3.1

Now we are ready to test a cluster.

Leave a Comment

Your email address will not be published. Required fields are marked *