Part 1. Introduction.
Part 2. Replication. Repmgr.
Part 3. Auto-failover. Repmgr daemons.
Part 4. HA. Backup. Repmgr daemons. Barman.
Part 5. Testing. Ansible. Final thoughts.
At this part, we will:
- make PostgreSQL high available using repmgr daemons
- use barman for streaming backup
- combine HA feature and barman to store master and standby backups in the same barman place and save disk space
By default, barman can stream data from master and standby at the same time, storing backups at separate places. It’s the only option for an asynchronous replication. If master server will fall, your standby server will be inconsistent.
In case of synhronous replication, we can continue retrieving WAL from new master(ex-standby). It means, we can store backups in the same place saving disk space. To achieve this in transparent way, we will make our cluster high available.
Repmgr daemons
First, let’s turn our DB cluster to high available one. We will use repmgr events to hook promotion and following. Iptables rules will help us to redirect barman and app server traffic from ports 54321 and 54322 to a master server.
Create a script with custom iptables chains on backup server./usr/local/bin/repmgr_iptables_chains.sh
#!/bin/bash #!/bin/bash /sbin/sysctl -w net.ipv4.ip_forward=1 /sbin/iptables -t nat -N repmgr_dnat /sbin/iptables -t nat -A OUTPUT -j repmgr_dnat /sbin/iptables -t nat -A PREROUTING -j repmgr_dnat /sbin/iptables -t nat -N repmgr_snat /sbin/iptables -t nat -A POSTROUTING -j repmgr_snat if [[ -f /usr/local/bin/repmgr_master ]]; then master=$(cat /usr/local/bin/repmgr_master) /usr/bin/sudo -u postgres /usr/local/bin/repmgr_handler.sh $master standby_promote fi
And ensure it will be executed at every boot.
crontab -e @reboot /usr/local/bin/repmgr_iptables_chains.sh
Create failover helper script. Repmgrd will execute it at every promote event and make our cluster high available./usr/local/bin/repmgr_iptables.sh
#!/bin/bash readonly DB="$1" readonly BACKUP="$2" readonly PG_PORT="$3" readonly IN_PORT="$4" if [ -z "$5" ]; then iptables -t nat --flush repmgr_dnat iptables -t nat --flush repmgr_snat iptables -t nat -A repmgr_snat -p tcp -d $DB --dport $PG_PORT -j SNAT --to-source $BACKUP fi iptables -t nat -A repmgr_dnat -p tcp -d $BACKUP --dport $IN_PORT -j DNAT --to-destination $DB:$PG_PORT
Create repmgrd event handler script./usr/local/bin/repmgr_handler.sh
#!/bin/bash readonly DB1="10.8.1.1" readonly DB2="10.8.2.1" readonly BACKUP="10.8.3.1" readonly SLOT_NAME="barman" readonly BARMAN_NAME="main_backup" readonly PG_PORT="5432" readonly BARMAN_PORT="54321" readonly MASTER_PORT="54322" readonly NODE_ID="$1" EVENT="$2" if [[ "$NODE_ID" == "1" ]]; then IP="$DB1" else IP="$DB2" fi if [[ "$EVENT" =~ ^(primary_register|standby_promote|repmgrd_failover_promote)$ ]]; then sudo /usr/local/bin/repmgr_iptables.sh $IP $BACKUP $PG_PORT $BARMAN_PORT echo "$NODE_ID" > /usr/local/bin/repmgr_master psql -h $IP -p $PG_PORT -U repmgr -d repmgr -c "select pg_create_physical_replication_slot('$SLOT_NAME') where not exists(select 1 from pg_replication_slots where slot_name = '$SLOT_NAME');" sudo -u barman barman receive-wal --stop $BARMAN_NAME 2>/dev/null sudo -u barman barman switch-wal --force --archive --archive-timeout=90 $BARMAN_NAME sudo /usr/local/bin/repmgr_iptables.sh $IP $BACKUP $PG_PORT $MASTER_PORT "app" fi if [[ "$EVENT" =~ ^(standby_register|standby_follow|repmgrd_failover_follow|repmgrd_standby_reconnect)$ ]]; then psql -h $IP -p $PG_PORT -U repmgr -d repmgr -c "select pg_drop_replication_slot('$SLOT_NAME') from pg_replication_slots where slot_name = '$SLOT_NAME';" fi
Let’s take a look at two interesting things.
#1. Following classic barman backup scenario, you need to create two replication slots – one on master and one on standby. In our case we should drop slot on each ex-master and create one on a new master after promotion. If we keep replication slot on ex-master server, it will grow infinitely – you will just get out of db server space fast, thanks to WAL files waiting for streaming.
#2. Before opening server for app connections using repmgr_iptables.sh script, we forcefully perform WAL switching and archieving old WAL. It resets barman and guaranteed barman not miss any WALs before independently reset.
Add execute permissions to all files.
chmod +x /usr/local/bin/repmgr_iptables_chains.sh chmod +x /usr/local/bin/repmgr_iptables.sh chmod +x /usr/local/bin/repmgr_handler.sh
Let repmgr daemons execute barman and helper script without a password./etc/sudoers.d/pg_sudoers
postgres ALL = (barman) NOPASSWD: /usr/bin/barman postgres ALL = NOPASSWD: /usr/local/bin/repmgr_iptables.sh
Add the following line to repmgrd configs./etc/repmgrd_db1.conf
/etc/repmgrd_db2.conf
event_notification_command='/usr/local/bin/repmgr_handler.sh %n %e'
Restart daemons.
systemctl restart repmgrd_db1 systemctl restart repmgrd_db2
Open barman and app ports manually.
/usr/local/bin/repmgr_iptables_chains.sh /usr/local/bin/repmgr_iptables.sh 10.8.1.1 10.8.3.1 5432 54321 /usr/local/bin/repmgr_iptables.sh 10.8.1.1 10.8.3.1 5432 54322 "app"
Add currently master node id to file.
echo 1 > /usr/local/bin/repmgr_master chown postgres:postgres /usr/local/bin/repmgr_master
Check you can connect to both ports.
su - postgres psql -U repmgr -d repmgr -h 10.8.3.1 -p 54321 psql -U repmgr -d repmgr -h 10.8.3.1 -p 54322
Barman
Now, barman.
Create barman users on db1.
su - postgres createuser -s barman psql -c "alter user barman with password 'ast84kd';" createuser --replication streaming_barman psql -c "alter user streaming_barman with password 'nxs62jk';"
Create pgpass for barman user on backup server.
su - barman echo "*:*:*:barman:ast84kd" > ~/.pgpass echo "*:*:*:streaming_barman:nxs62jk" >> ~/.pgpass chmod 600 ~/.pgpass
Add the following lines to beginning of pg_hba.conf on db1 and db2 servers./etc/postgresql/11/main/pg_hba.conf
local replication streaming_barman md5 host replication streaming_barman 127.0.0.1/32 md5 host replication streaming_barman 10.8.3.1/32 md5 local postgres barman md5 host postgres barman 127.0.0.1/32 md5 host postgres barman 10.8.3.1/32 md5
Let’s do replication process synchronous.
Add the following lines to postgresql.conf on db1 server./etc/postgresql/11/main/postgresql.conf
synchronous_commit = on synchronous_standby_names = 'FIRST 1 (db2, barman_receive_wal)'
Add the following lines to postgresql.conf on db2 server./etc/postgresql/11/main/postgresql.conf
synchronous_commit = on synchronous_standby_names = 'FIRST 1 (db1, barman_receive_wal)'
By specifying “FIRST 1” parameter, we let server know to use synchronous replication for one server only. Database server has #1 priority.
If one server fall, cluster will continue to work synchronously with backup server.
Restart PostgreSQL on db1 and db2 servers to apply chages.
systemctl restart postgresql
Edit barman global config file on backup server./etc/barman.conf
[barman] barman_user = barman configuration_files_directory = /etc/barman.d barman_home = /var/lib/barman log_file = /var/log/barman/barman.log log_level = INFO compression = pygzip ; compress WALs, base backup compress is not implemented by barman backup_options = concurrent_backup ; use modern 9.6+ concurrent backup recovery_options = 'get-wal' ; recover, retrieving WALs from barman using get-wal or barman-wal-restore utils, rather than copying WALs directly to pg_wal minimum_redundancy = 2 ; keep at least two full backups at any time retention_policy = RECOVERY WINDOW OF 2 WEEKS ; keep backups for at least two weeks
Create backup config file./etc/barman.d/main_backup.conf
[main_backup] description = "Main backup" conninfo = host=10.8.3.1 port=54321 user=barman dbname=postgres backup_method = postgres streaming_conninfo = host=10.8.3.1 port=54321 user=streaming_barman streaming_archiver = on slot_name = barman ;streaming_archiver_name = barman_receive_wal
Check connection to barman users.
su - barman psql -c 'SELECT version()' -U barman -h 10.8.1.1 postgres psql -U streaming_barman -h 10.8.1.1 -c "IDENTIFY_SYSTEM" replication=1
Create two cron jobs.
crontab -e * * * * * /usr/bin/barman cron 0 2 * * 1 /usr/bin/barman backup main_backup
barman cron – executes every minute, deleting old backups, running WAL archiving, etc.
barman backup main_backup – executes every monday at 2:00 a.m, performs base backup.
To perform streaming backup, barman use replication slots. Create one manually if it not exists. Next, run barman cron manually to start WAL archiving. Force WAL switching. Check barman running well. Do our first two backups to respect minimum redundancy value.
barman receive-wal --create-slot main_backup barman cron barman switch-wal --force --archive main_backup barman check main_backup barman backup main_backup barman backup main_backup
(optionally) Tell repmgr it can use barman for cloning. It will help us to recover PostgreSQL instances without affecting master server. Add the following lines to repmgr.conf on db1 and db2 servers./etc/repmgr.conf
barman_host=barman@10.8.3.1 barman_server=main_backup restore_command=/usr/bin/barman-wal-restore -U barman 10.8.3.1 main_backup %f %p
On db1 server run.
su - postgres ssh-keygen -b 2048 -t rsa -N "" -C "postgres@10.8.1.1" ssh-copy-id barman@10.8.3.1
On db2 server run.
su - postgres ssh-keygen -b 2048 -t rsa -N "" -C "postgres@10.8.2.1" ssh-copy-id barman@10.8.3.1
Now we are ready to test a cluster.