How-to: PostgreSQL upgrades on large DBs

TL;DR; Use pglogical to upgrade between major versions of PostgreSQL with minimal downtime when running very large DB installations.

For databases beyond a couple of GB in size, an upgrade between major versions of PostgreSQL would need to use pg_basebackup and pg_upgrade, but at a size of around 1TB, even this means too long of a downtime for a SaaS application. The solution is to use logical asynchronous replication to create a hot-standby running a newer version of PostgreSQL. pglogical is an extension that works well for this, down to PostgreSQL version 9.4.

The upgrade procedure we used consists roughly of the following steps:

Prepare the new DB (primary keys, schema, users)
Install pglogical into the DB
Drop indices on large tables
Synchronise the data
Re-create indices and sync sequences
Switch over

At Infostars we upgraded the ~1TB Postgresql 9.4 database with multiple extensions (PostGIS, audit) to version 13 with a minimal downtime window of < 10 minutes. The production set-up consists of two DB servers (hot standby) with pgpool2 in front of the DB.

Preparation

As a preparation, we set-up a second set of master/standby servers and provisioned them using ansible. We use ansible to set-up the docker host and copy over a docker-compose set-up, which contains the backend, cron jobs and HTTP servers for the frontend.

In order to be able to use pglogical, it’s important that every table has a primary key constraint. The following query shows tables which are missing the primary key:

sudo -u postgres psql $DBNAME -c "SELECT
  n.nspname as schema,
  c.relname as table
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
AND NOT EXISTS (
  SELECT 1 FROM pg_constraint con
  WHERE con.conrelid = c.oid AND con.contype = 'p'
)
AND n.nspname <> ALL (
  ARRAY ['pg_catalog', 'sys', 'dbo', 'information_schema']
);"

We had to fix the schema a little, as there were some old tables left over from the days we used hibernate auto update, instead of liquibase.

In order to connect the new to the old servers, we use an OpenVPN tunnel. The servers have the following internal IPs that you’ll find in the scripts below:

10.56.34.2  New master DB server
10.56.34.1  Old master DB server

In order to make the migration a little easier, I set-up password-less SSH access from the new server to the old one. I also use a few environment variables in the scripts below, to make it more readable:

DBNAME=YOURDBNAME
DBUSER=YOURREGULARAPPDBUSER
SSHOLD='ssh root@10.56.34.1'
PSQLOLD="ssh root@10.56.34.1 sudo -u postgres psql"
PGLPASS="YOURSECRETPASSWORDFORPGLOGICAL"

We have all the data in a single database and use two namespaces (public and audit). We do use different database users to connect to the DB.

Copying the DB schema

Export the schema and DB users to files:

$ $SSHOLD sudo -u postgres pg_dump -Fc -C -s $DBNAME > $DBNAME_schema.dmp
$ $SSHOLD sudo -u postgres pg_dumpall -g > globals.sql

DBNAME_schema.dmp now contains a PostgreSQL binary dump of only the DB schema and globals.sql a text dump of the DB users. Now make sure your new DB postgresql cluster is up and running. In our case, we only started the master DB and kept the hot-standby off until after the initial synchronisation run.

$ sudo -u postgres psql < globals.sql
$ sudo -u postgres psql -c "CREATE DATABASE $DBNAME WITH TEMPLATE = template0 ENCODING = 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';"
$ sudo -u postgres pg_restore -F c -d $DBNAME < $DBNAME_schema.dmp

Set-up pglogical

Make sure the pglogical packages are installed on both the old and the new server. Then we set-up a config file with the required parameters for pglogical, as mentioned in the quick set-up.

$ PGDATA=/etc/postgresql/9.4/main/
$ echo "include 'pglogical.conf'" | $SSHOLD "tee -a $PGDATA/postgresql.conf";
$ echo -e "wal_level = 'logical'\nmax_worker_processes = 10\nmax_replication_slots = 10\nmax_wal_senders = 10\nshared_preload_libraries = 'pglogical'" | $SSOLD "tee $PGDATA/pglogical.conf"
$ $SSHOLD "sudo systemctl restart postgresql@9.4-main"

$ PGDATA=/etc/postgresql/13/main/
echo "include 'pglogical.conf'" >> $PGDATA/postgresql.conf; echo -e "wal_level = 'logical'\nmax_worker_processes = 10\nmax_replication_slots = 10\nmax_wal_senders = 10\nshared_preload_libraries = 'pglogical'" >> $PGDATA/pglogical.conf
$ sudo systemctl restart postgresql@13-main

We need to add the pglogical extension to the DB we want to replicate:

$ $PSQLOLD $DBNAME -c '"CREATE EXTENSION pglogical_origin;"' # Only for postgresql 9.4
$ $PSQLOLD $DBNAME -c '"CREATE EXTENSION pglogical;"'
sudo -u postgres psql $DBNAME -c "CREATE EXTENSION pglogical;"

And we need to grant permissions to allow schema changes through liquibase:

$ $PSQLOLD $DBNAME -c "\"GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA pglogical TO $DBUSER;\""
$ $PSQLOLD $DBNAME -c "\"GRANT USAGE ON SCHEMA pglogical TO $DBUSER;\""
$ $PSQLOLD -c "\"CREATE ROLE pglogical LOGIN REPLICATION SUPERUSER ENCRYPTED PASSWORD '$PGLPASS';\""
$ sudo -u postgres psql -c "CREATE ROLE pglogical LOGIN REPLICATION SUPERUSER ENCRYPTED PASSWORD '$PGLPASS';"

And the pglogical extension needs to have access to the DB. ATTENTION: Replace 10.56.24.0/24 with actual IP range of the peer and make sure the subscriber can access their own DB through the IP as well.

echo -e "# TODO Added for DB upgrade using pglogical remove after done\nlocal\treplication\tpglogical\ttrust\nhost\treplication\tpglogical\t10.56.24.0/24\tmd5\nlocal\t$DBNAME\tpglogical\ttrust\nhost\t$DBNAME\tpglogical\t10.56.24.0/24\tmd5" | $SSHOLD tee -a /etc/postgresql/9.4/main/pg_hba.conf > /dev/null
echo -e "# TODO Added for DB upgrade using pglogical remove after done\nlocal\treplication\tpglogical\ttrust\nhost\treplication\tpglogical\t10.56.24.0/24\tmd5\nlocal\t$DBNAME\tpglogical\ttrust\nhost\t$DBNAME\tpglogical\t10.56.24.0/24\tmd5" | tee -a /etc/postgresql/13/main/pg_hba.conf > /dev/null

Now reload PostgreSQL to apply the access changes. ATTENTION: Make sure you have log_min_messages in postgresql.conf (or in a conf.d file) set to ‘info’ as a minimum, or you won’t see why initial synchronisation fails later on.

$ $SSHOLD systemctl reload postgresql@9.4-main.service
$ systemctl reload postgresql@13-main.service

Optional: Traffic shaping

As we are using the same network interface for the OpenVPN connection that’s also connected to the WAN (i.e. clients using the server), I enabled traffic shaping using wondershaper:

$ $SSHOLD apt install wondershaper
$ $SSHOLD wondershaper -a tun2 -u 215040 -d 215040

tun2 is the OpenVPN network interface and I limited the bandwith to 210MBit/s (210 * 1024).

Dropping indices on the new DB

As pglogical is using the SQL COPY command to copy over data, the new DB would build the indices up while the data is being replicated. For large tables that would take ages and delay the initial copy phase. It’s way for efficient to drop the indices on the new DB before the initial replication and build them concurrently afterwards.

$ sudo -u postgres pg_dump -s -t LARGE_TABLE_NAME $DBNAME > LARGE_TABLE_schema.sql
$ grep INDEX LARGE_TABLE_schema.sql | sed 's/^.*INDEX \([^ ]*\).*/DROP INDEX \1;/' | sudo -u postgres psql $DBNAME

Creating the replication set

pglogical supports several replications sets, but we’ll just use the default as it’s meant for INSERT, UPDATE, DELETE and therefore fits our purpose. The pglogical extension needs to have one provider node and one subscriber node created, on the provider/subscriber respectively. The node as well as the subscription (refers to a replication set) need a postgresql DSN (how to connect) string as a parameter.

We are using two namespaces: public and audit (from the audit trigger) and need to exclude the table spatial_ref_sys from the PostGIS extension, as the table is automatically filled when the PostGIS extension is created (while we restored the schema to the new DB above). So you’ll have to adapt the commands here a little (also the IPs)

$ $PSQLOLD $DBNAME -c "\"SELECT pglogical.create_node(node_name := 'provider', dsn := 'host=10.56.34.1 port=5432 dbname=$DBNAME user=pglogical');\""
$ $PSQLOLD $DBNAME -c "\"SELECT pglogical.replication_set_add_all_tables('default', '{public}'::text[]);\""
$ $PSQLOLD $DBNAME -c "\"SELECT pglogical.replication_set_add_all_tables('default', '{audit}'::text[]);\""
$ $PSQLOLD $DBNAME -c "\"SELECT pglogical.replication_set_add_all_sequences(set_name := 'default', schema_names := '{public}'::text[], synchronize_data := true )\""
$ $PSQLOLD $DBNAME -c "\"SELECT pglogical.replication_set_add_all_sequences(set_name := 'default', schema_names := '{audit}'::text[], synchronize_data := true )\""
$ $PSQLOLD $DBNAME -c "\"SELECT pglogical.replication_set_remove_table(set_name := 'default', relation := 'spatial_ref_sys')\""

If you made a mistake, you can remove all tables / sequences in a replication set like this:

SELECT pglogical.replication_set_remove_table('default', set_reloid) FROM pglogical.replication_set_table;
SELECT pglogical.replication_set_remove_sequence('default', set_seqoid) FROM pglogical.replication_set_seq;

Start the replication

Now create the subscriber node and the actual subscription (starts the replication):

$ sudo -u postgres psql $DBNAME -c "SELECT pglogical.create_node(node_name := 'subscriber', dsn := 'host=10.56.34.2 port=5432 dbname=$DBNAME user=pglogical password=$PGLPASS');"
$ sudo -u postgres psql $DBNAME -c "SELECT pglogical.create_subscription(subscription_name := 'subscription', provider_dsn := 'host=10.56.34.1 port=5432 dbname=$DBNAME user=pglogical password=$PGLPASS', replication_sets := '{default}'::text[] );"

As you can see the connection string (DSN) for the node refers to the new primary DB host, whereas the connection string for the subscriptions refers to the old primary DB host.

Now look at the logs on the new primary. As mentioned above, make sure your log_min_messages parameter in postgresql.conf (or in a conf.d file) is set to ‘info’ as a minimum, or you won’t see why initial synchronisation fails. You can ignore messages like WARNING: snapshot 0x55a9335acdc8 still active.

tail -f -n400 /var/log/postgresql/postgresql-13-main.log | grep -v -e 'WARNING:  snapshot' -e 'apply COMMIT in commit'

If you like you can ask postgresql to show the status of the WAL replication (on the provider / old primary) and wait for initial synchronisation to complete (on the subscriber / new primary):

$ $PSQLOLD -c '"SELECT * from pg_stat_replication;"'
$ sudo -u postgres psql $DBNAME -c "SELECT pglogical.wait_for_subscription_sync_complete('subscription');"

Once the query returns, you should have the pglogical replication in replication state (2nd return column):

$ sudo -u postgres psql $DBNAME -c "SELECT pglogical.show_subscription_status('subscription')"

You can check whether you got all data from large tables by getting the largest ID and then doing the same query on both machines. This query uses a primary key index to be reasonably fast, a regular count(id) would do a full table scan.

BEGIN TRANSACTION; SELECT count(*) FROM (SELECT DISTINCT id FROM YOUR_LARGE_TABLE WHERE id < YOUR_MAX_ID) t; ROLLBACK;

Please be-ware that the replication sequences is a bit special. I handled them manually. See below.

In order to show subscripts, replication sets, and nodes just run a select on the respective tables:

SELECT * FROM pglogical.subscription;
SELECT * FROM pglogical.replication_set;
SELECT * FROM pglogical.node;

Re-create indices

Now re-create all indices that you’ve dropped before, concurrently on the subscriber:

$ grep INDEX LARGE_TABLE_schema.sql | sed 's/INDEX /INDEX CONCURRENTLY /' | sudo -u postgres psql $DBNAME

and wait for all indices to become active. You can check with \d LARGE_TABLE which will show the index as INVALID while it’s still being built. See also PostgreSQL CREATE INDEX.

During this time, you can ignore the following errors:
ERROR: could not read block 0 in file “base/16387/49241”: read only 0 of 8192 bytes
LOG: apply worker [1210644] at slot 1 generation 8 exiting with error

Switching over

Before we did the final switching over, we activated our new hot standby server. We’ve set-up pgpool2 in a way that this is a single command. See pgpool-II Online recovery for instructions.

The procedure of switching over is highly application specific. In our case it consisted of:

Disabling writes on the old primary application server
Syncing sequences to the new primary DB server
Stopping pglogical and shutting down the old DB server
Bringing up the app backend stack on the new primary server
Pointing our floating IP to the new primary server

One weak spot with pglogical are sequences. Contrary to the docs, they are never automatically synced in my set-up. I’ve been able to get pglogical.synchronize_sequence to work to sync them manually, but it adds 1000 to the last value, as documented. This is not what I want however, so I synced them manually as a part of switching the production system to the new DB using the following commands. Please make sure you do this after you disabled writing to the old primary DB and before you enable writing to the new primary DB.

# ATTENTION: Only includes the 'public' namespace, use e.g '\ds audit.*' to list sequences in other namespaces
$ sudo -u postgres psql $DBNAME -t -A -c "\ds" | awk -F '|' '{ print $2 }' | sort > sequences.txt
$ cat sequences.txt | while read seq; do lastval=$(sudo -u postgres psql $DBNAME -t -A -c "SELECT last_value FROM $seq" 2>/dev/null); echo $seq $lastval; done > sequence-vals.txt
# on subscriber
$ cat sequence-vals.txt | while read seq val; do sudo -u postgres psql $DBNAME -c "SELECT setval('$seq', $val);"; done

After disabling writing to the old DB, you stop the pglogical replication and then shut down the old primary DB just to be safe:

$ sudo -u postgres psql $DBNAME -c "SELECT pglogical.drop_subscription('subscription');"
$ sudo -u postgres psql $DBNAME -c "SELECT pglogical.drop_node(node_name := 'subscriber');"
$ $SSHOLD systemctl stop postgresql@9.4-main

For our own migration, at this point we would bring up pgpool2, all the docker services to re-enable our backend and point our floating IP to the new server.

$ echo DONE

Flog

Just another WordPress weblog