In today's fast-paced digital landscape, downtime is not an option. Every second of downtime can translate into lost revenue, diminished user trust, and damaged reputation. For businesses relying on Aurora PostgreSQL, minimizing downtime is crucial to maintaining seamless operations and ensuring customer satisfaction.
In this blog post, we'll explore how we perfomed a major version upgrade for one of our client with minimal downtime with logical replication and Aurora PostgreSQL copy on write clones. Copy on write clones enabled us to configure the standby with very minimal time as it use the same storage as the primary. You can get more information about the copy on write clones here.
Below are the steps to perform the upgradation
Make sure all the tables on source have PK
It’s requied to have the primary keys for PostgreSQL tables to configure the logical replication from source to destination. Identify tables that are missing primary key using below command and If any of the tables do not have the primary key check the possibility to create them, if the primary key creation is not possible set the replica identify as FULL which takes all the columns in the row to identify the row.
select tab.table_schema,
tab.table_name
from information_schema.tables tab
left join information_schema.table_constraints tco
on tab.table_schema = tco.table_schema
and tab.table_name = tco.table_name
and tco.constraint_type = 'PRIMARY KEY'
where tab.table_type = 'BASE TABLE'
and tab.table_schema not in ('pg_catalog', 'information_schema')
and tco.constraint_name is null
order by table_schema,
table_name;
Check the below parameters in source
Verify that below parameters are enabled on the source
wal_level is set to logical
rds.logical_replication is on
track_commit_timestamp is on
select name,setting from pg_settings where name in (
'wal_level',
'track_commit_timestamp',
'rds.logical_replication'
) ;
Create Instance and Cluster parameter group for latest version (here we updated to 16.2)
Create Instance and cluster parameter groups for the latest version which will be used for latest Aurora PostgreSQL cluster and instance. Make sure you have the parameters like logging , shared preload libraries from the primary server.
Take a backup of source cluster
Take a snapshot of the existing PostgreSQL cluster before upgrade.
Create a publication for source tables
publication is a database object which holds the tables whose changes will be replicated to logical standby server. Create publication for all tables in the database
CREATE PUBLICATION pub_for_major_upgrade FOR ALL TABLES;
Create replication slot
Create replication slot using below command and make a note of the LSN value which will be used to sync DB after upgrade from this point.
SELECT pg_create_logical_replication_slot('major_upgrade_slot', 'pgoutput');
Create Aurora PostgreSQL cluster clone
Create Aurora PostgreSQL clone from the source cluster, once the clone creation is completed, login to the cloned cluster and capture the LSN.
SELECT * FROM aurora_volume_logical_start_lsn();
Drop existing replication slot on cloned DB
As it is a clone of the primary server it has the slot we created in the primary during time of the clone.
select slot_name from pg_replication_slots ;
drop replication slot
select pg_drop_replication_slot('slot_name');
Upgrade cloned db to aurora postgres 16.2
Upgrade the cloned db to aurora postgres latest version, during the time of upgrade choose the latest parameter groups created earlier. You can follow the steps given here.
Create a subscription on new cloned DB
Once the upgradation is completed, create a subscription on new cloned DB. Here we are putting the subscription in the disabled state so that replication will not start.
CREATE SUBSCRIPTION sub_for_major_upgrade
CONNECTION 'host=hostname dbname=db_name port=5432 user=postgres password=xxx'
PUBLICATION pub_for_major_upgrade
WITH ( copy_data = false,
create_slot = false, enabled = false, connect = true,
slot_name = 'major_upgrade_slot' );
Capture replication origin on new db
Replication origins are unique identifiers associated with logical replication streams. Each replication origin represents a specific point in the replication stream from which changes are being consumed by subscribers.
SELECT * FROM pg_replication_origin;
Advance the replication origin
Advance Replication origin to the LSN number captured post creation of the Aurora clone.
SELECT pg_replication_origin_advance(value from pg_replication_origin,'value from aurora_volume_logical_start_ls
Start replication
Enable the subscription on standby which will start the replication
ALTER SUBSCRIPTION sub_for_major_upgrade ENABLE;
Verify replication lag
Login to the primary and verify the logical replication lag.
SELECT now() AS CURRENT_TIME,
slot_name,
active,
active_pid,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),confirmed_flush_lsn)) AS diff_size,
pg_wal_lsn_diff(pg_current_wal_lsn(),confirmed_flush_lsn) AS diff_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';
Stop writes to source (down time starts here)
Once the primary and standby instances are in sync, put the primary instance in read only mode so that no transactions happen except select queries. From here the actual downtime starts.
bynatree=> \c postgres
postgres=> ALTER DATABASE bynatree SET default_transaction_read_only = true;
ALTER DATABASE
Update sequences
Logical replication do not replicate the sequences so the sequences on the destination need to be set with last value of the sequence on which the sequences are being used.
Save the below command in a file “reset.sql“ which generate the commands in update_seq.sql file which need to be executed on the destination to set the last value of the sequence.
SELECT $$select setval('$$ || quote_ident(schemaname)||$$.$$|| quote_ident(sequencename) || $$', $$ || last_value || AS sql FROM pg_sequences;
psql -Atq -h <source _DB_endpoingt> -d < DB_NAME> -U postgres -f reset.sql -o update_seq.sql
Execute update_seq .sql on New Target database to update all the sequences with latest value
psql -h <target _DB_endpoingt> -d <DB_NAME> -U postgres -f update_seq.sql
Cutover
Stop the writes to the new database cluster, drop the subscription and rename the clusters/instances.
bynatree=> \c postgres
postgres=> ALTER DATABASE bynatree SET default_transaction_read_only = true;
ALTER DATABASE
# drop the subscription
ALTER SUBSCRIPTION sub_for_major_upgrade DISABLE;
DROP SUBSCRIPTION sub_for_major_upgrade ;
Rename the clusters/instances
Rename the old cluster/instance to cluster-old/instance-old.
Rename the new cluster with same name as the old cluster.
By this application are not required to change the configuration. This complestes the upgradation to the latest version.
Start writes on upgrade DB
bynatree=> \c postgres
postgres=> ALTER DATABASE bynatree SET default_transaction_read_only = false ;
ALTER DATABASE