AWS Database Blog
Migrate Google Cloud SQL for PostgreSQL to Amazon RDS and Amazon Aurora using pglogical
PostgreSQL has become the preferred open-source relational database for many enterprise developers and startups, powering everything from rapid prototypes to mission-critical applications. Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL-Compatible Edition make it easier to set up, operate, and scale PostgreSQL deployments on the AWS Cloud. With Amazon RDS and Amazon Aurora, you can deploy scalable PostgreSQL databases in minutes with cost-efficient and resizable hardware capacity. RDS for PostgreSQL and Aurora PostgreSQL manage complex and time-consuming administrative tasks such as PostgreSQL software installation and upgrades, storage management, backups, replication for disaster recovery, high availability, and read throughput.
In this post, we provide the steps to migrate a PostgreSQL database from Google Cloud SQL to RDS for PostgreSQL and Aurora PostgreSQL using the pglogical extension. We also demonstrate the connection attributes required to support the database migration. The pglogical extension works with community PostgreSQL version 9.4 and later, and is supported on RDS for PostgreSQL and Aurora PostgreSQL version 12 and later.
Solution overview
The pglogical extension asynchronously replicates only the changes in the data by using logical decoding. This makes replication very efficient, because only the differences are replicated. It is also tolerant of network faults, because replication can resume after a fault. We use the publisher and subscriber model of the pglogical extension for logical streaming replication. This solution works for both RDS for PostgreSQL and Aurora PostgreSQL.
The following diagram illustrates the architecture:
In the following sections, we show how to set up the Cloud SQL for PostgreSQL database configuration, which acts as the primary database.
Configure the primary database
To configure the primary database, complete the following steps:
- Go to the Cloud SQL Instances page on the Google Cloud console.
- Enable access on the primary instance for the IP address of the external replica. For more details, see Configure authorized networks.
- Connect to the Cloud SQL instance using the PostgreSQL command-line client or Cloud Shell.
- Create a PostgreSQL user with the REPLICATION attribute:
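For example (a sketch; the user name repl matches the later steps, and the password is a placeholder):

```sql
-- Placeholder password; use a strong secret in practice.
CREATE USER repl WITH REPLICATION LOGIN PASSWORD 'secret';
```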
Note that REPLICATION is a privileged attribute, and the user granting it must be a database superuser.
- To configure logical replication with the pglogical extension, edit the Cloud SQL instance to add and set the following parameters:
For more details, see Set up logical replication and decoding.
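The flags typically involved look like the following (a sketch; confirm the names and values against the linked Google Cloud documentation):

```
cloudsql.enable_pglogical = on
cloudsql.logical_decoding = on
max_replication_slots = 10
max_wal_senders = 10
max_worker_processes = 10
```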
- Restart the database, then log in again using the PostgreSQL command-line client or Cloud Shell, switch to the repl user, and create the pglogical extension:
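For example:

```sql
CREATE EXTENSION pglogical;
```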
Next, we create a pglogical node, which represents a physical PostgreSQL instance and stores connection details for that instance.
- Create a provider (publisher) node with the following code:
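A sketch of the provider (publisher) node creation; the node name, host, database, and credentials below are placeholders:

```sql
SELECT pglogical.create_node(
    node_name := 'provider',
    dsn := 'host=<cloud-sql-primary-ip> port=5432 dbname=mydb user=repl password=secret'
);
```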
Because the DSN contains credentials, follow appropriate logging practices to minimize the risk of the password being written to the server log.
Note: We used the default port (5432) for this configuration; you can check the port number by running SHOW port; on the instance.
- If you are starting with a new database, create the same database and tables on both the primary and replica instances. For example:
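A minimal sketch, using a hypothetical database mydb and table t_test (the \c command is a psql metacommand):

```sql
CREATE DATABASE mydb;
\c mydb
CREATE TABLE t_test (
    id   integer PRIMARY KEY,
    name varchar(64)
);
```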
Pre-create the tables on the secondary instance. For a large database with many tables, we recommend using pg_dump to export all schemas without data into a script file and importing that script file into the target database.
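For example (endpoint, user, and database names are placeholders):

```
pg_dump --schema-only --no-owner --dbname=mydb --file=schema.sql
psql --host=<rds-endpoint> --username=repl --dbname=mydb --file=schema.sql
```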
Configure the RDS for PostgreSQL instance
To configure the RDS for PostgreSQL instance, complete the following steps:
- Connect to the RDS for PostgreSQL or Aurora PostgreSQL instance, for example using the psql command-line client:
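A sketch; the endpoint is a placeholder and the master user name may differ in your environment:

```
psql --host=<rds-endpoint> --port=5432 --username=postgres --dbname=postgres
```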
- Create a special user with rds_superuser privilege for replication and grant replication privileges:
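A sketch, assuming the same repl user name as on the source; rds_superuser and rds_replication are RDS-provided roles:

```sql
-- Placeholder password; use a strong secret in practice.
CREATE USER repl WITH LOGIN PASSWORD 'secret';
GRANT rds_superuser TO repl;
GRANT rds_replication TO repl;
```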
- If you are starting with a new database, use the repl user to create the same database on both the primary and replica instances. For example:
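Mirroring the earlier example (the tables themselves can be pre-created with pg_dump, as described above):

```sql
CREATE DATABASE mydb;
```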
- Configure your RDS instance to turn on logical replication in the target database by setting the relevant parameter in the database parameter group. For more details, see Working with parameter groups. When using Aurora PostgreSQL, the parameters need to be changed in the DB cluster parameter group; for more details, see Amazon Aurora DB cluster and DB instance parameters.
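The settings involved look like the following (rds.logical_replication enables logical decoding, and shared_preload_libraries must include pglogical):

```
rds.logical_replication = 1
shared_preload_libraries = 'pglogical'
```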
- Reboot the instance after you configure the parameters; they are static parameters, so a reboot is required for them to take effect. For more details, see Rebooting a DB instance and Rebooting a DB instance within an Aurora cluster.
- Create a pglogical extension:
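Connect to the target database as the repl user and run:

```sql
CREATE EXTENSION pglogical;
```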
- Create the provider node by connecting to the Cloud SQL instance and running the following command:
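This is the same create_node call shown earlier; the DSN values are placeholders:

```sql
SELECT pglogical.create_node(
    node_name := 'provider',
    dsn := 'host=<cloud-sql-primary-ip> port=5432 dbname=mydb user=repl password=secret'
);
```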
- Add all tables in the schema to be migrated from Cloud SQL to the RDS for PostgreSQL or Aurora PostgreSQL database to the default replication set by running the following command:
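For example, to add all tables in the public schema to the default replication set:

```sql
SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);
```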
- Create the subscriber node on the Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL database by running the following command:
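A sketch; the DSN points at the RDS or Aurora endpoint, with placeholder values:

```sql
SELECT pglogical.create_node(
    node_name := 'subscriber',
    dsn := 'host=<rds-endpoint> port=5432 dbname=mydb user=repl password=secret'
);
```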
- Create a pglogical subscription on the subscriber node (Amazon RDS PostgreSQL or Amazon Aurora PostgreSQL database) by running the following command:
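A sketch; provider_dsn points back at the Cloud SQL primary, with placeholder values:

```sql
SELECT pglogical.create_subscription(
    subscription_name := 'subscription1',
    provider_dsn := 'host=<cloud-sql-primary-ip> port=5432 dbname=mydb user=repl password=secret'
);
```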
- Check the status of the subscription by running the following command on the subscriber node:
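For example, for the subscription created above:

```sql
SELECT * FROM pglogical.show_subscription_status('subscription1');
```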
If the status appears as replicating, the setup is successful.
- To validate the replication, you can insert data into the Cloud SQL for PostgreSQL database:
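For example, using the hypothetical t_test table from the earlier steps:

```sql
INSERT INTO t_test (id, name) VALUES (1, 'replication check');
```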
- Check whether the records are replicating to the Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL database.
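On the target, a simple query confirms that the row arrived:

```sql
-- t_test is the hypothetical table from the earlier examples.
SELECT * FROM t_test;
```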
Known limitations of pglogical and workarounds
Using pglogical comes with the following limitations:
Sequences:
With pglogical replication, replicating all sequences in the public schema requires adding the sequences to the replication set by running the following command:
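For example:

```sql
SELECT pglogical.replication_set_add_all_sequences('default', ARRAY['public']);
```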
Primary Key:
One limitation of pglogical is that it does not directly support replicating primary key changes. This is due to the unique constraints imposed by primary keys and the potential for conflicts in a distributed environment. When a primary key is changed on the source, it may conflict with the existing primary key values on the target.
A common workaround for this limitation involves using a surrogate key (like a UUID or a separate serial column) as the primary key, while maintaining a unique constraint on the original natural key. This allows you to replicate changes based on the surrogate key, avoiding conflicts. Here's a simplified example:
- Create a Surrogate Key:
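A sketch, assuming the hypothetical t_test table; gen_random_uuid() is built in from PostgreSQL 13 (earlier versions can use the pgcrypto extension):

```sql
-- Add a UUID surrogate key column alongside the existing natural key.
ALTER TABLE t_test ADD COLUMN row_id uuid NOT NULL DEFAULT gen_random_uuid();
```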
- Update Data and Replication Set: Change your replication set to include the new surrogate key column, and perform an initial data update to populate the surrogate key column.
- Replication with Surrogate Key:
Replicate based on the surrogate key, and on the target, maintain the uniqueness constraint on the natural key column.
- On the target database:
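A hypothetical sketch of the target-side constraint, assuming the natural key is the name column of t_test:

```sql
-- Enforce uniqueness of the original natural key on the target.
ALTER TABLE t_test ADD CONSTRAINT t_test_name_unique UNIQUE (name);
```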
By using a surrogate key for replication purposes, you avoid conflicts in primary key values. However, be cautious when implementing such changes, and ensure proper testing in a controlled environment. Also, consider the specific requirements and constraints of your application before adopting this workaround.
Extensions:
Database extensions in PostgreSQL are managed per database and are not replicated as part of pglogical replication. To move the extensions, the ideal workaround is to capture the list of extensions used on the source (the Google Cloud SQL for PostgreSQL instance) and create those extensions manually on the RDS for PostgreSQL database. Check the list of extensions supported by Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL.
To identify all installed extensions in Cloud SQL for PostgreSQL, run the following command:
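For example, by querying the pg_extension catalog:

```sql
SELECT extname, extversion FROM pg_extension;
```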
Materialized Views:
pglogical only replicates tables, not materialized views, and there is no support for refreshing the views. The workaround is to replicate the base tables, then create and refresh the materialized views separately on the subscriber. Consider using triggers or event-based scheduling to refresh the materialized views based on changes in the replicated tables. Alternatively, if you're using materialized views only for performance, consider using regular views and applying caching at the application or query level instead.
Schema Changes (DDLs):
DDLs are not replicated automatically. The recommendation is to avoid DDL during migration, which limits the number of variables you must consider. However, if DDL is completely unavoidable, you can do the following:
Use the pglogical.replicate_ddl_command function to run DDL on the source database. The DDL statement gets queued and replicated to the target database. The user who runs pglogical.replicate_ddl_command must have the same username on both the source and the target database, and must either have superuser privileges or be the owner of the table being migrated.
Example: To replicate DDL:
- Run the following command on the source database to create the table public.t_test5:
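A sketch; the column list is illustrative:

```sql
SELECT pglogical.replicate_ddl_command(
    'CREATE TABLE public.t_test5 (id integer PRIMARY KEY, name varchar(64))'
);
```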
- Confirm that the DDL statement has been queued on the source database by running:
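The queued DDL is visible in the pglogical.queue table:

```sql
SELECT * FROM pglogical.queue;
```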
- Add the newly created table to the replication set so that its data is replicated to the target database:
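For example:

```sql
SELECT pglogical.replication_set_add_table('default', 'public.t_test5');
```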
Large Objects:
Currently, PostgreSQL's logical decoding does not support decoding large objects, so pglogical cannot replicate them. We therefore recommend converting large objects to bytea columns and replicating them via pglogical, or storing the binary data in Amazon S3.
Conclusion
This post discussed the setup of pglogical for a database migration from Google Cloud SQL to Amazon RDS for PostgreSQL and Aurora PostgreSQL. Using pglogical, you can also easily replicate from lower PostgreSQL versions to higher PostgreSQL versions.
If you have any questions or comments, post your thoughts in the comments section.
About the authors
Sarabjeet Singh is a Sr. Database Specialist Solutions Architect at AWS. He works with customers to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.
Kranthi Kiran Burada is a Sr. Database Migration Specialist at AWS, specializing in facilitating the transition of clients from commercial databases to open-source solutions like PostgreSQL. With a keen focus on PostgreSQL, he actively engages in performance optimization, database design, troubleshooting, and imparting best practices during migrations from Oracle and SQL Server to PostgreSQL.
Jerome Darko is a Solutions Architect at AWS on the Database Migration Accelerator team. He is passionate about database technologies and about accelerating customers' database and analytics workload migrations to AWS to derive the required value for their business.