AWS Database Blog
Replicate spatial data using AWS DMS and Amazon RDS for PostgreSQL
In this post, we show you how to migrate spatial (geospatial) data from self-managed PostgreSQL, Amazon RDS for PostgreSQL, or Amazon Aurora PostgreSQL-Compatible Edition to Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition using AWS Database Migration Service (AWS DMS). Spatial data is useful for applications such as mapping, routing, asset tracking, and geographic visualization. We walk through setting up your environment, configuring AWS DMS, and validating the successful migration of spatial datasets.
Geospatial data represents physical network paths, building locations, and elements that cross space. It includes critical data that helps identify geographic locations and enables insight generation for various applications. This type of data can be represented using a 2D coordinate system with latitude and longitude coordinates. The data can take multiple forms including:
- Points (such as GPS coordinates)
- Lines (such as roads or rivers)
- Polygons (such as borders and land use zones)
- Collections of these features
AWS DMS treats these columns as large objects (LOBs) rather than native spatial types. Without configuring LOB handling and enabling PostGIS support on the target database, replication tasks may fail or lead to incomplete data migration.
Amazon RDS for PostgreSQL spatial data
PostgreSQL databases can use the PostGIS extension to handle geospatial data types and functions effectively. This extension enables storage and manipulation of geometry and geography spatial types while providing specialized spatial functions. When working with spatial data, PostGIS allows operations such as ST_Within for performing radius searches and distance calculations between geometries. The extension supports various geometric objects and can perform complex spatial operations such as checking if a polygon is contained within another polygon using simple SQL function such as ST_Contains since it’s a PostGIS spatial SQL function.
Amazon Relational Database Service (Amazon RDS) for PostgreSQL provides full integration with PostGIS functionality, making it suitable for applications requiring geospatial operations.
Solution overview
This solution uses AWS DMS to replicate geospatial data managed by the PostGIS extension. The source and target databases are both Amazon RDS for PostgreSQL. While the source and target can be different PostgreSQL versions, ensure that the PostGIS extension versions are compatible between two databases. AWS DMS supports cross-version PostgreSQL migrations.
Prerequisites
Before setting up the PostGIS extension for RDS for Postgres in both source and target, make sure you have the following:
- AWS setup:
- AWS account with permissions to create and manage AWS DMS resources
- Amazon RDS for PostgreSQL instances as source and target. For testing purposes, you can use a single RDS instance and replicate data between different databases within that instance to reduce costs.
- AWS DMS replication instance, endpoints, and AWS DMS task created
- Database configuration:
- PostGIS installed on both source and target databases
- To install PostGIS, connect to each database using your SQL client of preference, and enter the following command:
Note: Without the PostGIS extension on the target database, the AWS DMS task will fail with errors such as “type geometry does not exist.”
- Tables must have primary key or unique key constraints
Note: Geometry columns must be nullable on target tables for LOB handling.
Note: AWS DMS will remove LOB columns if tables don’t have a primary key or unique constraint. This is critical for capturing LOB changes during CDC. AWS DMS requires geometry columns to be nullable on target tables for LOB handling. If your source table has NOT NULL constraints on geometry columns, migration will fail. You can reapply NOT NULL constraints after migration completes.
- Additional considerations:
- Review Managing spatial data with the PostGIS extension in the Amazon RDS documentation
- Evaluate which additional PostGIS extensions might be needed for your use case
- Ensure the database instance has sufficient resources for PostGIS operations. Monitor key Amazon CloudWatch metrics: CPUUtilization (<60%), FreeableMemory (>25% available), ReadIOPS/WriteIOPS, and FreeStorageSpace.
Best practices when configuring PostGIS extension
Consider the following best practices when configuring PostGIS:
- Validate PostgreSQL and PostGIS version compatibility. To validate the extension was created, use the following command:
- Evaluate additional PostGIS extensions based on workload requirements.
- Monitor resource usage when executing complex spatial queries.
- Plan appropriate backup strategies for spatially enabled databases.
Implementation steps
Step 1: Begin by creating a spatial table in the source Amazon RDS for PostgreSQL database. To define spatial tables and load sample data, complete the following steps. The following example includes a geometry column defined using the 3857 spatial reference system (SRID):
- Create the sample table in your source database. Using the sample DDL provided in the prerequisites section (2c), create the table in your source Amazon RDS for PostgreSQL database.
- Insert the sample data in the
province_boundarytable by using the following query: - Enter the following query to measure the geometry size (in bytes):
Step 2: Prepare Target Database and Configure AWS DMS
- Before starting the AWS DMS task, connect to your target Amazon RDS for PostgreSQL database and create the PostGIS extension:
Note: AWS DMS creates the target table automatically if it doesn’t exist, regardless of the table preparation mode selected. However, the PostGIS extension must be installed beforehand.
- Configure the AWS DMS replication task Create a new AWS DMS task with the following configuration
- Migration type: Full load and CDC
- Target table preparation mode: Do nothing
- LOB mode: Limited LOB mode
- Maximum LOB size: 2048 KB
Note: “The maximum LOB size for geospatial data cannot exceed 980,000 KB.” - To determine the appropriate LOB size for your spatial data, use the queries provided in the AWS DMS documentation for determining LOB column sizes.
- Turn on CloudWatch logs: Essential for monitoring migration progress, troubleshooting issues, and validating successful data transfer.
- Enable log context for detailed monitoring
- Add selection rules to include your spatial tables (e.g., schema: public, table: province_boundary)
- After configuring all the settings, start the AWS DMS task to begin the migration.
- Monitor the migration process
Once the task starts, monitor the CloudWatch logs to verify successful data transfer. You should see log entries similar to the following:
Common errors that we encounter during the replication
During spatial data migration, you may encounter errors related to LOB handling or PostGIS data types. The following example shows a typical error when AWS DMS fails to process geometry data
Upon reviewing the database logs, an ‘invalid input json’ error was identified.
Generate test data on the source database
To test the spatial data migration with a larger dataset, use the following procedure to generate synthetic spatial data:
- Connect to your source Amazon RDS for PostgreSQL database using your preferred SQL client (psql, pgAdmin, and more)
- Run the following script to generate test data. You can change the value of `v_rows` to control how many rows are generated (the current script generates 10 rows)
- Verify the data was created successfully by checking the row count
Post-migration validation
To validate your data after migration, follow these steps:
- Compare row counts between source and target tables.
- Verify that geometry data types are preserved correctly.
Run the following query on both source and target databases to compare geometry columnsEnsure the SRID and geometry type match between source and target.
- Rebuild spatial indexes as needed.
- Validate the functionality of PostGIS spatial functions (for example,
ST_Within,ST_Contains). - Reapply NOT NULL constraints to geometry columns if they were temporarily made nullable.
Consider the following best practices for your migration:
- Implement table parallelism for efficient data loading.
- Monitor task progress through CloudWatch logs.
Clean up
To avoid ongoing charges, clean up all resources after migration including AWS DMS tasks, the AWS DMS replication instance, endpoints, and databases. For information about deleting AWS DMS tasks, refer to delete-replication-task. To delete the replication instance, refer to Deleting a replication instance. If you created test databases specifically for this migration and no longer need them, delete the Amazon RDS for PostgreSQL instances.
Conclusion
In this post, we walked through how to successfully migrate spatial data between Amazon RDS for PostgreSQL instances using AWS Database Migration Service. We covered the critical configuration steps for handling geospatial data as LOBs, including PostGIS extension setup, primary key requirements, LOB mode configuration, and CloudWatch monitoring. By following these guidelines ensuring tables have primary keys, making geometry columns nullable, and properly sizing LOB parameters you can avoid common pitfalls and achieve reliable replication of spatial datasets. Understanding these requirements is essential because spatial data presents unique challenges in database migration, requiring careful attention to both AWS DMS configuration and PostGIS-specific considerations.
We encourage you to try this solution in your own environment, starting with a test instance to validate the configuration before migrating production spatial data. If you have questions or insights from your spatial data migrations, please share them in the comments below.