AWS Database Blog
Data consistency with AWS DMS data resync
In this post, we take a deep dive into AWS Database Migration Service (AWS DMS) data resync, a feature introduced in AWS DMS version 3.6.1 that detects and resolves data inconsistencies during database migrations without manual intervention. Data resync addresses the inconsistencies that data validation finds between your source and target databases. We discuss the steps to enable data resync and use examples to show how it resolves data inconsistencies.
Before data resync was available, data inconsistencies required user intervention, such as reloading the table in a full load and change data capture (CDC) task or manually updating the records on the target. Data resync is available in all Regions where AWS DMS supports migrations from Oracle or SQL Server to PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition.
AWS DMS data resync configuration
Data resync reads the discrepancies identified by AWS DMS data validation, retrieves the current values from the source, and applies them to the target so that the target record matches the source. For a full load only task, resync, when enabled, runs immediately after all the tables have been validated. For tasks with CDC, resync must be scheduled through the task settings. When the scheduled window starts, the task pauses CDC and validation to minimize write conflicts.
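The read, fetch, and apply flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how AWS DMS is implemented: the source and target are modeled as in-memory dictionaries keyed by primary key, the function name `resync` is hypothetical, and removing a target row that no longer exists on the source is an assumption about the behavior.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def resync(source: Dict[int, Row], target: Dict[int, Row],
           failed_keys: Iterable[int]) -> List[Tuple[int, str]]:
    """Re-read each failed key from the source and apply its current state to the target."""
    actions = []
    for key in failed_keys:
        if key in source:
            # Copy the current source row over whatever the target holds.
            target[key] = dict(source[key])
            actions.append((key, "UPSERT"))
        else:
            # Assumption: a row that is absent on the source is removed from the target.
            target.pop(key, None)
            actions.append((key, "DELETE"))
    return actions


# Key 1 differs, key 2 is missing on the target, key 3 exists only on the target.
source = {1: {"rating": 5}, 2: {"rating": 3}}
target = {1: {"rating": 4}, 3: {"rating": 1}}
actions = resync(source, target, failed_keys=[1, 2, 3])
```

The important design point is that the source is read at resync time rather than at validation time, so the target receives the current value even if the row changed after the mismatch was recorded.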
We recommend that you schedule resync windows during periods of minimal source database activity and for a short duration, as recommended in Best practices. This helps minimize the latency spikes due to CDC being paused.
To configure data resync, you need to enable it while creating or modifying a task. On the AWS DMS console, under Data resync, select Schedule resync, as shown in the following screenshot.
The resync schedule uses a Cron expression to schedule data resync runs:
For example, the following settings schedule the data resync to run on Saturday at midnight:
For more examples, refer to Data resync configuration and examples.
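As a rough illustration of how a schedule for Saturday at midnight reads, the following Python sketch checks a timestamp against the expression `0 0 * * 6`. It assumes standard five-field cron syntax (minute, hour, day of month, month, day of week, with Sunday as 0) and supports only `*`, single numbers, and comma-separated lists. The helper names are hypothetical; confirm the exact syntax and setting names that AWS DMS accepts in the data resync documentation.

```python
from datetime import datetime


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field; supports '*', single numbers, and comma lists only."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if the moment falls on a minute selected by the expression."""
    minute, hour, day, month, weekday = expression.split()
    # Python counts Monday as 0; cron counts Sunday as 0.
    cron_weekday = (moment.weekday() + 1) % 7
    return (
        _field_matches(minute, moment.minute)
        and _field_matches(hour, moment.hour)
        and _field_matches(day, moment.day)
        and _field_matches(month, moment.month)
        and _field_matches(weekday, cron_weekday)
    )


SATURDAY_MIDNIGHT = "0 0 * * 6"  # minute 0, hour 0, any day, any month, Saturday
```

For example, `cron_matches(SATURDAY_MIDNIGHT, datetime(2025, 6, 7, 0, 0))` is true because June 7, 2025 is a Saturday, while the same time on June 8 does not match.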
With data resync, AWS DMS creates an awsdms_validation_failures_v2 table on the PostgreSQL target endpoint with the structure shown in the following screenshot.
AWS DMS uses this table to track and address mismatches on the target tables during the validation process by looking up the data on the source using the primary key. When you upgrade or move a task to AWS DMS version 3.6.1 or later, validation failures that occurred before the upgrade won’t be automatically resynced. To address validation failures from before the upgrade, you need to initiate a table reload or revalidation. New validation failures that occur after the upgrade are tracked and resynced through the awsdms_validation_failures_v2 table.
During a resync operation, AWS DMS completes a sequence of steps that depends on the task type. You can find a message for each step in the CloudWatch logs.
For a full load and CDC task or a CDC only task:
- Trigger resync operation:
- Pause validation:
- Pause CDC:
- Resync tables:
- Resume CDC:
- Resume validation:
For a full load only task, you don’t need to specify a schedule because the resync manager triggers after the validation process is complete:
- Trigger resync operation:
- Resync tables:
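The two step sequences above can be captured in a small lookup, which is handy when you scan task logs for a missing step. This is a sketch only: the step labels are copied from the lists above, the function name `resync_steps` is hypothetical, and the task type strings are assumed to follow the AWS DMS migration type values `full-load`, `cdc`, and `full-load-and-cdc`.

```python
from typing import List

# Steps for tasks that include CDC: validation and CDC pause around the resync.
CDC_STEPS = [
    "Trigger resync operation",
    "Pause validation",
    "Pause CDC",
    "Resync tables",
    "Resume CDC",
    "Resume validation",
]

# Steps for a full load only task: nothing is paused or resumed.
FULL_LOAD_STEPS = ["Trigger resync operation", "Resync tables"]


def resync_steps(migration_type: str) -> List[str]:
    """Return the expected resync step order for a task type."""
    if migration_type == "full-load":
        return list(FULL_LOAD_STEPS)
    if migration_type in ("cdc", "full-load-and-cdc"):
        return list(CDC_STEPS)
    raise ValueError(f"Unknown migration type: {migration_type}")
```

Note that the pause and resume steps mirror each other: validation pauses first and resumes last, so CDC is always paused and resumed inside the validation pause.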
Use cases for AWS DMS data resync
There are several use cases in which AWS DMS data resync is valuable. In this section, we examine two.
Accidental deletion of records on target
The first use case we examine is one in which records on the target have been accidentally deleted. To illustrate this use case, we migrate a table called REVIEWS from Oracle to PostgreSQL. When the full load is complete, we accidentally delete a few records on the target. In the following instance, we run a Data Manipulation Language (DML) statement to delete a specific record on the target:
In this scenario, revalidating the table reports a mismatch, which you can confirm by entering the following command or by checking the AWS console:
When data resync is enabled, these mismatches are processed by checking the source and then reapplying the record to the target. In the following instance, the record appears in the public.awsdms_validation_failures_v2 table, and the RESYNC_ACTION of UPSERT shows that it was reapplied to the target. The RESYNC_TIME shows the timestamp when the action was performed:
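To show how the tracking columns can be used, the following Python sketch lists the rows that have not been resynced yet. It runs against an in-memory SQLite table as a stand-in for the PostgreSQL target, so it is not the real table definition: only RESYNC_ACTION and RESYNC_TIME are named in this post, the `table_name` and `record_key` columns and the sample rows are placeholders, and treating a NULL RESYNC_TIME as "not yet resynced" is an assumption.

```python
import sqlite3

# In-memory stand-in for the target database; the real table lives in PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE awsdms_validation_failures_v2 (
        table_name    TEXT,  -- placeholder column for this sketch
        record_key    TEXT,  -- placeholder column for this sketch
        resync_action TEXT,  -- column named in this post
        resync_time   TEXT   -- column named in this post
    )
    """
)
# Made-up sample rows: one already resynced, one still pending.
conn.executemany(
    "INSERT INTO awsdms_validation_failures_v2 VALUES (?, ?, ?, ?)",
    [
        ("REVIEWS", "101", "UPSERT", "2025-06-05 03:45:00"),
        ("REVIEWS", "102", None, None),
    ],
)


def pending_resync(connection: sqlite3.Connection) -> list:
    """Return (table_name, record_key) for rows with no recorded resync time."""
    return connection.execute(
        "SELECT table_name, record_key "
        "FROM awsdms_validation_failures_v2 "
        "WHERE resync_time IS NULL "
        "ORDER BY record_key"
    ).fetchall()
```

Against the real target you would run an equivalent query with a PostgreSQL client and the actual column names from the table structure.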
Imagine a scenario in which we accidentally delete a few more records on the target during CDC. For instance, in the following SQL command, 20 records on the target are deleted at random:
We can observe that data resync has processed these records and applied them successfully to the target:
In both the full load and CDC scenarios we’ve described, data resync requires the revalidation of tables so that all data inconsistencies are properly identified and corrected. This revalidation is necessary because the changes on the target haven’t been made by AWS DMS.
Resuming CDC task after table error
Another use case arises during a migration when a table enters the error state, after which changes for that table are no longer replicated to the target. While a task is running, you can reload a table. However, for a CDC only task, you need to restart the task from the LSN where the table failed. If there are several tables in an AWS DMS task, restarting the task from an earlier point in time can result in changes being reapplied to the target.
Consider a scenario in which you migrate five tables in the ADMIN schema from Oracle to PostgreSQL. In the following screenshot, three out of the five tables have ended in error.
You can tell from the CloudWatch logs that these tables have ended in error at different timestamps. Because the tables failed at different timestamps, you need to use the earliest timestamp when a table errored as the CDC start time and create a CDC only task with these three tables. The earliest timestamp in this case is 2025-06-05T03:40:13.
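Picking the CDC start time amounts to taking the minimum of the failure timestamps. The following Python sketch shows that step. Only the earliest timestamp, 2025-06-05T03:40:13, comes from this post; the table names and the two later timestamps are hypothetical, and the function name `earliest_failure` is made up for illustration.

```python
from datetime import datetime
from typing import Dict


def earliest_failure(failure_times: Dict[str, str]) -> str:
    """Return the earliest ISO 8601 failure timestamp across all tables."""
    return min(failure_times.values(), key=datetime.fromisoformat)


failures = {
    "TABLE_A": "2025-06-05T03:40:13",  # earliest failure, value from this post
    "TABLE_B": "2025-06-05T03:52:41",  # hypothetical
    "TABLE_C": "2025-06-05T04:05:09",  # hypothetical
}
cdc_start_time = earliest_failure(failures)
```

Starting from the earliest failure means the tables that failed later receive some changes a second time, which is the kind of conflict that validation and data resync then detect and correct.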
During data resync, you can confirm that the detected conflicts are addressed, as shown in the following screenshot.
Conclusion
In this post, we introduced data resync, showed you how to configure it, and discussed two use cases in which data resync can check and rectify inconsistencies found during validation. For more details, refer to AWS DMS data resync.