AWS Database Blog

Data consistency with AWS DMS data resync

In this post, we take a deep dive into AWS Database Migration Service (AWS DMS) data resync, a feature introduced in AWS DMS version 3.6.1 that detects and resolves data inconsistencies during database migrations without manual intervention. With data resync, inconsistencies that data validation identifies between your source and target databases are corrected on the target. We discuss the steps to enable data resync and walk through examples of how it resolves data inconsistencies.

Before data resync was available, data inconsistencies required user intervention, such as reloading the table on a full load and change data capture (CDC) task or manually updating the records on the target. Data resync is available in all Regions where AWS DMS supports migrations from Oracle or SQL Server to PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition.

AWS DMS data resync configuration

Data resync reads the discrepancies identified by AWS DMS data validation, retrieves the current values from the source, and applies them to the target to bring each record back in sync. For a full load only task, resync, when enabled, runs immediately after all the tables have been validated. For tasks with CDC, resync must be scheduled through the task settings. When the scheduled window starts, the task pauses CDC and validation to minimize write conflicts.

We recommend that you schedule resync windows during periods of minimal source database activity and for a short duration, as recommended in Best practices. This helps minimize the latency spikes due to CDC being paused.

To configure data resync, you need to enable it while creating or modifying a task. On the AWS DMS console, under Data resync, select Schedule resync, as shown in the following screenshot.

The resync schedule uses a Cron expression to schedule data resync runs:

* * * * * 
| | | | | 
| | | | | 
| | | | +---- Day of Week (0-6) 
| | | +------ Month (1-12)
| | +-------- Day of Month (1-31)
| +---------- Hour (0-23)
+------------ Minute (0-59)
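
To make the field layout concrete, the following is a minimal Python sketch that validates a five-field expression against the ranges in the diagram and checks whether a given time falls on the schedule. The function names are hypothetical, the sketch handles only `*` and single integers, and it assumes 0 means Sunday (consistent with 6 meaning Saturday in the example in this post). It is not a description of how AWS DMS evaluates the schedule internally.

```python
from datetime import datetime

# Allowed range for each of the five fields, in the order shown in the diagram.
FIELD_RANGES = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 6),
]


def parse_schedule(expression):
    """Split a five-field expression and validate each field's range.

    Only '*' and single integers are handled in this sketch.
    """
    fields = expression.split()
    if len(fields) != len(FIELD_RANGES):
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    parsed = {}
    for raw, (name, low, high) in zip(fields, FIELD_RANGES):
        if raw == "*":
            parsed[name] = None  # None means "any value"
            continue
        value = int(raw)
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
        parsed[name] = value
    return parsed


def matches(expression, moment):
    """Return True if `moment` satisfies every non-wildcard field."""
    parsed = parse_schedule(expression)
    actual = {
        "minute": moment.minute,
        "hour": moment.hour,
        "day_of_month": moment.day,
        "month": moment.month,
        # isoweekday() is 1 (Monday) to 7 (Sunday); % 7 maps Sunday to 0.
        "day_of_week": moment.isoweekday() % 7,
    }
    return all(want is None or want == actual[name] for name, want in parsed.items())
```

For instance, `matches("0 0 * * 6", datetime(2025, 6, 7, 0, 0))` checks a Saturday at midnight against the expression.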

For example, the following settings schedule the data resync to run on Saturday at midnight:

"ResyncSettings": 
{
    "EnableResync": true,
    "ResyncSchedule": "0 0 * * 6", // Run Saturday at midnight
    "MaxResyncTime": 360,  // Run for maximum of 360 minutes, or 6 hours
    "ValidationTaskId": "" //Optional, used only if validation is performed as a separate Validation only task
}
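
The `//` comments in the preceding settings are annotations for the reader, and strict JSON parsers don't accept them. As a minimal sketch, the following Python function generates the same fragment as comment-free JSON. The function name `build_resync_settings` and its checks are assumptions made for illustration. The setting names and the minute unit for MaxResyncTime come from the example above.

```python
import json


def build_resync_settings(schedule, max_minutes, validation_task_id=""):
    """Return the ResyncSettings fragment as a JSON string without comments."""
    if len(schedule.split()) != 5:
        raise ValueError("schedule must have five space-separated fields")
    if max_minutes <= 0:
        raise ValueError("max_minutes must be positive")
    settings = {
        "ResyncSettings": {
            "EnableResync": True,
            "ResyncSchedule": schedule,       # for example "0 0 * * 6"
            "MaxResyncTime": max_minutes,     # minutes, per the example above
            # Optional, used only with a separate validation only task.
            "ValidationTaskId": validation_task_id,
        }
    }
    return json.dumps(settings, indent=4)


if __name__ == "__main__":
    print(build_resync_settings("0 0 * * 6", 360))
```

The resulting string can be merged into the rest of your task settings before you create or modify the task.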

For more examples, refer to Data resync configuration and examples.

With data resync, AWS DMS creates an awsdms_validation_failures_v2 table on the PostgreSQL target endpoint with the structure shown in the following screenshot.

AWS DMS uses this table to track mismatches found on the target tables during validation and to resolve them by looking up the current data on the source using the primary key. When you upgrade or move a task to AWS DMS version 3.6.1 or later, validation failures that occurred before the upgrade aren't automatically resynced. To address those pre-upgrade failures, you need to initiate a table reload or revalidation. New validation failures that occur after the upgrade are tracked and resynced through the awsdms_validation_failures_v2 table.

During a resync operation, AWS DMS completes a sequence of steps that depends on the task type. Each step writes a message to the CloudWatch logs, as shown in the following lists:

For a FULL LOAD and CDC or CDC task:

  1. Trigger resync operation:
    [DATA_RESYNC ]I: Data Resync Manager schedule window time matched to start resync
  2. Pause validation:
    [DATA_RESYNC ]I: Trying to STOP validation before resync process. (resync_manager.c:331)
  3. Pause CDC:
    [DATA_RESYNC ]I: Data Resync Manager sending command to sorter to PAUSE applying changes to target.
  4. Resync tables:
    [RESYNC_UNLOAD ]I: Sent ctrl command for Resync Unload of table with id: 1
  5. Resume CDC:
    [DATA_RESYNC ]I: Data Resync Manager sending command to sorter to RESUME applying changes to target
  6. Resume validation:
    [DATA_RESYNC ]I: Trying to RESUME validation after resync process

For a FULL LOAD only task, you don’t need to specify a schedule because the resync manager triggers after the validation process is complete:

  1. Trigger resync operation:
    [DATA_RESYNC     ]I:  Data Resync Manager sending command to start up resync subtasks
  2. Resync tables:
    [TASK_MANAGER    ]I:  All tables are loaded. Validation is finished. Waiting for resync to finish...  (replicationtask.c:4953)
    [DATA_RESYNC     ]I:  Stopped Data Resync Manager, exiting thread
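
If you export these log lines from CloudWatch, a small script can reconstruct which resync steps ran and in what order. The following Python sketch matches on substrings copied from the messages listed above. The step labels and the function name are our own, and the sketch assumes the message wording stays the same across AWS DMS versions.

```python
# Substrings taken from the log messages listed above, mapped to step labels.
PHASE_MARKERS = [
    ("schedule window time matched", "trigger resync"),
    ("Trying to STOP validation", "pause validation"),
    ("PAUSE applying changes", "pause CDC"),
    ("Resync Unload of table", "resync table"),
    ("RESUME applying changes", "resume CDC"),
    ("Trying to RESUME validation", "resume validation"),
    ("start up resync subtasks", "trigger resync"),
    ("Waiting for resync to finish", "waiting for resync"),
    ("Stopped Data Resync Manager", "resync manager stopped"),
]


def resync_phases(log_lines):
    """Return the step labels found in log_lines, in log order."""
    phases = []
    for line in log_lines:
        for marker, phase in PHASE_MARKERS:
            if marker in line:
                phases.append(phase)
                break  # one label per log line
    return phases
```

Comparing the returned list with the expected sequence for your task type is a quick way to spot a resync run that started but did not resume CDC or validation.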
    

Use cases for AWS DMS data resync

There are several use cases in which AWS DMS data resync is valuable. In this section, we examine two.

Accidental deletion of records on target

The first use case is one in which records on the target have been accidentally deleted. To illustrate it, we migrate a table called REVIEWS from Oracle to PostgreSQL. When the full load is complete, we accidentally delete a few records on the target. In the following example, we run a Data Manipulation Language (DML) statement on the target to delete a specific record:

dmsdb=> delete from dms_test.reviews where review_id=8193;
DELETE 1

In this scenario, revalidating the table reports a mismatch, which you can confirm by running the following command or by checking the AWS DMS console:

aws dms describe-table-statistics --replication-task-arn arn:aws:dms:us-east-1:xxxxxxxxxxxx:task:xxxxxxxxxxxx --filters Name=table-name,Values="REVIEWS"

{
    "TableStatistics": [
        {
            "SchemaName": "DMS_TEST",
            "TableName": "REVIEWS",
            "Inserts": 0,
            "Deletes": 0,
            "Updates": 0,
            "Ddls": 0,
            "AppliedInserts": 0,
            "AppliedDeletes": 0,
            "AppliedUpdates": 0,
            "AppliedDdls": 0,
            "FullLoadRows": 3500,
            "FullLoadCondtnlChkFailedRows": 0,
            "FullLoadErrorRows": 0,
            "FullLoadStartTime": "2025-06-03T14:24:23.062000-05:00",
            "FullLoadEndTime": "2025-06-03T14:24:25.408000-05:00",
            "FullLoadReloaded": false,
            "LastUpdateTime": "2025-06-03T14:35:12.009000-05:00",
            "TableState": "Table completed",
            "ValidationPendingRecords": 0,
            "ValidationFailedRecords": 1,
            "ValidationSuspendedRecords": 0,
            "ValidationState": "Mismatched records"
        }
    ]
}
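
When a task contains many tables, you can filter this output in a script instead of reading it by eye. The following Python sketch parses the JSON returned by describe-table-statistics and lists the tables that report failed validation records. The function name is hypothetical, and the field names are the ones shown in the preceding output.

```python
import json


def tables_with_validation_failures(cli_output):
    """Return (schema, table, failed_count) for tables with failed records.

    cli_output is the JSON text printed by describe-table-statistics.
    """
    stats = json.loads(cli_output)["TableStatistics"]
    return [
        (entry["SchemaName"], entry["TableName"], entry["ValidationFailedRecords"])
        for entry in stats
        if entry.get("ValidationFailedRecords", 0) > 0
    ]
```

You could save the command output to a file and pass the file contents to this function, or pipe the output to a script that reads it from standard input.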

When data resync is enabled, AWS DMS processes these mismatches by reading the current values from the source and reapplying them to the target. In the following example, the public.awsdms_validation_failures_v2 table confirms that the deleted record was reapplied to the target, as shown by the RESYNC_ACTION of UPSERT. The RESYNC_TIME shows the timestamp when the action was performed:

dmsdb=> select * from public.awsdms_validation_failures_v2;
-[ RECORD 1 ]-+---------------------------
RESYNC_ID     | 1029
TASK_NAME     | BESR3KWW2FCLLH4AJBFSEYSNW4
TABLE_OWNER   | dms_test
TABLE_NAME    | reviews
FAILURE_TIME  | 2025-06-03 19:33:26.410998
KEY_TYPE      | Row
KEY           | {                         +
              |         "key":  ["8193"]  +
              | }
FAILURE_TYPE  | MISSING_TARGET
DETAILS       |
RESYNC_RESULT | SUCCESS
RESYNC_TIME   | 2025-06-03 19:35:06.322
RESYNC_ACTION | UPSERT

Imagine a scenario in which we accidentally delete a few more records on the target during CDC. For instance, in the following SQL command, 20 records on the target are deleted at random:

dmsdb=> delete from dms_test.reviews where ctid in (select ctid from dms_test.reviews order by RANDOM() LIMIT 20);
DELETE 20

We can observe that data resync has processed these records and applied them successfully to the target:

dmsdb=> select "TABLE_OWNER", "TABLE_NAME","RESYNC_ACTION", "FAILURE_TYPE", "RESYNC_RESULT",count(*) from public.awsdms_validation_failures_v2 group by "TABLE_OWNER", "TABLE_NAME","RESYNC_ACTION", "FAILURE_TYPE", "RESYNC_RESULT";
-[ RECORD 1 ]-+---------------
TABLE_OWNER   | dms_test
TABLE_NAME    | reviews
RESYNC_ACTION | UPSERT
FAILURE_TYPE  | MISSING_TARGET
RESYNC_RESULT | SUCCESS
count         | 21

In both the full load and CDC scenarios we’ve described, data resync requires the revalidation of tables so that all data inconsistencies are properly identified and corrected. This revalidation is necessary because the changes on the target haven’t been made by AWS DMS.

Resuming CDC task after table error

The second use case arises when a table enters the error state during a migration, so changes for that table are no longer replicated to the target. While a task is running, you can reload a table. However, for a CDC only task, you need to restart the task from the LSN where the table failed. If the AWS DMS task contains several tables, restarting it from an earlier point in time can result in reapplying changes to the target.

Consider a scenario in which you migrate five tables under ADMIN schema from Oracle to PostgreSQL. In the following screenshot, three out of the five tables have ended in error.

You can tell from the CloudWatch logs that these tables have ended in error at different timestamps. Because the tables failed at different timestamps, you need to use the earliest timestamp when the table errored as the CDC start time and create a CDC only task with these three tables. The earliest timestamp in this case is 2025-06-05T03:40:13.

2025-06-05T03:40:13 [TASK_MANAGER ]W: Table 'ADMIN'.'DMST1' was errored/suspended (subtask 0 thread 1). 

2025-06-05T03:47:53 [TASK_MANAGER ]W: Table 'ADMIN'.'DMST2' was errored/suspended (subtask 0 thread 1). 

2025-06-05T03:52:32 [TASK_MANAGER ]W: Table 'ADMIN'.'DMST5' was errored/suspended (subtask 0 thread 1). 
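
Picking the earliest failure time by hand is error-prone when many tables are involved. The following Python sketch extracts the first errored/suspended timestamp for each table from exported log lines and returns the earliest one as a candidate CDC start time. The function names and the regular expression are assumptions based on the three log lines shown above, so other message formats would need their own pattern.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2025-06-05T03:40:13 [TASK_MANAGER ]W: Table 'ADMIN'.'DMST1' was errored/suspended ...
ERROR_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) "
    r".*Table '([^']+)'\.'([^']+)' was errored/suspended"
)


def first_error_times(log_lines):
    """Map 'SCHEMA.TABLE' to the earliest time that table was errored/suspended."""
    first_seen = {}
    for line in log_lines:
        match = ERROR_LINE.search(line.strip())
        if not match:
            continue
        when = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
        table = f"{match.group(2)}.{match.group(3)}"
        if table not in first_seen or when < first_seen[table]:
            first_seen[table] = when
    return first_seen


def cdc_start_time(log_lines):
    """Return the earliest failure time across all errored tables."""
    times = first_error_times(log_lines)
    if not times:
        raise ValueError("no errored/suspended tables found in the log lines")
    return min(times.values())
```

The keys of the dictionary returned by `first_error_times` also give you the list of tables to include in the new CDC only task.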

After data resync runs, you can confirm that the detected conflicts were addressed by querying the awsdms_validation_failures_v2 table, as shown in the following output:

dmsdb=> select * from public.awsdms_validation_failures_v2;
-[ RECORD 1 ]-+---------------------------
RESYNC_ID     | 9949
TASK_NAME     | 6LOQBMAKQFDELB5WQB5BPG5Q74
TABLE_OWNER   | admin
TABLE_NAME    | dmst1
FAILURE_TIME  | 2025-06-05 05:26:58.027987
KEY_TYPE      | Row
KEY           | {                         +
              |         "key":  ["101"]   +
              | }
FAILURE_TYPE  | MISSING_TARGET
DETAILS       |
RESYNC_RESULT | SUCCESS
RESYNC_TIME   | 2025-06-05 05:30:06.423
RESYNC_ACTION | UPSERT

Conclusion

In this post, we introduced data resync, showed you how to configure it, and discussed two use cases in which you can use it to detect and correct the inconsistencies found during validation. For more details, refer to AWS DMS data resync.


About the Authors

Suchindranath Hegde

Suchindranath is a Senior Data Migration Specialist Solutions Architect at Amazon Web Services. He works with our customers to provide guidance and technical assistance on data migration to the AWS Cloud using AWS DMS.

Mahesh Kansara

Mahesh is a database engineering manager at Amazon Web Services. He works closely with development and engineering teams to improve the migration and replication service. He also works with our customers to provide guidance and technical assistance on various database and analytical projects, helping them improve the value of their solutions when using AWS.

Sridhar Ramasubramanian

Sridhar is a database engineer with the AWS Database Migration Service team. He works on improving the DMS service to better suit the needs of AWS customers.