AWS Database Blog

Accelerate SQL Server to Amazon Aurora migrations with a customizable solution

Migrating from SQL Server to Amazon Aurora can significantly reduce database licensing costs and modernize your data infrastructure, but organizations often face challenges with complex schema transformations during the migration process. To accelerate your migration journey, we have developed a migration solution that offers ease and flexibility. You can use this migration accelerator to achieve fast data migration with minimal downtime while customizing it to meet your specific business requirements, such as complex data transformations, distributed data sources, and minimal impact on live systems.

In this post, we showcase the core features of the migration accelerator, demonstrated through a complex use case of consolidating 32 SQL Server databases into a single Amazon Aurora instance with near-zero downtime, while addressing technical debt through refactoring. We walk you through the step-by-step solution implementation that you can use to reinvent your data infrastructure. Finally, we discuss the solution benefits, including flexibility, scalability, observability, reusability, and resilience.

Solution overview

Our migration accelerator provides a robust pathway for migrating large-scale SQL Server databases to Amazon Aurora PostgreSQL-Compatible Edition, with built-in support for handling distributed data sources, complex transformations, and schema conversions. The solution is designed with several key features that provide reliable, efficient, and configurable migrations while maintaining data consistency in live environments.

The solution implements three critical capabilities:

  • Re-entrant points allow jobs to resume from where they left off after interruptions, making the migration process resilient to failures; a simplified example of this check follows the list.
  • The decoupled execution model separates export, transform, and import operations into independent components that can run in parallel, significantly improving throughput for large-scale migrations.
  • The solution is extensively configurable through parameters and mapping files, enabling you to define custom transformations and control execution behavior without modifying the core migration logic.
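
To illustrate the re-entrant behavior, the following minimal Python sketch (using boto3) shows how a job can consult the migration statistics bucket before processing a partition and skip work that is already marked complete. The bucket name, key layout, and marker format are assumptions for illustration; the actual solution defines its own statistics objects.

import boto3

s3 = boto3.client("s3")

# Hypothetical locations; the real solution defines its own statistics layout.
STATS_BUCKET = "your-migration-stats-bucket"
STATS_PREFIX = "cell_perf_stats/completed/"

def partition_already_migrated(partition: str) -> bool:
    """Return True if a completion marker exists for this partition."""
    resp = s3.list_objects_v2(Bucket=STATS_BUCKET, Prefix=f"{STATS_PREFIX}{partition}")
    return resp.get("KeyCount", 0) > 0

def run_partition(partition: str) -> None:
    if partition_already_migrated(partition):
        print(f"Skipping partition {partition}: already migrated")
        return
    # ... export, transform, and import logic for this partition ...
    # On success, write a completion marker so a restart resumes after this point.
    s3.put_object(Bucket=STATS_BUCKET, Key=f"{STATS_PREFIX}{partition}", Body=b"done")

for day in ["20250101", "20250102", "20250103"]:
    run_partition(day)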

The following diagram outlines the solution architecture.

Architecture diagram showing data migration flow from SQL Server to Amazon Aurora PostgreSQL using AWS services.

The diagram illustrates the data migration flow from a source SQL Server database to Amazon Aurora using Amazon Simple Storage Service (Amazon S3) for bulk export, transform, and import. The process follows these steps:

  1. AWS Database Migration Service (AWS DMS) or AWS Glue export jobs can be used to extract data from source databases into an S3 raw bucket.
  2. AWS Glue jobs process the raw data, applying necessary transformations defined in the mapping files. The transformed data is stored in a separate S3 bucket as load-ready files (LRFs).
  3. An AWS Glue cleanup job manages target table preparation. An AWS Glue import job loads the LRFs into Aurora PostgreSQL staging tables. A final AWS Glue job runs stored procedures to switch the partitions between staging and base tables in the target database.
  4. AWS Glue scripts and migration statistics such as execution duration, row counts, and processing metrics are stored in the S3 Glue asset and migration statistics buckets to support the migration process and enable performance analysis.
  5. The entire process is orchestrated by an end-to-end wrapper script running on AWS Fargate for Amazon Elastic Container Service (Amazon ECS), with the Docker image registered in Amazon Elastic Container Registry (Amazon ECR).
  6. Amazon EventBridge automates the execution of the end-to-end wrapper script by scheduling it to run at user-defined intervals, providing timely and consistent database migration operations.
  7. An AWS Lambda function enables operational control by allowing administrators to stop AWS DMS, AWS Glue, or end-to-end wrapper jobs on demand and perform targeted cleanup of S3 objects when needed (a simplified handler sketch follows these steps).
  8. Amazon CloudWatch and AWS CloudTrail provide end-to-end visibility into the migration process, and Amazon Simple Notification Service (Amazon SNS) delivers real-time alerts for failures, enabling immediate attention to issues.
  9. AWS Secrets Manager securely stores database credentials, and Parameter Store, a capability of AWS Systems Manager, maintains configuration parameters for migration jobs, supporting secure and flexible deployment across environments. AWS Key Management Service (AWS KMS) and AWS Identity and Access Management (IAM) provide fine-grained access control and permission policies for AWS resources.
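
As an example of the operational control described in step 7, the following is a minimal Python sketch of a Lambda handler that stops a running AWS Glue job run, an AWS DMS replication task, or an ECS task depending on the event it receives. The event fields and the set of actions are assumptions for illustration; the Lambda function in the solution may implement a different interface.

import boto3

glue = boto3.client("glue")
dms = boto3.client("dms")
ecs = boto3.client("ecs")

def lambda_handler(event, context):
    """Stop migration components on demand. The event shape is illustrative."""
    action = event.get("action")

    if action == "stop_glue_job":
        glue.batch_stop_job_run(
            JobName=event["job_name"],
            JobRunIds=[event["job_run_id"]],
        )
    elif action == "stop_dms_task":
        dms.stop_replication_task(ReplicationTaskArn=event["replication_task_arn"])
    elif action == "stop_ecs_task":
        ecs.stop_task(
            cluster=event["cluster"],
            task=event["task_arn"],
            reason="Stopped by migration administrator",
        )
    else:
        raise ValueError(f"Unknown action: {action}")

    return {"status": "stop requested", "action": action}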

Data migration between heterogeneous database systems, where the source and target databases differ in database engine, data structure, and data format, is a common challenge in enterprise environments. This solution offers a flexible approach that can handle both straightforward 1:1 source-to-target table mappings and more complex scenarios involving data transformations and table joins.

For simple migrations where the source and target table structures are identical or very similar, this solution can directly map and transfer data with minimal transformation. This is particularly useful for large-scale migrations where maintaining data integrity and structure is paramount. However, the real power of this solution lies in its ability to handle more complex migration scenarios. It can perform data transformations, combine data from multiple source tables, and create new consolidated target tables. This is especially valuable when migrating from legacy systems to modern databases, where data restructuring is often necessary to optimize for new business requirements.

To demonstrate this capability, let’s consider a more complex example. Let’s say we have two tables (cell_performance_data and ue_statistics_data) in the source SQL Server database, and we want to perform a heterogeneous migration of these tables to the target PostgreSQL database. Instead of simply copying these tables as-is, we want to create a new, consolidated target table called cell_perf_stats by joining the two source tables. This join operation allows us to:

  1. Combine related data from multiple sources.
  2. Potentially reduce data redundancy.
  3. Create a more comprehensive view of the data for analysis in the target system.

Here’s how we might approach this join operation:

SELECT 
	c.cell_location, 
	c.cell_id, 
	c.bandwidth_usage_total, 
	c.signal_strength_indicator_1, 
	c.dropped_calls_total, 
	u.device_id, 
	u.data_usage_total, 
	u.network_quality_indicator_1, 
	u.handover_attempts, 
	c.data_timestamp 
FROM cell_performance_data c 
LEFT JOIN 
	ue_statistics_data u 
ON c.cell_id = u.device_id 
AND c.data_timestamp = u.data_timestamp ;

One of the ways to achieve this is to perform the join in the source database and then move the joined data to the target database, but this approach has some major drawbacks:

  • With a table containing billions of records, joining on the source database, which is serving live queries, is going to negatively impact the database performance.
  • You would need a considerable amount of database space, which could be costly.
  • If you have these tables distributed across several source servers or databases, it would take a lot of manual, time-consuming work to perform this join in each database and then move the results to the target database.

The solution described in the following sections helps mitigate these issues.
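
Instead, the accelerator performs the join inside the AWS Glue transform job, against the raw files already exported to Amazon S3, so the source databases are never burdened with the join. The following PySpark sketch shows the idea; the S3 paths and the Parquet format are assumptions for illustration, and the actual job derives its inputs, outputs, and transformation rules from its mapping files and job parameters.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cellPerfStats_rawToTransformed").getOrCreate()

# Hypothetical raw-bucket layout; the real job reads these paths from its parameters.
cell_perf = spark.read.parquet("s3://your-raw-data-bucket/cell_performance_data/").alias("c")
ue_stats = spark.read.parquet("s3://your-raw-data-bucket/ue_statistics_data/").alias("u")

cell_perf_stats = cell_perf.join(
    ue_stats,
    on=(F.col("c.cell_id") == F.col("u.device_id"))
    & (F.col("c.data_timestamp") == F.col("u.data_timestamp")),
    how="left",
).select(
    "c.cell_location", "c.cell_id", "c.bandwidth_usage_total",
    "c.signal_strength_indicator_1", "c.dropped_calls_total",
    "u.device_id", "u.data_usage_total", "u.network_quality_indicator_1",
    "u.handover_attempts", "c.data_timestamp",
)

# Write load-ready files (LRFs) to the transformed bucket for the import job.
cell_perf_stats.write.mode("overwrite").parquet(
    "s3://your-transformed-data-bucket/cell_perf_stats/"
)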

Prerequisites

For this walkthrough, you should have the following prerequisites:

Clone the code repository

Clone the GitHub repo with the following code:

git clone https://github.com/aws-samples/sample-accelerate-SQLServer-to-AmazonAurora-migration-with-customizable-solution

Set up source and target databases

Complete the following steps to set up your source and target databases:

  1. Create two tables in the source SQL Server database using the following SQL commands (the VARCHAR lengths shown are illustrative):
    -- Table 1: cell_performance_data
    CREATE TABLE cell_performance_data (
    	cell_location VARCHAR(100),
    	cell_id VARCHAR(50),
    	power_consumption_idle DECIMAL(11,3),
    	power_consumption_peak DECIMAL(11,3),
    	bandwidth_usage_total DECIMAL(11,3),
    	measurement_time DATETIME2,
    	signal_strength_indicator_1 INT,
    	signal_strength_indicator_2 INT,
    	dropped_calls_total INT,
    	antenna_tilt SMALLINT,
    	channel_utilization SMALLINT,
    	data_timestamp DATETIME2,
    	PRIMARY KEY (cell_id, data_timestamp)
    );

    -- Table 2: ue_statistics_data
    CREATE TABLE ue_statistics_data (
    	device_coordinate VARCHAR(100),
    	device_id VARCHAR(50),
    	data_usage_download DECIMAL(11,3),
    	data_usage_upload DECIMAL(11,3),
    	data_usage_total DECIMAL(11,3),
    	connection_time DATETIME2,
    	total_connected_time INT,
    	total_idle_time INT,
    	total_active_time INT,
    	network_quality_indicator_1 VARCHAR(50),
    	handover_attempts INT,
    	data_timestamp DATETIME2,
    	PRIMARY KEY (device_id, data_timestamp),
    	FOREIGN KEY (device_id, data_timestamp) REFERENCES cell_performance_data (cell_id, data_timestamp)
    );
  2. Create the target table in the Aurora PostgreSQL-Compatible database using the following SQL command:
    CREATE TABLE cell_perf_stats (
    	cell_location VARCHAR(100),
    	cell_id VARCHAR(50),
    	bandwidth_usage_total DECIMAL(11,3),
    	signal_strength_indicator_1 INT,
    	dropped_calls_total INT,
    	device_id VARCHAR(50),
    	data_usage_total DECIMAL(11,3),
    	network_quality_indicator_1 VARCHAR(50),
    	handover_attempts INT,
    	data_timestamp TIMESTAMP
    );

Our solution supports schema modifications during migration, as demonstrated in our example where two source tables with complex schemas (cell_performance_data with 12 columns and ue_statistics_data with 12 columns) are consolidated into a simplified target table (cell_perf_stats with 10 essential columns), eliminating redundant fields while preserving critical business data.

Set up AWS Glue ETL

Complete the following steps to set up AWS Glue extract, transform, and load (ETL) jobs:

  1. Using the Amazon S3 console, create four S3 buckets:
    • <your-raw-data-bucket>: For exported data from SQL Server
    • <your-transformed-data-bucket>: For transformed data
    • <your-migration-stats-bucket>: For job run statistics
    • <your-glue-asset-bucket>: For storing AWS Glue scripts
  2. Navigate to the AWS Secrets Manager console and create a secret for SQL Server authentication with the following format:
    {
      "username": "<db-user-name>",
      "password": "<db-passwd>",
      "engine": "sqlserver",
      "host": "<db-host-name>",
      "port": "1433",
      "dbname": "DatabaseA"
    }
  3. Go to the IAM console to create an IAM role for AWS Glue jobs.
    1. Create a new role with AWS Glue as the trusted entity.
    2. Attach the AWSGlueServiceRole policy.
    3. Add permissions for accessing your S3 buckets and Secrets Manager.
  4. In the AWS Glue console, import the following AWS Glue jobs from your cloned repository’s directory. Use the “Import job” option in the AWS Glue console to upload each JSON file:
    • ue_statistics_data_ExportMssqlToRaw.json
    • cell_performance_data_ExportMssqlToRaw.json
    • cellPerfStats_rawToTransformed.json
    • truncate_stageTableAurora.json
    • cellPerfStats_transformedToAurora.json
  5. Still in the AWS Glue console, for each imported job:
    1. Select the job from the list.
    2. Go to the job details tab and update the job parameters with your specific S3 bucket names, Secrets Manager secret, and other details.
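
If you prefer to script this instead of using the console, job parameters can also be supplied when a run is started. The following boto3 sketch is a minimal example; the argument names shown are assumptions and must match the argument names defined in the imported job definitions.

import boto3

glue = boto3.client("glue", region_name="us-west-1")

# Argument names are illustrative; align them with the imported job definitions.
response = glue.start_job_run(
    JobName="cell_performance_data_ExportMssqlToRaw",
    Arguments={
        "--raw_bucket": "your-raw-data-bucket",
        "--stats_bucket": "your-migration-stats-bucket",
        "--secret_name": "your-sqlserver-secret",
    },
)
print("Started job run:", response["JobRunId"])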

Set up the orchestration script and Fargate

In the following code examples, replace <ACC-ID> with your 12-digit AWS account number for the account where you are deploying this solution:

  1. Navigate to AWS Systems Manager Parameter Store console and create two parameters:
    1. Create the migration control parameter:
      Name: /migration-parameter/cell-perf-stats
      Value:
      {
        "job_mode": "resume",
        "log_level": "INFO",
        "input_partition": "20250101",
        "partition_col_date": "data_timestamp",
        "source_filter_cell_performance_data": "data_timestamp BETWEEN '2025-01-01 00:00:00.000' AND '2025-03-31 23:59:59'",
        "source_filter_ue_statistics_data": "data_timestamp BETWEEN '2022-10-01 00:00:00.000' AND '2025-03-31 23:59:59'"
      }
    2. Create the server mappings parameter:
      Name: /migration-parameter/server-mappings
      Value:
      {
          "server_mappings": {
              "server01": "sql-server-host-name:1433"
          },
          "databases": ["DatabaseA", "DatabaseB"],
          "categories": {
              "cell_perf_stats": [
                  "cell_performance_data",
                  "ue_statistics_data"
              ]
          }
      }
      

    Now you’re ready to set up Amazon ECR and Amazon ECS. Steps 2 through 11 in this section are performed in your local environment using the AWS CLI and Docker.

  2. Navigate to your cloned repository directory and create a new file named “Dockerfile”. Copy the provided Docker configuration, which includes system packages, AWS CLI installation, and Python environment setup.
    # Debian/Ubuntu base image (illustrative; use the base image appropriate for your environment)
    FROM ubuntu:22.04

    # Avoid interactive prompts during package installation
    ARG DEBIAN_FRONTEND=noninteractive

    # Install all necessary system packages:
    RUN apt-get update && \
        apt-get install -y \
        build-essential \
        zlib1g-dev \
        libncurses5-dev \
        libgdbm-dev \
        libnss3-dev \
        libssl-dev \
        libreadline-dev \
        libffi-dev \
        curl \
        wget \
        git \
        jq \
        nodejs \
        npm \
        python3 \
        python3-pip \
        python3-venv \
        unzip

    # Install the AWS CLI:
    RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
       unzip awscliv2.zip && \
       ./aws/install && \
       rm -rf aws awscliv2.zip
    RUN apt-get clean && \
       rm -rf /var/lib/apt/lists/*

    # Add local bin to PATH
    ENV PATH="/root/.local/bin:${PATH}"

    # Create and activate a Python virtual environment
    RUN python3 -m venv /opt/venv
    ENV PATH="/opt/venv/bin:$PATH"

    # Install Python dependencies and copy the orchestration script
    COPY dev-requirements.txt .
    RUN pip install -r dev-requirements.txt -q
    COPY e2e_wrapper_cell_perf_stats.py .

    # Create a new group and user called 'appuser' and run as that user
    RUN groupadd -r appuser && useradd -r -g appuser appuser
    USER appuser

    # Set the default command
    CMD ["python3", "-u", "e2e_wrapper_cell_perf_stats.py"]
  3. Ensure you are in the repository directory and build the Docker image using:
    docker build -t data-mig-app:latest .
  4. Create a new Amazon ECR repository to store your Docker image using:
    aws ecr create-repository --repository-name data-mig-app
  5. Authenticate your Docker client to your Amazon ECR registry using:
    aws ecr get-login-password --region us-west-1 | docker login --username AWS --password-stdin <ACC-ID>.dkr.ecr.us-west-1.amazonaws.com
  6. Tag your local Docker image and push it to your Amazon ECR repository using:
    docker tag data-mig-app:latest <ACC-ID>.dkr.ecr.us-west-1.amazonaws.com/data-mig-app
    docker push <ACC-ID>.dkr.ecr.us-west-1.amazonaws.com/data-mig-app
  7. Create an ECS task execution role to allow ECS to execute tasks on your behalf using:
    aws iam create-role --role-name ecsTaskExecutionRole --assume-role-policy-document '{
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "ecs-tasks.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }'
    
    aws iam attach-role-policy --role-name ecsTaskExecutionRole --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
  8. Create a new file named “task-definition.json” in your repository directory. Copy the provided task definition that specifies container configurations, including environment variables and resource requirements. Update the environment variables in the following task according to your environment:
    {
      "family": "data-mig-app",
      "networkMode": "awsvpc",
      "executionRoleArn": "arn:aws:iam::<ACC-ID>:role/ecsTaskExecutionRole",
      "containerDefinitions": [
        {
          "name": "data-mig-app",
          "image": "<ACC-ID>.dkr.ecr.us-west-1.amazonaws.com/data-mig-app:latest",
          "portMappings": [
            {
              "containerPort": 80,
              "hostPort": 80,
              "protocol": "tcp"
            }
          ],
          "environment": [
            {
            "name": "MIGRATION_PARAMETER",
            "value": "/migration-parameter/cell-perf-stats"
            },
            {
            "name": "SERVER_MAPPINGS_PARAMETER",
            "value": "/migration-parameter/server-mappings"
            },
            {
            "name": "RDS_ENDPOINT",
            "value": "aurora-pg-cluster.cluster-mydb.eu-west-1.rds.amazonaws.com"
            },
            {
            "name": "RDS_USER",
            "value": "mydb_user"
            }
          ],
          "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
              "awslogs-group": "/ecs/data-mig-app",
              "awslogs-region": "us-west-1",
              "awslogs-stream-prefix": "ecs"
            }
          },
          "essential": true
        }
      ],
      "requiresCompatibilities": [
        "FARGATE"
      ],
      "cpu": "256",
      "memory": "512"
    }
  9. Register the task definition with Amazon ECS using:
    aws ecs register-task-definition --cli-input-json file://task-definition.json
  10. Create a new ECS cluster to host your Fargate tasks using:
    aws ecs create-cluster --cluster-name data-mig-app-cluster
  11. Create a service with a Fargate task in your ECS cluster. Replace the subnets and security groups placeholders with your actual configurations based on your account setup:
    aws ecs create-service \
      --cluster data-mig-app-cluster \
      --service-name data-mig-app-srv \
      --task-definition data-mig-app \
      --desired-count 1 \
      --launch-type FARGATE \
      --network-configuration "awsvpcConfiguration={subnets=[subnet1- xxxxx,subnet2- xxxxx],securityGroups=[sg- xxxxx],assignPublicIp=ENABLED}"

Launch the ECS migration task

In your local terminal, execute the ECS task with the following command. Replace the subnets and security groups according to your account setup:

aws ecs run-task \
  --cluster your-cluster-name \
  --task-definition data-mig-app \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-xxxxx],securityGroups=[sg-xxxxx]}" \
  --launch-type FARGATE \
  --region us-west-1
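
The run-task call returns as soon as the task is scheduled. If you want to follow the task from a script, a boto3 sketch like the following starts the task, waits for it to stop, and reports the container exit code; the cluster name, subnet, and security group values are placeholders.

import boto3

ecs = boto3.client("ecs", region_name="us-west-1")

response = ecs.run_task(
    cluster="data-mig-app-cluster",
    taskDefinition="data-mig-app",
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-xxxxxxxx"],
            "securityGroups": ["sg-xxxxxxxx"],
            "assignPublicIp": "ENABLED",
        }
    },
)
task_arn = response["tasks"][0]["taskArn"]

# Block until the migration task stops, then inspect the container exit code.
ecs.get_waiter("tasks_stopped").wait(
    cluster="data-mig-app-cluster",
    tasks=[task_arn],
    WaiterConfig={"Delay": 60, "MaxAttempts": 720},
)
task = ecs.describe_tasks(cluster="data-mig-app-cluster", tasks=[task_arn])["tasks"][0]
print("Exit code:", task["containers"][0].get("exitCode"))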

Complete the data migration using a stored procedure

Create the stored procedure in Aurora PostgreSQL-Compatible using the script stage.stage_ctas_null_upd_cell_perf_stats.sql found in your cloned repository. Once the stored procedure is created, execute it with the following command:

CALL stage.stage_ctas_null_upd_cell_perf_stats('database', 'cell_perf_stats');

This stored procedure handles the final data transfer from staging tables to the target production table in your Aurora PostgreSQL database.
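
If you prefer to run this step from a script rather than a SQL client, the following Python sketch fetches the Aurora credentials from a Secrets Manager secret and executes the procedure with psycopg2. The secret name and its fields are assumptions for illustration.

import json

import boto3
import psycopg2

# Hypothetical secret holding the Aurora PostgreSQL connection details.
secret = json.loads(
    boto3.client("secretsmanager").get_secret_value(SecretId="your-aurora-secret")["SecretString"]
)

conn = psycopg2.connect(
    host=secret["host"],
    port=secret.get("port", 5432),
    dbname=secret["dbname"],
    user=secret["username"],
    password=secret["password"],
)
conn.autocommit = True

with conn.cursor() as cur:
    cur.execute(
        "CALL stage.stage_ctas_null_upd_cell_perf_stats(%s, %s);",
        ("database", "cell_perf_stats"),
    )

conn.close()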

Monitor the migration

You can monitor migration progress through multiple AWS services:

  • Use the AWS Glue console to track the status and performance of your ETL jobs.
  • For comprehensive operational visibility, utilize Amazon CloudWatch and AWS CloudTrail to monitor the end-to-end migration process.
  • Set up Amazon SNS to receive real-time alerts for any failures, enabling immediate response to potential issues.
  • Review detailed migration statistics, including execution duration, row counts, and processing metrics, in the designated S3 bucket (<your-migration-stats-bucket>).
  • Verify the migration results by checking the summaries of completed partitions in your statistics S3 bucket.
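
For a quick scripted health check, the following boto3 sketch prints the most recent run state of each imported AWS Glue job; the job names match those imported earlier.

import boto3

glue = boto3.client("glue", region_name="us-west-1")

jobs = [
    "cell_performance_data_ExportMssqlToRaw",
    "ue_statistics_data_ExportMssqlToRaw",
    "cellPerfStats_rawToTransformed",
    "truncate_stageTableAurora",
    "cellPerfStats_transformedToAurora",
]

for job in jobs:
    # get_job_runs returns the most recent runs first.
    runs = glue.get_job_runs(JobName=job, MaxResults=1)["JobRuns"]
    state = runs[0]["JobRunState"] if runs else "NO RUNS YET"
    print(f"{job}: {state}")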

Abort/Resume the migration

If you notice any discrepancies during monitoring, you can abort the migration using the AWS Lambda function, which enables operational control by allowing administrators to stop AWS DMS, AWS Glue, or end-to-end wrapper jobs on demand. The solution’s re-entrant points feature allows you to resume the migration from where it left off after interruptions, making the process resilient to failures. You can control this through the job_mode parameter in Parameter Store, setting it to resume when you’re ready to continue the migration.
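
When you’re ready to continue, you can flip the job_mode value directly in Parameter Store, for example with the following boto3 sketch.

import json

import boto3

ssm = boto3.client("ssm")
name = "/migration-parameter/cell-perf-stats"

# Read the current migration control parameter, set job_mode, and write it back.
value = json.loads(ssm.get_parameter(Name=name)["Parameter"]["Value"])
value["job_mode"] = "resume"
ssm.put_parameter(Name=name, Value=json.dumps(value), Type="String", Overwrite=True)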

Optional: Set up AWS DMS for small tables

For straightforward migrations of small tables (less than 10 GB or 10 million records) that don’t require table joins, you can use AWS DMS to migrate the data between the databases. For detailed instructions on setting up AWS DMS, refer to the AWS DMS documentation.
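
As an illustration only, such a full-load AWS DMS task can be created with boto3 along the lines of the following sketch. The endpoint and replication instance ARNs must already exist, the table name is hypothetical, and the table mapping selects a single table from the dbo schema.

import json

import boto3

dms = boto3.client("dms", region_name="us-west-1")

# Selection rule for one small table; adjust the schema and table names as needed.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-small-table",
            "object-locator": {"schema-name": "dbo", "table-name": "small_lookup_table"},
            "rule-action": "include",
        }
    ]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="small-tables-full-load",
    SourceEndpointArn="arn:aws:dms:us-west-1:<ACC-ID>:endpoint:<source-endpoint-id>",
    TargetEndpointArn="arn:aws:dms:us-west-1:<ACC-ID>:endpoint:<target-endpoint-id>",
    ReplicationInstanceArn="arn:aws:dms:us-west-1:<ACC-ID>:rep:<replication-instance-id>",
    MigrationType="full-load",
    TableMappings=json.dumps(table_mappings),
)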

Clean up

To avoid incurring future charges, delete the following resources used in the solution deployment:

  1. Database resources
    1. Delete the Amazon Aurora PostgreSQL cluster and its associated instances.
    2. Delete the SQL Server instances by terminating the EC2 instances.
  2. AWS Glue resources
    1. Navigate to AWS Glue console and delete the imported Glue jobs.
    2. Delete any AWS Glue connections that were created.
  3. Amazon ECS resources
    1. Delete the Fargate service and ECS cluster from the Amazon ECS console.
  4. Amazon ECR resources
    1. Delete the repository named “data-mig-app” from the Amazon ECR console.
  5. IAM roles
    1. Delete the ECS task execution role.
    2. Delete the AWS Glue service role.
  6. S3 resources
    1. Delete the raw data bucket.
    2. Delete the transformed data bucket.
    3. Delete the migration statistics bucket.
    4. Delete the Glue asset bucket.
  7. Parameter Store parameters
    1. Delete the migration control parameter.
    2. Delete the server mappings parameter.
  8. Secrets Manager secret
    1. Delete the SQL Server authentication secret.
  9. Monitoring resources
    1. Delete CloudWatch log groups created for the migration.
    2. Delete any CloudWatch alarms or metrics created.
    3. Delete any trails in AWS CloudTrail created for migration monitoring.
    4. Delete SNS topics and subscriptions created for migration alerts.

Conclusion

In this post, we demonstrated a comprehensive solution for migrating SQL Server databases to Amazon Aurora PostgreSQL-Compatible that addresses the challenges of handling large-scale, distributed data migrations. Our solution provides an automated, reliable pathway for database modernization while maintaining minimal disruption to live environments.

The solution follows the guidelines provided by the AWS Well-Architected Framework. It delivers several key benefits for organizations looking to modernize their database infrastructure:

  • Flexibility – The solution accommodates various migration scenarios through configurable parameters and mapping files, allowing you to customize transformations without modifying core code.
  • Scalability – With support for chunking data at day/month levels and parallel processing capabilities, the solution efficiently handles multi-terabyte databases distributed across multiple servers.
  • Observability – Comprehensive monitoring and reporting mechanisms provide clear visibility into the migration progress, with statistics captured for each phase and notifications for any failures.
  • Reusability – The solution’s modular design means it can be quickly adapted for different database migration projects across your organization.
  • Resilience – Re-entrant jobs and the ability to pause and resume migrations make sure that even complex migrations can recover gracefully from interruptions.

We invite you to try this database migration solution, available in the GitHub repo, to accelerate your journey to a more scalable, cost-effective database solution on Amazon Aurora.


About the authors

Aman Verma

Aman Verma is a Delivery Consultant at AWS Professional Services, bringing over 18 years of experience in data and technology. He is passionate about big data, distributed systems, and AI/ML. He loves to design and build efficient end-to-end solutions on AWS. In his spare time, he loves reading and traveling.

David Zhang

David is an Engagement Manager at AWS Professional Services, where he leads enterprise-scale AI/ML and cloud transformation initiatives for Fortune 100 customers in telecom, finance, media, and entertainment. Outside of work, he enjoys experimenting with new recipes in his kitchen, playing tenor saxophone, and capturing life’s moments through his camera.

Jignesh Patel

Jignesh is a Principal Engineer at AWS, where he architects and delivers scalable enterprise solutions across multiple verticals. Beyond his professional achievements, he continues to struggle to achieve a competitive edge on the tennis court in his free time.