AWS Database Blog

Simplify data integration using zero-ETL from Amazon RDS to Amazon Redshift

Organizations rely on real-time analytics to gain insights into their core business drivers, enhance operational efficiency, and maintain a competitive edge. Traditionally, this has involved the use of complex extract, transform, and load (ETL) pipelines. ETL is the process of combining, cleaning, and normalizing data from different sources to prepare it for analytics, AI, and machine learning (ML) workloads. Although ETL processes have long been a staple of data integration, they often prove time-consuming, complex, and less adaptable to the fast-changing demands of modern data architectures. By transitioning towards zero-ETL architectures, you can foster agility in analytics, streamline processes, and make sure that data is immediately actionable.

In this post, we demonstrate how you can set up a zero-ETL integration between Amazon Relational Database Service (Amazon RDS) for MySQL or Amazon RDS for PostgreSQL (source) and Amazon Redshift (destination). The transactional data from the source is refreshed in near real time on the destination, which processes analytical queries.

What is zero-ETL?

Zero-ETL is a set of fully managed integrations by AWS that minimizes the need to build ETL data pipelines. It makes data available in Amazon Redshift from multiple operational, transactional, and enterprise sources. Zero-ETL integrations help unify your data across applications and data sources for holistic insights and break down data silos. They provide a fully managed, no-code, near real-time solution for making petabytes of transactional data available in Amazon Redshift within seconds of data being written into Amazon RDS for MySQL or Amazon RDS for PostgreSQL. This alleviates the need to create your own ETL jobs, simplifying data ingestion, reducing your operational overhead, and potentially lowering your overall data processing costs. This enables you to focus more on deriving actionable insights and less on managing the complexities of data integration. The following image illustrates how you can achieve near real-time analytics from Amazon RDS to Amazon Redshift.

zetl overview

A zero-ETL integration makes the data in your RDS database available in Amazon Redshift in near real-time. When that data is in Amazon Redshift, you can power your analytics, ML, and AI workloads using the built-in capabilities of Amazon Redshift, such as ML, materialized views, data sharing, federated access to multiple data stores and data lakes, and integrations with Amazon SageMaker, Amazon QuickSight, and other AWS services.

Solution overview

To create a zero-ETL integration, you specify an RDS database as the source, and a Redshift data warehouse as the target. The integration replicates data from the source database into the target data warehouse. The following diagram illustrates this architecture.

Analytics using zero-ETL

You’ll use the AWS Command Line Interface (AWS CLI) to create the zero-ETL integration. To do so, you’ll first create a source RDS database instance and a target Redshift cluster, and then initiate the integration.

Prerequisites

You must have the following prerequisites:

  • The AWS CLI v2 installed and configured with appropriate credentials.
  • Sufficient AWS Identity and Access Management (IAM) permissions to create and configure Amazon RDS. For more details, refer to Creating an Amazon RDS DB instance.
  • An RDS for MySQL or RDS for PostgreSQL (source) DB instance set up and accessible on its SQL port. For this post, we use RDS DB instances with MySQL 8.0 or PostgreSQL 15.7.
  • An Amazon Elastic Compute Cloud (Amazon EC2) security group set up that allows connections on the database ports of the source and target instances.

Create a custom RDS DB parameter group

MySQL: For a MySQL source, use the following command to create a custom RDS DB parameter group. This example creates a parameter group for MySQL 8.0.

aws rds create-db-parameter-group \ 
--db-parameter-group-name zetl-mysql-parameter-group \ 
--db-parameter-group-family mysql8.0 \ 
--description "Parameter group for mysql" \ 
--region {region}

Then modify the binlog_format and binlog_row_image parameter values. The binary logging format is important because it determines the record of data changes that is recorded in the source and sent to the replication targets. For information about the advantages and disadvantages of different binary logging formats for replication, see Advantages and Disadvantages of Statement-Based and Row-Based Replication in the MySQL documentation.

aws rds modify-db-parameter-group \ 
--db-parameter-group-name zetl-mysql-parameter-group \ 
--region {region} \ 
--parameters "ParameterName=binlog_format,ParameterValue=ROW,ApplyMethod=immediate" \ 
"ParameterName=binlog_row_image,ParameterValue=full,ApplyMethod=immediate"

PostgreSQL: For a PostgreSQL source, use the following command to create a custom RDS DB parameter group. This example creates a parameter group for PostgreSQL 15.

aws rds create-db-parameter-group \ 
--db-parameter-group-name zetl-pgsql-parameter-group \ 
--db-parameter-group-family postgres15 \ 
--description "Parameter group for postgresql" \ 
--region {region} 

Then modify the rds.logical_replication, rds.replica_identity_full, session_replication_role, wal_sender_timeout, max_wal_senders, and max_replication_slots parameters to set their new values. See the zero-ETL documentation for more information about these parameters.

aws rds modify-db-parameter-group \ 
--db-parameter-group-name zetl-pgsql-parameter-group \ 
--region {region} \ 
--parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot" \ 
"ParameterName=rds.replica_identity_full,ParameterValue=1,ApplyMethod=immediate" \ 
"ParameterName=session_replication_role,ParameterValue=origin,ApplyMethod=immediate" \ 
"ParameterName=wal_sender_timeout,ParameterValue=0,ApplyMethod=immediate" \ 
"ParameterName=max_wal_senders,ParameterValue=20,ApplyMethod=immediate" \ 
"ParameterName=max_replication_slots,ParameterValue=20,ApplyMethod=immediate" 
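The `--parameters` values above all follow the same `ParameterName=...,ParameterValue=...,ApplyMethod=...` shorthand. If you script these changes, a small helper (a sketch, not part of the AWS CLI or SDK) can render the strings from a plain mapping, which avoids quoting mistakes:

```python
def to_cli_parameters(overrides):
    """Render {name: (value, apply_method)} into the shorthand strings
    expected by `aws rds modify-db-parameter-group --parameters`."""
    return [
        f"ParameterName={name},ParameterValue={value},ApplyMethod={method}"
        for name, (value, method) in overrides.items()
    ]

# Example: the PostgreSQL overrides used in this post.
pg_overrides = {
    "rds.logical_replication": ("1", "pending-reboot"),
    "wal_sender_timeout": ("0", "immediate"),
    "max_wal_senders": ("20", "immediate"),
    "max_replication_slots": ("20", "immediate"),
}
print(" ".join(to_cli_parameters(pg_overrides)))
```

You can paste the printed output directly after `--parameters` in the commands shown above.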

Select or create a source RDS database

If you already have an RDS instance, you can use that, or you can create a new one with the following commands.

To create a new RDS for MySQL DB instance with engine version 8.0.42:

aws rds create-db-instance \ 
--engine mysql \ 
--engine-version 8.0.42 \ 
--db-instance-class db.r5.large \ 
--master-username test \ 
--master-user-password ****** \ 
--db-parameter-group-name zetl-mysql-parameter-group \ 
--allocated-storage 10 \ 
--db-instance-identifier zetl-db \ 
--region {region} 
{
    "DBInstance": {
        "DBInstanceIdentifier": "zetl-db",
        "DBInstanceClass": "db.r5.large",
        "Engine": "mysql",
        "DBInstanceStatus": "creating",
        "DBInstanceArn": "arn:aws:rds:{region}:{account-id}:db:zetl-db",
        //Skipping rest of the API response
    }
}

To create a new RDS for PostgreSQL DB instance with engine version 15.7:

aws rds create-db-instance \ 
--db-name zetldb \ 
--engine postgres \ 
--engine-version 15.7 \ 
--db-instance-class db.r5.large \ 
--master-username test \ 
--master-user-password ****** \ 
--db-parameter-group-name zetl-pgsql-parameter-group \ 
--allocated-storage 10 \ 
--db-instance-identifier zetl-pg-db \ 
--region {region} 
{     
    "DBInstance": {
        "DBInstanceIdentifier": "zetl-pg-db",
        "DBInstanceClass": "db.r5.large",
        "Engine": "postgres",
        "DBInstanceStatus": "creating",
        "DBName": "zetldb",
        "DBInstanceArn": "arn:aws:rds:{region}:{account-id}:db:zetl-pg-db",
        //Skipping rest of the API response
    }
}

Wait for your RDS instance to reach the available status. You can make a describe-db-instances API call to verify the DB instance status.

RDS for MySQL:

aws rds describe-db-instances --filters 'Name=db-instance-id,Values=zetl-db' | grep DBInstanceStatus 

RDS for PostgreSQL:

aws rds describe-db-instances --filters 'Name=db-instance-id,Values=zetl-pg-db' | grep DBInstanceStatus 
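Instead of re-running describe-db-instances by hand, you can script the wait (the AWS CLI also ships a built-in waiter, `aws rds wait db-instance-available`). The loop below is a minimal sketch; `fetch_status` is a stand-in for whatever call returns the current `DBInstanceStatus`:

```python
import time

def wait_until(fetch_status, target="available", interval=30, timeout=1800, sleep=time.sleep):
    """Poll fetch_status() every `interval` seconds until it returns
    `target`, raising TimeoutError if `timeout` seconds elapse first."""
    waited = 0
    while True:
        status = fetch_status()
        if status == target:
            return status
        if waited >= timeout:
            raise TimeoutError(f"still {status!r} after {timeout}s")
        sleep(interval)
        waited += interval
```

In practice you would pass a function that shells out to `aws rds describe-db-instances` (or calls the SDK) and extracts `DBInstanceStatus`; the same loop works later for the Redshift cluster and the integration status.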

Reboot your RDS database instance so that the parameter group values can be applied (especially the ones that require a reboot).

RDS for MySQL:

aws rds reboot-db-instance --db-instance-identifier zetl-db

RDS for PostgreSQL:

aws rds reboot-db-instance --db-instance-identifier zetl-pg-db

Create a subnet group for the Redshift cluster

Use the following command to create a cluster subnet group:

aws redshift create-cluster-subnet-group \
--cluster-subnet-group-name zetl-subnet-group \
--subnet-ids "subnet-*******" "subnet-*******" "subnet-*******" "subnet-*******" \
--description "subnet group for redshift" \
--region {region}
{
    "ClusterSubnetGroup": {
        "ClusterSubnetGroupName": "zetl-subnet-group",
        "Description": "subnet group for redshift",
        "VpcId": "vpc-********",
        "SubnetGroupStatus": "Complete",
        //Skipping rest of API response
    }
}

Create a custom parameter group for the Redshift cluster

Use the following command to create a custom parameter group for the Redshift cluster:

aws redshift create-cluster-parameter-group \
--parameter-group-name zetl-redshift-parameter-group \
--parameter-group-family redshift-1.0 \
--description "cluster parameter group for zetl" \
--region {region}
{
    "ClusterParameterGroup": {
        "ParameterGroupName": "zetl-redshift-parameter-group",
        "ParameterGroupFamily": "redshift-1.0",
        "Description": "cluster parameter group for zetl",
        "Tags": []
    }
}

Modify the enable_case_sensitive_identifier parameter and set its value to ON. This is required to support the case sensitivity of source table and column names. The enable_case_sensitive_identifier parameter is a configuration value that determines whether name identifiers of databases, tables, and columns are case sensitive. This parameter must be turned on to create zero-ETL integrations in the data warehouse.

aws redshift modify-cluster-parameter-group \ 
--parameter-group-name zetl-redshift-parameter-group \ 
--parameters ParameterName=enable_case_sensitive_identifier,ParameterValue=ON \ 
--region {region}  
{ 
    "ParameterGroupName": "zetl-redshift-parameter-group", 
    "ParameterGroupStatus": "Your parameter group has been updated. If you changed only dynamic parameters, associated clusters are being modified now. If you changed static parameters, all updates, including dynamic parameters, will be applied when you reboot the associated clusters." 
} 

Select or create the target Redshift cluster

If you already have a Redshift cluster, you can use that, or you can create a new cluster with the following command:

aws redshift create-cluster \
--cluster-identifier zetl-redshift-cluster \
--cluster-parameter-group-name zetl-redshift-parameter-group \
--port 5439 \
--master-username test \
--master-user-password *****\
--node-type ra3.xlplus \
--number-of-nodes 2 \
--maintenance-track CURRENT \
--cluster-subnet-group-name zetl-subnet-group \
--region {region}
{
    "Cluster": {
        "ClusterIdentifier": "zetl-redshift-cluster",
        "NodeType": "ra3.xlplus",
        "ClusterStatus": "creating",
        "ClusterAvailabilityStatus": "Modifying",
        "MasterUsername": "test",
        "DBName": "dev",
        "ClusterSubnetGroupName": "zetl-subnet-group",
        "ClusterNamespaceArn": "arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid",
        //Skipping rest of API response
    }
}

Wait for your cluster to be available. You can make a describe-clusters API call to verify the cluster status:

aws redshift describe-clusters --query 'Clusters[?ClusterIdentifier==`zetl-redshift-cluster`]' | grep ClusterStatus 

Configure authorization using an Amazon Redshift resource policy

You can use the Amazon Redshift API operations to configure resource policies that work with zero-ETL integrations. To control which source can create an inbound integration into the Amazon Redshift namespace, create a resource policy and attach it to the namespace of your target data warehouse. The policy specifies the source that is allowed to create an inbound integration and replicate live data from the source into Amazon Redshift.
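The put-resource-policy calls below embed the policy as an escaped JSON string. If the quoting becomes unwieldy, you can generate the same document with a short script (a sketch; the ARNs are the same placeholders used throughout this post) and pass the file to the CLI instead:

```python
import json

def integration_policy(namespace_arn, source_arn, account_id):
    """Build the Redshift namespace resource policy that authorizes a
    zero-ETL inbound integration from the given RDS source."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow the Redshift service to authorize the inbound
                # integration, but only from the named RDS source.
                "Effect": "Allow",
                "Principal": {"Service": "redshift.amazonaws.com"},
                "Action": ["redshift:AuthorizeInboundIntegration"],
                "Condition": {"StringEquals": {"aws:SourceArn": source_arn}},
            },
            {
                # Allow the account to create the inbound integration.
                "Effect": "Allow",
                "Principal": {"AWS": account_id},
                "Action": ["redshift:CreateInboundIntegration"],
            },
        ],
    }

policy = integration_policy(
    "arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid",
    "arn:aws:rds:{region}:{account-id}:db:zetl-db",
    "{account-id}",
)
with open("policy.json", "w") as f:
    json.dump(policy, f, indent=2)
```

You can then attach it with `aws redshift put-resource-policy --resource-arn <namespace ARN> --policy file://policy.json`, which is equivalent to the inline commands that follow.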

RDS for MySQL:

aws redshift put-resource-policy \
--resource-arn arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid \
--policy "{ \"Version\": \"2012-10-17\", \"Statement\": \
[ { \"Effect\": \"Allow\",\"Principal\": { \"Service\": \
\"redshift.amazonaws.com\" } , \"Action\": [ \
\"redshift:AuthorizeInboundIntegration\"], \
\"Condition\":{\"StringEquals\":{\"aws:SourceArn\":\"arn:aws:rds:{region}:{account-id}:db:zetl-db\"}} },\ 
{\"Effect\":\"Allow\", \"Principal\":{\"AWS\":\"{account-id}\"}, \
\"Action\":[\"redshift:CreateInboundIntegration\"] }] }" \
--region {region}
{
    "ResourcePolicy": {
        "ResourceArn": "arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid",
        "Policy": "{\n  \"Version\" : \"2012-10-17\",\n  \"Statement\" : [ {\n    \"Effect\" : \"Allow\",\n    \"Principal\" : {\n      \"Service\" : \"redshift.amazonaws.com\"\n    },\n    \"Action\" : \"redshift:AuthorizeInboundIntegration\",\n    \"Resource\" : \"arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid\",\n    \"Condition\" : {\n      \"StringEquals\" : {\n        \"aws:SourceArn\" : \"arn:aws:rds:{region}:{account-id}:db:zetl-db\"\n      }\n    }\n  }, {\n    \"Effect\" : \"Allow\",\n    \"Principal\" : {\n      \"AWS\" : \"arn:aws:iam::{account-id}:root\"\n    },\n    \"Action\" : \"redshift:CreateInboundIntegration\",\n    \"Resource\" : \"arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid\"\n  } ]\n}"
    }
}

RDS for PostgreSQL:

aws redshift put-resource-policy \
--resource-arn arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid \
--policy "{ \"Version\": \"2012-10-17\", \"Statement\": \
[ { \"Effect\": \"Allow\",\"Principal\": { \"Service\": \
\"redshift.amazonaws.com\" } , \"Action\": [ \
\"redshift:AuthorizeInboundIntegration\"], \
\"Condition\":{\"StringEquals\":{\"aws:SourceArn\":\"arn:aws:rds:{region}:{account-id}:db:zetl-pg-db\"}} },\ 
{\"Effect\":\"Allow\", \"Principal\":{\"AWS\":\"{account-id}\"}, \
\"Action\":[\"redshift:CreateInboundIntegration\"] }] }" \
--region {region}
{
    "ResourcePolicy": {
        "ResourceArn": "arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid",
        "Policy": "{\n  \"Version\" : \"2012-10-17\",\n  \"Statement\" : [ {\n    \"Effect\" : \"Allow\",\n    \"Principal\" : {\n      \"Service\" :  \"redshift.amazonaws.com\"\n    },\n    \"Action\" :  \"redshift:AuthorizeInboundIntegration\",\n    \"Resource\" : \"arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid\",\n    \"Condition\" : {\n      \"StringEquals\" : {\n        \"aws:SourceArn\" : \"arn:aws:rds:{region}:{account-id}:db:zetl-pg-db\"\n      }\n    }\n  }, {\n    \"Effect\" : \"Allow\",\n    \"Principal\" : {\n      \"AWS\" : \"arn:aws:iam::{account-id}:root\"\n    },\n    \"Action\" : \"redshift:CreateInboundIntegration\",\n    \"Resource\" : \"arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid\"\n  } ]\n}" 
    }
}

Load data in the source RDS database

If you’re using MySQL, connect to the source MySQL database and run the following commands:

mysql -h zetl-db.************.us-east-1.rds.amazonaws.com  -u test -P 3306 -p
MySQL [(none)]> CREATE DATABASE my_db;
MySQL [(none)]> USE my_db;
MySQL [my_db]> CREATE TABLE books_table (ID int NOT NULL, Title VARCHAR(50) NOT NULL, Author VARCHAR(50) NOT NULL,
Copyright INT NOT NULL, Genre VARCHAR(50) NOT NULL, PRIMARY KEY (ID));
MySQL [my_db]> INSERT INTO books_table VALUES (1, 'The Shining', 'Stephen King', 1977, 'Supernatural fiction');
MySQL [my_db]> commit;

If you’re using PostgreSQL, connect to the source PostgreSQL database and run the following commands:

psql -h zetl-pg-db.************.us-east-1.rds.amazonaws.com -d zetldb -U test -p 5432 -W
zetldb=> CREATE TABLE books_table (ID int primary key, Title VARCHAR(50), Author VARCHAR(50),Copyright int, Genre VARCHAR(50));
zetldb=> INSERT INTO books_table VALUES (1, 'The Shining', 'Stephen King', 1977, 'Supernatural fiction');
zetldb=> commit;

This data serves as our historical data. After we create an integration, we will generate new live data.

Create a zero-ETL integration

In this step, we create an Amazon RDS zero-ETL integration with Amazon Redshift where we specify the source RDS database and the target Redshift data warehouse. You can optionally also provide data filters, an AWS Key Management Service (AWS KMS) key that you want to use for encryption, tags, and other configurations.

RDS for MySQL:

aws rds create-integration \
--source-arn arn:aws:rds:{region}:{account-id}:db:zetl-db \
--target-arn arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid \
--integration-name zetl-test-integration \
--data-filter "include: *.*" \
--region {region}
{
    "SourceArn": "arn:aws:rds:{region}:{account-id}:db:zetl-db",
    "TargetArn": "arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid",
    "IntegrationName": "zetl-test-integration",
    "IntegrationArn": "arn:aws:rds:{region}:{account-id}:integration:integration-uuid",
    "KMSKeyId": "arn:aws:kms:{region}:{account-id}:key/eda7134d-cd39-4af1-b62b-ad2415b6bccc",
    "Status": "creating",
    "Tags": [],
    "CreateTime": "2025-05-29T16:56:58.117000+00:00",
    "DataFilter": "include: *.*"
}

RDS for PostgreSQL:

aws rds create-integration \
--source-arn arn:aws:rds:{region}:{account-id}:db:zetl-pg-db \
--target-arn arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid \
--integration-name zetl-test-pg-integration \
--data-filter "include: zetldb.*.*" \
--region {region}
{
    "SourceArn": "arn:aws:rds:{region}:{account-id}:db:zetl-pg-db",
    "TargetArn": "arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid",
    "IntegrationName": "zetl-test-pg-integration",
    "IntegrationArn": "arn:aws:rds:{region}:{account-id}:integration:integration-uuid",
    "KMSKeyId": "arn:aws:kms:{region}:{account-id}:key/bfc67da5-33ff-4255-ae85-17fb2eb6503c",
    "Status": "creating",
    "Tags": [],
    "CreateTime": "2025-07-30T23:14:06.575000+00:00",
    "DataFilter": "include: zetldb.*.*"
}
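Data filters are comma-separated `include:`/`exclude:` expressions over `database.table` identifiers for MySQL and `database.schema.table` identifiers for PostgreSQL, with `*` wildcards. The sketch below approximates the matching with `fnmatch` so you can reason about a filter before applying it; treat it as illustrative (the last-matching-expression-wins rule is an assumption here) and check the zero-ETL data filtering documentation for the exact semantics:

```python
from fnmatch import fnmatch

def included(identifier, data_filter):
    """Illustrative matcher for zero-ETL data filter expressions such as
    'include: zetldb.*.*, exclude: zetldb.public.tmp_*'.
    Assumes later expressions take precedence over earlier ones."""
    decision = False  # nothing is replicated unless an include matches
    for clause in data_filter.split(","):
        action, _, pattern = clause.strip().partition(":")
        if fnmatch(identifier, pattern.strip()):
            decision = (action.strip() == "include")
    return decision
```

For example, `included("zetldb.public.books_table", "include: zetldb.*.*")` is true, while adding `, exclude: *.*.tmp_*` would drop any table whose name starts with `tmp_`.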

Monitor the integration

Make a describe API call to verify the integration is in active status:

aws rds describe-integrations --region {region}
{
    "Integrations": [
      {
        "SourceArn": "arn:aws:rds:{region}:{account-id}:db:zetl-db",
        "TargetArn": "arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid",
        "IntegrationName": "zetl-test-integration",
        "IntegrationArn": "arn:aws:rds:{region}:{accountid}:integration:integration-uuid",
        "KMSKeyId": "arn:aws:kms:{region}:{accountid}:key/eda7134d-cd39-4af1-b62b-ad2415b6bccc", 
        "AdditionalEncryptionContext": {},
        "Status": "active",
        "Tags": [],
        "CreateTime": "2025-05-29T16:56:58.117000+00:00",
        "DataFilter": "include: *.*"
      }
    ]
}

The SourceArn will be arn:aws:rds:{region}:{account-id}:db:zetl-db for MySQL and arn:aws:rds:{region}:{account-id}:db:zetl-pg-db for PostgreSQL.

Verify the solution

To verify the solution, we use Amazon Redshift Query Editor v2 to check the data on the target cluster. To use Query Editor v2, you need to create a database and connect to it. For instructions, see Creating destination databases in Amazon Redshift. We use mydb as the destination database.

Verify the historical data in Amazon Redshift.

MySQL:

MySQL Full Load Data in Redshift

PostgreSQL:

PostgreSQL Full Load Data in Redshift

Next, add some new live data on the source database:

RDS for MySQL:

MySQL [my_db]> INSERT INTO books_table VALUES (2, 'AWS', 'Jeff', 1960, 'Amazon');

RDS for PostgreSQL:

zetldb=> INSERT INTO books_table VALUES (2, 'AWS', 'Jeff', 1960, 'Amazon');

Verify the new changes you made to the source database are now replicated in Amazon Redshift within seconds.

MySQL:

MySQL CDC data in Redshift

PostgreSQL:

PostgreSQL CDC Data in Redshift

You have successfully configured a zero-ETL integration, and new changes on the source will be replicated to the target. Note that a few limitations apply to RDS zero-ETL integrations with Amazon Redshift; refer to the documentation for details.

Clean up

After verification is complete, you can clean up the resources you created.

RDS for MySQL:

aws rds delete-integration \
--integration-identifier arn:aws:rds:{region}:{account-id}:integration:integration-uuid --region {region}
{
    "SourceArn": "arn:aws:rds:{region}:{account-id}:db:zetl-db",
    "TargetArn": "arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid",
    "IntegrationName": "zetl-test-integration",
    "IntegrationArn": "arn:aws:rds:{region}:{account-id}:integration:integration-uuid",
    "KMSKeyId": "arn:aws:kms:{region}:{account-id}:key/eda7134d-cd39-4af1-b62b-ad2415b6bccc",
    "AdditionalEncryptionContext": {},
    "Status": "deleting",
    "Tags": [],
    "CreateTime": "2025-05-29T16:56:58.117000+00:00",
    "DataFilter": "include: *.*"
}

RDS for PostgreSQL:

aws rds delete-integration \
--integration-identifier arn:aws:rds:{region}:{account-id}:integration:integration-uuid --region {region}
{
    "SourceArn": "arn:aws:rds:{region}:{account-id}:db:zetl-pg-db",
    "TargetArn": "arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid",
    "IntegrationName": "zetl-test-pg-integration",
    "IntegrationArn": "arn:aws:rds:{region}:{account-id}:integration:integration-uuid",
    "KMSKeyId": "arn:aws:kms:{region}:{account-id}:key/bfc67da5-33ff-4255-ae85-17fb2eb6503c",
    "AdditionalEncryptionContext": {},
    "Status": "deleting",
    "Tags": [],
    "CreateTime": "2025-07-30T23:14:06.575000+00:00",
    "DataFilter": "include: zetldb.*.*"
}

You can also clean up the Redshift cluster, RDS instance, parameter groups, and subnet group that you created. To delete the Redshift cluster without taking a final snapshot, use the following command:

aws redshift delete-cluster --cluster-identifier zetl-redshift-cluster --skip-final-cluster-snapshot

To delete the RDS DB instance without taking a final snapshot, run the following command (repeat with zetl-pg-db to delete the PostgreSQL instance):

aws rds delete-db-instance --db-instance-identifier zetl-db --skip-final-snapshot

Conclusion

In this post, we showed how you can run a zero-ETL integration from Amazon RDS for MySQL and Amazon RDS for PostgreSQL to Amazon Redshift using the AWS CLI. This minimizes the need to maintain complex data pipelines and enables near real-time analytics on transactional and operational data. With zero-ETL integrations, you can focus more on deriving value from your data and less on managing data movement.

As next steps, consider exploring how you can apply this zero-ETL approach to other data sources in your organization. You might also want to investigate how to combine zero-ETL with the advanced analytics capabilities of Amazon Redshift, such as ML integration or federated queries. To learn more about zero-ETL integrations and start implementing them in your own environment, refer to zero-ETL documentation and begin simplifying your data integration today.


About the authors

Harpreet Kaur Chawla

Harpreet is a Senior Software Development Engineer at AWS. She specializes in building scalable, secure, serverless solutions that simplify analytics, data integration, and migration strategies, helping organizations modernize their data architectures. She holds several patents in distributed systems and database technology. Outside of work, she enjoys volunteering to support women in tech, judging and mentoring early and mid-career engineers.