Guidance for Connecting CDPs to Data Lakes with AWS Clean Rooms

Overview

This Guidance shows you how to use customer data platforms (CDPs) to set up a collaboration between first-party marketing data and third-party data from a publishing partner. By using an AWS Clean Rooms collaboration, CDPs can facilitate the connection between separate data lakes on AWS. Marketers can upload their data to the CDP application, then use the application to run reports from the compiled data, helping them activate their audiences.

How it works

This architecture diagram shows how marketers using customer data platforms (CDPs) can set up AWS Clean Rooms collaborations with publishing partners to combine first- and third-party customer data directly.

Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

You can configure Amazon CloudWatch, which continuously monitors operations and provides access to log files, to monitor the reliability, availability, and performance of AWS Clean Rooms. AWS CloudTrail automatically records event history, so you can see who made a request to AWS Clean Rooms, the IP address the request came from, when it was made, and additional details. You can also configure a trail to capture more detail when tracking API requests.
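As an illustration of the CloudTrail details described above, the following is a minimal sketch of reducing a CloudTrail event to who made the request, from which IP address, and when. It assumes the event source name `cleanrooms.amazonaws.com` and the record shape returned by the boto3 `lookup_events` call; the function names `summarize_event` and `recent_clean_rooms_events` are hypothetical and are not part of this Guidance.

```python
import json
from typing import Any, Dict, List

# Assumption: the CloudTrail event source name used by AWS Clean Rooms.
CLEAN_ROOMS_EVENT_SOURCE = "cleanrooms.amazonaws.com"


def summarize_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce one CloudTrail lookup result to action, user, source IP, and time."""
    # The full record arrives as a JSON string under "CloudTrailEvent".
    detail = json.loads(event.get("CloudTrailEvent") or "{}")
    return {
        "action": event.get("EventName"),
        "user": event.get("Username"),
        "source_ip": detail.get("sourceIPAddress"),
        "time": event.get("EventTime"),
    }


def recent_clean_rooms_events(max_results: int = 20) -> List[Dict[str, Any]]:
    """Fetch recent Clean Rooms API events. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so summarize_event works without boto3

    client = boto3.client("cloudtrail")
    response = client.lookup_events(
        LookupAttributes=[
            {
                "AttributeKey": "EventSource",
                "AttributeValue": CLEAN_ROOMS_EVENT_SOURCE,
            }
        ],
        MaxResults=max_results,
    )
    return [summarize_event(e) for e in response["Events"]]
```

Keeping the summarizing step separate from the API call makes it possible to check the parsing logic against a saved event without AWS access.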

Read the Operational Excellence whitepaper

This Guidance lets you use scoped-down AWS Identity and Access Management (IAM) policies to grant access only to specific users and roles. Using IAM, you can apply the principle of least privilege to restrict who can access AWS Clean Rooms and run queries on it.
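A scoped-down policy of the kind described above might look like the following sketch. It builds an IAM policy document that allows only query-related actions on a single membership. The action list, the membership ARN format, and the helper name `query_only_policy` are assumptions chosen for illustration; check them against the AWS Clean Rooms entry in the Service Authorization Reference before use.

```python
import json
from typing import Any, Dict, List

# Assumption: a minimal set of actions for starting and reading queries.
QUERY_ACTIONS: List[str] = [
    "cleanrooms:StartProtectedQuery",
    "cleanrooms:GetProtectedQuery",
    "cleanrooms:ListProtectedQueries",
]


def query_only_policy(region: str, account_id: str, membership_id: str) -> Dict[str, Any]:
    """Build a policy document limited to query actions on one membership."""
    # Assumption: membership ARN format for AWS Clean Rooms.
    membership_arn = (
        f"arn:aws:cleanrooms:{region}:{account_id}:membership/{membership_id}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "RunQueriesInOneMembership",
                "Effect": "Allow",
                "Action": list(QUERY_ACTIONS),
                "Resource": membership_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder identifiers, not real resources.
    policy = query_only_policy("us-east-1", "111122223333", "example-membership-id")
    print(json.dumps(policy, indent=2))
```

Naming one membership in `Resource` rather than using a wildcard is what keeps the policy in line with least privilege.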

Read the Security whitepaper

Amazon S3 stores multiple copies of data across Availability Zones, providing 99.999999999 percent durability of the data stored within S3 buckets. Additionally, AWS Glue and AWS Clean Rooms are serverless and fully managed by AWS, so the overall infrastructure is elastic, highly available, and fault tolerant, with built-in reliability and resiliency.

Read the Reliability whitepaper

AWS Glue crawlers enable you to quickly scan your data, define its schemas, and register those schemas in your Data Catalog. You can configure these crawlers to run on a schedule or start them on demand to crawl source data. You can also configure AWS Glue to scale up or down within a specified range of AWS Glue job workers so that it uses only as much compute capacity as needed. Additionally, AWS Clean Rooms enables you to share subsets of your data quickly and securely, and it provisions only the capacity necessary to run a query.
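The two crawler modes mentioned above, scheduled and on demand, can be sketched as request parameters for the boto3 Glue `create_crawler` call. The helper name `crawler_request`, the default cron expression, and all resource names are hypothetical; this is an illustration under those assumptions, not the configuration used by this Guidance.

```python
from typing import Any, Dict, Optional


def crawler_request(
    name: str,
    role_arn: str,
    database: str,
    s3_path: str,
    schedule: Optional[str] = "cron(0 2 * * ? *)",  # assumed: daily at 02:00 UTC
) -> Dict[str, Any]:
    """Build keyword arguments for glue.create_crawler.

    Pass schedule=None for a crawler that only runs when started on demand.
    """
    request: Dict[str, Any] = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule is not None:
        request["Schedule"] = schedule
    return request


def create_and_start(request: Dict[str, Any]) -> None:
    """Create the crawler and run it once. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so crawler_request works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])
```

A scheduled crawler suits data that lands at predictable times, while an on-demand crawler can be started by whatever process delivers new source data.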

Read the Performance Efficiency whitepaper

Amazon S3 provides low-cost storage for building data lakes and storing data. It also provides different storage tiers and lifecycle policies to optimize storage. For example, you can use Amazon S3 Intelligent-Tiering to provide automated data archiving based on usage or implement lifecycle policies to move data between storage tiers, helping you optimize costs. Additionally, this Guidance uses pay-as-you-go services, so you pay only for what you consume.
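As a rough illustration of the lifecycle policies mentioned above, the sketch below builds a lifecycle configuration that moves objects under a prefix to another storage class after a set number of days. The helper name `lifecycle_configuration`, the 30-day default, and the bucket and prefix names are assumptions; the dictionary follows the shape accepted by the boto3 S3 `put_bucket_lifecycle_configuration` call.

```python
from typing import Any, Dict

# Storage classes that S3 lifecycle transitions accept as targets.
TRANSITION_CLASSES = {
    "STANDARD_IA",
    "ONEZONE_IA",
    "INTELLIGENT_TIERING",
    "GLACIER_IR",
    "GLACIER",
    "DEEP_ARCHIVE",
}


def lifecycle_configuration(
    prefix: str,
    days: int = 30,  # assumed threshold, tune to your access patterns
    storage_class: str = "INTELLIGENT_TIERING",
) -> Dict[str, Any]:
    """Build a one-rule lifecycle configuration for objects under a prefix."""
    if storage_class not in TRANSITION_CLASSES:
        raise ValueError(f"unsupported storage class: {storage_class}")
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                "ID": f"transition-{storage_class.lower()}-{days}d",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [{"Days": days, "StorageClass": storage_class}],
            }
        ]
    }


def apply_to_bucket(bucket: str, config: Dict[str, Any]) -> None:
    """Apply the configuration. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so lifecycle_configuration works without boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

For example, `lifecycle_configuration("raw/", days=90, storage_class="GLACIER")` would describe archiving objects under a hypothetical `raw/` prefix after 90 days.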

Read the Cost Optimization whitepaper

AWS Clean Rooms enables you to share only subsets of your data, reducing the need for data duplication across multiple platforms. Additionally, this Guidance reduces the need for CDPs to create custom solutions that might require additional compute resources. AWS Glue and AWS Clean Rooms are both serverless services, which means they scale to meet compute needs, for example by provisioning only the compute resources required to run a query. This helps you avoid unnecessary compute and wasted resources, minimizing the carbon footprint of your workload.

Read the Sustainability whitepaper

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.