
Guidance for Connecting Data Sources for Advertising and Marketing Analytical Workloads on AWS

Overview

This Guidance introduces data ingestion patterns for connecting advertising and marketing data to AWS services. Data can come from a variety of data stores, and once activated, can be used for setting up a customer 360 profile, an AWS Clean Rooms collaboration, artificial intelligence and machine learning (AI/ML) training, and analytics applications. This Guidance includes an overview architecture diagram demonstrating the data pipeline in addition to six architectural patterns that show different approaches to provision data for your analytical workloads.

How it works

Overview

This architecture diagram shows an overview of how to connect data from a variety of data sources to AWS.

Diagram illustrating the AWS architecture for connecting data sources in advertising and marketing analytics. Shows the flow from data sources (SaaS applications, databases, file shares) through ingestion, storage, and transformation, to consumption by services like AWS Clean Rooms, AWS Entity Resolution, Amazon SageMaker, Amazon Redshift, Amazon Athena, and Amazon QuickSight with use cases such as Customer 360, data clean rooms, and measurement attribute verification.

Connecting Amazon Ads and Amazon Selling Partner Data to AWS – API Pull Pattern with AWS Lambda

This architecture diagram shows data ingestion and integration patterns for the Amazon Ads and Amazon Selling Partner APIs.

Architecture diagram showing how to connect advertising and marketing data sources using AWS services such as EventBridge, Lambda, Step Functions, Amazon S3, AWS Glue, DynamoDB, Amazon SNS, and security services, for API data pulls and analytical workloads.
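In this pattern, an EventBridge schedule invokes a Lambda function that calls the Amazon Ads or Selling Partner API and lands the response in Amazon S3, partitioned so AWS Glue can catalog it by date. The sketch below illustrates the landing step only; the bucket name and report type are assumptions, and the actual API call is omitted.

```python
import datetime

def s3_key_for(report_type: str, day: datetime.date) -> str:
    # Hive-style partitioning (year=/month=/day=) so an AWS Glue crawler
    # can register date partitions automatically.
    return (f"raw/{report_type}/year={day.year}/"
            f"month={day.month:02d}/day={day.day:02d}/{report_type}.json")

def handler(event, context):
    import boto3  # available in the Lambda runtime
    s3 = boto3.client("s3")
    report_type = event.get("reportType", "sponsored-products")  # assumption
    # ... call the Amazon Ads / Selling Partner API here (omitted) ...
    payload = b"{}"  # placeholder for the API response body
    s3.put_object(
        Bucket="my-ingest-bucket",  # hypothetical bucket name
        Key=s3_key_for(report_type, datetime.date.today()),
        Body=payload,
    )
```

Writing one object per report per day keeps each Lambda invocation well under the 15-minute limit and leaves heavier row-level work to AWS Glue.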

Connecting SaaS Application Data to AWS – API Pull Pattern with Amazon AppFlow

This architecture diagram introduces data ingestion and a pull pattern for data available in SaaS applications.

Architecture diagram showing the ingestion and processing of data from SaaS applications using Amazon AppFlow, Amazon EventBridge, AWS Step Functions, AWS Lambda, AWS Glue, Amazon S3, and orchestration with Amazon DynamoDB and SNS, including security with AWS Secrets Manager, AWS KMS, IAM, and CloudWatch.
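A flow defined in Amazon AppFlow can be triggered on demand from the orchestration layer, for example from a Lambda function on an EventBridge schedule. A minimal sketch, assuming a flow named by the caller already exists; the optional client token makes retried starts idempotent.

```python
def build_start_flow_args(flow_name: str, client_token: str = "") -> dict:
    """Keyword arguments for the boto3 appflow start_flow call."""
    args = {"flowName": flow_name}
    if client_token:
        # clientToken deduplicates retries of the same logical start request.
        args["clientToken"] = client_token
    return args

def start_ingestion_flow(flow_name: str, client_token: str = "") -> str:
    import boto3  # available in the Lambda runtime
    appflow = boto3.client("appflow")
    response = appflow.start_flow(**build_start_flow_args(flow_name, client_token))
    return response["executionId"]
```

The returned execution ID can be stored in DynamoDB to track run state, as the orchestration layer in the diagram suggests.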

Connecting SaaS Applications to AWS – Push Pattern with Amazon S3

This architecture diagram shows data ingestion and a push pattern for data available in SaaS applications.

Architecture diagram showing integration of SaaS data sources into AWS Cloud for advertising and marketing analytics using Amazon S3, EventBridge, Lambda, Glue, SNS, and security components such as Secrets Manager and IAM.
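In the push pattern, the SaaS provider writes objects directly into your S3 bucket, and an S3 event notification invokes a Lambda function that starts the downstream AWS Glue job. A sketch of that handler, assuming the standard S3 event notification shape; the Glue job name is an assumption.

```python
import urllib.parse

def parse_s3_event(event: dict):
    """Yield (bucket, key) pairs from a standard S3 event notification."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 events are URL-encoded; '+' stands for a space.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key

def handler(event, context):
    import boto3  # available in the Lambda runtime
    glue = boto3.client("glue")
    for bucket, key in parse_s3_event(event):
        # Kick off the transform job for each pushed object
        # ("saas-transform" is a hypothetical job name).
        glue.start_job_run(
            JobName="saas-transform",
            Arguments={"--source": f"s3://{bucket}/{key}"},
        )
```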

Connecting RDBMS Sources to AWS – Batch Pull and Change Data Capture Pattern

This architecture diagram shows how to build a connector for relational database management systems (RDBMS) to AWS.

Architecture diagram showing a batch pull CDC (Change Data Capture) pattern for advertising and marketing analytics on AWS. Illustrates data flow from cloud and relational databases to Amazon S3 using AWS Glue and AWS DMS, event triggering with Amazon EventBridge, data processing with AWS Glue and Step Functions, cataloging with AWS Glue Data Catalog, notification with Amazon SNS, and security with AWS Secrets Manager, AWS KMS, IAM, and Amazon CloudWatch.
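With AWS DMS, the same replication task can perform the initial full load and then stream ongoing changes (CDC) to Amazon S3. A hedged sketch of starting or resuming such a task from the orchestration layer; the task ARN is supplied by the caller.

```python
def replication_start_type(full_load_first: bool) -> str:
    # "start-replication" runs the full load then CDC;
    # "resume-processing" continues CDC from the last checkpoint.
    return "start-replication" if full_load_first else "resume-processing"

def start_cdc_task(task_arn: str, full_load_first: bool = True) -> None:
    """Start (or resume) an AWS DMS replication task for batch pull + CDC."""
    import boto3  # available in the Lambda runtime
    dms = boto3.client("dms")
    dms.start_replication_task(
        ReplicationTaskArn=task_arn,
        StartReplicationTaskType=replication_start_type(full_load_first),
    )
```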

How it works (continued)

Connecting SFTP Data Sources to AWS – Managed File Transfer Pattern

This architecture diagram shows how to build a connector for file systems to AWS.

Architecture diagram illustrating the flow of advertising and marketing analytics data in AWS. It shows files transferred via SFTP to AWS Transfer Family, processed by managed workflows and AWS Lambda, with data stored in Amazon S3 and notifications sent via Amazon SNS, featuring integrated security components like AWS Secrets Manager, AWS KMS, IAM, and Amazon CloudWatch.
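A custom step in a Transfer Family managed workflow invokes a Lambda function, which must report its outcome back with SendWorkflowStepState. A minimal sketch of that handler; the event shape shown (token plus execution details under serviceMetadata) is illustrative, and the file-processing logic is omitted.

```python
def step_state_args(event: dict, ok: bool = True) -> dict:
    """Build the SendWorkflowStepState arguments from the workflow event."""
    details = event["serviceMetadata"]["executionDetails"]
    return {
        "WorkflowId": details["workflowId"],
        "ExecutionId": details["executionId"],
        "Token": event["token"],
        "Status": "SUCCESS" if ok else "FAILURE",
    }

def handler(event, context):
    import boto3  # available in the Lambda runtime
    # ... decrypt or validate the uploaded file referenced by
    #     event["fileLocation"] here (omitted) ...
    boto3.client("transfer").send_workflow_step_state(**step_state_args(event))
```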

Connecting File and Cloud Object Storage to AWS – File Replication Pattern

This architecture diagram shows how to build a connector for cloud-based object storage services to AWS.

Diagram showing file replication for advertising and marketing analytics using AWS services including DataSync, S3, Lambda, Glue, EventBridge, SNS, and associated security features.
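Once a DataSync task is configured between the remote object store and Amazon S3, each replication run is started with StartTaskExecution, optionally narrowed to a path pattern. A sketch, with the task ARN and filter supplied by the caller:

```python
def execution_args(task_arn: str, include_pattern: str = "") -> dict:
    """Build the StartTaskExecution arguments, with an optional include filter."""
    args = {"TaskArn": task_arn}
    if include_pattern:
        args["Includes"] = [{"FilterType": "SIMPLE_PATTERN", "Value": include_pattern}]
    return args

def replicate(task_arn: str, include_pattern: str = "") -> str:
    """Kick off an AWS DataSync task execution and return its execution ARN."""
    import boto3  # available in the Lambda runtime
    datasync = boto3.client("datasync")
    return datasync.start_task_execution(
        **execution_args(task_arn, include_pattern)
    )["TaskExecutionArn"]
```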

Well-Architected Pillars

The architecture diagrams above are examples of Solutions created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

The services in this Guidance are serverless, which eliminates the need for users to manage (virtual or bare metal) servers. For example, Step Functions is a serverless managed service for building workflows and reduces undifferentiated heavy lifting associated with building and managing a workflow solution. AWS Glue is a serverless managed service for data processing tasks.

Similarly, the following services eliminate the need for capacity management: Amazon SNS for notifications, AWS KMS for key management, Secrets Manager for secrets, EventBridge for event-driven architectures, DynamoDB for low-latency NoSQL databases, AppFlow for integrating with third-party applications, Transfer Family for file transfer protocols, DataSync for discovery and sync of remote data sources (on-premises or other clouds), and AWS DMS for managed data migration between supported databases.

Read the Operational Excellence whitepaper

IAM manages least-privilege access to specific resources and operations. AWS KMS provides encryption for data at rest and in transit, with Pretty Good Privacy (PGP) encryption applied to data files. Secrets Manager stores secrets for remote system access and hashing keys for personally identifiable information (PII). CloudWatch monitors logs and metrics across all services used in this Guidance. As managed services, these not only support a strong security posture, but also free up time for you to focus your efforts on data and application logic.

Read the Security whitepaper

Use of Lambda in the pipeline is limited to file-level processing, such as decryption. This prevents the pipeline from hitting the Lambda 15-minute runtime limit. For all row-level processing, the AWS Glue Spark engine scales to handle large volumes of data. Additionally, you can use Step Functions to set up retries, back-off rates, max attempts, intervals, and timeouts for any failed AWS Glue job.
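The retry behavior described above is declared in the Step Functions state machine definition. An illustrative Amazon States Language (ASL) task state, expressed as a Python dict, that runs an AWS Glue job synchronously and retries transient failures with exponential back-off; the job name is an assumption.

```python
import json

# Illustrative ASL task state: run a Glue job and retry on transient errors.
run_glue_job = {
    "Type": "Task",
    "Resource": "arn:aws:states:::glue:startJobRun.sync",
    "Parameters": {"JobName": "transform-ad-data"},  # hypothetical job name
    "Retry": [{
        "ErrorEquals": ["Glue.ConcurrentRunsExceededException", "States.TaskFailed"],
        "IntervalSeconds": 60,   # wait before the first retry
        "BackoffRate": 2.0,      # double the interval each attempt
        "MaxAttempts": 3,
    }],
    "End": True,
}

# The state plugs into a state machine definition as JSON.
definition_fragment = json.dumps(run_glue_job, indent=2)
```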

Read the Reliability whitepaper

The serverless services in this Guidance (including Step Functions, AWS Glue, Lambda, EventBridge, and Amazon S3) reduce the amount of underlying infrastructure you need to manage, allowing you to focus on solving your business needs. You can use automated deployments to quickly deploy the architectural components into any AWS Region while also addressing data residency and low latency requirements.

Read the Performance Efficiency whitepaper

When AWS Glue performs data transformations, you only pay for infrastructure during the time the processing is occurring. For Data Catalog, you pay a simple monthly fee for storing and accessing the metadata. With EventBridge Free Tier, you can schedule rules to initiate a data processing workflow. With a Step Functions workflow, you are charged based on the number of state transitions. In addition, through a tenant isolation model and resource tagging, you can automate cost usage alerts to help you measure costs specific to each tenant, application module, and service.

Read the Cost Optimization whitepaper

Serverless services used in this Guidance (such as AWS Glue, Lambda, and Amazon S3) automatically optimize resource utilization in response to demand. You can extend this Guidance by using Amazon S3 lifecycle configuration to define policies that move objects to different storage classes based on access patterns.
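A minimal sketch of such a lifecycle configuration, assuming raw data lands under a `raw/` prefix: objects transition to Infrequent Access after 30 days and to Glacier after 90. The prefix, day counts, and bucket name are assumptions to adapt to your access patterns.

```python
# Illustrative S3 lifecycle configuration for tiering raw ingested data.
lifecycle = {
    "Rules": [{
        "ID": "tier-raw-data",
        "Filter": {"Prefix": "raw/"},  # hypothetical prefix
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},
        ],
    }]
}

def apply_lifecycle(bucket: str) -> None:
    import boto3  # available wherever this runs
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=lifecycle
    )
```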

Read the Sustainability whitepaper

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.