AWS Partner Network (APN) Blog
How to Secure Data Movement with Fivetran and AWS PrivateLink
By Amy Peterson, Sr. Group Product Manager – Fivetran
 By Jet Celaj, Sr. Security Sales Engineer – Fivetran
 By Tamara Astakhova, Sr. Partner Solution Architect – AWS
 By Stan Kroshchenko, Solution Architect – AWS
As companies move to and invest in cloud data platforms to more easily access their valuable data, security becomes a big focus.
Data security is a top concern for enterprises, especially those in highly regulated industries such as healthcare and financial services, those subject to regional regulations such as GDPR or CCPA, and those that process sensitive personally identifiable information (PII) or personal health information (PHI).
Data teams need to securely send data between sources and destinations in a way that doesn’t violate security compliance requirements and protects customer and business data to reduce the threat of breaches and associated financial and reputational damage.
Some security teams have put policies in place restricting certain systems from accessing the internet, while others need to enforce firewall policies on traffic. Managing various security requirements for different sources and destinations in this way quickly creates enormous overhead and complexity for data and security teams.
To reduce the technical complexity and manual processes needed to build pipelines that meet various policies, Fivetran offers flexible connection options to securely and reliably move enterprise data from source to destination.
Fivetran is an AWS Data and Analytics Competency Partner with service delivery specializations in Amazon Redshift and AWS PrivateLink. Fivetran is a fully managed data movement platform in the cloud that automatically ingests and centralizes data from hundreds of sources into ready-to-analyze schemas.
Fivetran runs on Amazon Web Services (AWS) and supports customers syncing data from sources and to destinations hosted on AWS, including the Amazon Redshift data warehouse.
Fivetran’s integration with AWS PrivateLink allows customers to send data without using the public internet, which helps protect communication between Amazon Virtual Private Cloud (VPC) and the Fivetran Data Pipelines platform.
In this post, you will learn how using Fivetran with AWS PrivateLink can help customers securely synchronize data sources with their data warehouse destination in the cloud.
Challenge: Avoiding the Public Internet
Fivetran’s customers choose the software-as-a-service (SaaS) platform for data integration because it is fully managed. Fivetran adapts to API changes, supports schema drift, and sets up the necessary infrastructure to deploy in the cloud in a customer’s chosen geography and region.
SaaS, however, can come with limitations to configurability. To address the varying needs of the market when it comes to controlling and securing data flows, Fivetran launched Business Critical to offer, among other features, flexible connection options.
Connection options include Secure Shell (SSH), reverse virtual private network (VPN), and AWS PrivateLink. Support for PrivateLink helps customers with strict security requirements reduce the risk of exposing traffic to the public internet when data passes through Fivetran into their Redshift environment. With PrivateLink, the data connection runs along the AWS backbone rather than over the public internet.
Fivetran Data Movement with AWS PrivateLink
AWS PrivateLink provides private connectivity between VPCs, AWS services, and on-premises networks, without exposing traffic to the public internet. PrivateLink makes it easy to connect services across different accounts and VPCs to significantly simplify your network architecture.
Fivetran can connect to data sources and destinations using PrivateLink:
- Amazon Redshift destination
- Amazon EC2-deployed PostgreSQL database
- Amazon RDS
- Amazon S3
- On-premises Oracle database which is connected to the customer’s VPC using AWS Direct Connect
- Databricks destinations hosted on AWS
- Snowflake destinations hosted on AWS
The following architecture illustrates the integration of Fivetran with AWS PrivateLink:
Figure 1 – Fivetran with AWS PrivateLink integration.
In the diagram above, three VPCs and a corporate data center all communicate over private IP addresses. Between the VPCs, AWS PrivateLink keeps all traffic on private IP addresses.
Additionally, AWS Direct Connect is used between customer VPC B and the corporate data center to maintain private IP communications to the on-premises database.
Setup Steps
Amazon Redshift
To set up AWS PrivateLink between Fivetran and an Amazon Redshift destination, create a Redshift-managed VPC endpoint and grant access to Fivetran. For the connection to succeed, the Redshift cluster must be in the same AWS Region in which Fivetran is deployed.
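As a rough illustration of the grant step, the sketch below wraps the boto3 Redshift client's authorize_endpoint_access call, which lets another AWS account create a Redshift-managed VPC endpoint for a cluster. The function name, cluster identifier, account ID, and VPC ID are hypothetical placeholders, not real Fivetran values; take the actual values from the Fivetran setup guide. The client is passed in as an argument so the function can be exercised without AWS credentials.

```python
def grant_endpoint_access(redshift, cluster_id, grantee_account, grantee_vpc_ids=None):
    """Allow another AWS account to create a Redshift-managed VPC endpoint.

    `redshift` is expected to behave like boto3.client("redshift").
    Every identifier passed in is a placeholder supplied by the caller.
    """
    if not (grantee_account.isdigit() and len(grantee_account) == 12):
        raise ValueError("AWS account IDs are 12 digits")

    params = {"ClusterIdentifier": cluster_id, "Account": grantee_account}
    if grantee_vpc_ids:
        # Optionally restrict the grant to specific VPCs in the grantee account.
        params["VpcIds"] = list(grantee_vpc_ids)
    return redshift.authorize_endpoint_access(**params)


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("redshift", region_name="us-east-1")
#   grant_endpoint_access(client, "demo-cluster", "123456789012", ["vpc-0123456789abcdef0"])
```

Creating the client in the same Region as the cluster matches the requirement above that the cluster and the Fivetran deployment share a Region.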
Amazon EC2-Deployed Database Instance (such as Postgres)
- Create an endpoint service (service provider) dedicated to the Amazon Elastic Compute Cloud (Amazon EC2)-deployed database and share that information with Fivetran.
- Fivetran will generate an interface endpoint to connect (consume service) to that service.
- Approve Fivetran’s access request.
- Since AWS endpoint services only work with a Network Load Balancer (NLB), you’ll need to create or use an existing NLB which will route the traffic directly to the EC2 instance.
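The provider-side steps above can be sketched with the boto3 EC2 client, assuming the NLB already exists and routes to the database instance. The function names and the ARNs and IDs shown in the usage comment are hypothetical; the principal to allow and the endpoint ID to approve come from Fivetran during setup. The client is injected so the logic can be tried against a stand-in object.

```python
def publish_endpoint_service(ec2, nlb_arn, consumer_principal_arn):
    """Create an endpoint service in front of an NLB and allow one consumer.

    `ec2` is expected to behave like boto3.client("ec2").
    Returns the service ID and service name to share with the consumer.
    """
    created = ec2.create_vpc_endpoint_service_configuration(
        NetworkLoadBalancerArns=[nlb_arn],
        AcceptanceRequired=True,  # connection requests wait for manual approval
    )
    service = created["ServiceConfiguration"]

    # Only principals on the allow list can request a connection to the service.
    ec2.modify_vpc_endpoint_service_permissions(
        ServiceId=service["ServiceId"],
        AddAllowedPrincipals=[consumer_principal_arn],
    )
    return service["ServiceId"], service["ServiceName"]


def approve_connection(ec2, service_id, endpoint_id):
    """Accept a pending interface endpoint connection request."""
    return ec2.accept_vpc_endpoint_connections(
        ServiceId=service_id,
        VpcEndpointIds=[endpoint_id],
    )


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2")
#   service_id, service_name = publish_endpoint_service(
#       ec2, "<your NLB ARN>", "<principal ARN provided by Fivetran>")
#   approve_connection(ec2, service_id, "<endpoint ID reported by Fivetran>")
```

Setting AcceptanceRequired to True is what makes the explicit approval step necessary; with it disabled, allowed principals would connect without review.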
Amazon RDS for MySQL
- Create an endpoint service (service provider) dedicated to the Amazon RDS for MySQL database and share that information with Fivetran.
- Fivetran will generate an interface endpoint to connect (consume service) to that service.
- Approve Fivetran’s access request.
- Since AWS endpoint services only work with a Network Load Balancer, you’ll need to create or use an existing NLB.
Note that NLBs can only route traffic to an EC2 instance, an IP address, or an Application Load Balancer. In the case of Amazon Relational Database Service (Amazon RDS) databases, a port forwarder is recommended because RDS instances don’t have a static IP address or an EC2 instance ID that can be registered as a target. When configured correctly, the port forwarder reaches the RDS database by its endpoint hostname, so it keeps working even when the database’s IP address changes.
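To show why a port forwarder tolerates IP address changes, here is a minimal sketch of one written with Python's asyncio, meant to run on an EC2 instance registered as the NLB target. It is an illustration of the idea only, not the tool Fivetran recommends; the function names, the port, and the hostname in the usage comment are hypothetical. The key point is that the upstream hostname is resolved each time a new connection is opened, so a changed DNS record is picked up without reconfiguration.

```python
import asyncio


async def _pipe(reader, writer):
    """Copy bytes in one direction until EOF, then close the destination."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # peer went away; fall through and close our side
    finally:
        writer.close()


def make_forwarder(upstream_host, upstream_port):
    """Build a connection handler that relays each client to the upstream."""

    async def handle(client_reader, client_writer):
        try:
            # The hostname is resolved on every call, not once at startup.
            up_reader, up_writer = await asyncio.open_connection(
                upstream_host, upstream_port
            )
        except OSError:
            client_writer.close()
            return
        # Relay both directions until either side closes.
        await asyncio.gather(
            _pipe(client_reader, up_writer),
            _pipe(up_reader, client_writer),
        )

    return handle


async def serve(listen_port, upstream_host, upstream_port, listen_host="0.0.0.0"):
    """Listen for NLB traffic and forward it to the database endpoint."""
    server = await asyncio.start_server(
        make_forwarder(upstream_host, upstream_port), listen_host, listen_port
    )
    async with server:
        await server.serve_forever()


# Hypothetical usage, with a placeholder RDS endpoint hostname:
#   asyncio.run(serve(3306, "mydb.example.us-east-1.rds.amazonaws.com", 3306))
```

A production deployment would more likely use an established proxy and add health checks, logging, and connection limits; the sketch only demonstrates per-connection name resolution.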
On-Premises Database
- Create an endpoint service (service provider) dedicated to the on-premises database and share that information with Fivetran.
- Fivetran will generate an interface endpoint to connect (consume service) to that service.
- Approve Fivetran’s access request.
- Since AWS endpoint services only work with a Network Load Balancer, you’ll need to create or use an existing NLB which will route the traffic to the private IP address of the on-premises database through an existing AWS Direct Connect connection.
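For the last step, the on-premises database is registered in the NLB's target group by IP address. The sketch below builds that registration with the boto3 elbv2 client's register_targets call, assuming a target group of type ip already exists. The function names, target group ARN, and address are hypothetical. Two details are assumptions drawn from the Elastic Load Balancing documentation and worth confirming against the current version: targets outside the load balancer's VPC are registered with AvailabilityZone set to "all", and they must fall within the private ranges listed in the code.

```python
import ipaddress

# Private ranges assumed to be accepted for IP targets outside the VPC.
_ALLOWED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "100.64.0.0/10", "172.16.0.0/12", "192.168.0.0/16")
]


def on_prem_target(ip, port):
    """Build a target description for a database reached over Direct Connect."""
    addr = ipaddress.ip_address(ip)
    if not any(addr in net for net in _ALLOWED_RANGES):
        raise ValueError(f"{ip} is not in a private range usable as an IP target")
    # "all" marks the address as living outside the load balancer's VPC.
    return {"Id": str(addr), "Port": port, "AvailabilityZone": "all"}


def register_on_prem_database(elbv2, target_group_arn, ip, port):
    """Register the on-premises address with an existing ip-type target group.

    `elbv2` is expected to behave like boto3.client("elbv2").
    """
    return elbv2.register_targets(
        TargetGroupArn=target_group_arn,
        Targets=[on_prem_target(ip, port)],
    )


# Hypothetical usage for an Oracle listener on its default port:
#   import boto3
#   register_on_prem_database(boto3.client("elbv2"), "<target group ARN>", "10.20.30.40", 1521)
```

Validating the address before calling AWS catches a public IP early, which would otherwise send the design back toward the public internet that PrivateLink is meant to avoid.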
Note that detailed setup guides can be found in the Fivetran documentation.
Customer Success Story
A real estate and shared space customer of Fivetran and AWS has faced a stricter compliance landscape since becoming a publicly traded company. Fivetran helps this organization, among many others, maintain data governance and compliance while ingesting data from hundreds of cloud-based and on-premises sources into Snowflake, all within its AWS cloud environment.
For example, Fivetran pulls data from Postgres databases at each location and consolidates it in Snowflake. Standardizing this information in a central location in the cloud gives stakeholders real-time insight into occupancy, turnover, outstanding renewals, member growth, and profit margins per location/per region and holistically across the company. This visibility allows the organization’s community managers, senior leadership team, and investors to see what’s working and what’s not, so they can draw specific conclusions about contributing factors and prioritize resources to replicate success across locations.
There are also security and compliance benefits: Fivetran’s VPN tunneling capabilities, for example, allow the company to meet strict access-control requirements.
Another Fivetran customer in the healthcare space uses AWS PrivateLink to securely sync its application data to an Amazon Redshift destination. The sensitive nature of the personal health information (PHI) data being synced meant the organization did not want any traffic flowing over the public internet.
By utilizing AWS PrivateLink to connect to its Postgres RDS sources and Amazon Redshift destination, the customer can confidently sync this important data with Fivetran while staying entirely within the AWS environment.
Conclusion
With AWS PrivateLink in Fivetran, users get full control over how data traffic reaches its destination, since the endpoint is in their own virtual private cloud.
Using AWS PrivateLink reduces the ongoing maintenance the connection requires and lowers the risk of data exposure through misconfiguration or a security vulnerability. AWS PrivateLink can also eliminate the need to set up proxies between sources and destinations as a stopgap for traffic control.
Automation provided by Fivetran enables enterprises to securely accelerate migration of data to the cloud for analytics, ensures regulatory compliance requirements are being met, and reduces overall data stack infrastructure complexity, which saves time and bandwidth for engineering teams.
Fivetran is available through AWS Marketplace. Start a free trial today.
Fivetran – AWS Partner Spotlight
Fivetran is an AWS Competency Partner and fully managed data movement platform that automatically ingests and centralizes data from hundreds of sources into ready-to-analyze schemas.
