AWS Partner Network (APN) Blog
Embrace Data Sovereignty and Low Latency in Building a Trusted Data Lake with AWS Outposts and Talend
By Tamara Astakhova, Sr. Partner Solution Architect – AWS
By Shashank Jain, Solutions Engineering Manager – Talend
By Naveen Gupta, Sr. Principal Product Manager – Talend
By Stefan Glover, Strategic Alliance Director – Talend
Globalization and digitalization require companies to comply with data protection rules in the limitless world of the internet. Businesses must establish checks and controls to comply with regulations relating to data protection and privacy.
With data growing by terabytes and petabytes per day, many businesses struggle to analyze it and respond quickly enough to provide a personalized customer experience.
Meeting this challenge requires maximum data agility and availability along with minimal data latency. This must happen while also minimizing or mitigating regulatory liability around compliance with local government laws on data confidentiality and data sovereignty.
Integration with AWS Outposts makes Talend Data Fabric accessible to customers where and when they need it. It allows you to quickly integrate, clean, and move data from hundreds of different sources with low latency, and to meet data sovereignty regulatory requirements for hybrid deployments.
Talend is an AWS Partner with the Migration Competency and AWS Outposts service designation. Talend Data Fabric combines data integration, data integrity, and data governance in a single, unified platform that makes it easy to collect, transform, clean, govern and share your data.
In this post, we’ll discuss how Talend Data Fabric can help you improve operational efficiency, reduce risk, ensure regulatory compliance, and maximize data value.
About AWS Outposts
AWS Outposts is a fully managed service that offers the same Amazon Web Services (AWS) infrastructure, services, APIs, and tools to virtually any data center, co-location space, or on-premises facility for a truly consistent hybrid experience.
AWS Outposts is ideal for workloads that require low latency access to on-premises systems, local data processing, and data residency.
AWS compute, storage, database, and other services run locally on AWS Outposts, and customers can access the full range of AWS services available in the region to build, manage, and scale their on-premises applications using familiar AWS services and tools.
Talend Data Fabric Deployment Architecture
The Talend Data Fabric architecture has two major components: the data plane (primarily the remote engine, where the customer's data is processed by Talend Jobs), and the control plane (the administration of those jobs, which does not touch customer data).
Because the two planes are separate, Talend Data Fabric can be deployed on AWS Outposts in two ways:
Hybrid Solution
Design, development, and deployment of Talend Data Fabric (Talend Remote Engine) on Amazon Elastic Compute Cloud (Amazon EC2).
With this approach, customer data always remains within AWS Outposts. The control plane, also called the Talend Management Console (TMC), is hosted in an AWS Region and is securely connected to the customer's AWS Outposts.
This approach is suggested for customers who are already using or are going to use the TMC and want to migrate data from on-premises or other clouds to Amazon Simple Storage Service (Amazon S3).
This is also a recommended approach by Talend, as it not only achieves data sovereignty goals, but also helps customers reap the benefits of managing jobs using Talend’s latest innovations in the cloud.
On-Premises Solution
Design, development, deployment, and administration of Talend Data Fabric on-premises in Amazon EC2. In this approach, both the data plane (Talend Remote Engine) and the control plane (Talend Administrative Console) are deployed within AWS Outposts.
This is suggested for customers where, due to strict data privacy laws, even Talend Jobs metadata cannot be transported outside AWS Outposts.
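The choice between the two options comes down to one question: may Talend Jobs metadata leave AWS Outposts? A minimal sketch of that decision rule follows. The names `Deployment`, `PLANES`, and `choose_deployment` are hypothetical and exist only for illustration; they are not part of any Talend or AWS API.

```python
from enum import Enum


class Deployment(Enum):
    """The two deployment options described above."""
    HYBRID = "hybrid"
    ON_PREMISES = "on-premises"


# Where each plane runs under each option, as described in the text.
# The strings are descriptive labels, not product identifiers.
PLANES = {
    Deployment.HYBRID: {
        "data_plane": "Talend Remote Engine on AWS Outposts",
        "control_plane": "Talend Management Console in an AWS Region",
    },
    Deployment.ON_PREMISES: {
        "data_plane": "Talend Remote Engine on AWS Outposts",
        "control_plane": "Talend Administrative Console on AWS Outposts",
    },
}


def choose_deployment(job_metadata_may_leave_outpost: bool) -> Deployment:
    """Pick the hybrid option unless job metadata must stay on the Outpost."""
    if job_metadata_may_leave_outpost:
        return Deployment.HYBRID
    return Deployment.ON_PREMISES
```

In both cases the data plane stays on AWS Outposts; only the location of the control plane changes.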
Architecture for Hybrid Solution
The architecture diagram in Figure 1 shows how to use Talend Data Fabric to ingest data into Amazon S3 running on AWS Outposts from an on-premises application, a database, or another cloud.
Figure 1 – Talend hybrid architecture with AWS Outposts.
Data flows, or Talend Jobs, are designed in Talend Studio, which is installed on a laptop or on an EC2 instance running on AWS Outposts.
Talend Jobs run on the Talend Remote Engine, which applies all business rules to the customer data and moves it into S3 running on AWS Outposts from an on-premises application, a database, or another cloud.
Talend Remote Engine is paired with the Talend Management Console via an outgoing port (443). The metadata transferred between TMC and Talend Remote Engine includes status information and metrics of Talend Jobs, lifecycle commands, credentials, logs, and task artefact binaries.
TMC is a software-as-a-service (SaaS) platform offered by Talend and hosted on AWS as a service fully managed by Talend. It is the control plane in this solution.
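Because the Remote Engine reaches TMC over an outgoing connection on port 443, it can help to confirm from the EC2 instance that outbound traffic on that port is allowed before pairing. The sketch below uses only the Python standard library and checks that a TCP connection can be opened; it does not perform a TLS handshake or any Talend-specific pairing. The function name is hypothetical, and the TMC hostname is not given in this post, so the caller must supply it.

```python
import socket


def can_open_outbound_connection(host: str, port: int = 443,
                                 timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only verifies network reachability (routing, security groups,
    firewalls). It does not validate certificates or credentials.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example call (hostname is a placeholder you must replace):
# can_open_outbound_connection("<your-tmc-endpoint>")
```

A `False` result usually points to an egress rule, route, or proxy setting rather than to Talend itself, which makes this a quick first check.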
Architecture for On-Premises Solution
The architecture diagram in Figure 2 shows how to use Talend Data Fabric to ingest data into S3 running on AWS Outposts from an on-premises application, a database, or another cloud.
Figure 2 – Talend on-premises architecture with AWS Outposts.
Data flows, or Talend Jobs, are designed in Talend Studio, which is installed on a laptop or on an EC2 instance running on AWS Outposts.
Talend Jobs run on the Job Server, which applies all business rules to the data and moves it into S3 running on AWS Outposts from an on-premises application, a database, or another cloud.
The Job Server is mapped to the Talend Administrative Console (TAC) for operational purposes.
TAC is an on-premises administration solution offered by Talend and installed on an EC2 instance managed by the customer or by Talend. It is the control plane in this solution.
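In both architectures the target is S3 running on AWS Outposts, which clients address through an access point ARN rather than a plain bucket name. The sketch below assembles such an ARN in the documented `arn:aws:s3-outposts:...` form. The function name is hypothetical, and the Region, account ID, Outpost ID, and access point name in the example are placeholders. How a given Talend component accepts this value is not covered in this post and is left as an assumption.

```python
def s3_outposts_access_point_arn(region: str, account_id: str,
                                 outpost_id: str, access_point: str) -> str:
    """Build an S3 on Outposts access point ARN from its parts.

    Format: arn:aws:s3-outposts:<region>:<account-id>:
            outpost/<outpost-id>/accesspoint/<access-point-name>
    """
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("account_id must be a 12-digit string")
    if not outpost_id.startswith("op-"):
        raise ValueError("outpost_id is expected to start with 'op-'")
    return (
        f"arn:aws:s3-outposts:{region}:{account_id}:"
        f"outpost/{outpost_id}/accesspoint/{access_point}"
    )


# Placeholder values for illustration only.
example_arn = s3_outposts_access_point_arn(
    "us-west-2", "123456789012", "op-01ac5d28a6a232904", "example-ap"
)
```

Keeping the ARN construction in one place makes it easier to see at a glance that a job writes to the Outpost and not to a bucket in the Region.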
Benefits of the Integration
Talend can deploy its enterprise-tier capabilities on AWS Outposts through Talend Data Fabric. Making the best use of Talend’s robust and flexible data platform architecture, this provides customers with a variety of advantages for numerous vertical-specific use cases.
Talend Data Fabric modules such as Talend Studio and Talend Remote Engine, deployed on an EC2 instance and integrated with S3 running on AWS Outposts, bring data integration, data quality, data governance, and local job processing that meet each customer’s latency and geographic requirements.
In addition, these modules can communicate securely with the Talend Management Console deployed on AWS that performs monitoring and management tasks. Customers can use the same AWS APIs, tools, and security controls to run, manage, and secure their applications locally as they do in the cloud.
AWS Outposts allows organizations to securely store and process customer data that must remain locally situated. Talend Remote Engine processes Talend Jobs, allowing customers to perform data integration tasks within AWS Outposts that help meet the data residency and low latency requirements.
In addition, AWS Outposts provides customers with a fully managed infrastructure supported by AWS. This helps to reduce the time, resources, operational risk, and maintenance downtime required for IT infrastructure management.
In collaboration with AWS, Talend can address the business needs of maintaining data locality requirements and processing large volumes of data with low latency in an on-premises environment with AWS Outposts.
Customer Story
A financial services institution is embarking on a cloud transformation journey, strategically targeting an AWS-backed data lake for data migration, consolidation, and accessibility requirements. However, it’s constrained by data sovereignty regulations in the country in which it operates.
The customer looked to deploy a hybrid strategy leveraging AWS Outposts. They use Talend Data Fabric and its integration with S3 and EC2 to connect with, and load data from, multiple enterprise sources into a data lake on AWS Outposts at the speed demanded by the business.
This approach makes use of Talend’s Remote Engines, running on EC2 instances on AWS Outposts.
Summary
Talend Data Fabric on AWS Outposts responds to growing customer demand for solutions that meet both data regulatory requirements and low data latency needs.
Any organization with current or emerging regulatory liability should look to combine the inherent benefits of AWS with the sophistication of Talend capabilities in a location that meets its business requirements.
Data generation is growing rapidly, and a growing number of use cases require data to stay close to where it is processed in order to minimize latency. Both trends are a powerful argument for the combination of AWS Outposts and Talend Data Fabric.
Without this combination of solutions, customers face an increasingly difficult path to effectively accessing, governing, and using their data.
To see the solution in action, sign up for a free 14-day trial of Talend Data Fabric on AWS. For more information, visit the Talend for AWS microsite or contact awsalliance@talend.com.
Talend is available for purchase through AWS Marketplace. Please speak to your Talend account representative for custom purchase options through AWS Marketplace Private Offer.
Talend – AWS Partner Spotlight
Talend is an AWS Competency Partner that provides a data integration platform enabling companies to accelerate migrations to cloud data lakes and warehouses on AWS.


