AWS Big Data Blog

Category: Amazon SageMaker Unified Studio

Simplify real-time analytics with zero-ETL from Amazon DynamoDB to Amazon SageMaker Lakehouse

At AWS re:Invent 2024, we introduced a no-code zero-ETL integration between Amazon DynamoDB and Amazon SageMaker Lakehouse, simplifying how organizations handle data analytics and AI workflows. In this post, we share how to set up this zero-ETL integration from DynamoDB to your SageMaker Lakehouse environment.

Embracing event driven architecture to enhance resilience of data solutions built on Amazon SageMaker

This post provides guidance on how you can use event-driven architecture to enhance the resiliency of data solutions built on the next generation of Amazon SageMaker, a unified platform for data, analytics, and AI. SageMaker is a managed service with high availability and durability.

Unify streaming and analytical data with Amazon Data Firehose and Amazon SageMaker Lakehouse

In this post, we show you how to create Iceberg tables in Amazon SageMaker Unified Studio and stream data to these tables using Firehose. With this integration, data engineers, analysts, and data scientists can seamlessly collaborate and build end-to-end analytics and ML workflows using SageMaker Unified Studio, removing traditional silos and accelerating the journey from data ingestion to production ML models.

Access Amazon Redshift Managed Storage tables through Apache Spark on AWS Glue and Amazon EMR using Amazon SageMaker Lakehouse

With SageMaker Lakehouse, you can access tables stored in Amazon Redshift managed storage (RMS) through Iceberg APIs, using the Iceberg REST catalog backed by AWS Glue Data Catalog. This post describes how to integrate data on RMS tables through Apache Spark using SageMaker Unified Studio, Amazon EMR 7.5.0 and higher, and AWS Glue 5.0.

Unified scheduling for visual ETL flows and query books in Amazon SageMaker Unified Studio

SageMaker Unified Studio allows you to create ETL flows using a visual interface and write SQL analytics queries using query books. Today, we’re excited to introduce a new unified scheduling feature that simplifies running both on a schedule. In this post, we walk through how to schedule your visual ETL flows and query books with just a few clicks, explore the underlying architecture, and demonstrate how this feature can streamline your data workflow automation.

Access your existing data and resources through Amazon SageMaker Unified Studio, Part 1: AWS Glue Data Catalog and Amazon Redshift

This series of posts demonstrates how you can onboard and access existing AWS data sources using SageMaker Unified Studio. This post focuses on onboarding existing AWS Glue Data Catalog tables and database tables available in Amazon Redshift.

Access your existing data and resources through Amazon SageMaker Unified Studio, Part 2: Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon EMR

In this post, we discuss integrating additional vital data sources such as Amazon Simple Storage Service (Amazon S3) buckets, Amazon Relational Database Service (Amazon RDS), Amazon DynamoDB, and Amazon EMR clusters. We demonstrate how to configure the necessary permissions, establish connections, and effectively use these resources within SageMaker Unified Studio. Whether you’re working with object storage, relational databases, NoSQL databases, or big data processing, this post can help you seamlessly incorporate your existing data infrastructure into your SageMaker Unified Studio workflows.

Streamline data discovery with precise technical identifier search in Amazon SageMaker Unified Studio

We’re excited to introduce a new enhancement to the search experience in Amazon SageMaker Catalog, part of the next generation of Amazon SageMaker—exact match search using technical identifiers. In this post, we demonstrate how to streamline data discovery with precise technical identifier search in Amazon SageMaker Unified Studio.

Connect, share, and query where your data sits using Amazon SageMaker Unified Studio

In this post, we demonstrate how business units can use Amazon SageMaker Unified Studio to discover, subscribe to, and analyze distributed data assets. Through this unified query capability, you can create comprehensive insights into customer transaction patterns and purchase behavior for active products without the traditional barriers of data silos or the need to copy data between systems.