AWS Big Data Blog

Category: Amazon SageMaker Data & AI Governance

Automate email notifications for governance teams working with Amazon SageMaker Catalog

In this post, we show you how to create custom notifications for events occurring in SageMaker Catalog using Amazon EventBridge, AWS Lambda, and Amazon SNS. You can expand this solution to automatically integrate SageMaker Catalog with in-house enterprise workflow tools like ServiceNow and Helix.

Visualize data lineage using Amazon SageMaker Catalog for Amazon EMR, AWS Glue, and Amazon Redshift

Amazon SageMaker offers a comprehensive hub that integrates data, analytics, and AI capabilities, providing a unified experience for users to access and work with their data. Through Amazon SageMaker Unified Studio, a single and unified environment, you can use a wide range of tools and features to support your data and AI development needs, including […]

Introducing restricted classification terms for governed classification in Amazon SageMaker Catalog

Security and compliance concerns are key considerations when customers across industries rely on Amazon SageMaker Catalog. Customers use SageMaker Catalog to organize, discover, and govern data and machine learning (ML) assets. A common request from domain administrators is the ability to enforce governance controls on certain metadata terms that carry compliance or policy significance. Examples […]

Develop and deploy a generative AI application using Amazon SageMaker Unified Studio

In this post, we demonstrate how to use Amazon Bedrock Flows in SageMaker Unified Studio to build a sophisticated generative AI application for financial analysis and investment decision-making.

Automate data lineage in Amazon SageMaker using AWS Glue Crawlers supported data sources

In this post, we explore its real-world impact through the lens of an ecommerce company striving to boost their bottom line. To illustrate this practical application, we walk you through how you can use the prebuilt integration between SageMaker Catalog and AWS Glue crawlers to automatically capture lineage for data assets stored in Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

Introducing MCP Server for Apache Spark History Server for AI-powered debugging and optimization

Today, we’re announcing the open source release of Spark History Server MCP, a specialized Model Context Protocol (MCP) server that transforms this workflow by enabling AI assistants to access and analyze your existing Spark History Server data through natural language interactions. This project, developed collaboratively by AWS open source and Amazon SageMaker Data Processing, turns complex debugging sessions into conversational interactions that deliver faster, more accurate insights without requiring changes to your current Spark infrastructure. You can use this MCP server with your self-managed or AWS managed Spark History Servers to analyze Spark applications running in the cloud or on-premises deployments.

Unifying data insights with Amazon QuickSight and Amazon SageMaker

Amazon SageMaker has announced an integration with Amazon QuickSight, bringing together data in SageMaker seamlessly with QuickSight capabilities like interactive dashboards, pixel perfect reports and generative business intelligence (BI)—all in a governed and automated manner. In this post, we walk through the complete process of integrating Amazon QuickSight with Amazon SageMaker Unified Studio, demonstrating how teams can move from raw data to published dashboards in a secure and governed environment.

Unifying metadata governance across Amazon SageMaker and Collibra

Amazon Web Services (AWS) and Collibra have built a new integrated solution that demonstrates the integration between the Collibra Platform and the next generation of Amazon SageMaker. In this post, we take a closer look at the integration, describe the use cases it enables, walk through the architecture, and show how to implement the solution in your environment.

Develop and monitor a Spark application using existing data in Amazon S3 with Amazon SageMaker Unified Studio

In this post, we demonstrate how to develop and monitor a Spark application using existing data in Amazon S3 using SageMaker Unified Studio. The solution addresses key challenges organizations face in managing big data analytics workloads through an integrated development environment where data teams can develop, test, and refine Spark applications while leveraging EMR Serverless for dynamic resource allocation and built-in monitoring tools.

Embracing event driven architecture to enhance resilience of data solutions built on Amazon SageMaker

This post provides guidance on how you can use event driven architecture to enhance the resiliency of data solutions built on the next generation of Amazon SageMaker, a unified platform for data, analytics, and AI. SageMaker is a managed service with high availability and durability.