Amazon SageMaker Catalog

Discover, govern, and collaborate on data and AI securely

Overview

Amazon SageMaker Catalog simplifies discovery, governance, and collaboration for data and AI across your structured and unstructured data, AI models, business intelligence dashboards, and applications. You can securely discover and access approved data and models using semantic search with generative AI-created metadata, or ask Amazon Q Developer in natural language to find your data. You can define and enforce access policies consistently using a single permission model with fine-grained access controls, managed centrally in Amazon SageMaker Unified Studio. Share and collaborate on data and AI assets through publishing and subscribing workflows. Build trust throughout your organization with data quality monitoring, data classification, and end-to-end automated column-level lineage for data and AI assets.

Benefits

Discover your data and AI assets at scale with SageMaker Catalog, built on Amazon DataZone. Enhance data discovery with generative AI that automatically enriches your data and metadata with business context, making it easier for all users to find, understand, and use data. Share your data, AI models, prompts, and generative AI assets with filtering by table and column names or business glossary terms. Automatically recommend valuable columns and relevant analytical applications for each dataset, so teams can use the right data to quickly build the right models. Support both centralized and decentralized governance models with data and AI sharing through publishing and subscribing workflows, all in a single experience through Projects.

Gain trust through real-time visibility of data quality and of data and ML lineage in SageMaker. Automate data profiling and data quality recommendations, monitor data quality rules, and receive alerts. Resolve hard-to-find data quality challenges by using rule-based and ML approaches to reconcile entities, so you can deliver high-quality data and make confident business decisions. Drive transparency in data pipelines and AI projects with built-in model monitoring that detects bias and reports on how features contribute to your model's predictions.

Centralize data and AI security in SageMaker with fine-grained access controls, data classification, and guardrails to ensure data, analytics, and AI models are used appropriately. Define permissions once, and enforce them across data and models. With Amazon Bedrock natively integrated, customers can use Amazon Bedrock Guardrails in their generative AI applications to block harmful content, filter hallucinations, and enable customizable safeguards for privacy, safety, and accuracy. Automatically identify sensitive information within your pipelines using Amazon Comprehend.

Meet audit and regulatory compliance requirements with data usage and model logging and monitoring. Support acceptable use of your analytics and AI assets across your enterprise with project-based isolation. Understand data and model usage across your lakehouse for enhanced security. Use Amazon SageMaker Clarify to monitor models for bias, accuracy, and robustness, aligning with your responsible AI standards. Align costs to business initiatives and get a clear view of your business investments.

Features

Data and AI Catalog

Discover, govern, and collaborate on structured data, unstructured data, AI models, BI dashboards, and applications from a single catalog. 

Business Glossary

Standardize terminology with shared business definitions and customizable metadata forms. Support restricted classification terms to enforce consistent tagging of sensitive data and enable downstream governance workflows.

Data Lineage

Track how data moves and changes across systems. OpenLineage-compatible lineage helps users understand origins, transformations, and consumption patterns to improve trust, debugging, and governance.
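
As a rough illustration of what an OpenLineage-style lineage record looks like, the sketch below builds a minimal run event as a plain Python dictionary. The top-level fields (`eventType`, `eventTime`, `run`, `job`, `inputs`, `outputs`, `producer`, `schemaURL`) follow the public OpenLineage specification; the helper name `build_run_event`, the producer URL, the schema version in `schemaURL`, and the dataset names are all hypothetical and chosen only for this example. How such an event is submitted to SageMaker Catalog is not covered here.

```python
import json
import uuid
from datetime import datetime, timezone

# Event types defined by the OpenLineage run-event model.
ALLOWED_EVENT_TYPES = {"START", "RUNNING", "COMPLETE", "ABORT", "FAIL", "OTHER"}


def build_run_event(job_namespace, job_name, inputs, outputs, event_type="COMPLETE"):
    """Build a minimal OpenLineage-style run event (hypothetical helper).

    inputs and outputs are lists of (namespace, name) tuples describing datasets.
    """
    if event_type not in ALLOWED_EVENT_TYPES:
        raise ValueError(f"unsupported event type: {event_type}")

    def dataset(namespace, name):
        # A dataset is identified by a namespace plus a name.
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Assumption: placeholder producer URI, not a real service.
        "producer": "https://example.com/hypothetical-etl-producer",
        # Assumption: the spec version in this URL is illustrative.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [dataset(ns, n) for ns, n in inputs],
        "outputs": [dataset(ns, n) for ns, n in outputs],
    }


if __name__ == "__main__":
    event = build_run_event(
        job_namespace="analytics",           # hypothetical job namespace
        job_name="daily_orders_etl",         # hypothetical job name
        inputs=[("s3://raw-bucket", "orders/raw")],
        outputs=[("awsdatacatalog", "sales.orders_clean")],
    )
    print(json.dumps(event, indent=2))
```

Each event ties one job run to the datasets it read and wrote, which is the information a catalog needs to draw origins, transformations, and consumption as a graph.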

Data Quality Monitoring

View data quality metrics from AWS and third-party tools. Consumers gain trust and context when searching, while data teams can integrate external quality signals through APIs into a unified portal.

Data Discovery

Enrich technical metadata with business context so users can quickly find, understand, and trust the data they use.

Automated Metadata Recommendations

Use LLM-powered automation to generate business-friendly names and descriptions, improving context, consistency, and clarity of technical assets.

BI Dashboards

Go from data to insights by bringing together data in SageMaker with Amazon Quick Suite capabilities like interactive dashboards, pixel-perfect reports, and generative business intelligence (BI), all in a governed and automated manner.

Data Products

Package related assets into business-focused data products with shared metadata. Improve discovery, unify access requests, and reduce administrative overhead while enabling governance teams to track product-level consumption.

Customers

Natera, Inc.

“By integrating Amazon QuickSight with Amazon SageMaker, our lab operations teams and scientists can now monitor clinical test performance across all sites in real time. We’ve developed unified dashboards that consolidate throughput, quality control metrics, and turnaround times, enabling detailed trend analysis and ongoing performance optimization. Scientists can now perform comprehensive data analysis – from exploratory review to model development – all within a single, integrated environment.”

Mirko Buholzer, VP of Software Engineering, Natera, Inc.

How Natera scales genomics with Amazon SageMaker Catalog

Cisco

"You want to discover, share, and govern your data. Whether you call it a data mesh or a data fabric, data exists across different teams in multiple silos, and you need a way to bring it together. Amazon SageMaker Catalog connects data producers and consumers, enabling producers to share data with built-in controls and data contracts while allowing consumers to access the data using the tools of their choice."

Shaja Arul Selvamani, Sr. Director AI/ML, Cisco

NatWest

"Our Data Platform Engineering team has been deploying multiple end-user tools for data engineering, ML, SQL, and gen AI tasks. As we look to simplify processes across the bank, we’ve been looking at streamlining user authentication and data access authorization. Amazon SageMaker delivers a ready-made user experience to help us deploy one single environment across the organization, reducing the time required for our data users to access new tools by around 50%."

Zachery Anderson, CDAO, NatWest Group
