
Amazon SageMaker Catalog FAQs

Data and AI Governance

The next generation of Amazon SageMaker simplifies discovery, governance, and collaboration for data and AI across your lakehouse, AI models, and applications. With Amazon SageMaker Catalog, built on Amazon DataZone, users can securely discover and access approved data and models using semantic search over metadata created with generative AI, or ask Amazon Q Developer in natural language to find data. Users can define and enforce access policies consistently, using a single permission model with fine-grained access controls managed centrally in Amazon SageMaker Unified Studio. Teams can share and collaborate on data and AI assets through publishing and subscribing workflows. With Amazon SageMaker, you can safeguard your AI models using Amazon Bedrock Guardrails and implement responsible AI policies. You can build trust throughout your organization with data quality monitoring and automation, sensitive data detection, and data and ML lineage.

You can access SageMaker Catalog through Amazon SageMaker Unified Studio, a single environment for data and AI development. To set up, configure, or integrate SageMaker Catalog with existing processes programmatically, you can use its published APIs, which come with guidelines on how to use the existing Amazon DataZone APIs.
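As an illustration of the programmatic route, the following is a minimal sketch, not an official sample, of searching published catalog listings through the Amazon DataZone API with boto3. It assumes the `datazone` client's `search_listings` operation and that results arrive under `items` with an `assetListing` entry and an optional `nextToken`. The helper names `make_datazone_client` and `iter_listings` are hypothetical and introduced only for this example.

```python
from typing import Any, Dict, Iterator


def make_datazone_client(region_name: str) -> Any:
    """Create a boto3 client for the Amazon DataZone API.

    Requires boto3 and configured AWS credentials. The import is done
    lazily so iter_listings can be exercised without AWS access.
    """
    import boto3

    return boto3.client("datazone", region_name=region_name)


def iter_listings(client: Any, domain_id: str, search_text: str) -> Iterator[Dict[str, Any]]:
    """Yield each asset listing matching search_text in the given domain.

    Follows nextToken pagination until the service returns no further token.
    Assumption: each result item carries its details under "assetListing".
    """
    kwargs: Dict[str, Any] = {
        "domainIdentifier": domain_id,
        "searchText": search_text,
    }
    while True:
        page = client.search_listings(**kwargs)
        for item in page.get("items", []):
            listing = item.get("assetListing")
            if listing is not None:
                yield listing
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token
```

With credentials in place, a call such as `iter_listings(make_datazone_client("us-east-1"), "<your-domain-id>", "customer orders")` would iterate over matching listings. The region, domain ID, and search text here are placeholders to replace with your own values.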

You can launch Amazon QuickSight directly from Amazon SageMaker Unified Studio and build dashboards using project data, without any manual setup. Dashboards are automatically tied to the Amazon SageMaker project and can be published to the SageMaker Catalog for discovery and sharing. This streamlines the path from data exploration to insight delivery, eliminating tool switching and governance gaps.

Data and AI governance in Amazon SageMaker helps data teams with:

  • Faster data discovery and collaboration: Users can quickly find and share relevant data across the organization, reducing time spent searching for information and promoting teamwork.
  • Improved trust through lineage and quality: Teams can track data origin and improve data quality, increasing confidence in data-driven decisions and AI model outputs.
  • Enhanced data and AI model security: Data and models are accessible only through projects, so only users authorized to see a project's assets can access them, maintaining security and privacy standards.
  • Reduced business risk and better regulatory compliance: Activity logging helps organizations align with industry regulations and internal policies, reducing organizational risk.
  • Unlock business productivity with asset search and discovery: Search and discover data and AI assets to empower teams, reduce time spent finding critical assets, and drive faster, data-driven decision-making.
  • Centralized data access policy management: Define and manage data access rules from a single point, leading to consistent application across various AWS services and third-party environments.
  • Data enrichment with business context and classification: Add metadata and categorization to datasets, making it easier for users to understand data relevance and applicability to specific business needs.
  • Log activities for users and systems: Monitor and record interactions with data and AI systems, providing visibility into usage patterns and potential security issues.
  • AI/ML data governance implementation: Extend data governance principles to AI and machine learning processes, ensuring that only approved data is used in model training and that AI systems adhere to defined permissions and ethical guidelines.

Amazon SageMaker Catalog is built on Amazon DataZone and offers the same governance capabilities in a unified user experience. The Amazon DataZone experience remains available as is, so existing Amazon DataZone customers can continue using the familiar interface if they prefer.

The pricing details can be found here: https://aws.amazon.com/datazone/pricing/.

Amazon SageMaker Unified Studio and SageMaker Catalog are built on Amazon DataZone (using the same back-end entity store/database, identity and access mechanisms, and APIs) and are therefore included in the scope of all of the same compliance programs as Amazon DataZone. Please refer to the list of Services in Scope by Compliance Program to view the programs for which Amazon DataZone is in scope. This includes SOC, certain ISO certifications, PCI DSS, and HITRUST CSF. Amazon DataZone is also included in the list of HIPAA eligible services.

Yes. With support for S3 general purpose buckets, you can now catalog unstructured data by creating S3 Object Collections in SageMaker Catalog. These assets can be enriched with business metadata, discovered by other teams, and securely shared using built-in access controls.

Amazon SageMaker automatically ingests metadata from sources like the AWS Glue Data Catalog into SageMaker Catalog when a domain is created or updated. There’s no need to manually configure IAM roles or write ingestion scripts. As a result, datasets are immediately searchable and ready for use in machine learning, analytics, and governance workflows.