AWS Big Data Blog
Category: Analytics
Enhance Amazon EMR observability with automated incident mitigation using Amazon Bedrock and Amazon Managed Grafana
In this post, we demonstrate how to integrate real-time monitoring with AI-powered remediation suggestions, combining Amazon Managed Grafana for visualization, Amazon Bedrock for intelligent response recommendations, and AWS Systems Manager for automated remediation actions on Amazon Web Services (AWS).
Build data pipelines with dbt in Amazon Redshift using Amazon MWAA and Cosmos
In this post, we explore a streamlined, configuration-driven approach to orchestrate dbt Core jobs using Amazon Managed Workflows for Apache Airflow (Amazon MWAA) and Cosmos, an open source package. These jobs run transformations on Amazon Redshift. With this setup, teams can collaborate effectively while maintaining data quality, operational efficiency, and observability.
The Amazon SageMaker lakehouse architecture now automates optimization configuration of Apache Iceberg tables on Amazon S3
The Amazon SageMaker lakehouse architecture now automates optimization of Iceberg tables stored in Amazon S3 with catalog-level configuration, optimizing storage in your Iceberg tables and improving query performance. This post demonstrates an end-to-end flow to enable catalog level table optimization setting.
Boosting search relevance: Automatic semantic enrichment in Amazon OpenSearch Serverless
In this post, we show how automatic semantic enrichment removes friction and makes the implementation of semantic search for text data seamless, with step-by-step instructions to enhance your search functionality.
Create an OpenSearch dashboard with Amazon OpenSearch Service
This post demonstrates how to harness OpenSearch Dashboards to analyze logs visually and interactively. With this solution, IT administrators, developers, and DevOps engineers can create custom dashboards to monitor system behavior, detect anomalies early, and troubleshoot issues faster through interactive charts and graphs.
Build a multi-tenant healthcare system with Amazon OpenSearch Service
In this post, we address common multi-tenancy challenges and provide actionable solutions for security, tenant isolation, workload management, and cost optimization across diverse healthcare tenants.
Integrate scientific data management and analytics with the next generation of Amazon SageMaker, Part 1
In this blog post, AWS introduces a solution to a common challenge in scientific research – the inefficient management of fragmented scientific data. The post demonstrates how the next generation of Amazon SageMaker, through its Unified Studio and Catalog features, helps scientists streamline their workflow by integrating data management and analytics capabilities.
Develop and deploy a generative AI application using Amazon SageMaker Unified Studio
In this post, we demonstrate how to use Amazon Bedrock Flows in SageMaker Unified Studio to build a sophisticated generative AI application for financial analysis and investment decision-making.
Near real-time streaming analytics on protobuf with Amazon Redshift
In this post, we explore an end-to-end analytics workload for streaming protobuf data, by showcasing how to handle these data streams with Amazon Redshift Streaming Ingestion, deserializing and processing them using AWS Lambda functions, so that the incoming streams are immediately available for querying and analytical processing on Amazon Redshift.
Amazon Redshift out-of-the-box performance innovations for data lake queries
In this post, we first briefly review how planner statistics are collected and what impact they have on queries. Then, we discuss Amazon Redshift features that deliver optimal plans on Iceberg tables and Parquet data even with the lack of statistics. Finally, we review some example queries that now execute faster because of these latest Amazon Redshift innovations.