AWS Big Data Blog

Category: Announcements

Unlock granular resource control with queue-based QMR in Amazon Redshift Serverless

With Amazon Redshift Serverless queue-based Query Monitoring Rules (QMR), administrators can define workload-aware thresholds and automated actions at the queue level—a significant improvement over previous workgroup-level monitoring. You can create dedicated queues for distinct workloads such as BI reporting, ad hoc analysis, or data engineering, then apply queue-specific rules to automatically abort, log, or restrict queries that exceed execution-time or resource-consumption limits. By isolating workloads and enforcing targeted controls, this approach protects mission-critical queries, improves performance predictability, and prevents resource monopolization—all while maintaining the flexibility of a serverless experience. In this post, we discuss how you can implement your workloads with query queues in Redshift Serverless.

Amazon EMR Serverless eliminates local storage provisioning, reducing data processing costs by up to 20%

In this post, you’ll learn how Amazon EMR Serverless eliminates the need to configure local disk storage for Apache Spark workloads through a new serverless storage capability. We explain how this feature automatically handles shuffle operations, reduces data processing costs by up to 20%, prevents job failures from disk capacity constraints, and enables elastic scaling by decoupling storage from compute.

Amazon EMR HBase on Amazon S3 transitioning to EMR S3A with comparable EMRFS performance

Starting with version 7.10, Amazon EMR is transitioning from EMR File System (EMRFS) to EMR S3A as the default file system connector for Amazon S3 access. This transition brings HBase on Amazon S3 to a new level, offering performance parity with EMRFS while delivering substantial improvements, including better standardization, improved portability, stronger community support, improved performance through non-blocking I/O, asynchronous clients, and better credential management with AWS SDK V2 integration. In this post, we discuss this transition and its benefits.

Introducing Apache Iceberg materialized views in AWS Glue Data Catalog

Hundreds of thousands of customers build artificial intelligence and machine learning (AI/ML) and analytics applications on AWS, frequently transforming data through multiple stages for improved query performance—from raw data to processed datasets to final analytical tables. Data engineers must solve complex problems, including detecting what data has changed in base tables, writing and maintaining transformation […]

Introducing AWS Glue 5.1 for Apache Spark

AWS recently announced Glue 5.1, a new version of AWS Glue that accelerates data integration workloads in AWS. AWS Glue 5.1 upgrades the Spark engines to Apache Spark 3.5.6, giving you newer Spark release along with the newer dependent libraries so you can develop, run, and scale your data integration workloads and get insights faster. In this post, we describe what’s new in AWS Glue 5.1, key highlights on Spark and related libraries, and how to get started on AWS Glue 5.1.

Auto-optimize your Amazon OpenSearch Service vector database

AWS recently announced the general availability of auto-optimize for the Amazon OpenSearch Service vector engine. This feature streamlines vector index optimization by automatically evaluating configuration trade-offs across search quality, speed, and cost savings. You can then run a vector ingestion pipeline to build an optimized index on your desired collection or domain. Previously, optimizing index […]

Build billion-scale vector databases in under an hour with GPU acceleration on Amazon OpenSearch Service

AWS recently announced the general availability of GPU-accelerated vector (k-NN) indexing on Amazon OpenSearch Service. You can now build billion-scale vector databases in under an hour and index vectors up to 10 times faster at a quarter of the cost. This feature dynamically attaches serverless GPUs to boost domains and collections running CPU-based instances. With […]