AWS Big Data Blog
Category: Intermediate (200)
Unlock granular resource control with queue-based QMR in Amazon Redshift Serverless
With Amazon Redshift Serverless queue-based Query Monitoring Rules (QMR), administrators can define workload-aware thresholds and automated actions at the queue level—a significant improvement over previous workgroup-level monitoring. You can create dedicated queues for distinct workloads such as BI reporting, ad hoc analysis, or data engineering, then apply queue-specific rules to automatically abort, log, or restrict queries that exceed execution-time or resource-consumption limits. By isolating workloads and enforcing targeted controls, this approach protects mission-critical queries, improves performance predictability, and prevents resource monopolization—all while maintaining the flexibility of a serverless experience. In this post, we discuss how to implement query queues for your workloads in Redshift Serverless.
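As a taste of what a queue-level rule might look like, here is a minimal Python (boto3) sketch. The rule structure (rule_name, predicate, action) is borrowed from the QMR format used with provisioned Redshift WLM, and the query_queue_rules parameter key is a hypothetical placeholder, not the documented configuration surface; see the post for the actual setup.

```python
import json
import boto3

# Hypothetical sketch: apply a queue-level QMR rule to a Redshift Serverless workgroup.
# The rule shape (rule_name / predicate / action) follows the provisioned-cluster WLM
# QMR format; the 'query_queue_rules' parameter key is an assumption for illustration.
bi_queue_rules = [
    {
        "queue_name": "bi_reporting",  # dedicated queue for BI dashboards
        "rules": [
            {
                "rule_name": "abort_long_running",
                "predicate": [{"metric_name": "query_execution_time",
                               "operator": ">", "value": 300}],  # seconds
                "action": "abort",
            },
            {
                "rule_name": "log_heavy_scans",
                "predicate": [{"metric_name": "scan_row_count",
                               "operator": ">", "value": 1_000_000_000}],
                "action": "log",
            },
        ],
    }
]

redshift_serverless = boto3.client("redshift-serverless")
redshift_serverless.update_workgroup(
    workgroupName="analytics-wg",
    configParameters=[
        {"parameterKey": "query_queue_rules",  # hypothetical key, for illustration only
         "parameterValue": json.dumps(bi_queue_rules)},
    ],
)
```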
How Slack achieved operational excellence for Spark on Amazon EMR using generative AI
In this post, we show how Slack built a monitoring framework for Apache Spark on Amazon EMR that captures over 40 metrics, processes them through Kafka and Apache Iceberg, and uses Amazon Bedrock to deliver AI-powered tuning recommendations—achieving 30–50% cost reductions and 40–60% faster job completion times.
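To make the AI-powered recommendation step concrete, here is a minimal Python sketch of the general pattern (not Slack's actual code): summarize the collected Spark job metrics and ask an Amazon Bedrock model for tuning suggestions. The metric names, job name, and model ID are placeholders chosen for illustration.

```python
import json
import boto3

# Minimal sketch of the pattern described in the post: send summarized Spark
# metrics to an Amazon Bedrock model and print its tuning recommendations.
bedrock = boto3.client("bedrock-runtime")

job_metrics = {
    "job": "daily_sessionization",   # placeholder job name
    "shuffle_read_gb": 820,
    "executor_memory_gb": 16,
    "spill_gb": 310,
    "gc_time_pct": 22,
}

prompt = (
    "You are a Spark tuning assistant. Given these Amazon EMR job metrics, "
    f"suggest configuration changes:\n{json.dumps(job_metrics, indent=2)}"
)

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # example model ID; use any supported model
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": prompt}],
    }),
)
recommendations = json.loads(response["body"].read())["content"][0]["text"]
print(recommendations)
```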
AWS analytics at re:Invent 2025: Unifying Data, AI, and governance at scale
re:Invent 2025 showcased the bold Amazon Web Services (AWS) vision for the future of analytics, one where data warehouses, data lakes, and AI development converge into a seamless, open, intelligent platform, with Apache Iceberg compatibility at its core. Across over 18 major announcements spanning three weeks, AWS demonstrated how organizations can break down data silos, […]
Amazon EMR Serverless eliminates local storage provisioning, reducing data processing costs by up to 20%
In this post, you’ll learn how Amazon EMR Serverless eliminates the need to configure local disk storage for Apache Spark workloads through a new serverless storage capability. We explain how this feature automatically handles shuffle operations, reduces data processing costs by up to 20%, prevents job failures from disk capacity constraints, and enables elastic scaling by decoupling storage from compute.
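The practical effect is that a Spark job submission carries no local disk sizing at all. The following Python (boto3) sketch shows such a submission; the application ID, IAM role, and S3 paths are placeholders, and Spark settings are limited to compute because shuffle storage is handled by the service as described in the post.

```python
import boto3

# Sketch: submit a Spark job to EMR Serverless without configuring local disk.
# Application ID, role ARN, and S3 paths are placeholders for illustration.
emr_serverless = boto3.client("emr-serverless")

emr_serverless.start_job_run(
    applicationId="00f1example2345",  # placeholder application ID
    executionRoleArn="arn:aws:iam::123456789012:role/EMRServerlessJobRole",
    jobDriver={
        "sparkSubmit": {
            "entryPoint": "s3://my-bucket/jobs/aggregate_events.py",
            # Only compute-related settings; no per-worker disk configuration needed.
            "sparkSubmitParameters": "--conf spark.executor.memory=8g --conf spark.executor.cores=4",
        }
    },
)
```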
How Taxbit achieved cost savings and faster processing times using Amazon S3 Tables
In this post, we discuss how Taxbit partnered with Amazon Web Services (AWS) to streamline their crypto tax analytics solution using Amazon S3 Tables, achieving 82% cost savings and five times faster processing times.
Power data ingestion into Splunk using Amazon Data Firehose
With Amazon Data Firehose, customers get a fully managed, reliable, and scalable way to stream data into Splunk. In this post, we tell you a bit more about the Amazon Data Firehose and Splunk integration. We also show you how to ingest large amounts of data into Splunk using Amazon Data Firehose.
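Once a delivery stream is configured with Splunk as its destination (including the Splunk HEC endpoint and token, which live on the stream, not in producer code), sending events is a single API call. Here is a minimal Python (boto3) sketch; the stream name and event payload are placeholders.

```python
import json
import boto3

# Sketch: send an event to a Firehose delivery stream whose destination is Splunk.
# The stream name is a placeholder; the Splunk HEC endpoint and token are
# configured on the delivery stream itself.
firehose = boto3.client("firehose")

event = {"source": "checkout-service", "level": "ERROR", "message": "payment timeout"}

firehose.put_record(
    DeliveryStreamName="splunk-ingest-stream",  # placeholder stream name
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)
```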
Accelerate Apache Hive read and write on Amazon EMR using enhanced S3A
In this post, we demonstrate how Apache Hive on Amazon EMR 7.10 delivers significant performance improvements for both read and write operations on Amazon S3.
Introducing AWS Glue 5.1 for Apache Spark
AWS recently announced AWS Glue 5.1, a new version of AWS Glue that accelerates data integration workloads in AWS. AWS Glue 5.1 upgrades the Spark engine to Apache Spark 3.5.6, giving you a newer Spark release along with updated dependent libraries so you can develop, run, and scale your data integration workloads and get insights faster. In this post, we describe what’s new in AWS Glue 5.1, key highlights on Spark and related libraries, and how to get started on AWS Glue 5.1.
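Moving a job to the new version is mostly a matter of pinning its Glue version. The Python (boto3) sketch below creates a Spark job targeting Glue 5.1; job name, role, script location, and worker settings are placeholders, and the "5.1" version string follows the convention used by earlier releases such as "4.0" and "5.0".

```python
import boto3

# Sketch: create a Glue Spark job pinned to Glue version 5.1.
# Name, role, script location, and worker settings are placeholders.
glue = boto3.client("glue")

glue.create_job(
    Name="orders-etl",
    Role="arn:aws:iam::123456789012:role/GlueJobRole",
    GlueVersion="5.1",  # targets Apache Spark 3.5.6 per the post
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/orders_etl.py",
        "PythonVersion": "3",
    },
    NumberOfWorkers=10,
    WorkerType="G.1X",
)
```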
Achieve 2x faster data lake query performance with Apache Iceberg on Amazon Redshift
In 2025, Amazon Redshift delivered several performance optimizations that improved query performance more than twofold for Apache Iceberg workloads on Amazon Redshift Serverless, providing exceptional performance and cost-effectiveness for your data lake workloads. In this post, we describe some of the optimizations that led to these performance gains.
Accelerate data lake operations with Apache Iceberg V3 deletion vectors and row lineage
In this post, we walk you through the new capabilities in Iceberg V3, explain how deletion vectors and row lineage address common data lake maintenance challenges, explore real-world use cases across industries, and provide practical guidance on implementing Iceberg V3 features across AWS analytics, catalog, and storage services.
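For orientation, here is a minimal PySpark sketch of how an existing table might be upgraded to Iceberg format version 3 and configured for merge-on-read deletes, which V3 stores as deletion vectors. The catalog and table names are placeholders, the property names follow the open-source Iceberg spec, and actual support depends on the engine and Iceberg version you run; the post covers the specifics for each AWS service.

```python
from pyspark.sql import SparkSession

# Sketch: upgrade an Iceberg table to format version 3 and use merge-on-read
# deletes (written as deletion vectors in V3). Assumes the Spark session is
# already configured with an Iceberg catalog named "glue_catalog".
spark = SparkSession.builder.appName("iceberg-v3-demo").getOrCreate()

spark.sql("""
    ALTER TABLE glue_catalog.sales.orders SET TBLPROPERTIES (
        'format-version' = '3',
        'write.delete.mode' = 'merge-on-read',
        'write.update.mode' = 'merge-on-read'
    )
""")

# Row-level deletes now produce deletion vectors instead of rewriting data files,
# and V3 tables track row lineage metadata as rows are updated.
spark.sql("DELETE FROM glue_catalog.sales.orders WHERE order_status = 'CANCELLED'")
```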









