AWS Big Data Blog
Category: Best Practices
Perform per-project cost allocation in Amazon SageMaker Unified Studio
Amazon SageMaker Unified Studio enables per-project cost allocation through resource tagging, allowing organizations to track and manage costs across different projects and domains effectively. This post demonstrates how to implement cost tracking using AWS Billing and Cost Management tools, including Cost Explorer and Data Exports, to help finance and business analysts follow FinOps best practices for controlling cloud infrastructure costs.
How Stifel built a modern data platform using AWS Glue and an event-driven domain architecture
In this post, we show you how Stifel implemented a modern data platform using AWS services and open data standards, building an event-driven architecture for domain data products while centralizing the metadata to facilitate discovery and sharing of data products.
Enhance stability with dedicated cluster manager nodes using Amazon OpenSearch Service
In this post, we show how using dedicated cluster manager nodes in your deployment enhances the stability and reliability of your OpenSearch Service domain.
Building serverless event streaming applications with Amazon MSK and AWS Lambda
In this post, we describe how you can simplify your event-driven application architecture using AWS Lambda with Amazon MSK. We demonstrate how to configure Lambda as a consumer for Kafka topics, including in a cross-account setup, and how to optimize price and performance for these applications.
Enhance security and performance with TLS 1.3 and Perfect Forward Secrecy on Amazon OpenSearch Service
Amazon OpenSearch Service recently introduced a new Transport Layer Security (TLS) policy, Policy-Min-TLS-1-2-PFS-2023-10, which supports the latest TLS 1.3 protocol and TLS 1.2 with Perfect Forward Secrecy (PFS) cipher suites. This new policy improves security and enhances OpenSearch performance. In this post, we discuss the benefits of this new policy and how to enable it using the AWS Command Line Interface (AWS CLI).
Embracing event driven architecture to enhance resilience of data solutions built on Amazon SageMaker
This post provides guidance on how you can use event-driven architecture to enhance the resilience of data solutions built on the next generation of Amazon SageMaker, a unified platform for data, analytics, and AI. SageMaker is a managed service with high availability and durability.
Architecture patterns to optimize Amazon Redshift performance at scale
In this post, we will show you five Amazon Redshift architecture patterns that you can consider to optimize your Amazon Redshift data warehouse performance at scale using features such as Amazon Redshift Serverless, Amazon Redshift data sharing, Amazon Redshift Spectrum, zero-ETL integrations, and Amazon Redshift streaming ingestion.
Best practices for upgrading Amazon MWAA V1.x to V2.x
In this post, we explore best practices for upgrading your Amazon MWAA environment and provide a step-by-step guide to seamlessly transition to the latest version.
Enhancing data durability in Amazon EMR HBase on Amazon S3 with the Amazon EMR WAL feature
In this post, we dive deep into the new Amazon EMR WAL feature to help you understand how it works, how it enhances durability, and why it’s needed. We explore several scenarios that are well-suited for this feature.
Petabyte-scale data migration made simple: AppsFlyer’s best practice journey with Amazon EMR Serverless
In this post, we share how AppsFlyer successfully migrated their massive data infrastructure from self-managed Hadoop clusters to Amazon EMR Serverless, detailing their best practices, challenges to overcome, and lessons learned that can help guide other organizations in similar transformations.