AWS Big Data Blog

Read and write Apache Iceberg tables using AWS Lake Formation hybrid access mode

In this post, we demonstrate how to use Lake Formation for read access while continuing to use AWS Identity and Access Management (IAM) policy-based permissions for write workloads that update the schema and upsert (insert and update combined) data records into the Iceberg tables.
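
To make the write path concrete, here is a minimal sketch of an IAM-permissioned Spark job upserting records into a Glue-cataloged Iceberg table with MERGE INTO; the catalog configuration, database, table, and S3 warehouse path are illustrative placeholders rather than values from the post.

```python
# Minimal sketch: upsert (MERGE INTO) records into an Iceberg table from PySpark.
# Catalog name, database, table, and S3 paths are illustrative placeholders; the
# job is assumed to run with IAM policy-based write permissions on the table location.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-upsert-example")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://example-bucket/warehouse/")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)

# Stage the incoming batch of records as a temporary view.
updates = spark.createDataFrame(
    [(101, "shipped"), (102, "pending")], ["order_id", "status"]
)
updates.createOrReplaceTempView("order_updates")

# MERGE INTO performs the upsert: update matching rows, insert new ones.
spark.sql("""
    MERGE INTO glue_catalog.sales.orders AS target
    USING order_updates AS source
    ON target.order_id = source.order_id
    WHEN MATCHED THEN UPDATE SET target.status = source.status
    WHEN NOT MATCHED THEN INSERT *
""")
```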

Integrate ThoughtSpot with Amazon Redshift using AWS IAM Identity Center

In this post, we walk you through the process of setting up ThoughtSpot integration with Amazon Redshift using IAM Identity Center authentication. The solution provides a secure, streamlined analytics environment that empowers your team to focus on what matters most: discovering and sharing valuable business insights.

Optimize multimodal search using the TwelveLabs Embed API and Amazon OpenSearch Service

In this blog post, we show you how to integrate the TwelveLabs Embed API with OpenSearch Service to create a multimodal search solution. You’ll learn how to generate rich, contextual embeddings from video content and use OpenSearch Service’s vector database capabilities to enable multimodal search over that content. By the end of this post, you’ll be equipped to implement a system that transforms the way your organization handles and extracts value from video content.
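
The snippet below is a minimal sketch of the OpenSearch Service side of such a solution using the opensearch-py client: it creates a k-NN index for video embeddings and runs a vector query against it. The embedding vectors themselves are assumed to come from the TwelveLabs Embed API (not shown), and the domain endpoint, index name, field names, and dimension are placeholders.

```python
# Minimal sketch: store a video embedding in an OpenSearch k-NN index and run a
# vector query against it. The vectors below stand in for TwelveLabs embeddings;
# endpoint, credentials, index name, field names, and dimension are placeholders.
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("user", "password"),  # or SigV4-based auth in practice
    use_ssl=True,
)

# Create an index with a knn_vector field for the video segment embeddings.
client.indices.create(
    index="video-embeddings",
    body={
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                "clip_vector": {"type": "knn_vector", "dimension": 1024},
                "video_id": {"type": "keyword"},
                "start_sec": {"type": "float"},
            }
        },
    },
)

# Index one embedded video segment.
clip_vector = [0.01] * 1024  # placeholder for a TwelveLabs embedding
client.index(
    index="video-embeddings",
    body={"clip_vector": clip_vector, "video_id": "demo-001", "start_sec": 0.0},
)

# k-NN search: find the segments closest to a query embedding.
query_vector = [0.02] * 1024  # placeholder for an embedded text or image query
results = client.search(
    index="video-embeddings",
    body={"size": 5, "query": {"knn": {"clip_vector": {"vector": query_vector, "k": 5}}}},
)
```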

Correlate telemetry data with Amazon OpenSearch Service and Amazon Managed Grafana

In this post, we show you how to use Amazon OpenSearch Service and Amazon Managed Grafana to correlate observability signals, improving root cause analysis and reducing Mean Time to Resolution (MTTR). We also provide a reference solution for proactive monitoring of enterprise applications at scale, so you can catch problems before they occur.
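
As a small illustration of one correlation step, the sketch below uses the opensearch-py client to pull the application logs that share a trace ID surfaced in a Grafana trace panel; the domain endpoint, index pattern, and field names are placeholders for whatever your log pipeline emits.

```python
# Minimal sketch: given a trace ID from a Grafana trace panel, fetch the matching
# application log lines from an OpenSearch Service domain. Endpoint, index pattern,
# and field names (traceId, @timestamp, message) are illustrative placeholders.
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "observability-domain.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("user", "password"),
    use_ssl=True,
)

trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"  # example trace ID copied from Grafana

response = client.search(
    index="application-logs-*",
    body={
        "size": 50,
        "sort": [{"@timestamp": "asc"}],
        "query": {"term": {"traceId": trace_id}},
    },
)

for hit in response["hits"]["hits"]:
    src = hit["_source"]
    print(src.get("@timestamp"), src.get("message"))
```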

Build a data lakehouse in a hybrid environment using Amazon EMR Serverless, Apache DolphinScheduler, and TiDB

This post discusses a decoupled approach to building a serverless data lakehouse using AWS Cloud-centered services, including Amazon EMR Serverless, Amazon Athena, and Amazon Simple Storage Service (Amazon S3), along with Apache DolphinScheduler (an open source data job scheduler) and PingCAP TiDB, a third-party data warehouse product that can be deployed on premises, in the cloud, or as software as a service (SaaS).
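
To illustrate how a scheduler task in such a setup might hand work to the lakehouse layer, the following sketch submits a Spark job to an existing EMR Serverless application with boto3; the application ID, execution role, script location, and log bucket are placeholders.

```python
# Minimal sketch: submit a Spark job to an existing EMR Serverless application,
# as a scheduler task (for example, in Apache DolphinScheduler) might do.
# Application ID, role ARN, script path, and bucket names are placeholders.
import boto3

emr_serverless = boto3.client("emr-serverless", region_name="us-east-1")

response = emr_serverless.start_job_run(
    applicationId="00example1234567",  # existing EMR Serverless application
    executionRoleArn="arn:aws:iam::123456789012:role/EMRServerlessJobRole",
    jobDriver={
        "sparkSubmit": {
            "entryPoint": "s3://example-bucket/scripts/build_lakehouse_layer.py",
            "sparkSubmitParameters": "--conf spark.executor.memory=4g --conf spark.executor.cores=2",
        }
    },
    configurationOverrides={
        "monitoringConfiguration": {
            "s3MonitoringConfiguration": {"logUri": "s3://example-bucket/emr-serverless-logs/"}
        }
    },
)
print("Started job run:", response["jobRunId"])
```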

Migrate from Standard brokers to Express brokers in Amazon MSK using Amazon MSK Replicator

Creating a new cluster with Express brokers is straightforward, as described in Amazon MSK Express brokers. However, if you have an existing MSK cluster, you need to migrate to a new Express-based cluster. In this post, we discuss how to plan and perform the migration to Express brokers for your existing MSK workloads on Standard brokers. Express brokers offer a different user experience and a different shared responsibility boundary, so they can’t be enabled on an existing cluster. Instead, you can use Amazon MSK Replicator to copy all data and metadata from your existing MSK cluster to a new cluster consisting of Express brokers.
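
As a rough sketch of that replication setup, the following boto3 call creates an MSK Replicator between an existing Standard broker cluster (source) and a new Express broker cluster (target); the cluster ARNs, subnets, security groups, and IAM role are placeholders, and the nested parameter names follow the MSK CreateReplicator API.

```python
# Minimal sketch: create an MSK Replicator that copies topics and consumer groups
# from an existing Standard broker cluster to a new Express broker cluster.
# All ARNs, subnet IDs, security group IDs, and the IAM role are placeholders.
import boto3

kafka = boto3.client("kafka", region_name="us-east-1")

source_arn = "arn:aws:kafka:us-east-1:123456789012:cluster/standard-cluster/11111111-1111-1111-1111-111111111111-1"
target_arn = "arn:aws:kafka:us-east-1:123456789012:cluster/express-cluster/22222222-2222-2222-2222-222222222222-1"

response = kafka.create_replicator(
    ReplicatorName="standard-to-express-migration",
    ServiceExecutionRoleArn="arn:aws:iam::123456789012:role/MSKReplicatorRole",
    KafkaClusters=[
        {   # source: existing Standard broker cluster
            "AmazonMskCluster": {"MskClusterArn": source_arn},
            "VpcConfig": {"SubnetIds": ["subnet-aaa", "subnet-bbb"], "SecurityGroupIds": ["sg-111"]},
        },
        {   # target: new Express broker cluster
            "AmazonMskCluster": {"MskClusterArn": target_arn},
            "VpcConfig": {"SubnetIds": ["subnet-ccc", "subnet-ddd"], "SecurityGroupIds": ["sg-222"]},
        },
    ],
    ReplicationInfoList=[
        {
            "SourceKafkaClusterArn": source_arn,
            "TargetKafkaClusterArn": target_arn,
            "TargetCompressionType": "NONE",
            "TopicReplication": {
                "TopicsToReplicate": [".*"],          # replicate all topics
                "CopyTopicConfigurations": True,
                "CopyAccessControlListsForTopics": True,
                "DetectAndCopyNewTopics": True,
            },
            "ConsumerGroupReplication": {
                "ConsumerGroupsToReplicate": [".*"],  # replicate all consumer groups
                "DetectAndCopyNewConsumerGroups": True,
                "SynchroniseConsumerGroupOffsets": True,
            },
        }
    ],
)
print("Replicator ARN:", response["ReplicatorArn"])
```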