AWS Big Data Blog
Category: Analytics
Using AWS Lambda for Event-driven Data Processing Pipelines
February 2023 Update: Console access to the AWS Data Pipeline service will be removed on April 30, 2023. On this date, you will no longer be able to access AWS Data Pipeline though the console. You will continue to have access to AWS Data Pipeline through the command line interface and API. Please note that […]
Persist Streaming Data to Amazon S3 using Amazon Data Firehose and AWS Lambda
February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. Streaming data analytics is becoming main-stream (pun intended) in large enterprises as the technology stacks have become more user-friendly to implement. For example, Spark-Streaming connected to an Amazon Kinesis stream is a […]
Automating Analytic Workflows on AWS
February 2023 Update: Console access to the AWS Data Pipeline service will be removed on April 30, 2023. On this date, you will no longer be able to access AWS Data Pipeline though the console. You will continue to have access to AWS Data Pipeline through the command line interface and API. Please note that […]
Analyze Data with Presto and Airpal on Amazon EMR
Songzhi Liu is a Professional Services Consultant with AWS You can now launch Presto version 0.119 on Amazon EMR, allowing you to easily spin up a managed EMR cluster with the Presto query engine and run interactive analysis on data stored in Amazon S3. You can integrate with Spot instances, publish logs to an S3 […]
How Coursera Manages Large-Scale ETL using AWS Data Pipeline and Dataduct
February 2023 Update: Console access to the AWS Data Pipeline service will be removed on April 30, 2023. On this date, you will no longer be able to access AWS Data Pipeline though the console. You will continue to have access to AWS Data Pipeline through the command line interface and API. Please note that […]
Introduction to Python UDFs in Amazon Redshift
Christopher Crosbie is a Healthcare and Life Science Solutions Architect with Amazon Web Services When your doctor takes out a prescription pad at your yearly checkup, do you ever stop to wonder what goes into her thought process as she decides on which drug to scribble down? We assume that journals of scientific evidence coupled […]
Using BlueTalon with Amazon EMR
This is a guest post by Pratik Verma, Founder and Chief Product Officer at BlueTalon. Leonid Fedotov, Senior Solution Architect at BlueTalon, also contributed to this post. Amazon Elastic MapReduce (Amazon EMR) makes it easy to quickly and cost-effectively process vast amounts of data in the cloud. EMR gets used for log, financial, fraud, and […]
Integrating Amazon Kinesis, Amazon S3 and Amazon Redshift with Cascading on Amazon EMR
This is a guest post by Ryan Desmond, Solutions Architect at Concurrent. Concurrent is an AWS Advanced Technology Partner. With Amazon Kinesis developers can quickly store, collate and access large, distributed data streams such as access logs, click streams and IoT data in real-time. The question then becomes, how can we access and leverage this […]
Implementing Efficient and Reliable Producers with the Amazon Kinesis Producer Library
Kevin Deng is an SDE with the Amazon Kinesis team and is the lead author of the Amazon Kinesis Producer Library How do you vertically scale an Amazon Kinesis producer application by 100x? While it’s easy to get started with streaming data into Amazon Kinesis, streaming large volumes of data efficiently and reliably presents some […]
Connecting R with Amazon Redshift
Markus Schmidberger is a Senior Big Data Consultant for AWS Professional Services Amazon Redshift is a fast, petabyte-scale cloud data warehouse for PB of data. AWS customers are moving huge amounts of structured data into Amazon Redshift to offload analytics workloads or to operate their DWH fully in the cloud. Business intelligence and analytic teams […]


