AWS Big Data Blog
Category: Analytics
Using Attunity CloudBeam at UMUC to Replicate Data to Amazon RDS and Amazon Redshift
Matt Yanchyshyn is a Principal Solutions Architect at AWS. Brad Helicher, Director of Cloud Business at Attunity, also contributed to this post. Attunity is an APN Big Data Competency Partner. Introduction: University of Maryland University College’s mission is to provide a quality education at an affordable cost to busy professionals, mainly adults who are juggling […]
Ensuring Consistency When Using Amazon S3 and Amazon Elastic MapReduce for ETL Workflows
February 2023 Update: Console access to the AWS Data Pipeline service will be removed on April 30, 2023. On this date, you will no longer be able to access AWS Data Pipeline through the console. You will continue to have access to AWS Data Pipeline through the command line interface and API. Please note that […]
Statistical Analysis with Open-Source R and RStudio on Amazon EMR
Markus Schmidberger is a Senior Big Data Consultant for AWS Professional Services. Big Data is on every CIO’s mind. It is synonymous with technologies like Hadoop and the ‘NoSQL’ class of databases. Another technology shaking things up in Big Data is R. This blog post describes how to set up R, RHadoop packages and RStudio […]
Using Amazon EMR with SQL Workbench and other BI Tools
This is a guest post by Kyle Porter, a Sales Engineer at Simba Technologies. Jon Einkauf, a Senior Product Manager for Amazon Elastic MapReduce, and AWS Senior Technical Writer Jeff Slone also contributed to this post. Note: Ports have changed on EMR 4.x. Before walking through this post, please consult the EMR documentation to […]
Using Amazon EMR and Tableau to Analyze and Visualize Data
Rahul Bhartia is an AWS Solutions Architect. Introduction: Hadoop provides a great ecosystem of tools for extracting value from data in various formats and sizes. Originally focused on large-batch processing with tools like MapReduce, Pig and Hive, Hadoop now provides many tools for running interactive queries on your data, such as Impala, Drill, and Presto. […]
Using Amazon Redshift to Analyze Your Elastic Load Balancer Traffic Logs
Biff Gaut is a Solutions Architect with AWS. Introduction: With the introduction of Elastic Load Balancing (ELB) access logs, administrators have a tremendous amount of data describing all traffic through their ELB. While Amazon Elastic MapReduce (Amazon EMR) and some partner tools are excellent solutions for ongoing, extensive analysis of this traffic, they can require […]
Getting Started with Amazon EMR Bootstrap Actions
Steve McPherson is a Senior Manager for Amazon Elastic MapReduce. Note: This post was updated 2/8/16. The Presto bootstrap action documented in the original post has been deprecated because EMR now offers a Presto-Sandbox as a full-fledged EMR application. For details, see the EMR sandbox. Amazon Elastic MapReduce (EMR) is a fully managed Hadoop-as-a-service platform […]
Hosting Amazon Kinesis Applications on AWS Elastic Beanstalk
Ian Meyers is a Solutions Architecture Senior Manager with AWS. Amazon Kinesis provides a scalable and highly available platform for ingesting data from thousands of clients. Once data is available on a Kinesis stream, you can build applications to process the data using the Kinesis Client Library (KCL). KCL provides a framework for managing many […]
Best Practices for Micro-Batch Loading on Amazon Redshift
NOTE: Amazon Kinesis Data Firehose is a fully managed service for delivering real-time streaming data to Amazon Redshift. For more information, please visit the Amazon Kinesis Data Firehose documentation page, “Choosing Amazon Redshift for Your Destination.” February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New […]
Building a Recommender with Apache Mahout on Amazon Elastic MapReduce (EMR)
This is a guest post by Andrew Musselman, who as chief data scientist leads the global big data practice from the technical side at Accenture. He is a PMC member on the Apache Mahout project and is writing a book on data science for O’Reilly. Accenture is an APN Big Data Competency Partner. This post […]