AWS Big Data Blog
Category: Analytics
Create a Healthcare Data Hub with AWS and Mirth Connect
As anyone visiting their doctor may have noticed, gone are the days of physicians recording their notes on paper. Physicians are more likely to enter the exam room with a laptop than with paper and pen. This change is the byproduct of efforts to improve patient outcomes, increase efficiency, and drive population health. Pushing for […]
Decreasing Game Churn: How Upopa used ironSource Atom and Amazon ML to Engage Users
This is a guest post by Tom Talpir, Software Developer at ironSource. ironSource is as an Advanced AWS Partner Network (APN) Technology Partner and an AWS Big Data Competency Partner. Ever wondered what it takes to keep a user from leaving your game or application after all the hard work you put in? Wouldn’t it be great […]
Powering Amazon Redshift Analytics with Apache Spark and Amazon Machine Learning
Air travel can be stressful due to the many factors that are simply out of airline passengers’ control. As passengers, we want to minimize this stress as much as we can. We can do this by using past data to make predictions about how likely a flight will be delayed based on the time of […]
Serving Real-Time Machine Learning Predictions on Amazon EMR
The typical progression for creating and using a trained model for recommendations falls into two general areas: training the model and hosting the model. Model training has become a well-known standard practice. We want to highlight one of many ways to host those recommendations (for example, see the Analyzing Genomics Data at Scale using R, […]
Derive Insights from IoT in Minutes using AWS IoT, Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight
February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. Ben Snively is a Solutions Architect with AWS Speed and agility are essential with today’s analytics tools. The quicker you can get from idea to first results, the more you can experiment […]
Run Jupyter Notebook and JupyterHub on Amazon EMR
NOTE: Please note that as of EMR 5.14.0, JupyterHub is an officially supported application. We recommend you use the most recent version of EMR if you would like to run JupyterHub on EMR. In addition, EMR Notebooks allow you to create and open Jupyter notebooks with the Amazon EMR console. We will not provide any […]
Respond to State Changes on Amazon EMR Clusters with Amazon CloudWatch Events
Jonathan Fritz is a Senior Product Manager for Amazon EMR Customers can take advantage of the Amazon EMR API to create and terminate EMR clusters, scale clusters using Auto Scaling or manual resizing, and submit and run Apache Spark, Apache Hive, or Apache Pig workloads. These decisions are often triggered from cluster state-related information. Previously, […]
Building an Event-Based Analytics Pipeline for Amazon Game Studios’ Breakaway
All software developers strive to build products that are functional, robust, and bug-free, but video game developers have an extra challenge: they must also create a product that entertains. When designing a game, developers must consider how the various elements—such as characters, story, environment, and mechanics—will fit together and, more importantly, how players will interact […]
Using SaltStack to Run Commands in Parallel on Amazon EMR
Miguel Tormo is a Big Data Support Engineer in AWS Premium Support Amazon EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data across dynamically scalable Amazon EC2 instances. Amazon EMR defines three types of nodes: master node, core nodes, and task nodes. It’s common to […]
Joining and Enriching Streaming Data on Amazon Kinesis
Are you trying to move away from a batch-based ETL pipeline? You might do this, for example, to get real-time insights into your streaming data, such as clickstream, financial transactions, sensor data, customer interactions, and so on. If so, it’s possible that as soon as you get down to requirements, you realize your streaming data […]









