You can use Amazon Managed Service for Apache Flink for many use cases to process data continuously, getting insights in seconds or minutes rather than waiting days or even weeks. Amazon Managed Service for Apache Flink enables you to quickly build end-to-end stream processing applications for log analytics, clickstream analytics, Internet of Things (IoT), ad tech, gaming, and more. The four most common use cases are streaming extract-transform-load (ETL), continuous metric generation, responsive real-time analytics, and interactive analysis of data streams.
Streaming ETL
With streaming ETL applications, you can clean, enrich, organize, and transform raw data in real time before loading it into your data lake or data warehouse, reducing or eliminating batch ETL steps. These applications can buffer small records into larger files prior to delivery and perform sophisticated joins across streams and tables. For example, you can build an application that continuously reads IoT sensor data stored in Amazon Managed Streaming for Apache Kafka (Amazon MSK), organizes the data by sensor type, removes duplicate data, normalizes the data per a specified schema, and then delivers the data to Amazon Simple Storage Service (Amazon S3).
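The deduplicate, normalize, and organize steps in the IoT example can be sketched in plain Python. This is only an illustration of the logic, not Apache Flink code or an AWS API. The field names (`sensor_id`, `sensor_type`, `ts`, `value`) and the choice of `(sensor_id, ts)` as the duplicate key are assumptions made for the example.

```python
from collections import defaultdict


def normalize(record):
    """Coerce a raw reading to a fixed schema (field names are hypothetical)."""
    return {
        "sensor_id": str(record["sensor_id"]),
        "sensor_type": str(record["sensor_type"]).lower(),
        "ts": int(record["ts"]),
        "value": float(record["value"]),
    }


def etl(records):
    """Normalize each record, drop duplicates, and group by sensor type."""
    seen = set()                 # keys of records already emitted
    by_type = defaultdict(list)  # sensor_type -> list of clean records
    for raw in records:
        rec = normalize(raw)
        key = (rec["sensor_id"], rec["ts"])  # assumed uniqueness key
        if key in seen:
            continue  # duplicate reading, skip it
        seen.add(key)
        by_type[rec["sensor_type"]].append(rec)
    return dict(by_type)


raw = [
    {"sensor_id": 1, "sensor_type": "Temp", "ts": "100", "value": "21.5"},
    {"sensor_id": 1, "sensor_type": "temp", "ts": 100, "value": 21.5},
    {"sensor_id": 2, "sensor_type": "HUMIDITY", "ts": 101, "value": "40"},
]
clean = etl(raw)
```

In a real streaming application the `seen` set would need to be bounded (for example, expired by time), because an unbounded stream would otherwise grow the state indefinitely.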
Continuous metric generation
With continuous metric generation applications, you can monitor and understand how your data is trending over time. Your applications can aggregate streaming data into critical information and seamlessly integrate it with reporting databases and monitoring services to serve your applications and users in real time. With Amazon Managed Service for Apache Flink, you can use Apache Flink code (in Java, Scala, Python, or SQL) to continuously generate time series analytics over time windows. For example, you can build a live leaderboard for a mobile game by computing the top players every minute and then sending the results to Amazon DynamoDB. You can also track the traffic to your website by calculating the number of unique website visitors every 5 minutes and then sending the processed results to Amazon Redshift.
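Both examples above are aggregations over fixed, non-overlapping time windows (often called tumbling windows). The following plain-Python sketch shows that windowing logic for a one-minute leaderboard and a five-minute unique-visitor count. It does not use the Flink API, and the event shapes, `(ts_seconds, player, points)` and `(ts_seconds, visitor)`, are assumptions made for the example.

```python
from collections import defaultdict


def window_start(ts, size):
    """Return the start of the fixed-size window that contains ts."""
    return ts - (ts % size)


def top_players(events, size=60, n=3):
    """Sum points per player in each window and keep the top n."""
    totals = defaultdict(lambda: defaultdict(int))
    for ts, player, points in events:
        totals[window_start(ts, size)][player] += points
    return {
        # Highest score first; ties broken by player name.
        w: sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        for w, scores in sorted(totals.items())
    }


def unique_visitors(events, size=300):
    """Count distinct visitors in each window."""
    visitors = defaultdict(set)
    for ts, visitor in events:
        visitors[window_start(ts, size)].add(visitor)
    return {w: len(v) for w, v in sorted(visitors.items())}


scores = [(0, "ann", 10), (30, "bob", 25), (59, "ann", 20), (60, "cat", 5)]
leaderboard = top_players(scores, n=2)

visits = [(0, "a"), (10, "b"), (299, "a"), (300, "a")]
visitor_counts = unique_visitors(visits)
```

This batch-style version sees all events at once. A streaming engine instead emits each window's result when the window closes and then discards that window's state.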
Responsive real-time analytics
Responsive real-time analytics applications send real-time alarms or notifications when certain metrics reach predefined thresholds or, in more advanced cases, when your application detects anomalies using machine learning (ML) algorithms. With these applications, you can respond immediately to changes in your business, for example by predicting user abandonment in mobile apps or identifying degraded systems. For example, an application can compute the availability or success rate of a customer-facing API over time and then send results to Amazon CloudWatch. You can build another application to look for events that meet certain criteria, and then automatically notify the right customers using Amazon Kinesis Data Streams and Amazon Simple Notification Service (Amazon SNS).
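The threshold-based case can be sketched as a per-window success rate followed by a comparison against a limit. This is a plain-Python illustration, not Flink, CloudWatch, or Amazon SNS code. The `(ts_seconds, status_code)` event shape, the rule that any status below 500 counts as a success, and the 0.99 threshold are all assumptions made for the example.

```python
from collections import defaultdict


def success_rates(requests, size=60):
    """Compute the fraction of successful requests in each fixed window."""
    counts = defaultdict(lambda: [0, 0])  # window start -> [ok, total]
    for ts, status in requests:
        bucket = counts[ts - ts % size]
        if status < 500:  # assumed definition of "success"
            bucket[0] += 1
        bucket[1] += 1
    return {w: ok / total for w, (ok, total) in sorted(counts.items())}


def alerts(rates, threshold=0.99):
    """Return the windows whose success rate fell below the threshold."""
    return [w for w, rate in rates.items() if rate < threshold]


requests = [(0, 200), (10, 200), (20, 500), (30, 200), (60, 200), (70, 200)]
rates = success_rates(requests)
breached = alerts(rates)
```

In a deployed application, each breached window would trigger a notification through a messaging or monitoring service rather than being collected in a list.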
Interactive analysis of data streams
Interactive analysis lets you explore streaming data in real time. With ad hoc queries or programs, you can inspect streams from Amazon MSK or Amazon Kinesis Data Streams and visualize what the data looks like within those streams. For example, you can view how a real-time metric that computes the average over a time window behaves and send the aggregated data to a destination of your choice. Interactive analysis also helps with iterative development of stream processing applications. The queries you build continuously update as new data arrives. With Amazon Managed Service for Apache Flink Studio, you can deploy these queries to run continuously with auto scaling and durable state backups enabled.
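The idea of a windowed average that updates as each new record arrives can be sketched with a Python generator. This is an illustration of the concept only, not Studio notebook or Flink SQL code. It assumes readings arrive in timestamp order as `(ts_seconds, value)` pairs, and it uses a sliding window that covers the most recent `window` seconds.

```python
from collections import deque


def sliding_average(readings, window=60):
    """Yield (ts, average) over the last `window` seconds after each reading."""
    buf = deque()  # readings currently inside the window
    total = 0.0    # running sum of the values in buf
    for ts, value in readings:
        buf.append((ts, value))
        total += value
        # Evict readings that are `window` seconds old or older.
        while buf[0][0] <= ts - window:
            _, old = buf.popleft()
            total -= old
        yield ts, total / len(buf)


readings = [(0, 10.0), (30, 20.0), (60, 30.0), (90, 60.0)]
averages = list(sliding_average(readings))
```

Because the function is a generator, each result is available as soon as its reading is consumed, which mirrors how an interactive query refreshes as new data arrives.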