
Amazon Managed Service for Apache Flink FAQs

General

Companies are ingesting data faster than ever because of the explosive growth of real-time data sources. Whether you are handling log data from mobile and web applications, purchase data from ecommerce platforms, or sensor data from IoT devices, ingesting data in real time helps you learn what your customers, organization, and business are doing right now.

Getting started


Key concepts

Amazon Managed Service for Apache Flink supports applications built using Java, Scala, and Python with the open source Apache Flink libraries and your own custom code. It also supports applications built using Java with the open source Apache Beam libraries and your own custom code. Amazon Managed Service for Apache Flink Studio supports code built using Apache Flink–compatible SQL, Python, and Scala.

Managing applications


Pricing and billing


Building Apache Flink applications


You can start by downloading the open source libraries, including the AWS SDK, Apache Flink, and connectors for AWS services. For instructions on downloading the libraries and creating your first application, see the Amazon Managed Service for Apache Flink Developer Guide.

You write your Apache Flink code using data streams and stream operators. Application data streams are the data structures that your code processes. Data flows continuously from the sources into application data streams. One or more stream operators define your processing on the application data streams, including transform, partition, aggregate, join, and window. Data streams and operators can be connected in serial and parallel chains. A short example using pseudocode is shown below.

// Read raw game events from a Kinesis data stream.
DataStream<GameEvent> rawEvents = env.addSource(
    new KinesisStreamSource("input_events"));

// Map each event to a per-user, per-level record.
DataStream<UserPerLevel> gameStream =
    rawEvents.map(event -> new UserPerLevel(event.gameMetadata.gameId,
        event.gameMetadata.levelId, event.userId));

// Aggregate per game over one-minute tumbling windows.
gameStream.keyBy(event -> event.gameId)
    .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
    .apply(...);

// Write results to an output Kinesis data stream.
gameStream.addSink(new KinesisStreamSink("myGameStateStream"));

Amazon Managed Service for Apache Flink supports all Apache Flink operators, which can be used to solve a wide variety of use cases, including map, KeyBy, aggregations, windows, joins, and more. For example, the map operator lets you perform arbitrary processing, taking one element from an incoming data stream and producing another element. KeyBy logically organizes data using a specified key so that you can process similar data points together. Aggregations perform processing across multiple keys, such as sum, min, and max. Window Join joins two data streams together on a given key and window.
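A windowed join, for example, might look like the following sketch. The Purchase and Click types and their input streams are hypothetical, continuing the gaming example above:

import org.apache.flink.api.common.functions.JoinFunction;

// Hypothetical inputs: DataStream<Purchase> purchases and DataStream<Click> clicks,
// each carrying a userId field.
DataStream<String> joined = purchases
    .join(clicks)
    .where(purchase -> purchase.userId)
    .equalTo(click -> click.userId)
    .window(TumblingProcessingTimeWindows.of(Time.minutes(5)))
    .apply(new JoinFunction<Purchase, Click, String>() {
        @Override
        public String join(Purchase purchase, Click click) {
            return "user " + purchase.userId + " clicked and purchased";
        }
    });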

You can build custom operators if these do not meet your needs. Find more examples in the Operators section of the Amazon Managed Service for Apache Flink Developer Guide. You can find a full list of Apache Flink operators in the Apache Flink documentation.
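As an illustration, a custom stateful operator can be written as a Flink KeyedProcessFunction. The following is a minimal sketch (the EventCounter class and its names are hypothetical) that emits a running event count per key using Flink's managed state:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Hypothetical custom operator: emits a running event count per key.
public class EventCounter
        extends KeyedProcessFunction<String, GameEvent, Tuple2<String, Long>> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        // Managed keyed state, saved through checkpoints.
        count = getRuntimeContext().getState(
            new ValueStateDescriptor<>("event-count", Long.class));
    }

    @Override
    public void processElement(GameEvent event, Context ctx,
            Collector<Tuple2<String, Long>> out) throws Exception {
        Long current = count.value();
        long updated = (current == null) ? 1L : current + 1L;
        count.update(updated);
        out.collect(Tuple2.of(ctx.getCurrentKey(), updated));
    }
}

You would apply it to a keyed stream, for example gameStream.keyBy(event -> event.gameId).process(new EventCounter()).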

You can add a source or destination to your application by building on a set of primitives that enable you to read and write from files, directories, sockets, or anything you can access over the internet. Apache Flink provides these primitives for data sources and data sinks. They come with configurations such as the ability to read and write data continuously or once, asynchronously or synchronously, and much more. For example, you can set up an application to read continuously from Amazon S3 by extending the existing file-based source integration.
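For instance, a continuously monitored Amazon S3 source might look like this sketch, assuming Flink's FileSource from the file connector (the bucket path and polling interval are hypothetical):

import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.core.fs.Path;

// Poll the bucket every 30 seconds for new files instead of reading once.
FileSource<String> s3Source = FileSource
    .forRecordStreamFormat(new TextLineInputFormat(), new Path("s3://my-bucket/events/"))
    .monitorContinuously(Duration.ofSeconds(30))
    .build();

DataStream<String> lines = env.fromSource(
    s3Source, WatermarkStrategy.noWatermarks(), "s3-file-source");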

Yes. Amazon Managed Service for Apache Flink provides your application with 50 GB of running application storage per KPU and scales storage with your application. Running application storage is used to save application state through checkpoints. It is also accessible to your application code as temporary disk for caching data or any other purpose. Amazon Managed Service for Apache Flink can remove data from running application storage that is not saved through checkpoints (for example, data held by operators, sources, and sinks) at any time. All data stored in running application storage is encrypted at rest.
With snapshots, you can create and restore your application to a previous point in time, so you can maintain previous application state and roll back your application whenever you need to. You control how many snapshots you keep at any given time, from zero to thousands. Snapshots use durable application backups, and Amazon Managed Service for Apache Flink charges you based on their size. Amazon Managed Service for Apache Flink encrypts data saved in snapshots by default. You can delete individual snapshots through the API, or delete all snapshots by deleting your application.
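For example, creating an on-demand snapshot through the AWS SDK for Java v2 might look like the following sketch (the application and snapshot names are hypothetical; the underlying service API is still named kinesisanalyticsv2):

import software.amazon.awssdk.services.kinesisanalyticsv2.KinesisAnalyticsV2Client;
import software.amazon.awssdk.services.kinesisanalyticsv2.model.CreateApplicationSnapshotRequest;

// Capture the running application's state so it can be restored later.
try (KinesisAnalyticsV2Client client = KinesisAnalyticsV2Client.create()) {
    client.createApplicationSnapshot(CreateApplicationSnapshotRequest.builder()
        .applicationName("my-flink-app")        // hypothetical application name
        .snapshotName("pre-upgrade-snapshot")   // hypothetical snapshot name
        .build());
}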

Building Amazon Managed Service for Apache Flink Studio applications in a managed notebook


You can start from the Amazon Managed Service for Apache Flink Studio, Amazon Kinesis Data Streams, or Amazon MSK console and, in a few steps, launch a serverless notebook to immediately query data streams and perform interactive data analytics.

Interactive data analytics: You can write code in the notebook in SQL, Python, or Scala to interact with your streaming data, with query response times in seconds. You can use built-in visualizations to explore the data, view real-time insights on your streaming data from within your notebook, and develop stream processing applications powered by Apache Flink.

Once your code is ready to run as a production application, you can transition with a single step to a stream processing application that processes gigabytes of data per second, without servers.

Stream processing application: Once you are ready to promote your code to production, you can build your code by clicking “Deploy as stream processing application” in the notebook interface or issue a single command in the CLI. Studio takes care of all the infrastructure management necessary for you to run your stream processing application at scale, with auto scaling and durable state enabled, just as in an Amazon Managed Service for Apache Flink application.

You can write code in the notebook in your preferred language of SQL, Python, or Scala using Apache Flink’s Table API. The Table API is a high-level abstraction and relational API that supports a superset of SQL’s capabilities. It offers familiar operations, such as select, filter, join, group by, aggregate, and so on, along with stream-specific concepts, such as windowing. You use % to specify the language to be used in a section of the notebook and can switch between languages. Interpreters are Apache Zeppelin plugins, so you can specify a language or data processing engine for each section of the notebook. You can also build user-defined functions and reference them to improve code functionality.

You can perform SQL operations such as the following:

  • Scan and filter (SELECT, WHERE) 
  • Aggregations (GROUP BY, GROUP BY WINDOW, HAVING) 
  • Set (UNION, UNION ALL, INTERSECT, IN, EXISTS)
  • Order (ORDER BY, LIMIT) 
  • Joins (INNER, OUTER, Timed Window – BETWEEN, AND, Joining with Temporal Tables – tables that track changes over time)
  • Top-N
  • Deduplication
  • Pattern recognition

Some of these queries, such as GROUP BY, OUTER JOIN, and Top-N, produce continuously updating results on streaming data, which means that the results keep updating as the streaming data is processed. DDL statements, such as CREATE, ALTER, and DROP, are also supported. For a complete list of queries and samples, see the Apache Flink Queries documentation.
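As an illustration, the following sketch runs a windowed GROUP BY through the Table API from Java (the table and column names are hypothetical); in a Studio notebook, the same query could run directly in a SQL paragraph:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

// Results update continuously: each one-minute window emits a fresh count per game.
tEnv.executeSql(
    "SELECT gameId, COUNT(*) AS events " +
    "FROM game_events " +
    "GROUP BY gameId, TUMBLE(event_time, INTERVAL '1' MINUTE)");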

Apache Flink’s Table API supports Python and Scala through language integration, using Python strings and Scala expressions. The supported operations closely mirror the SQL operations, including select, order, group, join, filter, and windowing. A full list of operations and samples is included in our developer guide.
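The same kind of query can be written in the Table API's language-integrated form. It is shown here in Java for consistency with the earlier sketches; the Python and Scala forms are analogous, and the table name is hypothetical:

import static org.apache.flink.table.api.Expressions.$;
import org.apache.flink.table.api.Table;

// Select, filter, group, and aggregate without writing SQL text
// (tEnv is the TableEnvironment created in the previous sketch).
Table activePlayers = tEnv.from("game_events")
    .filter($("level").isGreaterOrEqual(10))
    .groupBy($("gameId"))
    .select($("gameId"), $("userId").count().as("players"));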

You can configure additional integrations with a few more steps and lines of Apache Flink code (Python, Scala, or Java) to define connections with all integrations supported by Apache Flink. These include destinations such as Amazon OpenSearch Service, Amazon ElastiCache for Redis, Amazon Aurora, Amazon Redshift, Amazon DynamoDB, Amazon Keyspaces, and more. You can attach executables for these custom connectors when you create or configure your Amazon Managed Service for Apache Flink Studio application.
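For example, a Kinesis sink table could be defined with a few lines of connector configuration, as in this sketch (the stream name, Region, and schema are hypothetical):

// Register a sink table backed by the Flink Kinesis connector
// (tEnv is the TableEnvironment created in the earlier sketch).
tEnv.executeSql(
    "CREATE TABLE game_state_sink (" +
    "  gameId STRING, " +
    "  players BIGINT" +
    ") WITH (" +
    "  'connector' = 'kinesis', " +
    "  'stream' = 'myGameStateStream', " +
    "  'aws.region' = 'us-east-1', " +
    "  'format' = 'json')");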

Service Level Agreement


You are eligible for an SLA Service Credit for Amazon Managed Service for Apache Flink under the Amazon Managed Service for Apache Flink SLA if more than one Availability Zone in which you are running a task, within the same AWS Region, has a Monthly Uptime Percentage of less than 99.9% during any monthly billing cycle. For full details on all the SLA terms and conditions as well as details on how to submit a claim, visit the Amazon Managed Service for Apache Flink SLA details page.