EMR Studio is an integrated development environment (IDE) that makes it easy for data scientists and data engineers to develop, visualize, and debug data engineering and data science applications written in R, Python, Scala, and PySpark.
It is a fully managed application with single sign-on, fully managed Jupyter Notebooks, automated infrastructure provisioning, and the ability to debug jobs without logging in to the AWS Management Console or the cluster. Data scientists and analysts can install custom kernels and libraries, collaborate with peers using code repositories such as GitHub and Bitbucket, or execute parameterized notebooks as part of scheduled workflows using orchestration services such as Apache Airflow, AWS Step Functions, and Amazon Managed Workflows for Apache Airflow (Amazon MWAA). To learn more, read "Orchestrating analytics jobs on Amazon EMR Notebooks using Amazon MWAA". EMR Studio kernels and applications run on EMR clusters, so you get the benefit of distributed data processing with the performance-optimized Amazon EMR runtime for Apache Spark. Administrators can set up EMR Studio so that analysts run their applications on existing EMR clusters or create new clusters from predefined AWS CloudFormation templates for EMR.
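
As a rough illustration of executing a parameterized notebook from an orchestration task, the sketch below builds a request for the boto3 EMR client's start_notebook_execution call. It is a minimal sketch, not a definitive implementation: the helper names, the Workspace ID, notebook path, cluster ID, role name, and parameter names are all hypothetical placeholders, and it assumes the notebook reads its inputs from the JSON passed in NotebookParams.

```python
import json


def build_notebook_execution_request(editor_id, relative_path, cluster_id,
                                     service_role, params,
                                     name="scheduled-run"):
    """Build keyword arguments for the EMR start_notebook_execution call.

    All argument values are supplied by the caller; nothing here is
    specific to a real account.
    """
    return {
        "EditorId": editor_id,                 # EMR Studio Workspace ID
        "RelativePath": relative_path,         # notebook file within the Workspace
        "NotebookExecutionName": name,
        "NotebookParams": json.dumps(params),  # parameters are passed as a JSON string
        "ExecutionEngine": {"Id": cluster_id, "Type": "EMR"},
        "ServiceRole": service_role,
    }


def start_execution(request, region="us-east-1"):
    """Submit the request; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request builder works without boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.start_notebook_execution(**request)
    return response["NotebookExecutionId"]


# Hypothetical placeholder values, for illustration only.
request = build_notebook_execution_request(
    editor_id="e-EXAMPLEWORKSPACEID",
    relative_path="daily_report.ipynb",
    cluster_id="j-EXAMPLECLUSTERID",
    service_role="EMR_Notebooks_DefaultRole",
    params={"run_date": "2021-01-01", "region": "emea"},
)
```

An Airflow task or a Step Functions state could call start_execution(request) on a schedule and then poll the returned execution ID for completion; that polling logic is left out here.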