Guidance for Media Super Resolution on AWS

Machine learning powers video resolution upscaling

Overview

This Guidance demonstrates how to use generative artificial intelligence (AI) to convert videos from low resolution to high definition. Many media companies have extensive archives of older video content originally encoded in now-outdated lower resolutions, such as standard definition. Modern displays support sharper ultra-high-definition formats such as 4K. However, manually remastering large archives is extremely labor-intensive. You can configure this Guidance to address that challenge: it uses generative AI to magnify low-quality videos and extrapolate missing details, increasing the resolution. This prepares even grainy, dated footage for the high-resolution screens and 4K television standards that consumers now expect.

How it works

These technical details feature an architecture diagram to illustrate how to effectively use this solution. The architecture diagram shows the key components and their interactions, providing an overview of the architecture's structure and functionality step-by-step.

Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

This Guidance supports your operational needs through end-to-end visibility and scalable automation across the video upscaling pipeline. For example, Amazon CloudWatch metrics and DynamoDB task tracking provide process oversight to monitor performance, help you identify issues, and troubleshoot root causes. Lambda facilitates automated, zero-downtime deployments, so infrastructure updates happen without manual overhead. Additionally, ParallelCluster automates consistent infrastructure provisioning across environments using infrastructure as code (IaC) for simplified change control. Finally, Amazon Elastic Container Registry (Amazon ECR) centralizes machine learning (ML) model containers for large-scale inference deployments using a unified script.
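As an illustration of the DynamoDB task tracking mentioned above, the following is a minimal Python sketch, not the Guidance's actual code. It builds the parameters for a DynamoDB UpdateItem request that advances a task to its next pipeline stage only if the stored status still matches the expected one. The table name, the `task_id` key, and the status values are hypothetical and chosen only for this example.

```python
from typing import Any, Dict

# Hypothetical table name, key attribute, and status values (assumptions,
# not taken from the Guidance).
TABLE_NAME = "upscaling-tasks"
TRANSITIONS = {
    "SUBMITTED": "EXTRACTING",
    "EXTRACTING": "UPSCALING",
    "UPSCALING": "ENCODING",
    "ENCODING": "COMPLETE",
}


def build_status_update(task_id: str, current: str) -> Dict[str, Any]:
    """Build UpdateItem parameters that move a task to its next status.

    The condition expression makes the write fail if another worker has
    already changed the status, so each transition is applied at most once.
    """
    if current not in TRANSITIONS:
        raise ValueError(f"no transition defined from status {current!r}")
    return {
        "TableName": TABLE_NAME,
        "Key": {"task_id": {"S": task_id}},
        # "status" is a DynamoDB reserved word, so it is aliased as #s.
        "UpdateExpression": "SET #s = :next",
        "ConditionExpression": "#s = :current",
        "ExpressionAttributeNames": {"#s": "status"},
        "ExpressionAttributeValues": {
            ":next": {"S": TRANSITIONS[current]},
            ":current": {"S": current},
        },
    }
```

With AWS credentials configured, the resulting dictionary could be passed to `boto3.client("dynamodb").update_item(**params)`. The sketch stops short of making the call so that it runs without an AWS account.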

Read the Operational Excellence whitepaper 

Robust security protections span the entire workflow while simplifying access control. Input and output Amazon S3 buckets use presigned URLs or AWS Identity and Access Management (IAM) roles to grant temporary access, keeping data locked down. Authenticated users connect through Amazon Cognito to the private ALB. ParallelCluster seals ML inferencing within an isolated VPC, reachable only through Systems Manager. Also, administrators can restrict access to the FSx for Lustre file system with VPC security groups, and the data can be encrypted through AWS KMS. Lastly, AWS CloudTrail centralizes activity logging for audit visibility.

Read the Security whitepaper 

The serverless frontend, built on Elastic Load Balancing (ELB), Fargate, and DynamoDB, provides high availability within a Region. In addition, you can deploy the transcoding and ML compute nodes across multiple Availability Zones (AZs) in a highly available manner. You can also deploy the Slurm controllers in a primary or secondary model across multiple AZs for resilience against failures. The FSx for Lustre file system stores data in cost-optimized storage for short-term, process-heavy workloads, such as transcoding. However, the source and final material are stored in Amazon S3 for high durability.

Read the Reliability whitepaper 

This Guidance achieves scalable performance efficiency by using ParallelCluster, which auto-scales GPU resources to match dynamic batch processing needs, avoiding overprovisioning costs. Just-in-time job placement further optimizes infrastructure utilization by assigning video workflows across the heterogeneous cluster. Lambda functions scale the invocation count for video frame extractions, while provisioned concurrency keeps response latency low. Finally, a shared FSx for Lustre file system across the cluster provides low-latency read and write access to individual video frames.

Read the Performance Efficiency whitepaper 

The event-driven architecture behind this Guidance means that compute and network resources are consumed only when needed. Additionally, Lambda bills per millisecond of processing time, while ParallelCluster compute nodes can be configured to reduce capacity to zero when there are no jobs in the queue, saving you compute costs.
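The scale-to-zero behavior described above can be pictured with a small Python sketch. It is an illustration of the idea only and not ParallelCluster's actual scaling logic: the function name, its parameters, and the jobs-per-node assumption are all hypothetical. It clamps the number of compute nodes between a minimum (zero by default) and a maximum, based on the number of pending jobs.

```python
import math


def desired_node_count(pending_jobs: int, jobs_per_node: int,
                       max_nodes: int, min_nodes: int = 0) -> int:
    """Return how many compute nodes to keep running for the current queue.

    With min_nodes left at 0, an empty queue yields zero nodes, so no
    compute capacity is paid for while the cluster is idle.
    """
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    if pending_jobs < 0:
        raise ValueError("pending_jobs must not be negative")
    needed = math.ceil(pending_jobs / jobs_per_node)
    # Clamp the requirement to the configured bounds.
    return max(min_nodes, min(needed, max_nodes))


if __name__ == "__main__":
    for queue_depth in (0, 9, 100):
        print(queue_depth, desired_node_count(queue_depth, jobs_per_node=4, max_nodes=10))
```

In a ParallelCluster configuration, the equivalent bounds are set per compute resource (for example, a minimum count of zero), and the scheduler handles the scaling itself.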

Read the Cost Optimization whitepaper 

Achieving sustainability requires optimizing resources and infrastructure. The ParallelCluster Slurm scheduler enables intelligent job placement matching specific workload needs. Video extraction and transcoding tasks are efficiently assigned to cost-effective CPU nodes, while GPU fleets are reserved just for compute-intensive upscaling. This minimizes over-provisioned resources, speeds time to output, and lowers energy demands.
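The job placement described above can be sketched in Python as follows. This is a hedged illustration rather than the Guidance's implementation: the partition names (`cpu-queue`, `gpu-queue`) and the task type names are hypothetical, and a real cluster defines its own queues. The function builds an `sbatch` argument list that sends frame extraction and transcoding to a CPU partition and reserves a GPU only for upscaling.

```python
from typing import List

# Hypothetical partition names and task types (assumptions for this sketch).
CPU_PARTITION = "cpu-queue"
GPU_PARTITION = "gpu-queue"
GPU_TASKS = {"upscale"}
CPU_TASKS = {"extract_frames", "transcode"}


def build_sbatch_command(task_type: str, script_path: str, job_name: str) -> List[str]:
    """Return an sbatch argument list targeting the partition suited to the task."""
    command = ["sbatch", f"--job-name={job_name}"]
    if task_type in GPU_TASKS:
        # Request one GPU through Slurm's generic resource (GRES) option.
        command += [f"--partition={GPU_PARTITION}", "--gres=gpu:1"]
    elif task_type in CPU_TASKS:
        command += [f"--partition={CPU_PARTITION}"]
    else:
        raise ValueError(f"unknown task type: {task_type!r}")
    command.append(script_path)
    return command


if __name__ == "__main__":
    print(" ".join(build_sbatch_command("transcode", "transcode.sh", "clip-001")))
    print(" ".join(build_sbatch_command("upscale", "upscale.sh", "clip-001")))
```

On a cluster head node, the returned list could be handed to `subprocess.run`. Keeping the command construction separate from execution makes the routing rule easy to test without a Slurm installation.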

Read the Sustainability whitepaper 

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.