AWS for M&E Blog
Automated highlights and short-form solution with DIGIBASE
This blog post was co-authored by Thomas Kramer, DIGIBASE.
As video consumption habits evolve, audiences now engage with content across various platforms and formats, especially on social media. To reach more viewers, media companies and content owners are providing near real-time highlights of live events on social platforms to maximize advertising revenue, as well as preparing short-form videos to drive traffic to their main platforms. This requires the ability to quickly generate multiple versions from a single video.
However, manually reviewing and editing highlights is inefficient and costly—especially as the amount of content grows. Increasing video volume often means hiring more staff, which can become a significant burden for companies.
In this post, we share a high-level overview of a solution that Amazon Web Services (AWS) and AWS Partner DIGIBASE built to address this challenge. It is a natural language-based solution that automates short-form video creation. By leveraging AWS services (Amazon Transcribe, Amazon Rekognition, and Amazon Bedrock), this solution offers a customizable platform. It can flexibly address the distinct needs of customers who want to create highlights and short-form videos from any video footage, at scale, using generative AI technology.
Customer considerations
While repetitive tasks and content reviews can be delegated to AI, editors still need to decide when to instruct the AI or have it repeat certain tasks to ensure content quality.
Additionally, since each company has different requirements, the solution allows specific features to be customized through pre-service consultations. For example, when videos are submitted, the backend can automatically remove L-bar advertisements or exclude videos containing certain keywords from the final output.
Solution overview
The process for using the solution is straightforward for users because the backend operates asynchronously. Figures 1-3 illustrate the basic workflow and usage of the solution. The UI and additional features are customizable.
Explanation of each step
- Source Content: Select the video and language to process.
- Configuration Settings: Choose the desired video output (for example, 9:16 ratio, 1080p-720p).
- Prompt Settings: Select a saved default prompt type in the UI. These prompts are managed separately for each team or user through Amazon Bedrock Prompt Management. Non-technical users can click preconfigured buttons (such as Action, Dialogue, or Scenes), and any user can edit the prompt directly.
- Job Submission: Submit the job. An AWS Step Functions workflow sends a notification when processing is complete, and users can then access the results.
- Resubmission: To re-generate content from existing pre-generated content, users can click the Submit Again button. This allows the solution to run only the essential steps (such as Amazon Bedrock and AWS Elemental MediaConvert) while skipping unnecessary ones, reducing costs and accelerating processing time. A sketch of a job submission follows this list.
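As a minimal sketch, a job submission through the solution's REST API could look like the following Python call. The endpoint, field names, and payload shape here are assumptions for illustration; the actual API contract is defined per customer during pre-service consultation.

```python
import requests

# Hypothetical endpoint and payload for illustration; the actual REST API
# contract is defined per customer during pre-service consultation.
API_URL = "https://api.example.com/v1/jobs"

payload = {
    "source": "s3://example-bucket/match-footage.mp4",  # source video location
    "language": "en-US",                                # spoken language of the audio
    "output": {"aspectRatio": "9:16", "resolution": "1080p"},
    "promptId": "action-highlights",  # saved prompt in Amazon Bedrock Prompt Management
    "reuseAnalysis": True,  # "Submit Again": reuse prior analysis and re-run only
                            # the Amazon Bedrock and MediaConvert steps
}

response = requests.post(API_URL, json=payload, timeout=30)
response.raise_for_status()
print(response.json()["jobId"])  # job ID to correlate the completion notification
```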
Technical overview
The overall concept of the solution is one-source, multi-use. Customers can quickly and repeatedly generate content from a single source while optimizing infrastructure costs. Users interact with the solution through a graphical interface (see Figures 1-3) or a REST API. When a user submits a job, the solution divides processing into multiple independent tasks using a modular design.
Once a job is submitted, the backend handles job scheduling, queue management, and initiates an AWS Step Functions workflow called a state machine. Each stage leverages different managed services such as AWS Elemental MediaConvert, Amazon Transcribe, or Amazon Bedrock.
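For illustration, the backend's hand-off to AWS Step Functions could resemble this minimal boto3 sketch. The state machine ARN and input fields are placeholders, not the solution's actual schema.

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# One execution is started per submitted job; the ARN and input fields
# below are placeholders for illustration.
execution = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:shorts-pipeline",
    name="job-2025-0001",  # optional; must be unique within the state machine
    input=json.dumps({
        "sourceUri": "s3://example-bucket/match-footage.mp4",
        "language": "en-US",
        "outputSpec": {"aspectRatio": "9:16", "resolution": "1080p"},
    }),
)
print(execution["executionArn"])  # used to track the asynchronous run
```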
Event-driven notifications decouple long-running tasks, minimizing idle compute resources and reducing infrastructure costs. The workflow is fully asynchronous, and users are notified when processing is complete. This architecture supports efficient, high-volume batch processing and delivers a seamless user experience.
State machine workflow
The main advantage of using a state machine in AWS Step Functions is the ability to split a job into smaller, independent tasks that can be executed separately. The workflow defines the initial step and its input parameters. It then passes the output of each task as the input to the next. This workflow can be simple (a linear sequence of tasks) or complex (with parallel execution and branching based on output variables, asynchronous notifications, and automated default or failure states).
Separating tasks into individually executable steps makes the overall solution highly flexible and maintainable. Each step is typically implemented as an AWS Lambda function, so tasks can be updated or versioned independently without affecting the entire application.
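A minimal sketch of one such step follows. The `segments` field and the filtering rule are hypothetical, but the input/output contract is how Step Functions invokes a Lambda task.

```python
# Step Functions invokes the function with the previous state's output as
# `event`; the returned dictionary becomes the next state's input.
def lambda_handler(event, context):
    segments = event["segments"]  # hypothetical field produced upstream

    # Example task logic: drop candidate segments shorter than two seconds.
    kept = [s for s in segments if s["endSec"] - s["startSec"] >= 2.0]

    # Forward the rest of the state unchanged alongside the updated field.
    return {**event, "segments": kept}
```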
At runtime, AWS Step Functions automatically allocates compute resources for one or many parallel workflow executions. Additionally, the AWS Step Functions console offers a graphical view of each execution, making it effortless to monitor and troubleshoot workflows.
Steps in a state machine are typically small, short-running tasks that each perform a specific part of the overall job. If tasks are independent, they can run in parallel branches, which reduces total processing time.
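To make the branching concrete, here is a simplified state machine definition with two parallel branches, expressed as a Python dictionary in Amazon States Language. The state names and Lambda ARNs are placeholders, not the solution's actual workflow.

```python
import json

# Simplified Amazon States Language definition: two analysis branches run
# in parallel, then a final step consumes both results.
definition = {
    "StartAt": "AnalyzeInParallel",
    "States": {
        "AnalyzeInParallel": {
            "Type": "Parallel",
            "Branches": [
                {
                    "StartAt": "VideoAnalysis",
                    "States": {
                        "VideoAnalysis": {
                            "Type": "Task",
                            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:video-analysis",
                            "End": True,
                        }
                    },
                },
                {
                    "StartAt": "AudioAnalysis",
                    "States": {
                        "AudioAnalysis": {
                            "Type": "Task",
                            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:audio-analysis",
                            "End": True,
                        }
                    },
                },
            ],
            "Next": "FinalizeSegments",  # receives both branch outputs as an array
        },
        "FinalizeSegments": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:finalize-segments",
            "End": True,
        },
    },
}
print(json.dumps(definition, indent=2))
```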
For longer-running activities (such as video encoding or audio transcription), Step Functions tasks submit jobs to the appropriate managed service and immediately return, helping to reduce compute costs.
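A minimal sketch of this submit-and-return pattern, assuming a MediaConvert encode step wired up with the Step Functions task token callback, follows. The ARNs, endpoint, and event payload fields are placeholders.

```python
import boto3

# Placeholder ARNs and an account-specific MediaConvert endpoint, for
# illustration of the submit-and-return callback pattern.
mediaconvert = boto3.client(
    "mediaconvert",
    endpoint_url="https://abcd1234.mediaconvert.us-east-1.amazonaws.com",
)
sfn = boto3.client("stepfunctions")

def submit_encode(event, context):
    """Task invoked with .waitForTaskToken: submit the job, then return."""
    mediaconvert.create_job(
        Role="arn:aws:iam::123456789012:role/MediaConvertRole",
        Settings=event["jobSettings"],  # encoding settings prepared upstream
        UserMetadata={"taskToken": event["taskToken"]},  # echoed in job events
    )
    # Returning immediately frees the Lambda; the workflow pauses on the token.

def on_job_state_change(event, context):
    """EventBridge target for MediaConvert COMPLETE/ERROR events."""
    detail = event["detail"]
    token = detail["userMetadata"]["taskToken"]
    if detail["status"] == "COMPLETE":
        sfn.send_task_success(taskToken=token, output='{"status": "COMPLETE"}')
    else:
        sfn.send_task_failure(taskToken=token, error="EncodeFailed")
```

Figure 6 illustrates part of the workflow for processing a video using the AWS Step Functions workflow.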
The AWS Step Functions workflow begins with AWS Elemental MediaConvert as the entry point.
From there, the process splits into two main parallel branches (a sketch of these calls follows the list):
- Video analysis path
  - Amazon Rekognition starts segment detection to detect shots and technical cues.
  - TwelveLabs Pegasus 1.2 on Amazon Bedrock runs for advanced video understanding.
  - The results go through a video processing step for refinement or transformation.
- Audio analysis path
  - Amazon Transcribe converts the video's audio into text.
  - The transcribed text is processed by Anthropic's Claude 3.5 Sonnet for further analysis.
After processing, the data is sent to Claude 3.7 Sonnet to finalize the segments. The finalized content is then returned to AWS Elemental MediaConvert for final encoding and delivery.
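For illustration, the calls that kick off these analysis branches could look like the following sketch. The bucket, job name, prompt text, and model ID are assumptions; in the solution, prompts come from Amazon Bedrock Prompt Management.

```python
import json
import boto3

rekognition = boto3.client("rekognition")
transcribe = boto3.client("transcribe")
bedrock = boto3.client("bedrock-runtime")

# Video branch: asynchronous shot and technical-cue detection.
rekognition.start_segment_detection(
    Video={"S3Object": {"Bucket": "example-bucket", "Name": "match-footage.mp4"}},
    SegmentTypes=["SHOT", "TECHNICAL_CUE"],
)

# Audio branch: asynchronous transcription of the source audio.
transcribe.start_transcription_job(
    TranscriptionJobName="match-footage-transcript",
    Media={"MediaFileUri": "s3://example-bucket/match-footage.mp4"},
    LanguageCode="en-US",
)

# Once the transcript is available, ask Claude for candidate highlights.
transcript_text = "..."  # fetched from the completed Transcribe job
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # example model ID
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": "List start/end timecodes of the most engaging "
                       "moments in this transcript:\n" + transcript_text,
        }],
    }),
)
print(json.loads(response["body"].read())["content"][0]["text"])
```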
Expected outcome
In many industries, customers have different needs, but some key features are important to anyone creating short-form videos. This solution delivers these essential features alongside the short-form video content itself.
Following are some examples of customized features the solution provides, which are applied to the Video Processing and Framing (customizable) instances shown in Figure 6:
- Thumbnails at specific timestamps
- Timecode delivery for particular scenes or situations
- Machine learning-based ad removal (linear, L-bar)
- Selectable transition effects
- Subtitle formats
- Batch jobs option
These features are often complicated and time-consuming to develop on your own, but the solution makes them straightforward to access and manage. As one example, the sketch below shows how thumbnails at specific timestamps could be produced.
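This is a minimal sketch using a MediaConvert frame-capture job; the endpoint, role, bucket, and timecodes are placeholders, and the solution's actual implementation may differ.

```python
import boto3

mediaconvert = boto3.client(
    "mediaconvert",
    endpoint_url="https://abcd1234.mediaconvert.us-east-1.amazonaws.com",  # account-specific
)

# Clip the input to the moment of interest, then emit a single JPEG frame.
mediaconvert.create_job(
    Role="arn:aws:iam::123456789012:role/MediaConvertRole",
    Settings={
        "Inputs": [{
            "FileInput": "s3://example-bucket/match-footage.mp4",
            "TimecodeSource": "ZEROBASED",  # clip relative to the start of the file
            "InputClippings": [{
                "StartTimecode": "00:12:30:00",  # thumbnail timestamp
                "EndTimecode": "00:12:31:00",
            }],
        }],
        "OutputGroups": [{
            "Name": "Thumbnails",
            "OutputGroupSettings": {
                "Type": "FILE_GROUP_SETTINGS",
                "FileGroupSettings": {"Destination": "s3://example-bucket/thumbnails/"},
            },
            "Outputs": [{
                "ContainerSettings": {"Container": "RAW"},
                "VideoDescription": {
                    "CodecSettings": {
                        "Codec": "FRAME_CAPTURE",
                        "FrameCaptureSettings": {
                            "FramerateNumerator": 1,
                            "FramerateDenominator": 1,
                            "MaxCaptures": 1,  # one frame from the clipped range
                            "Quality": 80,
                        },
                    },
                },
            }],
        }],
    },
)
```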
Summary
Creating short-form videos and highlights in the media industry is often a time-consuming, repetitive task that demands substantial human effort—especially at scale. Many customers have shared that scaling this workflow is a challenge, both in terms of cost and operational efficiency.
To address these challenges, AWS with DIGIBASE built a serverless, AI-powered solution that automates the creation of shorts and highlights—dramatically reducing manual work and enabling faster, more reliable production.
This solution can support media companies that are ready to produce more content, faster and at a lower cost, while maintaining quality and consistency. If you’d like to learn more about how AWS-powered AI can transform your video production workflow, contact an AWS Representative for details.
Further reading
- Optimize costs with tiered pricing in AWS Elemental MediaConvert
- Video semantic search with AI on AWS
- Optimize multimodal search using the TwelveLabs Embed API and Amazon OpenSearch Service
- Streamlining AWS Serverless workflows: From AWS Lambda orchestration to AWS Step Functions
- How to decide between Amazon Rekognition image and video API for video moderation