AWS for M&E Blog
Streamlining content compliance: Automating media analysis with Amazon Nova
Transform content compliance from a bottleneck to a competitive advantage. Learn how media organizations can reduce content review time using Amazon Bedrock, and Amazon Nova foundation models. Automate the analysis of content across multiple rating systems while maintaining accuracy and reducing costs.
Introduction
Every minute counts in today’s competitive streaming landscape. As content libraries expand and international markets multiply, the challenge of ensuring content compliance has grown. Content teams face:
- Escalating volumes of content
- Increasingly complex and disparate rating systems across regions
- Pressing need to maintain accuracy while accelerating time-to-market
Compliance officers spend countless hours reviewing content frame-by-frame, juggling multiple rating systems, and documenting findings—all while racing against release deadlines. This approach isn’t just time-consuming; it’s expensive, prone to inconsistency, and increasingly unsustainable as content volumes grow.
The recent class of generative AI models includes Amazon Nova, a new generation of state-of-the-art foundation models available exclusively in Amazon Bedrock. Amazon Nova offers understanding models that accept text, images, and video inputs and generate text outputs.
These powerful models can be utilized to revolutionize content compliance workflows by bringing generative AI to content moderation at scale. Using these models, we can:
- Create an intelligent content moderation system that can analyze content
- Automatically flag potential issues with the associated timestamps
- Adapt to multiple rating systems simultaneously
Compliance officers can then focus their expertise on edge cases and nuanced decisions that truly require human judgment.
Through the lens of analyzing music videos, we’ll explore how to build this automated content compliance system, examining the technical architecture, implementation steps, and real-world benefits. This solution offers a blueprint for scaling content compliance, while reducing cost, without sacrificing accuracy or speed.
Understanding the challenge
Content compliance in the media industry has reached a critical inflection point. Some customers manage libraries of hundreds of thousands of hours of content across multiple territories. This scale makes traditional frame-by-frame manual review increasingly impractical.
Adding to this complexity is the proliferation of rating systems and content standards worldwide. Content providers must simultaneously navigate multiple frameworks. Each system has its own nuanced criteria, cultural considerations, and compliance requirements, creating a multi-dimensional challenge for content teams.
Human moderators, despite their expertise, face inherent limitations in this environment. Consistency becomes difficult to maintain across long review sessions, and fatigue can impact judgment, particularly when dealing with sensitive or challenging content. The psychological toll of continuous exposure to potentially disturbing content also raises important concerns about reviewer wellbeing.
The financial implications of maintaining purely manual review processes are equally significant. Beyond the direct labor costs, the time required for comprehensive manual review can delay content releases, impacting revenue opportunities and market competitiveness. For organizations expanding into new territories, scaling up manual review teams for each market becomes prohibitively expensive.
These challenges create a clear imperative for transformation. Media organizations need a solution that can:
- Process high volumes of content efficiently and consistently
- Adapt to multiple rating systems simultaneously
- Support human reviewers rather than replace them
- Reduce operational costs while maintaining or improving accuracy
This is where the combination of Amazon Nova foundation models and Amazon Bedrock offers a compelling path forward, enabling a hybrid approach that amplifies human expertise with generative AI capabilities.
Solution overview
This solution contains the following key components:
- Amazon Bedrock provides access to leading foundation models through a single API, enabling flexible development of generative AI applications.
- Amazon Nova Lite and Amazon Nova Pro are multimodal understanding models. Amazon Nova Lite and Amazon Nova Pro support a 300k context window and can analyze videos, documents, and images, making it ideal for comprehensive content analysis. Amazon Nova models understand the publicly available media taxonomies that are used to generate compliance analysis.
Traditional computer vision models, such as Amazon Rekognition, are used for image classification, label detection, object segmentation, text detection, face recognition, and content moderation. While computer vision models excel at certain tasks, we will instead use foundation models (FMs) for deep image and video understanding. FMs are capable of handling a variety of tasks in a single shot, and allow us to craft multi-use-case prompts that are capable of understanding media compliance taxonomies.
In Step 1, we pass in the music video (video), accompanying lyrics (text document), and a carefully crafted prompt with the necessary ratings and compliance taxonomies (text). Amazon Nova processes the request with low latency, and returns a compliance analysis with confidence scores. We convert the compliance analysis to a standard report, and use it to determine if we need a deeper frame-by-frame analysis (Step 2).
Step 1: Video level analysis using Amazon Nova
In order to achieve higher accuracy for video understanding, both video and audio should be used as context to the model. Amazon Nova multimodal understanding models currently require the audio transcript to be passed in separately from the video. In cases where the audio transcript is not available in text format, Amazon Transcribe may be used to provide the transcription. Amazon Transcribe is a fully managed automatic speech recognition service that provides high accuracy transcriptions.
In the following example prompt, we use the Amazon Bedrock Converse API to pass the video and transcript to Amazon Nova. This is to assess regional compliance against the multiple rating systems. We will also be requesting a general content moderation analysis and executive summary:
Analyze the attached video and lyrics. Provide a detailed analysis including:
Content Rating Analysis:
1. Region A Rating Assessment
- Suggested rating (Adult 18+, Teen 13+, Child 7+, General)
- Confidence score (0-100%)
- Justification for rating
- Content elements influencing rating
2. Region B Rating Assessment
- Suggested rating (Mature, Youth, Family, All Ages)
- Confidence score (0-100%)
- Key Classification issues
- Content advisories
3. Region C Rating Assessment
- Suggested rating (Restricted, Teen+, Child+, Universal)
- Confidence score (0-100%)
- Content warnings required
- Justification for rating
4. Content Moderation Flags
- Violence
- Adult Content
- Hate Speech
- Self-Harm
- Profanity
- Drug Reference
- Alcohol Reference
(Include confidence score for each flag)
Content Analysis:
5. Theme and Messaging
6. Visual Content Assessment
7. Lyrical Content Analysis
8. Target Audience
9. Potentially Sensitive Content Areas
Format the ratings and flags with confidence scores in the response.
Provide a brief executive summary (2-3 sentences) of the overall content analysis and rating recommendation.
An example response can be seen in the following. The response shows that Amazon Nova has performed the assessment and flagged that there are compliance considerations. It has also provided content moderation flags with confidence scores and an executive summary.
Content Rating Analysis:
1. Region A Rating Assessment:
Suggested rating: Adult 18+
Confidence score: 95%
Justification for rating: The video contains profanity, suggestive themes, references to alcohol and drug use, and some suggestive content. The lyrical content includes themes of crime, drinking, and struggle, which are not suitable for younger audiences.
Content elements influencing rating:
- Profanity (Confidence score: 90%)
- References to alcohol and drug use (Confidence score: 75%)
2. Region B Rating Assessment:
Suggested rating: Mature
Confidence score: 90%
Key classification issues: The video includes explicit language, and references to alcohol and drug use, which are not suitable for a younger audience
Content advisories: Viewer discretion advised for mature themes
3. Region C Rating Assessment:
Suggested rating: Teen+
Confidence score: 85%
Key classification issues: The video includes explicit language and references to alcohol and drug use, which may not be suitable for younger teens.
4. Content Moderation Flags:
- Violence: 10%
Depictions of violence in the video and lyrics.
- Adult Content: 60%
References to sexual content and mature themes.
- Hate Speech: 10%
No direct hate speech, but some lyrics may be interpreted as aggressive.
- Self-Harm: 5%
Indirect references to personal struggles and loss.
- Profanity: 80%
Frequent use of explicit language in the lyrics.
- Drug Reference: 75%
References to drug use and dealing in the lyrics.
- Alcohol Reference: 80%
References to partying and nightlife, including drinking alcohol.
Content Analysis:
5. Theme and Messaging: The video revolves around themes of betrayal, blame, and the consequences of reckless behavior. The lyrics suggest a narrative of mutual involvement in wrongdoing and the fallout from it.
6. Visual Content Assessment: The visuals are dark and moody, with flashing lights and silhouettes that enhance the dramatic and intense atmosphere of the content.
7. Lyrical Content Analysis: The lyrics contain strong language, references to alcohol and drug use, and suggest a narrative of mutual blame and involvement in wrongdoing.
8. Target Audience: The target audience is likely adults, given the mature themes and language. The content may not be suitable for younger viewers.
9. Potentially Sensitive Content Areas:
- References to drug use and alcohol consumption.
- Strong language.
- Suggestive themes.
Executive Summary:
The video contains strong language, references to drug and alcohol use, and suggestive themes. It carries a medium risk for brand safety and requires appropriate warnings and restrictions for broadcast. The content is likely to cause offense and is best suited for an adult audience.
The results of the analysis can be output in JSON format, which can then be parsed to create a compliance report for use within your organization.
Following is an example compliance report created by parsing the JSON file:
If any of the flags exceed your thresholds, the detailed frame analysis can be triggered as a secondary step.
Step 2: Frame level analysis using Amazon Nova
This stage is only invoked if needed, to further explore any compliance issues discovered in the previous stage. With each frame analyzed, we can generate a timeline analysis visualization, which shows a time series and detailed analysis of specific compliance event.
Frame-by-frame analysis follows two steps:
-
- Extract frames from the video at a given frequency. For example, by using AWS Elemental MediaConvert to extract one frame per second.
- For each frame extracted:
- Skip the frame if it is too similar to the previous frame. One way to assess this would be using perceptual hashing between frames—reducing costs.
- Analyze frames for moderation flags.
Use the following prompt, along with passing the frame image to Amazon Nova:
You are an expert content moderator. Analyze the image for:
- Violence or graphic content
- Adult or inappropriate content
- Hate or offensive symbols
- Self-harm indicators
- Drug or alcohol references
Provide detailed analysis with confidence scores.
The timestamp of any moderation issues can be calculated by knowing the rate of extraction of frames and how many frames into the sequence of processing we are. Flags and timestamps can be collated to generate a timeline analysis visualization, which shows a time series of specific compliance events.
Following (Figure 5) is an example compliance report created by parsing the timeline analysis JSON file.
Benefits and results
Music videos pose unique challenges for content review due to their fast-paced nature, multiple themes, and complex visual elements. Music videos are fast-paced with frequent shot changes, they may present multiple themes in a short period of time, and are visually powerful. These elements make human review of content more error prone. We tested the visual understanding capabilities of Amazon Nova Lite and Amazon Nova Pro with music videos across regional ratings. There are many variables that contributed to our output, and this analysis is meant to be directional and not definitive. We encourage you to run your own tests on your specific content. The following section describes observations from our tests.
Key Observations:
-
-
- Both Amazon Nova models demonstrated caution, rating categories conservatively
- Amazon Nova Pro showed higher accuracy in detecting nuanced content elements
- Amazon Nova Lite proved sufficient for standard cases with greater than 83 percent accuracy
- Amazon Nova Lite was sufficient in many test cases, though there were cases where Amazon Nova Pro provided more detailed descriptions of scenes (without being prompted to describe these aspects). Consider using Amazon Nova Pro when more detail is required for:
- Facial expressions
- Advanced plot or theme analysis
- Spatial understanding
- Specific location of objects
- A human reviewer neglected to identify elements such as:
- Fast flashing lights
- Certain self-harm themes
- Partial nudity in the background due to strong blurring in out-of-focus areas
- Suggestive content displayed on body tattoos
-
Best practices and considerations
When implementing content compliance automation with Amazon Nova models, organizations should consider both cost and performance optimizations. Our testing revealed that analyzing a typical four-minute music video using Amazon Nova Lite consumed 76,497 input tokens and 496 output tokens, resulting in a cost of $0.0047 based on current pricing. This three-stage approach (content preparation, video analysis, and frame analysis) enables independent scaling and targeted problem-solving without disrupting the entire workflow.
For maximum effectiveness, prompts should be structured hierarchically with clear categories and specific examples, with media content presented before analysis instructions. For more prompting best practices, refer to the Prompting best practices for Amazon Nova understanding models.
Organizations should leverage Amazon Bedrock Prompt Management for version control and updates. Implementing human-in-the-loop workflows for content within specific confidence thresholds is recommended. You should also include feedback mechanisms that allow moderators to correct AI decisions and build valuable datasets for future improvements. Regular performance reviews and automated metrics should inform prompt refinements, enabling fast turnaround updates independent from the application.
By following these best practices, organizations can optimize their content compliance automation, ensuring efficient, accurate, and cost-effective operations while maintaining the flexibility to adapt to changing requirements.
Conclusion
Leveraging Amazon Nova foundation models and Amazon Bedrock for content compliance workflows represents a paradigm shift for media organizations. This generative AI-powered approach not only addresses the immediate challenges of scale, complexity, and cost but also positions content teams to thrive in an increasingly global and content-rich landscape.
Organizations can expect faster content review, consistent ratings, and significant cost savings. Begin by assessing your current workflow and implementing a proof of concept before scaling to full deployment.
Ready to transform your content compliance workflow? Contact an AWS Representative to know how we can help accelerate your business.
Visit the following links to learn more about additional media and entertainment industry use cases: