AWS for Industries

Siemens unlocks their video stock with Amazon Bedrock Data Automation

Siemens, a leading global technology company, provides its customers and users with a comprehensive online information portal. Siemens needs to enable its website users to quickly find relevant products, guides, and assistance. This ultimately drives product sales and customer satisfaction, both crucial for the company's revenue and reputation.

Consider, for example, how Siemens uses Amazon Bedrock to transform its users' search experience: a user can search across many public Siemens sources, from product catalogs to press releases, and receive structured responses or summarized results. As another example, note how context was added to the video experience following Hannover Messe 2025: chapter segmentation with summaries, automatic translations, the names of notable speakers, and automatic multi-language subtitles. To experience it yourself, look at one of the highlight videos on the web experience.

The user experience in both examples depends on the reliable extraction of relevant information from the underlying documents, websites, and catalogs, as well as audio and video recordings. With millions of documents as sources, a structured and flexible extraction workflow is key. To address the information extraction challenge, Siemens uses Amazon Bedrock Data Automation (BDA), a managed service that streamlines and automates generative AI workflows involving documents, images, video, and audio. BDA allows Siemens to create templates of the information it wants to extract from a set of documents and then to run these extractions at scale without the need to manage servers, capacity, or interactions with AI models.

In this blog, we show how Siemens uses Amazon Bedrock Data Automation on video data to enrich the user experience and improve Siemens Search results. Siemens works with tens of terabytes of recordings from Siemens and its partners at events like Hannover Messe. In their raw format, these videos have limited use for customer interaction: only sparse information is available for each recording, such as speaker names and topic, while related information like slide content is stored separately. To help a user looking for specific information, such as a product use case mentioned during a panel discussion, we need to augment the videos with more context. We will show how we use AWS services to make the video experience more engaging: extracting additional information from the videos to direct the user to relevant parts of a video, and adding context such as summaries, chapters, and subtitles.

Technical Solution

Given the large volume of media, Siemens needs to analyze their recordings automatically to extract detailed information on speakers, products, or topics mentioned in spoken or written text, as well as the overall context of a conversation. This information needs to be combined and made available to the website users. This is implemented as an automated video processing workflow by combining several managed services on AWS.

1. Input and output video storage on Amazon Simple Storage Service (S3)

To store the unprocessed videos and the final result videos, the solution uses Amazon Simple Storage Service (Amazon S3). The integration of Amazon S3 with other AWS services makes it easier to automatically start processing the video inputs and to make the results available for the video portal. With automated lifecycle management, Amazon S3 archives or removes videos after processing, keeping storage costs under control.
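A lifecycle policy like the following minimal sketch could implement that automated archival and cleanup. The bucket prefixes, transition window, and expiration period below are illustrative assumptions, not Siemens' actual configuration; the dict matches the shape accepted by boto3's `put_bucket_lifecycle_configuration`.

```python
# Illustrative S3 lifecycle configuration: archive raw uploads after
# processing and expire temporary artifacts. Prefixes and day counts
# are assumptions for this sketch.
LIFECYCLE_CONFIG = {
    "Rules": [
        {
            # Move raw source videos to archival storage after 30 days
            "ID": "archive-raw-videos",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        },
        {
            # Delete intermediate processing artifacts after a week
            "ID": "expire-temp-artifacts",
            "Filter": {"Prefix": "tmp/"},
            "Status": "Enabled",
            "Expiration": {"Days": 7},
        },
    ]
}

# Applying it requires AWS credentials, e.g.:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-video-bucket",
#     LifecycleConfiguration=LIFECYCLE_CONFIG,
# )
```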

2. Workflow Orchestration using AWS Step Functions

To orchestrate the multiple steps of the video file processing pipeline, the solution uses AWS Step Functions, a serverless workflow orchestration service. For each video, an AWS Step Functions execution manages the processing steps, enables parallel extraction of information, provides error handling with retry capabilities, and creates a visual representation of the workflow to aid in monitoring progress.
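Such a workflow can be expressed in Amazon States Language. The following minimal sketch shows the general shape only: parallel extraction branches with a retry policy, followed by a consolidation step. State names and resource ARNs are illustrative placeholders, not the actual Siemens workflow definition.

```python
import json

# Sketch of an Amazon States Language (ASL) definition: extraction
# services run in parallel, then one Lambda consolidates the results.
# All names/ARNs are illustrative placeholders.
definition = {
    "StartAt": "ExtractInParallel",
    "States": {
        "ExtractInParallel": {
            "Type": "Parallel",
            "Branches": [
                {"StartAt": "DetectSpeakers",
                 "States": {"DetectSpeakers": {
                     "Type": "Task",
                     "Resource": "arn:aws:states:::aws-sdk:rekognition:startCelebrityRecognition",
                     "End": True}}},
                {"StartAt": "AnalyzeWithBDA",
                 "States": {"AnalyzeWithBDA": {
                     "Type": "Task",
                     "Resource": "arn:aws:states:::lambda:invoke",
                     "End": True}}},
                {"StartAt": "TranscribeAudio",
                 "States": {"TranscribeAudio": {
                     "Type": "Task",
                     "Resource": "arn:aws:states:::aws-sdk:transcribe:startTranscriptionJob",
                     "End": True}}},
            ],
            # Built-in retries provide error handling without extra code
            "Retry": [{"ErrorEquals": ["States.TaskFailed"],
                       "IntervalSeconds": 10, "MaxAttempts": 3, "BackoffRate": 2.0}],
            "Next": "ConsolidateResults",
        },
        "ConsolidateResults": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "End": True,
        },
    },
}

# JSON document that would be passed to the CreateStateMachine API
ASL_JSON = json.dumps(definition, indent=2)
```

The Parallel state is what lets the independent extraction services (speaker detection, BDA analysis, transcription) run concurrently instead of sequentially, shortening end-to-end processing time per video.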

3. Speaker recognition and text extraction with Amazon Rekognition

Many videos used in this solution contain talks or interviews with well-known personalities, such as Siemens leadership or celebrities. Using the celebrity detection feature in Amazon Rekognition, Siemens identifies speakers who are part of the well-known persons collection. The identification data is passed forward and combined with other outputs later in the workflow.
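Celebrity detection on stored video is an asynchronous Rekognition job. The sketch below builds the request payload and filters a response page down to confidently identified names; bucket and key names are placeholders, and the 90% confidence threshold is an assumption for illustration.

```python
# Sketch of asynchronous celebrity detection with Amazon Rekognition.
# The request shape follows StartCelebrityRecognition; names below are
# illustrative placeholders.

def start_request(bucket: str, key: str) -> dict:
    """Build the StartCelebrityRecognition request payload for a
    video stored in S3."""
    return {"Video": {"S3Object": {"Bucket": bucket, "Name": key}}}

def speaker_names(response_page: dict, min_confidence: float = 90.0) -> set:
    """Collect unique celebrity names from a GetCelebrityRecognition
    response page, keeping only confident matches."""
    return {
        c["Celebrity"]["Name"]
        for c in response_page.get("Celebrities", [])
        if c["Celebrity"].get("Confidence", 0.0) >= min_confidence
    }

# Against the real service (requires credentials):
# rek = boto3.client("rekognition")
# job = rek.start_celebrity_recognition(**start_request("bucket", "talk.mp4"))
# ...then poll rek.get_celebrity_recognition(JobId=job["JobId"]) until
# the job status is SUCCEEDED and feed the pages into speaker_names().
```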

4. Chapter Segmentation, content summarization, and search context generation using Amazon Bedrock Data Automation

Amazon Bedrock Data Automation is a feature of Amazon Bedrock, the managed generative AI service. It processes each video to extract meaningful insights from the content: it generates a summary capturing the key points and themes of the entire video and segments the video into chapters, each with its own contextual summary. This segmentation is crucial for later directing a user to the relevant parts of a video, and it appears as chapter markers and summaries in the resulting video.

In addition to segmenting the video and providing summaries, BDA extracts key information for further analysis and additional context. It automatically extracts visible elements, such as logos, text shown on slides, or a text bar indicating the name and function of a speaker.
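A BDA video analysis is started with a single asynchronous invocation against the BDA runtime. The sketch below builds the request payload; the S3 URIs and ARNs are placeholders, and the exact parameter casing follows the `bedrock-data-automation-runtime` API as we understand it, so verify against the current SDK documentation.

```python
# Sketch of an InvokeDataAutomationAsync request for the Bedrock Data
# Automation runtime. All URIs and ARNs are illustrative placeholders.

def build_bda_request(input_uri: str, output_uri: str,
                      project_arn: str, profile_arn: str) -> dict:
    """Build the request payload for an asynchronous BDA video
    analysis job. The project defines which outputs (summaries,
    chapters, on-screen text, transcript) are generated."""
    return {
        "inputConfiguration": {"s3Uri": input_uri},
        "outputConfiguration": {"s3Uri": output_uri},
        "dataAutomationConfiguration": {
            "dataAutomationProjectArn": project_arn,
            "stage": "LIVE",
        },
        "dataAutomationProfileArn": profile_arn,
    }

# With credentials, the call would look like:
# bda = boto3.client("bedrock-data-automation-runtime")
# job = bda.invoke_data_automation_async(**build_bda_request(
#     "s3://example-bucket/raw/talk.mp4",
#     "s3://example-bucket/bda-output/",
#     "arn:aws:bedrock:...:data-automation-project/...",
#     "arn:aws:bedrock:...:data-automation-profile/...",
# ))
# BDA then writes summaries, chapter segments, and detected text as
# JSON to the output S3 location.
```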

5. Multi-language subtitle generation using Amazon Transcribe and Amazon Translate

To improve the user experience, the solution uses Amazon Transcribe, a managed service that converts speech to text using a multi-billion parameter speech foundation model. The service creates a transcript from the video's audio, which is then fed into Amazon Translate to produce subtitles in multiple languages.
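In practice this can be a transcription job that emits subtitle files directly, followed by per-language translation calls. The sketch below builds the Transcribe request; the job name, media URI, and target languages are illustrative assumptions.

```python
# Sketch of a StartTranscriptionJob request that also produces WebVTT
# subtitle files. Job name and URI are illustrative placeholders.

def transcription_request(job_name: str, media_uri: str) -> dict:
    """Build the StartTranscriptionJob payload for a video in S3."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "IdentifyLanguage": True,           # auto-detect spoken language
        "Subtitles": {"Formats": ["vtt"]},  # emit WebVTT subtitles
    }

# With credentials, the flow would continue roughly like this:
# transcribe = boto3.client("transcribe")
# transcribe.start_transcription_job(
#     **transcription_request("hm2025-talk-001", "s3://bucket/talk.mp4"))
# ...once the job succeeds, translate each subtitle cue:
# translate = boto3.client("translate")
# for lang in ("de", "fr", "es"):   # example target languages
#     translate.translate_text(Text=cue_text,
#                              SourceLanguageCode="en",
#                              TargetLanguageCode=lang)
```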

6. Video Format Optimization with AWS Elemental MediaConvert

AWS Elemental MediaConvert processes the source video file to refine it for web delivery. It creates a standardized output format that maintains video quality while ensuring compatibility across viewing platforms.

7. Content Consolidation and Delivery using AWS Lambda

After extraction and conversion, the results are combined into the final output. An AWS Lambda function serves as the integration point for all of this information. It assembles the final combined outputs (the video file and a JSON file with metadata), runs final checks, and stores them in the output S3 bucket. This optimized content is now ready for web distribution, featuring all the enhancements from the workflow.
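The consolidation step above can be sketched as a small handler that merges the parallel branch results into one metadata document. The field names and checks below are illustrative assumptions, not the actual Siemens schema.

```python
# Sketch of the consolidation Lambda: merge per-service outputs into a
# single metadata document. Field names are illustrative assumptions.

def consolidate(bda: dict, speakers: set, subtitles: dict) -> dict:
    """Combine BDA insights, recognized speakers, and subtitle
    languages into one metadata record, with a basic sanity check."""
    metadata = {
        "summary": bda.get("summary", ""),
        "chapters": bda.get("chapters", []),
        "speakers": sorted(speakers),
        "subtitle_languages": sorted(subtitles),
    }
    # Plausibility check: every chapter needs a time range and summary
    for ch in metadata["chapters"]:
        assert {"start", "end", "summary"} <= ch.keys(), f"incomplete chapter: {ch}"
    return metadata

def lambda_handler(event, context):
    # event carries the results of the parallel Step Functions branches
    metadata = consolidate(event["bda"], set(event["speakers"]),
                           event["subtitles"])
    # Finally, write metadata.json next to the converted video:
    # boto3.client("s3").put_object(Bucket="example-output-bucket",
    #     Key="web/talk/metadata.json", Body=json.dumps(metadata))
    return metadata
```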

How AWS helps Siemens address key challenges

The solution combines several advanced AI capabilities to extract the information Siemens needs to improve the user experience. It creates summaries and segmentation for video clips and scenes, extracts on-screen text, and gathers context like high-quality transcripts and speaker names so that users can find content quickly. The ability to view subtitles in a user's native language and to jump directly to relevant chapters makes the content more relevant and accessible.

Media arrives in batches of tens of gigabytes at irregular intervals, and running extractions at this scale without the right tools is a time- and effort-consuming task requiring extensive AI knowledge. In addition, customization is key for Siemens: creating glossaries that correspond to the business context, and building the plausibility checks and guardrails that achieve a high quality of extracted information. AWS helps overcome the challenges of scaling, missing AI skills, and customization with advanced services that act as composable building blocks.

Scaling to terabytes of data cost-efficiently and with low operational overhead

By using serverless services like AWS Step Functions and Amazon Bedrock, video processing runs on a single video or on thousands without the need to provision servers, manage updates, or scale capacity. With no licensing fees, Siemens pays only for what it uses, lowering costs and improving flexibility.

Access to advanced generative AI capabilities

Using managed AWS services makes it easier to use and combine leading generative AI tools without the need for advanced AI skills. Using the graphical Step Functions workflow editor, a developer solves most of the AI-based processing challenges by simply calling a service. For example, extracting scenes, summaries, text, and audio transcriptions requires a single request to Amazon Bedrock Data Automation. All the complexities behind these tasks, like running video analysis to find scene cuts, choosing suitable AI models for the summaries, or training image recognition models to extract text, are handled automatically, accelerating the use of advanced AI capabilities. In addition, as new features are integrated into the managed service, the solution automatically stays up to date.

Finally, modular AWS services allow Siemens to customize virtually any step of the processing pipeline to its specific needs. For example, Siemens added speaker recognition with Amazon Rekognition, language translation with Amazon Translate, and extensive plausibility checks using AWS Lambda. With more than 200 AWS services that integrate with AWS Step Functions, it is easy to add even more customization in the future, such as retrieving a speaker's official job title from databases or adding AI services that cross-check the plausibility of extracted information.

Summary

This blog shows how Siemens uses generative AI on AWS to efficiently unlock the content of their video collections and enrich their website user experience.

We discussed how AWS services work together to simplify the creation and operation of a complex video processing pipeline. We also covered how managed services like AWS Step Functions and Amazon Bedrock Data Automation enable advanced AI capabilities, such as scene segmentation and contextualization, while paying only for the services used.

This example shows how AWS customers can extract value from their data with minimal operational burden, without orchestrating the underlying advanced AI capabilities themselves and without building extensive in-house AI knowledge.

Contact your AWS Account team or use this contact-us form to learn more and achieve similar results.

Dr. Helge Aufderheide


Helge Aufderheide is passionate about using new technologies to innovate in traditional industries from manufacturing to railways. Coming from a physics background and moving into consulting, he has a strong focus on understanding real-world problems from business to engineering and then using automation, data analytics, (generative) AI or other smart technology architectures to solve them.

Christoph Lumme


Christoph Lumme is an Enterprise Technology Architect with 20 years of web and e-commerce experience at Siemens. He specializes in secure, high-performance cloud solutions and delivering innovative, future-proof architectures.

Fabian Fischer


Fabian Fischer is an Enterprise Information Technology Architect based in Erlangen, Germany, with over two decades of experience at Siemens. He is highly skilled in AWS cloud solutions, covering areas such as cloud security, infrastructure automation, and CI/CD practices. Fabian is also at the forefront of exploring emerging technologies, including Generative AI and Large Language Models (LLM). Passionate about both architecture and hands-on coding, he blends strategic vision with technical expertise to drive cloud adoption and innovation across the organization.

Dr. Markus Schweier


Dr. Markus Schweier is a Generative AI & ML Specialist at Amazon Web Services (AWS). He primarily focuses on large enterprises in the Automotive and Manufacturing industries, helping them develop innovative products, services, and solutions. With over 9 years of experience in digital transformation consulting and building scalable AI solutions, Markus advises customers on their AI adoption journey. His background is in Production Engineering, and he holds a Ph.D. in Engineering from the Technical University of Munich.