Enhance predictive maintenance with generative AI agents on AWS

In today’s complex industrial environments, predictive maintenance remains a key challenge for the Energy, Utilities, and Manufacturing sectors. Traditional anomaly detection methods are labor-intensive and time-consuming. They often leave critical equipment vulnerabilities unaddressed for hours or days, which leads to significant unplanned downtime and financial impact. McKinsey estimates that mid-size refineries lose $20–50 million annually due to reliability issues.

Typical approaches to solving the problem involve training time-series anomaly detection models and/or defining rules for each equipment type and sensor combination. However, these methods struggle to do the following:

Keep prediction models up to date to maintain accuracy
Quickly verify anomalies
Perform rapid root cause analysis
Efficiently route maintenance requests

One of the most exciting developments in generative AI today is the rise of agentic workflows, where systems autonomously manage tasks, adapt to changes, and make real-time decisions with minimal human input. This paradigm directly addresses the challenges of traditional anomaly detection methods and helps proactively resolve issues before they escalate, reducing downtime and significantly improving operational efficiency.

This post outlines an agentic solution for predictive maintenance using generative AI agents.

We focus on a single generative AI agent that handles anomaly detection and alerting in a predictive maintenance scenario. In this scenario, a variety of electric motors in the field send sensor readings. Each motor has a unique ID registered in a company asset database, and we have vendor PDF manuals for each make and model of motor. The agent processes the motor sensor readings and alerts on-call personnel if it detects an anomaly.

Sensor data arrives in JSON format, including a motor ID field. Our agent uses this ID to find the make and model of the motor, then uses the Retrieval Augmented Generation (RAG) technique to query the motor’s operating manual from a knowledge base and compare the sensor readings with the manual’s specifications. If a reading falls outside the acceptable range, then the system identifies the on-call personnel for that time and notifies them through email and text.
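The post does not show how "the on-call personnel for that time" are determined. As a minimal sketch, assuming a hypothetical two-shift rotation kept in a dictionary (the contact addresses, shift boundaries, and use of UTC are all illustrative, not part of the solution), the reading's Unix timestamp can be mapped to an on-call contact:

```python
from datetime import datetime, timezone

# Hypothetical rotation: shift name -> (start hour inclusive, end hour exclusive, contact).
# Hours are in UTC. A real deployment would query a scheduling or paging system instead.
ON_CALL_ROTATION = {
    "day": (6, 18, "alice@example.com"),
    "night": (18, 6, "bob@example.com"),  # this shift wraps past midnight
}


def on_call_for(timestamp: int) -> str:
    """Return the hypothetical on-call contact for a Unix timestamp in seconds."""
    hour = datetime.fromtimestamp(timestamp, tz=timezone.utc).hour
    for start, end, contact in ON_CALL_ROTATION.values():
        if start < end:
            in_shift = start <= hour < end
        else:  # shift crosses midnight, for example 18:00 to 06:00
            in_shift = hour >= start or hour < end
        if in_shift:
            return contact
    raise LookupError(f"No on-call shift covers hour {hour}")


# The sample reading used later in this post has timestamp 1624230892 (23:14:52 UTC).
print(on_call_for(1624230892))  # bob@example.com
```

Keeping the rotation as data rather than code lets schedule changes happen without redeploying whatever component performs the notification.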

Solution architecture

We use Amazon Bedrock Agents to build our agent. Amazon Bedrock Agents provide capabilities to automate multistep tasks by seamlessly connecting with company systems, APIs, and data sources. We also use Amazon Bedrock Knowledge Bases, a fully managed capability, for the entire RAG workflow from ingestion to retrieval to prompt augmentation.
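Amazon Bedrock Knowledge Bases manages this workflow for you, so none of the following code is needed in the solution. To make the three stages concrete, here is a toy, self-contained sketch: word-window chunking with overlap for ingestion, a bag-of-words cosine similarity for retrieval, and simple string templating for prompt augmentation. The term-count "embedding" is a deliberate stand-in for a real embedding model and vector store, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words, each overlapping the previous by `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase term counts, standing in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]


def augment_prompt(question: str, context: list[str]) -> str:
    """Prepend retrieved context to the user question."""
    joined = "\n---\n".join(context)
    return f"Use only the context below to answer.\n\nContext:\n{joined}\n\nQuestion: {question}"


manual = ("Rated Voltage 400V plus or minus 5 percent. Rated Current 3.6 A. "
          "Rated Speed 1445 rpm. Insulation Class F 155 C. Mounting Design B5.")
chunks = chunk_text(manual, size=6, overlap=2)
print(augment_prompt("What is the rated current?", retrieve("rated current in A", chunks, top_k=1)))
```

The overlap keeps a value such as "Rated Current 3.6 A." intact in at least one chunk even when a window boundary would otherwise split it.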

The following diagram provides an overview of the solution, which uses an Amazon Bedrock agent for anomaly detection in equipment telemetry data:

Figure 1 – Solution architecture for Amazon Bedrock Agent in Predictive Maintenance Scenario

The equipment manuals in PDF format are stored in an Amazon S3 bucket. Amazon Bedrock Knowledge Bases, a fully managed solution for indexing unstructured data for RAG, synchronizes data from the S3 bucket, parses and splits the documents into chunks, creates vector embeddings for the chunks with a choice of embedding models in Amazon Bedrock, and stores the embeddings in an Amazon OpenSearch Serverless vector index.
The inference flow starts when an application invokes the agent published on Amazon Bedrock and provides JSON input data containing aggregated sensor readings from industrial equipment.
Amazon Bedrock Agent uses the reasoning of foundation models (FMs), together with APIs and input data, to break down requests and act according to the agent instructions.
Amazon Bedrock Agent decides to invoke a lookup action from the defined action groups to retrieve data about the specific equipment from the company asset databases.
Amazon Bedrock Agent decides to retrieve information from the knowledge base about the vendor-supplied operating guidance for the specific equipment type.
Amazon Bedrock Agent uses all available context to determine if an anomaly is present in the sensor data. If it detects an anomaly, then the agent generates a summary and invokes a notification action from the defined action groups to send a notification to field operation teams.
The solution is monitored using Amazon CloudWatch, and secured using AWS Identity and Access Management (IAM) roles and policies.
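The lookup and notification actions in the flow above are served by an action group. A minimal sketch of an AWS Lambda handler for the two API paths that appear in the trace later in this post (/findMotorSpec and /sendNotification) follows, assuming an OpenAPI-schema action group that passes arguments as a `parameters` list of name/type/value entries. The in-memory `ASSETS` table, the `record_notification` stub, and the HTTP methods are hypothetical; a production version would query the asset database and send email and text (for example, through Amazon SNS). The response envelope reflects our understanding of the action group Lambda response format and should be checked against the Amazon Bedrock Agents documentation.

```python
import json

# Hypothetical stand-in for the company asset database (record copied from the trace in this post).
ASSETS = {
    "222": {"motorId": "222", "model": "RS Pro 3", "service_group": "field_service1",
            "manualFile": "A700000006779781.pdf"},
}

SENT: list[dict] = []  # captured notifications; replace with email/SMS delivery in practice


def record_notification(service_group: str, summary: str) -> None:
    SENT.append({"serviceGroup": service_group, "summary": summary})


def _params(event: dict) -> dict:
    """Flatten the agent's [{'name', 'type', 'value'}] parameter list into a dict."""
    return {p["name"]: p["value"] for p in event.get("parameters", [])}


def _response(event: dict, status: int, body: dict) -> dict:
    """Wrap a JSON body in the action group response envelope (as we understand it)."""
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "apiPath": event["apiPath"],
            "httpMethod": event["httpMethod"],
            "httpStatusCode": status,
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }


def lambda_handler(event: dict, context=None, notify=record_notification) -> dict:
    params = _params(event)
    if event["apiPath"] == "/findMotorSpec":
        spec = ASSETS.get(params.get("motorId"))
        if spec is None:
            return _response(event, 404, {"error": f"Unknown motorId {params.get('motorId')}"})
        return _response(event, 200, spec)
    if event["apiPath"] == "/sendNotification":
        notify(params["serviceGroup"], params["anomalySummary"])
        return _response(event, 200, {"status": f"Notification sent to {params['serviceGroup']}"})
    return _response(event, 400, {"error": f"Unsupported apiPath {event['apiPath']}"})


event = {"actionGroup": "MaintenanceAgentActionGroup", "apiPath": "/findMotorSpec",
         "httpMethod": "GET", "parameters": [{"name": "motorId", "type": "string", "value": "222"}]}
print(lambda_handler(event)["response"]["httpStatusCode"])  # 200
```

Injecting `notify` as a parameter keeps the handler testable without sending real messages.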

You can try this solution hands-on by following the detailed instructions at this AWS workshop.
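The inference flow described above starts with an application invoking the agent. A minimal sketch using the AWS SDK for Python (Boto3) `bedrock-agent-runtime` client follows. The agent ID, alias ID, and Region are placeholders, and `invoke_motor_agent` and `read_completion` are hypothetical helper names. The sketch sends the sensor JSON as the input text and enables tracing so that the reasoning steps discussed in the next section can be inspected.

```python
import json
import uuid


def read_completion(events) -> tuple[str, list[dict]]:
    """Split an invoke_agent event stream into the final text and the raw trace payloads."""
    text_parts, traces = [], []
    for event in events:
        if "chunk" in event:
            text_parts.append(event["chunk"]["bytes"].decode("utf-8"))
        elif "trace" in event:
            traces.append(event["trace"]["trace"])
    return "".join(text_parts), traces


def invoke_motor_agent(agent_id: str, alias_id: str, reading: dict, region: str = "us-east-1"):
    """Send one sensor reading to the agent and return (answer text, trace payloads)."""
    import boto3  # imported lazily so read_completion can be exercised without AWS access

    client = boto3.client("bedrock-agent-runtime", region_name=region)
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=str(uuid.uuid4()),  # a new session per reading
        inputText=json.dumps(reading),
        enableTrace=True,
    )
    return read_completion(response["completion"])
```

Using a fresh session per reading keeps evaluations of different motors independent; reusing a session ID would let the agent carry context across readings, which may or may not be desirable.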

Diving deeper into agentic workflow

In this section, we walk through how our agentic workflow uses chain-of-thought (CoT) reasoning to identify anomalies and notify the relevant service group.

CoT prompting is an approach that improves the reasoning abilities of FMs by breaking down complex tasks into smaller, more manageable steps, much as a person works through a decision one step at a time. With traditional prompting, a language model attempts to provide a final answer directly from the prompt. CoT guides the language model to explicitly lay out its step-by-step thought process, known as a reasoning chain, before arriving at the final answer. This makes the model’s reasoning process more transparent and interpretable.

The following shows the agent instructions for our solution in detail. These instructions define the list of tasks our agent must complete:

You are an electric motor maintenance support agent. Your role is to analyze sensor data, identify anomalies, and notify appropriate service groups. Follow these steps:

1. Extract Data:
    - Parse incoming sensor data (JSON or text) to identify the motor ID (motorId) and relevant measurements.
    - Use the motor ID to retrieve the corresponding motor specifications and service group information from the knowledge base.

2. Analyze Data:
    - Compare the sensor readings against the motor's specified operating parameters.
    - Identify any values that fall outside the acceptable ranges defined in the equipment manual. The units in the provided sensor readings match those of the related values in the equipment manual.

3. Evaluate and Summarize:
    - Determine if any detected deviations constitute an anomaly.
    - If an anomaly is found, concisely summarize:
    a) The nature of the anomaly
    b) Which parameters are affected
    c) Potential root causes
    d) Severity of the issue

4. Notify:
    - If an anomaly is detected, use the provided notification tool to alert the appropriate field service group.
    - Include in the notification:
    a) Motor ID and model
    b) Anomaly summary
    c) Relevant sensor data
    d) Recommended next steps (if applicable)

Always prioritize accuracy in data interpretation and clear, concise communication in your responses and notifications.
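In this solution, the "Analyze Data" step is performed by the FM. If you also want a deterministic cross-check (an addition, not part of the solution described here), a rule-based range check is straightforward once limits are known. The limits below are illustrative values derived by hand from the RS Pro 3 datasheet excerpt shown later in this post: 400 V ± 5 %, the 3.6 A rated current, 1500 rpm (the 50 Hz synchronous speed implied by the 1445 rpm rating), and the 145 °C limit the agent itself assumes. `Limit`, `RS_PRO_3_LIMITS`, and `find_anomalies` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    low: float
    high: float
    unit: str


# Illustrative limits for the RS Pro 3 motor, derived by hand from the datasheet excerpt.
RS_PRO_3_LIMITS = {
    "voltage": Limit(380.0, 420.0, "V"),               # 400 V ± 5 %
    "current": Limit(0.0, 3.6, "A"),                   # rated current
    "rpm": Limit(0.0, 1500.0, "rpm"),                  # synchronous speed at 50 Hz
    "temperature": Limit(float("-inf"), 145.0, "°C"),  # 40 °C ambient + 105 K, as the agent assumes
}


def find_anomalies(sensor_data: dict, limits: dict) -> dict:
    """Return {parameter: message} for every reading outside its [low, high] range."""
    anomalies = {}
    for name, value in sensor_data.items():
        limit = limits.get(name)
        if limit is None:
            continue  # no specification available for this parameter
        if not limit.low <= value <= limit.high:
            anomalies[name] = f"{value} {limit.unit} outside [{limit.low}, {limit.high}] {limit.unit}"
    return anomalies


reading = {"voltage": 280, "current": 15, "rpm": 5000, "temperature": 150}
for name, message in find_anomalies(reading, RS_PRO_3_LIMITS).items():
    print(f"{name}: {message}")
```

A check like this cannot explain root causes or judge severity, but disagreement between it and the agent's verdict is a useful signal to route a case for human review.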

Now that we have defined the agent’s instructions, we walk through the agent’s trace to understand how CoT reasoning is used to complete these steps.

Figure 2 – Amazon Bedrock Agent CoT Reasoning process

Before we begin this walkthrough, we define three key concepts:

Rationale: Contains the reasoning, based on the input, that the agent uses to justify invoking an action group or retrieving information from a knowledge base.
Action: Contains information pertaining to the action group or knowledge base that is being invoked.
Observation: Contains the result or output of an action group or knowledge base, or the response to the user.
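When the agent is invoked with tracing enabled, each Rationale, Action, and Observation in the table that follows comes from an orchestration trace event. A minimal sketch for labeling collected trace payloads this way is shown below. It assumes the payloads are already gathered into a list of dicts, and the field names (`orchestrationTrace`, `rationale`, `invocationInput`, `observation`, and their children) reflect our understanding of the Amazon Bedrock Agents trace schema, which you should confirm against the trace documentation. `summarize_orchestration` is a hypothetical name.

```python
def summarize_orchestration(traces: list[dict]) -> list[tuple[str, str]]:
    """Map orchestration trace payloads to (step type, text) pairs."""
    steps = []
    for trace in traces:
        orch = trace.get("orchestrationTrace", {})
        if "rationale" in orch:
            steps.append(("Rationale", orch["rationale"].get("text", "")))
        if "invocationInput" in orch:
            inv = orch["invocationInput"]
            if "actionGroupInvocationInput" in inv:
                ag = inv["actionGroupInvocationInput"]
                steps.append(("Action", f"{ag.get('actionGroupName')} {ag.get('apiPath')}"))
            elif "knowledgeBaseLookupInput" in inv:
                kb = inv["knowledgeBaseLookupInput"]
                steps.append(("Action", f"knowledge base {kb.get('knowledgeBaseId')}: {kb.get('text')}"))
        if "observation" in orch:
            obs = orch["observation"]
            if "actionGroupInvocationOutput" in obs:
                steps.append(("Observation", obs["actionGroupInvocationOutput"].get("text", "")))
            elif "knowledgeBaseLookupOutput" in obs:
                refs = obs["knowledgeBaseLookupOutput"].get("retrievedReferences", [])
                steps.append(("Observation", f"{len(refs)} retrieved chunk(s)"))
            elif "finalResponse" in obs:
                steps.append(("Observation", obs["finalResponse"].get("text", "")))
    return steps


sample = [
    {"orchestrationTrace": {"rationale": {"text": "Let's start by getting the motor spec data"}}},
    {"orchestrationTrace": {"invocationInput": {"actionGroupInvocationInput": {
        "actionGroupName": "MaintenanceAgentActionGroup", "apiPath": "/findMotorSpec"}}}},
]
for kind, text in summarize_orchestration(sample):
    print(f"{kind}: {text}")
```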

We first share the input message that we provide to the agent, for the motor with motorId 222:

{ "motorId": "222", "timestamp": 1624230892, "sensorData": { "voltage": 280, "current": 15, "rpm": 5000, "temperature": 150 } }
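The agent parses this message itself (step 1 of its instructions). An application may still want to reject malformed payloads before spending an agent invocation on them. A minimal sketch, assuming every message must carry the four readings shown above (`REQUIRED_READINGS` and `parse_reading` are hypothetical names):

```python
import json

# Assumption: these four readings are mandatory for every motor message.
REQUIRED_READINGS = ("voltage", "current", "rpm", "temperature")


def parse_reading(message: str) -> tuple[str, dict]:
    """Return (motorId, sensorData as floats) from a JSON message, rejecting incomplete payloads."""
    payload = json.loads(message)
    motor_id = str(payload["motorId"])
    sensor_data = payload["sensorData"]
    missing = [k for k in REQUIRED_READINGS if k not in sensor_data]
    if missing:
        raise ValueError(f"Missing readings for motor {motor_id}: {missing}")
    return motor_id, {k: float(v) for k, v in sensor_data.items()}


message = ('{ "motorId": "222", "timestamp": 1624230892, "sensorData": '
           '{ "voltage": 280, "current": 15, "rpm": 5000, "temperature": 150 } }')
motor_id, data = parse_reading(message)
print(motor_id, data)
```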

Step Trace	Agent CoT Step	Bedrock Agent Response (extracted from Amazon Bedrock Traces)	Explanation
1	Rationale	I’ll analyze the sensor data provided for motor ID 222. First, I need to retrieve the motor specifications to compare against the sensor readings. Then, I’ll evaluate if there are any anomalies and determine if a notification needs to be sent. Let’s start by getting the motor spec data	The agent needs to complete step 1 (Extract). It decides that it needs the motor specification metadata, so it invokes an action to retrieve the motor specifications. The agent then receives the motor specifications as an observation that it can use in the next step.
	Action	actionGroupName: MaintenanceAgentActionGroup apiPath: /findMotorSpec parameters: – name: motorId – type: string – value: 222
	Observation	motorId: 222 model: RS Pro 3 service_group: field_service1 manualFile: A700000006779781.pdf
2	Rationale	Now that I have the motor specifications, I need to compare the sensor readings with the acceptable ranges for this motor model. I’ll search the knowledge base for the manual to get the specific operating parameters for the RS Pro 3 model.	The agent still needs to complete step 1 (Extract). The agent decides it needs to find the specific operating parameters for our motor model. To do this, the agent decides to search the knowledge base with a natural language query. Then, the agent receives a list of content chunks from the relevant maintenance manuals.
	Action	knowledgeBaseId: XXXXXXXX text: Find the operating parameters for RS Pro 3 motor model in manual A700000006779781.pdf, including acceptable ranges for voltage, current, rpm, and temperature
	Observation	3-Phase Squirrel Cage Induction Motor Datasheet Type Code AAP90L4A Date 18/05/2017 Revision 0 Rated Voltage Δ 400V ± 5 % Efficiency Class IE3 Frequency 50Hz ± 2 % Insulation Class F (155 ⁰C) Duty Type S1 Temperature Rise 80 K Cooling System TEFC Mounting Design B5 ELECTRICAL DATA Rated Output [kW] 1.5 Δ Locked Rotor Cur. – IA/IN – Rated Speed [rpm] 1445 Δ Locked Rotor Torq. – MA/MN – Rated Current [A] 3.6 Y Locked Rotor Cur. – IA/IN 8.02 No-Load Current [A] 2.55 Y Locked Rotor Torq. – MA/MN 2.7 Rated Torque – Mn [Nm] 9.9 Breakdown Torque – Mk/Mn 4.95 Moment of inertia – J [kgm2] – Load Characteristics (IEC 60034-2-1:2014 ) Load % Efficiency Current [A] Cos Φ 100 85.3 3.58 0.72 75 81.5 3.28 0.6 50 78.6 2.82
3	Rationale	Now that I have the motor specifications and operating parameters, I’ll compare the sensor readings with the acceptable ranges for the RS Pro 3 motor model. Let’s analyze each parameter: 1. Voltage: The sensor reading is 280V. The datasheet mentions a rated voltage of 400V ± 5% for delta connection. The current reading is significantly lower than the rated voltage, which is concerning. 2. Current: The sensor reading is 15A. The datasheet shows a rated current of 3.6A for this motor. The current reading is more than four times the rated current, which is a severe anomaly.3. RPM: The sensor reading is 5000 RPM. The datasheet indicates a rated speed of 1445 RPM. The current speed is significantly higher than the rated speed, which is another severe anomaly. 4. Temperature: The sensor reading is 150°C. While the datasheet doesn’t provide a specific maximum temperature, it mentions Insulation Class F, which typically has a maximum temperature rise of 105°C above ambient. Assuming a 40°C ambient temperature (common for industrial environments), the maximum allowable temperature would be around 145°C. The current temperature of 150°C exceeds this limit, indicating an overheating condition. Given these significant deviations from the normal operating parameters, I need to send a notification to the appropriate service group.	The agent completes step 2 (Analyze), step 3 (Evaluate and Summarize), and step 4 (Notify). The agent analyzes, evaluates, and summarizes the live motor sensor data against the operating parameters. The agent identifies significant anomalies, and decides to invoke an action to notify the relevant service engineers. Then, the agent receives the message that was sent to the service engineers as an observation.
	Action	actionGroupName: MaintenanceAgentActionGroup, apiPath: /sendNotification, parameters:- name: anomalySummary, – type: string, – value: URGENT: Multiple severe anomalies detected for Motor ID 222 (RS Pro 3 model). \n1. Voltage: 280V (significantly below rated 400V)\n2. Current: 15A (over 4 times rated 3.6A)\n3. RPM: 5000 (far exceeds rated 1445 RPM)\n4. Temperature: 150°C (exceeds maximum allowable for Class F insulation)These readings indicate critical issues including possible overspeed condition, severe overloading, and overheating. Immediate inspection and shutdown may be necessary to prevent catastrophic failure. Please investigate urgently. – name: serviceGroup, – type: string, – value: field_service1
	Observation	Notification sent successfully to service group: field_service1, summary: URGENT: Multiple severe anomalies detected for Motor ID 222 (RS Pro 3 model). \n1. Voltage: 280V (significantly below rated 400V)\n2. Current: 15A (over 4 times rated 3.6A)\n3. RPM: 5000 (far exceeds rated 1445 RPM)\n4. Temperature: 150°C (exceeds maximum allowable for Class F insulation)\n\nThese readings indicate critical issues including possible overspeed condition, severe overloading, and overheating. Immediate inspection and shutdown may be necessary to prevent catastrophic failure. Please investigate urgently.
4	Rationale	The notification has been sent successfully to the appropriate service group. Now, I’ll provide a summary of the situation to the user.	The agent decides all tasks are completed. The agent decides to provide a summary to the user. The agent generates the summary as the final observation in this workflow.
	Observation	Critical anomalies have been detected in the sensor readings for Motor ID 222 (RS Pro 3 model). The analysis reveals several severe issues: 1. Voltage: The current reading of 280V is significantly below the rated voltage of 400V ± 5%. 2. Current: At 15A, the current is more than four times the rated current of 3.6A, indicating severe overloading. 3. RPM: The motor is running at 5000 RPM, which far exceeds its rated speed of 1445 RPM, suggesting a dangerous overspeed condition. 4. Temperature: At 150°C, the motor is operating above the maximum allowable temperature for its insulation class (Class F), indicating overheating. These anomalies suggest critical issues that could lead to catastrophic failure if not addressed immediately. A notification has been sent to the field service team (field_service1) with detailed information about these anomalies.\n\nRecommended next steps: 1. Immediate inspection of the motor and its control systems. 2. Consider emergency shutdown to prevent potential damage or safety hazards.\n3. Investigate the root causes of these anomalies, particularly the overspeed condition and overloading. 4. Check for any mechanical issues, electrical faults, or control system malfunctions. 5. Do not restart the motor until a thorough investigation and necessary repairs have been completed. The field service team should respond promptly to this urgent situation.
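The comparisons in trace step 3 can be checked by hand. Using the datasheet values from the retrieved chunk and the agent's own assumption of a 40 °C ambient temperature:

```latex
\[
\begin{aligned}
V_{\text{allowed}} &= 400\,\mathrm{V} \times (1 \pm 0.05) = [380,\ 420]\,\mathrm{V},
  \qquad 280\,\mathrm{V} < 380\,\mathrm{V} \\
\frac{I_{\text{measured}}}{I_{\text{rated}}} &= \frac{15\,\mathrm{A}}{3.6\,\mathrm{A}} \approx 4.17 \\
\frac{n_{\text{measured}}}{n_{\text{rated}}} &= \frac{5000\,\mathrm{rpm}}{1445\,\mathrm{rpm}} \approx 3.46 \\
T_{\max} &\approx T_{\text{ambient}} + \Delta T_{\text{Class F}}
  = 40\,^{\circ}\mathrm{C} + 105\,\mathrm{K} = 145\,^{\circ}\mathrm{C} < 150\,^{\circ}\mathrm{C}
\end{aligned}
\]
```

Each result is consistent with the agent's summary. Note that the retrieved datasheet does list Insulation Class F (155 ⁰C), so the 145 °C figure is a limit the agent derived rather than a value quoted from the manual.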

Further thoughts on enhancing predictive maintenance workflows

This solution can be expanded by integrating additional data sources, such as historical time-series data and past incident reports, to enrich trend analysis and anomaly detection. Because Amazon Bedrock supports multiple agents working together, these tasks can be split across specialized agents: for example, one analyzing data, another notifying on-call personnel, a third scheduling downtime, and a fourth ordering parts, while humans remain involved in critical decisions. This approach helps make maintenance workflows more scalable, reliable, and proactive.
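The post does not show how such a team of agents would be wired together. As a minimal sketch of the control flow only, a hypothetical supervisor runs a plan of specialist steps in order and pauses for human approval on steps with operational or financial impact. Plain Python functions stand in for separate agents; this is not the Amazon Bedrock multi-agent collaboration API, and every name below is illustrative.

```python
from typing import Callable


# Hypothetical specialists: plain functions standing in for separate agents.
def analyze(task: dict) -> str:
    return f"analysis of motor {task['motorId']}"


def notify(task: dict) -> str:
    return f"paged on-call for motor {task['motorId']}"


def schedule_downtime(task: dict) -> str:
    return f"downtime scheduled for motor {task['motorId']}"


def order_parts(task: dict) -> str:
    return f"parts ordered for motor {task['motorId']}"


SPECIALISTS: dict[str, Callable[[dict], str]] = {
    "analyze": analyze,
    "notify": notify,
    "schedule_downtime": schedule_downtime,
    "order_parts": order_parts,
}
# Steps that require a human sign-off before they run.
NEEDS_APPROVAL = {"schedule_downtime", "order_parts"}


def run_plan(task: dict, plan: list[str], approve: Callable[[str, dict], bool]) -> list[str]:
    """Execute plan steps in order, asking a human to approve critical steps."""
    log = []
    for step in plan:
        if step in NEEDS_APPROVAL and not approve(step, task):
            log.append(f"{step}: skipped (not approved)")
            continue
        log.append(SPECIALISTS[step](task))
    return log


def approve_downtime_only(step: str, task: dict) -> bool:
    """Example approver: a human signs off on downtime but not on a parts order."""
    return step == "schedule_downtime"


for line in run_plan({"motorId": "222"}, ["analyze", "notify", "schedule_downtime", "order_parts"],
                     approve_downtime_only):
    print(line)
```

Keeping the approval rule in one set (`NEEDS_APPROVAL`) makes the human-in-the-loop boundary explicit and easy to audit.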

Conclusion

In this post, we demonstrated how generative AI agents can transform predictive maintenance workflows by automating complex, time-consuming processes. Amazon Bedrock and agentic workflows help organizations move their maintenance strategies from reactive to proactive. Specifically, we showed how generative AI agents can incorporate current data from company databases and unstructured documents to improve model relevance, verify anomalies through contextual reasoning, accelerate root cause analysis through multimodal inputs, and intelligently route maintenance requests. This directly addresses the key limitations of traditional approaches.

Next steps:

Review the Amazon Bedrock Agents documentation.
Explore the re:Invent workshop with step-by-step instructions for deploying the sample agent.
Visit Agents4Energy to learn more about our open source generative AI agents for energy industry workloads.
Experiment with your own equipment datasets.

AWS for Industries
