Building autonomous water utility operations with agentic AI on AWS

Water utilities worldwide face challenges in managing infrastructure, reducing water waste, and ensuring reliable service delivery. Traditional approaches based on manual monitoring and reactive maintenance are no longer sufficient given increasing water demand, the need for uninterrupted service, regulatory requirements, and customer expectations. In this post, we explore how Amazon Web Services (AWS) AI services can transform water utility operations through autonomous systems that reduce losses, improve forecasting, and enhance customer service.

The challenge

Water utilities today face a trio of interconnected challenges that significantly impact their operations and bottom line. At the forefront is non-revenue water loss, a pervasive issue that costs water utilities billions annually. In the United States alone, an estimated 2.7 trillion gallons of treated water are lost each year due to leaks, faulty meters, and unauthorized consumption, translating to $6.4 billion in unrealized revenue (source). Traditional monitoring methods, which rely on periodic inspections and customer reports, often fail to detect and respond to these issues promptly.

Compounding this problem is the struggle with accurate demand forecasting. In today’s dynamic environment, conventional methods based on historical data and basic statistical models fall short. These approaches fail to adapt to rapid urban changes, climate variations, and evolving consumption patterns. The consequences of inaccurate forecasts are severe: overproduction wastes resources, while underestimation risks service disruptions. This challenge directly impacts operational efficiency, service reliability, and costs.

Further exacerbating these issues are outdated customer service operations. Many utilities still rely on legacy systems that limit service hours and depend on costly manual processes, with each interaction averaging $7-10 (source). With 30-40% of utility staff approaching retirement and new employees needing months of training, maintaining service quality is an uphill battle. During emergencies such as main breaks, these limitations severely hamper effective communication and timely response to customer needs.

In response to these challenges, this post introduces how Agentic AI, powered by Amazon Bedrock, can revolutionize water utility operations. Through autonomous systems, we address two critical challenges: first, by implementing intelligent monitoring and predictive analytics to detect and prevent water losses; and second, by deploying AI-powered customer service solutions that provide 24/7 support while reducing operational costs. This integrated approach represents a paradigm shift in water utility operations, transforming how utilities manage resources, predict demand, and serve their customers.

Amazon Bedrock Agents: the intelligence behind autonomous water management

Amazon Bedrock Agents are AI-powered assistants that can understand context, make decisions, and take actions autonomously. In water utility management, they serve as the central intelligence, processing data from various sources and orchestrating responses to network conditions. The key components of an Amazon Bedrock Agent include the following:

1. Foundation models: Powering natural language understanding and decision-making capabilities, for example Anthropic Claude.

2. Knowledge Bases: Storing utility-specific information and procedures using Retrieval Augmented Generation (RAG).

3. Action groups: Defining the set of operations that an agent can perform, such as valve control or maintenance scheduling.

4. Agent instructions: Instructions that guide the agent’s behavior and responses in different scenarios.

5. Security boundaries and access control: Crucial aspects of securing Amazon Bedrock Agents and preventing unauthorized access or manipulation.

Agent configuration

For detailed steps on building and configuring Amazon Bedrock Agents, refer to the AWS documentation. The following is a high-level overview of the configuration process:

1. Create agent: Configure a new agent in Amazon Bedrock and choose the appropriate foundation model (FM) (such as Anthropic Claude) that powers the agent’s understanding and decision-making capabilities.

2. Configure Knowledge Base: Set up a Knowledge Base that ingests utility documentation (manuals, SOPs, and engineering specifications) from Amazon S3, enabling the agent to access relevant information for decision-making.

3. Define action groups: Establish sets of permitted actions aligned with operational needs, such as customer notifications, that define what actions the agent can take through integrated services.

4. Set agent instructions: Specify the agent’s purpose, behavioral guidelines, and operational boundaries to ensure appropriate responses to various scenarios and interactions.

5. Establish AWS Identity and Access Management (IAM) roles: Configure necessary permissions and security boundaries for the agent to interact with AWS services such as AWS IoT Core, Amazon S3, Amazon CloudWatch, and execute actions through integrated services.
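To make step 3 more concrete, the following is a minimal sketch of a Lambda function that could back an action group defined with function details. The event and response shapes follow the format documented for Amazon Bedrock Agents action group Lambda functions as we understand it; the function name `notify_service_group` and the parameters `service_group` and `meter_id` are hypothetical and are not part of the original solution.

```python
def lambda_handler(event, context):
    """Handle an action group invocation from an Amazon Bedrock Agent.

    Assumes the action group was defined with function details, so the
    event carries 'actionGroup', 'function', and a list of 'parameters'.
    """
    # Parameters arrive as a list of {"name", "type", "value"} entries.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    function = event.get("function", "")

    if function == "notify_service_group":  # hypothetical function name
        # A real handler would call a notification service here.
        body = (
            f"Notification queued for {params.get('service_group', 'unknown')} "
            f"about meter {params.get('meter_id', 'unknown')}"
        )
    else:
        body = f"Unsupported function: {function}"

    # Echo the action group and function back so the agent can match the result.
    return {
        "messageVersion": event.get("messageVersion", "1.0"),
        "response": {
            "actionGroup": event.get("actionGroup", ""),
            "function": function,
            "functionResponse": {"responseBody": {"TEXT": {"body": body}}},
        },
    }
```

Keeping each permitted action behind a named function like this is one way to make the agent's operational boundaries explicit: anything that is not listed is rejected rather than attempted.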

For example, when identifying anomalies and notifying service groups, the agent workflow uses Chain-of-Thought (CoT) reasoning to break complex decisions into smaller, manageable steps. This approach makes the agent’s reasoning more transparent and interpretable, leading to more accurate anomaly detection and targeted notifications. The following is the agent instruction:

You are a water utility maintenance support agent. 
Your role is to analyze water network data, identify potential losses, and notify appropriate personnel. Follow these steps:

1. Extract Data:
    - Parse incoming sensor data (JSON or text) to identify the meter ID, pressure readings, flow rates, and other relevant measurements
    - Use the meter ID to retrieve corresponding network specifications and service group information from the knowledge base

2. Analyze Data:
    - Compare the sensor readings against the network's specified operating parameters
    - Identify any values that fall outside the acceptable ranges defined in the utility specifications
    - Look for patterns indicating potential leaks, such as: 
      a) Unexpected pressure drops 
      b) Unusual flow patterns 
      c) Discrepancies between input and output measurements

3. Evaluate and Summarize:
    - Determine if any detected deviations indicate non-revenue water loss
    - If a potential loss is found, concisely summarize: 
      a) The nature of the loss (leak, meter error, unauthorized usage) 
      b) Affected network segments 
      c) Estimated volume of water loss 
      d) Severity of the issue

4. Notify:
    - If anomalies are detected: 
      a) For urgent issues: Create voice calls and SMS through Amazon Connect, and generate support tickets in the contact center 
      b) For all issues: Send email notifications through Amazon SES
    - Include in all notifications: 
      a) Location and type of suspected loss 
      b) Supporting data and analysis 
      c) Recommended immediate actions 
      d) Potential impact on service delivery
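The agent carries out these steps through the foundation model, not through hand-written rules. Still, it can help to see the logic of steps 2 and 4 spelled out. The following is an illustrative sketch only: the field names (`pressure_psi`, `flow_gpm`), the `OperatingRange` type, and the channel labels are assumptions made for this example, not part of the solution.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OperatingRange:
    """Hypothetical operating parameters for one network segment."""
    min_pressure_psi: float
    max_pressure_psi: float
    max_flow_gpm: float


def find_deviations(reading: dict, spec: OperatingRange) -> list[str]:
    """Step 2: list readings that fall outside the acceptable range."""
    deviations = []
    if reading["pressure_psi"] < spec.min_pressure_psi:
        deviations.append("pressure below minimum")
    elif reading["pressure_psi"] > spec.max_pressure_psi:
        deviations.append("pressure above maximum")
    if reading["flow_gpm"] > spec.max_flow_gpm:
        deviations.append("flow above maximum")
    return deviations


def select_channels(deviations: list[str], urgent: bool) -> list[str]:
    """Step 4: choose notification channels for a detected anomaly."""
    if not deviations:
        return []  # nothing detected, so nobody is notified
    channels = ["email"]  # every issue gets an email notification
    if urgent:
        # Urgent issues add voice, SMS, and a contact center ticket.
        channels = ["voice", "sms", "ticket"] + channels
    return channels


spec = OperatingRange(min_pressure_psi=40.0, max_pressure_psi=80.0, max_flow_gpm=15.0)
found = find_deviations({"pressure_psi": 32.0, "flow_gpm": 18.5}, spec)
print(found, select_channels(found, urgent=True))
```

A fixed check like this is also a useful baseline for evaluating the agent: the readings it flags should at least include those that break the published operating parameters.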

Autonomous water management system

In this autonomous water management system, sensors monitor the network and send data to AWS IoT Core. The system uses Amazon EventBridge for scheduled checks and a machine learning (ML) model inference endpoint deployed with Amazon SageMaker for anomaly detection. When anomalies are detected, the Amazon Bedrock Agent evaluates the situation using its Knowledge Base and orchestrates responses: creating tickets, sending notifications, and optionally triggering automated controls when authorized. The workflow is monitored through CloudWatch and secured by IAM.
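The handoff from anomaly detection to the agent can be sketched as follows. This is a minimal example, assuming the `invoke_agent` operation of the boto3 `bedrock-agent-runtime` client, which returns its answer as an event stream under `completion`. The helper name `ask_agent`, the prompt wording, and the anomaly fields are hypothetical. The client is passed in so that the function can be exercised without AWS credentials.

```python
import json
import uuid


def ask_agent(client, agent_id: str, agent_alias_id: str, anomaly: dict) -> str:
    """Send one detected anomaly to the agent and return its text response.

    'client' is expected to behave like boto3.client("bedrock-agent-runtime").
    """
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=str(uuid.uuid4()),  # a new session per anomaly (an assumption)
        inputText="Anomaly detected. Sensor data: " + json.dumps(anomaly),
    )
    # The completion is a stream of events; text arrives in 'chunk' events.
    parts = []
    for event in response["completion"]:
        chunk = event.get("chunk")
        if chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)


# In a Lambda function, the call might look like this (IDs are placeholders):
#   import boto3
#   client = boto3.client("bedrock-agent-runtime")
#   summary = ask_agent(client, "AGENT_ID", "ALIAS_ID", {"meter_id": "M-1042"})
```

Using a fresh session per anomaly keeps each evaluation independent. Reusing a session ID instead would let the agent carry context across related events, at the cost of a longer conversation history.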

AWS architecture for autonomous water management using Amazon Bedrock Agents

The solution architecture demonstrates how AWS services work together to enable autonomous water utility operations using AI agents. This solution combines IoT connectivity, real-time data processing, and AI-driven decision making to create an intelligent system that can monitor, analyze, and respond to water network conditions automatically.

Figure 1: Solution architecture: autonomous water utility management using Amazon Bedrock Agents

The solution works as follows:

1. The technical documentation (manuals, SOPs, and engineering specifications) is uploaded to an S3 bucket, which serves as the source material for the Amazon Bedrock Knowledge Base. This Knowledge Base ingests and processes the documentation from Amazon S3, creating a structured repository that the Amazon Bedrock Agent can query to make informed decisions and provide accurate guidance for water system operations.

2. Water meters and flow sensors (IoT devices) continuously collect data on water flow, consumption, and system status. These devices are securely connected to AWS IoT Core, which serves as the entry point for IoT device data into AWS.

3. Within AWS IoT Core, IoT topics receive data from the devices. IoT rules are configured to route this data appropriately, such as error messages and telemetry data.

4. Telemetry data is streamed through Amazon Data Firehose, which reliably captures, transforms, and loads the data into S3 buckets for persistent storage. This provides long-term storage and enables historical analysis.

5. EventBridge is configured with rules that trigger AWS Lambda functions at scheduled intervals.

6. Lambda a) processes water meter data from Amazon S3, and b) invokes ML inference endpoints for anomaly detection. The ML inference endpoint is deployed using Amazon SageMaker Unified Studio, which is the integrated development environment for training and deploying ML models.

7. For anomalies detected through ML model inference, Lambda invokes the Amazon Bedrock Agent for orchestration and intelligent decision-making.

8. The Amazon Bedrock Agent, guided by its agent instructions, acts as the central intelligence of the system. It reviews inputs from various sources and determines appropriate actions:

a. It queries its Knowledge Base to retrieve more context, historical information, or procedural guidance relevant to the current situation.

b. When necessary, it creates tickets in the integrated contact center system through Amazon Connect.

c. For urgent matters, the AI agent can send out notifications through AWS End User Messaging or Amazon Simple Email Service (Amazon SES).

9. Communication channels are established to notify and update:

a. Site Contact Operators about system status, actions taken, and any issues requiring human intervention.

b. Customers, when necessary, about service updates or disruptions.

10. CloudWatch monitors the entire system and IAM manages access controls, ensuring the security and proper operation of all components.

11. Optionally, when anomalies are detected and remediation is needed, the Amazon Bedrock Agent can orchestrate automated responses through a secure workflow:

a. The Amazon Bedrock Agent determines the appropriate action (such as valve shutdown) from predefined action groups.

b. The action request flows through an orchestration Lambda function that validates it against security policies and operational rules.

c. Upon validation, the Lambda function executes the IoT device job, such as triggering a valve shutdown or configuration change.

d. All actions are logged and monitored for audit purposes, ensuring safe and accountable device control.
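The validation in step 11b can be sketched as a small gate that runs before any device job is created. This is an illustration under stated assumptions: the allowed actions, the protected device list, and the request fields (`action`, `device_id`, `human_approved`) are hypothetical, and a real deployment would load its rules from the utility's own security policies.

```python
from __future__ import annotations

# Hypothetical policy data; a real system would load this from configuration.
ALLOWED_ACTIONS = {"close_valve", "open_valve"}
PROTECTED_DEVICES = {"valve-hospital-01"}  # always require human approval


def validate_action(request: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for an action request from the agent."""
    action = request.get("action")
    device = request.get("device_id")
    if action not in ALLOWED_ACTIONS:
        return False, f"action '{action}' is not permitted"
    if not device:
        return False, "missing device_id"
    if device in PROTECTED_DEVICES and not request.get("human_approved", False):
        return False, f"device '{device}' requires human approval"
    return True, "ok"


print(validate_action({"action": "close_valve", "device_id": "valve-main-12"}))
print(validate_action({"action": "close_valve", "device_id": "valve-hospital-01"}))
```

Only requests that pass this gate would proceed to the IoT device job in step 11c. Logging the returned reason alongside each rejected request supports the audit trail described in step 11d.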

This comprehensive architecture addresses the key challenges faced by water utilities: non-revenue water loss is reduced through real-time leak detection and automated remediation, demand forecasting is improved through intelligent analysis of multiple data sources, and customer service is modernized through automated notifications and 24/7 system monitoring. Combining IoT sensors, AI-powered decision making, and automated workflows allows utilities to transform from reactive to proactive operations while maintaining security and human oversight.

Best practices for implementing Amazon Bedrock Agents

When implementing this solution, consider these key guidelines: Define clear operational boundaries for AI agents to ensure safe autonomous operations. Start with low-risk automation tasks and gradually expand to more complex operations. Maintain human oversight for critical decisions and establish robust approval processes. Regularly update the Knowledge Base with new procedures and insights to keep the system current. Continuously monitor agent performance and decision accuracy, using these insights to refine the system. Following these principles allows utilities to build a reliable, secure, and adaptable water management system that aligns with their specific needs and risk tolerance levels.

Looking ahead

The future of water utility operations lies in autonomous systems powered by AI. These systems not only improve operational efficiency but also help utilities become more resilient to challenges such as climate change and aging infrastructure. The combination of IoT sensors, ML, and AI agents creates a powerful platform for modernizing water infrastructure management.

In this post, we explored how AWS services can help modernize water utility operations, from IoT-enabled monitoring to AI-driven customer engagement. Look for partner solutions in the AWS Solutions Library in the coming months. In the meantime, you can build this solution by reviewing the guidance and then reaching out to your AWS account team, who can help engage AWS trusted partners to build this solution for your specific needs.

AWS for Industries
