AWS Cloud Operations Blog
Announcing General Availability of AWS DevOps Agent
Today, we’re announcing the general availability of AWS DevOps Agent. AWS DevOps Agent is your always-available operations teammate. It resolves and proactively prevents incidents, optimizes application reliability and performance, and handles on-demand SRE tasks across AWS, multicloud, and on-premises environments.
Operations teams spend countless hours investigating incidents, correlating data across multiple tools, and manually triaging alerts. This operational toil takes engineers away from innovation and strategic work. AWS DevOps Agent eliminates this burden by investigating incidents as an experienced DevOps engineer would. It learns your applications and their relationships, works with your observability tools, runbooks, code repositories, and CI/CD pipelines, and correlates telemetry, code, and deployment data across all of them. In preview, customers and partners using AWS DevOps Agent report up to 75% lower MTTR, 80% faster investigations, and 94% root cause accuracy, enabling 3–5x faster incident resolution.
Since the preview launch, several organizations across industries have integrated AWS DevOps Agent into their operational workflows. They’ve connected it with Amazon CloudWatch, and partner tools like Datadog, Dynatrace, New Relic, Splunk, GitHub, GitLab, ServiceNow, and Slack. We’re excited to add built-in support for Azure, Azure DevOps, PagerDuty, Grafana, and additional integrations as part of this GA launch.
How AWS DevOps Agent Works
AWS DevOps Agent represents a new class of frontier agents—autonomous systems that work independently to achieve goals, scale massively to tackle concurrent tasks, and run persistently without constant human oversight. AWS DevOps Agent works alongside your operations team across the full incident lifecycle—from detection through investigation, recovery, and prevention.
Autonomous incident response
AWS DevOps Agent begins investigating the moment an alert comes in, whether at 2 AM or during peak hours. This reduces mean time to resolution (MTTR) and helps restore your application to optimal performance quickly.
Figure: AWS DevOps Agent incident response investigation journal
Proactive incident prevention
AWS DevOps Agent moves teams from reactive firefighting to proactive operational improvement. It analyzes patterns across historical incidents to deliver targeted recommendations. These recommendations prevent future incidents and strengthen process and system resilience.
Figure: Prevention dashboard displaying recommendations by category
On-demand SRE task handling
AWS DevOps Agent draws on what it has learned about your environment to handle on-demand SRE tasks. Beyond answering questions, it lets you explore your application environment in depth and create, save, and share custom charts and reports.
Figure: On-demand SRE chat interface with conversational AI assistant for querying infrastructure
What’s New in General Availability?
The GA release expands AWS DevOps Agent’s capabilities based on customer feedback. It makes incident response more scalable, flexible, and intelligent across diverse operational environments.
Expanded Use Cases
Azure Support: AWS DevOps Agent now extends beyond AWS environments to investigate incidents in Azure workloads. The agent correlates data across multicloud deployments. It provides unified incident response whether your applications run on AWS, Azure, or both.
On-Premises Support: AWS DevOps Agent now extends incident investigation to your on-premises applications using the Model Context Protocol (MCP). The agent discovers on-premises resources by analyzing metrics, logs, and code to build a comprehensive topology. This provides unified incident response across AWS, Azure, and on-premises environments.
On-Demand SRE Tasks: Use the conversational AI assistant to query your application architecture and analyze system health using natural language across AWS, multicloud, and on-premises environments. Ask about resources, system metrics, alarm status, deployment history, and incident patterns. Get instant contextual answers and create custom charts and reports to save and share with your team.
Triage Agent: The Triage Agent automatically assesses incident severity and identifies duplicate tickets. When the agent detects duplicates, it links them to the main investigation with a ‘LINKED’ status. Linked tasks won’t start automatically, which reduces noise and keeps your team’s effort focused on the primary incident.
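The linking behavior described for the Triage Agent can be pictured with a small conceptual sketch. This is an illustration only, not the agent's implementation: the post does not say how duplicates are detected, so the `fingerprint` key, the `Task` class, the `triage` function, and the `PENDING` status are all hypothetical. Only the `LINKED` status name comes from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    """Hypothetical investigation task; not an AWS DevOps Agent data type."""
    task_id: str
    fingerprint: str               # hypothetical dedup key, e.g. "service:symptom"
    status: str = "PENDING"        # hypothetical initial status
    linked_to: Optional[str] = None


def triage(tasks: List[Task]) -> List[Task]:
    """Mark later tasks with a repeated fingerprint as LINKED to the first one.

    Returns the tasks that would still start automatically.
    """
    primary_by_fingerprint = {}
    for task in tasks:
        primary = primary_by_fingerprint.get(task.fingerprint)
        if primary is None:
            # First task seen for this fingerprint becomes the primary investigation.
            primary_by_fingerprint[task.fingerprint] = task
        else:
            # Duplicate: link it to the primary and keep it from starting.
            task.status = "LINKED"
            task.linked_to = primary.task_id
    return [t for t in tasks if t.status != "LINKED"]


tasks = [
    Task("t1", "checkout:5xx"),
    Task("t2", "checkout:5xx"),
    Task("t3", "search:latency"),
]
runnable = triage(tasks)
print([t.task_id for t in runnable])
```

In this sketch only the first task per fingerprint stays runnable, and every duplicate keeps a pointer to its primary so the findings remain in one place.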
Enhanced Intelligence
Learned Skills: AWS DevOps Agent learns from your organization’s investigation patterns, tool use, and topology. It builds skills based on how your team resolves specific types of incidents. Over time, the agent becomes more effective at handling your unique operational challenges.
Custom Skills: Add investigation procedures, best practices, and organizational knowledge specific to your systems. Create workflows once and use them automatically across all relevant investigations. Skills can be targeted to specific agent types (On-demand, Incident Triage, Incident RCA, Incident Mitigation, Evaluation). This reduces context consumption and improves focus.
Code Indexing: The agent now indexes your application code repositories. This enables it to understand code structure, identify potential bugs during investigations, and suggest code-level fixes as part of mitigation plans.
New Integrations
Building on existing integrations with Datadog, Dynatrace, New Relic, Splunk, GitHub Actions, GitLab CI/CD, and ServiceNow, we are adding the following:
- PagerDuty: Native integration for automatic incident response triggered by PagerDuty alerts
- Grafana: The built-in Grafana MCP server connects to any Grafana instance, including self-managed, Grafana Cloud, and Amazon Managed Grafana. Once connected, the agent accesses all data sources configured in that instance, such as Prometheus, Loki, and OpenSearch. This enables open-source telemetry monitoring and system introspection.
- Azure DevOps: Integration with Azure Pipelines to track deployments and code changes in Azure environments
- Amazon EventBridge: Investigation events available via Amazon EventBridge for custom automation workflows
- New APIs: Updated AWS CLI, AWS SDK and AWS MCP Server support
These integrations enable AWS DevOps Agent to work seamlessly within your existing operational toolchain.
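As a rough sketch of the custom automation that the Amazon EventBridge integration makes possible, the handler below filters incoming events and pulls out a few fields. The envelope keys (`id`, `source`, `detail-type`, `detail`) are the standard EventBridge event structure. Everything else is an assumption for illustration: this post does not give the actual `source` or `detail-type` values or the shape of `detail`, so `HYPOTHETICAL_SOURCE`, `HYPOTHETICAL_DETAIL_TYPE`, `investigationId`, and `status` are placeholders to replace with the values from the User Guide.

```python
from typing import Any, Dict, Optional

# ASSUMPTION: placeholder values. The real strings are not stated in this post.
HYPOTHETICAL_SOURCE = "example.devops-agent"
HYPOTHETICAL_DETAIL_TYPE = "Investigation Completed"

# Shaped like an EventBridge event pattern: field name -> list of allowed values.
EVENT_PATTERN = {
    "source": [HYPOTHETICAL_SOURCE],
    "detail-type": [HYPOTHETICAL_DETAIL_TYPE],
}


def matches_pattern(event: Dict[str, Any], pattern: Dict[str, list]) -> bool:
    """Exact-match check on top-level fields (a small subset of pattern semantics)."""
    return all(event.get(field) in allowed for field, allowed in pattern.items())


def lambda_handler(event: Dict[str, Any], context: Optional[Any] = None) -> Dict[str, Any]:
    """Lambda-style entry point: ignore unrelated events, summarize matching ones."""
    if not matches_pattern(event, EVENT_PATTERN):
        return {"handled": False}
    detail = event.get("detail", {})
    return {
        "handled": True,
        "event_id": event.get("id"),
        # ASSUMPTION: hypothetical detail fields, shown only to illustrate routing.
        "investigation_id": detail.get("investigationId"),
        "status": detail.get("status"),
    }


sample_event = {
    "id": "abc-123",
    "source": HYPOTHETICAL_SOURCE,
    "detail-type": HYPOTHETICAL_DETAIL_TYPE,
    "detail": {"investigationId": "inv-1", "status": "COMPLETED"},
}
print(lambda_handler(sample_event))
```

In a real deployment the filtering would normally live in the EventBridge rule itself, with the function receiving only matching events. The in-code check is kept here so the sketch runs on its own.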
Enterprise-Ready Capabilities
Regional Expansion: AWS DevOps Agent launches today in six AWS Regions: US East (N. Virginia) and US West (Oregon) in North America, Frankfurt and Ireland in Europe, and Sydney and Tokyo in Asia Pacific. This availability brings the agent closer to your workloads, helping you meet data residency requirements while reducing latency for your operations teams.
Private MCP: Connect to private MCP servers to integrate with additional tools. This enables AWS DevOps Agent to securely access your internal tools, data, and workflows. It delivers more accurate insights and automates actions using real context from your company. No confidential traffic routes over the internet.
Security: AWS DevOps Agent supports customer managed keys and direct identity provider (IdP) integration with Okta and Microsoft Entra ID for operator portal access.
Localization: AWS DevOps Agent responds to the browser locale setting, including translating agent responses. This enables global teams to interact with AWS DevOps Agent in their preferred language.
Customer Success Stories
Early adopters are already seeing significant operational improvements.
Western Governors University
Western Governors University (WGU), a leading online university serving over 191,000 students, was among the first organizations to deploy AWS DevOps Agent into production. As a large-scale Dynatrace user, WGU leverages the AWS DevOps Agent’s native Dynatrace integration, enabling Dynatrace Intelligence to automatically route problem records to the Agent for investigation and return enriched findings directly back into Dynatrace.
During a recent production investigation, WGU’s SRE team used the AWS DevOps Agent to analyze a service disruption scenario, reducing total resolution time from an estimated two hours to just 28 minutes—a 77% improvement in MTTR. The Agent quickly pinpointed the root cause within a Lambda function’s configuration, surfacing critical operational knowledge that had previously existed only in undiscovered internal documentation.
“It was able to provide the smoking gun, identified the Lambda was the cause. The investigation had almost flawless metrics that matched what we saw on the front-end. Yesterday was a huge victory, if we can continue to accelerate discovery, I can’t describe how much of a victory that would be for our organization,” said Angel Marchena (Director of Technical Operations).
With plans to leverage the AWS DevOps Agent Skills feature, WGU is on track to compress investigation time even further.
Zenchef
Zenchef is a restaurant technology platform that helps restaurants manage reservations, table operations, digital menus, payments, and guest marketing from a single commission-free system. With a focused DevOps team managing a shared production environment across departments, they faced a real test. A customer-facing issue surfaced during a company hackathon. Most engineers were heads-down on the event, and nothing significant showed up in monitoring to point them in the right direction.
Rather than pulling engineers off the hackathon, the team pasted the issue into AWS DevOps Agent. It worked through the problem systematically. It ruled out authentication as a lead, pivoted to ECS deployments, and ultimately traced the root cause to an IAM misconfiguration on the EC2 instance hosting GitHub. The full investigation wrapped in 20–30 minutes, roughly a 75% reduction compared to the 1–2 hours it would have taken manually. The findings were shared directly with the responsible engineer for a clean handoff.
“During the hackathon, we had nearly no available bandwidth to investigate – and we didn’t need it. We’re always trying to be a couple moves ahead, and this kind of proactive investigation just isn’t always possible otherwise. DevOps Agent is enabling new ways of understanding how our platforms behave,” said Theo Massard (Platform Engineering Manager).
T-Mobile
T-Mobile US, Inc. is one of America’s leading wireless carriers, providing mobile voice, messaging, and data services to over 140 million subscribers across the United States.
“When AWS introduced AWS DevOps Agent, T-Mobile was at the table from day one. As a design partner, we saw how AWS DevOps Agent can significantly improve root cause analysis across production environments. Our real-world feedback directly influenced how the product evolved. Our infrastructure spans multiple clouds and on-premises environments, with application logs centralized in our on-premises Splunk deployment. AWS DevOps Agent’s ability to integrate seamlessly with Splunk and analyze logs across these diverse environments has been impactful as we continue to pilot the solution,” said Aravind Manchireddy (SVP, Technology Operations).
Granola
Granola is an AI-powered notepad that handles the heavy lifting of transcription and summarization, allowing customers to stay fully focused rather than distracted by manual note-taking. AWS DevOps Agent integrates seamlessly into Granola’s AI-powered incident management workflow, accelerating root cause analysis and reducing mean time to resolution.
“We’ve integrated AWS DevOps Agent directly into our incident response process, where it automatically triggers investigations on high-severity CloudWatch alarms,” said Eddie Bruce (Product Engineer) at Granola. “AWS DevOps Agent’s database investigation capabilities consistently outperform other tools we’ve evaluated, particularly for analyzing PostgreSQL logs and surfacing RDS performance insights. As we scale our SRE capabilities, AWS DevOps Agent has proven to be a reliable part of our incident management toolkit.”
Read more customer success stories on the AWS DevOps Agent customers page.
Getting Started
AWS DevOps Agent is available today. Here’s how to establish value quickly:
Start with a Quick Win
- Create an Agent Space: Navigate to AWS DevOps Agent in the AWS Management Console and create your first Agent Space.
- Connect your observability tools: Link your existing tools (Datadog, Grafana, Dynatrace, or others) to enable the agent to access your telemetry data.
- Run your first investigation: Either configure automatic incident response or use the web app to manually investigate an alert. Review the agent’s findings and provide feedback to improve its learned skills.
- Reinvestigate a recent incident: Choose a production incident your team investigated in the past 30 days. Use AWS DevOps Agent to investigate the same issue and compare the results. This immediately demonstrates time savings and accuracy improvements.
Accelerate Your Success
- Follow production best practices: See Best Practices for Deploying AWS DevOps Agent for guidance on integrating the agent into your operational workflows.
- Measure your impact: Track MTTR improvements, investigation time savings, and accuracy rates to quantify the value AWS DevOps Agent delivers to your organization.
- Expand systematically: Start with one team or service, demonstrate value, then expand to additional teams and use cases.
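The "Measure your impact" step comes down to simple arithmetic on before-and-after resolution times. The sketch below shows one way to compute it, using the Western Governors University figures quoted earlier in this post (an estimated two hours reduced to 28 minutes). The function names are illustrative and not part of any AWS tooling.

```python
def percent_reduction(baseline_minutes: float, observed_minutes: float) -> float:
    """Percentage by which resolution time dropped relative to the baseline."""
    if baseline_minutes <= 0:
        raise ValueError("baseline_minutes must be positive")
    return (baseline_minutes - observed_minutes) / baseline_minutes * 100


def speedup_factor(baseline_minutes: float, observed_minutes: float) -> float:
    """How many times faster the incident was resolved."""
    if observed_minutes <= 0:
        raise ValueError("observed_minutes must be positive")
    return baseline_minutes / observed_minutes


# WGU example from this post: estimated 2 hours (120 minutes) vs. 28 minutes.
reduction = percent_reduction(120, 28)
speedup = speedup_factor(120, 28)
print(f"MTTR reduction: {reduction:.0f}%")   # MTTR reduction: 77%
print(f"Speedup: {speedup:.1f}x")            # Speedup: 4.3x
```

Tracking both numbers per incident, against a baseline your team agrees on beforehand, makes the comparison with the preview figures cited above more meaningful than a single anecdote.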
Pricing
With AWS DevOps Agent, you pay for the time the agent spends on operational tasks, billed per second. There are no upfront commitments. You can start and stop using the agent at any time. AWS Support customers receive monthly credits toward AWS DevOps Agent usage based on a percentage of their gross AWS Support spend. Percentages vary based on the support plan. For more information on pricing, visit the AWS DevOps Agent pricing page.
Conclusion
To learn more, visit AWS DevOps Agent and explore the User Guide. For questions or to discuss how AWS DevOps Agent can help your organization, contact your AWS account team. Sign up today.