Overview
GRAFT (Genpact’s Resilient Agent Tool for Fault Tracing) is an AI observability and self-healing solution for agentic AI applications written in Python. As multi-agent systems grow more complex, so does the need for continuous traceability, fault detection, and root-cause analysis. GRAFT addresses this need with an embedded agent, powered by a foundation model, that autonomously monitors your code and interprets its behavioral context.
GRAFT is a containerized AI observability platform deployed within a secure AWS VPC, leveraging a modern monolithic architecture orchestrated via Kubernetes or Docker Compose. Developers interact with the system through a GitHub CI/CD pipeline, which pushes code to Amazon Elastic Container Registry (ECR) for versioned, secure deployments. The application stack includes MinIO for object storage, PostgreSQL for transactional logging, and ClickHouse for high-performance analytics on trace data. A CodeOps server, equipped with a GitHub MCP client, handles LLM-powered fault analysis and healing using Amazon Bedrock and manages infrastructure code workflows. End users access the tool via an API Gateway, which routes traffic to a web server, Redis (for stateful interactions and fast caching), and an asynchronous task worker. This cloud-native architecture is designed to provide scalable, fault-tolerant observability and intelligent debugging for agentic AI systems.
The tool integrates with existing development workflows to trace execution paths, analyze stack traces, and detect logic anomalies. Leveraging foundation models hosted via Amazon Bedrock, GRAFT interprets logs, detects failure signatures, and suggests or applies remediations. Whether the problem is a missing coroutine or a logic fault, GRAFT brings smart observability and resiliency into every stage of your code lifecycle.
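To make the "missing coroutine" fault class concrete, here is a minimal Python sketch of an agent step that forgets an `await`, together with a helper that reduces the resulting stack trace to a small record a log analyzer could consume. It does not show GRAFT's implementation or API: the names `fetch_plan`, `buggy_agent`, `trace_failure`, and `run_traced`, and the fields of the record, are all hypothetical and chosen only for illustration.

```python
import asyncio
import traceback


async def fetch_plan(task: str) -> str:
    """Hypothetical agent step that must be awaited."""
    await asyncio.sleep(0)
    return f"plan for {task}"


async def buggy_agent(task: str) -> str:
    # Bug: `await` is missing, so `plan` is a coroutine object, not a string.
    plan = fetch_plan(task)
    try:
        return plan.upper()  # raises AttributeError on the coroutine object
    finally:
        plan.close()  # release the stray coroutine object


def trace_failure(exc: BaseException) -> dict:
    """Reduce an exception to a small, structured fault record (illustrative fields)."""
    frames = traceback.extract_tb(exc.__traceback__)
    last = frames[-1]  # innermost frame, where the exception was raised
    return {
        "error_type": type(exc).__name__,
        "message": str(exc),
        "function": last.name,
        "lineno": last.lineno,
        # Simple heuristic (an assumption): the message mentions a coroutine object.
        "suspect_missing_await": "coroutine" in str(exc),
    }


def run_traced(task: str) -> dict:
    """Run the agent and return either its result or a fault record."""
    try:
        return {"ok": True, "result": asyncio.run(buggy_agent(task))}
    except Exception as exc:
        return {"ok": False, "fault": trace_failure(exc)}


if __name__ == "__main__":
    print(run_traced("demo"))
```

In this sketch the fault record points at `buggy_agent` and flags a suspected missing `await`; a record of this kind is the sort of input a model-backed analyzer could use to propose a fix.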
GRAFT empowers teams to shift from reactive debugging to proactive, intelligent maintenance. With support for real-time log analysis, self-healing recommendations, and trace-level runtime inspection, GRAFT makes observability not just transparent but intelligent, scalable, and autonomous.
Highlights
- Monitor, trace, and self-heal Python-based agentic systems using foundation model-powered agents that analyze execution logs, detect anomalies, and autonomously resolve issues in real time.
- Deploy seamlessly in AWS environments with a containerized microservices architecture powered by Kubernetes/Docker Compose, integrated CI/CD via GitHub, and scalable compute through Amazon Bedrock.
- Gain full-stack AI observability and debugging intelligence with support for asynchronous task orchestration, high-speed analytics via ClickHouse, and persistent storage using PostgreSQL and MinIO.
Pricing
Custom pricing options
Support
Vendor support
Flexible levels of support are available as part of the end user agreement. Genpact's Resilient Agent Tool for Fault Tracing (GRAFT) is a containerized offering. Genpact will drive the initial implementation in the client environment, set it up, and provide training. Production support is offered in multiple tiers based on the client's needs. Contact us: aiml.services@genpact.com