Overview
Enterprises are increasingly adopting AI coding agents to transform their development workflows, creating demand for infrastructure that balances performance with rigorous security requirements. Morph, an AI infrastructure company, builds coding agents with the speed and security required for production use at the largest enterprises. The company needed infrastructure that could deliver exceptional speed, enterprise-grade security, and cost-effective deployment. Morph used Amazon Web Services (AWS) to train and deploy its specialized models, raising inference throughput from 1,000 to more than 10,000 tokens per second.
About Morph
Morph builds small language models that power coding agents and AI applications. Its flagship product, Fast Apply, facilitates ultrafast code editing, serving AI companies and enterprise customers globally.
Opportunity | Using AWS to build production-ready AI agents for Morph
Morph’s flagship product, Fast Apply, is a specialized small language model that merges code updates from large language models, such as Claude, into original code files at exceptional speeds. Rather than competing with larger tools, Morph provides the middleware that makes them fast, reliable, and cost-effective at scale.
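To make the merge task concrete, the sketch below shows how an apply-style model is typically invoked through an OpenAI-compatible client. The endpoint URL, model identifier, and the `<code>`/`<update>` prompt tags are illustrative assumptions for this example, not Morph’s documented API.

```python
# Illustrative sketch only: the endpoint, model name, and prompt format are
# assumptions for demonstration, not Morph's documented API.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-fast-apply.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

original_code = """def get_user(db, user_id):
    return db.query(User).filter(User.id == user_id).first()
"""

# A frontier model often returns an abbreviated edit like this, eliding
# unchanged code; the apply model merges it back into the full file.
update_snippet = """def get_user(db, user_id):
    # ... existing code ...
    if user is None:
        raise ValueError(f"no user with id {user_id}")
    return user
"""

response = client.chat.completions.create(
    model="fast-apply-model",  # hypothetical model identifier
    messages=[{
        "role": "user",
        "content": f"<code>{original_code}</code>\n<update>{update_snippet}</update>",
    }],
)
print(response.choices[0].message.content)  # the full merged file
```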
As frontier AI models improve, the bar for competitive performance keeps rising. For Morph’s own needs, what started at 1,000 tokens per second quickly became inadequate. Morph needed to reach 10,000 tokens per second while maintaining first-pass accuracy and stable production performance. Managing memory bandwidth for concurrent users on a single GPU presented optimization challenges; off-the-shelf inference engines didn’t properly allocate bandwidth for stable, high-speed production workloads. Many of Morph’s enterprise customers required secure, on-premises deployment with strict legal and compliance guarantees, and they needed cost-effective single-GPU options rather than expensive multinode configurations that would price out most use cases.
Using AWS, Morph could gain access to the infrastructure, deployment tools, and security architecture needed to train high-performance models at scale and deploy them securely for large organizations. “AWS is infrastructure I can trust,” says Tejas Bhakta, founder and CEO of Morph. “I know AWS is going to be around—AWS has tried-and-tested solutions, and I’m not going to encounter hardware failures or edge cases with memory sharing.”
Solution | Training and deploying specialized models at scale
Morph trained its Fast Apply model using Amazon Elastic Compute Cloud (Amazon EC2) P5 instances, high-performance GPU-based instances for deep learning and high-performance computing applications, powered by NVIDIA H100 GPUs. The team built a custom inference engine specifically optimized for code-merging tasks, using speculative decoding techniques to maximize speed. “We trained all these models on AWS, and we started at around 1,000 tokens per second, and then kept doing optimizations,” says Bhakta. “We got up to 2,000, and then 4,000, and eventually we reached 10,000.”
“To achieve those speeds reliably in enterprise settings, we engineered a custom inference engine and GPU scheduling layer that maintained deterministic performance even under high-concurrency workloads,” says Dhruv Bhatia, founding researcher at Morph.
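Speculative decoding, the core technique behind these speedups, has a cheap draft model propose several tokens that the larger target model then verifies in a single pass, accepting the longest agreeing prefix. The toy sketch below illustrates the greedy variant under simplified assumptions; the two model functions are arbitrary stand-ins, and a production engine layers rejection sampling, batching, and fused GPU kernels on top of this basic loop.

```python
# Toy sketch of greedy speculative decoding. The "models" below are arbitrary
# stand-in functions, not Morph's engine; they exist only to show the
# propose-then-verify control flow.

def draft_next(tokens):   # stand-in for a small, fast draft model
    return (sum(tokens) * 31 + 7) % 50

def target_next(tokens):  # stand-in for the large, accurate target model
    return (sum(tokens) * 31 + 7) % 50 if sum(tokens) % 5 else (sum(tokens) + 1) % 50

def speculative_decode(prompt, n_tokens, k=4):
    tokens = list(prompt)
    while len(tokens) - len(prompt) < n_tokens:
        # 1) Draft model proposes k tokens autoregressively (cheap).
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2) Target model verifies all k positions (in a real engine this is
        #    one batched forward pass) and keeps the agreeing prefix.
        accepted = []
        for i in range(k):
            expected = target_next(tokens + draft[:i])
            if draft[i] == expected:
                accepted.append(draft[i])
            else:
                accepted.append(expected)  # correct the first mismatch, stop
                break
        tokens.extend(accepted)  # at least 1, up to k tokens per target pass
    return tokens[:len(prompt) + n_tokens]

print(speculative_decode([1, 2, 3], n_tokens=12))
```

Because every iteration emits at least one target-verified token, the output matches greedy decoding of the target model alone; the speedup comes from often accepting several draft tokens per expensive pass.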
After the models were trained, Morph needed a secure way to deploy them to enterprise customers without exposing proprietary model weights. For inference deployment, the company adopted Amazon SageMaker AI, a fully managed service that brings together a broad set of tools to facilitate high-performance, low-cost AI model development for any use case. Morph also listed its models on AWS Marketplace, where customers can discover, deploy, and manage solutions for the cloud. With a single click, customers can deploy Fast Apply directly into their own secure environments. Meanwhile, Morph retains full protection over its intellectual property.
“Amazon SageMaker AI provides a security-provisioned instance where we can deliver our inference engine and weights, knowing they’re protected,” says Bhakta. “That reduces our go-to-market burden; we can publicly list on AWS Marketplace instead of doing bespoke deployments for each customer. Otherwise, we’d need to run containers in their infrastructure and ensure that safeguards and licenses are signed to prevent unauthorized training or access.”
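For a subscribed customer, the deployment flow looks roughly like the following SageMaker Python SDK sketch. The model package ARN, instance type, and endpoint name are placeholders, not Morph’s actual Marketplace listing.

```python
# Hedged sketch: deploying a subscribed AWS Marketplace model package with the
# SageMaker Python SDK. The ARN, instance type, and endpoint name below are
# placeholders, not Morph's actual listing.
import sagemaker
from sagemaker import ModelPackage

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # IAM role; works inside SageMaker environments

model = ModelPackage(
    role=role,
    model_package_arn=(
        "arn:aws:sagemaker:us-east-1:123456789012:"
        "model-package/fast-apply-example"  # placeholder ARN
    ),
    sagemaker_session=session,
)

# The container runs inside the customer's account; the vendor's weights stay
# in the security-provisioned instance and are never exposed to the customer.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",  # a single-GPU instance type, as an example
    endpoint_name="fast-apply-endpoint",
)
```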
Outcome | Accelerating developer productivity and enterprise adoption
With its models running on AWS infrastructure, Morph achieved over 10,000 tokens per second per request, a major leap from the 1,000 tokens per second where it began. The company can now complete a 15,000-token multifile refactor in under 400 milliseconds, compared with roughly 20 seconds for traditional approaches. Single-file coding edits that previously took 2–5 minutes now finish in under 1 second.
Binance—a major financial services company with 5,000 developers—reported that its teams became 50–70 percent more effective using Morph’s Fast Apply model within their internal integrated development environment. Large-scale refactors that previously required months now take just days, changing how quickly Binance can evolve its codebase.
For enterprises operating under strict legal and compliance requirements, Morph’s AWS infrastructure made coding agents viable. The combination of the Amazon SageMaker AI security architecture and AWS Marketplace distribution built the trust necessary for widespread enterprise adoption. Morph now serves over 20 coding services and continues expanding its enterprise customer base.
Looking ahead, Morph plans to develop semantic search capabilities using embedding and reranking models to make coding agents work effectively on very large repositories. By building on AWS, Morph is positioned to lead the development of specialized models that make AI coding agents production ready at enterprise scale.
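The general embed-then-rerank pattern behind such semantic search is sketched below. The embedding and reranking functions are toy stand-ins for learned models, not announced Morph capabilities; a real system would use neural embedding and cross-encoder reranking models.

```python
# Illustrative sketch of the embed-then-rerank retrieval pattern for large
# repositories. embed() and rerank_score() are toy stand-ins, not Morph APIs.
import math

def embed(text):
    # Stand-in embedding: normalized character histogram
    # (a real system uses a learned embedding model).
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

def rerank_score(query, doc):
    # Stand-in reranker: token overlap (a real system uses a cross-encoder).
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

repo_chunks = [
    "def parse_config(path): load yaml settings",
    "class UserRepository: fetch user records from postgres",
    "def merge_edits(original, update): apply code update to file",
]

query = "apply a code update to an existing file"
q_vec = embed(query)

# Stage 1: fast embedding search narrows the repository to a shortlist.
candidates = sorted(repo_chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)[:2]

# Stage 2: a slower, more accurate reranker orders the shortlist.
best = max(candidates, key=lambda c: rerank_score(query, c))
print(best)
```

The two-stage design is what makes very large repositories tractable: the cheap embedding pass bounds the work the expensive reranker has to do.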