AWS Inferentia Customers
See how customers are using AWS Inferentia to deploy deep learning models.
NetoAI
NetoAI provides the TelcoCore suite—including TSLAM, ViNG, DigiTwin, and NAPI—to help telcos automate their complex, multi-domain operations and customer life cycle management. A cornerstone of this is our TSLAM LLM, the first open-source, action-oriented model for this sector. To build it, we needed to fine-tune a model on our massive proprietary dataset of 2 billion tokens; using Amazon SageMaker with AWS Trainium Trn1 instances, we achieved remarkable cost savings and completed the entire fine-tuning in under three days. For production, AWS Inferentia2 and the Neuron SDK give us consistently low inference latency of 300–600 ms. This end-to-end solution on purpose-built AWS AI chips is key to our mission of delivering specialized, high-performance AI to the entire telecom industry.
Ravi Kumar Palepu, Founder & CEO
SplashMusic
Training large audio-to-audio models for HummingLM is both compute-intensive and iteration-heavy. By migrating our training workloads to AWS Trainium and orchestrating them with Amazon SageMaker HyperPod, we achieved 54 percent lower training costs and 50 percent faster training cycles while maintaining model accuracy. We also migrated over 2 PB of data to Amazon S3 in just one week, leveraging Amazon FSx for Lustre for high-throughput, low-latency access to training data and checkpoints. With AWS Inferentia2-powered Inf2 instances, we can reduce inference latency by up to 10×, enabling faster, more responsive real-time music generation.
Tomofun
Tomofun, the Taiwan-headquartered pet-tech startup behind the Furbo Pet Camera, has been redefining how pet owners interact with their pets remotely. Furbo combines smart cameras with AI to detect behaviors such as barking, running, or unusual activity, and alert owners in real time. The challenge was twofold: Tomofun needed to sustain cost efficiency for continuous pet behavior monitoring across thousands of devices, while maintaining model fidelity and throughput without rewriting large portions of the BLIP codebase, which had already been optimized for PyTorch. By migrating BLIP inference to Amazon EC2 Inf2 instances, Tomofun reduced its deployment costs by 83%.