AWS for Industries
How NetoAI trained a Telecom-specific large language model using Amazon SageMaker and AWS Trainium
At NetoAI, we specialize in telecom AI and voice AI, solving the unique data and language challenges of the global telecommunications industry. Telecom networks produce massive, specialized, regulated datasets that generic AI and large language models (LLMs) cannot fully address. Developing and deploying a domain-specific, open-source telecom LLM for telco operators and vendors is essential for next-generation, AI-driven telecom solutions. In this post, we outline how we built a telecom-specific LLM using AWS Trainium and Amazon SageMaker.
Why choose AWS Trainium for NetoAI’s TSLAM – A telecom-specific, open-source LLM?
In developing TSLAM (Telecom-Specific Large Language Model) – the world’s first open-source LLM dedicated to telecom and telco language – we leveraged NetoAI’s proprietary dataset of telecom documents, RFC standards, network logs, and specialized vocabulary. We wanted to fine-tune a foundation model using LoRA (Low-Rank Adaptation) for optimal AI performance on telecom data. We quickly discovered that traditional GPU-based AI training lacked the cost efficiency and speed we required for real-time, large-scale LLM fine-tuning.
AWS Trainium, via Amazon EC2 Trn1 instances and Amazon SageMaker, transformed our training:
- Accelerated Model Training: Trainium’s deep-learning optimized hardware reduced our LLM training time for telecom datasets, allowing rapid iteration, experimentation, and continuous improvement of TSLAM.
- Lower Training Costs: Compared to top GPU-based approaches, AWS Trainium dramatically lowered both hourly and total expenses, making AI innovation sustainable for telecom companies.
- Scalable, Secure, and Managed AI Environment: Amazon SageMaker provided auto-scaling, secure data management, and model monitoring – vital for telecom AI deployments with strict privacy and compliance needs.
By combining AWS Trainium and SageMaker with NetoAI’s telco data science expertise, we built and open-sourced TSLAM, the telecom industry’s first LLM designed for telecom AI, voice AI agents, intelligent virtual assistants, and automated network operations. This lets telecom enterprises deploy domain-adapted LLMs, unlocking accurate, compliant, and scalable language AI for their industry-specific requirements.
Implementation and outcomes
The version of TSLAM detailed in this post was fine-tuned from the Llama-3.1-8B foundation model using SageMaker on Trainium (trn1.32xlarge) instances; the comparison job run with the traditional GPU approach used the same base model and fine-tuning method. The synergy of LoRA for efficient adaptation and Trainium for rapid, cost-effective compute proved transformative.
Using AWS Trainium, we completed the entire training job in under three days. This accelerated performance translated directly into a highly cost-effective training solution.
While hardware comparisons are insightful, our focus here is on customer-centric gains rather than explicit competitive benchmarking. For deeper quantitative benchmarks, independent and internal studies may offer further insight.
The training process
To fine-tune TSLAM, we used the LoRA method, which adapts pre-trained large language models by updating a small subset of parameters, significantly reducing computational overhead. The training job was orchestrated through the Amazon SageMaker Python SDK, which provides a PyTorch Estimator class for defining training jobs in the SageMaker managed environment. This offers a seamless interface for managing model training and hyperparameter tuning on AWS Trainium trn1.32xlarge instances. These instances are built around AWS’s custom NeuronCores, optimized for the matrix operations critical to deep learning workloads.
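As an illustration, here is a minimal sketch of such an Estimator definition. The entry-point script name, S3 URI, framework version, and hyperparameter values are placeholders for illustration, not our production configuration:

```python
# Minimal sketch of defining a Trainium training job with the SageMaker
# Python SDK. Script name, S3 paths, and hyperparameters are illustrative
# placeholders, not NetoAI's production values.
import sagemaker
from sagemaker.pytorch import PyTorch

role = sagemaker.get_execution_role()

estimator = PyTorch(
    entry_point="finetune_tslam_lora.py",   # hypothetical training script
    source_dir="src",
    role=role,
    instance_type="ml.trn1.32xlarge",       # AWS Trainium instance
    instance_count=1,
    framework_version="1.13.1",             # example Neuron-compatible version
    py_version="py39",
    hyperparameters={
        "model_id": "meta-llama/Llama-3.1-8B",
        "lora_r": 16,                        # assumed LoRA rank
        "bf16": True,
    },
    # torchrun-based launch across the instance's NeuronCores
    distribution={"torch_distributed": {"enabled": True}},
)

# Kick off the managed training job with data staged in S3 (placeholder URI).
estimator.fit({"train": "s3://example-bucket/telecom-dataset/"})
```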
We used the SageMaker SDK to run a fine-tuning script on the Trainium instance spun up by the SageMaker training job, and we monitored training with Amazon CloudWatch to ensure it was proceeding as expected.
The dataset is loaded from Amazon S3 in JSONL format, formatted to match Llama-3.1’s chat structure, and trained with LoRA parameters to minimize memory usage. After extensive testing, we selected the best-performing hyperparameters, enabling distributed training across 8 Neuron devices (16 NeuronCores) with bf16 precision for efficiency, and employed gradient accumulation to reach an effective batch size of 16 within the available memory. A sketch of this setup follows.
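The snippet below is a hedged sketch of that preprocessing and LoRA setup using the Hugging Face datasets, transformers, and peft libraries. The JSONL field names, LoRA rank, target modules, and micro-batch settings are assumptions for illustration, not our exact configuration:

```python
# Hedged sketch of the dataset formatting and LoRA setup described above.
# JSONL field names, LoRA rank, and target modules are assumptions.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

base_id = "meta-llama/Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Train only low-rank adapter weights to keep memory usage small.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,      # assumed values
    target_modules=["q_proj", "v_proj"],         # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Load the JSONL training data and render each record in Llama-3.1's
# chat structure (prompt/response field names are assumed).
dataset = load_dataset("json", data_files="train.jsonl", split="train")

def to_chat(example):
    return {
        "text": (
            "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
            f"{example['prompt']}<|eot_id|>"
            "<|start_header_id|>assistant<|end_header_id|>\n\n"
            f"{example['response']}<|eot_id|>"
        )
    }

dataset = dataset.map(to_chat)

# Gradient accumulation reaches an effective batch size of 16 (2 x 8)
# without holding the full batch in device memory at once.
training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=2,      # assumed micro-batch
    gradient_accumulation_steps=8,      # 2 x 8 = effective batch size of 16
    bf16=True,                          # bf16 precision, as described above
)
```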
The script also consolidates the LoRA adapter shards post-training, merging them into the base model for deployment, as sketched below. Amazon CloudWatch logs provided real-time insight into training progress, helping ensure stability and performance.
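A minimal sketch of that merge step, assuming the peft library and placeholder paths, looks like this:

```python
# Hedged sketch of folding the trained LoRA adapters back into the base
# model for deployment; paths are placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
model = PeftModel.from_pretrained(base, "output/lora-adapter")  # placeholder path
merged = model.merge_and_unload()   # bake adapter weights into the base model
merged.save_pretrained("tslam-8b-merged")  # placeholder output directory
```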
The full script can be found here.
Figure 1: TSLAM fine-tuning and deployment using Amazon SageMaker and AWS AI chips
Deploying TSLAM on AWS Inferentia2
Once TSLAM was trained, the next critical step was deploying it for efficient, low-latency inference. For this, we turned to another of AWS’s purpose-built chips: AWS Inferentia2, which is designed to deliver high-performance inference at the lowest cost in the cloud.
To make our fine-tuned TSLAM model run optimally on Inferentia2, we used the AWS Neuron SDK to compile the model specifically for the accelerator. The compilation process, which used the same 8-core parallelism strategy adopted during training, took about 20 minutes to complete. After deployment, inference latencies were consistently observed between 300 ms and 500 ms, keeping the solution highly responsive while maintaining cost efficiency at scale.
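One possible way to perform this compilation is through the optimum-neuron wrapper around the Neuron SDK. The sketch below uses illustrative batch size, sequence length, and paths, and is not necessarily the exact route we took:

```python
# Minimal sketch of compiling the merged model for Inferentia2 using
# optimum-neuron (one possible route; settings are illustrative).
from optimum.neuron import NeuronModelForCausalLM

compiled = NeuronModelForCausalLM.from_pretrained(
    "tslam-8b-merged",        # merged model directory from the previous step
    export=True,              # trace and compile the graph for Neuron
    batch_size=1,             # assumed serving batch size
    sequence_length=2048,     # assumed maximum sequence length
    num_cores=8,              # mirrors the 8-core parallelism noted above
    auto_cast_type="bf16",
)
compiled.save_pretrained("tslam-8b-neuron")  # placeholder output directory
```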
This end-to-end, purpose-built hardware approach meant our model was not only cost-effective to train, but also economical and performant in production.
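For a sense of how the compiled artifact is exercised, here is a hedged sketch of loading it and timing a single generation; the prompt and generation settings are illustrative, and this is a local measurement rather than our production serving setup:

```python
# Illustrative sketch: load the precompiled Neuron model and time one
# generation call. Prompt and settings are placeholders.
import time
from optimum.neuron import NeuronModelForCausalLM
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tslam-8b-merged")
model = NeuronModelForCausalLM.from_pretrained("tslam-8b-neuron")  # precompiled

inputs = tokenizer("Explain the purpose of a 5G AMF.", return_tensors="pt")
start = time.time()
outputs = model.generate(**inputs, max_new_tokens=64)
print(f"latency: {(time.time() - start) * 1000:.0f} ms")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```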
TSLAM in action: Current use cases on the VING platform
Beyond demonstrating impressive capabilities in handling complex queries, TSLAM is already integrated into NetoAI’s VING platform, powering a suite of specialized agents that drive operational efficiency for our telecom clients. Key applications include:
- Rapid Root Cause Analysis Agent: Quickly diagnoses network faults and performance degradation issues by analyzing logs, alarms, and metrics to identify the source of problems.
- Customer Service Agent: Empowers support teams with deep technical knowledge, enabling them to resolve complex customer complaints and technical queries with accuracy and speed.
- Network Planning Agent: Assists engineers in designing and expanding network infrastructure by providing data-driven recommendations on capacity planning and resource allocation.
- Network Device Config Management Agent: Automates the generation, validation, and deployment of device configurations, reducing manual errors and ensuring compliance with network policies.
To confirm that the version of TSLAM we built on AWS was just as capable as the one currently used on VING, we tested its accuracy against real-world problems, comparing the fine-tuned TSLAM-8B to the Llama-3.1-8B base model it was fine-tuned from. The results are summarized below:
Figure 2: Model accuracy in telecom domains

| Model | Average Score |
| --- | --- |
| TSLAM-8B | 86.2 |
| Llama-3.1-8B | 63.1 |
Our tests show that the fine-tuning process was a clear success. TSLAM-8B earned an average score of 86.2%, a substantial improvement over the base model’s 63.1%. That 23.1-point jump corresponds to a relative improvement of about 37%, making TSLAM-8B a much more capable and effective model for our domain-specific tasks.
Future plans
Looking ahead, NetoAI plans to leverage the latest Amazon EC2 Trn2 instances to scale TSLAM and advance next-generation generative AI applications.
With this powerful infrastructure, we aim to build bigger and better versions of TSLAM, pushing the boundaries of telecom-specific large language models and delivering more advanced, efficient, and real-time AI solutions for the telecommunications industry.
Conclusion
Leveraging AWS Trainium and Amazon SageMaker has been a pivotal enabler for NetoAI, letting us address telecom-specific challenges at scale while optimizing for speed and cost. The flexibility and performance we experienced pave the way for continued advances as AWS’s silicon ecosystem matures, with chips like Trainium and Inferentia2 providing robust infrastructure for the evolving telecom industry.