May 2024
Baseten Delivers Fast, Scalable Generative AI Inference with AWS and NVIDIA
Benefits
2X
faster delivery throughput for customers in production
50%
decrease in time to first token with TensorRT-LLM
Early access
to TensorRT-LLM through NVIDIA's Inception program
Overview
Baseten is a San Francisco-based machine learning infrastructure company with a focus on model inference. It offers an advanced machine learning operations (MLOps) platform for model deployment, model serving, and model fine-tuning, and customers come to Baseten to run large language models (LLMs) at scale reliably, performantly, and cost-efficiently. With LLM performance as a top priority, Baseten teamed up with AWS Partner NVIDIA and Amazon Web Services (AWS) to deliver measurable throughput and latency improvements, including a 50% decrease in time to first token (TTFT).
About Baseten
Baseten makes going from machine learning models to production-grade applications fast and easy. With Baseten, data science and machine learning teams can build applications without backend, frontend, or MLOps knowledge.
About AWS Partner NVIDIA
Since its founding in 1993, NVIDIA has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI, and is fueling industrial digitalization across markets. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
"Our partnership with NVIDIA, plus their software and hardware stack, has allowed our customers to bring their ideas to market more quickly and cost-efficiently."
Amir Haghighat
Co-Founder and CTO, Baseten
AWS Services Used