SmartScaler + Run:ai: Predictive Scaling That Outpaces KPA in Real-World LLM Inference
This performance report shows how Avesha SmartScaler dramatically improves LLM inference behavior on the Run:ai platform, consistently outperforming Run:ai's Knative Pod Autoscaler (KPA) under identical conditions. Using Nebius H100 8×GPU nodes running Llama-3.1 8B FP8 on NVIDIA NIM, SmartScaler scales earlier, stabilizes throughput faster, and processes significantly more tokens during burst periods. While KPA reacts to concurrency thresholds, SmartScaler predicts load ahead of time using reinforcement learning (RL) models, GPU-aware telemetry, and real-time framework metrics. The result: shorter request queues, faster ramp-up, and more than 3× higher token throughput during instantaneous bursts, even though both systems operate with the same per-pod capacity thresholds. The document provides the full architecture, scaling logic, benchmarking methodology, and a side-by-side comparison of SmartScaler vs. KPA on Run:ai-managed clusters.
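To make the reactive-versus-predictive distinction concrete, the sketch below contrasts a KPA-style replica count, sized from the concurrency observed right now, with a predictive count sized from a short-horizon forecast. It is a minimal, hypothetical illustration: the function names, the per-pod concurrency threshold of 8, and the naive linear extrapolation are assumptions for clarity, not SmartScaler's actual RL policy or Run:ai's KPA implementation.

```python
# Hypothetical sketch: reactive (KPA-style) vs. predictive replica sizing.
# Not Avesha's RL model; a simple linear forecast stands in for it here.
from collections import deque

TARGET_CONCURRENCY_PER_POD = 8  # assumed per-pod capacity threshold (same for both)


def reactive_replicas(current_concurrency: int) -> int:
    """KPA-style: size the deployment from the load observed right now."""
    return max(1, -(-current_concurrency // TARGET_CONCURRENCY_PER_POD))  # ceiling division


def predictive_replicas(recent_concurrency: deque, horizon_steps: int = 3) -> int:
    """Predictive sketch: extrapolate the recent trend a few steps ahead and
    size the deployment for the forecast peak rather than the current sample."""
    samples = list(recent_concurrency)
    if len(samples) < 2:
        return reactive_replicas(samples[-1] if samples else 0)
    trend = (samples[-1] - samples[0]) / (len(samples) - 1)  # average change per step
    forecast = samples[-1] + trend * horizon_steps           # naive linear forecast
    peak = int(max(forecast, samples[-1]))
    return max(1, -(-peak // TARGET_CONCURRENCY_PER_POD))


# Example: a burst ramping from 4 to 20 to 48 concurrent requests.
window = deque([4, 20, 48], maxlen=3)
print("reactive  :", reactive_replicas(window[-1]))  # sizes for the current 48 requests
print("predictive:", predictive_replicas(window))    # sizes for the projected peak
```

Because the predictive path requests capacity before the burst peaks, new pods can finish pulling images and loading model weights while demand is still climbing, which is the behavior the report measures as faster ramp-up and higher burst throughput.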
Author(s):
Avesha Team