Smart Karpenter: AI-Driven Cost Optimization for Kubernetes (K8s) Deployments


Dheeraj Ravula

VP of Customer Success


Introduction

Kubernetes (K8s) has emerged as the leading orchestration solution for microservice deployments. However, tuning configurations to achieve cost-efficient scaling that still meets your SLOs remains very challenging.

The configuration tuning needs to be adjusted at various levels:

  • Microservice [pod]
  • Microservice service graph
  • The infrastructure layer [nodes/node pools]
     

Kubernetes offers three primary autoscaling methods:

  • Cluster Autoscaling (node scaling): Alters the number of nodes in a cluster, guided by node utilization metrics and the presence of pending pods.
  • Vertical Pod Autoscaling (VPA): Adjusts the CPU and memory resources of existing pods based on usage changes (see the sketch after this list).
  • Horizontal Pod Autoscaling (HPA): Modifies the number of pods in response to usage changes.
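
For reference, a minimal VPA manifest looks like the sketch below. It assumes the VPA components are installed in the cluster; the target deployment name (currencyservice, the same microservice used in the examples later in this post) is illustrative.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: currencyservice-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: currencyservice   # illustrative target
  updatePolicy:
    updateMode: "Auto"      # VPA evicts pods and recreates them with updated requests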
     

These methods fall short of the complex demands of modern microservices applications and do not account for application behavior or service-graph dependencies. All of them are reactive and therefore difficult to size correctly. Today, tuning each microservice means manual configuration for each environment that has to be constantly re-adjusted for each release, resulting in cloud cost overruns and reduced feature velocity due to SRE/DevOps burnout.

In this blog you will learn:

How to transform an existing Kubernetes autoscaling configuration into “Smart Karpenter”, an intelligent, AI-powered node and pod autoscaling solution.

Solution Overview

Smart Karpenter combines the power of Karpenter, which optimizes dynamic node selection and bin packing, with Smart Scaler’s AI-driven, predictive, application-aware autoscaling to significantly reduce costs and deliver improved SLO compliance.


Key differences by feature

| Feature | Smart Karpenter | “Normal” HPA + Cluster Autoscaler |
| --- | --- | --- |
| Core Functionality | Combines Karpenter’s dynamic node provisioning with Avesha Smart Scaler’s AI-driven proactive autoscaling. | Focuses solely on reactive CPU/memory-triggered application scaling and static node-pool-based node scaling. |
| Scaling Approach | Proactive and predictive, leveraging AI models to anticipate and optimize scaling. | Reactive, triggered by unscheduled pods or pending workloads. |
| Application Awareness | Understands application behavior and service interdependencies for smarter scaling. | Limited to cluster-level resource needs without application-specific insights. |
| Traffic Prediction | Forecasts traffic patterns to preemptively scale resources before spikes occur. | Scales reactively after workload demands increase. |
| Cost Optimization | Reduces over-provisioning by fine-tuning pod and node configurations based on AI insights. | Focuses on bin packing for efficiency but may over- or under-provision resources. |


 

Implementation steps and technical details

Let’s walk through taking an application configured with standard Kubernetes autoscaling and converting it to a Smart Karpenter configuration.

Existing Scaling infrastructure 

Below is a diagram of a typical scaling configuration that leverages HPA and the Cluster Autoscaler with node pools.

[Diagram: HPA and Cluster Autoscaler scaling configuration with node pools]
 

The configurations are tweaked for each environment [Dev, Staging, Perf, Production, ...] of an application. This configuration autoscales in response to traffic to the microservices by reacting to the CPU and memory utilization of the existing pods. Since this is reactive, buffering mechanisms need to be configured at various levels to handle spikes in traffic effectively.
 

Some common techniques that teams use to handle spikes in traffic:

  1. Set CPU/memory requests and limits higher than what is actually needed. The drawback of this approach is overprovisioning of CPU and memory, resulting in waste.
  2. Set HPA scaling thresholds low for CPU and memory usage. The problem with this approach is that it caps utilization at a low value, again resulting in waste.
  3. Set the minimum number of pods to a value that can handle the maximum spike in traffic. This results in egregious overprovisioning of resources most of the time.
  4. Run evictable dummy pods on the nodes (sketched after this list). This is intended to reduce the impact of node bring-up time, but results in overprovisioning.
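
As an illustration of technique 4, a common pattern (often called cluster overprovisioning) is a low-priority deployment of placeholder “pause” pods that the scheduler evicts as soon as real workloads need the room. This is a minimal sketch; the replica count and resource requests are assumptions you would size to your expected spike.

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1                      # lower than the default (0), so these pods are evicted first
globalDefault: false
description: "Placeholder pods that reserve headroom for traffic spikes."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 2                  # assumption: sized to the headroom you want to reserve
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: 500m        # assumption: headroom reserved per placeholder pod
              memory: 512Mi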


Let’s look at a sample HPA configuration and the deployment descriptor for a microservice. The CPU/memory request/limit settings and scaling thresholds in this file are often tuned manually for each microservice across dev/staging/performance/production clusters.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: currencyservice
spec:
  selector:
    matchLabels:
      app: currencyservice
  template:
    metadata:
      labels:
        app: currencyservice
    spec:
      serviceAccountName: default
      terminationGracePeriodSeconds: 5
      containers:
        - name: server-currencyservice
          image: gcr.io/google-samples/microservices-demo/currencyservice:v0.4.1
          ports:
            - name: grpc
              containerPort: 7000
          env:
            - name: PORT
              value: "7000"
            - name: DISABLE_TRACING
              value: "1"
            - name: DISABLE_PROFILER
              value: "1"
            - name: DISABLE_DEBUGGER
              value: "1"
          readinessProbe:
            initialDelaySeconds: 30
            periodSeconds: 30
            exec:
              command: ["/bin/grpc_health_probe", "-addr=:7000"]
          livenessProbe:
            initialDelaySeconds: 30
            periodSeconds: 30
            exec:
              command: ["/bin/grpc_health_probe", "-addr=:7000"]
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
            limits:
              cpu: 100m
              memory: 128Mi
      tolerations:
        - key: "non-app-pods-no-schedule"
          value: "true"
          effect: "NoSchedule"
---
# HorizontalPodAutoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: currencyservice
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: currencyservice
  minReplicas: 1
  maxReplicas: 80
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 30

 

For node scaling, node groups and the Cluster Autoscaler can be configured; AWS documents the best practices here:

https://docs.aws.amazon.com/eks/latest/best-practices/cas.html
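
As a rough illustration (not taken from the guide above), an EKS managed node group that the Cluster Autoscaler can scale might be declared with eksctl along these lines. The cluster name is reused from the example later in this post; the region, instance type, and sizes are assumptions.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: workshop-boutique-cluster   # assumed; matches the example cluster used later
  region: us-west-2                 # assumption
managedNodeGroups:
  - name: app-nodes
    instanceType: m5.large          # assumption
    minSize: 2
    maxSize: 10
    desiredCapacity: 2
    iam:
      withAddonPolicies:
        autoScaler: true            # grants the IAM permissions Cluster Autoscaler needs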

Transforming the above configuration to Smart Karpenter is a simple process that starts with deploying the Smart Scaler agent into the application cluster. Once the agent is deployed, it automatically starts collecting Kubernetes and application metrics for the application and pushing them to the Smart Scaler platform.

Deploy the Agent

helm repo add smart-scaler https://smartscaler.nexus.aveshalabs.io/repository/smartscaler-helm-ent-prod/
helm install smartscaler smart-scaler/smartscaler-agent -f ss-agent-values.yaml -n smart-scaler --create-namespace

The ss-agent-values.yaml file can be downloaded from the https://ui.saas1.smart-scaler.io/agents page of the Smart Scaler platform after you sign up for an account.

This will work for most users; however, if you have any questions during deployment, don't hesitate to reach out to us at support@aveshasystems.com.
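
Before moving on, it is worth confirming that the agent pods came up; a standard check (the namespace comes from the install command above):

kubectl get pods -n smart-scaler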


At this point Smart Scaler is operating in Observation mode. The application is still using its existing autoscaling configurations. Using the metrics received from the agent, the AI on the Smart Scaler platform determines the scaling behavior for each microservice. In addition, service-map dependencies are mapped out for the cluster.

Traffic prediction and scaling decisions: Using the stream of metrics from the agent, Smart Scaler’s AI models forecast application traffic in real time and make precise scaling decisions for each microservice, while keeping track of traffic and application behavior to keep errors to a minimum. These scaling decisions are sent back to the agent deployed in the cluster.

Using the Smart Scaler management UI, customers can at this point observe how Smart Scaler would have scaled the microservices in Observation mode.
   
 

Turn on Optimize mode

Now that Smart Scaler has built predictive patterns through Observation mode, we can update the configuration so that those predictions are applied to actually scale the microservices. The Smart Scaler platform provides the mechanism to convert the HPA to take scaling instructions from the Smart Scaler agent. Nothing in the configuration below ever needs to be adjusted by the user.

# SmartScaler HPA example
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: currencyservice-hpa
  namespace: demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: currencyservice
  behavior:
    scaleDown:
      policies:
        - type: Pods
          value: 4
          periodSeconds: 60
        - type: Percent
          value: 10
          periodSeconds: 60
      stabilizationWindowSeconds: 10
  minReplicas: 2
  maxReplicas: 30
  metrics:
    - type: External
      external:
        metric:
          name: smartscaler_hpa_num_pods
          selector:
            matchLabels:
              ss_deployment_name: "currencyservice"
              ss_namespace: "demo"
              ss_cluster_name: "workshop-boutique-cluster"
        target:
          type: AverageValue
          averageValue: "1"


Turning on Karpenter: At this point, application pod autoscaling is automatically optimized. Karpenter can now be installed to help with optimal provisioning of nodes, using the steps below.
 

  • To install Karpenter on the EKS cluster, follow the directions at https://www.eksworkshop.com/docs/autoscaling/compute/karpenter/configure
  • To configure a node pool with Karpenter on the EKS cluster, follow the directions at https://www.eksworkshop.com/docs/autoscaling/compute/karpenter/setup-provisioner
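
For orientation, a representative NodePool manifest (Karpenter v1 API) looks roughly like the sketch below; the requirement values, limits, and consolidation settings are assumptions to adapt from the workshop links above.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]   # assumption: allow spot for cost savings
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]         # assumption: general-purpose instance families
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                     # assumes a matching EC2NodeClass exists
  limits:
    cpu: 1000                             # cap on total CPU Karpenter may provision
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m                  # consolidate nodes for tighter bin packing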

Smart Karpenter Optimized Mode

At this point, the Smart Karpenter configuration setup is complete. Smart Karpenter will continue to monitor application behavior in the cluster and scale the application in that environment efficiently. It will also automatically learn and refine the AI autoscaling as new versions of the application are deployed and/or traffic patterns change. For each environment, Smart Scaler applies some or all of the following optimizations:

  • Spike scaling for environments with large load
  • Event pre-scaling for environments that have recurring but non-continuous spikes
  • Right-sizing for environments with low traffic, or with low overall CPU/memory utilization despite high traffic
  • Karpenter configuration for optimized node scaling


Benefits

  • Reduced cost of operating K8s clusters in each environment; in most cases Smart Karpenter provides >30% cost reduction, and in some cases >70%
  • Reduced errors and latency, helping services meet their SLOs
  • Just-in-Time and “best” Node Availability: Ensures the right sized nodes are available exactly when needed.
  • Optimal Resource Utilization: Maximizes pod CPU and memory utilization without compromising performance.
  • Accurate Scaling: Provides precise pod count predictions for efficient node scaling.

Conclusion

Combining Avesha’s Smart Scaler with AWS Karpenter provides an unparalleled autoscaling solution, ensuring the highest efficiency at the lowest cost. This synergy between Smart Scaler and Karpenter revolutionizes Kubernetes autoscaling, enabling organizations to optimally leverage cloud and microservices architectures.

Learn More  

Explore more about Avesha Systems and AWS Karpenter:

  • Avesha
  • AWS Karpenter
  • Marketplace Offerings
  • Smart Karpenter Workshop Studio: a complete step-by-step workshop that builds a sample application configured with HPA on a cluster using the Cluster Autoscaler, drives reference load traffic at it, then switches HPA to Smart Scaling and the Cluster Autoscaler to Karpenter, so you can observe the before-and-after application performance and resource utilization.

About Avesha 

Avesha Systems specializes in AI-driven cloud solutions, offering advanced autoscaling technologies for Kubernetes environments.
