Dheeraj Ravula
VP of Customer Success
Kubernetes (K8s) has emerged as the leading orchestration solution for microservice deployments. However, tuning its configuration so that scaling is both cost-efficient and meets your SLOs remains very challenging.
That tuning has to happen at several levels: pod resource requests and limits, workload scaling thresholds and replica bounds, and cluster node sizing.
Kubernetes offers three primary autoscaling methods:
- Horizontal Pod Autoscaler (HPA), which adds or removes pod replicas based on observed metrics such as CPU and memory utilization
- Vertical Pod Autoscaler (VPA), which adjusts the CPU and memory requests of individual pods
- Cluster Autoscaler, which adds or removes nodes when pods cannot be scheduled
These methods fall short of the complex demands of modern microservices applications: they do not account for application behavior or service-graph dependencies, and because they are all reactive, they are difficult to size correctly. Today, tuning each microservice means manual configuration for every environment, constantly re-adjusted for each release, resulting in cloud cost overruns and reduced feature velocity due to SRE/DevOps burnout.
This post shows how to transform an existing Kubernetes autoscaling configuration into “Smart Karpenter”, an intelligent, AI-powered node and pod autoscaling solution.
Smart Karpenter combines the power of Karpenter to optimize dynamic node selection and bin packing with Smart Scaler’s AI-driven, predictive, application-aware autoscaling to significantly reduce costs and deliver improved SLO compliance.
Key differences by feature
| Feature | Smart Karpenter | “Normal” HPA + Cluster Autoscaler |
|---|---|---|
| Core functionality | Combines Karpenter’s dynamic node provisioning with Avesha Smart Scaler’s AI-driven proactive autoscaling. | Reactive, CPU/memory-triggered application scaling plus static node-pool-based node scaling. |
| Scaling approach | Proactive and predictive, leveraging AI models to anticipate and optimize scaling. | Reactive, triggered by unschedulable pods or pending workloads. |
| Application awareness | Understands application behavior and service interdependencies for smarter scaling. | Limited to cluster-level resource needs without application-specific insights. |
| Traffic prediction | Forecasts traffic patterns to preemptively scale resources before spikes occur. | Scales reactively after workload demand increases. |
| Cost optimization | Reduces over-provisioning by fine-tuning pod and node configurations based on AI insights. | Focuses on bin packing for efficiency but may over- or under-provision resources. |
Let’s walk through taking an application configured with standard Kubernetes autoscaling and converting it to a Smart Karpenter configuration.
Below is a diagram of a typical scaling configuration that leverages HPA and the Cluster Autoscaler with node pools.
These configurations are tweaked per environment (Dev, Staging, Perf, Production, ...) for an application. The configuration autoscales in response to traffic by reacting to the CPU and memory utilization of the existing pods. Because this is reactive, buffering mechanisms must be configured at various levels to handle traffic spikes effectively.
Common techniques teams use to handle traffic spikes include over-provisioning minimum replicas, lowering utilization thresholds so scaling kicks in earlier, and keeping spare node capacity (headroom) reserved, as sketched below.
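One widely used headroom technique is a low-priority “pause” deployment: placeholder pods reserve node capacity, and the scheduler evicts them the moment real workload pods need the room, buying time for new nodes to come up. A minimal sketch follows; the priority-class name, replica count, and resource sizes are illustrative assumptions, not part of the application configured in this post.

```yaml
# Headroom reservation sketch; names and sizes are hypothetical.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10                 # Below the default priority (0), so these pods are evicted first
globalDefault: false
description: "Placeholder pods that hold spare capacity for traffic bursts"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: capacity-headroom
spec:
  replicas: 2              # Headroom = replicas x per-pod requests
  selector:
    matchLabels:
      app: capacity-headroom
  template:
    metadata:
      labels:
        app: capacity-headroom
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9   # Does nothing; only holds the reservation
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
```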
Let’s look at a sample HPA configuration and deployment descriptor for a microservice. The CPU/memory requests/limits and scaling thresholds in these files are typically tuned manually for each microservice across dev/staging/performance/production clusters.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: currencyservice
spec:
  selector:
    matchLabels:
      app: currencyservice
  template:
    metadata:
      labels:
        app: currencyservice
    spec:
      serviceAccountName: default
      terminationGracePeriodSeconds: 5
      containers:
        - name: server-currencyservice
          image: gcr.io/google-samples/microservices-demo/currencyservice:v0.4.1
          ports:
            - name: grpc
              containerPort: 7000
          env:
            - name: PORT
              value: "7000"
            - name: DISABLE_TRACING
              value: "1"
            - name: DISABLE_PROFILER
              value: "1"
            - name: DISABLE_DEBUGGER
              value: "1"
          readinessProbe:
            initialDelaySeconds: 30
            periodSeconds: 30
            exec:
              command: ["/bin/grpc_health_probe", "-addr=:7000"]
          livenessProbe:
            initialDelaySeconds: 30
            periodSeconds: 30
            exec:
              command: ["/bin/grpc_health_probe", "-addr=:7000"]
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
            limits:
              cpu: 100m
              memory: 128Mi
      tolerations:
        - key: "non-app-pods-no-schedule"
          value: "true"
          effect: "NoSchedule"
---
# HorizontalPodAutoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: currencyservice
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: currencyservice
  minReplicas: 1
  maxReplicas: 80
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 30
```
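For context, the HPA control loop computes the desired replica count as desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). With the 30% CPU target above, if 4 replicas are running at 90% average utilization, the HPA requests ceil(4 × 90 / 30) = 12 replicas. Note that this happens only after utilization has already spiked, which is exactly why the buffering techniques described earlier are needed.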
For node scaling, node groups and the Cluster Autoscaler can be configured following the EKS best-practices guide:
https://docs.aws.amazon.com/eks/latest/best-practices/cas.html
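As a rough sketch, a managed node group sized for the Cluster Autoscaler might look like the eksctl configuration below; the cluster name, region, instance type, and size bounds are hypothetical, and the autoscaler discovers the group via the k8s.io/cluster-autoscaler tags.

```yaml
# Hypothetical eksctl config; cluster name, region, and sizes are illustrative.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster
  region: us-west-2
managedNodeGroups:
  - name: app-nodes
    instanceType: m5.large
    minSize: 2
    maxSize: 20
    desiredCapacity: 3
    tags:
      # Lets the Cluster Autoscaler auto-discover this node group
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/demo-cluster: "owned"
```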
Transforming the above configuration to Smart Karpenter is a simple process that starts with deploying the Smart Scaler agent into the application cluster. Once deployed, the agent automatically starts collecting Kubernetes and application metrics and pushing them to the Smart Scaler platform.
```bash
helm repo add smart-scaler https://smartscaler.nexus.aveshalabs.io/repository/smartscaler-helm-ent-prod/
helm install smartscaler smart-scaler/smartscaler-agent -f ss-agent-values.yaml -n smart-scaler --create-namespace
```
The ss-agent-values.yaml file can be downloaded from the https://ui.saas1.smart-scaler.io/agents page of the Smart Scaler platform after you sign up for an account.
This will work for most users; however, if you have any questions during deployment, don't hesitate to reach out to us at support@aveshasystems.com
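Once installed, a quick sanity check with standard Helm and kubectl commands (nothing Smart Scaler-specific) confirms the agent pods are running:

```bash
# Verify the release and the agent pods in the namespace created above
helm status smartscaler -n smart-scaler
kubectl get pods -n smart-scaler
```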
At this point, Smart Scaler is operating in observation mode: the application still scales using its existing autoscaling configuration. Using the metrics received from the agent, the AI on the Smart Scaler platform determines the scaling behavior of each microservice and maps out the service-map dependencies for the cluster.
Traffic prediction and scaling decisions: using the stream of metrics from the agent, Smart Scaler’s AI models forecast application traffic in real time and make precise scaling decisions for each microservice, continuously tracking traffic and application behavior to keep errors to a minimum. These scaling decisions are sent back to the agent deployed in the cluster.
Using the Smart Scaler management UI, customers can observe at this point how Smart Scaler would have scaled the microservices while still in observation mode.
Turn on Optimize mode
Now that Smart Scaler has built predictive patterns during observation mode, we can update the configuration so those predictions actually drive scaling of the microservices. The Smart Scaler platform provides the mechanism to convert the HPA to take its scaling instructions from the Smart Scaler agent, exposed as the external metric smartscaler_hpa_num_pods shown below. Nothing in the configuration below ever needs manual adjustment by the user.
```yaml
# SmartScaler HPA example
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: currencyservice-hpa
  namespace: demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: currencyservice
  behavior:
    scaleDown:
      policies:
        - type: Pods
          value: 4
          periodSeconds: 60
        - type: Percent
          value: 10
          periodSeconds: 60
      stabilizationWindowSeconds: 10
  minReplicas: 2
  maxReplicas: 30
  metrics:
    - type: External
      external:
        metric:
          name: smartscaler_hpa_num_pods
          selector:
            matchLabels:
              ss_deployment_name: "currencyservice"
              ss_namespace: "demo"
              ss_cluster_name: "workshop-boutique-cluster"
        target:
          type: AverageValue
          averageValue: "1"
```
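With this in place, the HPA takes its desired replica count from the external metrics API rather than raw CPU utilization. To inspect the signal directly, the standard Kubernetes external metrics endpoint can be queried; the label selector below mirrors the matchLabels above:

```bash
# Standard external metrics API query; not Smart Scaler-specific tooling
kubectl get --raw \
  "/apis/external.metrics.k8s.io/v1beta1/namespaces/demo/smartscaler_hpa_num_pods?labelSelector=ss_deployment_name%3Dcurrencyservice"
kubectl describe hpa currencyservice-hpa -n demo
```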
Turning on Karpenter: with pod autoscaling now optimized automatically, Karpenter can be installed to provision nodes optimally using the steps below.
For installing Karpenter on the EKS cluster, follow the directions at
https://www.eksworkshop.com/docs/autoscaling/compute/karpenter/configure
For configuring a node pool with Karpenter, follow the directions at
https://www.eksworkshop.com/docs/autoscaling/compute/karpenter/setup-provisioner
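For orientation, a Karpenter node pool typically looks like the sketch below (Karpenter v1 API); the pool name, instance-category requirements, limits, and the EC2NodeClass it references are hypothetical, so use the workshop links above for values that match your cluster.

```yaml
# Hypothetical Karpenter NodePool; names, requirements, and limits are illustrative.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default          # Assumes an EC2NodeClass named "default" exists
  limits:
    cpu: 1000                  # Cap on total CPU the pool may provision
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
```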
At this point, the Smart Karpenter setup is complete. Smart Karpenter will continue to monitor application behavior in the cluster and scale the application in that environment efficiently. It will also automatically re-learn and refine the AI autoscaling as new versions of the application are deployed and as traffic patterns change. For each environment, Smart Scaler applies some or all of the optimizations described above: predictive pod scaling, right-sized replica bounds, and efficient node provisioning.
Combining Avesha’s Smart Scaler with AWS Karpenter provides an unparalleled autoscaling solution, ensuring the highest efficiency at the lowest costs. This synergy between Smart Scaler and Karpenter revolutionizes Kubernetes autoscaling, enabling organizations to optimally leverage cloud and microservices architectures.
Explore more about Avesha Systems and AWS Karpenter:
Avesha Systems specializes in AI-driven cloud solutions, offering advanced autoscaling technologies for Kubernetes environments.