
Trouble Scaling Kubernetes? Try the Smart Way

Jason Bloomberg’s article emphasizes the importance of Kubernetes cluster autoscaling, highlighting advanced strategies for efficient resource management. It explores how AI can drive cluster, node, and pod autoscaling in Kubernetes environments.
Kubernetes Scaling
Jason Bloomberg

Managing Partner, Intellyx

15 March, 2023

5 min read


Take any list of the benefits of cloud computing, and at the top you’ll likely find massive horizontal scalability.

In many ways, horizontal scalability was the original justification for building public clouds in the first place. Set up virtual machines to scale out automatically, and then aggregate massive numbers of such VM instances to take advantage of economies of scale.

The result: public clouds could deliver massive horizontal scale far less expensively than organizations could on their own.

Such automatic horizontal scaling, or autoscaling, drives the primary IaaS value proposition. Scaling out horizontally in the cloud is a simple matter of setting each instance’s autoscaling parameters properly. The result is essentially infinite horizontal scale, limited only by the budget.
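
For concreteness, here is a minimal sketch of what those parameters can look like in practice, assuming AWS EC2 Auto Scaling and the boto3 SDK (the group and policy names are hypothetical): a target-tracking policy that adds or removes instances to hold average CPU near a target.

```python
# A minimal sketch of IaaS-level autoscaling, assuming AWS EC2 Auto Scaling
# and the boto3 SDK; the group and policy names here are hypothetical.
import boto3

autoscaling = boto3.client("autoscaling")

# Target tracking keeps the group's average CPU near 60%, launching or
# terminating instances as load changes.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-tier-asg",   # hypothetical group name
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 60.0,
    },
)
```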

While cloud computing (public clouds in particular) offers many advantages over on-premises alternatives, cloud-based scalability can also be quite expensive. Setting up instances to autoscale at the drop of a hat can run up the cloud bill dramatically.

The rise of Kubernetes and cloud native computing in general has changed the nature of autoscaling, in large part to optimize how dynamic software infrastructure leverages cloud resources.

Cloud native computing with Kubernetes generally handles horizontal scalability quite differently than IaaS: the former at the pod level within Kubernetes, and the latter at the instance level within the cloud’s own environment configurations.

Scalability operations in the cloud are slow, on the order of minutes. Kubernetes autoscaling, in contrast, can take place in seconds at the pod level.

This horizontal pod autoscaling (HPA) is built into Kubernetes, automatically scaling pods up and down to deal with sudden increases and decreases in traffic.
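
As a rough illustration, a minimal HPA can be created with the official Kubernetes Python client; the deployment name and thresholds below are assumptions for the example.

```python
# A minimal sketch of built-in HPA, using the official Kubernetes Python
# client (autoscaling/v1); the deployment name "web" is hypothetical.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a cluster

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="web-hpa"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="web"
        ),
        min_replicas=2,
        max_replicas=20,
        target_cpu_utilization_percentage=70,  # scale when avg CPU > 70%
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```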

And yet, while HPA provides greater agility and lower resource costs than cloud autoscaling can, it still has its drawbacks. Because HPA is reactive, even the few seconds it takes to respond to a spike in resource demands can lead to slowdowns or even momentary failures.

Don’t let poor autoscaling strategies give your users the dreaded 503 Service Unavailable error. Instead, scale Kubernetes the smart way by taking a proactive approach to sudden changes in demand.

The Problem with Reactive Autoscaling

In Kubernetes, horizontal pod autoscalers automatically adjust workload resources so that the workload scales to match demand.

Kubernetes performs this scaling by deploying more pods. Correspondingly, if the load decreases, Kubernetes deprovisions the now-excess pods down to the minimum replica count configured for the workload in question.

Horizontal pod autoscalers, in turn, periodically adjust the scale of their target deployments to match whatever metrics are relevant: average CPU utilization, average memory utilization, or any custom metric the operator has configured the autoscaler to consider.
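
Conceptually, the built-in controller scales replicas in proportion to how far the observed metric sits from its target. The Kubernetes documentation gives the rule as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric); a simplified rendering:

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Simplified form of the HPA scaling rule from the Kubernetes docs:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# e.g., 4 pods averaging 90% CPU against a 60% target -> 6 pods
print(desired_replicas(4, 90.0, 60.0))  # 6
```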

While these autoscalers can make their adjustments quickly, such adjustments are nevertheless reactive: they take action after the fact, in response to a metric crossing a fixed threshold.

A sudden spike in load can lead to a momentary resource constraint before the autoscaling can adjust. Such constraints can lead to slowdowns, out-of-memory errors, and in some situations, that dreaded 503 error.

To avoid such problems, the traditional technique is for the operator to overprovision the Kubernetes clusters in question, ensuring sufficient resources are available to handle any spikes.

While overprovisioning can decrease the chances of resource constraints, it is expensive to implement. One of the main economic motivations for moving from on-premises servers to the cloud and then from traditional IaaS to Kubernetes is to avoid such overprovisioning.

The last thing an operator wants to do is overprovision.

From Reactive to Proactive, ‘Smart’ Autoscaling

The best way to both optimize the costs and avoid the constraints of horizontal pod autoscaling is to predict traffic demand ahead of time and scale both application and infrastructure resources up or down precisely, based upon those predictions.
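
As a sketch of the idea (not any particular product’s algorithm), a proactive scaler forecasts the next interval’s traffic from recent history and sizes the workload before the load arrives; the per-pod capacity below is an assumed figure.

```python
# An illustrative sketch of proactive scaling (not Avesha's actual
# algorithm): forecast the next interval's traffic from recent history,
# then size the deployment ahead of the predicted load.
import math

REQUESTS_PER_POD = 100.0  # assumed per-pod capacity (requests/sec)

def forecast_next(recent_rps: list[float]) -> float:
    """Naive forecast: extrapolate the recent trend in requests/sec."""
    trend = recent_rps[-1] - recent_rps[0]
    return recent_rps[-1] + trend / max(len(recent_rps) - 1, 1)

def pods_needed(predicted_rps: float, min_pods: int = 2) -> int:
    return max(min_pods, math.ceil(predicted_rps / REQUESTS_PER_POD))

history = [400.0, 520.0, 680.0]             # rising traffic
print(pods_needed(forecast_next(history)))  # pre-scale before the spike
```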

Reinforcement learning (RL) is an artificial intelligence technique that is well-suited for making such predictions.

The RL training method differs from both supervised and unsupervised learning: it depends upon rewarding desired behaviors and penalizing undesired ones. RL agents interpret data, take actions, and learn through trial and error in a simulator that leverages historical training data.
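
To make the trial-and-error loop concrete, here is a toy, bandit-style simplification of RL, with an assumed reward that penalizes both under- and over-provisioning; it is illustrative only, not Smart Scaler’s actual engine.

```python
# A toy Q-learning-style loop (illustrative only): the agent picks a
# replica count for the observed load level and is rewarded for meeting
# demand without overprovisioning.
import random
from collections import defaultdict

ACTIONS = [2, 4, 8, 16]      # candidate replica counts
LOAD_LEVELS = [0, 1, 2, 3]   # discretized traffic states

def reward(load_level: int, replicas: int) -> float:
    needed = ACTIONS[load_level]
    if replicas < needed:
        return -10.0             # under-provisioned: errors and slowdowns
    return -(replicas - needed)  # penalize waste; 0 is a perfect fit

q = defaultdict(float)           # Q[(state, action)] value estimates
alpha, epsilon = 0.1, 0.2        # learning rate, exploration rate

for _ in range(5000):            # trial and error against a simulator
    state = random.choice(LOAD_LEVELS)       # simulated traffic level
    if random.random() < epsilon:            # explore a random action
        action = random.choice(ACTIONS)
    else:                                    # exploit the best known one
        action = max(ACTIONS, key=lambda a: q[(state, a)])
    r = reward(state, action)
    q[(state, action)] += alpha * (r - q[(state, action)])

policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in LOAD_LEVELS}
print(policy)  # learned replica count per load level
```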

Smart Scaler from Avesha ingests both application and infrastructure performance data from Prometheus, the open-source monitoring tool widely used with Kubernetes. Based upon these data, Smart Scaler uses RL to estimate the number of pods necessary for a given workload, as well as likely traffic patterns that might lead to spikes or other changes.
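
The Prometheus side of such a pipeline is straightforward to sketch: metrics come back from the standard /api/v1/query HTTP endpoint (the server URL and metric name below are assumptions, not Avesha’s configuration).

```python
# A minimal sketch of pulling workload metrics from Prometheus over its
# HTTP API; the server URL and metric name are assumptions.
import requests

PROM_URL = "http://prometheus.example:9090"  # hypothetical address

resp = requests.get(
    f"{PROM_URL}/api/v1/query",
    params={"query": 'sum(rate(http_requests_total{job="web"}[5m]))'},
    timeout=10,
)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    timestamp, value = result["value"]
    print(f"requests/sec at {timestamp}: {value}")
```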

These estimates then feed back into the Smart Scaler RL engine, which continuously optimizes the number of pods in a cluster in advance of any predicted changes in traffic to the workloads in each pod.

This continuous predictive autoscaling of Kubernetes resources improves upon the reactive HPA built into Kubernetes.

The Intellyx Take

Scaling Kubernetes requires automation, and automation in turn increasingly relies upon AI.

There are many different types of AI, and even within the machine learning arena, there are several learning approaches. Choosing the right approach is an important example of ‘the right tool for the job.’

Reinforcement learning is particularly useful in situations with many rapid cause-and-effect interactions – and in the case of Kubernetes horizontal pod autoscaling, such scenarios are precisely the problem at hand.

HPA by itself is an important improvement over cloud autoscaling, but without the power of RL, HPA will never meet the needs of organizations that require the full power of Kubernetes scalability. As a result, Avesha Smart Scaler is an essential tool in any Kubernetes toolbox.

 

Copyright © Intellyx LLC. Avesha is an Intellyx customer. No AI was used in the production of this article. Intellyx retains final editorial control of this article.