Region

United States

Industry

Biotechnology, genome testing

Optimizing genomic workloads & slashing idle spend with Smart Karpenter

GeneDX accelerates precision-medicine research with event-driven AI/ML pipelines that run on Kubernetes across Oracle Kubernetes Engine (OKE) and Azure Kubernetes Service (AKS). Two research teams share a multi-cloud platform that must scale quickly to meet tight turnaround-time (TAT) requirements for genomic analyses.

Background

Team	Cloud	Pre-Smart Karpenter Workflow
Team A	OKE	Static node group sized for peak.
Team B	AKS	Identical pattern; idle nodes during troughs, slow starts during spikes.

Karpenter was not available on Oracle Cloud Infrastructure (OCI), so the OKE clusters had no just-in-time node provisioning. Across both clouds, scaling was reactive and capacity was over-provisioned.

Challenges

Slow surge response: 5-minute pod queue times during traffic spikes.
30 % idle cloud spend: node pools padded to avoid cold starts.
Manual threshold tuning: DevOps tweaked HPA/VPA every release.
SLO risk: rising TAT threatened clinical commitments.

Solutions

Smart Karpenter fuses Avesha Smart Scaler with Karpenter to predict pod demand in advance and provision the exact nodes required.

Capability	Impact at GeneDX
Predictive pod scaling: RL models analyze latency, RPS, and service dependencies	Pods launch before a spike; queues disappear.
Dynamic node provisioning: predictions drive Karpenter for right-sized nodes	No idle padding; nodes spin up/down in < 60 s.
Observation → Optimize rollout	Two-week shadow run before full AI control.
Continuous learning	Scaling stays accurate as workloads evolve.
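As a rough illustration of how a demand prediction could turn into pod and node counts, the sketch below takes a predicted request rate as input and sizes replicas and nodes from it. It is not Smart Karpenter's implementation: the RL model is not modeled, and `PodSpec`, `replicas_for`, `nodes_for`, the headroom factor, and all numeric values are hypothetical names and assumptions introduced for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PodSpec:
    """Hypothetical per-pod sizing; values are illustrative, not from GeneDX."""
    cpu_millicores: int   # CPU request per pod
    rps_capacity: float   # requests/second one pod can serve


def replicas_for(predicted_rps: float, pod: PodSpec, headroom: float = 0.1) -> int:
    """Pods needed to serve the predicted load plus a safety headroom (assumed 10 %)."""
    if predicted_rps <= 0:
        return 0
    needed = predicted_rps * (1 + headroom) / pod.rps_capacity
    # Subtract a tiny epsilon so float noise (e.g. 22.000000000000004) does not add a pod.
    return math.ceil(needed - 1e-9)


def nodes_for(replicas: int, pod: PodSpec, node_cpu_millicores: int) -> int:
    """Nodes of one size needed to fit the replicas, packing by CPU request only."""
    pods_per_node = node_cpu_millicores // pod.cpu_millicores
    if pods_per_node == 0:
        raise ValueError("pod does not fit on this node size")
    return math.ceil(replicas / pods_per_node)


# Usage: a predicted spike of 900 RPS, 2-CPU pods serving 50 RPS each, 16-CPU nodes.
pod = PodSpec(cpu_millicores=2000, rps_capacity=50.0)
replicas = replicas_for(900, pod)
nodes = nodes_for(replicas, pod, node_cpu_millicores=16000)
print(replicas, nodes)
```

A real scheduler also packs on memory, GPUs, and topology constraints; the CPU-only packing here is a simplification to keep the idea visible.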

Implementation Steps

Helm install in “observe” mode; zero YAML changes.
One-sprint traffic replay to train baseline models.
5 % → 100 % cut-over after three clean deployments.
Cost guardrails: policy caps daily node-hours; Smart Karpenter throttles non-urgent jobs when budget nears limit.
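The cost guardrail in the last step can be pictured as a small admission check. The following is a minimal sketch under stated assumptions, not the product's policy engine: the class `NodeHourBudget`, the 90 % throttle threshold, and the rule that urgent jobs are always admitted are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class NodeHourBudget:
    """Hypothetical daily node-hour budget with throttling of non-urgent jobs."""
    daily_cap: float           # policy cap on node-hours per day
    throttle_at: float = 0.9   # assumed fraction of the cap where throttling starts
    used: float = 0.0          # node-hours consumed so far today

    def admit(self, node_hours: float, urgent: bool) -> bool:
        """Return True if the job may be scheduled now, and record its usage."""
        projected = self.used + node_hours
        if not urgent and projected > self.daily_cap * self.throttle_at:
            return False  # defer non-urgent work while the budget is near its limit
        # Assumption: urgent (SLO-bound) jobs are never blocked by the guardrail.
        self.used = projected
        return True


# Usage: with a 100 node-hour cap, non-urgent work is deferred past 90 node-hours.
budget = NodeHourBudget(daily_cap=100)
first = budget.admit(85, urgent=False)   # fits under the threshold
second = budget.admit(10, urgent=False)  # would reach 95, so it is deferred
third = budget.admit(10, urgent=True)    # urgent work is still admitted
print(first, second, third, budget.used)
```

Deferred jobs would be retried later (for example after the daily counter resets); that queueing logic is left out here.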

Results

KPI	Before	After Smart Karpenter	Delta
Average node CPU utilization	48 %	82 %	+34 pts (+71 % relative)
Idle node-hours / month	1 900	520	−73 % waste
P95 pod queue time	5 m 10 s	< 45 s	> 6.8 × faster
SLO violations (job TAT)	12 / month	0	100 % compliance
Cloud compute spend	Baseline	−33 %	Savings fund new research lines
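The Delta column can be reproduced from the Before and After values. The short check below is only an arithmetic aid; the helper `pct_change` is introduced for this example. Note that the utilization delta is a relative change (48 % to 82 % is 34 percentage points), and that the queue-time speed-up is a lower bound because the After value is itself an upper bound.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


cpu_delta = pct_change(48, 82)        # average node CPU utilization
idle_delta = pct_change(1900, 520)    # idle node-hours per month
queue_before_s = 5 * 60 + 10          # 5 m 10 s expressed in seconds
speedup = queue_before_s / 45         # against the < 45 s upper bound

print(f"{cpu_delta:+.1f} %")    # +70.8 %
print(f"{idle_delta:+.1f} %")   # -72.6 %
print(f"{speedup:.1f}x")        # 6.9x
```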

Customer Voice

“Smart Karpenter makes Karpenter proactive. Nodes appear before the load hits and disappear immediately after, cutting a third of our cloud bill while keeping turnaround times rock-solid.” - Director of Genomic ML Platforms, GeneDX

Conclusion

With Smart Karpenter, GeneDX:

Achieves predictive, hands-free autoscaling across OKE and AKS.
Cuts idle node-hours by 73 % while boosting average node CPU utilization above 80 %.
Meets stringent diagnostic SLOs without manual tuning.
Gains a reinforcement-learning foundation for future hybrid-cloud growth.