We are thrilled to announce the official release of EGS version 1.16.0, now available as of June 5, 2025.
What is EGS?
EGS (Elastic GPU Service) is a platform developed by Avesha to optimize the management of GPU and CPU resources for AI and machine learning workloads.
Key Features of Avesha EGS:
- Dynamic Resource Allocation: Automatically adjusts GPU and CPU resources in response to real-time workload demands, ensuring optimal utilization and minimizing idle times.
- Predictive Scheduling: Utilizes historical data and real-time metrics to anticipate task completions, enabling proactive resource provisioning and minimizing pipeline delays.
- Multi-Cloud and Multi-Cluster Support: Seamlessly manages resources across various cloud providers and Kubernetes clusters, providing flexibility and scalability for diverse deployment environments.
- Advanced Observability: Offers comprehensive monitoring tools, including real-time dashboards and alerts, to provide insights into GPU performance and system health.
- Cost Optimization: Incorporates features like GPU time-slicing and spot instance utilization to reduce operational costs without compromising performance.
- Multi-Tenancy and Security: Supports namespace-based multi-tenancy with role-based access control, ensuring secure and efficient resource sharing among teams and projects.
Avesha EGS is designed to address the challenges of managing GPU-intensive workloads, offering a solution that enhances efficiency, scalability, and cost-effectiveness in AI operations.
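As background on the GPU time-slicing mentioned above: in Kubernetes, time-slicing is commonly enabled through the NVIDIA device plugin's configuration. The ConfigMap below is a sketch of that standard NVIDIA mechanism, shown for context only; EGS's own configuration surface may differ.

```yaml
# ConfigMap consumed by the NVIDIA k8s-device-plugin to enable time-slicing.
# With replicas: 4, each physical GPU is advertised as four schedulable
# nvidia.com/gpu resources, so four pods can share one card.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-config   # name/namespace are illustrative
  namespace: kube-system
data:
  config.yaml: |
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4
```

Time-slicing trades isolation for density: the shared pods interleave on the GPU rather than each receiving a dedicated slice of memory, which is why it pairs well with cost-optimization features.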
Key Highlights in EGS 1.16.0
Cluster Selection for Inference Endpoints
When deploying an Inference Endpoint, users now have the flexibility to:
- Send workloads to a single cluster for targeted execution.
- Distribute workloads across multiple clusters to optimize resource utilization.
This flexibility lets users tailor deployments to their specific requirements.
Inference Endpoint Bursting
We’ve introduced a new Bursting to Available Clusters option to maximize scalability:
- Enabled by Default: Workloads can be dynamically allocated to available clusters.
- Custom Control: Users can disable this feature to restrict workloads to their selected clusters.
This enhancement ensures a balance between resource scalability and user-defined workload boundaries.
Standard Model for Inference Endpoints
The EGS portal now features a Standard Model field, simplifying the deployment of Inference Endpoints:
- Users can select predefined models from a dropdown menu.
- To enable this, configure a ConfigMap to populate the model options.
This feature streamlines deployment and ensures consistency across Inference Endpoints.
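The release notes say the dropdown is populated from a ConfigMap but do not document its schema here. The sketch below is therefore a hypothetical illustration: the object name, namespace, data key, and model names are all assumptions, not the actual EGS contract. Consult the EGS documentation for the real format.

```yaml
# Hypothetical sketch only: name, namespace, key, and model entries
# are illustrative assumptions, not the documented EGS schema.
apiVersion: v1
kind: ConfigMap
metadata:
  name: standard-models     # assumed name
  namespace: egs-system     # assumed namespace
data:
  models: |
    - llama-3-8b
    - mistral-7b
```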

Custom GPU Pricing
Manage costs effectively with the new GPU Cost section in the GPU Inventory page:
- Users can customize GPU pricing directly through the portal.
- All future cost calculations will automatically reflect the updated pricing.
This update empowers users to maintain precise control over GPU-related expenses.
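To illustrate how a pricing override flows into subsequent cost calculations, here is a minimal sketch. The function names and rates are illustrative, not part of the EGS API; in the product, the override happens through the GPU Cost section of the portal rather than in code.

```python
# Minimal sketch of cost tracking with a user-editable GPU price.
# Names and rates are illustrative, not part of the EGS API.

gpu_hourly_rates = {"nvidia-a100": 3.00, "nvidia-t4": 0.35}

def set_gpu_rate(gpu_type: str, hourly_rate: float) -> None:
    """Override the default price, as the GPU Cost section allows."""
    gpu_hourly_rates[gpu_type] = hourly_rate

def workload_cost(gpu_type: str, gpu_hours: float) -> float:
    """Future cost calculations automatically pick up the current rate."""
    return gpu_hourly_rates[gpu_type] * gpu_hours

set_gpu_rate("nvidia-a100", 2.50)        # custom pricing, e.g. via the portal
print(workload_cost("nvidia-a100", 10))  # 25.0
```

Because the lookup happens at calculation time rather than at configuration time, every cost computed after the override reflects the new rate, matching the behavior described above.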
AMD GPU Support: Beta Release
EGS now supports AMD GPUs in beta, in addition to existing NVIDIA GPU support:
- Greater Flexibility: Users can choose between NVIDIA and AMD GPUs based on workload requirements and compatibility.
- Enhanced Usability: The GPU Inventory page now displays an AMD icon for clear identification.
Note: Certain features, such as pricing and dashboard monitoring for AMD GPUs, are currently unavailable in this release.
What’s Next?
These updates highlight our ongoing commitment to providing enhanced insights and greater control for our users. Refer to the latest documentation for detailed guidance on utilizing these enhancements.
Have Questions or Suggestions?
We value your feedback and are always here to assist you. Whether you have questions or ideas for improvement, don’t hesitate to reach out!