Announcing EGS 1.16.0 Release

Uma Tammanagoudar

Staff Technical Writer

We are thrilled to announce the official release of EGS version 1.16.0, available as of Aug 22, 2025.

What is EGS? 

Elastic Grid Service (EGS) is a multi-cluster, multi-site GPU orchestration and workload distribution platform for Kubernetes, enabling cross-cluster GPU scheduling with granular allocation and full lifecycle management.

It optimizes GPU utilization by providing centralized observability, quotas, and governance across environments, while supporting secure cross-cluster and cross-site service connectivity through overlay networking and VPN.

EGS also enables intelligent workload placement and distribution to handle overflow capacity, failover, disaster recovery, load balancing, and capacity chasing across clusters.

Key Features of Avesha EGS

  • GPU Resource Management
    EGS provides a centralized platform to manage GPU resources across multiple Kubernetes clusters. It enables unified dynamic allocation, quota enforcement, and monitoring of GPU usage. 
  • User-Friendly Interface
    EGS offers an intuitive user interface that simplifies GPU resource management, making it easier for users to allocate, provision, monitor, and optimize their GPU workloads.
  • Workload Placement
    EGS intelligently places workloads across clusters based on GPR wait time, GPU availability, and priority. This minimizes scheduling delays and optimizes GPU utilization.
  • Workload Redistribution
    EGS dynamically redistributes workloads across clusters based on real-time resource availability and workload demand to ensure optimal GPU utilization and performance.
  • GPU Provisioning and Scheduling
    Users can provision GPU resources for workloads with fair scheduling and quota enforcement across Workspaces. Node health is pre-checked before allocation to reduce scheduling failures.
  • Multi-Tenancy and Workspaces
    Workspaces enable logical separation of teams, projects, or applications. Access and quotas are enforced through workspace policies. Role-based access control (RBAC) and service accounts are supported for secure access to clusters.
  • GPU Visibility and Monitoring
    A real-time dashboard displays GPU utilization, health, and status. EGS integrates with Prometheus for multi-cluster metrics collection and provides node-level and workspace-level usage views for both administrators and users.
  • Seamless Integration
    EGS integrates seamlessly with existing Kubernetes environments and supports various GPU types, making it a versatile solution for diverse AI workloads.
  • Cost Management and Optimization
    EGS offers detailed cost analysis and optimization features. Users can monitor GPU usage and associated costs for Workspaces and workloads, which helps reduce overall expenditure on GPU resources.
  • Security and Access Control
    Security is enforced through RBAC for users, groups, and service accounts. Secure kubeconfig downloads are supported, along with audit logging for GPU usage and access activities.

Key Highlights in EGS 1.16.0

The latest EGS release introduces powerful enhancements designed to optimize workload deployment, improve resource utilization, and strengthen networking and observability across slice workspaces. This release brings intelligent workload placement and redistribution, an enhanced admin dashboard experience, major networking upgrades, and improved visibility into workspace network health.

Intelligent Workload Placement

EGS now provides enhanced Workload Placement capabilities, giving administrators precise control over how workloads are scheduled and deployed across worker clusters within a slice workspace.

With Workload Placement, you can define placement rules and constraints based on cluster characteristics, GPU availability, and deployment requirements. This ensures workloads are initially deployed to the most suitable clusters, improving performance, reliability, and cost efficiency.

When combined with workload redistribution, Workload Placement enables both intent-based deployment and continuous optimization, ensuring workloads start in the right place and adapt dynamically as resource conditions evolve.
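To make the idea of placement rules and constraints concrete, here is a minimal Python sketch. It is not the EGS scheduler or its API: the `Cluster` fields and the `pick_cluster` helper are hypothetical names chosen for illustration. The sketch assumes a hard constraint (enough free GPUs) followed by a preference (shortest estimated wait).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    # All fields are hypothetical; EGS may track different attributes.
    name: str
    free_gpus: int         # GPUs currently unallocated
    est_wait_seconds: int  # estimated queue wait for a new request


def pick_cluster(clusters: list[Cluster], gpus_needed: int) -> Optional[Cluster]:
    """Return the cluster that can fit the request soonest, or None."""
    # Constraint: only clusters with enough free GPUs are candidates.
    candidates = [c for c in clusters if c.free_gpus >= gpus_needed]
    if not candidates:
        return None
    # Preference: shortest wait first, then most free GPUs as a tie-breaker.
    return min(candidates, key=lambda c: (c.est_wait_seconds, -c.free_gpus))
```

A request for 4 GPUs against clusters with (2 free, 10 s wait), (8 free, 30 s wait), and (4 free, 20 s wait) would skip the first cluster for lack of capacity and choose the third for its shorter wait.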

Key benefits:

  • Fine-grained control over workload scheduling
  • Optimal utilization of GPU and compute resources
  • Improved application performance and stability
  • Seamless support for multi-cluster slice environments

Learn more: Manage Workload Placement

Dynamic Workload Redistribution Across Worker Clusters

EGS now supports workload redistribution across worker clusters within a slice workspace, enabling smarter load balancing and better resource efficiency.

Workloads can be dynamically moved between clusters based on real-time resource availability and demand. This helps eliminate hotspots, reduce underutilization, and maintain consistent performance during changing workload conditions.
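One way to picture hotspot removal is a greedy rebalancing pass, sketched below in Python. This illustrates the general technique only and makes no claim about how EGS decides what to move: `plan_moves`, the per-cluster GPU counts, and the 80% utilization threshold are all assumptions made for the example.

```python
def plan_moves(
    load: dict[str, int],
    capacity: dict[str, int],
    threshold_pct: int = 80,  # hypothetical target utilization ceiling
) -> list[tuple[str, str, int]]:
    """Return (source, target, gpus) moves that drain clusters above the threshold."""
    load = dict(load)  # work on a copy so the caller's data is untouched
    limit = {c: capacity[c] * threshold_pct // 100 for c in capacity}
    moves = []
    # Visit the hottest clusters first.
    for src in sorted(load, key=lambda c: load[c] / capacity[c], reverse=True):
        excess = load[src] - limit[src]
        # Offer the excess to the coolest clusters first.
        for dst in sorted(load, key=lambda c: load[c] / capacity[c]):
            if excess <= 0:
                break
            room = limit[dst] - load[dst]
            if dst == src or room <= 0:
                continue
            n = min(excess, room)
            load[src] -= n
            load[dst] += n
            excess -= n
            moves.append((src, dst, n))
    return moves
```

With three 10-GPU clusters carrying 10, 2, and 5 GPUs of load, the plan moves 2 GPUs of work from the saturated cluster to the least-loaded one and leaves the third alone.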

Key benefits:

  • Improved load balancing across clusters
  • Reduced resource contention and bottlenecks
  • Better scalability under variable demand
  • Increased overall infrastructure efficiency

Learn more: Workload Redistribution Across Worker Clusters

Enhanced Admin Dashboard with the Continuum Map

The EGS Admin Portal dashboard now includes the Continuum Map, a visual representation of GPU resource distribution across different tiers of the edge-to-cloud continuum.

This map enables administrators to quickly assess resource allocation, understand workload distribution, and identify potential capacity or performance bottlenecks across infrastructure layers.

Key benefits:

  • Intuitive visualization of GPU resources across tiers
  • Faster insights into infrastructure utilization
  • Early detection of imbalances and bottlenecks

Learn more: Add the Continuum Map

Workspace (Slice) NS Gateway Enhancements

The Workspace (Slice) NS Gateway has been significantly enhanced to improve flexibility, performance, and observability for distributed workloads.

What’s new:

  • CRD improvements for more effective resource management
  • Support for backend service export–based remote services
  • Timeout configuration for finer-grained traffic control
  • Enhanced logging and monitoring capabilities
  • Automatic weight calculation based on running workloads
  • New specifications to enable efficient East–West traffic routing

Learn more: Workspace (Slice) NS Gateway

Conclusion

This EGS release delivers meaningful improvements across deployment, optimization, visibility, and networking:

  • Smarter workload deployment through enhanced placement
  • Dynamic optimization via workload redistribution
  • Improved observability with the Continuum Map and network health views
  • Stronger networking capabilities with advanced Slice NS Gateway features

Together, these enhancements make EGS more resilient, scalable, and efficient for modern edge-to-cloud workloads.