Cynthia Hsieh
VP of Marketing and GTM | Startup Advisor | Investor
As AI workloads grow larger and more distributed, the need for scaling and bursting AI inference securely across clouds and clusters becomes critical. Yet scaling without strong security can open the door to vulnerabilities—especially now that inference happens across more dynamic, multi-cluster environments.
In early 2025, the OWASP Foundation released the Top 10 Security Risks for Large Language Model (LLM) Applications, highlighting new, urgent threats during AI inference and deployment. These include prompt injections, model denial-of-service attacks, training data poisoning, and insecure output handling—threats that can compromise data privacy, model integrity, and enterprise compliance.
At Avesha, we believe that scaling AI must come with security baked in from the ground up. Here's how KubeSlice helps enterprises burst and scale AI inference safely and efficiently.
When inference workloads burst across multiple clouds or clusters, security gaps appear in critical places during the inference phase itself — not just at input or data ingestion.
Here’s what happens if security is missing during scaling:
| Inference Step | Risk | Description | Example Attack |
|---|---|---|---|
| Model Execution (multi-cluster) | Prompt Injection at Scale | Malicious prompts injected into one node propagate errors across all clusters. | Input triggers all models to leak sensitive info |
| Model Execution (multi-cloud) | Sensitive Information Disclosure | If clusters share models insecurely, inference output leaks private data. | Internal training data exposed in one region |
| Model Execution (supply chain) | Supply Chain Risk | Fine-tuned models and LoRA adapters shared across clusters may be poisoned. | Backdoored adapter loaded in a bursting node |
| Model Execution (workload agents) | Excessive Agency | Poor access control lets inference "agents" call unauthorized APIs when bursting. | Inference job triggers cost-heavy APIs without approval |
| Result Generation | System Prompt Leakage | If bursts clone models poorly, system prompts and internal configs leak in responses. | System rules exposed via output |
| Result Generation | Misinformation | Inference at scale without proper RAG validation leads to inconsistent, dangerous outputs. | Answers flip facts based on regional data inconsistencies |
| Resource Scaling | Unbounded Consumption | Attackers flood bursting nodes to drain compute resources ("Denial of Wallet"). | GPU/CPU exhaustion across cloud clusters |
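To make the last row concrete, a standard Kubernetes ResourceQuota can cap how much compute a bursting namespace is allowed to claim. This is a minimal sketch, not a KubeSlice-specific feature; the namespace name and limits are illustrative assumptions.

```yaml
# Illustrative only: caps the aggregate compute a bursting namespace can
# claim, blunting "Denial of Wallet" floods. Namespace and limits assumed.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: inference-burst-quota
  namespace: ai-inference        # hypothetical namespace for inference workloads
spec:
  hard:
    requests.cpu: "32"           # total CPU all pods in the namespace may request
    requests.memory: 128Gi
    requests.nvidia.com/gpu: "8" # extended-resource quota for GPUs
    pods: "50"                   # hard ceiling on pod count during bursts
```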
🔒 Inference must scale. Risks must not.
KubeSlice, part of Avesha’s AI infrastructure automation platform, delivers a defense-in-depth architecture to protect AI inference pipelines across clouds, clusters, and edge locations.
KubeSlice logically segments Kubernetes workloads into isolated slices, applying Zero Trust principles.
Impact: If an inference node is compromised, the attacker cannot move laterally to other AI models, pipelines, or datasets.
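As a sketch of what slice-level isolation looks like in practice, the following SliceConfig is modeled on KubeSlice's published examples. The slice name, cluster names, and namespaces are placeholders, and field names should be verified against the current KubeSlice documentation.

```yaml
# Sketch of a KubeSlice SliceConfig, modeled on the project's published
# examples. All names are placeholders; verify fields against current docs.
apiVersion: controller.kubeslice.io/v1alpha1
kind: SliceConfig
metadata:
  name: ai-inference-slice
  namespace: kubeslice-demo      # project namespace registered with the controller
spec:
  sliceSubnet: 10.1.0.0/16       # overlay subnet shared by all clusters on the slice
  sliceType: Application
  sliceIpamType: Local
  clusters:
    - worker-cluster-east        # clusters allowed to host this slice's workloads
    - worker-cluster-west
  namespaceIsolationProfile:
    isolationEnabled: true       # deny traffic from namespaces outside the slice
    applicationNamespaces:
      - namespace: ai-inference  # only these namespaces are onboarded to the slice
        clusters: ['*']
```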
Scaling inference across distributed clusters introduces security risks. KubeSlice counters them with encrypted, authenticated inter-cluster connectivity.
Impact: Inference traffic between clusters is encrypted, authenticated, and protected against man-in-the-middle (MITM) attacks.
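Inter-cluster encryption is configured through the slice gateway section of the same SliceConfig. This fragment follows KubeSlice's documented examples, with OpenVPN as the tunnel type; treat the exact field names as assumptions to check against the current schema.

```yaml
# Gateway section of the SliceConfig sketch above: KubeSlice tunnels
# inter-cluster slice traffic through encrypted VPN gateways.
spec:
  sliceGatewayProvider:
    sliceGatewayType: OpenVPN    # encrypted tunnel between slice gateways
    sliceCaType: Local           # certificates issued by the local slice CA
```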
KubeSlice limits exposure by sharding Kubernetes environments, shrinking the blast radius of any single compromise.
Impact: AI inference pipelines stay resilient even under targeted attacks.
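Slice isolation can be reinforced with plain Kubernetes network policy inside each onboarded namespace. The default-deny policy below is standard Kubernetes; only the namespace name is an assumption carried over from the earlier sketches.

```yaml
# Standard Kubernetes default-deny policy for the inference namespace:
# anything not explicitly allowed (e.g., slice traffic) is dropped.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: ai-inference        # hypothetical inference namespace
spec:
  podSelector: {}                # applies to every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
```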
While KubeSlice secures the infrastructure, Kata Containers secure the runtime.
Impact: Even if a containerized inference job is compromised, attackers can't jump to the underlying node or other workloads.
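Wiring Kata into a cluster uses the standard Kubernetes RuntimeClass API. The sketch below assumes the nodes' container runtime (e.g., containerd) already has a kata handler configured; pod and image names are placeholders.

```yaml
# Standard RuntimeClass wiring for Kata Containers. The handler name must
# match the kata runtime configured in containerd/CRI-O on the nodes.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata
---
# A pod opting into VM-level isolation for its inference job.
apiVersion: v1
kind: Pod
metadata:
  name: inference-worker
  namespace: ai-inference
spec:
  runtimeClassName: kata         # run this pod inside a lightweight VM
  containers:
    - name: model-server
      image: registry.example.com/llm-server:latest  # placeholder image
```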
By combining KubeSlice and Kata Containers, enterprises can:
✅ Enforce strong network segmentation across AI inference workloads.
✅ Protect against lateral movement and privilege escalation.
âś… Secure inference traffic across hybrid and multi-cloud environments.
âś… Maintain compliance and cryptographic integrity automatically.
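Put together, an inference Deployment simply lands in a slice-onboarded namespace and opts into the Kata runtime. Everything below reuses the placeholder names from the earlier sketches and is illustrative rather than a prescribed configuration.

```yaml
# Combined sketch: the Deployment runs in the slice-onboarded namespace
# (network segmentation via KubeSlice) and under Kata (runtime isolation).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-inference
  namespace: ai-inference        # onboarded to the slice in the SliceConfig sketch
spec:
  replicas: 3
  selector:
    matchLabels: { app: llm-inference }
  template:
    metadata:
      labels: { app: llm-inference }
    spec:
      runtimeClassName: kata     # VM-isolated runtime from the Kata sketch
      containers:
        - name: model-server
          image: registry.example.com/llm-server:latest  # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 1  # counted against the namespace ResourceQuota
```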
As AI inference scales across hybrid, edge, and cloud environments, so do the risks.
Following the lessons from OWASP's 2025 Top 10 LLM Risks, enterprises must move beyond traditional cluster security—and adopt slice-based segmentation, encrypted communications, and runtime isolation as default practices.
At Avesha, we’re innovating the future of AI infrastructure—automating scaling while securing every layer of your inference pipeline.
🔒 Learn more about how Avesha KubeSlice can protect your AI workloads → www.avesha.io/products/kubeslice
📅 Schedule a Free Security Assessment →