Bursting to Prevent User Revolts

Burst workloads and the gaming experience

18 April, 2023




Editor’s note: The following is the transcript from Avesha’s live event on March 16, 2023, Bursting to Prevent User Revolts: burst workloads and the gaming experience. It has been lightly edited for clarity.

Eric Peterson: 
Hi everyone. My name's Eric Peterson and I'm the Head of Engineering here at Avesha. Welcome to our new webinar series about the biggest changes impacting distributed computing. For some time, we've been seeing a growing need for applications to be deployed across multiple availability zones, multiple regions, and multiple clouds. And now we're seeing that trend continue toward the Edge. In particular, we've seen latency-sensitive applications such as gaming applications show an interest in moving closer to their users. And as the number of sites where those are deployed increases, it becomes more important to have a common framework to orchestrate applications deployed across all of those sites.

At Avesha, we see Kubernetes as the common orchestration platform for these distributed deployments. But something needs to connect all of these varied Kubernetes clusters into a common, application-friendly, extended multi-cluster. This is what we do with KubeSlice. KubeSlice lets applications distributed across multiple clusters communicate easily with each other, as though they are all elements of a single extended multi-cluster. And that is true whether the clusters are deployed in multiple sites within a cloud provider, across cloud providers, or distributed to Edge providers.

A related issue is support for autoscaling as user demand varies. Within Kubernetes, it's common to scale pods and/or nodes up and down within a cluster as needed. But it can be a challenge when the location that needs to host the expanded services is not within the same physical footprint. So being able to burst resource capacity, dynamically growing it as well as dynamically shrinking it as needed, is key to meeting demand as end user load bursts as well.

Today we're joined by Kavita Parihar and Dave Comery from Cox Edge, and we're going to discuss the benefits of flexibility when it comes to where and how distributed resources can be deployed. In particular, how distributing applications to the Edge can meet the growing need for on-demand low latency services with gaming applications being one such example. Kavita, Dave, welcome. Can you let everyone know what Cox Edge does and what you do there?

Kavita Parihar: 
Thanks for the warm welcome, Eric. I'm Kavita Parihar, leading the technology team at Cox Edge. Cox Edge is a full stack Edge cloud computing and services provider. We have placed our locations at the Edge of the network, in the last mile, to help bring workloads, applications, data, and content closer to end users and devices. The result is faster processing times and better performance, which is crucial for the next wave of digital transformation experiences. Looking forward to our discussion today.

Dave Comery: 
Well said, Kavita. And I'm Dave Comery, and I work on the product offerings here at Cox Edge. We launched our service, it's coming up on about two years now since we've been in market and we've been expanding our reach not just into the Cox markets, but also nationally within the US and globally, at more than 60 locations. So definitely looking forward to the discussion today. Back to you, Eric.

Eric Peterson: 
Great, thanks. I know one of the challenges we hear a lot about is how to make it easy to cost-effectively distribute applications to sites where they can deliver low latency services. Kavita, maybe you can talk a bit about the benefits of distributing applications to the Edge.

Kavita Parihar: 
Sure, Eric. One of the main drivers for distributing applications to the Edge is the growing demand for on-demand low latency, bandwidth, and compute services. As more and more applications move to the cloud, users expect faster response times and higher levels of availability. However, sending this data to and from a centralized data center can cause latency issues, resulting in slower performance for end users. By moving the application closer to the Edge, organizations can reduce latency and improve overall performance.

Another reason why distributing applications to the Edge is important is the need for real-time data processing. As the volume of data generated by IoT devices, sensors, and other sources increases, there is a growing need for real-time data processing capabilities. This is particularly important in industries such as healthcare, finance, and manufacturing, where decisions need to be made quickly based on the latest data. Additionally, the distribution of applications to the Edge can also help organizations reduce network bandwidth costs. By processing data closer to the end users, organizations can reduce the amount of data that needs to be sent back and forth between the end users and the cloud. This can help to reduce network congestion and improve overall network performance.

Eric Peterson: 
Thanks, Kavita. And Dave, do you want to add anything about the map we're looking at?

Dave Comery: 
Sure, yeah. So what we're seeing on the map here is a representation of an Edge-distributed architecture. As Kavita says, getting data and interactions closer to end users is critical for improving user experience. So that's why we think about placing pockets of compute as close as possible to the gamers, where they may be sitting, actually within the eyeball networks, again, those last-mile networks.

So to pick the diagram apart a little bit, the orange circles you see here, these would be pockets of compute, either compute that's always on or, as the bursting happens, compute that gets spun up and allocated to the gaming users that are coming in at those locations. And this is done a couple of different ways. So you can see the notion of a global load balancer here. Whether that's something DNS-based or anycast-based, it allows that traffic to flow to the closest location, and that's what's represented here.

So as that gamer comes onto the network and establishes connectivity with the game, they'll likely be redirected to a local replica of a game server, maybe as part of the matchmaking service that will help connect multiple gamers together within a local proximity. Those burst locations generally will need to talk to either a distributed version or some type of centralized data store. That could be for user accounts, items, in-game currency, statistics, the essence of that user. How does that get out to that gaming server? There's this connectivity mesh that's happening across these different locations.

And so this centralized data store, whether it's centralized or distributed, both models would apply, can be on a diverse set of infrastructure. That could be a centralized virtualized store, think a Kubernetes type of store that would sit there. It could be in cloud architecture. You see the nod of the head towards cloud infrastructures that can also be tied into this mesh. So that's one of the nice things, Eric, we like about the Avesha solution: it really democratizes that infrastructure so that anything can talk to anything over this secure mesh that gets formed between these different locations. So as that gamer's coming in and they're getting their data and bringing that session onto the network, the connectivity's automatically established.

Now the second thing that might happen as that gamer connects could be that either no local burst infrastructure is currently available or the burst infrastructure there is starting to get busy. So think about peak times. And as the load on those burst locations grows, that's where you may see the need to provision additional infrastructure, hence bursting. So those orange circles could actually grow and contract depending on how many gamers and how much load is coming into that location; they may also shed to a different location.

So imagine somebody in the northeast connects into a game server: a brand new site may be spun up, or maybe you flow some infrastructure into the cloud pop that's shown here to handle that gaming session. So there's a lot of flexibility and dynamic behavior that's all built into this overall architecture. Certainly, easier said than done, there's a lot of details here. But bringing the connectivity piece and making it automatic, secure, and automatable is a big part of that. So you're not having to maintain VPN connections between all these different pops and locations; the infrastructure just works from that perspective, which is great.

So that's like this unifying fabric. It's pretty essential. And Eric, I know this is your wheelhouse, so maybe you want to talk a little bit more about KubeSlice and what it brings.

Eric Peterson: 
Yeah. Thanks, Dave. So in this picture, KubeSlice is essentially the orange lines you see here connecting the various sites. You have clusters that are deployed north, south, east, and west, as well as maybe some big centralized backend. And doing this interconnection without KubeSlice requires a lot of configuration and maintenance, setting up all the various tunnels and maintaining all the security for everything. But KubeSlice was created to make this kind of mesh connectivity easy to do. We deploy on the clusters, we establish all the interconnections, and we basically make it something the applications don't have to worry about.

Really, the way we make it easy is first we start by introducing this concept of a slice. Between any two sites, there could be multiple applications deployed and maybe they're not all the same thing. So you may want to segment your applications and what you do is you create a thing called a slice. And then you onboard namespaces onto that slice. So whatever application you have or set of applications that relate to each other, you put them onto a common slice, then you go out to any other site where you intend to deploy all or part of that application and onboard namespaces onto that slice at that location as well. As soon as you do that, what the KubeSlice controller will do is it will establish all of the overlay network topology that needs to happen to make those sites be able to connect to each other and talk securely.

Once that plumbing's in place, any services that you have at any of those sites that you want to be reachable from the others can be advertised. Basically, you identify the things you want, cluster A to be able to see in cluster B and you'll advertise those services and then cluster A can reach them.

And really at that point, all the applications on a slice are up and on their own virtual topology, so they can communicate as if they're reachable within an extended multi-cluster. This way, applications distributed between different AZs, different regions, different clouds, cloud to Edge, or even cloud to private data center can all communicate seamlessly and without having to modify the applications.

Kavita Parihar: 
And Eric, that's what we love about the slice. The productivity gains that you realize by not having to re-architect your containers and microservices, they're invaluable. It really does provide a new way of looking at distributed applications, providing this layer where the underlying infrastructure kind of melts away and I don't have to worry about it, meaning that I can design my application as if it was running within one big cluster with any number of namespaces, and the application routing provided by the slice gets all the traffic to the right endpoint, regardless of which cloud or Edge I happen to be running in. This is really where we see distributed application development evolving to.

Eric Peterson: 
Cool. So we've talked about the application distribution aspects here, but let's talk a bit more about bursting. When we hear about the need to handle dynamic loads, the key factor is always cost. And of course, you can handle peak loads if you over-provision everything for it, but that's not the goal. You want to be cost-effective when it comes to delivering these services. So serving these peak loads, but doing it in a cost-effective way, all while maintaining a desired service level. That's where it becomes a challenge. Dave, maybe you can tell us what we're looking at here in terms of handling bursts.

Dave Comery: 
Yeah. The graph here, it's a representation of this daily diurnal pattern of load coming and going throughout the day. As you can see, you could just figure out what you need for the worst case, provision for those peaks, just nail up that infrastructure, and away you go. But of course, if you do that, you're going to waste a lot of resources and end up paying for things you're not using when game load's not active.

Now on the other end of the spectrum, say we flip that around and you're only going to use dynamic infrastructure, so you're not really building any baseline. Everything is scale up and down. Well, on that end of the spectrum, you're not able to take advantage of any committed resources or committed infrastructure discounts that you may get from various providers. So there's a balance in the middle there, where you get the right amount of baseline compute that's hitting that sweet spot, as shown here, cresting maybe just above the troughs of those valleys in the load curve.

And you kind of nail up that infrastructure, you optimize it, you make it really efficient, you get the cost out of it and keep that baseline running. Now your gaming system is sort of set on this foundation. And then the top side of the graph, as those game sessions start rolling in, you add resources, again back to the previous kind of chart and diagram, you would grow those orange circles that were there. That's kind of the representation of the green arc here, being those incremental scale-ups of infrastructure at those different sites and so forth.
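Dave's baseline-plus-burst sizing can be sketched with some simple arithmetic. Everything here, the load curve, the headroom factor, and the units, is invented for illustration: pick a committed baseline just above the troughs of the diurnal curve, and treat everything above that line as burst capacity.

```python
# Hypothetical illustration of baseline-vs-burst sizing: set the committed
# baseline slightly above the quietest hour of a diurnal load curve, and
# treat the remainder as dynamically provisioned burst capacity.

hourly_load = [40, 35, 30, 28, 30, 45, 60, 80, 95, 100,
               90, 85, 80, 75, 70, 75, 85, 100, 110, 105,
               95, 80, 60, 50]  # game sessions per hour (made-up figures)

HEADROOM = 1.1  # run the baseline ~10% above the quietest hour

baseline = min(hourly_load) * HEADROOM            # committed, cost-optimized
burst = [max(0.0, load - baseline) for load in hourly_load]

print(f"baseline capacity: {baseline:.1f} sessions")
print(f"peak burst needed: {max(burst):.1f} sessions")
print(f"hours needing burst: {sum(1 for b in burst if b > 0)}")
```

The trade-off Dave describes lives entirely in `HEADROOM`: raise it and the committed baseline absorbs more of the curve (cheaper per unit, more idle waste); lower it and more of the load rides on pay-as-you-go burst capacity.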

So really, we think about this in two ways. You want to have that baseline optimized for cost and made really efficient, but then burst in the workloads as the gaming demand increases, scaling up workloads. And as that gaming demand alleviates, scaling down the workloads.

Eric Peterson:    
And Avesha's been doing a lot of work on a complementary addition to KubeSlice called Smart Scaler. Smart Scaler is a reinforcement learning based tool that will observe the loading patterns for an application. It'll use the load to identify the patterns of resource usage associated with that application as it bursts up and down and build that knowledge into an RL model. It'll use that model to anticipate the needs of the application and proactively scale its needed pod or node resources up and down, all while delivering the service with the configured SLA or quality level as specified by the user.

So this reduces operational costs by essentially having an AI that's overseeing your applications and relieving the DevOps teams from needing to continually monitor and tune applications as new versions are rolled out. It also provides insights on how the application is running, both the current application as well as sort of comparing it to historical versions. It can sort of highlight when changes, either good or bad, have occurred in the application over time and help you focus your efforts on addressing them. And ultimately, it can reduce overall cloud spend by basically helping you better optimize the utilization of the pods and nodes that are running your application.
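This is not Avesha's RL model, but a toy sketch of the proactive idea it describes: instead of reacting after a utilization threshold trips, use a historical load pattern to size replicas ahead of the expected ramp. The capacity ratio, lead time, and load figures are all invented for the example.

```python
# Toy contrast with reactive scaling: a predictor that reads yesterday's
# load pattern and adds replicas *before* the ramp arrives. (Illustrative
# only; a real RL-based scaler learns far richer patterns than this.)

PODS_PER_LOAD_UNIT = 0.1   # invented capacity ratio: pods needed per load unit
LEAD_TIME_SLOTS = 2        # provision this many time slots ahead of the forecast

def proactive_replicas(history: list[float], now_slot: int) -> int:
    """Size replicas for the load expected LEAD_TIME_SLOTS from now,
    using the matching time-of-day slot from the historical pattern."""
    forecast_slot = (now_slot + LEAD_TIME_SLOTS) % len(history)
    expected_load = history[forecast_slot]
    return max(1, round(expected_load * PODS_PER_LOAD_UNIT))

yesterday = [20, 25, 60, 110, 120, 90, 40, 25]  # load per 3-hour slot (made up)
print(proactive_replicas(yesterday, now_slot=1))  # sizes ahead of slot 3's ramp
```

The point of the sketch is the ordering: capacity is requested before the demand materializes, which is what closes the gap between detecting load and new pods becoming ready.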

Dave, can you talk maybe a bit about how you see your customers load calculations factoring into how they plan their infrastructure?

Dave Comery:    
Yeah, for sure. And it's a good point you raise about the predictive nature of scaling being the future. But whether or not you have prediction, being able to calculate those loads is important, so that you can trigger at those points. And of course, AI is going to trump anything you're trying to do by hand, manually fitting these curves.

So with that said, we do see different customers taking different approaches here. It just depends on the business objectives that they're going after. Some go for a fully virtualized setup, so that's a bit more of a homogeneous scenario. Customers like this because it's simple. There's some simplicity, it's very straightforward, kind of a level playing field, whether that's on the baseline or the bursting side. But some of our more sophisticated customers are looking at mixing infrastructures also. So maybe you've got some bare metal in the mix, or cloud computing in the mix, and that's making up your baseline. And then you're going to burst the Edge workloads onto those machines, bringing max performance to those profiles. So layering on VMs means that you're able to handle the bursting in a pay-as-you-go fashion while having that baseline nailed up for cost-effectiveness.

Eric Peterson:    
Great. Thanks Dave and Kavita. Definitely here at Avesha, we're thrilled to be working alongside you and Cox Edge to bring our technology to bear on these advanced use cases, especially for gaming. But we've had a couple questions come in. Dave, maybe you can take the first one.

Dave Comery:    
Sure. I'll just read it off. This one, we'll send to you, Kavita. So when a customer identifies an Edge site near their end users, how do they create a cluster at that site and deploy their services?

Kavita Parihar:    
Thanks, Dave. Thanks for the question. The first step, as you said, is to choose your Edge location, right? You will need to determine where you want to deploy your cluster and identify the appropriate Edge location. This can be done very easily using the deployment map that we have on our Cox Edge dashboard.

Once you have chosen your Edge location, the next step is to create the cluster, and all of this can be done using the portal in a couple of clicks. This may involve setting up network connectivity, configuring storage, and configuring security settings. With your cluster up and running in a few minutes, you can now deploy your services to it, which means packaging your applications and deploying them to the cluster using tools like Kubernetes manifests or Docker Compose files.

Finally, you will need to monitor and manage your services to ensure that they're running smoothly. This may involve setting up monitoring tools like Prometheus or Grafana and managing your services using our managed Kubernetes solution. Then of course, the application distributed to all these sites needs to be able to communicate both with each other and with the microservices that make up each application.

Eric, how does KubeSlice tie these clusters together so that the application can talk regardless of where they're deployed?

Eric Peterson:   
Yeah, so typically a customer starts with an application. It's deployed in a cluster somewhere. And they identify the parts of that application that they want to distribute. And again, those pods are typically already in some namespace. So what you do is you onboard that namespace onto a slice. Now at each of the various Edge sites where you want to deploy your applications, you onboard that namespace onto that same slice in those clusters and KubeSlice will do the work to stitch everything together securely and build that overlay network so that all the pods that are members of that namespace can talk to each other.

And then you take those applications that you want to distribute, deploy them in those namespaces and they can talk to each other or to the backend, however they want to do. And again, the services are all advertised across clusters by the KubeSlice infrastructure. In the gaming space, a common model is to deploy the front end game servers to the Edges while keeping any centralized backend services like a database in a backend cloud. That way, the parts that are latency dependent, that are serving the end user, are at the Edge sites and can do that quickly. And if they need to save games or they need to do anything that's sort of persistent storage, they can push that back to a central location.

Another thing I should point out is this thing we call our global load balancer, which works with external DNS providers to advertise reachability for all the services that have been distributed to the Edges, and it helps to steer traffic towards the Edge sites that meet the latency demands of the users. So if you've taken an application that used to be centralized and distributed it to N different Edge sites, you need external DNS providers to know that it's available at those N sites and be able to get the traffic to all of them.
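As an illustration of that steering decision, here is a minimal sketch with a made-up site list and latency figures. KubeSlice's actual global load balancer works with DNS providers rather than in application code like this; the sketch only shows the selection logic in miniature.

```python
# Hedged sketch of latency-aware steering: route each user to the
# lowest-latency Edge site within budget, falling back to a central
# cloud when no Edge site qualifies. Site names and numbers are invented.

def pick_edge(latencies_ms: dict[str, float], max_latency_ms: float = 50.0) -> str:
    """Return the lowest-latency site meeting the latency budget,
    or the central cloud if none qualifies."""
    within_budget = {site: ms for site, ms in latencies_ms.items()
                     if ms <= max_latency_ms}
    if not within_budget:
        return "central-cloud"
    return min(within_budget, key=within_budget.get)

measured = {"edge-northeast": 12.0, "edge-southeast": 34.0, "edge-west": 71.0}
print(pick_edge(measured))             # edge-northeast
print(pick_edge({"edge-west": 71.0}))  # central-cloud (nothing within budget)
```

In practice this decision is folded into DNS resolution or anycast routing, so the client simply connects to one hostname and lands at the right site.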

Kavita, what do you see that's driving the need for burstable infrastructure within the gaming space specifically?

Kavita Parihar:   
Eric, I feel that one of the use cases driving Cox Edge to deliver burstable instances in the gaming space is the growing demand for scalable and flexible infrastructure to support online gaming, right? With the increase in online gaming, gaming companies are facing challenges in managing their infrastructure to handle sudden surges in traffic and usage. This is where burstable instances come in. They provide the ability to scale up resources when needed to handle the increased traffic and then scale back down when demand subsides. For example, imagine a gaming company that develops and operates an online multiplayer game. During peak usage times, such as during a major gaming tournament, the company needs to ensure that the infrastructure can handle the sudden surge in traffic. By using Cox Edge burstable instances, the company can quickly scale up their infrastructure to handle the increased traffic, ensuring that the game continues to run smoothly, which makes for a very good experience. Furthermore, with Cox Edge's flexible pricing model for burstable instances, the company can avoid over-provisioning resources and paying for unused capacity, reducing their overall infrastructure cost.

Eric Peterson:   
Great, thanks. Next question I have here: is there value in moving only microservices to the Edge? I guess I'll throw out my thoughts and then Dave, Kavita, you can chime in too. So like I said before, the example we've seen is that it's not that you necessarily want to take this entire application that you have hosted in the cloud and move it out to one particular Edge. It's that there are elements of it, say some of the microservices that make up that application that are front facing, where you want a low latency response. So those pieces, what we do is we make it easy to distribute them out to the Edge sites so that the customer experience is improved, while they're still able to talk to the backend services without any changes. Dave, Kavita, do you have any other thoughts on that too?

Dave Comery:   
Yeah, totally agreed. Definitely, decompose your application. Think through that architecture: what's going to make sense to run closest to the end user versus what needs to remain centralized, and think through disaster recovery scenarios. So if you lose a piece of the Edge or a piece of the centralized infrastructure, how does the system recover from those scenarios? I think a good walkthrough of failover architecture and those types of things is always recommended. So definitely be specific and think about how you're distributing your microservices would be the advice from here.

Eric Peterson:   
Great. Okay, next question. Can a distributed network help with head-to-head play? So in the conversations we've had, the first part of that is yes, where you're trying to deliver the lowest latency service between players who are playing head-to-head, but you need to combine that with your player matching algorithm. So as you identify a set of players who are going to play head-to-head, ideally, you're putting them onto a common Edge site where they can all get that low latency benefit. So that awareness of the distributed topology needs to be part of the matching algorithm, and that's actually something we've been discussing with some of our potential customers with our RL algorithm: how we can apply that towards matching players head-to-head and aligning that with the distributed network. Dave, Kavita, any thoughts there?

Dave Comery:   
You covered it well, Eric.

Eric Peterson:   
Okay. And last question is, how is this different from HPA? So I'm going to assume that's in regard to Smart Scaler. HPA, for anyone who's not familiar, horizontal pod autoscaling, is a mechanism that's common in Kubernetes. It typically looks at utilization of CPU or memory resources as they relate to pods, and as thresholds are exceeded, it will spin up additional pods or spin down pods. So typically, when you deploy an application and you're using HPA to scale it, you have a DevOps team that looks at how the application functions and how they can deploy it such that they can deliver it with a certain quality of service. An obvious issue is if you say, when my CPU becomes 100%, I want to spin up another pod. Well, in the time between when you detect that and when the next pod comes up, you're losing service.

So essentially, you can't wait until you're fully maxed out; you have to set a lower threshold. And when you set that lower threshold, that's the point that gives you the buffer to get the additional resources in place to handle the added demand. Now what we see is that, first off, it's a complex problem to figure out how do I do all of my HPA settings such that I can deliver my entire service at quality? Because again, typically a lot of these services are not single microservices, they're a collection of microservices that are interacting with each other.
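To make that buffer concrete, here's a back-of-the-envelope sketch, with all figures invented for illustration: if a replacement pod takes a while to become ready and load keeps climbing in the meantime, the scale-up threshold has to sit low enough that new capacity arrives before utilization hits 100%.

```python
# Illustrative arithmetic behind the "lower threshold" point: the CPU
# threshold that triggers a scale-up must leave headroom for the load
# growth expected during the pod's startup window. Figures are made up.

def scale_threshold(pod_startup_s: float, growth_pct_per_s: float,
                    safety_margin_pct: float = 5.0) -> float:
    """CPU % at which to trigger a scale-up so capacity lands before 100%."""
    buffer_needed = pod_startup_s * growth_pct_per_s + safety_margin_pct
    return max(0.0, 100.0 - buffer_needed)

# A pod that takes 60s to become ready, with load growing 0.5% CPU per
# second, needs to trigger well below 100%:
print(scale_threshold(pod_startup_s=60, growth_pct_per_s=0.5))  # 65.0
```

And this is per microservice: with a chain of interacting services, each with its own startup time and growth rate, tuning all of these thresholds by hand is exactly the complexity Eric describes.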

So what Smart Scaler does is it applies RL to first understanding the entire application. It looks at all of the microservices that make up the complete service. It identifies the loading patterns of all of them, and it will proactively spin up pods or nodes as needed to ensure that the service level guarantee that you've asked for is delivered. Now, a key benefit here is that by understanding the loading patterns and understanding the whole application behavior, we're able to better utilize those pods and nodes than you could by continually resetting the configuration for HPA.

So again, essentially what you've got is this RL engine that's learned the application and is sort of continually tweaking things in order to make sure that you're getting the most you can out of your resources. So hopefully that answers that question. And that's all the questions I have.

So I think it's time to finish up. Thank you, Kavita and Dave, for joining today.

Kavita Parihar:   
Thanks, Eric. It was nice talking to you both.

Dave Comery:  
Yeah, thanks for having us out, Eric. We definitely appreciate it.

Eric Peterson:  
And folks, thank you for joining us today. There will be a replay video and a blog post available shortly. And we're back next month for another live event. Hope to see you then.

Dave Comery:  
Cheers. Thanks, all.
