Avesha Blogs
16 November, 2023
16 min read
Video Link: https://www.youtube.com/watch?v=ZHTIux_cGzg
Kunal:
Welcome back to another webinar. We're excited about this one: we're going to talk about Kafka in a multi-cloud environment with KubeSlice, an amazing open source project. We have the creators here, Olyvia and Prasad, who are going to walk us through it, and we'll answer some nice questions and learn some nice things. The session is also being recorded, so if you miss part of it, don't worry, you can watch it later. We've done a bunch of amazing content previously, so I'll leave all the links in the description below. It's streaming live on both YouTube and LinkedIn, so we should have nice people joining in. Hello, hello, hello, a lot of folks, alright, cool. Moving forward: Olyvia, if you would like to give a quick intro, then Prasad, and then Prasad, you can go ahead with the presentation. Olyvia, you're on mute.
Olyvia:
Thank you Kunal, always good to be with you. I'm the VP of product marketing and user interface here at Avesha and excited to be here to talk about Kafka and multi-cloud and multi-cluster.
Prasad:
Thank you very much, Olyvia and Kunal. It's always great to be with Kunal. Hey guys, my name is Prasad, and I am chief product officer at Avesha. I wanted to spend some time understanding how people are using Kafka, and then how we can help with deploying Kafka in multi-cluster scenarios or as stretch clusters. I'm more than happy to chat about it.
Kunal:
Amazing. If you want to connect with Prasad and Olyvia, you can join the KubeSlice community in the Kubernetes Slack channel and say hi over there as well. We're getting some questions here, so I'll collect those and we'll answer all of them at the end. Keep the questions coming. Prasad, I believe you can share your screen.
Prasad:
All right, I'm going to share the entire window and bring up Avesha. At a high level, data is the new oil in every enterprise. The data landscape covers many things. Data gets ingested from different sources, and streaming services like Kafka play a pivotal role in making sure that data is transported to the different downstream applications that consume it. Kafka has become a neural system inside enterprises, and people are using it for different use cases, whether that is logs or database replication. One important aspect is the Kubernetes environment: originally it was mostly stateless workloads going in, but now data on Kubernetes is increasing dramatically. Some of the statistics you can see here: databases are growing 71% year over year and messaging is growing at 41%, which is a huge uplift in how people are deploying these data services on Kubernetes. Kafka plays a pivotal role in distributed messaging. It is natively a distributed service, and you now see it used in different ways, even for streaming. In enterprises, as I said, there are lots of applications sending data through Kafka, whether for monitoring, for edge use cases, or for multi-cluster use cases. The data is produced by applications, and Kafka has lots of consumers, whether the data is going into data warehouses or into indexing for services, so there are lots of use cases for Kafka.
I wanted to spend some time on the basics of Kafka. There is a producer, which produces all this data, and then there are what are called Kafka clusters. I don't want you to confuse this with a Kubernetes cluster. A Kafka cluster is how you put brokers together; the brokers form a cluster. A Kubernetes cluster is a bunch of nodes, and through orchestration you put applications or workloads together on them. Then there are consumers, which read from these brokers and process the data. The control plane for a Kafka cluster is handled by an open source project called ZooKeeper, which gives you a way of configuring the different brokers and what each one does. I will go into details on that.
Olyvia:
So Prasad, if I may stop you here: you mentioned, and I have this data point, that 40% of large organizations run their messaging platform within Kubernetes, and they use Kafka. You're describing how a Kubernetes cluster should be perceived differently from a Kafka cluster. Can you explain that a little bit more? Why are they using Kubernetes for their messaging, and how does Kafka come into this picture?
Prasad:
Good question, Olyvia. Most workloads are going cloud native. One reason is that you want to scale faster. Two, you are decomposing your workloads from a monolith into microservices. Since Kubernetes is already your orchestration layer, you don't want the Kafka cluster for distributed messaging sitting outside of Kubernetes. So what people are trying to do is run all of it in a cloud native way, so that everything is cohesive from a management standpoint as well as a scaling standpoint. Now, Kafka was born well before a lot of these other things, and it has its own notion of a cluster because the brokers bring together a whole bunch of resiliency features. That's the reason they call it a Kafka cluster: from Kafka's standpoint, a cluster is how brokers talk to each other and distribute the data. A Kubernetes cluster, on the other hand, is the infrastructure part, bringing all the infrastructure together.
Olyvia:
That is very interesting, please proceed.
Prasad:
If you dig a little deeper into Kafka, there are those who produce data, called producers, those who consume data, called consumers, and the brokers, which distribute the data. Producers write into the brokers and consumers consume from the brokers. These are all distributed in nature, and that is the most important aspect of Kafka, the one that gives you scalability. Now, what are we producing and what are we consuming? It's the data. How do you define the data? Let's say today we are having a conversation on a subject called Kafka; any messaging that happens around that belongs to a Kafka topic. Tomorrow, if you want to talk about Kubernetes, Kubernetes will become a topic. That's the kind of organization you define, and the construct Kafka uses for it is called a topic. Since Kafka is distributed in nature, you don't want a topic sitting on one server or one broker instance, because that cannot scale. You want the topic distributed across different broker instances, so there is a way to partition the topic across brokers. That construct is called a partition. What a partition gives you is scalability: you don't have to scale vertically, you can scale horizontally across different broker instances, so that you can consume much faster.
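To make the topic and partition idea concrete, here is a minimal Python sketch of how a record key could be mapped to a partition and how partitions could be spread over brokers. It is an illustration only: the function names, the broker names, and the use of CRC32 are assumptions made for this sketch and are not Kafka's actual partitioner.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index.

    Uses CRC32 purely for illustration; this is not Kafka's own hash.
    The same key always lands on the same partition.
    """
    return zlib.crc32(key) % num_partitions


def assign_partitions(num_partitions: int, brokers: list[str]) -> dict[int, str]:
    """Spread partitions round-robin over the available brokers."""
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}


# Hypothetical broker names and keys, for illustration only.
brokers = ["broker-0", "broker-1", "broker-2"]
layout = assign_partitions(6, brokers)

for key in [b"rider-17", b"rider-42", b"driver-7"]:
    partition = choose_partition(key, 6)
    print(key.decode(), "-> partition", partition, "on", layout[partition])
```

Adding brokers and partitions spreads the same topic over more machines, which is the horizontal scaling described above.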
Kunal:
Someone is asking: what is the motivation behind deploying Kafka brokers in multiple clouds? Are there any benefits, and on the other side, what are some challenges with this approach?
Prasad:
I will describe that, Kunal. There are many reasons for distribution, right?
Kunal:
If it's coming in the presentation then don't break the flow.
Prasad:
I will give you a primer on what Kafka is. At a high level, what is the data? There is metadata, like the topics, partitions, timestamps, and other metadata you can create, and then the data itself is a key-value pair. The body is easy because it is a key-value pair: you define a key and then put in the corresponding value. We talked about how there are brokers and there are partitions. Let's say Kunal's webinar is one topic. Somebody might say, I am interested in Kunal's webinar, on multi-cluster; that may be partition one. Somebody else is interested in Kunal's webinar, but more on policy; that is partition two. Or maybe I want Kunal's webinar topic on Kafka, which is another partition. That way everybody can subscribe to whatever partition they are interested in, and you scale horizontally. That's the advantage of it. From an infrastructure efficiency standpoint, all of these things are stored as log files, and you can segment them so that failures are easier to contain. Now, to your question of why you want to have it on Kubernetes: Kubernetes natively gives you the scalability aspects, and Kubernetes is where your workloads are. From a source and destination standpoint, why not put Kafka on Kubernetes itself, so that you can scale both sides and consume from there? That's the logic behind how people are using Kafka nowadays.
Olyvia:
To bring in a real-life example, Prasad: think of Uber, with drivers and riders exchanging messages, or applications putting out messages, or Twitter, which we talked about at one point. Those are very messaging-intensive application scenarios where Kafka would be used a lot.
Prasad:
I put up the example of Uber's big data stack, which fundamentally uses Kafka for ingestion. They ingest everything into Kafka and then distribute it to different applications, because it's a pub-sub methodology: you publish, and then you subscribe to these topics or messages. There are many use cases, whether for translation from an ETL standpoint or for logging; in fact, people are using it for database replication as well. At Uber, trillions of messages per day go through Kafka and petabytes of data get stored for different use cases, and this is excluding replication. From a resiliency standpoint there are different approaches, and people do use replication. I think Kunal's question was why you need multi-region, multi-cluster, or scaling. This is exactly the answer: to provide business resiliency and continuity and to survive outages. Kafka is the neural system of the enterprise from an ingestion standpoint; it is the critical infrastructure that the downstream applications rely on. That's the reason you have to plan for surviving any kind of disaster that impacts the business. You don't want any cascading failures. The other thing is customer experience. Look at Uber: it is an application where the riders or the cars are close to you. You want to serve only that region; you don't want a car that is somewhere else to appear in your view, so you want to localize certain things. Then there is data integrity and consistency, which need to be maintained as data is ingested. Those are the things you use multi-region for, and there are different deployment models.
You can have regions reporting to a center, a center distributing to regions, or a more complex model where every region talks to every other region. The deployment model depends on the application's needs. People are using Kafka in all of these ways, and it is not necessarily a single stretched cluster; it could fundamentally be multi-cluster. It's a very flexible architecture and it can support lots of different models. Kunal, did that answer your question?
Kunal:
Yeah. Let's move forward.
Olyvia:
In the diagram you were showing with the different regions, is it that the producers, the consumers, and the brokers you talked about earlier are all in these separate regions and centers?
Prasad:
Yes. Each region is producing data, and then you centralize it for any analytics you want to add. You bring it all together in a centralized area so that you can do much more analysis later. The other model is, let's say, a product catalog that you distribute to different regions, with regionalization applied per region. And if you want to do billing and other things that need to be centralized, that is another use case people have. Different use cases can each be defined in a different way, and Kafka supports all of them.
Kunal:
So there's another question on the screen.
Prasad:
That's an important factor. How does Kafka ensure fault tolerance in the event of broker failures? In a multi-cluster scenario there is something called replication, which I mentioned here. People replicate data in different places so that consumers can switch over and a single broker failure does not take you down. That's the reason you divide up by partition and by replicas: you decide how many replicas you want, and then consistency is maintained. There are acknowledgment settings, 0, 1, and in-sync replicas (ISR), which say either "I write it, and you take care of the replication and consistency" or "when I write, I want to wait until consistency is maintained across a given number of replicas." There are different schemes inside Kafka that you can play with to get the resiliency you want. That is reliability from a broker standpoint. Now I wanted to introduce KubeSlice: how do we help this end-to-end journey of Kafka, which is becoming much more prevalent in enterprises? As we described in previous talks, KubeSlice creates multi-cluster networking services that form a virtual cluster across multiple physical Kubernetes clusters. It gives you an abstraction layer, a service connectivity fabric across multiple clusters, and it is centered around namespaces. If your Kafka brokers and ZooKeeper are defined in a Kafka namespace, you can define a slice only for that, and consumers and producers can be in different namespaces and still consume from it. With a virtual cluster, Kafka sees the brokers as though they are sitting in the same cluster, while physically they are separated, so the deployment is resilient to infrastructure failures.
Kafka gives you different techniques. It's almost like taking Microsoft Excel and writing macros: you can distribute things in lots of different ways. What we are trying to do is help the different use cases people are building on top of Kafka by supporting the deployment models I talked about before. One other use case: when you go to a retail store, don't assume that there is no Kafka there. In the retail store, the point of sale and the product catalogs have events, such as sell, buy, or inventory events, and all of those are captured through Kafka, both on the consumption side and on the producing side. Those are edge use cases. Now, we all know that networking on a store floor is probably not very resilient. Things can break, but you don't want to lose transactions, so the publish and subscribe model has to be thought through more carefully in an edge use case. Kafka plays an important role here because, as we discussed, it decouples producers and consumers. That's an advantage of the Kafka architecture: producers can keep producing, and consumers can consume whenever they want. If producers produce and consumers want instant gratification, instant processing, that's also possible. Kafka also gives you a notion of retention for topics. You can set a topic to be retained for, let's say, 24 hours or six hours or four hours, so that consumers can still get the data later. That's the fundamental advantage of Kafka: it's not like an OLTP system, where the transaction has to happen immediately when something occurs. This decoupling lets you consume whenever you want.
So that's the advantage from a Kafka standpoint.
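The decoupling and retention described here can be sketched with a toy in-memory log in Python. This is a simplified model under stated assumptions: the `TopicLog` class, its methods, and the sample events are hypothetical and only mimic the idea of time-based retention, not how Kafka stores or expires log segments.

```python
from dataclasses import dataclass, field


@dataclass
class TopicLog:
    """Toy append-only log with time-based retention (illustrative only)."""

    retention_seconds: float
    # Each record is (offset, timestamp_seconds, value).
    records: list[tuple[int, float, str]] = field(default_factory=list)
    next_offset: int = 0

    def produce(self, value: str, now: float) -> int:
        """Append a record and return its offset; no consumer needs to be online."""
        offset = self.next_offset
        self.records.append((offset, now, value))
        self.next_offset += 1
        return offset

    def expire(self, now: float) -> None:
        """Drop records older than the retention window."""
        self.records = [r for r in self.records if now - r[1] <= self.retention_seconds]

    def consume(self, from_offset: int) -> list[tuple[int, str]]:
        """Return every retained record at or after from_offset."""
        return [(o, v) for o, _, v in self.records if o >= from_offset]


HOUR = 3600
log = TopicLog(retention_seconds=6 * HOUR)
log.produce("sale:sku-1", now=0)
log.produce("sale:sku-2", now=1 * HOUR)

# A consumer that reconnects five hours later still sees both events.
log.expire(now=5 * HOUR)
after_five_hours = log.consume(from_offset=0)

# After six and a half hours the first event has aged out of the window.
log.expire(now=6.5 * HOUR)
after_six_and_a_half_hours = log.consume(from_offset=0)
```

In a store with a flaky network, the retention window is what decides how long a disconnected consumer can stay away without missing events.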
Olyvia:
So what you're saying is that applications with lots of messaging and lots of events need to use Kafka. And for Kafka to perform better, and for resiliency and security reasons, you may need to distribute it. When you distribute, Kubernetes networking is one of the challenges, and that's where the KubeSlice service connectivity layer comes in. You were also describing the advantages of distributing Kafka and keeping the consumers separate from the producers, while still having them joined in one virtual cluster for performance, resiliency, and security. Is there a security angle as well?
Prasad:
Yes, there is. I'll come back in a second to the question on the screen about batch processing versus streaming, but let me address Olyvia's question first. Data is the most important thing for any enterprise, and data needs to be secure. There is security at different layers: security of data in transport, security of data at rest, and security around who is accessing the data. It is important to have the right level of security for every one of those aspects. What KubeSlice includes is a way of protecting the data that is transported across clusters, and controlling who has access to what kind of data. Enterprises may have their own way of securing stored data, or they may encrypt the data elements themselves. We augment that with a defense-in-depth approach: we add data protection from an access standpoint and for data in transit. Now, somebody asked how real-time streaming differs from batch processing. Batch processing is how it used to be; now the immediacy of data is critical. Say you go to an online store to buy a specific item, and the inventory is batch processed. The inventory count comes from last night, but ten hours later the stock is exhausted. You go to the website, you order the item, and later they come back and say it doesn't exist. Are you going to be happy? No. You want streamed data that is correct and accurate at that instant. That's the reason streaming is becoming much more relevant than batch data for a lot of transactions. Did I answer that question?
Kunal:
I believe so and I believe we have tons of questions now so we can answer those in the end after the presentation.
Prasad:
Alright, thank you.
What a slice brings in is a connectivity fabric across different physical infrastructure clusters, so that you can distribute your Kafka clusters over that physical infrastructure. That is the first use case we enable. Today, if you want to do multi-cluster, there are lots and lots of things you need to decide. One is: do I do a stretch cluster, or do I do multi-cluster and then copy? Copying data is always a challenge. Consistency matters and immediacy matters, and those are clear problems. KubeSlice helps alleviate some of them: we can reduce the copied data and give you the ability to stretch a Kafka cluster across multiple physical clusters, which brings back the advantages we talked about. Multi-cluster gives you resiliency, it gives you immediacy of data when and where you want it, and it gives you distribution of topics. With topics and partitions, you have many ways of slicing the data. Just as we give you infrastructure slicing across the different regions you want, Kafka gives you a way of slicing the data by topics and partitions: where they are available and where they can be copied. It is fundamentally a jigsaw puzzle that you put together to fit your environment.
That's the advantage of our slicing technology. KubeSlice enables you to manage fleets of clusters across different infrastructures, be it in the same cloud or on-prem, because there is a lot of data that people want to keep in the data center for compliance reasons. They may want a certain set of topics distributed in public clouds, and that's also possible. At a high level, what KubeSlice brings to Kafka is that it enables you to distribute the brokers rather than copy them. Distributed brokers are essentially stretch clusters. That doesn't prevent you from having multiple Kafka clusters on physical clusters that you connect; copying between them becomes much easier. It also helps with distribution in general, because this whole framework is agile. You can create a topic on demand, take the topic away, or apply a different retention strategy. All of that is possible with Kafka. As I said, besides stretch clusters there are different ways of synchronizing data: synchronous and asynchronous replication. We enable both. With synchronous replication, when you post at one end you can consume at the other end, and that reduces your RTO.
Recovery time objective is important from a business standpoint. Lost business is one of the foundational problems that people focused on customer experience care about. If you go to a website and it is slow or not doing what you asked, you're not going to go back to that website, and that is a lost opportunity. You don't like that behavior. That's why people are looking for a resilient way of supporting customers so that top-line revenue is not impacted. For that to happen, you want synchronous replication across different regions or clusters. Kafka enables that, and KubeSlice provides the connectivity for it.
I will leave you with this and then take questions. The advantages of KubeSlice with Kafka: don't be afraid of many clusters. You can have any number of clusters you want, and KubeSlice gives you a fantastic framework for communicating across them from a service network standpoint. Decide whether you need to scale clusters by data center, region, cloud, or all of the above. You don't have to worry about the scaffolding for service connectivity; we automate that. The next point I make very carefully: it is better to consume over a distance than to produce over a distance. If I have to produce for every region every time, I am creating more copies, and consistency will be lost. It is better to have one source of producers and consume from it. Even if consumption happens a few minutes later, that is much easier than producing the data and making multiple copies of it. That's the advantage KubeSlice brings. As I said before, whatever you do, don't do any harm. That means you don't want to lose data, because there are lots of actors working feverishly to steal it. Data is the most important asset for any enterprise, and you want to protect it. You want to securely transport the data across the wire between clusters. There are also legalities in how you distribute it. GDPR is one example: if you're using personal information, you don't want it to leave the European Union and be processed in an American region, so you want to conform to the compliance rules of particular regions and protect the data there. Those are all important factors. You want to see how topics are published, how partitions are established, and how each broker is serving that data. Once again, failover is paramount.
From a customer experience standpoint you don't want failures, so you want a failover solution that gives you the benefit of continuous service, what we call in the telco world five nines availability. 99.999% availability is important. Another thing is copying data. Every time you make a copy, consistency is at risk, because if you modify one copy you need to go and modify every other copy you have. So how do you reduce the number of copies? I'm not saying you can always avoid copying, but you need to make sure you reduce the number of copies. That's an important thing to keep in mind in your system design.
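As a quick worked example of what the availability figure means in practice, the short Python calculation below converts an availability percentage into the downtime it allows per year. The function name is made up for this sketch, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% availability -> {allowed_downtime_minutes(pct):.2f} minutes of downtime per year")
```

At 99.999% the budget works out to a little over five minutes per year, which is why failover has to be automatic rather than manual.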
Olyvia:
Just to reinforce the way KubeSlice helps, Prasad: because we can stretch the cluster, you don't need to keep copying the data into every cluster. You can keep it in one location and the consumers can still access it.
Prasad:
Yes, any questions?
Kunal:
Yes, plenty of questions. I've collected the questions folks have asked, and I'll put them all on the screen now.
Prasad:
Alright. You asked a question: what monitoring framework do we have in KubeSlice? I set up a demonstration across different systems. I created a slice called Kafka slice, which spans AWS, GCP, and Azure. If you look at it, it shows you the health of the communication across clusters. It shows where the gateways are, which services are imported, how many clusters there are, and how many namespaces are onboarded. It also gives you an overall dashboard. From a metrics standpoint, you see how it is performing on latency and how resources are utilized across the different clusters. You can drill down per cluster or per namespace, and you can look at three hours, five hours, or other time windows. How many nodes are utilized? What is the CPU utilization? We also have a single pane of glass showing how each and every cluster is doing, so you get the health and visibility for the infrastructure under your Kafka clusters. This is infrastructure centric; the Kafka-level monitoring you do yourselves is beyond the scope of KubeSlice.
What is the significance of replication in Kafka? Kafka is an open source project. Like Kubernetes, where everybody is focused on a single cluster, Kafka has the same problem. One thing I wanted to share about replication: in Kafka, every message has an offset, which records where you are reading from and which messages you have already read. You can always copy all the files from one cluster to another, but every message has an index that records where it sits and what time frame it came from. It's a FIFO, first in first out, kind of queue, and in a plain copy some of that position information is lost. Confluent and other companies have come up with different ways of addressing this. If you take plain open source there are deficiencies, but managed services from different vendors give you the ability to maintain all that indexing. You will end up making copies in some sense, depending on the application, but to keep consistency you need additional tools.
Sometimes Kafka producers and consumers stall. Can KubeSlice handle this issue? KubeSlice is an infrastructure component that gives brokers the ability to communicate. The first thing I would say is that we know what we do best. If the problem is in the application that is stalling, or if it is an index problem, you need to fix it on the Kafka side. What we give you is reliable communication without glitches, so you should see fewer of these stalling issues. At the end of the day, if there are problems inside the applications, we cannot protect against that.
What are the key considerations when selecting multiple cloud providers for deploying Kafka brokers, and does KubeSlice help manage these considerations? The advantage with KubeSlice is that you figure out which cloud provider you want, and we enable that. Think about cloud providers in several respects. When I was running a large SaaS platform, we had something called an enterprise discount plan, so how I consume is one important factor. Nowadays the clouds are almost at parity: everybody has managed services and everything seems easy in many ways, but cost is a factor. Beyond cost, you might have specialized workloads. Let's say I use a lot of Oracle, so I want to use Oracle compute rather than Google. If I am specialized in machine learning applications, I may choose Google. For more generic compute I might use AWS, and for more Microsoft-centric applications I might use Azure. If you are in that kind of situation, KubeSlice gives you a virtual cluster across all these multi-cloud environments. We are agnostic about the Kubernetes distribution and agnostic about the clouds, so you can focus on your business logic when deciding which cloud you want.
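The offset problem mentioned here can be shown with a small Python sketch. It is a toy model, assuming a hypothetical `Log` class and a mirror that starts copying partway through the source log; it is not how any particular replication tool works, only why some mapping between source and mirror offsets is needed.

```python
class Log:
    """Toy append-only log; a record's offset is its position in the list."""

    def __init__(self) -> None:
        self.records: list[str] = []

    def append(self, value: str) -> int:
        self.records.append(value)
        return len(self.records) - 1


source = Log()
for value in ["m0", "m1", "m2", "m3", "m4"]:
    source.append(value)

# Assume the mirror only starts copying at source offset 2
# (for example, older records had already expired).
mirror = Log()
offset_map: dict[int, int] = {}
for src_offset in range(2, 5):
    offset_map[src_offset] = mirror.append(source.records[src_offset])

# A consumer on the source cluster is about to read offset 4 ("m4").
next_to_read_on_source = 4

# Reusing that number on the mirror would point past the end of its log.
naive_offset_is_valid = next_to_read_on_source < len(mirror.records)

# Translating through the map finds the same message on the mirror.
translated = offset_map[next_to_read_on_source]
same_message = mirror.records[translated] == source.records[next_to_read_on_source]
```

The same message lives at a different offset in each cluster, so a consumer that fails over needs its position translated rather than copied as-is.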
Olyvia:
So there can be scenarios where you have a Kafka ecosystem in one cloud for one set of applications, another in GCP for AI-related workloads like you mentioned, or maybe others in Oracle, and they all communicate with each other?
Prasad:
One use case we see more of is IoT devices out at the edge. They are producing or consuming data, Kafka is running in a public cloud, and the edge is managed by a different provider. The question there is how we make sure that consumers and producers can talk to the broker seamlessly, and that's where KubeSlice makes a big difference.
Olyvia:
Excellent, that's a clear multi-cloud scenario.
Prasad:
What strategies can be employed to ensure high availability and fault tolerance when running Kafka brokers across multi-cloud. So there are different patterns, one is you can do a stretch cluster: where you can have distribution of brokers in different places and then have brokers subscribe to the same topic so that you can have the topic in multiple places. Or you can kind of eventually have a consistent thing where you can say broker number one has the topic and broker number two is the replica of the broker number one. And so when you write a producer can do two different things; one is you can say I write it you take care of the replication or I write it and then make sure when you complete the replication let me know so that I can proceed further. So there are different mechanisms which Kafka enables you to have the absolute availability you wanted and fault tolerance you would want. KubeSlice gives you that ability to connect all those brokers together so that you can reduce the time it takes to actually respond in between these transactions so that's an advantage with KubeSlice.
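The two producer behaviors described here correspond to Kafka's producer `acks` setting, which accepts `0`, `1`, and `all`. The Python function below is a simplified model of when a write counts as acknowledged under each value; the function name and its boolean inputs are assumptions for illustration, and it ignores related broker settings such as the minimum number of in-sync replicas.

```python
def write_acknowledged(acks: str, leader_ok: bool, in_sync_followers_ok: list[bool]) -> bool:
    """Model when a producer treats a write as acknowledged.

    acks="0":   fire and forget, the producer does not wait for any broker.
    acks="1":   wait for the partition leader only.
    acks="all": wait for the leader and every in-sync replica.
    """
    if acks == "0":
        return True
    if acks == "1":
        return leader_ok
    if acks == "all":
        return leader_ok and all(in_sync_followers_ok)
    raise ValueError(f"unsupported acks value: {acks!r}")


# "I write it, you take care of the replication."
leader_only = write_acknowledged("1", leader_ok=True, in_sync_followers_ok=[False])

# "I write it, and let me know when the replication is complete."
fully_replicated = write_acknowledged("all", leader_ok=True, in_sync_followers_ok=[True, True])
one_replica_lagging = write_acknowledged("all", leader_ok=True, in_sync_followers_ok=[True, False])
```

The stricter the setting, the more the producer's latency depends on the network between brokers, which is where cross-cluster connectivity matters.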
The question is: are there similarities between Strimzi and KubeSlice in terms of Kafka cluster management? We are not Kafka cluster management; we are Kubernetes cluster management. You deploy Kafka on top of a construct called a slice, which gives you a virtual Kubernetes cluster so that you can deploy a Kafka cluster on top.
Yes, Apache ZooKeeper is the most commonly used configuration management service for Kafka clusters.
How does data replication and synchronization work between Kafka brokers in different cloud environments, and are there performance implications to consider? It's an important question. With KubeSlice, we give virtual clusters across multiple Kubernetes distributions, be it in public clouds or on-prem; I'll stop there. Let me answer what the performance implications are. They are twofold: there is synchronous replication across brokers and there is asynchronous replication across brokers. Say one cluster is in US East and another in US West. Even at the speed of light, transporting data there takes 60 or 70 milliseconds. That is what data needs to move from US East to US West, because of the distance itself: the bits have to flow through the fiber from one area to another. If you take a flight from Delhi to Kanyakumari, that's the distance you need to travel, and it is the same for data. That's the physical limitation of transporting data from one area to another. Now, if your application cannot tolerate that kind of lag between regions, you do synchronous replication. Whereas if your application says "I don't care, as long as I can consume something within a minute of somebody producing it", then you can use asynchronous replication and consume it that way; you don't have to have synchronous replication. That's the kind of thing you need to consider when you are designing the system.
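To make the distance argument concrete, here is a rough back-of-the-envelope sketch of propagation delay over fiber. The numbers are assumptions rather than figures from the webinar: a refractive index of about 1.47 for optical fiber and a straight-line coast-to-coast distance of about 4,000 km. Real fiber routes are longer and add switching and queuing delay, so measured round trips come out higher than this lower bound.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.47       # assumed typical value for optical fiber


def one_way_delay_ms(distance_km: float) -> float:
    """Lower bound on one-way propagation delay over fiber, in milliseconds."""
    speed_in_fiber_km_s = SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX
    return distance_km / speed_in_fiber_km_s * 1000.0


def round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time: there and back over the same path."""
    return 2 * one_way_delay_ms(distance_km)


coast_to_coast_km = 4_000  # assumed straight-line distance, US East to US West
print(f"one way:    {one_way_delay_ms(coast_to_coast_km):.1f} ms")
print(f"round trip: {round_trip_ms(coast_to_coast_km):.1f} ms")
```

With synchronous replication, every acknowledged write pays at least one such round trip, which is why the choice between synchronous and asynchronous replication depends on how much delay the application can accept.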
If you're able to seamlessly deploy a Kafka cluster on top of the Kubernetes cluster, this is truly a game changer. Okay, thank you.
What are the cost implications of deploying Kafka brokers in multi-cloud, and are there strategies to optimize cost while maintaining performance and reliability? At the end of the day, how many brokers do you want to have? How many replicas do you want to have? Most importantly, a broker's dimensions are based on how many topics you have, how much data you are consuming, and how many partitions you need. It's not a straightforward answer; it depends on your data behavior. If you have lots of topics and lots of partitions, and you want to maintain consistency across all the clusters, then you have to size each of the brokers accordingly, and that defines your strategy. Importantly, the cost implications come down to your instance size and your retention. How big is your instance? How long do you retain data? How many topics are you going to keep, and for how many days, or is it just a day? Those are all the factors that go into defining the cost implications.
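As a rough illustration of how retention and replica count drive storage, and therefore cost, here is a simple sizing sketch. The function names and example numbers are hypothetical, and the estimate deliberately ignores compression, index overhead, and free-space headroom.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def retained_storage_gb(ingest_mb_per_s: float,
                        retention_days: float,
                        replication_factor: int) -> float:
    """Approximate disk needed cluster-wide to hold the retention window."""
    total_mb = ingest_mb_per_s * retention_days * SECONDS_PER_DAY * replication_factor
    return total_mb / 1000.0  # decimal gigabytes


def storage_per_broker_gb(total_gb: float, broker_count: int) -> float:
    """Even split across brokers, assuming partitions are balanced."""
    return total_gb / broker_count


# Example (made-up workload): 5 MB/s ingest, 7 days retention, 3 replicas, 6 brokers.
total = retained_storage_gb(5, 7, 3)
print(f"cluster total: {total:.0f} GB")
print(f"per broker:    {storage_per_broker_gb(total, 6):.0f} GB")
```

Halving retention or dropping from three replicas to two reduces the storage bill proportionally, which is the kind of trade-off against reliability described above.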
Olyvia:
In the context of shared infrastructure, in that multi-tenancy slide, you had different tenants that could also save certain costs.
Prasad:
The advantage of KubeSlice is that you're creating a virtual cluster and sharing it across multiple physical clusters. You can use the physical clusters for something else as well, and still get the isolation and protection you need. That way you don't have to create a dedicated physical Kubernetes cluster for Kafka and another cluster for other workloads. By running this virtual cluster across physical clusters, you reduce the cost as well.
As cloud technologies and Kafka itself evolve, how does KubeSlice adapt to new features and changes in cloud providers' offerings? We are constantly evolving. Currently there is so much technology coming in and there are different use cases, so we are always looking for where we can add value from an innovation standpoint. When we talk about distribution of workloads, we have an RL, a reinforcement learning, way of deciding where the workload needs to be deployed. We are constantly evolving to figure out the best way, from a machine learning standpoint, to understand the behavior of an application and then scale it based on that. If you go to our website, avesha.io, there are articles about something called Smart Scaler. You have all heard of ChatGPT, which uses PPO, proximal policy optimization, a reinforcement learning method. We understand the application behavior and scale applications correspondingly, so that's where we are innovating more: in how we introduce reinforcement learning into application behaviors.
Kunal:
We have a lot more questions, but we're about out of time, almost at an hour, and I'm being conscious of everyone's time here as well. Everyone who registered, I will email you the Slack invite so you can join there and keep asking your questions. I know we could be here for hours; people are asking so many nice questions. You can join the Kubernetes Slack. In there you can find the KubeSlice channel, and you can find Prasad and Olyvia there as well, and I'm there too. I'll share all the resources in the follow-up emails to everyone, and thanks a lot for joining and for the nice questions. I'll also leave the links to all the resources in the description below. If you want to learn more about KubeSlice, I made a video recently, so you can check that out. I also did a blog post and a case study; I'll leave them in the description below so you can check them out.
Prasad:
Thanks Kunal, you have a great following and we really appreciate your effort.
Kunal:
Alright, thank you. Thanks everyone, thanks Prasad and Olyvia for joining.
Olyvia:
Thank you. Always appreciate it.
Prasad:
Thank you.