Counting Sand

Kafka Event Streaming Part 1

Episode Summary

On the cutting edge of modern architectures lies event sourcing and streaming. But how do you do it right? Angelo is joined by fellow Kafka expert, Anna McDonald, the Principal Success Technical Architect at Confluent.

Episode Notes

Is Kafka a one-size-fits-all solution? Or does this event streaming platform have an inherent set of strengths? Join Angelo and Kafka guru Anna McDonald as they share use cases and swap stories about how Kafka has radically changed the field of computer science.

 

In a time crunch? Check out the timestamps below:

[00:54] - How did Kafka change the world?

[04:40] - What is so great about big data technology?

[07:00] - Outbox pattern 101

[10:45] - Clinical decision support use case

[13:05] - Should I build it or buy it?

[17:05] - Is Kafka a one-size-fits-all for businesses?

[21:55] - Kafka tuning 101

[25:53] - A.I. for Kafka tuning

 

Helpful links:

https://www.confluent.io/

https://www.youtube.com/channel/UC37UjjtsxpZWS_0QGPKEHdA

 

Our Team:

Host: Angelo Kastroulis

Executive Producer: Náture Kastroulis

Producer: Albert Perrotta

Communications Strategist: Albert Perrotta

Audio Engineer: Ryan Thompson

Music: All Things Grow by Oliver Worth

Episode Transcription

Angelo: On the cutting edge of modern architectures lies event sourcing and streaming. But how do you do it right? Today I'm joined by fellow Kafka expert, actually, she's a guru, Anna McDonald, the Principal Success Technical Architect at Confluent. And a friend of mine, I might add. I'm your host, Angelo Kastroulis, and this is Counting Sand. So we've known each other for a while. We met back at the beginning of COVID, when it didn't matter where you went to a meetup. And the first meetup I think we went to was, well, hosted in Australia. Anna: Yes, it was in Australia. Angelo: Yeah. The Perth meetup. We've known each other mainly through Kafka, our different angles on it. I would love to know your thoughts on: what did Kafka bring to the world that is new and novel, that wasn't there before? Anna: Yeah. So for me, the primary thing about Kafka, and the reason why I actually poked my head up and was like, wait, is this a messaging system that I don't think is trash? Because I've never been a fan of messaging, ever, ever, ever. And it's because it's a durable log. It was because the data didn't go anywhere. And you didn't have this whole broadcast pattern where you have to set up individual queues and there's all of this orchestration overhead, just yada yada yada. As soon as I saw that with Kafka, I was like, okay, this is money. That, and then you add on the volume, the sheer volume that Kafka can take at once. And as you know, Angelo, my passion is for event streaming, event patterns. And that opens up a whole new world when you talk about legacy, you know, just all of these events that are trapped in legacy databases. So I saw that as kind of a vehicle for, like, wow, we can actually use CDC, because there's, you know, this beautiful, amazing thing that can take that volume, and it doesn't get rid of the data when someone consumes the message. It was a perfect storm. And so I fell in love with Kafka in, like, 2017 and I've never looked back. Angelo: Nice. When I think about Kafka, what it brought for me was a new way of thinking. I mean, just like you said, there have been events for a really long time, but thinking of the problem and reframing it in a different way was one of the major things it brought. And the other lesson I think I learned from Kafka is that it's a collection of ideas and technology that have been around. You know, Bloom filters have been around, and so has partitioning, and so has replication. That's all been around, but putting it together, this idea of an immutable log that is distributed, and if you could live with that and not query it, you can get really high throughput. And then you kind of put the query on another system that's designed for that. That, I think, is one of the big innovations that Kafka brought. So when I think about streaming, we kind of put this label on it as streaming. How would you define streaming? Anna: Yeah. So I would define streaming, and to be honest, you've hit upon it. So it cracks me up, because when I first introduced this concept, I was like, you know, look, this is real time. The people that had the hardest kind of mental model shift with it were people who had been programming in Java, and actually for us, where I came from, those were the people that got to do the new shiny things. They had the biggest problem with the mental shift. They were like, no, wait, what? I'm like, there is no end. So to me, when we talk about streaming, that sums it up in one word: there is no end.
There is no certainty. This is not the state you're looking for, right? And you have to be comfortable with that when you code, when you develop, when you think about these things. It's continuous. You can't stop to catch your breath. And I think there are a lot of people who have spent a lot of years doing object-oriented development, where they're used to stopping and catching their breath. That doesn't work. And I have a functional brain, and so I took to it, and Kafka Streams, as you know, boom shakalaka, is the love of my life. It's very fluent and beautiful. And I actually find that for legacy developers, by and large, who haven't dived all in, and when I say legacy, I mean like mainframe, you know, SAS shout-out, people who are used to functional programming, streaming actually just comes more naturally to them. Angelo: Yeah, it makes sense. I mean, streaming is a part of the lambda calculus. That idea that you can compute on an infinite stream is very, very interesting. In fact, it kind of brings me to one of the things that I really love about all of the big data technologies, and this isn't a Kafka-only thing. And that is that we move the computation to the data. That's the state of the art. We don't move the data around anymore; we don't shuffle it. Now, that doesn't mean we don't have best-of-breed systems. That makes sense. You have a search technology, you put search data there and you run searches against it. It's faceted. That's what it does. But if you want to run a computation on the data in the stream, put the computation in the stream. That's different than saying, well, let's build a bunch of micro-services and they have separations of concerns, right? Anna: Yeah, but I would caution us not to say either/or in that, because, you know, building compounded streaming aggregates and stuff, you can do both of those. So that's what I always try to say to people who are object-oriented and are like, I don't understand. I'm like, but there's a happy medium. And you just have to kind of change your thinking. Right? And what I mean by that is, you know, what we see a lot of, and I think you've hit upon this perfectly, I always say, get the data into Kafka first. Don't give a crap, just get it into Kafka. Once it's in Kafka, then you have all these possibilities, right? Because again, you're right: your computation, all your transformations, all of those things are done after the fact, right after that initial ingestion, and it makes it so much faster. And because of the way that Kafka works, right? You can have multiple consumer groups, you can have multiple services, all of this stuff running against one durable log. That overhead that would be there, you know, trying to do this pattern in a traditional pub/sub, it's not there. It's not like two copies, right? It's amazing. And I think it leaves open a lot of things, but, you know, you look at something like the outbox pattern, right? That's kind of an example, I think, of marrying kind of traditional aggregate calculations, or saying, look, I've got a sock. I've got, like, all these things that have to be done. How do I marry those two worlds, where at some point I need to take a breath? And I like to say, basically, a commit is the breath of a database, the transaction, right? That's my breath. When my transaction's done, I need to go, ah, it's done. How do I take that feeling to something that's never done, right? There's no end in sight.
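(A quick illustration for readers: below is a minimal Java sketch of the durable-log point Anna is making. Two consumer groups subscribe to the same topic and each independently sees every record; consuming never deletes anything from the log. The broker address and the "orders" topic are assumptions for the example.)

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TwoGroupsOneLog {

    // Each consumer group tracks its own offsets, so every group
    // independently reads the full topic; nothing is "used up".
    static KafkaConsumer<String, String> consumerInGroup(String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("group.id", groupId);
        props.put("auto.offset.reset", "earliest");       // read from the start of the log
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return new KafkaConsumer<>(props);
    }

    public static void main(String[] args) {
        try (KafkaConsumer<String, String> analytics = consumerInGroup("analytics");
             KafkaConsumer<String, String> billing = consumerInGroup("billing")) {
            analytics.subscribe(List.of("orders")); // "orders" is a hypothetical topic
            billing.subscribe(List.of("orders"));

            // Both groups see the same records: the topic is a durable log,
            // not a queue that hands each message to exactly one reader.
            ConsumerRecords<String, String> a = analytics.poll(Duration.ofSeconds(1));
            ConsumerRecords<String, String> b = billing.poll(Duration.ofSeconds(1));
            System.out.printf("analytics saw %d records, billing saw %d%n",
                    a.count(), b.count());
        }
    }
}
```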
And I think there's a lot of room in Kafka, and we're coming up with even better patterns now to help with those, like routing slip patterns, all of these really exciting things we're inventing that allow you to do those infinite streams but still be able to know when it's okay to take a little breath. Angelo: Yeah, that's great. Some of the listeners might not know exactly what you mean when you say the outbox pattern. Can you describe it briefly? Anna: So, it came about a lot with CDC, and I hope I don't butcher this, because I'm more of the non-traditionalist, but the outbox pattern is very important for traditional things. Basically, what you end up doing is, you know, you let your database do the work as you write things down. One of the things you do is triggers, right? So when everything's there, you're actually going to trigger events out, and that's kind of how you do your kind of domain safety, right? Like if you're doing some kind of data-driven development. So you let the database do the work: set up those triggers to throw events when a commit is done across multiple tables. Because I think that's what people worry about. So, I've done this thing in an upstream system. That thing, when it's done, has a commit for, like, five tables. If we use just base CDC, that's going to be five different messages flowing asynchronously. One of the things people do is actually assemble them once they're in Kafka. Or the other thing you can do is wait until that stuff is done in the database, and then trigger an event and say, hey, everything we expected to be done is done. Now trigger an event. And so I think it's using another system at its heart. I probably butchered that; Gunnar will kill me. I love, yeah, he, yeah. But, but... Angelo: No, what do you love? Anna: Yeah, I was just going to say, even more so, what I'm passionate about, I think one of the most difficult problems is when you have steps in order from these huge legacy processes. The routing slip pattern is like my new favorite thing. I tweeted that the other day; I am loving it. Angelo: Tell me something about it. Anna: Okay, so you take, like, a vertical: insurance, right? Insurance, by the way, it's very, very difficult for me to advise people to use actual event sourcing, because it's an insane amount of overhead. Insurance is like a prime case for event sourcing. When you look at the way insurance policies are structured, you have to go back in time and then play everything forward, right? With kind of a new set of rules. Maybe it's a boat. Maybe it's a house. So it's actual time travel, and there is nothing that gives you that like event sourcing. So that's kind of why I really like that vertical, because it's so complex. Whether or not it should be, your opinion on the insurance industry, that's not relevant here. Technically, it's fascinating. So that's why I like it. I'm sure there are ways to disrupt that industry, but it is fascinating to me, so I enjoy it, 'cause it's a hard problem. So a routing slip basically says: I have, you know, depending on maybe what state I'm in, what type of policy, right, what the regulations are, I might have to call ten different steps in a defined order, and I can't progress until the previous one's done. How do I do that in Kafka? How do I do that in real time? How does that work? And usually people look at me like this, because they get very scared. And the routing slip pattern is actually really cool for that.
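(For readers who want the outbox pattern in code, here is one minimal sketch. It writes the outbox row explicitly rather than via the database triggers Anna describes, and it assumes Postgres over JDBC with hypothetical table and column names; a CDC connector such as Debezium would then stream the outbox table into Kafka. The key property is that the business write and its event commit in one transaction. The routing slip mechanics continue below.)

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class OutboxWrite {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; tables "orders" and "outbox" are
        // illustrative. A CDC connector watches "outbox" and publishes to Kafka.
        try (Connection db = DriverManager.getConnection(
                "jdbc:postgresql://localhost/appdb", "app", "secret")) {
            db.setAutoCommit(false);
            try (PreparedStatement order = db.prepareStatement(
                        "INSERT INTO orders(id, total) VALUES (?, ?)");
                 PreparedStatement outbox = db.prepareStatement(
                        "INSERT INTO outbox(aggregate_id, event_type, payload) "
                      + "VALUES (?, ?, ?::jsonb)")) {

                order.setLong(1, 42L);
                order.setBigDecimal(2, new BigDecimal("99.95"));
                order.executeUpdate();

                // The event row rides in the SAME transaction as the business row.
                outbox.setLong(1, 42L);
                outbox.setString(2, "OrderPlaced");
                outbox.setString(3, "{\"orderId\":42,\"total\":99.95}");
                outbox.executeUpdate();

                db.commit(); // the "breath" of the database: both rows, or neither
            } catch (Exception e) {
                db.rollback();
                throw e;
            }
        }
    }
}
```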
And what it does is it basically embeds that whole idea of a routing slip inside of a message. So it says, hey, here are the steps you need to take, here are the topics, right, that you need to go to, and that's generated at the time the initial event is kicked off. And as these steps finish, right, they're checked off the list. So you know what current step you're on, you know where your failure point is, if you need to restart, if you need to use some of our patterns like, basically, re-ingest topics, right? Redo topics. All of that stuff comes naturally with the routing slip pattern, and it allows you to have these very complex orchestrations in real time. And I love it. I'm a huge fan of it. It's event-carried state transfer, like, for routing things. So it's pretty awesome. Angelo: That's interesting. We've done something like that. I wouldn't say it's called that, but there's a use case in healthcare, actually, where you may want to run some kind of clinical decision support rule. Now the event pattern makes sense, because you would run it once, run the computation on it. But you also want to know why you made that decision. That may be a whole different technology, right? Or a different streaming application that then runs something similar orthogonally on the same data from the beginning of the log. And it says, oh, it's because of this, you know, I got this condition or this medication that came in; that's what triggered this rule. So that's interesting. But thinking about this routing slip pattern, there is another side of clinical decision support rules, and that is, some rules are temporal. So by the time you look at them, it makes it a challenge for precomputing. So if I could put data in a stream and then run a rule, the problem is the rule might be over the last 30 days. I can compute it at the time it comes in, but I can't compute it at the time someone looks at it in a downstream system. So when they look at it, it's almost like quantum computing, right? You open the box and it looks different. So the temporal aspect, then, would fit. Anna: Yeah. You know what you do for that? I would do a pointer pattern. Anything that's that shaky, that changes, like, month by month as to what you should do? Do a pointer pattern in your routing slip and then go look it up at compute time. You'd have to. Angelo: Yes, that's exactly what we did. Anna: I love it. I love that. Angelo: So we computed the parts we could. Yeah, we punted the temporal aspect, put in a location, a pointer to where it is, and then later on you pull it out. The interesting thing about it is, for example, with Drools, you know, if you want to run a rules engine like Drools, if you Google this there are like 40,000 papers on it. It's the memory explosion problem. The more stuff you put in it, the more memory it needs, until there's not enough memory even possible in the universe to fit in a box. So you have to, like, shard your rules or something. We built a super fast computation engine, but what is even more interesting is when you put that on a stream. Now when someone looks at it later, it's a simple computation. So it's ultra fast. It's like microseconds fast. Because you did the heavy lifting early. So that's really cool. That's one of the things I like about these kinds of patterns, so that's pretty awesome.
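(Another aside: a minimal sketch of the routing slip idea described above. The message carries its own ordered list of topics to visit plus a cursor, so you always know the current step and the restart point. Class and field names are illustrative, not a standard API; serialization and the topic names are assumptions. A pointer-pattern field for Angelo's temporal rules would similarly be a reference resolved at compute time rather than a precomputed value.)

```java
import java.util.List;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// The slip travels inside the message: an ordered list of topics (steps)
// and a cursor marking the first unfinished one.
public class RoutingSlip {
    final List<String> steps; // topics to visit, in the required order
    int nextStep;             // index of the first unfinished step
    final String payload;     // the business event itself

    RoutingSlip(List<String> steps, String payload) {
        this.steps = steps;
        this.nextStep = 0;
        this.payload = payload;
    }

    boolean done() {
        return nextStep >= steps.size();
    }

    // When a service finishes its step, it checks the step off and forwards
    // the slip to the next topic. On failure you know exactly where to restart.
    void completeStepAndForward(KafkaProducer<String, String> producer) {
        nextStep++;
        if (!done()) {
            producer.send(new ProducerRecord<>(steps.get(nextStep), payload));
        }
    }

    public static void main(String[] args) {
        // e.g. an insurance policy that must pass underwriting, then
        // compliance, then rating, in that order
        RoutingSlip slip = new RoutingSlip(
                List.of("underwriting", "compliance", "rating"),
                "{\"policyId\":7}");
        System.out.println("first stop: " + slip.steps.get(slip.nextStep));
    }
}
```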
One of the things I think that we run into a lot, and I'm sure you do too, is the "should I build it myself or should I buy something?" kind of concern, you know, where they might say, well, we're still looking for something off the shelf to pull in. And that applies to patterns too, right? They might be looking at just shoehorning something into the wrong pattern, just because this is what we have and this is what we do, but it takes a bit of courage to do something different. Do you have any thoughts on that? Anna: I do. Yeah. So, I think one of the things that I don't like, and I've never liked in development practices, and I don't know why people still do this: people tend to assume, right? We are going to do this one thing now, and it is going to have to be so flexible and it is going to have to be so generic, and by the time they're halfway done architecting it, the whole thing has changed anyway, but not in a way that that would have covered at all. What they could have done is just written something that was easy to maintain, on the premise that they're going to have to change it. And so my thing is, what I always tell people and my customers is: get your feet wet first. Just like you don't want to remodel a house before you live in it. If you're uncomfortable with this architecture, right, why buy a full-blown solution that might do what you need it to, or might not? Instead, start small. I think at some point, if it's an entire, like, don't build your own Pega. I think, you know, something like a Pega or something that's a domain-specific entity, if that's your bread and butter, go for it. Connect is another good one, right? You certainly could write a custom, you know, consumer-and-producer micro-service that would do some of the stuff Connect does, but why? You know, there's a lot of value in getting that support from the community, getting those examples. So I think you have to have a really, really good idea of what you're trying to achieve and what the lifespan of it is. And also, do your research. One of the things I love is researching new libraries. Just because something is the shiniest or has the best webpage has absolutely no bearing on how good it is. I just want to give a little shout-out: there's a group called bakdata, and they have a great GitHub repo. If you've never seen it, they create some of the most useful utilities I've ever seen, and I adore them. They have a fluent test library for Kafka Streams, for example, that's beautiful and wonderful. They also have one of the best examples of the claim check pattern, where they put it in a serializer/deserializer format. And you just point to an S3 bucket, so you don't carry around 54 MB messages. These are small GitHub repos. They're brilliant, right? There's no huge marketing, you know, or if they have it I haven't seen it; maybe I just spend too much time on GitHub. But, you know, I tell people about them all the time, because they're fantastic. And the way they're coded is really solid, too. I'm a huge fan. So I would just say, give people the time to do research, right? Let your people find the best fit for you. Angelo: I couldn't agree more. I think that one of the things that we do is we bring research-driven development to the table. You know, go read the papers. There's a paper written on everything, and there's always this new knowledge that comes in. Go read the new knowledge, implement it, and then you can kind of bring in some new stuff.
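(One more sketch for readers: the claim check idea Anna credits to bakdata, hand-rolled here rather than using their actual library, whose API may differ. A Kafka serializer/deserializer pair parks the heavy payload in S3 and ships only a pointer on the topic. The bucket name is hypothetical and error handling is omitted.)

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.UUID;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serializer;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

// Claim check as a serde: the record on the topic is only a pointer
// (an S3 key); the heavy payload lives in the bucket.
public class ClaimCheckSerde implements Serializer<byte[]>, Deserializer<byte[]> {
    private static final String BUCKET = "my-claim-check-bucket"; // hypothetical
    private final S3Client s3 = S3Client.create();

    @Override
    public byte[] serialize(String topic, byte[] payload) {
        String key = topic + "/" + UUID.randomUUID();
        s3.putObject(PutObjectRequest.builder().bucket(BUCKET).key(key).build(),
                RequestBody.fromBytes(payload));
        return key.getBytes(StandardCharsets.UTF_8); // ship the pointer, not 54 MB
    }

    @Override
    public byte[] deserialize(String topic, byte[] pointer) {
        String key = new String(pointer, StandardCharsets.UTF_8);
        return s3.getObjectAsBytes(
                GetObjectRequest.builder().bucket(BUCKET).key(key).build())
                .asByteArray();
    }

    // Serializer and Deserializer both declare these defaults, so a class
    // implementing both must override them explicitly.
    @Override
    public void configure(Map<String, ?> configs, boolean isKey) { }

    @Override
    public void close() {
        s3.close();
    }
}
```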
And I do agree, you don't want to just haphazardly implement everything or rewrite everything or have tons and tons of code. Code is a liability. The more code you have, really, the more danger you're in. So you want to try to keep factoring away the code to get the smallest footprint possible, but there are some restrictions you should think about. If what you're doing is too slow, well, then you have to build it. I mean, if performance is a requirement and you can't get it, don't keep forcing this particular pattern or this particular technology. Do something different. Anna: Absolutely. I would say we hit it off because we both love research. I think that's probably why we got along from day one. Angelo: Yeah. Yeah. I think you even introduced me to some folks on the steering committee at Kafka and said, hey, you know, he's doing some cool stuff with KStreams, you guys should get together. So I agree. I think that's the really interesting piece. And then you mentioned earlier about doing proof of concepts, you know, doing little POCs, doing small things to get wins. And then you start getting these wins and they compound. I also agree, if you go crazy, kind of just saying, well, let's modernize everything, and everything is going to be this event-driven micro-service architecture, I have to always say: everything? I mean, do you really think that all of your business fits into one thing? Not often, right? I mean, it's kind of rare that that one solution is going to be the magic bullet. Anna: I think that leads to failure, absolutely, 100% of the time, because guess what? You still have a business to run, right? While you're trying to do this, are you going to shut it down and cut over everything at once? No. What you need to do is figure out what your highest-value use case is that's doable and achievable, and how to implement that in such a way, if possible, that makes future use cases easier. That's why I always say, go big or go home. Maybe right now you only have a use case for this small subset of data that's coming out of your database. Hook it all up; you gotta do it anyway. And then the next time, when you're ready for something else, that data is already there. Like you said, build confidence and then go forth. Angelo: And that's the other thing I think about, the world we live in now with machine learning and artificial intelligence. One thing we've learned is: capture everything, and we'll figure out how to analyze it later. And I think that if we have a pattern like that, where you're just capturing the data, you can replay these messages again with a new and updated consumer or stream application with a new and updated schema, replay them onto some new stream, and guess what? You have all this historical data, even though you were throwing it away and ignoring it before. That's okay; it was still embedded in the source of truth. And so, that kind of pattern, I wanted to ask you about, because in the old days, meaning not even a decade ago, the state of the art was this Lambda architecture, where you have a batch layer and a streaming layer, and Kafka comes on the scene and it makes total sense: it fits in the streaming layer. But now what's preventing us, I mean, these are realizations you're making, what's preventing us from just putting everything in there, treating it as a source of truth, and then making it very long-lived? Anna: Yeah. And so I think the huge thing for me: until we had tiered storage, it was a performance concern.
You could not colocate real-time and batch data, where someone would come in and read from the beginning, because you'd blow out your page cache and dump your SLA. With tiered storage, it's a network read, so it's outside of the page cache. So you have all your happy, you know, real-time data stuff going, right? And then you can also pull that historical data without dumping anything, right? Without, like, ruining your SLA for those cases. I think that, to me, was huge. And also, again, just being able to store that quantity of data without having to store it on expensive disks, because money is a factor here. And so now that you can offload that stuff to something that's economically feasible and it's not going to blow out your SLA, there's nothing stopping you. And in fact, I would wager that since Kafka is one of the easiest ways to get data from A to B, I don't see a reason why not. Not that it's a great fit for every single person's infrastructure today. Some people do use it just for data distribution. But I think, with, you know, the things that I've mentioned, there is absolutely no reason for it not to be the total source of truth. And it's like you said, Angelo: I don't want people to think I'm saying Kafka's a silver bullet for everything. It is not. Absolutely not. You let things do what they're best at. So, like, if you need a super fast cache for serving up an amazing burst of, you know, 50 or 70,000? No, no, not Kafka, right? You're going to put that where it belongs, in a cache. But in this case, I think Kafka is the perfect place for that kind of thing. Angelo: Yeah. I think, the way that I see it, these technologies have their own strengths. They're really good at certain things, and you use them for what they're good at, right? I mean, it's a no-brainer. If you're going to do faceted search, use a search technology. If you're going to do caching, you use a cache technology. If you're going to do ad-hoc queries, you use a technology that's for that. All that makes sense. But what's interesting to me is that the use case for Kafka is widening, because its capabilities, like you said, tiered storage, just widened the use case. Again, it's not the right brush for every painting, but it's able to do more things where you would have normally said, well, since we want to keep it small, let's set a small duration for how much of the log it's responsible for, and we'll offload the rest to some long-term storage. Now you can look at it and say, well, I don't need that other piece. I'm not really changing the use case a bit; I'm saying I'm going to push some of that into this now. And so there are fewer moving parts, less liability, you know; it all kind of became a little easier to put all that together. The one thing that, and this is where a lot of my passion is, is in tuning these things. So Kafka has a lot of knobs, right? Do you have any thoughts on that? Can you talk a little bit about tuning Kafka and all of the knobs we have to turn? Anna: Yeah, well, you and I have talked about this at length, right? I think there's not a person alive who has spent any sort of significant time tuning Kafka who thinks, again, that it's like, yeah, we're done now, we'll never touch it again. And that's the challenge. I mean, just like anything else, your profile changes. Multi-tenancy throws a huge wrench in there. Colocating use cases. How many partitions do I have? What is it?
You know, there are just so many things that you need to tune differently depending on your cluster load that I think this is another case where you really need some intelligence there. Right? You really need to kind of look at that. And there are things that do that now. There are algorithmic things in Kafka that will tune various things, right? Like SBC, or, what's the other one, the LinkedIn one that we have? Cruise Control. There it is. I knew it was there. So there are these things that will do those types of things, but they're very much in their infancy, right? They are not on an application level. They're not on an actual broker-setting level. And I think that's kind of where we need to go in the future. And I know that's where we need to go, because I see a ton of people who do things where, currently, I always go like this: let's not increase and decrease partitions on the fly. There's gotta be a better way. First of all, it doesn't work if you care about ordering on keys. That's not a way to scale for a good majority of use cases. Second of all, right now you pay a big penalty, because you need a metadata refresh to both produce and consume, right? So if you're looking to scale up in an instant or scale down in an instant, you're not getting it. So it's not a perfect solution. That, to me, comes down to client modes and tuning. Let's make the clients themselves respond to that throughput increase, right? As opposed to putting it off on a broker. Angelo: Yeah, no, I'm glad you mentioned that. There are lots of things like that. I mean, that's our first inclination, to say, well, let's tune the partitions, 'cause that's the unit of scale. So let's see if we could tune that. But the reality is there's a lot more to tune than that. I mean, how many brokers should you have, and how does a producer interact with it, and how does a consumer interact with it? And in fact, the issue that the Apache foundation chatted with me about, the one that you got us in touch with, was the idea of how do you tune the workloads in KStreams? Anna: Absolutely. How do you have the ideal task workload? Angelo: I thought that would be easier than it is, and I've been spending a year now researching it. It's really hard. I think there are two reasons that make it really hard. The idea of stickiness is good: having a little bit of stickiness to your application so that it has some affinity to where the data is, because you might just be bouncing the box or something. But in reality, it's much more complicated than that. You have to be aware of where the state is in a cluster. And then you have to have an idea of what that state feels like. Not just like, oh, these keys are over here. No, it's not that. Anna: It's not binary. Angelo: Well, how much is over on that box, right? And then here's another thought: what are you doing with it? KStreams applications are extremely complicated because you don't know what they're doing with it. Yes, you can have some idea of the topology, but you have no idea how computationally heavy any of the work being done in there is. And you might say, well, I'm going to bring this one online and assign it this work. But unlike something like Spark, which knows exactly what the computation is, you don't. And so you bring up a KStreams application, a streaming app, and you say, oh, it turns out that's a light one. I should've put it on the light box. This is a heavy one.
I should have put it on a more powerful machine, and I could have tuned that, maybe. So these are really hard problems. And so when you think about the knobs, there are two approaches, I think. One is to try to find all the little settings and try every combination, just get some experience, and then go by your rules of thumb, and then test. The other one is: what if we algorithmically replaced that whole thing? Because the knob is just some abstraction of some algorithm underneath. That feels like a really good place for AI. What do you think about that? Anna: I mean, first of all, I 100% agree, and I will say that just like streaming makes sense to me because it's never-ending and infinite, tuning, in this way, makes sense to me because it's never-ending and infinite. There's no state, there's no boring, this-is-the-way-it-is, this-is-the-way-it's-always-going-to-be. It's always changing, which I think is why I'm so happy as a person in my work, because it's never going to be anything stable that you can sit on. You can never take a breath. You don't know what's going on, and you have to plan for anything in a way that is optimal. And those are so interesting: where you have pretty much no assurances, can make no assumptions, and you still have to go in there and kick butt. That, to me, is my favorite type of problem space. Well, we can make it work if this, this, and this. Why don't we make it work no matter what this, this, and this is? Let's do that one. I like that one. And that's the second part of what you're talking about. That's the part where we go in and we go, as long as our, you know, assumptions about this, this… we're not going to do that. We're not going to assume anything. Right? We're going to go in and we're going to start. And I'm a math major, you know that too. So, like, obviously anything algorithmic is like, yay, happy place. But I do think we need to do that. And in order to do that, we need to make sure, you know, as, like, Apache contributors, as people who are working in these projects, that we have a commitment to observability. We need to be able to observe what's going on in order to be able to determine, once you have those algorithmic things, right, what to do and what the benefit is. And we have done that in Kafka Streams. There's been a lot more metrics that have gone in, which, awesome sauce. But you've got to have observability; otherwise this type of tuning will never work, because what's going on is never the same. So you've got to be able to observe it at every point in time. Angelo: And I like that you can influence it. Now, that's not for everybody, but you could write your own kinds of components that balance things, right? Like, that's how the sticky, you know, affinity kind of came about. You can create your own if you happen to know more about your setup. For example, with certain kinds of data, like healthcare data, you know things about this data that no system could know, because certain things always come in pairs or certain things are always kind of related. You could build on that, since you know a little bit about it. You can build your own things. Now again, that's not for everybody, because that's a pretty advanced level of knowledge of the inner workings of the pieces. But that, to me, is where AI can really play a part, because, look, AI is really good at taking big, complicated things, and if we capture a lot of metrics about them, it can start seeing patterns. Like, you change the data.
That's something nobody ever does. You said this earlier about tuning Kafka, that it's a constant job. It is, because your data changes; the rate of your data coming in changes. You should rethink things a little bit, but nobody really does that, because we're too busy. It's like the old days with SQL-based databases, right? You go, okay, we'll put an index on it. And then you don't think about it again until the latency has slipped so far that you don't meet your SLA anymore. Anna: I think that it's very exciting, right? And, just full disclosure, this is my second job of all time. I worked at SAS for 16 years, and I loved it, and it was great there. It was just that I needed a change. And one of the reasons why I went to Confluent is because they never rest. It's never good enough. And the first things that we see where there is this intelligent, algorithmic stuff, it's ops level, right? No one's doing apps yet, but in terms of brokers, you know, self-balancing clusters. It's algorithmically looking at this and doing exactly what we're talking about. It's in its infancy, but it's there. Right? And we run our entire cloud on it. So it's in production. Very stable. We run everything on it, and I'm excited because, I think, you and I... and not that I don't do ops now, because I do. I used to have to cheat to do ops, because I was, you know, a principal software developer, and now everyone's like, yeah, do it. So I've had fun doing ops, but my heart is with event streaming and applications and Kafka Streams. And I'm excited for us to try to look at this stuff, especially for Apache projects and things that will help people. An example: you know, I always feel good about open-source projects that are used for good reasons. They used one to help with small loans in Africa. They used Kafka to do that. That was the only thing that worked at scale for them. I think these types of things, where you don't have to be an expert and it'll tune automatically and make it work for a use case, that's going to help people who don't have the time and resources. Those are the people that we really do want to help. So I think there's a lot of passion I have for things that make it easier for people as well. Angelo: I think this is one of the reasons why we've been such good friends. We think so alike. I mean, I also have this same passion for systems, right? Where they tune themselves. I mean, back when I was working in the Harvard Data Systems Lab, we studied auto-adjusting databases that can change and arrange themselves. Now that's even come a long way from just the stuff I studied. I studied joins and scans. I mean, there's so much more to it. There's arrangement of data and all kinds of things. But the other side of it is, okay, why are you doing this? It is neat to solve the engineering problem, but ultimately there's a whole lot of good you can do for the world by putting together these technologies. Anna, thanks so much for joining us. This is only part one of our conversation. We'll have part two next time. I'm your host, Angelo Kastroulis, and this has been Counting Sand. Before you go, please take a moment to like, review, and subscribe to the podcast. We're also going to launch the YouTube channel we mentioned, so keep an eye out for that. You can follow us on social media. You can find me on LinkedIn at AngeloK1. Feel free to reach out to me or start a conversation. You can also follow me on Twitter; my handle is AngeloKastr. Or you can follow my company, Ballista Group.
You can also find the show on countingsandshow.com or your favorite podcast platform.