Counting Sand

Kafka Event Streaming Part 2

Episode Summary

Angelo welcomes back a friend and fellow Kafka enthusiast, Anna McDonald. Join them as they continue their discussion around Kafka event streaming, introduce a new pattern (CQRS), and discuss what they look for when hiring the right development team.

Episode Notes

Our Team:

Host: Angelo Kastroulis

Executive Producer: Náture Kastroulis

Producer: Albert Perrotta

Communications Strategist: Albert Perrotta

Audio Engineer: Ryan Thompson

Music: All Things Grow by Oliver Worth

Episode Transcription

Angelo: Today, I'm happy to welcome back a friend of mine and fellow Kafka enthusiast, Anna McDonald, who was on the previous episode. We're going to talk a bit more about Kafka, how we hire people, and a pattern you should know about called CQRS. I'm your host, Angelo Kastroulis, and this is Counting Sand. As I was mentioning, I do a lot in health care. It's just a natural place where we can do a ton of good. Thinking about that, one theme has been on my mind for a really long time: how do we entice new people to enter computer science? Do you have any thoughts on that?

Anna: Yeah, I think we had a discussion about this, right? When I look at my career and at the demographics, it wasn't a pipeline problem. It's that people were jerks. People thought, hey, I could get paid the same and not have to deal with a bunch of jerks. So they left. That's not to say everyone's a jerk; my best friends in the world are in computers. There are jerks everywhere. It just seems like in computing we've been more permissive about that behavior. It's getting better, and I've never been permissive about it, as anybody who knows me will tell you. But I think it's more about making sure that the people who do come in stay. There are tons of women who are computer scientists. There are amazing computer scientists of all races and archetypes, everything you'd want. The point is to make whatever industry you're in a place that is not only welcoming (this is a conversation I have a lot) but that lets people succeed. Don't just hire somebody; promote them. Personally, I want to see diversity at every single level of a company. So I don't believe it's a pipeline problem in any way, shape, or form. In college I worked with a lot of at-risk kids and did mentoring through a great program.
It's called GEAR UP. Those kids are brilliant. Any one of them would have been an awesome hire. To me, the question isn't how you get people interested. It's how you support people in a way that's equitable and make sure they get theirs. By that I mean they get promoted, they get bonuses, they rise to all levels, and they feel like this is a career, a job they can feel good about. That's not to say we shouldn't do outreach. We should always do outreach. It's fun, and it's a good thing to do. But doing outreach and nothing else will never work. So that's my opinion on that.

Angelo: I appreciate your opinion. It makes a lot of sense, and I think it's similar to mine. It's hard for someone in this industry to feel like they matter, like the things they're doing have an impact, to see value in it. And then it's hard for them to feel valued. If you go into certain kinds of companies that treat you like a number, you see that and you exit. Like you said, I'm not being treated right, so I'll leave. It isn't all like that. I think we have to do a better job of making the things that are soul healing in technology more of the goal, and the other stuff a little less of the goal. That means having a great place to work. Everybody says that, but not all places are like that. It also means making a difference in little things, which gets back to this idea of computer science. I think computer science has the capability to change our lives. It is not a magic bullet. I know that, and we shouldn't rely too much on technology. For example, in clinical decision support, I don't want the technology making the decisions. I want it to enable physicians to make a decision.
Do you have any thoughts on how computer science can change our lives or the ways that we live?

Anna: Yeah, in terrifying ways. There are some really good ones, but there are so many bad ones. This is way outside my wheelhouse; I'm not very up on legislation. I follow a lot of people on Twitter who are, and I'm trying to learn more about it. But in my heart, I know so many times when computers have helped people. This probably won't sound like a groundbreaking use case, but think about isolation for elderly people. It can get very lonely, and I've set up many computers for elderly people: very simple Linux computers. Solitaire is a huge hit. But email and other very easy ways to connect with family have been huge. Technology kept people together during the pandemic. My kids couldn't see their grandparents for about eight months. I don't know what we would've done without FaceTime and Zoom. So there are some wonderful uses of technology. There's someone I follow on Twitter (we should put this in the show notes), her name's Eve, and she's amazing. She advocates for building safeguards against abuse and stalking into tech. She goes around to companies that are making these apps and teaches people how to have a privacy-first mindset and how to design and write applications that prevent abuse and stalking. That matters to me because I've seen it happen to friends of mine, and, before anyone thinks I'm bashing anybody, I've seen it happen on both sides. While it certainly seems to happen more often one way than the other, anybody can be abusive. So I think we should all agree to build those controls in when we write these apps. One example is what Apple did with their tags. Have you seen those, Angelo, where you can tag things?
If you tag something and it's not near your iPhone, it beeps audibly. So if someone puts one on the bumper of your car to stalk you and you drive away, it'll beep. And if you have an iPhone, it'll tell you that someone's tag is in close proximity to you. That was thought about when they designed the product. The more we have those kinds of thoughts, the better I feel about technology solving everyday problems without creating a new vector for abuse.

Angelo: Yeah, that's a really interesting point. I hadn't even thought about that. I think there are a lot of ways we can look at things like Kafka and say it can help us improve business and do a lot of things. There are certainly underserved parts of the world that can benefit from this kind of thing. But there are also a lot of little things we could do to improve the way we build applications and think about the whole structure, so that we have a better experience overall. And I think that solves the other problem of getting new folks to stay. The way you framed it, it isn't necessarily about getting new folks in. It's about getting them to stay. They enter because it's interesting and there are problems to solve, but then they move on. So giving them value makes a lot of sense.

Anna: I was just going to say, think about networking; people don't understand this. Someone goes to a company and they like it. They tell their friends, and their friends get hired. That's the way it works. People don't seem to understand that the same thing works with underrepresented people. If someone goes there and it's not a good place, they're going to tell their network, do not work here. So people need to understand that it's organic, as long as you make it a good place.

Angelo: It makes sense. That's how we hire all of our people: usually through someone who knows somebody. That's really interesting, because each company tries to build its culture a certain way. We say our culture has a research-driven mindset, and we want you to think like that. Not everybody's comfortable with that. I'll talk with some folks who are extremely intelligent, but they'll say, I don't really want to do that; it isn't part of my DNA, and I would just feel stressed. And you say, okay, then that wouldn't make sense. So you have to communicate that kind of culture, and you're right: the best way is through someone who knows somebody. That social aspect will grow the entire industry and move people in the right direction.

We've talked a little bit about some of the different patterns and designs in Kafka, for example, event sourcing. We've talked about the Lambda architecture; that's a really important one, and we've covered it in many episodes. You've got a batch layer and a speed layer, and Kafka fits nicely in the speed layer. But there are also some newer, emerging patterns that weren't necessarily built for Kafka or event streaming but end up having a wonderful home there. One example is CQRS, which stands for command query responsibility segregation. What do you think, Anna? Have you heard of it?

Anna: Oh yeah, CQRS, absolutely. I think CQRS is great! And just a shout-out: we have a course online about event sourcing and things like that, and I cover CQRS in it. It's on our website, which is… come on, Anna, I'm horrible with anything that's not incredibly tech… developer.confluent.io? They'll kill me if I get that wrong. I think that's what it is. There's a course covering CQRS there that I do in a cool hoodie with thumb holes. You should watch it. It's awesome. But yeah, CQRS.

Angelo: Nice, I will put it in the notes.

Anna: It's awesome because it's separation of concerns. I love logical separation of concerns, just not death by a thousand paper cuts. That's bad. It's like a metronome: you can swing too far one way or the other. But CQRS is great because you naturally separate the computationally intensive part from the part that needs a fast reply. It's brilliant. I think the first time I was introduced to it was at an O'Reilly conference.

Angelo: Nice. I like it too, though it's a relatively new experience for me. But let's take a step back. If you aren't familiar with CQRS and what it does, I think it's worth a little digression so that our listeners know what we're talking about. When you apply a pattern like event sourcing, the data is no longer easily queried. The events are applied as they come through, so ad hoc searching becomes difficult. You still need that kind of querying, though. So, as Anna mentioned, there is a separation of concerns, and in fact that's an important part of what CQRS is. We separate the schema of the data being inserted and manipulated (the commands) from the schema of the data being read (the queries). This has a very powerful side effect: they don't even need to look the same. For instance, I can insert my data one way and then propagate it through the system to a read database. Then I can use a different schema to read it, maybe one that's already pre-joined or pre-computed, or that drops fields I don't need. The downside is that these moving pieces are hard to maintain unless you are very careful about how you do it.
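
To make the write/read split concrete, here is a minimal Python sketch under stated assumptions: a plain in-memory list stands in for the event stream, and the event type, class names, and fields (`OrderPlaced`, `OrderCommands`, `CustomerSpendView`) are hypothetical. It is not the API of Kafka, the Axon Framework, or any other CQRS library. The write side only validates commands and appends events; a projection builds a read model with a different shape, pre-joined with reference data, pre-aggregated, and missing the fields the query doesn't need.

```python
from dataclasses import dataclass


# Hypothetical event emitted by the write side (names are illustrative only).
@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    customer_id: str
    sku: str
    quantity: int
    unit_price_cents: int


class OrderCommands:
    """Write side: validates commands and appends events. It serves no queries."""

    def __init__(self, event_log):
        self.event_log = event_log  # a plain list standing in for an event stream

    def place_order(self, order_id, customer_id, sku, quantity, unit_price_cents):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        event = OrderPlaced(order_id, customer_id, sku, quantity, unit_price_cents)
        self.event_log.append(event)
        return event


class CustomerSpendView:
    """Read side: one denormalized row per customer, shaped for a single query."""

    def __init__(self, customer_names):
        self.customer_names = customer_names  # reference data joined in ahead of time
        self.rows = {}

    def apply(self, event):
        # Pre-join the customer name and pre-compute totals; order_id and sku are dropped.
        row = self.rows.setdefault(
            event.customer_id,
            {"name": self.customer_names.get(event.customer_id, "unknown"),
             "orders": 0, "total_cents": 0},
        )
        row["orders"] += 1
        row["total_cents"] += event.quantity * event.unit_price_cents

    def query(self, customer_id):
        return self.rows.get(customer_id)


log = []
commands = OrderCommands(log)
commands.place_order("o1", "c1", "sku-9", 2, 500)
commands.place_order("o2", "c1", "sku-3", 1, 1200)

view = CustomerSpendView({"c1": "Ada"})
for event in log:  # the propagation step; in the setup described here, Kafka would carry it
    view.apply(event)

print(view.query("c1"))  # {'name': 'Ada', 'orders': 2, 'total_cents': 2200}
```

Because the read model is derived entirely from events, another view with a different shape could be added later by replaying the same log, and either side could be scaled or reimplemented independently; the price is more moving pieces to keep in sync.
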
Another big downside is that you cannot use an ORM (object-relational mapping), where you simply manipulate objects and the framework pushes the data back and forth to the database transparently. However, CQRS does let you decouple the technology behind the two schemas, the read side and the write side. So you can scale them separately and pick best-of-breed products for each. And you can use something like Apache Kafka as the backbone. You write messages to Kafka, and Kafka records them in its event stream. Downstream systems, a read-optimized one and maybe a write-optimized one, subscribe to those events and do their thing. Now you have this beautiful cycle of data coming and going through, and it's also extremely high performance. Now, Anna, you hit on something really important, and that is separation of concerns. For a long time, I have felt separation of concerns is important because there are natural boundaries between the things we make. It makes maintenance and development a little easier. But there are other benefits we don't talk about much, and you hit on one. We can get stuck in one technology camp or another. For example, ORMs, which I just mentioned, seem to make life really easy but are very difficult in practice. They tie you down: they cut across so many layers that the concerns all blur together. You would think an ORM decouples the data layer, and it does, but then it couples everything else to one model for both reading and writing, and you cannot scale them separately. The other, and I think secretly powerful, thing about separating concerns is that you can use different technology stacks for the things they are best at.
For example, if I have some computation I want to write in Rust or C, and another component needs to interoperate with data systems that are already written in Java or Scala, I can write that component in Java or Scala. So my write path can be in one technology and my read path in another, and the same goes for data systems. Reads can be served by a data system that is very optimized for reads, and writes can go to a database that is very optimized for writes, maybe even running analytics at the same time. And Kafka can be the backbone. Deciding that I'm in a Java shop, or a whatever shop, never made sense to me.

Anna: And the reason why goes back to the first thing I started with. People say, we need to plan this so any change we make will last for 57 years. You know what? It's not the mainframe. I'm sorry, you missed your chance. If you want to write something that's going to last for 30 years, they are hiring COBOL developers. They seriously are; you can still get a job doing that. I love COBOL, and I want a native COBOL client for Kafka. If anyone's interested, hit me up and we'll work on it together. But in this day and age you want something to be flexible. You want each piece to be exactly what you said, written in the language it's best at, because new things come out every day and speed is everything nowadays. Plus, who likes to develop something in a language that wasn't meant for it? That makes you as a developer feel like crap. I want to develop something and think, wow, yeah, they thought of that, because I'm using something for what it's for. That feels so good as a developer. So I'm with you.

Angelo: Yeah. And about making a camp and saying, well, I'm a whatever developer: as we develop our craft, we should never have just one language under our belt, right?
We should learn a new one every year because it's part of what we do. It's not that hard. Learning other languages isn't earth-shatteringly hard, and C is not that scary. It's got pointers, but once you master those, it's not super complicated. And when you write something in a purely functional language, you see that the things you held dear in your object-oriented world aren't all that important, and that you can get things done in two different ways with two different sets of characteristics. I think that experience is extremely valuable. So rather than saying, we are this, organizations should say, we use whatever makes sense to us. We're not afraid.

Anna: I agree, and unfortunately that is not how people hire. It's a very uncomfortable concept to introduce into an organization. People tend to say we're a Java shop because then they know what their hiring pool has to look like. I shouldn't state this as a fact, but this is where I think it comes from: a workforce initiative rather than best practices. The places where this does happen are very innovative. I can think of a couple that I won't mention because they're customers, but at the places I know where innovation is encouraged, they're an everything shop. They say, let us research, let's do what makes sense; plead your case to me. Tell me why we should do this. You've got a reason and it's a good one? Let's do it. And the people at those places are happy. Those are some of the happiest developers I know, and I think I would be happy doing that. You can still have a preference. For example, I do not like object-oriented programming languages. My brain is functional. I love Kafka Streams. Technically that's in Java, sort of, right? But it's very functional. It's like the most functional shoehorned thing.
But the experience is good. You need to know these things and understand the ins and outs, I think. So I agree.

Angelo: Yeah, way to bring that full circle, Anna. I think that's key, if you can build an organization like that or be in one. That's how I hire. I don't actually care what languages you know. The question I normally ask is: what if I gave you something you had never seen before? How long would it take you to feel comfortable writing code in it? That tells you a lot, because you have to be hungry and want to learn things. It has to be part of your DNA. I think I talked about this in the season introduction: there's a quote from Urs Hölzle, one of the early Google leaders, that goes something like "hire ability over experience." I do that every single time. I will take ability over experience ten times out of ten.

Anna: Can I tell you one other thing? You just hit on my most important litmus test when I was a developer. When I looked at a piece of code, one of the main things I evaluated, in terms of where something should go, how it should be orchestrated, and how we should implement it, was how long it would take somebody who's never been in the service before to feel comfortable pushing a change to production. I used that as a guide because that's what matters. Is somebody going to have to read an entire book to understand what you just wrote, when this service really should be like a sink? Two classes. I don't understand all the folders. Why do you have 17 folders? That type of programming drives me nuts. So I love the way you said that, because we need to design things, and pick the right tools, so that the burden of feeling confident pushing something to production stays low, as opposed to something you've shoehorned in. I love that that's how you evaluate people, Angelo.
I think we should code and evaluate that way.

Angelo: Yeah, Anna, I'm glad you mentioned that. One of my senior engineers said something similar to me when he was using the Axon Framework to implement CQRS on one of our projects. It wasn't using a Kafka backbone; it used a PostgreSQL database as the backbone, and it had some nuances. We had to push it to the edge, and maybe we'll talk about that in a future episode. One of the driving factors in why he liked the CQRS pattern was that this particular project had a budget constraint and had to bring in new developers using commodity technology. He said one of the things that led him to choose this architecture was that it made it easy to bring in people who had never seen it before, even ones new to the field without years and years of experience. Many design patterns are really dense; you have to learn a lot. But this one let them come in. There were rails in place, guardrails so to speak, and they could be productive from day one and get a lot of work done.

Okay, so we've talked a whole lot about separation of concerns. In fact, segregation is in the name of CQRS: it's about separating the responsibilities of commands and queries, which we have learned in big data is critical. In Apache Cassandra, for example, you have to build your tables around the way you query them. That's just how those systems work. But I want to talk about something we run into in all sorts of big data technologies, and especially in event streaming: mapping. It's a whole topic in many technologies, and it's a huge performance concern. Translating, or serializing, data from one representation to another is expensive.
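
The cost of mapping can be illustrated on one dimension, payload size. The sketch below does not use Avro or Protobuf; as an assumption, it uses Python's standard `struct` module as a stand-in for any encoding where both sides agree on a schema up front, so field names are not repeated in every message (Avro and Protobuf likewise avoid that, though their actual wire formats differ). The record and its fields are hypothetical, and the sketch says nothing about CPU cost, which would need a real benchmark.

```python
import json
import struct

# Hypothetical vital-signs reading; field names and values are illustrative only.
record = {"patient_id": 123456, "timestamp_ms": 1_700_000_000_000, "heart_rate": 72, "spo2": 98}

# Self-describing text: every message carries its field names, and numbers as characters.
as_json = json.dumps(record).encode("utf-8")

# Schema agreed in advance (assumption): little-endian uint32, int64, uint16, uint8.
LAYOUT = struct.Struct("<IqHB")
FIELDS = ("patient_id", "timestamp_ms", "heart_rate", "spo2")
as_binary = LAYOUT.pack(*(record[name] for name in FIELDS))

# Both encodings round-trip to the same values.
assert json.loads(as_json) == record
assert dict(zip(FIELDS, LAYOUT.unpack(as_binary))) == record

print(len(as_json), len(as_binary))  # the fixed layout is 15 bytes; the JSON is several times larger
```

The trade-off is that the binary bytes are meaningless without the layout, which is why schema-based formats pair a compact encoding with a way to share and evolve the schema.
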
There are technologies out there that specialize in this, like Protobuf and Avro. Their schemas are intended to be very efficient and compact. Parquet is like that too, along with Apache CarbonData and ORC. These formats each have an important purpose, and sometimes we need to move data from one to another. You'll see JSON data moving across the wire and then landing in Parquet because it needs to feed analytics. We see Avro in Kafka a lot. Do you have any thoughts on any of that?

Anna: I do. That's why one of the best things you can do, and this is why I also love Kafka, is store your data in Kafka, precompute your views, and then update them as the data comes in, in real time. Think about that first computation. If you were to try to collapse all of that and do a reduce (our favorite thing in the world, MapReduce), it takes forever. If you precompute it and then let the streams come in, you can have as many views as you want. Spin up a new one. I think that's how you minimize the performance hit when somebody is waiting on a result. You obviously have to deal with eventual consistency, but we should always be making sure that's okay. Anytime you think something has to be synchronous, rethink that. You can hit me up on Twitter; I will talk with you about it and make you really think about whether that's true.

Angelo: That is one of the longest-running arguments I've had: you have to design for performance; you can't optimize for it later. For example, take the clinical decision support tool I was telling you about, used at the moment of care. If you're going to query all this data, pull it across the wire, and run computations, and you expect that to finish while a doctor is interacting with the patient? You've lost.
The latency is already going to be longer than a clinician will wait. It has to be instantaneous. And to make it instantaneous, you don't throw computing power at it. You have to pre-compute, or fetch data at other times, in order to solve it. You can't, for example, get the data from an electronic medical record, map it from whatever format it's in to JSON (which is notoriously slow), push it across the wire, translate it into something else, run the computation, translate it again, put it in a database, query it, and then return it to the user. Even if the computation took zero time, you would never overcome the latency accumulated by that entire chain. It's a myth that being stateless is somehow important. It doesn't need to be stateless. The response needs to be synchronous, but parts of the work don't have to be. It's really all about latency and performance.

Anna: It's where you keep the state.

Angelo: Exactly.

Anna: I always say, it'll be stateless for your front end. It's where you keep the state.

Angelo: Yeah. In an argument I once had, we said, okay, over time the system grew. So what did they do? They put caching over here, and caching over here, and caching over here. And I said, that's state, state, state. So why didn't we just build it stateful? Make it so the data goes into the stream and stays durable, we compute on it, and it moves. The funny thing is, I said, okay, let's not call Kafka a data system. Let's call it a cache. Does that make you happier? Because caching is state.

Anna: Yep. Absolutely.

Angelo: So it goes back to this thought: can we rethink old problems in a new way? I think that's what these kinds of technologies bring to the table.
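
Anna's suggestion to precompute views and update them as events arrive, rather than re-running a full reduce for every request, can be sketched as follows. This is a toy, in-memory illustration of the idea with hypothetical data and names; it does not use Kafka or the Kafka Streams API, where this role is played by stateful operations backed by state stores. It also shows the eventual-consistency window she mentions and how a new view can be built by replaying the log from the start.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical event stream of (clinic, visits) pairs.
events = [("clinic-a", 3), ("clinic-b", 1), ("clinic-a", 2), ("clinic-c", 4), ("clinic-b", 5)]


def batch_totals(all_events):
    """Batch style: fold over every event each time an answer is requested."""
    def step(acc, event):
        key, value = event
        acc[key] = acc.get(key, 0) + value
        return acc
    return reduce(step, all_events, {})


class MaterializedTotals:
    """Streaming style: keep the answer as state and fold each event in once."""

    def __init__(self):
        self.state = defaultdict(int)

    def apply(self, event):
        key, value = event
        self.state[key] += value

    def get(self, key):
        return self.state.get(key, 0)  # a lookup at query time, no recomputation


view = MaterializedTotals()
for event in events[:3]:
    view.apply(event)

# Eventual consistency: until the remaining events are applied, the view lags the log.
print(view.get("clinic-b"), batch_totals(events)["clinic-b"])  # 1 6

for event in events[3:]:
    view.apply(event)
assert dict(view.state) == batch_totals(events)

# "Spin up a new one": a differently shaped view (events per clinic), built by replaying the log.
event_counts = MaterializedTotals()
for key, _ in events:
    event_counts.apply((key, 1))
print(event_counts.get("clinic-a"))  # 2
```

The query path then costs a dictionary lookup instead of a pass over the whole history, which is the design-for-performance point: the work happens when events arrive, not while a clinician is waiting.
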
Well, Anna, thank you very much, as always. I enjoyed spending some time with you. I'm your host, Angelo Kastroulis, and this has been Counting Sand. Thank you for joining us today. Before you go, please take a moment to like, review, and subscribe to the podcast. You can also follow us on social media. You can find me on LinkedIn @angelok1; feel free to reach out to me there. You can also follow me on Twitter, where my handle is @angelokastr, or follow my company, Ballista Technology Group. You can also find the show at countingsandshow.com or on your favorite podcast platform.