Counting Sand

Kafka Event Streaming Part 2

Episode Summary

Angelo welcomes back a friend and fellow Kafka enthusiast, Anna McDonald. Join them as they continue their discussion around Kafka event streaming, introduce a new pattern (CQRS), and discuss what they look for when hiring the right development team.

Episode Notes

 

Our Team:

Host: Angelo Kastroulis

Executive Producer: Náture Kastroulis

Producer: Albert Perrotta

Communications Strategist: Albert Perrotta

Audio Engineer: Ryan Thompson

Music: All Things Grow by Oliver Worth

Episode Transcription

Angelo: Today, I'm happy to welcome back a friend of mine and fellow Kafka enthusiast, Anna McDonald, who was on the previous episode. We're going to talk a bit more about Kafka, how we hire people, and a pattern you should know about called CQRS. I'm your host, Angelo Kastroulis, and this is Counting Sand.

I was mentioning, I do a lot in health care. It just is a natural place where we can do a ton of good. So, you know, thinking about that, it's one theme that I have had on my mind for a really long time. And that is, how do we entice new people to enter into computer science? Do you have any thoughts on that?

Anna: Yeah, I mean, I think we had a discussion about this, right? To me, when I look at my career and when I look at the demographics, it wasn't a pipeline problem, right? It's that people were jerks. And they were like, hey, look, I could just get paid the same and not have to deal with a bunch of jerks. So they left. And not to say that everyone's a jerk; my best friends in the world are in computers. There are jerks everywhere. It just seems like in computers we are more permissive with allowing people, now it's getting better, and I've never been permissive with that, as anybody that you know will tell you.

But I think it's more about making sure that the people that do come in. Because I mean, there's tons of, you know, women who are computer scientists. There are amazing, you know, computer scientists that are of all races, archetypes, like, you know, everything you'd want, right? The point is to make whatever industry you're in a place where it's not only welcoming, and this is a conversation I have a lot, but it lets people succeed. Don't just hire somebody, you promote them. I want to see, personally, diversity at every single level of a company, every single level. And so I personally don't believe it's a pipeline problem at all in any way, shape, or form.
I worked with a lot of at-risk kids in college and did mentoring, which is a great program. It's called Gear Up. Those kids are brilliant. Any one of them would have been an awesome hire. You know, it isn't to me like, how do you get people interested? It's how do you support people in a way that's equitable and make sure that they get theirs. And when I say that, I mean they get promoted, they get bonuses, they rise to all levels and feel like this is a career and this is a job where I can, this is, I can feel good about it. You know, and not to say we shouldn't do outreach. We should always do outreach. I think that's fun. I mean, it's just a fun, good thing to do. But I think doing that and doing nothing else will never work. So that's my opinion on that.

Angelo: I appreciate your opinion. It makes a lot of sense. I think that it's similar to mine. My thought is that it's hard for someone in this industry to see and feel like they matter, like the things that they're doing have an impact or whatever. To see value in it. And it's hard for them then to be valued. If you go into certain kinds of companies and they're, you know, they kind of treat you like a number, you kind of see that and then you exit. Like you said, you know, I'm not being treated right, so you leave it. And it isn't all like that.

I think we have to do a better job of making the things that are soul healing in technology a little bit more of the goal and the other stuff kind of a little bit less of a goal. And that is like having a great place to work. It's great. Everybody says that, but not all places are like that. Having a difference that they make in little things, kind of getting back to this idea of computer science. I think computer science has the capability to change our lives. It is not a magic bullet.
I know that, and we shouldn't rely too much on technology because if you rely too much on tech, you know, like for example, in clinical decision support, I don't want the technology making the decisions. I want it to enable physicians to make a decision. Do you have any thoughts on how computer science can change our lives or the ways that we live?

Anna: Yeah, I mean, in terrifying ways. There are some really good ones but there are so many bad ones. This is way outside my wheelhouse. So I am not very up on legislation. I follow a lot of people on Twitter who are, and I'm trying to learn more about it. But in my heart, right? I know so many times where computers have helped people.

And this is going to sound, probably not like a ground shattering use case, but in terms of isolation for elderly people. So, it can get very lonely and I've set up many computers for elderly people. Very simple Linux computers, right? Solitaire is a huge hit. But just, you know, email and very easy ways to connect with your family and it's been huge, like. You know, technology has kept people together during the pandemic. You know, my kids couldn't see their grandparents for like eight months. I don't know what we would've done without FaceTime and Zoom.

So I think that there are some wonderful uses of technology. There's this, and I follow her on Twitter, but we should put this in the show notes, her name's Eve, she's amazing. And she does advocacy for building in safeguards for abuse and stalking into tech. So she goes around to companies that are making these apps and she teaches people how to have a privacy first mindset and how to author and design applications to prevent abuse and any kind of stalking aspects. Because that to me is really, I've had it happen to friends of mine and by the way, I've seen it happen on both sides of the spectrum before anyone thinks I'm bashing anybody. While it certainly seems to happen more often one way than the other, right?
Anybody can be abusive. So I think we should all get together and decide that when we write these apps to build in those controls, and I think an example is what Apple did. So they've got those tags. Have you seen those, Angelo, where you can tag things? And if you tag something and it's not near your iPhone, it beeps audibly. So if someone goes and puts it on the bumper of your car and they're stalking you or whatever, right. And you drive away, it'll beep audibly. And if you have an iPhone, they'll tell you, hey, someone has a tag that's in close proximity to you. That was thought about when they designed the app. And the more we have those types of thoughts, the better I feel about technology solving everyday problems without increasing a vector for abuse.

Angelo: Yeah, that's a really interesting point. I hadn't even thought about that. Yeah, I think there's a lot of ways we can look at things like Kafka, for example, and say, you know, it can help us improve business and help us do a lot of things. I mean, there are certainly underserved parts of the world who can benefit from this kind of thing.

But I think that there's a lot of just little things that we could do to kind of improve the way that we build applications in a way that we think about the whole structure so that we can have a better experience overall. And I think that that kind of solves the other problem of having new folks kind of stay. The way you kind of framed it, it isn't necessarily getting new folks in. It's getting them to stay. So they're entering because it's interesting and there are some problems to solve, but then they move along. So having value, I think that makes a lot of sense.

Anna: I was going to just say like networking, like people don't understand this. Someone goes to a company, right? They like it. They tell their friends, their friends get hired. It's the way it works.
People don't seem to understand that the same thing works when you have underrepresented people. You know what I mean? So if someone goes there and it's not a good place, they're going to talk to their network. They're going to be like, do not work here. And so I think people need to understand it's organic as long as you make it a good place.

Angelo: It makes sense. That's how we hire all of our people, is usually from someone who knows somebody. So that's really interesting, because each company tries to build their culture a certain way, you know, saying, our culture has a research-driven mindset. We want you to think like that. Not everybody's comfortable with that. I'll talk with some folks and they're extremely intelligent, but they'll say, I don't really want to do that. That isn't part of my DNA and I would just feel stressed. And so you say, okay, well, that wouldn't make sense. So communicating that kind of culture, and you're right, the best way to do it is someone who knows somebody in a social kind of aspect, will grow the entire industry, kind of move people in the right way.

We've talked a little bit about some of the different patterns and designs in Kafka. For example, event sourcing. We talked about Lambda. I know that was a really important one. We've talked about that in many episodes. I think that you've got this batch and you have a speed layer, and Kafka fits nicely in the speed layer. But there are also some newer patterns, some emerging, that weren't necessarily built for Kafka or event streaming but end up having a wonderful home there, for example CQRS, and that stands for command and query responsibility segregation. What do you think, Anna? Have you heard of it?

Anna: Oh yeah, CQRS. Oh, absolutely. CQRS, yes. I think CQRS is great! And just like shout out, I did a course, we have a course online about event sourcing and stuff like that, and I cover CQRS in it.
It's on our website which…come on Anna, I'm horrible with anything that's not incredibly tech, developer.confluent.io? They'll kill me if I get that wrong. I think that's what it is. But there's a course on CQRS there, that I do with a cool hoodie that has the thumb holes in it. You should watch it. It's awesome. But yeah, so CQRS.

Angelo: Nice, I will put it in the notes.

Anna: And well, it's awesome because it's separation of concerns, right? I love that. I love logical separations of concern, not death by a thousand paper cuts. It's bad. It's like a metronome, right? You can go too far one way or another. But CQRS is great because you're naturally segmenting your computationally intensive part of this from needing a fast reply. It's brilliant. I saw it, actually, the first time I was ever introduced to it, I think was at an O'Reilly conference.

Angelo: Nice. So I like it too. It's a relatively new experience for me. But let's take a step back. So if you haven't become familiar with CQRS and what it is and what it does, I think it's worth a little digression so that our listeners have an idea of what we're talking about. When you start applying a pattern like event sourcing, data is no longer easily queried. The idea is, as you apply it, as the events come through, the ability to do some kind of ad hoc searching becomes difficult. You still need those kinds of things though.

So Anna, as you mentioned, there is a separation of concerns, and in fact that's an important part of what CQRS is. So what happens is we separate the schema of the data being inserted and manipulated from the schema of the data that's being read: the query. This gives you a very powerful side effect. They don't need to even look the same. For instance, I can insert my data in a certain way and then propagate it through a system to a read database. And then what I can do is use a different kind of schema to be able to read it.
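To make the split concrete, here is a minimal sketch that is not from the episode. It assumes a plain in-memory list as a stand-in for a Kafka topic, and the names OrderPlaced, handle_place_order, and CustomerSpendView are hypothetical, chosen only for illustration. The write side accepts events in one shape; the read side keeps a differently shaped, pre-computed view.

```python
from collections import defaultdict
from dataclasses import dataclass


# Write side: the shape of the data as it is inserted (hypothetical schema).
@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    customer: str
    item: str
    quantity: int
    unit_price_cents: int


# Assumption: an append-only list stands in for a Kafka topic.
event_log = []


def handle_place_order(event):
    """Command handler: validate, then append. It never touches the read model."""
    if event.quantity <= 0:
        raise ValueError("quantity must be positive")
    event_log.append(event)


# Read side: a different, pre-computed schema shaped for one query.
class CustomerSpendView:
    def __init__(self):
        self.total_cents = defaultdict(int)
        self.offset = 0  # how far into the log this view has consumed

    def catch_up(self, log):
        """Apply only the events this view has not seen yet."""
        for event in log[self.offset:]:
            self.total_cents[event.customer] += event.quantity * event.unit_price_cents
        self.offset = len(log)

    def total_for(self, customer):
        return self.total_cents.get(customer, 0)


handle_place_order(OrderPlaced("o1", "ada", "keyboard", 1, 4500))
handle_place_order(OrderPlaced("o2", "ada", "cable", 2, 500))
handle_place_order(OrderPlaced("o3", "bob", "mouse", 1, 2000))

view = CustomerSpendView()
view.catch_up(event_log)
print(view.total_for("ada"))  # 5500
```

The read model drops the fields the query does not need (item, order_id) and only changes when catch_up runs, so the two sides can be stored and scaled separately, at the cost of the view lagging slightly behind the log.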
The read schema might be one that's already pre-joined or pre-computed, or looks a certain way, removes fields I don't need. And so now you get a lot of very powerful side effects. The downside is it's kind of hard to maintain these moving pieces unless you are very careful about the way that you do it.

So then I suppose another downside, a big one too, is that you cannot apply an ORM, or object-relational mapping, where you just manipulate these objects and it pushes the data back and forth to the database and it's all transparent to you. However, it does let you decouple the technology behind what these two schemas look like, the read and the write. And so you can scale them separately and do some different things. Best of breed products for each of those. But you can use something like Apache Kafka as the backbone. You write messages to Kafka; Kafka then, in its event stream, has these messages recorded. Other systems downstream, a read-optimized and maybe a write-optimized system, are subscribing to events and doing their thing. And you can kind of see now you have this beautiful cycle of the data coming and going through, and it's also extremely high performance.

Now Anna, you hit on something really important. And that is separation of concerns. For a long time, I have felt separation of concerns is important because there are natural boundaries between things that we make. It makes maintenance and development a little bit easier, but there are other things that we don't talk about a lot, and you hit on it. Because we might get into maybe some technology camp or another. For example, ORMs that I just mentioned make life really easy, but they're very difficult in practice. Because they do tie you, they cut across so many layers that the concerns all become kind of blurred together, even though you would think it decouples the data layer. In fact, what it accomplishes is it does do that.
It does decouple the data layer, but then it couples everything else to one model, both reading and writing, and you cannot scale them separately.

The other, and I think kind of secretly powerful, thing about separating concerns is that you can separate technology stacks and use them for the thing that is most interesting. What they're best at. For example, I have some computation and I want to write it in Rust or in C; this other component needs to interoperate with these other data systems that are already written in Java or Scala. I can write that in that. So my write performance can be in one technology, my read can be in another, or even data systems. The read can be done in a data system that is very optimized for reads, and my writes can be done in a database that is very optimized for writes, maybe even analytics at the same time. And Kafka can be the backbone. Deciding, I think, that I'm in one shop, I'm in a Java shop or I'm in a whatever shop, that just never made sense to me.

Anna: And the reason why is because of the first thing I started with, because people are like, we need to plan this so any change we make, it will last for 57 years. You know what? It's not the mainframe. I'm sorry. You missed your chance. If you want to write something that's going to last for 30 years, they are hiring COBOL developers. Like they seriously are. You can still get a job doing that. I love COBOL and I'm not, I want a native COBOL client for Kafka. Just in case anyone's interested, you know, hit me up. We'll work on it together.

But I think, you know, in this day and age you want something to be flexible, right? You want something to be able to be exactly what you said, targeted in the language it's best at, because new things come out every day and speed is everything nowadays. And plus, who likes to develop something in a language that wasn't meant for it? That makes you as a developer feel like crap.
I want to develop something that's like, wow yeah, they thought of that. Because guess what? I'm using something for what it's for. That feels so good as a developer. So I'm with you.

Angelo: Yeah. And I think making a camp saying, well, I'm a whatever developer, when we develop our craft, we should never have only one language under our belt, right? We should learn a new one every year because it's part of what we do. It's not that hard. It's not as though learning these other languages is so earth shatteringly hard, and C is not that scary. It's got pointers, but yes, once you master that it's not super complicated. So, when you write something in a purely functional language, when you do that, you see that the things that you held dear in your object-oriented world aren't all that important and that you get things done in two different ways that have two different characteristics. I think that that experience is extremely valuable. And so I think organizations, rather than saying, we are this, they should say we're a whatever makes sense to us. We're not afraid.

Anna: I agree, and unfortunately that is not how people hire. And it is a very uncomfortable concept for people to introduce into an organization. People tend to say we're a Java shop because then they know what their hiring pool has to look like. To me, and I shouldn't say this is a fact, this is where I think it comes from. I think it comes from a workforce initiative rather than best practices. The places you see where this happens are very innovative, and I can think of a couple of places that I won't mention, because they're customers and I probably shouldn't mention it, but the places I know where that innovation is encouraged, right? They're an everything shop. They're like a, let us research, let's do what makes sense, you know, plead your case to me. Tell me why we should do this. And, you know, you got a reason, it's a good one? Let's do it.
And those places, those people are happy. I mean, those are some of the happiest developers that I know. So I think I would be happy doing that. And I think you can have a preference. Like I do not like object-oriented programming languages. My brain is functional. I love Kafka Streams. Right? Technically that's in Java, sorta, right. I mean, it's very functional. It's like the most functional shoehorned thing. But you know, the experience is good. You need to know these things, like you need to understand and know the ins and outs, I think. So I agree.

Angelo: Yeah, way to take that full circle, Anna. I think that's key, if you can have an organization like that or be in one. That's how I hire. I don't actually care what languages you know. The question I normally ask is, what if I gave you something you never saw before? How long will it take you to feel comfortable writing code in something like that? It tells you a lot because you have to be hungry and you have to want to learn things. It has to be, you know, part of your DNA. I want to also say, I think I talked about that in the season introduction, there was a quote by Urs Hölzle from Google, one of the early Google leaders. He says something like, hire ability over experience. And I do that every single time. I will take ability over experience 10 times out of 10 times.

Anna: Can I tell you one other thing? You just hit upon my most important litmus test when I was a developer. When I would look at a piece of code, one of the main things I would evaluate in terms of where something should go, how it should be orchestrated, how we should implement this is, how long is it going to take somebody who's never been in the service before to feel comfortable pushing a change to production. And I use that as a guide on how to do things, because that's what matters.
Is somebody going to have to read an entire book in order to understand what you just wrote when this service really should be like a sink. Two classes. Like, I don't understand, like folders. Why do you have 17 folders? That type of programming drives me nuts, nuts. And so I love the way you said that because I think we need to design things, and picking the right tools makes it easy for that burden to be low, to feel confident pushing something to production, as opposed to something you've shoehorned in. But I love that that's how you evaluate, Angelo. I think we should code and evaluate that way.

Angelo: Yeah, you know Anna, I'm glad you mentioned that. First of all, one of my senior engineers said something similar to me when he talked about how he was using the Axon framework on one of the projects where we had to implement CQRS. It wasn't using a Kafka backbone. It had a PostgreSQL database backbone and it had some nuances to it. We had to kind of push it on the edge, and maybe we'll talk about that in a future episode.

But one of the driving factors as to what he liked about the CQRS pattern was that this particular project had a constraint, you know, budgeting constraint, and had to bring in new developers using kind of commodity technology. And so he said one of the driving things that led him to choose this architecture was that he felt it fit nicely in the ability to be able to bring in people who have never seen this before. Even ones that we consider new to the field, who don't have years and years and years of experience. Like, design patterns are really dense. You have to learn a lot. But this one let them come in. There were rails in place, guardrails, so to speak, and they could come in and be productive from day one and get a lot of work done.

Okay, so we've talked a whole lot about separation of concerns. In fact, segregation is in the name of CQRS.
It's about separating the responsibilities of the commands and the queries, which we have learned in big data is critical. In fact, in Apache Cassandra you have to build tables around the way you query them. I mean, it's just, it's how the systems work.

But I want to talk about something that we always run into in all sorts of big data technologies, but especially if you're going to do something in event streaming. It is mapping. It's a whole thing in many technologies, but it's a huge performance concern. Translating things, or serializing them from one kind of thing to another, is expensive. So for example, there are technologies out there that specialize in kind of doing this, right? Protobuf, Avro. There are schemas intended to be very important and compact. I mean, we know Parquet was that, Apache Carbon, all of those, ORC, right? These different schemas and formats have a very important purpose. And so sometimes we need to move data from one kind of thing. You'll see JSON data, for example, moving across the wire and then landing in Parquet because it then needs to run to analytics. Avro, we see in Kafka a lot. Do you have any thoughts on any of that?

Anna: I do. And so that's why one of the best things you can do, and this is why I also love Kafka. You store your data in Kafka, you precompute your views, and then you just update them as the data comes in, real time and streaming. Right? So that first computation, if you were to try to collapse all of that and do a reduce, right? Our favorite thing in the world, MapReduce, it takes forever. If you precompute that and then allow the streams to come in, you can have as many views as you want. Spin up a new one, and then I think that's kind of how you minimize a performance hit when you have somebody waiting on a result. You obviously have to deal with eventual consistency, but we should all be always making sure that that's okay.
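A small sketch of the trade-off Anna describes, not taken from the episode: recomputing an answer with a full reduce each time, versus keeping a precomputed view and applying each event as it arrives. The event shape (a key and an amount) and the names full_recompute and RunningTotals are assumptions for illustration; a real system would read from a Kafka topic rather than a Python list.

```python
from functools import reduce


def full_recompute(events):
    """Batch style: fold over the whole history every time someone asks."""
    def step(acc, event):
        key, amount = event
        acc[key] = acc.get(key, 0) + amount
        return acc
    return reduce(step, events, {})


class RunningTotals:
    """Streaming style: keep the answer materialized and apply one event at a time."""

    def __init__(self):
        self.totals = {}

    def apply(self, event):
        key, amount = event
        self.totals[key] = self.totals.get(key, 0) + amount

    def query(self, key):
        # Constant-time lookup; no work proportional to the history size.
        return self.totals.get(key, 0)


# Hypothetical history of (key, amount) events.
history = [("ward-a", 3), ("ward-b", 1), ("ward-a", 2)]

totals_view = RunningTotals()
for event in history:
    totals_view.apply(event)

# A new event arrives: the view absorbs it with one small update.
new_event = ("ward-b", 4)
history.append(new_event)
totals_view.apply(new_event)

# Both approaches agree once the view has caught up.
print(totals_view.totals == full_recompute(history))  # True
```

Between the moment an event is appended and the moment apply runs, a query returns the older answer. That window is the eventual consistency mentioned above.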
Anytime you think something has to be synchronous, rethink that. You know, you can hit me up on Twitter. I will talk with you about it and make you really think about if that's true.

Angelo: That is one of the longest running arguments I've had, that you have to design for performance, you can't optimize for it later. And what I mean by that is, for example, I was telling you about this clinical decision support tool, you know, at the moment of care. If you're going to query all this data, pull all this stuff across the wire, run computations, and you expect that to be done at the moment a doctor is interacting with the patient? You've lost. It's already going to be longer, the latency, than a clinician is going to wait. It has to be instantaneous. In order for it to be instantaneous, you don't throw computing power at it. You have to pre-compute or you have to fetch data at other times in order to be able to solve.

You cannot push this entire, for example, you can't go get the data from an electronic medical record, translate it into something else, you know, map it from whatever it is to JSON data, which is notoriously slow, push it across the wire, translate it into something else, run the computation on it, translate it again into something else, put it in a database, query it, and then return it to the user. Even if computation was zero, you will never overcome the amount of latency you gained by this entire chain. It's a myth that being stateless is somehow important. It doesn't need to be stateless. It needs to be synchronous, but parts of it don't have to be. It's really all about latency. It's about latency, it's about performance.

Anna: It's where you keep the state.

Angelo: Exactly.

Anna: I always say, it will be stateless for your front end. It's where you keep the state.

Angelo: Well, yeah, you know, an argument that I had once. We said, well, okay, so over time it grew. And so what did they do? They put caching over here.
They put caching over here. They put caching over here. And I go, state, state, state. So why didn't we just build a stateful, you know, make it where the data goes into the stream, stays durable. We compute on it and it moves. So the funny thing about it is I said, okay, okay, let's not call Kafka a data system. Let's call it a cache. Does that make you happier? Right? It's because the caching is state.

Anna: Yep. Absolutely.

Angelo: So, yeah, it goes back to this thought of, can we rethink old problems in a new way? And I think that that's what these kinds of technologies bring to the table. Well, Anna, thank you very much as always, I enjoyed spending some time with you.

I'm your host, Angelo Kastroulis, and this has been Counting Sand. Thank you for joining us today. Before you go, please take a moment to like, review, and subscribe to the podcast. Also, you can follow us on social media. You can find me on LinkedIn @angelok1, feel free to reach out to me there. You can also follow me on Twitter, my handle is @angelokastr. Or you can follow my company, Ballista Technology Group. You can also find the show on countingsandshow.com or your favorite podcast platform.