Counting Sand

Can Computer Science Make Life Better?

Episode Summary

Can computer science improve our lives? Most people who work in the field would like to think so. Host Angelo Kastroulis considers the ways that computer science is poised to make our lives better but also introduces caveats such as bias in machine learning. Along the way he sets up themes—including predictive analytics, simulation, and health decision support systems—that he will dive into with more technical detail in future episodes.

Episode Notes

Can computer science improve our lives? Most people who work in the field would like to think so. Host Angelo Kastroulis, CEO of the Carrera Group, considers the ways that computer science is poised to make our lives better but also introduces caveats such as bias in machine learning. Along the way he sets up future episodes’ themes—including predictive analytics, simulation, and health decision support systems—that he will dive into with more technical detail.

Acknowledging that technology is neither a panacea nor a tool without downsides, Angelo starts with a review of some research on the psychological and sociological effects of social media. Beyond social media, he also questions the predictive ability of big data and introduces the idea of bias in machine learning. He does this through recalling a chance encounter with an old friend and fellow computer scientist, Andy Lee. Andy is the chief technology officer and founder of NeuroInitiative, a company that uses advanced simulation techniques to try to create new drug compounds, as well as chief operating officer of Vincere Biosciences, a company that takes these drug compounds all the way to human trials and hopefully to the market. Andy talks about how, since bias is unavoidable, we should find a way to make this weakness the strength of the model. In considering this, Angelo defines key concepts such as a model’s features, what accuracy means, and why it is important not to conflate correlation with causation. 

He shares the important axiom "All machine learning models are bad and some are less bad than others" and exhorts listeners to “Never lie with stats.” He ends by suggesting a few actionable ways that computer science can help our lives become better, setting up themes—including predictive analytics, simulation, and health decision support systems—that he will dive into with more technical detail in future episodes of Counting Sand.

 

About the Host

Angelo Kastroulis is an award-winning technologist, inventor, entrepreneur, speaker, data scientist, and author best known for his high-performance computing and health IT experience. He is the principal consultant, lead architect, and owner of Carrera Group, a consulting firm specializing in software modernization, event streaming (Kafka), big data, analytics (Spark, Elasticsearch, and graph), and high-performance software development on many technical stacks (Java, .NET, Scala, C++, and Rust). A data scientist at heart, trained at the Harvard Data Systems Lab, Angelo enjoys a research-driven approach to creating powerful, massively scalable applications and innovating new methods for superior performance. He loves to educate, discover, and then see the knowledge through to practical implementation.

 

Citations

Bruce, V., and Young, A. (1986). Understanding face recognition. British Journal of Psychology, 77, 305-327. doi: 10.1111/j.2044-8295.1986.tb02199.x

Cameron, S. (2018, November 12). Shark Attacks, Ice Creams, and the Randomised Trial. Retrieved September 16, 2021, from https://the-gist.org/2018/11/shark-attacks-ice-creams-and-the-randomised-trial/

Data Never Sleeps Infographic. (n.d.). Domo.com. Retrieved September 16, 2021, from https://www.domo.com/learn/infographic/data-never-sleeps-8

McLean Hospital. (2021, February 9). Here’s How Social Media Affects Your Mental Health. Retrieved September 16, 2021, from https://www.mcleanhospital.org/essential/it-or-not-social-medias-affecting-your-mental-health

World Happiness Report 2019, Chapter 2: Changing World Happiness. (n.d.). Retrieved September 16, 2021, from https://worldhappiness.report/ed/2019/changing-world-happiness/

 

Further Reading

Is Social Media Bad For You: The Evidence and the Unknowns

World Happiness Report 2021

Person Perception 25 Years after Bruce and Young (1986)

 

Host: Angelo Kastroulis

Executive Producer: Kerri Patterson; Producer: Leslie Jennings Rowley; Audio Engineer: Mert Çetinkaya; Communications Strategist: Albert Perrotta

Music: All Things Grow by Oliver Worth

© 2021, Carrera Group

Episode Transcription

Angelo Kastroulis: Can computer science improve our lives? I'd like to think so. And I think most who work in the field would like to as well. But it's not going to come from the next social media platform. It's going to come from something that will help us live longer and be safer.

I'm your host Angelo Kastroulis. And this is Counting Sand. 

There are so many bold claims and promises made by computer science and technology in general, that it's easy to understand why we would have fatigue or we'd get overloaded by all the buzzwords.

We live in an age of unprecedented information. And if we're honest with ourselves, has it made our lives better? Social media may even have made our lives measurably worse. 

According to Domo, which publishes an infographic every year depicting how much data is generated every minute, WhatsApp users share 41,666,667 messages every minute. Facebook users share 150,000 messages. 69,000 users apply for jobs on LinkedIn. Instagram business profiles get 138,000 clicks every minute. So while it's easy to see that 69,000 job applications every minute might be a good thing, are our lives measurably better? It's hard to say. For the person who applied for the job and got it, yes, it definitely made their life better. But what about all those uploaded images and messages? Were they all useful? Sometimes, and sometimes not. Those messages could easily be bullying, or they could be some important piece of information about your family. So whether social media makes our lives better or not is a pretty hard question to answer. After all, how do you define better?

Every year, a new World Happiness Report is published. And this year, well, of course COVID-19 certainly played a role. So let's go back a couple of years, to 2019, before COVID. What happened? What did we see? Well, the findings are very interesting. The countries you might expect to be happiest are not; in fact, some of the poorest countries were among the happiest on that list. And according to the report, which were some of the unhappiest countries?

The countries with the largest populations, China, India, the United States, and Russia, were some of the least happy. How do you measure happiness? How did they measure it? Well, they actually measured it along three different axes. One of them was a general happiness score, computed from a questionnaire and then adjusted by population.

The second was a questionnaire that indicated how continual your happiness is, by tracking a trend. For example: what were your feelings yesterday? How did you feel? Did you feel stressed out or worried yesterday, or did you feel laughter and happiness? Then they try to record a trend. If you felt happy yesterday and you feel happy today, then we see a general trend toward happiness, and we call that positive. If you have a history of worry, anxiety, and general unhappiness, and it continues day after day, we label that a negative trend.

That's probably as good a measure as any other of happiness.

In fact, what the report did show is a steady decline in happiness since 2011, year over year, and a steady increase in negative affect, the feeling of worry, anxiety, and unhappiness, year over year. Meanwhile, a record number of people are adopting technology and social media; in fact, I have heard it said that there are more smartphones than there are working toilets. So has that made our lives better? Well, this is just one study, and we have to take the data for what it is. But it is still very, very interesting.

According to a BBC survey on social media, among the 1,800 people surveyed, women reported being more stressed than men. Twitter was found to be a significant contributor because it increased their awareness of other people's stress, and the normal human reaction is empathy. But, to be fair, the researchers also concluded that Twitter could even be a coping mechanism for some folks: the more they used it, the less stressed they were. Researchers in 2014 found that mood was also affected by social media. A University of California study found that a good or bad mood may spread between people on social media, so if there is a general negative mood on social media, it will spread.

Anxiety has also been shown to spread on social media. Researchers at a university in Romania reviewed existing research on the relationship between social anxiety and social networking back in 2016, and they found that the results were mixed: some good and some bad. A similar study conducted in 2016, involving 1,700 people, found that the risk of depression and anxiety was correlated with using the most social media platforms. Researchers at the University of Pittsburgh asked 1,700 18- to 30-year-olds about their social media and sleeping habits, and they found a link between sleep disturbance and social media use.

McLean Hospital, a Harvard Medical School affiliated hospital, is a leader in mental health treatment. They pointed out that, according to the Pew Research Center, 69% of adults and 81% of teens use social media. That puts a large share of the population at risk of feeling anxious, depressed, or ill over social media use. Social media use has a reinforcing nature: its activities trigger the brain's reward center by releasing dopamine. Dopamine has been called the feel-good chemical, and it's linked to the pleasurable sides of our brain. So if using social media has a positive effect, that effect reinforces the behavior, and it becomes addictive; if there's a negative effect, it will reinforce that behavior as well. According to McLean, though, the platforms are designed to be addictive and are associated with anxiety, depression, and even physical ailments. Social media made some bold promises: connecting humanity, uniting us, and making us more social. We are, in fact, social creatures. However, social media has proven to have some very antisocial effects. Now, it does connect our lives, and there are positive benefits. Social media is not all bad, and it's not fair to single it out and harp on it as the only problem here; it's just an easy illustration of how computer science can go awry.

And social media is not the only example of computer science going wrong. Technology is not just about connecting us as individuals, and it's not even just about getting data. Even if we were able to extract information, is it actionable? Is it useful? If it were, would we even trust it? For example, if a computer could predict your diagnosis of some disease, would you believe it? Should your doctor believe it? Those are two different problems. A computer's prediction might be easily believable to those who don't understand how it was derived, and maybe your doctor should be more skeptical. Or think about this: if the prediction was accurate most of the time, say 99% of the time, what would we do in the 1% of cases where it isn't? Would we be likely to just click past it, to accept it, to become a little bit lazy in our scrutiny of that particular piece of data?

There's also another really big problem, especially with machine learning. Four or five years ago, I was attending the Open Data Science Conference in Cambridge, Massachusetts. After one of the long days of lectures and classes, I was headed toward a brew pub by the convention center.

As it so happened, I bumped into an old friend of mine on the street, Andy Lee, whom I hadn't seen in what must have been a decade. I had a chance to interview Andy some years later, and you're going to get to hear the whole interview in a few episodes, but he recounted that chance meeting:

Andy Lee: Kind of funny that I, several times, just keep bumping into you around town here. Um, which is kinda, kinda neat to have that. It is very much, I guess, a small world, and also how interconnected these, these communities of biology and technology are becoming.

Angelo Kastroulis: He's not kidding. What are the chances? We hadn't seen each other in so many years. And it's not as though we both met in Cambridge or Boston and both lived there. In fact, we met in Jacksonville, Florida, and had known each other there a long time. Our paths somehow diverged, but Cambridge seemed to be the epicenter of what we were doing.

Andy Lee: Yeah. We started there in Jacksonville, uh, and, uh, doing the, the technology, uh, stuff and that there's, there's quite a strong software engineering community there that was helpful to tap into and some good people to bounce ideas off of to kind of get that stuff going.

And it's a little more internet-community friendly, I suppose getting into the real world therapeutics and partnering with pharmaceutical companies and talking to venture capital, Cambridge is just the center of the universe. So that was really a no brainer to move up here when we spun off the therapeutics piece.

Angelo Kastroulis: Now, our paths diverged. I went to Harvard and spent a lot of time in Boston and Cambridge, but I often like to go back and visit, especially if there's a data science conference. Andy ended up founding a few different companies. He's the chief technology officer and founder of NeuroInitiative, a company that uses advanced simulation techniques to try to create new drug compounds. He's also the chief operating officer of Vincere Biosciences, a company that takes these drug compounds all the way to human trials and, hopefully, to the market. Cambridge is well established for its bioscience talent pool, so it's not surprising that Andy would find himself there.

But what is interesting, and a coincidence, is that we both have been involved in machine learning, arriving there through our own separate evolutions; we went down different paths. I did it because I wanted to understand machine learning, and frankly, I'm uncomfortable with not knowing how something works. Maybe I'm a little bit like you. You may not feel very comfortable thinking of the world as just a bunch of black boxes, where it's just okay that this thing does whatever it does and I don't need to know how it works. That just doesn't work for me, and I bet it might not work for you either. We all should become masters of our craft. As computer scientists, we always want to continue to improve our knowledge and move the bar up. That's an important part of who we are. But moving our bar of knowledge up also means digging down into these black boxes and demystifying them just a little bit.

Why do we have to know how they work? Well, if we don't know how they work, we won't truly know the trade-offs we're making when we hitch our wagons to them. When we start using these technologies, there are trade-offs: there are downsides, and there are distinct advantages. And especially in the world of big data nowadays, there are many technologies that seem to overlap. The nuances, the differences: the devil is in the details. So knowing those little pieces is really, really important. One of the things I would like to keep doing is digging into the details of how some of this works; the challenge will be keeping it at the right level for the listener. So, wanting to get inside this black box, I thought I would go back to grad school and learn a bit about what's going on inside this machine. Andy, on the other hand, wanted to develop technology that would change the world, and bioscience is in fact a pretty good way to do that.

It's still a pretty cool coincidence that we bumped into each other: an old friend you haven't seen forever, randomly, on some street in another city. So, of course, we took some time to catch up. One of the topics of conversation was the inherent bias in machine learning. Listen to what Andy had to say about it.

Andy Lee: There's been a lot of talk about AI models, being black boxes and finding out later they've got all these biases built into them. And you know, this idea that oh, it's a black box. Well, I think you have to acknowledge that there, that bias is going to be there and address it and handle it appropriately and, and proactively and say, well, let's, let's use that as a strength, not a weakness. Let's, let's take these these differences and use them to get better at what we're doing.

Angelo Kastroulis: What Andy said, there really resonated with me. It's a thought I've long held: Success is taking your weaknesses and turning them into strengths. So taking this black box and turning that into a strength is where you'll discover something new and interesting and different. But to do that, you really have to understand what the black box is. I think that this distills down everything that I've been trying to do.

What's he talking about? What is bias? 

Well, as I mentioned, Andy has been working in bioscience and in simulation. So there's no way to escape machine learning in that field. And the same is true in the path I took. If you've been working in machine learning or artificial intelligence for any length of time, you'll be introduced to the concept of bias. So I think it's worthwhile talking about it for just a few minutes.

Let's first establish a few common terms. First of all, machine learning creates an algorithm that's able to make some prediction; let's call that a model. Now, in order to build these models, we do what's called training, and training requires the use of a lot of data. So, first of all, a model is only going to be as good as the data we give it. We could actually take the same model, give it to someone else, and have them train it on a completely different set of data, and the model will behave completely differently. It'll have its own characteristics. It'll have a different accuracy. Accuracy is one way we can judge a model: when we say accuracy, we mean what percentage of the time it's right and when it's wrong. That's for categorical classification. There are other ways to determine how good a model is, and we'll get into those too, but there's something else that's interesting about machine learning.
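Accuracy, as just described, can be sketched in a few lines of Python. This is a minimal illustration, with invented example labels: it simply counts how often a model's categorical predictions match the true labels.

```python
# Minimal sketch of classification accuracy: the fraction of
# predictions that match the true labels (categorical classification).
def accuracy(predictions, labels):
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Invented example labels: 6 of the 8 predictions match.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(accuracy(y_pred, y_true))  # 0.75
```

Two copies of the same model trained on different data would typically score differently here, which is one reason a single accuracy number never tells the whole story.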

There was an old saying at Harvard that I'll never forget, and that is that "all machine learning models are bad and some are less bad than others." What does that mean? Well, let's think about it for a second. A machine learning model was designed by some human, and that human had to make decisions about which things, which features, that model needs to learn. The little pieces of data we give it are called features: the items that might impact the model. For example, one data element might be your age; age may have some predictive power in some of these models, and that's what we're trying to find out when we're training. We're asking the model to look at all of these data elements and see if it can find any kind of pattern, and how strongly these features correlate with each other. Incidentally, some features might not work very well individually, but put them together and all of a sudden they have some really interesting benefits. The act of choosing which features to include, that act alone, introduces bias to a model. It means you've encoded some of your knowledge, some of your leaning, into the model; the fact that you even said age is a factor biases the model. If you left age out, you've also biased the model, because age may have been an important factor we just didn't know about. So all models, regardless of how you slice it, are biased, because you can't just give them infinity.

So let's go back to our question: would you trust a machine learning model to predict your diagnosis? Well, you'd probably say it depends: what features did it use? How did it arrive at this diagnosis? These are really important questions. My dad died of pancreatic cancer. Why did he get cancer? We really don't know. It could have been genetic, environmental, or a whole host of other things, because we just don't know the cause. And because science doesn't even understand why it happens, it's hard for us to know what to encode in the model. We can't just throw the kitchen sink at it, because we don't even know what the kitchen sink is. Is it our genome? Everything we've ever eaten? Everyone we've ever come in contact with, every place we've ever lived, every environment we've touched, our entire family history? Our whole medical record, even though the medical record is incomplete? You can see what I'm getting at. It becomes very, very hard to do.

Some things may not be relevant. Take, for instance, every single substance we've come in contact with: the vast majority will be completely irrelevant, but we don't know which ones. So we begin the painstaking task of using our intuition to pick out what might be important and what is not, and to see if there's a correlation. But there's a bit of a danger with that: sometimes our intuition is just wrong.

Let's take a look at a very famous example, probably the most famous there is, illustrating that correlation is not the same as causation. Ice cream sales and shark bites are very highly correlated: as ice cream sales increase, so do shark bites, and as they decrease, so do shark bites. What does that tell us? Does it tell us that ice cream causes shark bites, or that sharks maybe have a sweet tooth? No, not at all. There is a link between the two, but there's some other piece we're missing. So if we were just to look at that data, we would start drawing false conclusions, introducing bias, thinking that each of those things directly affects the other. This touches on interpretability a little, but let's think of it in terms of bias. There is an explanation for why ice cream sales and shark bites are correlated, and that is summer. The warmer weather of summer makes ice cream appeal more to the average person, so naturally there are more ice cream sales in the summer than there would be in winter. And the same is true of shark bites: more people are in the water, so more people are going to be bitten by sharks. In the winter, there are fewer people in the water, so of course there are going to be fewer bites.

So while that example illustrates the point in a simple way, reality is not that simple. Take, for example, the link between cholesterol and heart disease: does high cholesterol cause heart disease? Well, it's very easy to think so. Here's another one I've heard often on the radio, given by a very prominent physician: that atrial fibrillation makes you five times more likely to have a stroke. That is simply incorrect. What the statistics are actually saying is that there is a strong correlation between atrial fibrillation and the risk of stroke, a correlation so strong that the risk of stroke could be five times higher. It doesn't mean that if you have atrial fibrillation, all of a sudden you're five times more likely to have a stroke. It doesn't cause it. It is, though, strongly correlated, so it's a risk factor, something we should keep an eye on; it could potentially put you into a different risk category. Think of it this way: if we observe individuals who have had a stroke, we notice that a whole lot of them also had atrial fibrillation and far fewer of them didn't. So our intuition tells us there's something here. That is exactly what statistics is about: we look at the data and say there's something there. We have to be very careful not to turn that into causation. So you should be skeptical anytime someone says there's a 98% chance that this will happen because that happened; that is not what the statistics are trying to say. But if you've fallen into this way of thinking, don't feel bad. Experts fall into it all the time: statisticians, physicians, highly educated people, PhDs. It's a very easy trap to fall into. I remember in one of the first stats courses I took, they made the statement: "Rule number one, never lie with stats."
Now, of course, I don't think anyone really intends to lie with statistics, or at least most people don't. But suppose you're showing a chart of some statistics, and you magnify a section, or you show it on a logarithmic scale without telling anyone, so that it magnifies the differences more than they actually are. You're not technically lying, because that's the scale and those are the statistics. But if you haven't pointed it out, and you're representing the data in a certain way, or leaving out the caveats of how you got there, that's lying with stats. So my caution: just be careful when using statistics; they're a very dangerous weapon.

I want to point out another really important factor to think about when we're building these machine learning models: sometimes our goals are just too aggressive. I remember a colleague of mine some years back, fresh out of machine learning school, said he wanted to create an algorithm to predict breast cancer. That isn't very realistic. Instead, we should probably focus on things that are reachable. For example, can we save the physician some time? An MRI exam can produce thousands of images. How long would it take to review each of those images to see if there's something in it? Many times there's just not much interesting information in most of the images; it's a few of them that are really, really telling. Could machine learning help bubble those up to the surface so that the physician sees the few most important images first? And as the technology advances, scanners will produce many more images at higher resolution, which makes it really hard for radiologists to sift through it all. We can apply the same thinking to cutting down the volume of data, or to getting a prior authorization done so that you already know what you're paying for before you even start the procedure, and it doesn't delay your treatment. Or what if we could create a model that checks for interactions among the different drugs you're on, to make sure the one about to be prescribed doesn't interfere or interact, or takes a look at your allergy history? Or maybe we can do something like the restaurant-matching algorithms do: if Yelp knows what kind of food I might like, maybe we could do the same kind of thing for medications, and start thinking in terms of "this medication may bother your stomach, because other people like you seem to have an issue with it." To me, those are far more reachable goals than something really lofty.
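At its simplest, the interaction check described above is a lookup rather than machine learning. This toy sketch, whose interaction table is a tiny illustrative sample and not clinical data, shows the shape of the decision-support idea: flag a proposed prescription against the patient's current medication list.

```python
# Toy decision-support sketch: flag a proposed prescription against the
# patient's current medications. The table below is a tiny illustrative
# sample, NOT a clinical reference.
KNOWN_INTERACTIONS = {
    frozenset({"warfarin", "aspirin"}),           # increased bleeding risk
    frozenset({"simvastatin", "clarithromycin"}),
}

def check_interactions(current_meds, new_med):
    """Return the current medications known to interact with new_med."""
    return [med for med in current_meds
            if frozenset({med, new_med}) in KNOWN_INTERACTIONS]

print(check_interactions(["warfarin", "metformin"], "aspirin"))  # ['warfarin']
```

A real system would sit on a curated interaction database and also fold in allergy history; the point is only that this kind of bounded, checkable goal is reachable in a way that "predict cancer" is not.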

You have to think of it like this: if a human can't do it, if a human doesn't know what something is, how to do it, or how to determine it, a machine can't know it either. That isn't to say that machines can't help us learn new things; that is also, I think, a really important application of computer science. For example, a machine can group things together in new and interesting ways so that we can see correlations we maybe didn't see before. In the way we currently do research without machine learning, a human has to look at the data, sift through it, determine whether there's some kind of pattern or correlation, and then design a study to prove it. If we were to take all the real data as it sits, actual data, we could start seeing whether there actually are correlations, not just between individuals but between populations, and that could help us with all sorts of problems. Think about what that could have done for us during the COVID-19 pandemic.

Again, remember we can't just give a machine everything, so problems like predicting cancer are really impossibly hard. But what we can do is give it all the things we do know and see if some interesting patterns shake out. Imagine if computer science could find new drug compounds, or maybe predict our biological age. What age does our body think it is? How is our body reacting? What interventions can we implement in our lives to improve our longevity, or maybe just to improve the quality of our life or our happiness? See, those are really, really interesting, and, in fact, I have to say those are doable. Now we're onto something. After all, if we can predict what restaurant I might like, what posts might interest me, or what ads are relevant to me, surely we can predict what kinds of interventions will make me happy, or what ways of exercising might work for me, have the outcome I want, and be something I'll actually stick to. Or what about understanding my genome? What if I could use machine learning to bring out any interesting things there? We know that the human genome is incredibly complicated. It's gigantic; we've only begun to scratch the surface of understanding it and being able to compute over it. It is so big that most of it is not included in our medical record. In fact, genomics is in its infancy. We may have learned how to decode the genome, but that doesn't mean we know what any of it means. We actually know very little.

So today we've talked about some really important topics. I wouldn't say that we've talked about the dark side of computer science; I would say, rather, that we had lofty ambitions and things didn't play out the way we thought. We talked about how we might introduce bias into what we do, misinterpret the results, or set unreachable goals. Next episode, we'll talk a little bit about some ways in which technology can benefit us and what we have to look forward to. We have been doing some things right.

I’m Angelo Kastroulis and this has been Counting Sand. Please take a moment to follow, rate, and review on your favorite podcast platform so that others can find us. Thank you so much for listening.