Counting Sand

Simulating Biological Systems Part 1

Episode Summary

Angelo is joined by colleagues Andy Lee and John Ryan as they dive into what is possible and practical when simulating biological systems.

Episode Notes

The episode starts by asking the question, what if we could use computer science to shorten the amount of time it takes to discover new medications? Angelo then shares, "If we meditate on that for just a second, our minds might wander over into the world of machine learning and artificial intelligence, where we can imagine a world where these complicated neural networks or other types of AI are trying to discover a new kind of chemical compound. We might even think about the far future things like quantum because it has application in chemistry because chemistry can be thought of as an optimization problem. Or we could do something like a simulation. What if we could simulate the chemical structures of the world, or we could even simulate the body. We could conceivably introduce new kinds of compounds to the body and see how it reacts."

Our Team:

Host: Angelo Kastroulis

Executive Producer: Náture Kastroulis

Producer: Albert Perrotta

Communications Strategist: Albert Perrotta

Audio Engineer: Ryan Thompson

Music: All Things Grow by Oliver Worth

Episode Transcription

Angelo: Discovering new medications is a long and expensive process. It takes many, many years, but what if we could use computer science to be able to shorten the amount of time it takes to discover new medications?

If we meditate on that for just a second, our minds might wander over into the world of machine learning and artificial intelligence, where we can imagine a world where these complicated neural networks or other types of AI are trying to discover a new kind of chemical compound. We might even think about the far future things like quantum, because it has application in chemistry because chemistry can be thought of as an optimization problem.

Or we could do something like simulation. What if we could simulate the chemical structures of the world, or we could even simulate the body. We could conceivably introduce new kinds of compounds to the body and see how it reacts. And that's what we're going to talk about today. I'm your host, Angelo Kastroulis and this is Counting Sand.

I'm joined by my friend, Andy Lee, actually a returning guest. You might remember last season, we talked about the work that he's doing as the chief operating officer of Vincere Biosciences on ways of improving our lifespan, or at least measuring our biological age to help make that happen. And Andy's also the chief technology officer of a company called NeuroInitiative, which simulates chemical compounds. And that's one of the things we're going to talk about today. I'm also joined by one of Andy's colleagues, John Ryan, who heads up the technology aspect of their organization.

And as I've mentioned in the past, Andy and I have known each other a really, really long time, we go back to before he started any of these companies. In fact, I remember the very first years of Andy writing code with his idea that he and his wife were coming up with to try to solve this idea of simplifying drug development through simulation, got some grants through IBM in a research center and the Michael J. Fox Foundation.

John though I don't think you and I have met. So tell me a little bit about yourself and what brought you to Andy's company and what made you even think about getting involved in this kind of field?

John: Sure. So, I initially got my start into programming through a recording engineering degree program. So I was at a university for recording engineering and ended up getting into doing game audio and met some programmers who were doing game dev and thought that was really cool.

So a few years later, after graduating from that, I decided to pursue a programming degree. And while I was attending St. Johns River State College here in Florida, I happened to be attending there during the time when Andy and Spring were starting up NeuroInitiative.

And they came out to the college to look for some students that were interested in pursuing this new simulation project using game engine technologies, and I thought it sounded really cool, and it was a chance to use something that I, you know, an engine that I'd already been working on, which was the Unity engine at the time, to do something that was completely different.

You know, not just making games, but actually doing something really cool with science. So I pretty much immediately was interested and on board. So it started out as an internship. And here we are six years later.

Angelo: That's awesome. So he kind of hooked you on this idea of using something maybe you thought was right in the middle of gaming for something that could maybe save people's lives.

John: Yeah, exactly. To see this entertainment tool, you know, that's how I'd always thought of it, as something to save people's lives and to cure disease and to do all sorts of other interesting work was something I had never thought about at that time.

Yeah. And it really was you know, opening the door to a whole different world of use for technology because before that I had been pretty much just making games and learning some basic, you know, going through college and learning the business applications of programming and computer science.

So seeing the science and biology side to all this was something that interested me, because I'd always had an interest in science and biology from a young age, but it was never something I thought I'd actually pursue until this happened.

Angelo: That's awesome. You know what I love about that? I love that we're able to kind of bring new and fresh perspectives to computer science problems.

Cause we might think, well, let's bring someone in with a biotech background or let's bring someone with pharma background or someone who's got a heavy engineering background and instead you're going no, let's think about this problem a completely different way. And let's attract people to this industry that are not necessarily thinking in that way.

That is really cool. Andy, I think you're onto something there.

Andy: Yeah, I think really exciting things happen when you can cross functional boundaries and pull people together that are looking in these different areas. And, you know, that kind of started with Spring and me, with her having the neuroscience background and me having software engineering, which is kind of how we started thinking about some of these. And, you know, even within the biology space, we do the same thing.

Like so many researchers are really focused in a niche of like, they just look at Parkinson's and they only use the Parkinson's knowledge to make their hypotheses. And, you know, they get really deep into mitochondria or lysosomes or some area and don't really think about what else is in there.

And so, being able to pull all of the data together in this unique way has helped us to cross some of those boundaries, like the AI does some of that for us and allows us to, you know, have the big picture integrated into how we're modeling things. It’s really fun to start to see where those are going.

Angelo: So not to take us too far on a tangent here, but how do you attract people like John to this industry? You know, how do we, I have the same kind of thing where I'll have these discussions with maybe math PhDs, and you try to convince them that, I can teach you programming.

What you're doing is really interesting and we can apply optimization techniques to computer science, to many things, you know, chemistry, physics, other things that they might normally gravitate to. How did you attract folks like John to, I mean, there's the interesting problem we just talked about, but how do you kind of entice them into the industry?

Andy: Yeah, it's interesting. I think one of the nice things about software is it can be applied toward anything. So if you can kind of open that up, then you get this great opportunity that a lot of fields don't have of like, what do you want to apply these tools to? And, you know, you can make money doing just about anything in software, right?

So then you have this great opportunity to go above and beyond and think about doing something that is really fulfilling. So I think, you know, putting forth the chance to help the world while also honing skills. I think one of the things that we prioritize is the ability to explore and experiment and kind of find tools and techniques that are interesting.

So I think there's been a lot of opportunity to keep up with trends and to learn, but I'm actually curious to hear John's take on what it was of all of these things that we kind of put forward. What was it that resonated that made you decide that this is the place you wanted to dive in.

John: Yeah, I mean, well, early on when I first joined on the simulation was pretty much everything at that time, you know, that was the core focus and for me, it was the fact that I already had been using this game engine and I've been making games.

And then looking at Andy's early prototypes that he had coded before I joined on, you know, I got to see, okay, this is all of the same stuff I've already been doing, but now these shapes, you know, these polygons are actually representing something real. And it's not just a fantasy world that you're building, which has its own merit.

But you know, the fact that we're doing something real, for me that resonated, to work on something so practical and actually something that could change the world potentially. To kind of zoom out a little bit to a higher level, and to answer the question more towards how do we get more people that are in this position: I think we have been pretty good about recruiting at the, you know, the college level. We've had, you know, a number of interns that have come through NeuroInitiative and worked on projects, and getting young people who are maybe rough around the edges, so have a lot to learn, but have that passion for learning, that passion for growing, and are really interested, very open-minded and interested in doing things.

You find a lot of that at the college level, you know, that's where I think we're most open and hungry, looking for the next thing, but in order to actually attract those people and have them come to you and be interested you have to have a good idea and something fresh and interesting to offer them.

You know, I have past colleagues and classmates that went off into different areas of business and IT, and working in different fields, and they follow those opportunities when there's a career potential there, or money-making potential, and to have a company that at this point, you know, I think we're pretty well established.

We've got you know, some sustainability now going forward. Early on, it was a startup, right. And it was fresh and it was kind of a risk to take. But I dunno for somebody like me, I like taking those risks. I find it exciting. And you know, I'd argue that maybe some of the more exciting people are the ones who are willing to take those risks and try out the new technology.

So if you have something interesting and new and on the edge, you're going to find exciting people that want to be a part of that.

Angelo: That's a great observation, John.

I mean, it reminds me a little bit too, you said something that I want to hone in on a little bit, you mentioned hunger. When I hire for my organization, there are two things I look for and that is obviously they have to be smart, you know, because to do things that someone's never done before, requires them to put in that kind of effort and pick things up quickly.

And that can be stressful. The second thing that we look for, probably more important than the intelligence, is the hunger. They have to love the craft. They have to believe what we're doing and want to improve themselves constantly because in this field there's no stagnation, you know, you can't be the COBOL programmer in 2010.

I mean, you can do that. You can find a niche, but to do the real cutting edge stuff, you have to constantly learn new things. One of the things that I look at for my senior engineers is that you have to be able to learn a new language, and I'd like you to learn a new one every year. And if you're learning new programming languages constantly, it means that you're not fixated on any one thing.

And you start seeing the world much bigger. So that hunger is super important. I'm glad you mentioned that. It really resonated with me. So talking a little bit about the simulation that we had chatted about. We kind of just mentioned it throughout here. I did a little bit of research myself in simulation.

I've done some simulation in various ways. We did some clinical decision support tools and I'll talk about those later, but simulation is such a big, broad topic. There is simulation done, I read a great paper on simulation in surgery where what they do is they take fluid dynamics and try to compute blood flow in vessels.

And then how the vessels would react in surgery. That's one part of it, I think, simulating the body and how the body would react to things. But then there's probably simulation on the chemical structures themselves. So tell me a little bit about what simulation, you know, what are you guys simulating and how have you kind of found that effective.

Andy: Yeah you know, one of the challenges there is figuring out what level you are going to simulate. Like how deep do you go? Because you could go down to the energy minimization at the protein structure and chemical structure and docking simulation. And we do some of that at certain points of our project, but then our core is trying to scale that and find a happy point in the middle of biology simulation.

And, you know, you could go the other end of it and just do reaction rates and systems biology type modeling where you're, not getting into the physical aspects at all. So finding that point in the middle was an interesting challenge.

Angelo: Yeah. So when you do biology simulation, what are you simulating?

Andy: Really looking at the, you know, if you think of a cell, most of what we're doing is simulating a single cell. And then looking at the interactions that happen inside of that, where you've got quantities of proteins that have certain properties, they have a mass and a size, and then the transcriptomics that you can measure from patients can give you the quantities of those.

So you pack them all into a three-dimensional space that is defined by the observable shape of the cell and the components inside the cell. And then using physics to play that out over time, you can then kind of model the movement of those proteins within that cellular environment and then layer in interaction rules that can drive some of the state transitions.
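As a rough illustration of the loop Andy describes here (pack proteins with a mass and a size into a cell-shaped volume, move them with physics, apply interaction rules on contact), the following Python sketch may help. It is not NeuroInitiative's engine. The spherical cell, the random-walk motion, the lack of any collision response, the arbitrary units, and the placeholder names `kinase`, `substrate`, and `substrate_P` are all assumptions made for the example.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Particle:
    kind: str      # placeholder protein label
    radius: float  # size, arbitrary units
    mass: float    # mass, arbitrary units
    pos: list      # [x, y, z] position inside the cell


def norm(v):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))


def pack(counts, cell_radius, rng):
    """Place every copy of every protein at a random point inside a sphere."""
    particles = []
    for kind, (n, radius, mass) in counts.items():
        for _ in range(n):
            while True:  # rejection sampling: retry until the point is inside
                p = [rng.uniform(-cell_radius, cell_radius) for _ in range(3)]
                if norm(p) <= cell_radius - radius:
                    break
            particles.append(Particle(kind, radius, mass, p))
    return particles


def step(particles, rules, cell_radius, rng, dt=1.0):
    """Advance one time step: random-walk motion, then contact rules."""
    for p in particles:
        sigma = math.sqrt(dt / p.mass)  # assumption: heavier particles move less
        new = [c + rng.gauss(0.0, sigma) for c in p.pos]
        if norm(new) <= cell_radius - p.radius:  # discard moves that leave the cell
            p.pos = new
    for i, a in enumerate(particles):
        for b in particles[i + 1:]:
            gap = norm([x - y for x, y in zip(a.pos, b.pos)])
            if gap <= a.radius + b.radius:  # the two particles are touching
                change = rules.get(frozenset((a.kind, b.kind)), {})
                a.kind = change.get(a.kind, a.kind)
                b.kind = change.get(b.kind, b.kind)


rng = random.Random(0)
# counts: label -> (copies, radius, mass), all hypothetical values
cell = pack({"kinase": (5, 1.0, 2.0), "substrate": (20, 1.0, 1.0)}, 10.0, rng)
# rule: when a kinase touches a substrate, the substrate changes state
rules = {frozenset(("kinase", "substrate")): {"substrate": "substrate_P"}}
for _ in range(100):
    step(cell, rules, 10.0, rng)
print(sum(p.kind == "substrate_P" for p in cell), "substrate copies converted")
```

The pairwise contact check is quadratic in the number of particles, which is fine for a toy but hints at why a simulation with tens of thousands of entities needs a purpose-built engine.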

Angelo: Interesting. So if you found a compound and you weren't able to do the simulation, I mean, it could be feasible this compound could just float around and never interact with anything. Even though it could, it would never bump into anything effectively.

Andy: It is. Yeah. I mean, and there are, depending on the quantities, you may have things that will never interact. And there are tipping points where you have the right ratios of interactors where you'll get very drastically different dynamics changes.

Angelo: So how would you, now I can think about this in terms of a machine learning approach, you would want to try to figure out, and there've been papers on it. For example, there's a paper where he talks about trying to extract features from these. And that's a typical ML approach.

Extract some features here and try to put them into a model. Of course, when you do that, encoding the model is introducing bias, but as you're putting these into a model, you're trying to figure out something like a decision tree of how this would work. How do you guys do it in the simulation side of things?

So you're not really extracting features, you're using physics and some other things. How do you do that in the simulation engine?

Andy: John do you want to take that?

John: Yeah, so we use existing knowledge on interactions. So, you know, we look at what's publicly available to see what interactions are occurring between different proteins. And so there is a curation process here that we have to have the rules to drive the simulation. And since we're looking at interactions on a you know, protein-protein level or protein-chemical level, you know, those rules have to be filled out into the system.

So and this is one of the big challenges with this kind of system is actually, you know, filling out the knowledge base and getting curated information in to drive the simulation. So the simulation is going to be as good or as accurate as what you can put into it.

And that I think is one of the biggest challenges we have is getting that information in there because it does have to be curated and there's a lot of data out there that's publicly available. But then getting that all into one format that you can use to drive the rules of your system is an arduous process.

So the approach we've taken is to focus on certain pathways and just look to simulate those pathways. We can then focus on getting the rules that we need for those pathways into our system so that we can hone in on that pathway, those interactions, and observe what we see, and then we can introduce hypotheses by putting in hypothetical interactions.

If you have a potential drug that you expect to interact with certain proteins or certain chemicals in a certain way, you can then add those hypothetical interactions into your system, simulate that and then observe what happens in the simulation to further test your hypotheses in this environment.

So by looking at just small pathways at a time and honing in, we can fill out the interaction rules for those systems. And then once you've done that for a number of pathways, you can then start to combine those together to then start looking at a larger picture of the cell and to see then how these different pathways interact with each other.

So it's a process where at each kind of step you can do these projects, these experiments, you know, at a pathway level. And then each time you do a new one, you're adding more information to the knowledge base and those build on each other. And we're working towards getting a very large picture of the cell to be able to simulate more complex inter-pathway processes.

And the more we do, the more that the whole system grows as a whole.
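The pathway-by-pathway workflow John describes could be sketched roughly as follows. This is only an illustration under assumptions: the `Interaction` record, the `pathway` tag, the helper names, and every entity label (`A`, `B`, `drug_X`, and so on) are hypothetical placeholders, not the actual knowledge base schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Interaction:
    a: str
    b: str
    result: str
    pathway: str
    hypothetical: bool = False  # True for untested "what if" rules


# Placeholder knowledge base; a real one would be curated from public sources.
KNOWLEDGE_BASE = [
    Interaction("A", "B", "A:B", pathway="pathway_1"),
    Interaction("A:B", "C", "C_active", pathway="pathway_1"),
    Interaction("D", "E", "E_degraded", pathway="pathway_2"),
]


def rules_for(kb, pathways):
    """Select only the curated rules that belong to the chosen pathways."""
    return [r for r in kb if r.pathway in pathways]


def with_hypotheses(rules, *candidates):
    """Return a new rule list with candidate interactions flagged hypothetical."""
    return list(rules) + [replace(c, hypothetical=True) for c in candidates]


# One pathway at a time, plus a hypothetical drug that binds entity A.
drug_rule = Interaction("drug_X", "A", "A_inhibited", pathway="pathway_1")
experiment = with_hypotheses(rules_for(KNOWLEDGE_BASE, {"pathway_1"}), drug_rule)

# Later, pathways can be combined to look at inter-pathway behavior.
combined = rules_for(KNOWLEDGE_BASE, {"pathway_1", "pathway_2"})

for rule in experiment:
    tag = "hypothetical" if rule.hypothetical else "curated"
    print(f"{rule.a} + {rule.b} -> {rule.result} ({tag})")
```

Keeping the hypothetical flag on the rule itself means a test interaction never gets mistaken for curated knowledge when pathways are later combined.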

Angelo: Oh, interesting. Okay. So that makes a lot of sense. It’s an expert driven system in that you encode the rules into the simulation, the simulation kind of uses the game engine idea, you know, the idea of running physics, because that's what they do.

Running the physics you encoded to be able to simulate how it's going to play out. In my experience, you know, it's hard to encode rules that you need an expert to write because they just aren't computer scientists, you know, they wouldn't necessarily think, okay, I need to encode this in something. How do you guys, John, how do you surround that problem of getting these expert rules into the system so that you can run it?

John: Yeah. So I think the big key here is developing an interface that can be used by the experts, in our case, you know, expert biologists, who can input this data. So having an interface layer that makes sense to a biologist, where they can fill out what data is necessary for the system in a way that makes sense to them, in a format that makes sense to them.

And then through APIs, you know, we can then translate that information into the rules that go into the database that actually then drive the system. And we also try to store the data in a way that makes it useful, not only for the simulation, but for other work that we do, whether it's, you know, data science, analytics, network analysis, you know, a number of the other kind of work that we perform.

We try to keep as much data as we can in a format where it can then be transformed or translated into different use cases. So there's a translational layer that happens when loading up a simulation, where we take all the information we have about the proteins and the interactions that need to go into the simulation to create the rules for them when we run a simulation. You know, we translate that information into rules that work for this simulation, but then when we perform some other kind of data science process, we may take that same data and transform it in another way that can be useful for another use.
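One way to picture the translational layer John mentions is a single set of stored records with two views derived from it. The sketch below is an assumption-laden illustration: the record fields (`a`, `b`, `result`) and both function names are hypothetical, not the team's API.

```python
from collections import defaultdict

# Hypothetical curated records, in the shape a web form might save them.
RECORDS = [
    {"a": "A", "b": "B", "result": "A:B"},
    {"a": "A:B", "b": "C", "result": "C_active"},
]


def to_simulation_rules(records):
    """Lookup table for an engine: unordered pair of entities -> product."""
    return {frozenset((r["a"], r["b"])): r["result"] for r in records}


def to_network(records):
    """Adjacency sets for network analysis: who interacts with whom."""
    graph = defaultdict(set)
    for r in records:
        graph[r["a"]].add(r["b"])
        graph[r["b"]].add(r["a"])
    return dict(graph)


sim_rules = to_simulation_rules(RECORDS)
network = to_network(RECORDS)
print(sim_rules[frozenset(("B", "A"))])  # the order of the pair does not matter
print(sorted(network["A:B"]))
```

The stored data stays neutral, and each consumer (simulation, analytics, network analysis) gets its own transformation at load time.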

Angelo: Okay. Well, that makes a lot of sense. So you've created effectively your own DSL for these, so the experts can interact with the rules that way. So you have kind of like the problems, a lot of rules engines also have, right. Where they're having to kind of take this domain specific language and compile it into something that the computer can run because it can't run a concrete syntax.

So can you tell me a little bit about that process you went through, you know, what kinds of mental pathways did you go down and what kind of tools did you find were valuable? You know, how did you do that?

John: Yeah, this has been an ongoing process too and we're still finding ways to improve on this and make this better, you know, early on it was just as simple as we started with kind of A interacts with B and this is the result. And as far as, you know, working with that, it was fairly simple to then have, you know, a biologist, an interface that they can put in A, they can put in B and they can put in the result.

And then as we go on we try to find new ways to add more information to that and to be more specific. As far as tools we've used, I mean, we've looked at what others are doing, you know, looking at the kind of syntax and the rules of other interaction databases and other simulations.

We've looked at like SBML format and other rule sets to kind of get an idea, but as far as the actual translational work it's been, you know, just code that we've written in our API or you know, at our program level to actually translate the data into the rules that we need.

Andy: Yeah. There are some interesting tools out there for like graphically building network models where you can kind of put the entities and lines between them to denote an interaction. And, you know, we experimented with some of those and they were, you know, it's complex to do that at scale.

For like a small model, the biologists could do that, and the drag-and-droppiness of it was nice and they liked it. But then once you got into, you know, fully describing a model, it gets really overwhelming, really fast. And so we found it was better to break it down and to just define this one interaction. If these two things come in contact, what happens, right?

And so then you can start to build those up one at a time where it's really like the biologist just has to think about like, I understand this fact. When this condition happens, this thing happens. So then by building those up one at a time, then the system is able to take these, you know, thousands of rules and put them together into a system that works and, you know, as John mentioned, it was interesting seeing the evolution of A plus B is one thing, but then we start to realize, well, A plus B only when A is phosphorylated and only when A is phosphorylated and is also conjugated with ATP, right? Or something like that. You have multiple states that you have to keep track of.

And only that state when it happens to be inside the lysosome. So now you've got localization and state and proximity or interaction with some other entity. So they're trying to get the interface together where it's still easy for them to think about, like, this is the condition that's going to cause this but now also tracking these states along with it.

And ultimately it's web UIs and simple form type interfaces, but getting the right workflow of those and the right fields to make that easy to do. Where do you constrain them to a set of options? Where do you give them free form to create something new? A little bit of thought into making that flow.
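The progression Andy walks through, from "A plus B" to "A plus B only when A is phosphorylated, conjugated with ATP, and inside the lysosome", amounts to adding conditions to a rule. A minimal sketch, assuming hypothetical `Entity` and `Rule` types and made-up field names:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    name: str
    states: frozenset = frozenset()  # e.g. {"phosphorylated", "ATP_bound"}
    location: str = "cytosol"


@dataclass(frozen=True)
class Rule:
    a: str
    b: str
    result: str
    a_requires: frozenset = frozenset()  # states entity a must carry
    location: Optional[str] = None       # None means "anywhere in the cell"

    def applies(self, x: Entity, y: Entity) -> bool:
        """True when x and y satisfy the name, state, and location conditions."""
        if (x.name, y.name) != (self.a, self.b):
            return False
        if not self.a_requires <= x.states:  # subset test on required states
            return False
        if self.location is not None:
            return x.location == y.location == self.location
        return True


simple = Rule("A", "B", "A:B")
strict = Rule("A", "B", "A:B",
              a_requires=frozenset({"phosphorylated", "ATP_bound"}),
              location="lysosome")

plain_a = Entity("A")
ready_a = Entity("A", frozenset({"phosphorylated", "ATP_bound"}), "lysosome")
b_in_lysosome = Entity("B", location="lysosome")

print(simple.applies(plain_a, Entity("B")))     # the early "A interacts with B" form
print(strict.applies(plain_a, b_in_lysosome))   # missing states and wrong location
print(strict.applies(ready_a, b_in_lysosome))   # all conditions satisfied
```

Each rule still reads as one fact ("when this condition happens, this thing happens"), which is the property that keeps the form manageable for a biologist.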

Angelo: And when we think about this problem, you know, I've lived in this problem as well, but from the clinical side. It’s a similar problem where we break the guideline, say a clinical guideline you might want to include in a decision support system. And what you want to do is, they can be enormously complicated, but one approach is exactly what you're talking about.

You create some facts around the system. So you'd say we'll make some facts, little bite-sized pieces of functionality, like, is someone advanced in age? Okay, what does that mean? Well, you can code that little bit up. Now I have a fact. Same kind of thing, is there a Metformin contra-indication, say, in their history?

Well, how do you determine that? I don't know. It depends on, do they have diagnosis? Is there some medication they're taking? I mean, there can be many ways we infer it, but you create that as a fact. And then you can start building bigger rule sets off of these small facts because you're right it gets overwhelming.
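The fact-then-rule layering Angelo describes can be sketched as small predicates composed into a larger rule. Everything specific below is a placeholder assumption and not clinical guidance: the age threshold, the diagnosis and medication codes, the field names, and the `needs_review` rule itself.

```python
# Hypothetical patient record; field names and codes are placeholders.
patient = {
    "age": 72,
    "diagnoses": {"dx_placeholder_1"},
    "medications": {"med_placeholder_1"},
}

AGE_THRESHOLD = 65                          # assumption, not a clinical standard
FLAGGED_DIAGNOSES = {"dx_placeholder_1"}    # placeholder contraindicating codes
FLAGGED_MEDICATIONS = {"med_placeholder_9"}


def is_advanced_age(p):
    """Fact: one bite-sized, independently testable question."""
    return p["age"] >= AGE_THRESHOLD


def has_metformin_contraindication(p):
    """Fact: inferred from any flagged diagnosis or any flagged medication."""
    return bool(p["diagnoses"] & FLAGGED_DIAGNOSES
                or p["medications"] & FLAGGED_MEDICATIONS)


def needs_review(p):
    """Rule: a hypothetical guideline step built only out of facts."""
    return is_advanced_age(p) and has_metformin_contraindication(p)


print(needs_review(patient))
```

How a fact is inferred can change (a new diagnosis code, a new medication list) without touching the rules that depend on it.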

But then the technical challenge, which I think attracts someone like John, is that you'll say, okay, we have this way of now encoding this knowledge. How do we execute this? Because this is not a simple problem. Execution is hard. You know, in my world, we execute to get a result, an answer. Your world, you execute so that you can produce a simulation to figure out what is the progression of this?

So how hard is that simulation, I mean, is it just once you encode the rule, you just run it and it's near real time.

John: Yeah. So as far as adding, you mean adding new rules to the system and then how soon you can then just run that simulation? Is that what you’re asking?

Angelo: Yeah, when you run it is it really fast?

John: Yeah, so I should mention here that at some point in our journey, you know, we started with the game engine and we did eventually move to a custom engine. So we have a custom in-house engine that we developed, written in C and C++, that runs on Nvidia GPUs.

So we moved to GPUs because they offer, you know, a huge performance increase. While game engines are interesting and fun to work on, performance for a simulation like this, where we're simulating, you know, tens of thousands and beyond entities, they're just not built for that kind of work.

So we went to a custom solution which can run much faster. So when a new rule is added, or new set of rules, you know, generally we're adding up to hundreds of rules at a time before going on to test a new simulation. Once that data gets into the knowledge base, it's pretty much ready to go to run that simulation, right?

The rules get transformed from the data into the rules in the simulation at runtime. So as long as the data coming in is good, then those rules get translated and we can immediately start running a simulation. Actual simulation runtime varies with the size of the simulation and how long you want to run it for.

So, there's no real concrete answer on how fast the simulations run, but the iterative process of adding new rules and then being able to kick off a simulation run with those new rules is pretty much right away. As soon as you get those rules in, we have a model building process where you take the rules that you've added and you build your model, again using a web interface here, where we pretty much stick to the web UI.

It's easy for the biologists to use, to just run in their browser and add their data, build their models. But once they've added their new interactions, they can use our model building interface to then build their model of simulation and then pretty much right away, you know, kick it off.

So it's a very fast iterative process for the biologist to go from data in their sheet to a running model.

Andy: Yeah, pretty proud of the kind of architecture around that. The way some of this works is that, you know, a model is defined by a set of interaction rules and a set of quantities for the proteins that will be in there, the biochemical entities. And so those two things kind of together make a model. They can use this UI to put those together and say, I want this set of stuff to be my model, and that posts and stores to a database.

And right from there they can click a button and say, I want to run a simulation of this for, you know, a hundred thousand simulation seconds to see how that system will run, and the backend server farm of simulation engines will watch a queuing process that'll just automatically pick that up, run it on the next available GPU set and run through this simulation, streaming back logged results to a data source.

So the biologists can, you know, right through the web UI start to see the expression levels changing and queuing up and graphs forming of the simulation as it runs, popping up on their dashboard how long it's run for and when it's completed, so that they can go and then do analysis with that, with an API built around that where, you know, if they want to just look at graphs, they can do that in our web UI.

If they want to pull that directly into R or Python, the hooks are there to do that, with some prebuilt R scripts that they're able to use to generate more detailed analysis of those rules.
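The architecture Andy outlines (a model is a set of rules plus a set of quantities, a run request goes onto a queue, and the next available GPU picks it up and logs results) resembles a standard worker-queue pattern. The sketch below uses Python threads as stand-ins for GPUs. The `Model` type, `run_simulation`, and all names and numbers other than the hundred thousand simulation seconds mentioned above are hypothetical.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    rules: list       # interaction rules
    quantities: dict  # entity name -> copy number


def run_simulation(model, sim_seconds, gpu_id, results):
    """Stand-in for a real engine: just record what would have run."""
    results.append({"model": model.name, "gpu": gpu_id,
                    "sim_seconds": sim_seconds, "status": "completed"})


def worker(gpu_id, jobs, results):
    """Each worker represents one GPU and takes the next queued job."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work for this worker
            break
        model, sim_seconds = job
        run_simulation(model, sim_seconds, gpu_id, results)


def run_queue(submissions, n_gpus=2):
    """Queue every submission, then let n_gpus workers drain the queue."""
    jobs, results = queue.Queue(), []
    for item in submissions:
        jobs.put(item)
    for _ in range(n_gpus):
        jobs.put(None)  # one stop signal per worker
    threads = [threading.Thread(target=worker, args=(g, jobs, results))
               for g in range(n_gpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


model = Model("pathway_1_baseline", rules=["A + B -> A:B"],
              quantities={"A": 500, "B": 800})
for row in run_queue([(model, 100_000), (model, 50_000), (model, 10_000)]):
    print(row)
```

Which worker handles which job depends on thread scheduling, so the order of the printed rows can vary between runs.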

Angelo: Can you give me a sense of how hard this is, computing-wise? What kind of computing power does it need and how long do these simulations run?

Andy: Yeah. A typical, you know, project, like a full scale, like an experiment, you know, will typically take on the order of a week to run across the cluster of 24 GPUs, in that magnitude.
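To put that figure in perspective, a quick back-of-the-envelope calculation, treating "on the order of a week" as exactly seven days and assuming all 24 GPUs are busy the whole time (both are simplifying assumptions):

```python
gpus = 24
days = 7                      # "on the order of a week", taken as exactly 7
gpu_hours = gpus * days * 24  # GPU count x days x hours per day
print(gpu_hours)              # 4032
```

So a single full-scale experiment is roughly four thousand GPU-hours of compute.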

Angelo: Okay. So it is a complicated problem. What's interesting is in these kinds of problems, you know, hardware makes a difference. There's no question, right? A new generation of GPUs comes out that has some new capability. That makes an impact. But algorithmically, like you guys said, you made your own engine. Sounds like there's a war story there where you were able to find immediate benefit.

I’m your host, Angelo Kastroulis, and this has been Counting Sand. Thank you for joining us today. Before you go, please take a moment to like, review, and subscribe to the podcast. Also, you can follow us on social media. You can find me on LinkedIn @angelok1, feel free to reach out to me there. You can also follow me on Twitter, my handle is @angelokastr. Or you can follow my company Ballista Group. You can also find the show on countingsandshow.com or your favorite podcast platform.