Counting Sand

Dynamo: The Research Paper that Changed the World

Episode Summary

In this episode of Counting Sand, Angelo takes a deep dive into why the Dynamo research paper is essential to our modern world.

Episode Notes

The cycle between research and application is often too long and can take decades to complete. We're often asked which piece of research or technology is the most important. Before we can answer that question, it's worth taking a step back to share the story of why we believe the Dynamo paper is so essential to our modern world and how we encountered it.


Citations:

DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., ... & Vogels, W. (2007). Dynamo: Amazon's highly available key-value store. ACM SIGOPS Operating Systems Review, 41(6), 205-220.

Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine, M., & Lewin, D. (1997). Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing (pp. 654-663).

Lamport, L. (1978). Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7), 558-565.

Merkle, R. C. (1987). A digital signature based on a conventional encryption function. In Advances in Cryptology - CRYPTO '87 (pp. 369-378).


Our Team:

Host: Angelo Kastroulis

Executive Producer: Náture Kastroulis

Producer: Albert Perrotta

Communications Strategist: Albert Perrotta

Audio Engineer: Ryan Thompson

Music: All Things Grow by Oliver Worth

Episode Transcription

Angelo: I have this philosophy that research and practical application can be put together, and that the cycle between research and application is way too long; it takes decades. And so I'm often asked what bit of research, what technology, what paper do you think is probably the most important? Of course, that's a loaded question, and I have several in mind, but before I answer it, I think it's important to take a step back and tell you a little bit of the story of why I think this paper is so important and how I encountered it. I'm your host Angelo Kastroulis, and this is Counting Sand.
In the spring of 2016, I was in the middle of my grad school work. I was one of those people who went back to grad school later in life, and I think that made a deep impact on the way I viewed the world. I was one of the oldest people in the class by far, but I think that was an advantage for me. It helped me squeeze out every bit of knowledge, whereas a lot of the other folks, the younger folks, were coming right from their undergrad into grad school. They weren't exactly sure what they even wanted to do with the degree or what exactly they were interested in; they were trying to find it along the way. But when you're decades into your career, you have a bit of a razor-sharp focus; you know exactly what you're trying to get out of it. I was enrolled in a series of courses at Harvard University around big data systems. They were taught by Stratos Idreos, who would later become my thesis director and whom you have met on previous episodes. I also met Manos Athanassoulis, who was doing postdoc work in the Harvard Data Systems Lab at the time. You've met him too on a few episodes; he's now at Boston University, a professor in his own right.
That course was different from any I had taken before. It wasn't normal grad school coursework: lectures, homework assignments, exams. It was different. Most of our work was centered around reviewing research papers. We read dozens of them. We had to look at them critically and see if there was something we could get out of them and learn. Sometimes we'd look at a paper and see where the thinking process was wrong or where we could build further work. Ultimately, in that course, we were building a bit of innovation of our own. One of the earlier papers that we read was the Dynamo paper. It was a big influence on me. It was officially entitled "Dynamo: Amazon's Highly Available Key-value Store". I think you'll find this paper very interesting. In fact, I think it changed the world.
What made that paper so interesting to me? Well, it wasn't that it was a bunch of invention; rather, it was innovation. You don't wake up one day and come up with innovation like that. It takes years and years of work by many researchers, building on the back of previous ideas. That's why you have to read lots of different papers: each one looks at the world in a specific way, and they're all trying to contribute one little piece of new knowledge to the big puzzle. And that's what you're trying to do when you write your papers: you're trying to contribute one little piece of the puzzle. And so I like to think of innovation not as creating something brand new that didn't exist before, but rather as collecting ideas that already existed and applying them in a brand new and novel way. This paper did exactly that. It describes itself as a synthesis of well-known techniques to achieve scalability and availability.
Data could then be broken into pieces and copied to different nodes; we call that partitioning and replication (replication is the copying), using consistent hashing. Hashing is just applying some kind of mathematical formula so that you always get consistent results. For example, say we apply a formula to determine whether something is even or odd. If it's even, it goes to one server; if it's odd, it goes to another. We just did a really simple hash there, but it can get more complicated: among, say, thousands of machines or tens of thousands of partitions, you can determine what goes where, as long as you always get the same mathematical result whenever you hash it. That idea came from a 1997 paper presented at the annual ACM Symposium on Theory of Computing, entitled "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web". That's an important paper, because what if you could find a way to not have hotspots? In other words, if you have more odd things than even things, the one server that holds all the odd stuff, that one odd partition, would be bigger and have more work to do than the even one. Those are called hotspots, and we want to eliminate them. And what if it wasn't the World Wide Web we were applying this to, but instead a database or an index? We could then evenly distribute the load and have predictable reliability and scalability in our data systems. That is exactly what the Dynamo paper asked.
However, hotspots are not the only problem we're going to encounter in something like this. We're also going to encounter a problem if we have two writes that update the same thing. For example, I update some information, then I update it again with a different value. Which one wins, given that they might not be executed in sequence? They might be coming in so fast that they get out of order. How then do we partition in those cases? How do we make sure we distribute the data properly? Actually, there is more interesting work done back in 1978 in a different paper, entitled "Time, Clocks, and the Ordering of Events in a Distributed System", also known as the Lamport paper. In that one, Lamport came up with a scheme for versioning things using a computer's clock, but he didn't apply it to a database. What if we could take pieces of data and version them so that we know which one is the freshest version, even if they come in out of order? That is another thing the Dynamo paper put together. Do you see now how these ideas can be chained together to create new solutions to a problem, using pieces that were never necessarily intended to be put together in this particular way? Now you have a new, novel approach. It's an innovation.
One more thing about data systems, though: not all data systems serve the same purpose, nor should they. There are different reasons for using different data systems, and so they're gonna have different characteristics. For example, not all systems require all the plumbing that an SQL database gives you, or even that kind of powerful query language. And we've seen that: we've created different categorizations of systems, for example analytics and transactional systems. They work differently, they arrange their data differently, and indeed you ask them differently for the data that you want. Let's say we just want a database for read and write operations and we don't need to do any complex joins.
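To make that hashing idea a little more concrete, here's a minimal consistent-hashing sketch in Python. The node names, the key, and the choice of MD5 are made up for illustration and aren't taken from Dynamo's actual code; the point is just that the same key always lands on the same node, and that adding or removing a node only remaps the keys in its slice of the ring.
```python
# A minimal consistent-hashing sketch (illustrative only).
# Nodes are hashed onto a ring of positions; a key is stored on the first node whose
# position is clockwise from the key's hash.
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    def __init__(self, nodes):
        # Sorted (position, node) pairs around the ring.
        self.ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(value: str) -> int:
        # Any stable, well-spread hash works for this illustration.
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        positions = [pos for pos, _ in self.ring]
        idx = bisect_right(positions, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("patient:42"))  # the same key always lands on the same node
```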
If that's the case, I could write a data system that is tuned for that particular purpose. So if some of these ideas date back to 1997 or 1978, why did we not have something like Amazon Dynamo earlier, given that the paper was written in 2007? Well, to answer that we have to go back just a little bit earlier. Let's go back to the fall of 2015, when I took CS165, where we learned to build our own database kernels. Again, not coincidentally, that was taught by Stratos Idreos as well. In fact, I tried to take every single thing that he taught. This course was a little bit more what you would expect: lectures, reading a book, building databases, doing exams, that kind of thing. Without digressing too far, the course was still different from the courses I had taken in the past. We actually built a database kernel, and then at night large data sets would be run against it in an automated way. The next day there would be an updated leaderboard of whose database was cumulatively the fastest. If that plays to your competitive side, then this class was definitely for you. We spent a lot of nights trying to squeeze out a couple of milliseconds over many, many minutes of running on millions and millions of data items.
But returning to the earlier question of why it took so long for these data systems to develop, let's think for a second about these algorithms, when they were invented, and the climate they were invented in. In those earlier days of computer science, our bottleneck was the CPU. But as we've talked about in several episodes this season, Moore's Law is coming to an end. If you remember, Moore's Law says that every 18 months computing power will double, but eventually, and it is happening now, that doubling stops. In other words, the amount of compute we could do was limited back then. It didn't matter that I had bunches and bunches of memory; we were all waiting on the CPU. However, with the exponential growth in CPU power we've seen since then, now the CPU is sitting around waiting for data to move. That is the new bottleneck. So the algorithms we created back then to maximize the use of the processor are not as critical anymore, and the ones that maximize the movement of data through memory are now critical. Remember, in the old days we had tapes and slow disks; then we moved to larger and larger memory, and we got SSDs, or solid-state disks. Now we have new kinds of memory that persist: the data stays stored even when the machine is off and there's no power in it. However, something has exacerbated our problems: the amount of data we're trying to move is growing at an exponential rate. So not only is data movement the problem, we have much, much more of it, and that's creating huge issues. One of the ways to solve this is the kind of data system we're now seeing: systems that can distribute the data over many, many machines and then compute on the data wherever it is. Try not to move it. Instead, we take our computations, break them apart, let them run on all those little pieces separately, and then bring the results back together. That leads us to some other papers, and we'll get to those. But going back to the Dynamo paper: if we were to take the data and divide it up among many machines, we can parallelize the computation, fine, but we create new problems. What happens if one of the machines goes down?
Well, if a machine goes down, now you have a hole in your data; some piece of it is gone. So in order to solve that, as we talked about a little earlier, we will replicate it. We will make copies on some of the other nodes. Okay, but that presents a new problem. What if data goes to one node, gets replicated to two other nodes, and then something happens and the different machines end up with different copies of the data? Something didn't make it somewhere, or something like that. And so now we have a problem with consistency. How do we maintain consistency, not just in versioning objects, but across the copies of the data? One common solution is to take a page out of the book of most SQL-based systems and just assign one server to be the master. It is the source of truth. Writes go to that system, and then it makes copies to other nodes for durability's and consistency's sake, but you can then also read from the other nodes. But now you have a bandwidth problem. You have all these other nodes that you wish you could write to in order to have faster writes, but instead you have a bottleneck and you're gonna have slow write performance. And in fact, that is the Achilles' heel of SQL databases, because of this approach and, of course, the way they internally maintain their structures: writes are a problem.
Now, what if you could give your data to any of the nodes, and that node was then responsible for contacting each of the other nodes that hold a replicated copy and updating them? That's what Dynamo presented, and to do that, they came up with this idea of maintaining a quorum. So you might write data to one node, and you might write data to another node. We solved part of the problem with versioning, and now our consistency problem might be solved, because the nodes will talk to each other and eventually settle on the right value. But if you were to query that piece of data while that's happening, while it's in an inconsistent state, the nodes are going to need to ask each other who has what, and they're going to need to vote. And that's what the quorum is: they'll vote. Usually you keep a number of replicas that can't produce a tied vote; say you have three, you'll ask them what the value is, and if two of the three agree, you have a quorum and that's what you return. So it lets you solve that problem with a technique called eventual consistency, because the data will eventually settle, but in the meantime, we can vote on it. And that is far better than having a single master node that can go down or become a bottleneck for performance.
But again, it depends on what the purpose of the data system is. You can get inconsistent results; for example, you might get a wrong answer if two of the three have stale copies. If you catch it just before the newest write was able to replicate to any of the other nodes, the one node has the newest data, the others won't know about it, and the vote will be wrong. That might not work for certain kinds of data, but it will work for a whole lot of other kinds. For example, you might not be okay with that for a bank balance transaction, but you might be fine with it for a social media post, or for computing some kind of large result where the overall graph is barely going to change because of one little data element. And in order to have this eventual consistency, we have to relax a requirement called the ACID requirement. Now, ACID is actually an acronym. It stands for atomicity, consistency, isolation, and durability.
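Here's a toy sketch in Python of that voting idea, with made-up replica responses. It only shows the "two of three must agree" read path as described here; a real Dynamo-style system also compares versions and performs read repair, which this leaves out.
```python
# A toy read-quorum vote (illustrative only; real systems also track versions
# and repair stale replicas). With N=3 replicas and R=2 required agreements,
# the coordinator returns the value that at least R replicas agree on.
from collections import Counter

def quorum_read(responses, r=2):
    # responses: the values returned by the replicas that answered
    if len(responses) < r:
        raise RuntimeError("not enough replicas answered to form a quorum")
    value, votes = Counter(responses).most_common(1)[0]
    if votes >= r:
        return value
    raise RuntimeError("replicas disagree; reconciliation needed")

# Two of the three replicas agree, so their value wins,
# even though one replica still holds an older value.
print(quorum_read(["dose=20mg", "dose=20mg", "dose=10mg"], r=2))  # -> dose=20mg
```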
So we've homed in on consistency, meaning that the data has to be in a consistent state when a transaction starts and ends, and any time it's asked for the information. In exchange for relaxing that, you get extremely high performance, and in particular very high-performance writes. Would you trade it for that? It depends. If poor write performance would otherwise take the whole data system down, then yes, you would trade it. Now, I will say you don't have to pick one data system and hitch your wagon to one thing. Each of these serves a different purpose. You can have a very write-performant database that is accepting and ingesting all this data, and then you can have other data systems that are transactional, maybe managing your general ledger or doing other things, and you can chain them all together into a big ecosystem. That is largely what modernization is all about.
But Dynamo proposed much more than that. Take the idea we just talked about, partitioning the data across the nodes, and make it tunable. In other words, a data system administrator can decide how many partitions there are; they don't have to match the number of nodes. Each node can then hold some number of partitions. Meaning, if I had 10 machines, or 10 nodes, and I had a hundred partitions, each node could serve 10 partitions. They would replicate amongst each other, but I wouldn't have to move all the data among all the nodes. Say I only wanted two extra copies, so a replication factor, as we call it, of three, meaning there are three copies across the whole cluster. For any one partition, only two other machines would need to have it, and the other seven need no knowledge of it. And that's okay. That reduces the requirement for network traffic and storage, which is a tremendous advantage. It's better than the way many SQL databases do it with sharding, though it's a similar idea: sharding partitions, or chops up, the data and then sends it along. But it has to be a little bit more sophisticated than that.
We talked earlier about this idea of hotspots. You may have a certain number of partitions, and, predictably, the same data will go into the same partition all the time. Remember, though, that certain partitions may get more activity than others because their data may be updated more often, and that's what we called a hotspot. What if we could make that hashing algorithm a little bit more sophisticated? That's the direction Dynamo took: the paper itself used an MD5-based consistent hash, and its descendants use fast hashes like MurmurHash3, to determine a consistent place to put each piece of data, the same partition every time, while also giving you the ability to tune an arbitrary number of partitions and have a little bit of control over all of this. And it has to be very fast, because if you're going to hash every incoming piece of data to know where to put it, it's gotta be quick. So it has to be stable, meaning it always gives you the same answer; it has to be fast; and it has to be tunable. The Dynamo paper, if you haven't realized this already, is the ancestor of a lot of ideas. In fact, this style of fast, stable hashing is now ubiquitous: you'll find MurmurHash variants in Apache Cassandra, in Apache Kafka, and in many other systems. You'll find it all over the place. Okay, how exactly do we get rid of a master node?
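To make the partition math concrete, here's a small Python sketch with made-up numbers: 100 partitions, 10 nodes, and a replication factor of 3. The hash choice and the node-placement rule are simplifications chosen for illustration, not what Dynamo or Cassandra actually do with their rings and tokens.
```python
# Sketch: many partitions spread over fewer nodes, with a replication factor of 3.
# The numbers, node names, and placement rule are invented for illustration.
import hashlib

NUM_PARTITIONS = 100
NODES = [f"node-{i}" for i in range(10)]
RF = 3  # replication factor: three copies of every partition across the cluster

def partition_for(key: str) -> int:
    # A stable hash means the same key always lands in the same partition.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_PARTITIONS

def replicas_for(partition: int) -> list:
    # Take the partition's "home" node plus the next RF-1 nodes; the other
    # seven nodes never need to know this partition exists.
    start = partition % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(RF)]

key = "patient:42"
p = partition_for(key)
print(p, replicas_for(p))  # e.g. 73 ['node-3', 'node-4', 'node-5']
```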
Somebody has to start the replication process. One of the ideas that came out of this paper is the idea of a ring. A ring is where every node is a peer. But if every node is a peer, which one do I write to? Well, theoretically you could write to anybody, and that node will act as the leader of that particular piece of work and then replicate it to the other nodes. But you can't just replicate it to some random nodes. You have to know whom to send it to, because it has to go back to the same replicated copies. You have to know: who is my follower, who is replicating, who's my backup in case I go down? It's easy to figure out who the leader is, because that can be assigned. So that's what we'll do: we'll have some kind of coordinator of this transaction instead of a coordinator of the data. That coordinator has to have a little bit of knowledge of the ring; it has to know where the data lies and who has what. And there are ways to do that. The one thing I do want to say about the coordinator is that it can be ephemeral; some other node can coordinate some other transaction. So if clients are asking for different things, as long as everybody has an idea of how the ring works, that's fine.
And there are ways to do that, maybe with a technology like Apache ZooKeeper, which keeps the metadata of the whole cluster in its own memory. The cluster can ask it, hey, who's the leader of this partition, who has this piece of data, and who holds the replicated copies? Then the nodes can cache that among themselves so they don't have to keep re-asking. Another method is the way Apache Cassandra does it, with something called the gossip protocol. The idea is that when a new node joins the ring, it says, hey everyone, I have these partitions and I'm leading this one. Eventually all that information gets around; everybody gossips and tells each other what they're each doing. You would think that would result in a whole lot of network traffic, but in practice it actually doesn't, and it's quite effective. It also eliminates the need for something like ZooKeeper to store all the metadata.
The really nice thing is that once you know who has what, you can just ask that machine directly. You can go right to the node and say, I know you have this; what's the value? The problem is that you have to know all of this. As a client, you can ask the cluster, what's the value of this data, and get all of that, but now the client has to maintain all that knowledge. It has to say, I know a lot about the cluster; I even know which nodes have which data. That goes along with this territory: you have to have a smarter client. But let's say it isn't possible to have a smarter client, and the client can't have access to that metadata, maybe because it's behind, say, a load balancer that does some kind of round robin and moves you around the whole cluster. You don't know which node you're going to ask; it might not even be the same node next time. In fact, it won't be if you're behind a round robin. Well, that gives us some advantages too. Now the client doesn't have to know anything, and in a design where the coordinator can be anybody, that works. Let's take the scenario, though, of a new member joining the cluster. You have a ring, everyone has their assignments, and everyone probably holds some data that's copied from other nodes. That's how replication works.
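Here's a tiny, made-up illustration in Python of how gossip-style metadata can converge: each node keeps its own view of who owns what, and when two nodes exchange views they keep whichever entry is fresher. Real gossip protocols, like Cassandra's, carry heartbeats, versioned endpoint state, and failure detection; this only shows the merging idea.
```python
# A toy gossip exchange (illustrative only). Each node keeps its own view of
# "who owns which token", tagged with a generation counter; merging two views keeps
# whichever entry is fresher, so repeated exchanges make every node's view converge.

def gossip_merge(view_a: dict, view_b: dict) -> dict:
    # view: {node_name: (generation, token)}
    merged = dict(view_a)
    for node, (gen, token) in view_b.items():
        if node not in merged or gen > merged[node][0]:
            merged[node] = (gen, token)
    return merged

node1_view = {"node-a": (5, 0), "node-b": (3, 42)}
node2_view = {"node-b": (4, 64), "node-c": (1, 128)}  # node-b has since moved to token 64
shared = gossip_merge(node1_view, node2_view)
print(shared)  # {'node-a': (5, 0), 'node-b': (4, 64), 'node-c': (1, 128)}
```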
They'll have some data they're responsible for and then some data they're listening for, which they keep as a second copy; that way, if a node goes down, they can now serve that data. They're just entrusted with a copy; that's replication. But individually, no node has the entire data set. Each has some subset of the data. SQL-style databases have been doing something like that for a long time; that's what we talked about with sharding. Sharding presents its own problems, and we'll get into those one day. But if we add a new node to the system, it has to be assigned some data. The Dynamo paper calls those assignments tokens. So when a new node joins, it's given a token assignment, and then it uses the gossip protocol to work out which partitions it should take. It says, everyone, I have a token assignment, I'm new to the cluster, I'm allowed in here, what do you want me to do? And then they'll repartition the data. One node will say, you can have this partition, and another will say, you can have that partition. Now the cluster is in a state where it's rebalancing: it's moving data around to let that new node take some of the load. When that's all done, the new node is able to serve requests. As that happens, it becomes responsible for some assignments, and it also receives some backup copies of its own that it's responsible for. Other nodes will then release their backups and their responsibility and remove that data, and eventually the cluster will stabilize.
Under certain circumstances, though, say there's a network outage, what might happen? Let's take an example. Suppose a patient's medication comes into one node and is added to the patient record, but then some event prevents it from being updated on the other node. It just doesn't get there. Now one node has the new data and the other node has an old copy, but both pieces of data are still meaningful. The new and the old data are both there, and you can't just supersede one; you don't know, when you're asking, which one is important. I still have to remember these changes, in other words, I need to remember the new change, so that when the system comes back up and gets stable, I can reconcile it with the other system. That's particularly important with medications, because there may be an interaction between medications, some new one that just appeared on the list, while I would happily continue prescribing based on the old list. The system should be able to handle scenarios like that. But I wouldn't necessarily want to override the data in the other system; you can't just take the version with the later timestamp and erase the other one.
Dynamo treats the result of each modification as a new and immutable version of the data. That immutability is key to what Dynamo represents. In other words, you cannot change the state of a particular data item; you cannot alter its value. That's not to say the data can't change; new versions come in all the time. We just don't delete the old ones, we maintain their sequence: this one happened, then this one happened. So when you ask, what's its value, you're really asking, what is the latest current value of this piece of data? It becomes an immutable log. Most of the time, old versions are subsumed automatically by the system; it can figure out which one is the newest through the techniques we talked about. And that makes sense.
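As a rough sketch of that "append, never overwrite" idea, here's a little Python class that stores every write as a new immutable version and answers reads with the latest one. The record and values are invented for illustration; Dynamo's real versioning is richer, attaching vector clocks rather than a simple sequence number, which we get to next.
```python
# Sketch of an append-only, versioned record: each write creates a new immutable
# version, reads return the latest, and the history is kept so divergent branches
# can be reconciled later.
from dataclasses import dataclass, field

@dataclass(frozen=True)      # frozen: a Version can never be mutated after creation
class Version:
    seq: int
    value: str

@dataclass
class VersionedRecord:
    versions: list = field(default_factory=list)

    def put(self, value: str) -> None:
        # Never modify an existing version; append a new one with the next sequence.
        next_seq = self.versions[-1].seq + 1 if self.versions else 1
        self.versions.append(Version(next_seq, value))

    def get(self) -> str:
        # "What is the value?" really means "what is the latest version's value?"
        return self.versions[-1].value

meds = VersionedRecord()
meds.put("lisinopril 10mg")
meds.put("lisinopril 20mg")
print(meds.get())       # -> lisinopril 20mg
print(meds.versions)    # the full history is still there for reconciliation
```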
The system should be responsible for saying which one is the authoritative version. That's easy to do in the happy case. However, if a node goes down in the middle of a write, the write is durable and it's on a node, but that doesn't necessarily mean we'll have a quorum. We talked about that scenario a little earlier. Versions can sometimes diverge. Let's take a different example. Say the same piece of data was altered on two different systems and there was an outage. When things come back, you can't as easily figure out which one is the latest. You may have version A and version B, and you're not sure which one it is. How do we resolve that? How do we resolve which branch is the correct branch, given that each could have multiple writes after that point, and you end up with a whole host of cascading writes that might be completely out of sync?
That's where the other paper we talked about, the time-and-clocks paper, comes into play. The idea was that you could take a clock and use it as a kind of version identifier, not just a timestamp, because a timestamp alone wouldn't make sense, timestamps happen on all systems. Dynamo builds on that idea with what it calls a vector clock, and it's rooted in the Lamport paper we talked about. Now, what's interesting is that, again, that paper did not apply the clock to a database. But Dynamo takes the idea and says, okay, if a client wants to write a particular piece of data, it must first read it, and I must know which vector clock you intended to update. That solves our lineage problem. Now we know whether the two of you were trying to update different vector clocks, or whether you were updating the same vector clock and colliding, and we can resolve that accordingly. We now have context for the write, and that's the key to the solution here. Yes, the downside is that I have to read before every write. But the upside is that we can then enforce consistency.
Now, remember there are many nodes in a cluster, and so a node would have to confer with other nodes to determine what the current state of the system is. Because remember, consistency is eventual; that's what we traded off for the extremely high performance. It's not immediate. That means a piece of data is written to one node and we can return immediately. It's safe, but the system still needs to eventually become consistent: it needs to replicate to its copies, and those need to say, yes, I got it, we're all on the same page, now we're consistent. So the write has to propagate through the cluster. If you then ask a particular node to read a piece of information, it might not have the latest. And again, that's where the quorum comes into play. Very common is a quorum of two: three nodes participate, and you need two of the three in agreement. You'd never want an even number of replicas, because you can get tied votes; a quorum of two on four nodes would be a problem. So the replicas each tell you what they think the data should be, and whichever answer is returned the most wins. We're hoping that two of the three don't have old copies and vote wrong. So you're playing with probabilities a little bit: you're asking, what's the probability, and that's tunable. That's why this can be tuned. So suppose, though, that we want a quorum of two, but one node went down.
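Here's a minimal vector-clock sketch in Python, with made-up node names. It shows the comparison logic the Dynamo approach relies on: either one version descends from the other and can supersede it, or the two are concurrent siblings that must be reconciled. Dynamo's actual clocks are lists of (node, counter) pairs stored with each object and truncated over time; this only demonstrates the idea.
```python
# A minimal vector-clock sketch (illustrative only). Incrementing marks a write
# handled by a node; comparing two clocks tells us whether one version supersedes
# the other or whether they are concurrent siblings.

def increment(clock: dict, node: str) -> dict:
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def descends(a: dict, b: dict) -> bool:
    # True if version a has seen at least everything version b has.
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a: dict, b: dict) -> str:
    if descends(a, b) and descends(b, a):
        return "identical"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "conflict: concurrent siblings, must be reconciled"

v1 = increment({}, "node-a")      # {'node-a': 1}
v2 = increment(v1, "node-a")      # {'node-a': 2}              -- descends from v1
v3 = increment(v1, "node-b")      # {'node-a': 1, 'node-b': 1} -- concurrent with v2
print(compare(v2, v1))            # a supersedes b
print(compare(v2, v3))            # conflict: concurrent siblings, must be reconciled
```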
So we have three nodes and one went down. Now we may not be able to get a quorum: with two voting nodes left, one might give us one answer and the other a different answer, and there's no majority. And in fact, on systems of thousands of machines, dozens of them are down at any given time. That means that even in very simple scenarios, we would just be unavailable; we wouldn't be able to return an answer. The Dynamo paper proposed a concept called a sloppy quorum, where operations are performed on the first healthy nodes in a preference list, so they aren't necessarily the first N nodes that would normally be encountered. How does this work? Okay, bear with me here. Let's say we have nodes A, B, C, and D, and A temporarily goes down or becomes unreachable during a write operation. Another node can hold a copy of the data in its place; let's say that node is D. D is given the request that would have gone to A, and that's called a hinted handoff: the write was handed off to D along with a hint that this data was originally intended for A. So D knows it's holding a piece of data tagged with, this was meant for A. When A comes back online, D can give it the data, saying, hey, I've been holding this for you. That allows the system to be much more resilient, have higher uptime, and not be as susceptible to some of these quorum issues. If the churn of nodes is low, this works great. If the churn in the ring is high, you can imagine there's a lot of this going on, and that causes system instability, maybe even increased latency, because when nodes come back online everyone has to hand their data back.
So in some cases something else has to happen. For example, let's say a node is permanently gone. That actually presents us with a bit of work: we need an efficient way to figure out exactly where two replicas differ so they can be synchronized. To address that, the Dynamo paper proposed an algorithmic approach. Back in 1987, Ralph C. Merkle wrote a paper called "A Digital Signature Based on a Conventional Encryption Function". Now, that was intended for cryptography. But what if we could use it to detect changes, applying an old idea, again, to something brand new? We've known for a long, long time that we can hash values and use the hashes to detect changes. And so Merkle's algorithm was applied in a new way to create a Merkle tree. Here's how the tree works. The reason we use the word tree in computer science is that it has a specific meaning. Think of every little point of data as a node; a branch is a node that points to another node, which then points to something else. Branches continue until they get to leaves, and a leaf is the last piece of data. So now we have some context for what we're talking about when we say trees. Suppose we take a piece of data and hash it, producing a hash value. That value becomes a leaf of a Merkle tree. The non-leaf nodes, the branches, are the nodes that have children, and each of those is hashed from the values of its children. We continue doing that up the entire tree, so every node has a hash value made up from the hashes underneath it. We can easily use that to compare two trees and see whether the data on two nodes differs at all, without having to walk down the trees.
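Here's a small Python sketch of that Merkle idea, using invented key-value pairs and SHA-256 (the particular hash doesn't matter). If the root hashes of two replicas match, their data matches; if not, comparing child hashes level by level narrows down which keys differ, instead of shipping all the data across the network.
```python
# A tiny Merkle-tree sketch (illustrative only). Leaves are hashes of the data items;
# each parent is the hash of its children's hashes; the root summarizes everything.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(items: list) -> bytes:
    level = [h(item) for item in items]              # leaf hashes
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])                  # pad an odd-sized level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

replica_1 = [b"k1=v1", b"k2=v2", b"k3=v3", b"k4=v4"]
replica_2 = [b"k1=v1", b"k2=v2", b"k3=CHANGED", b"k4=v4"]
print(merkle_root(replica_1) == merkle_root(replica_2))  # False -> replicas out of sync
```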
Another big idea is the Bloom filter, but I'm going to pause and talk about that some other time; in fact, that one's probably better shown with a video, so don't worry about it right now. But I hope you're starting to see why I thought this paper was so big. It took all these really interesting and novel ideas that had been applied to different things and put them into a new context that let us do something really amazing. It has a lot of really big ideas, a lot of innovations, but in itself, creating a database that thinks this way, that embraces eventual consistency, is a big breakthrough. It's the new kind of data system that we've been waiting for, for certain kinds of scenarios. And as I mentioned, it's ancestral: it inspired Apache Cassandra to exist. Cassandra would not have existed without it; it was, in large part, an open-source implementation of the Dynamo paper's ideas. You can see Bloom filters, Merkle trees, and partitioning inside things like Apache Cassandra and Apache Kafka. You'll find these concepts reused over and over again, in Spark and in nearly every big data system there is. And if you look at the list of references in this paper, we talked about just a few, but there are some really big ones. For example, it references the other paper that I think changed the world, Google's Bigtable paper. That paper is worth discussing too, but again, that's for another day.
This paper, the Dynamo paper, changed my life. I wasn't exactly sure what I was going to do coming into this. I did know in grad school that I wanted to squeeze out every bit of knowledge, and I wanted experience in AI and big data systems at a deep, intimate level. This paper did that. In fact, it inspired my thesis. The line of reasoning was: there are a lot of knobs to tune, and we talked about some of them. What if AI, a big neural network, could tune those knobs? It could do a way better job than a human being could. And in fact, that's what I showed in my thesis: that a neural network can be ten times faster than a human's best possible tuning. We could even get underneath the covers and adjust some of these algorithms, although I applied it to Postgres. This paper helped me decide what I was going to do, and it helped me realize where my passion lay in big data systems. I knew I wanted to use AI. I knew I loved big data systems. But in reality, these systems are used for something, and so a lot of the work I do in healthcare is based on these ideas. We don't always build databases, although sometimes we do, but we do look at particular scenarios like this, and it helps you understand how to put together solutions.
So, I hope this paper was interesting to you, that you'll dive in and read it, maybe even read some of the papers it references, and that it gives you a little bit of insight into how to put together solutions that look at the world a little bit differently. Dynamo is not the solution for every problem, but no database is. I'm your host, Angelo Kastroulis, and this has been Counting Sand. Before you go, please take a moment to follow us and subscribe to the podcast. You can find us on LinkedIn, Angelok1, or you can follow us on Twitter @AngeloKastr, or you can follow my company @BallistaGroup.