Counting Sand

The End of Moore's Law Part 2

Episode Summary

Angelo is excited to welcome back his friend and colleague, Manos Athanassoulis. Manos is a professor at Boston University and a data systems researcher. In this episode of Counting Sand, Angelo and Manos dive deep into what a database actually does, what is at the core of a data system, and, most importantly, how we can use new and old techniques to lighten the CPU's load through algorithmic trickery.

Episode Notes

The last time we had Manos on the program, we talked about Moore's Law coming to an end. It's important to note that we can't rely on sheer computing power doubling to meet our ever-increasing demand for data. We must find new ways to collect and compute over large amounts of data. In this episode of Counting Sand, we dive deep into what a database actually does, what is at the core of a data system, and, most importantly, how we can use new techniques to lighten the CPU's load through algorithmic trickery.


In a time crunch? Check out the time stamps below:

[00:53] - Guest Intro 

[01:30] - Intro to data systems

[03:00] - Hardware types

[05:00] - Why is it important to choose the right format

[10:15] - What is column storage, and what are the benefits

[16:30] - Injecting data into the CPU's cache, and the memory hierarchy

[20:00] - Why not just duplicate data

[22:55] - ACID properties


Notable references:

Relational Memory: Native In-Memory Accesses on Rows and Columns


Our Team:

Host: Angelo Kastroulis

Executive Producer: Náture Kastroulis

Producer: Albert Perrotta

Communications Strategist: Albert Perrotta

Audio Engineer: Ryan Thompson

Music: All Things Grow by Oliver Worth

Episode Transcription

Angelo: We've talked about Moore's Law coming to an end in several episodes. What we mean by that is that we can't rely on sheer computing power doubling to meet our demand. So given that processing power is not something we can count on doubling indefinitely, we have to be more careful and more efficient in how we use the CPU and in what we ask it to compute, and we have to find some interesting ways to do things. In this episode we're going to dive a little deeper into what a database actually does, what is at the core of a data system, and how we can use some techniques to free up some of the CPU's load through a bit of algorithmic trickery. I'm your host, Angelo Kastroulis, and this is Counting Sand.

Okay, I am so excited to have my friend and colleague Manos Athanassoulis back again, professor at Boston University and data systems researcher. This time we're going to talk about something a little bit different: the way that we access and represent data, especially in terms of how Moore's Law is changing and coming to an end, which means we have to find other interesting ways to improve performance. Before we get into this, it might be nice to give a quick introduction to the concepts we're going to discuss for the audience.

One big thing that a lot of people might not know about data systems is that there are two main schools of thought on how you store and arrange data. One way is to think about data as a row; in other words, like a transaction, where all the atomic elements of one transaction can be thought of as a row. If you store them together, sequentially, we call that a rowstore. But if you think about it differently and store each column of data together, we call that a columnstore. Those two, the vertical (the columns) and the horizontal (the rows), have different characteristics. If you're going to fetch an entire row, a rowstore is efficient. If you're going to scan many rows, say you want to find transactions within a certain date range, you can scan just that one column, and a columnstore is very efficient there. The two lend themselves to two kinds of workloads, so let me just define the terminology here: an analytic workload, where you're trying to find data within a big set, lends itself to the columnar way of storing, and columnar systems are things like Apache Druid and SAP HANA. If you're going to do transactional workloads, a particular transaction at a time, that's a rowstore.
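To make the two layouts concrete, here is a minimal C sketch, assuming a hypothetical three-column transaction table of 64-bit values (the names and sizes are made up for illustration, not taken from any system discussed):

    #include <stdint.h>

    #define N 1000000

    /* Rowstore: all fields of one transaction sit adjacent in memory. */
    struct Txn { int64_t id; int64_t date; int64_t amount; };
    struct Txn rows[N];

    /* Columnstore: each field lives in its own contiguous array. */
    int64_t ids[N];
    int64_t dates[N];
    int64_t amounts[N];

    /* Scanning one column touches only that column's bytes... */
    int64_t sum_amounts_columnar(void) {
        int64_t sum = 0;
        for (int i = 0; i < N; i++)
            sum += amounts[i];        /* sequential, cache-friendly */
        return sum;
    }

    /* ...while the rowstore drags id and date through the cache too. */
    int64_t sum_amounts_rowwise(void) {
        int64_t sum = 0;
        for (int i = 0; i < N; i++)
            sum += rows[i].amount;    /* two thirds of each row is wasted */
        return sum;
    }

The data is identical in both cases; only its arrangement in memory changes, and with it, which access patterns are cheap.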
We're also going to talk about some key technologies that you may or may not be familiar with. An SSD is a solid-state drive, meaning it's like a hard drive except it doesn't have spinning platters; it has chips on it, like memory chips, except the data doesn't go away when it loses power. An FPGA is a field-programmable gate array. On a CPU, the transistors are arranged a certain way (we call that the die), and they stay that way because that's how the chip was manufactured; you can't change the CPU, it does what its function is. FPGAs, on the other hand, are programmable, so you can deploy this little chip wherever you want and then put code on it to change it. Some of them you can reprogram 10,000 times, and then that's it; after that, the device is useless. We're also going to talk about main memory, and we're going to talk about cache.

Main memory is the RAM inside a computer. It's volatile, meaning that when you turn the computer off, the RAM is erased, and when it starts back up again, it has to be reconstructed and rebuilt before you can start again, whereas a disk persists. Cache is a kind of memory that sits very close to the CPU, so the access time is on the order of nanoseconds; it takes very little time for the CPU to access that memory. If you remember, in season one we talked about all of this with an analogy: cache is like finding something on your desk, main memory is like finding something in your neighborhood or your city, and disk is like going to Pluto. It's those kinds of orders of magnitude in latency when you're looking for things. So the more you can take advantage of cache, the better. The problem is that it can't be infinitely large; it has to fit near the chip, so it has to be very small. So now you have a trade-off: how big do you make it? The bigger you make it, the slower it gets, so you're trying to find the right balance. I wanted to set all that up so that everybody knows what we mean when we talk about these things. So, Manos, tell me a little bit more about the characteristics of rows versus columns that got your attention in research, and why it is important for us to choose the right format.

Manos: So, first of all, I'm very happy to be here again. Thanks for inviting me. I'll dive right into it because I think it's a very interesting topic. This question has been at the forefront of data management research for at least the last 10 or 12 years, and people have been doing research on columns versus rows since the eighties. However, it's still relevant, because many of the workloads we're executing have this dual nature, and more and more, this dual nature exists at the same time, all of the time. Right? On the one hand, we're collecting lots of data: measurements, information about sales, information about monitoring activity; whatever data we're collecting, we're collecting lots of it at any point in time. And the natural way of collecting data items is to put together all the conceptual pieces of each data item, and that's essentially what a row is. A row can be, for example, all the metadata I captured from a group of sensors at a specific timestamp, or all the metadata of a monetary transaction you made online; I want to store all these metadata together. But then again, every now and then I might want to analyze my overall data collection to figure out some trends. Maybe I want to find, for example, how the temperature changes across seasons for a specific sensor, or how my sales change as I move across different products or different locations, and so on. The first one is what you were calling transactional, and the second is what you were calling analytical, and they typically favor different ways of storing the data, the rowstores and columnstores you described, accordingly. But this also means a different architecture for the system. To make a long story short, when we are collecting these measurements one at a time, the easy thing to do is to simply append the whole measurement every time we get one, right? But if we want to, say, find the average temperature of a specific sensor, we are interested only in the temperature information.
And we don't want to be reading more data than that when we're asking this query. And why would we end up reading more data? Well, when we're reading data in any sort of system, we're actually reading a block of data. This block of data can be a disk page if we're reading from disk, a memory page if we're reading from memory, or a cache line if we're reading from the cache. Either way, we have to read the entire block, even if we only need four bytes from it. So if we co-locate the useful data, then everything we read is going to be useful. If we do not, we're going to read whole blocks only to use a small subset of them. And that is essentially the benefit of columnstores. If I know that I'm going to access only a single column for my query, or maybe two columns, I want to be able to access those two columns out of the, let's say, 20 columns my table has. Right? So I can access my first column, apply my predicate to find the rows I'm interested in, and then, using those values, access the second column to calculate some statistics. Right? So that's fine; now, what's the problem? What's the drawback of columnstores? The drawback is that we have to break the data down into the different columns when we're inserting new data items. So if we want, at the same time, to be able to insert new data into our system and have it readily available for efficient columnar access, reading only the useful data for all these analytical queries, that is not an easy task. There has been a lot of research on this over the last several years, and you can break it down into a few schools of thought. You might say: I'm going to start from a rowstore, and I'm going to have a columnstore accelerator, which is like an in-memory index. Or: I'm going to have a pure columnstore, and then I have to pay the cost of ingestion, and we optimize that, for example, by having a so-called write store that receives all the new data in rowstore format; every now and then we take a batch of these rows to be more efficient, break them down into columns, and load them into the main system that has the data efficiently laid out in columns. Right? And then, going back maybe 20 years, there is an approach called fractured mirrors, which proposed keeping two copies of the data, one rowstore copy and one columnstore copy, and maintaining both: the transactional data comes into the rowstore, and there is a constant flow into the columnstore under the hood. Then you can have an almost completely fresh version of your data efficiently laid out at any point in time, but you have to physically store two copies of the data, and you have two systems to maintain. Now, what's the problem with having two copies of the data? Well, obviously you have to spend the time and the money to store them, which essentially doubles your size. And now that we are moving more and more into the cloud, everything translates directly into renting more disks or more storage. On the other hand, you have code complexity: you need to maintain two systems, and you have to resolve all sorts of problems that either of the two might face.
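A back-of-the-envelope sketch of the block-read argument Manos makes above, with hypothetical numbers (a million rows, 20 columns of 8 bytes each, and a query touching 2 columns):

    #include <stdio.h>

    int main(void) {
        const long nrows = 1000000;
        const int ncols = 20, col_width = 8;   /* bytes per value */

        /* Rowstore: each row is 160 bytes, and block-granular reads drag
           the whole row through the memory hierarchy even for 2 columns. */
        long row_bytes = nrows * (long)ncols * col_width;

        /* Columnstore: the query reads only the two columns it needs. */
        long col_bytes = nrows * 2L * col_width;

        printf("rowstore reads:    %ld MB\n", row_bytes >> 20);
        printf("columnstore reads: %ld MB\n", col_bytes >> 20);
        /* roughly 152 MB versus 15 MB: that 10x gap is the whole pitch */
        return 0;
    }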
Angelo: One of the things that you mentioned was the problem with columnstores. They're fantastically good at evaluating predicates, the 'where' part of a query, especially if you want analytic capability. If the data is sorted, I can quickly find what I want and start scanning. But what if I have to insert a new row in the middle? I have to find where it belongs, move everything out of the way, create a place for it, and put it in, and not just in one place, but in every column of that row, which is extremely expensive. Rowstores have their own problem too, right? If I'm going to scan, like you said, I'm going to throw away most of each row, because I can only read a certain amount in a chunk, and if I'm only looking at one small piece and throwing the rest away, I'm spending all that time moving data into memory just to ignore it. So now the system designer is faced with a problem. There's no perfect solution here; I have to make a decision: do I design for one kind of workload or the other? And I liked what you were talking about, because the natural way we start thinking about it is to say, well, maybe we'll build a system that stores the data both ways and decides during the workload which to use. But that creates a bunch of problems, like you were just saying: you now have two systems to maintain. What other problems do you see with this idea? Because people might say storage is cheap. Not when we're talking about petabytes, it's not.
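The write-path asymmetry Angelo describes can be sketched in a few lines of C (illustrative assumptions: a columnstore kept sorted on one key, no write store or batching, bounds checks omitted):

    #include <string.h>
    #include <stdint.h>
    #include <stddef.h>

    #define CAP 1000000

    static int64_t col_a[CAP], col_b[CAP], col_c[CAP];
    static size_t n;

    /* O(n) per column: find the slot, then shift everything after it. */
    void columnstore_insert_sorted(int64_t a, int64_t b, int64_t c) {
        size_t pos = 0;
        while (pos < n && col_a[pos] < a) pos++;   /* sort key: col_a */
        memmove(&col_a[pos + 1], &col_a[pos], (n - pos) * sizeof(int64_t));
        memmove(&col_b[pos + 1], &col_b[pos], (n - pos) * sizeof(int64_t));
        memmove(&col_c[pos + 1], &col_c[pos], (n - pos) * sizeof(int64_t));
        col_a[pos] = a; col_b[pos] = b; col_c[pos] = c;
        n++;
    }

    /* O(1): the rowstore just appends the whole row at the tail. */
    struct Row { int64_t a, b, c; };
    static struct Row log_rows[CAP];
    static size_t nrows;

    void rowstore_append(struct Row r) { log_rows[nrows++] = r; }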
Manos: You're right. But don't get me wrong: there is a wealth of wonderful solutions out there that are actually very good, trying to bridge the gap between the two in various ways; too many to list all of them. For example, you might say: I'm going to start from one extreme, either a rowstore or a columnstore, and then I'm going to create fragments of the data that are exactly in the layout I want them to be. This is great, because if you have similar queries, the next piece of data they're interested in is readily available. Or maybe you say: I'm going to start from a rowstore, but I'm going to maintain an accelerator that has everything as columns, and I don't care about its durability; if my system goes down, I will simply reload it, so I don't need to store it durably, but my queries are very fast when I'm operating normally. Right? And then, of course, there is a lot of effort in columnstore database systems, even cloud-based data management offerings, that store pure columns in the cloud but have ingestion mechanisms for taking in rows, buffering them locally, and splitting them into columns along the way. Some of the problems you might face: if you keep fragments of data in the layout you want, now you have to maintain those fragments as well. You probably have to maintain your traditional buffer pool for your transactional workload, but then you also have to maintain another memory space with all these fragments, these columns or groups of columns, that might be tailored to the specific queries you're receiving, and you have to search through them to find the one you're looking for.

Again, all of this is not to say that these approaches are not possible or not working; they actually work. What we saw was an opportunity to try to simplify the architecture of these systems. That's what we wanted to do. Even during my postdoc years at Harvard, I was thinking of this magic memory, this magic device that could give you rows or give you columns from the hardware, from the device itself. Right? So you don't need to physically decide whether to store the data as rows or as columns; you store it one way and always get whatever you want. And I was thinking, okay, that's too good to be true, but still, if we had it, it would be amazing, because then we could have every query ask for exactly the columns it wants. If a query needs to access three columns, say columns one, seven, and eleven of the table, it can fetch those three columns and nothing else through the memory hierarchy. Right? If we have this hardware, then we can build simpler data systems, because the data system does not need to worry about what the layout is; it will always have exactly the layout it needs and nothing else. Right? At the time, I was thinking this might be something interesting to consider in the context of smart SSDs, smart solid-state disks, because for the last seven to ten years there have been a lot of SSD devices equipped with logic inside the device, often a small FPGA on which you can implement whatever you want. It turns out that I was discussing this idea with a colleague at BU who works on real-time systems and has a lot of expertise in building new hardware with FPGAs. He was also working on an idea of using an FPGA, located between main memory and the CPU, to make the CPU believe that a specific memory address is or is not available. Starting from that, we developed a new idea. What if I have my data in memory as a rowstore, and then I create another variable, not the base variable of your two-dimensional array, which is your relational data? I create another variable which is fake: it does not point to any real memory address. Right? But the moment someone tries to access anything through this variable, it uses the technology they were already developing to go through the FPGA and read and reshape a specific subset of the data. We call this an ephemeral variable, from the Greek word for something short-lived, something that can be forgotten in the near future. What this ephemeral variable does is read a specific subset of the overall array, of the overall table. The moment the CPU, say a line of C code, tries to access, let's call it, ephemeral variable E, right, E at location five, the access goes through the FPGA to fetch the specific parts of the row we're interested in. And by specific parts I mean the following: when we create the ephemeral variable, we say this is an ephemeral variable pointing to a view of the relational table containing only, for example, three columns: columns one, seven, and eleven. So the moment I access E at location five, I fetch a mini-row of those three columns, one, seven, and eleven, and it propagates through the memory hierarchy as if it existed in main memory, even though it never actually existed in main memory.
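Here is a software caricature of the ephemeral-variable idea, for intuition only; in the actual prototype the gather happens in hardware, between the memory controller and the cache, and the types and names below are assumptions, not the real interface:

    #include <stdint.h>
    #include <stddef.h>

    #define NCOLS 20

    typedef struct {
        const int64_t *base;   /* row-major base table: nrows x NCOLS  */
        size_t cols[3];        /* which columns this view projects     */
    } EphemeralView;

    /* Simulates the access E[i]: gather a mini-row of three columns.
       In hardware this happens on the way from memory to the cache. */
    void ephemeral_get(const EphemeralView *e, size_t i, int64_t out[3]) {
        for (int k = 0; k < 3; k++)
            out[k] = e->base[i * NCOLS + e->cols[k]];
    }

    /* Usage: EphemeralView E = { table, {1, 7, 11} };
              int64_t mini[3]; ephemeral_get(&E, 5, mini);  // "E of five" */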
So the FPGA sits between the main memory controller and the CPU. It interacts with the main memory controller to fetch the corresponding data, reorganizes it, and forwards it into the cache hierarchy before it ever reaches the CPU. One might think this has to be terribly inefficient, because you have an actual FPGA doing actual reading, and some form of selecting data on the fly, right in the critical path. It turns out that with very diligent optimization of the way we developed the FPGA hardware, this can happen really, really efficiently, showing essentially zero penalty compared with the memory access the CPU would see if it were accessing the corresponding data directly. In fact, it's not exactly zero: there is a small penalty when our ephemeral variable reads only one or two columns, but the moment it reads three or more columns, on our specific platform, we are better than the columnstore design. Not better than the rowstore, which was the easy part; better than the columnstore design.

Angelo: That's amazing. So I want to make sure we understand the key concept here: the FPGA rearranges the data and injects it directly into the cache, so you don't pay the penalty of moving it through the memory hierarchy; it's already in the lower levels of the hierarchy. And to the CPU, and I guess also to the algorithms that run on it, this is transparent; they think the data is stored that way even though it isn't really stored that way on disk. Is that right?

Manos: So in main memory and on disk the data is written as rows, and what we are doing is giving the algorithms running on the CPU either one column or any group of columns we're interested in, a different subset of the data, as if that were the way we stored the data on disk. To draw a parallel: I mentioned before that many systems create fragments of the data in memory in the optimal layout. Essentially, we are creating those fragments on the fly, without ever materializing them in memory. The CPU can consume them, yet they never existed. Why is this interesting? Because we still maintain our base storage, which is a rowstore, and it can take all the new rows appended at the end without any penalty for chopping a row down into multiple columns, without any of those more expensive operations, and without needing any special code to do a write-to-read transition later on. We do this on the fly using the FPGA. Now, I want to make two disclaimers. This is a prototype. It works for hand-coded queries; we have a synthetic benchmark, but also some of the TPC-H queries, which we have started running. We are using this technology to also answer multi-table queries, not only selection queries, but we don't implement anything in the hardware other than creating the optimal layout. Still, by having this projection operator implemented in hardware, doing this data transformation, we can allow our systems to always have access to the optimal query layout without maintaining multiple copies of the data and without a software process for transforming row-oriented data into groups of columns or single columns.
Angelo: The thing that makes that really interesting is that if we were to say, well, why not just duplicate the data? You mentioned it: we've got all these other auxiliary structures, and that makes it very hard to maintain properties like ACID, because I have a lot of things to keep in sync and update whenever you do a write. That is what makes a lot of the big data systems like Cassandra and Dynamo so attractive: they can take very, very high write rates because of the way they carefully co-locate and spread the data, these LSM-trees, the way they carefully package things together so we can scan while buffering at the same time. If we didn't have to do any of that, this kind of gives you the best of both worlds, because we could create a very smart system. I remember the H2O paper that I think Stratos [Idreos] and some others wrote. That paper is very interesting because it presents the idea of: what if we could dynamically start to understand what I'm being asked, and then in idle time rearrange the data to fit, so that over time the data is constantly being rearranged? But I still pay the penalty of rearranging. This is saying: what if I didn't have to rearrange it at all? What if I just made you think it was in the format you wanted?

Manos: That's exactly what it is, and H2O, it's very interesting that you mention it, is essentially one approach that says: let's try to create these optimal layouts, keep a cache of them, use them as much as possible, and maybe do that under the hood. And if we have that, then we can do code generation, and we need to change the rest of the system accordingly when we're answering queries, which is fantastic. What we want to say now is: let's assume all this optimal-layout business is for free; you still need to change the rest of the system. That is actually our next phase. If we have hardware that can offer the optimal layout at any point in time, how should we change the rest of the database system? Specifically, how should we change query optimization? That is the key question. Physical design, I think, is going to be simplified, because you won't need to think so much about physical design anymore; your smart hardware will give you any physical fragment of the data you want. But query optimization becomes more interesting, because you have more options available to you. Right? In query optimization, what we typically try to do is see what options we have and then make sure we find the fastest one to answer our query. So we're actually going to turn the question around: not "let's use the fastest plan I have," but "let's make sure I run the fastest plan I could ever possibly get." Now, you also mentioned ACID. Clearly, the ACID properties are very important for any database system, and you might have questions here as well. Right? One can say: okay, you are creating, on the fly, a fragment of the data that exists only in the CPU and the cache memories at the time the CPU is consuming it. How do we know this is the correct data we should be reading if, at the same time, I'm also writing to the base data of the table? Right? We cannot do this with classical locks and a simple classical approach, so we are essentially using a variation of multi-version concurrency control. Every row has two timestamps, the beginning and the end of its validity, and every query also has a timestamp. If the query's timestamp falls between the validity timestamps of the row, then the row has to be part of the query; otherwise, it cannot be part of the query. So this is what we're using.
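A minimal sketch of that visibility rule, as a generic form of multi-version concurrency control; the struct layout and names here are assumptions, not the prototype's actual metadata format:

    #include <stdint.h>
    #include <stdbool.h>

    typedef struct {
        uint64_t begin_ts;   /* when this row version became valid          */
        uint64_t end_ts;     /* when it stopped being valid (UINT64_MAX if
                                still current)                              */
        /* ... the actual column data follows ... */
    } RowVersion;

    /* A query with timestamp query_ts sees the row only if that timestamp
       falls inside the row's validity window. */
    bool visible_to(const RowVersion *row, uint64_t query_ts) {
        return row->begin_ts <= query_ts && query_ts < row->end_ts;
    }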
The beauty of it is that our hardware that does the projection can also manipulate these metadata as two extra columns, so they can be handled by the same hardware. And we are also exploring moving at least some part of selection physically into the FPGA as well. So imagine that you not only get the optimal layout, you can also offload your selection to the FPGA. Now, again, this is not a new idea; offloading selections to FPGAs has been happening for many years, and there are a lot of different efforts across the globe working on it. Notably, there is a group at ETH putting a lot of effort into this realm. Essentially, it would complement our current hardware and push even less data through the memory hierarchy towards the CPU, making it even more appealing going forward.

Angelo: That's interesting. So, for our listeners, you mentioned projection, which is what you're implementing. Projection is the part of the query that returns the columns you want; that's the SELECT part of SQL. I was going to ask, and you answered it already, whether you can do the selection. That's the predicate part, the 'where' part of the query, where you're trying to reduce the rows, essentially selecting them. Some data systems call that filtering, but in data systems speak, that's what we mean when we say selection. Now, you could use columnstore algorithms on rowstores, and they would think it's a columnstore, if you could implement the selection.

Manos: You can actually do this already, even without the selection, right? Because you can say: you know what, I'm going to ask for column number one from my table, and then I can use this column to do my column-at-a-time style processing, and then I'll use another ephemeral variable and request column number three. So you can do this with our hardware. Essentially, our hardware allows you to work with mini-rows, with groups of columns, and with entire columns; any of the three is possible. So you can do column-at-a-time, you can do row-at-a-time, and you can do many rows at a time, or of course block-at-a-time or vector-at-a-time if you implement that on top of our hardware. This is not done yet in our prototype, but it can definitely be done in the future.
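Putting the two operators together in the same software-only, illustrative style as before (hypothetical names and layout): selection filters rows with a predicate, while projection keeps only the wanted columns.

    #include <stdint.h>
    #include <stddef.h>

    #define NCOLS 20

    /* Scan a row-major table; keep columns 1 and 7 (projection) for rows
       where column 1 exceeds a threshold (selection). Returns hit count. */
    size_t select_and_project(const int64_t *table, size_t nrows,
                              int64_t threshold, int64_t (*out)[2]) {
        size_t hits = 0;
        for (size_t i = 0; i < nrows; i++) {
            int64_t c1 = table[i * NCOLS + 1];   /* read column 1       */
            if (c1 > threshold) {                /* selection (WHERE)   */
                out[hits][0] = c1;               /* projection (SELECT) */
                out[hits][1] = table[i * NCOLS + 7];
                hits++;
            }
        }
        return hits;
    }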
Angelo: Thinking about where data systems have gone: we have OLTP, which is transaction processing, and we have OLAP, which is analytics, and there's this in-between thing we're talking about, all these hybrid approaches. I remember talking, even back when I was at Harvard all those years ago, about how there could be a time when all of this coalesces. There won't be SQL and NoSQL; there will be more and more in common, the Venn diagram will overlap more and more, and you are indeed seeing NoSQL databases that have ACID properties. So now, all of a sudden, we can start combining these things. And I think this kind of technology, this idea of using hardware to make the penalty we pay for memory low so that we can maximize what we're using, matters. Again, now that Moore's Law is coming to an end, we don't need more and more CPU; the bottleneck is memory movement. Well, what if we can eliminate that a bit?

Manos: Now, in the beginning, of course, we didn't start with specialized hardware; we started by building so-called multi-core processors, which everybody now has, even in their phones. Today even phones have something like eight cores. Why does a phone have eight cores? Not because we necessarily prefer to write programs that use multiple threads, right? That is actually more difficult than writing a single monolithic program. However, the single monolithic program will not get faster anymore with a better CPU. It used to get faster with faster CPUs; it doesn't anymore. So now we have more CPUs, or more cores, to say it better, and we have this parallelism, and over the last maybe 15 or 17 years almost every software system has been rewritten or tailored for increasing parallelism. The other part of the end of Moore's Law is that we are not getting faster CPUs anymore. So if I build specialized hardware that accelerates something, that's probably very interesting, because I'm not going to get those benefits any other way, unless I can parallelize things. And by the way, you would normally need to upgrade your servers every few years or every few months; the cloud essentially does that for you, and it can do it because it has thousands or millions of customers at the same time, so it can afford to. And then you can always say: you know what, now I want to take advantage of your FPGA; I can simply go and turn it on in the cloud. That's why cloud offerings, under the hood, have started using more and more hardware acceleration techniques. Not always, but it is definitely possible. One more interesting facet of Moore's Law is that from 2004 onwards, we could still get exponential improvement in performance, at a lower rate, and that was coming from the early exploitation of parallelism. In the beginning it's easy: I'm going to break one task into two parallel tasks, that's fine; I'm going to break two tasks into four tasks, that's fine. Once you have a million tasks and you have to organize them and synchronize them, that's very hard. Right? That's why it was easy to get a benefit in the beginning. There was also a lot of painful effort, one might say, from computer architects to optimize the processor, and they optimized it by exploiting parallelism: every step of the pipeline runs in parallel with the other steps, so even though you're executing one instruction, you have 30 instructions in flight. They built more optimized and deeper pipelines, more optimized processors, and bigger cache memories, so accessing data and code is more efficient, and all these optimizations helped maintain the exponential increase in performance for the last 15 years. It's not clear that we can keep doing that forever. So every year we need to bring new ideas about how to gain more performance, not through better or faster hardware, but through better design. And that's where our approach and our effort come into play with Moore's Law: we're trying to say, given that we cannot have a faster CPU, let's gain performance by taking the silicon and using it closer to the data, to do something smart over there. And our ultimate goal is not to have an FPGA between memory and CPU; rather, it is to put all this functionality into the memory controller itself.
So in the future, we hope to have memory controllers that essentially allow you to query specific columns from a row-oriented layout. That's the dream.

Angelo: So, thinking about this for a second, it makes a lot of sense, right? You can only make things so small; we've gotten down to two-nanometer, even one-nanometer processes, and you're not going to get to zero. The clock speed of the computer has gotten about as fast as it's going to get. Parallelism: now we're putting more and more cores on there. All of this makes sense. And you're starting to see exactly what you're talking about. You see CPUs that have tensor processing units on them; the new generation of Apple chips has machine learning hardware right on the chip. So you see that now, where we're putting these specialized bits of technology in there, and when you think about it, software is just a different manifestation of the hardware. Hardware is very general purpose, and the software is what makes it special purpose. If you push it onto the silicon, it's just going to be faster, because it runs right next to the CPU. So this makes complete sense. And I like the idea of the FPGA as the testing ground for new ideas. This is what we can do in the next stages of classical computing: continue to take ideas like this and program them on FPGAs, and then, once you prove them out, it makes sense to invest and put them on the processor. Because then we can have data systems that are designed for this kind of thing. And it matters: I think some of the numbers were like 1.8x or 1.6x performance, and even just a 10% improvement in performance, at the scale we're talking about, means so much energy and so much computing cost.

Manos: I've done the math with an example. I don't have it in front of me, but you can easily save millions of dollars with a 1% benefit, if you have many instances; that's the crucial part. And why do you save? Because in the cloud you don't pay for what you don't use. So that's why it makes a lot of sense to tailor things that way.

Angelo: All of that also flows back to the engineer who's thinking about new ideas and implementing them, and it has impacts on energy and all those kinds of things you don't even think about. It comes down to the algorithm. The algorithm is the most powerful part of this whole thing, because it can make change exponentially across the board. So I think that's an interesting and very fun part of this research. Manos, thank you again for being on the show.

Manos: Thank you for inviting me. I have been very happy to talk about our work on what we call relational memory. I didn't talk about the name: it comes from the fact that it allows you to do any relational access, columnstore or rowstore, right? And I was also happy to bring everybody up to speed on what we've been doing and what motivated our work.

Angelo: Great. Thank you. Really, it's always a pleasure to talk and to have you on, so I want to thank you for that. And for our listeners, hopefully we've piqued your interest, and maybe even demystified a little bit of what data systems do under the covers, and hopefully this will lead you into some research of your own. I'm your host, Angelo Kastroulis, and this has been Counting Sand. Before you go, please take a moment to follow us and subscribe to the podcast.
You can find us on LinkedIn, Angelok1, or you can follow us on Twitter @AngeloKastr or you can follow my company @BallistaGroup.