Counting Sand

Energy, Edge Computing, and Data Centers

Episode Summary

The world is consuming information at a voracious rate. In fact, we created new terminology to describe this phenomenon: big data. It can be characterized by four V's. Volume is the size of the data. Velocity is how quickly the data is sent our way. Veracity is how truthful the data is. And, of course, variety, as the information itself comes in all shapes and sizes. Once all this data is collected, we face real problems concerning infrastructure and computation. In this episode of Counting Sand, Angelo is joined by colleagues from across the world to talk about the impact and opportunity of big data ecosystems.

Episode Notes

What if there were a way to reduce the amount of energy consumed, and heat produced, by servers around the world? Would these new methods positively or negatively impact the environmental footprint of today's big data ecosystems?

 

In a time crunch? Check out the time stamps below:

[02:15] - Research Paper 

[05:55] - Power consumption of data centers and methods to save energy 

[08:50] - Server cooling methods 

[12:00] - Energy consumed by data transport 

[13:55] - The impact of location and climate on venting and cooling 

[15:38] - Edge devices and cloud computing 

[20:47] - Cost and energy optimization

[21:45] - Machine learning and AI for predictive maintenance

[24:45] - The automobile as a processing unit, and big data

 

Our Team:

Host: Angelo Kastroulis

Executive Producer: Náture Kastroulis

Producer: Albert Perrotta

Communications Strategist: Albert Perrotta

Audio Engineer: Ryan Thompson

Music: All Things Grow by Oliver Worth

Episode Transcription

Angelo: The world is consuming information at a voracious rate. In fact, we created new terminology to describe this phenomenon, and we call it big data. It's characterized by four V's. Volume, meaning the size of the data; it can be very large. Velocity, how quickly it's coming at us. Veracity, its truthfulness. And, of course, the variety of data; it comes in all shapes and sizes. When we say truthfulness, we're talking about data that might come from a sensor: is it actually an accurate measurement? What does it mean? We still have to process all this information, and that presents real problems for infrastructure and computation. Today, I'm joined by a few colleagues from across the world to talk about this impact and opportunity. I'm your host, Angelo Kastroulis, and this is Counting Sand. As I mentioned, I'm joined by a few colleagues. Jon Summers, Jon, why don't you introduce yourself?

Jon: I'm the scientific lead in data centers at the Research Institutes of Sweden. We look at the thermal and energy management of all the electronics that goes into data centers, from the chip to the chiller and the ground to the cloud.

Angelo: Also joining me is Mats Eriksson. Mats, why don't you tell us a little bit about yourself?

Mats: I'm the CEO of a tech company called Arctos Labs, and what we do is edge optimization. Specifically, think of distributed cloud: smaller clouds connected together, how you distribute workloads across them, and how you can save energy by distributing workloads in a smart way. We have technology that addresses that problem.

Angelo: We became acquainted through a paper that Mats wrote. Mats, you had worked on trying to quantify the energy we might save by pushing computation to the edge, as opposed to building bigger and bigger data centers. One of my partners found this paper and said, hey, you have to read this, and that's how we found each other. Your paper references a few other papers, as of course any good paper does. One of them, from a conference in 2019, estimated the amount of energy that YouTube uses; we really don't know exactly, but it projected on the order of 10 million metric tons of carbon emissions in a year. It makes sense to use energy, of course, to get around, to power things, and to provide food. But we don't think about the fact that our lives today, watching videos, TikTok, and all that, use a tremendous amount of energy. There are maybe some things we can do to decrease it. Another paper that I thought was really interesting, published back in 2015, looked at communication technology in general and predicted that by 2030 as much as 51% of global electricity could be used to power our communications: YouTube, the Internet, all that kind of stuff, with the rest powering the things we need, like food and transportation. Those were very important papers, I think, that set the stage for your work and your argument that this is becoming untenable. Mats, do you have any thoughts on that?

Mats: Yeah. I mean, those figures are, you could say, highly debated, right? Jon, I guess you can give your views on that, but they are definitely highly debated. 
What was good, of course, was that it created awareness that the IT industry is actually consuming substantial amounts of energy. And the data tsunami described in some of those papers is real; we're seeing data consumption skyrocket. Whether that leads to a corresponding increase in energy consumption, which some of the papers assume, is less clear. Since I wrote my paper, we've seen newer research indicating that the linearity of that energy consumption isn't there, basically. So I would say it's a bit up in the air, but it still implies there are significant savings we could make.

Angelo: One of those papers was very optimistic. It was saying, well, actually, by 2030 we'll probably have a lot of renewable energy in the data centers, and it probably won't be as bad as we think it is. Pretty optimistic, I think. Another interesting thought comes from a paper you referenced, I think an ACM paper, where they analyzed a CPU, because CPUs are a lot of what consumes energy in a data center, and found that they're not very linear in their energy consumption. When the CPU is idle, you're not using zero energy, and when it's fully utilized, you're using the most it can use, but it's not a linear scale in between. In fact, when it's idle, you're using somewhere between a quarter and a half of the maximum energy, or more, just doing nothing. And that was one of the things you postulated on, right? Can we exploit that fact to our advantage, not just in data centers but anywhere we have devices?

Mats: Yeah, exactly. The idle consumption, if we call it that, as a fraction of the total maximum consumption has decreased over time, long-term. It used to be worse. But it's still not ideal. So of course it makes sense to pack the workloads so you use servers as efficiently as possible, and turn off the servers not being used. There are drawbacks with that, of course: your tolerance for load variations becomes smaller. 
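To make that concrete, here is a minimal sketch of the affine power model this discussion implies, and of the saving from packing work onto fewer servers. The wattages and the consolidation scenario are illustrative assumptions, not figures from the episode.

```python
# Sketch: why packing workloads saves energy when idle power is high.
# All numbers are illustrative assumptions, not measurements from the episode.

P_IDLE = 100.0  # watts drawn by a server doing nothing (~25-50% of max)
P_MAX = 300.0   # watts drawn at full utilization

def server_power(utilization: float) -> float:
    """Affine power model: an idle floor plus a load-proportional term."""
    return P_IDLE + (P_MAX - P_IDLE) * utilization

def cluster_power(total_load: float, servers_on: int) -> float:
    """Total load (in server-equivalents of work) spread evenly over servers_on."""
    per_server = total_load / servers_on
    return servers_on * server_power(per_server)

# Ten servers carrying four servers' worth of work:
spread = cluster_power(4.0, 10)   # every server on, 40% utilized each
packed = cluster_power(4.0, 4)    # four servers at 100%, six switched off

print(f"spread across 10 servers: {spread:.0f} W")  # 1800 W
print(f"packed onto 4 servers:    {packed:.0f} W")  # 1200 W
```

The one-third saving here comes entirely from eliminating idle floors; Mats's caveat is visible too, since the packed cluster has no headroom left for load spikes.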
Jon: Yeah, I think what you're saying, Mats, is correct; over the years, the idle power has dropped. The other aspect we need to consider, and this is where data centers are very complex to operate, is that servers have multiple BIOS settings, and a lot of the folks who install these servers see the words "performance mode" and put the servers in performance mode. What performance mode does is disable a lot of the energy-saving features while making the servers extremely responsive, reacting really fast to workload requests. So you get a situation where you don't see much variation in the power, because they're all drawing a lot of power all the time. It's a state they can be in, but it's probably not the ideal state for operating today. This is an education piece for the people who install the servers; I don't know how much they look at it. It was amply demonstrated to us in some research where we were trying to see how workload deployment across racks of servers could find the sweet spot of operation. What we were seeing was that there wasn't a significant difference between idle and the sweet spot, because we couldn't find the sweet spot. We found out that the simple answer to the problem was that all the servers were in performance mode, not in the energy-saving modes that have all the features allowing the power to fluctuate with the workload. The last thing I'd like to say about the energy draw of microelectronics is that it depends on the temperature they see. We have ASHRAE, the American Society of Heating, Refrigerating and Air-Conditioning Engineers, standards for the operation of air-cooled IT equipment, and they've been pushing for running this equipment with a hotter air inlet temperature, with the unintended consequence that when it's doing a certain amount of digital work, it draws slightly more power because it's running hotter. So that's another aspect that needs to be considered and factored into operation.

Mats: Yeah, and going away from this PUE kind of metric, right, to talking instead about the energy being consumed to do a certain amount of work.

Jon: Exactly, an energy efficiency metric. An energy efficiency metric is what you want: the digital service delivered, divided by what you have to pay for, which is the kilowatt hour of electricity. This power usage effectiveness that Mats mentioned is a metric that has been useful for data centers to track their continual improvement year on year. But it's being used to compare data centers with each other, and the designs of the data centers and their cooling systems are completely different. So you can't really compare apples with pears, although you can see how your apple production gets better every year.

Angelo: We talked once about the fact that as you use this energy, it's converted into something else, like heat, and the CPUs get hotter. The many CPUs inside a data center continue to get hotter. So one of the ways we use energy is not just in the CPUs, but in cooling everything back down, right? There are some interesting things maybe that we can do there to save energy. What do you think?

Jon: Well, I just love looking back at the history of science, and I think people should look up a paper from 1961 written by a guy called Rolf Landauer, who was working for IBM at the time. The title of the paper is Irreversibility and Heat Generation in the Computing Process. The first point is the irreversibility in the logic, which is an interesting one we could come back to later, but the heat generation is a natural consequence of the computing process, as has been known ever since we started devising these systems. Of course, we've tried to reduce the power consumption of the compute we do over the years. But based on some of the ideas of information theory, you can see that 99.97% of the electrical energy that goes into those electronics stops just there and gets converted into heat. 0.03% is the actual energy value of the digital processing; it has an energy value as well, which eventually gets converted to heat later on. But that's the part you want. That's the thing that makes the money, not the 99.97%, which is unfortunate. And that's why we get into the situation of saying, well, data centers don't produce anything physically, but they produce a lot of heat. We live in a physical world, so it makes sense to try to do something with that heat, because it's physical and it's wasteful just to throw it away. 
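Since the conversation contrasts PUE with a work-per-energy metric, here is a minimal sketch of both. The numbers are illustrative assumptions, and "requests served" merely stands in for whatever digital service a site actually delivers.

```python
# Sketch: PUE versus a work-per-energy metric, with illustrative numbers.

def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: 1.0 is the (unreachable) ideal."""
    return total_facility_kwh / it_equipment_kwh

def service_efficiency(requests_served: float, total_facility_kwh: float) -> float:
    """The style of metric Jon prefers: useful digital work per kWh bought."""
    return requests_served / total_facility_kwh

# A facility drawing 1,500 kWh in total to feed 1,000 kWh of IT load:
print(pue(1500, 1000))                  # 1.5
# Two sites with identical PUE can differ wildly in work done per kWh:
print(service_efficiency(2.0e9, 1500))  # site A: requests per kWh
print(service_efficiency(0.5e9, 1500))  # site B: same PUE, 4x less useful work
```

The last two lines are the point of Jon's apples-and-pears remark: PUE tracks one facility's year-on-year improvement, but says nothing about how much digital service each kilowatt hour actually buys.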
Angelo: And when you think about it in that respect, it's not terribly efficient to get only 0.03% of actual digital energy value out of the energy we use. Couldn't we recycle, reuse this heat for some constructive purpose, instead of then using more energy to chill it and cool everything back down? It seems like that could be done.

Jon: One of the ways this has been done at large scale with data centers, and it involves the energy provider in the mix, is to use what's called a heat pump, which upgrades the heat from the data center to a level that can be ingested into the district heating system. Here in the Nordics there are quite a few examples of that happening already, and that's a successful way of using the heat, particularly if the heat is consumed all year round. Although the heat demand in summer is lower, maybe you can turn down other contributors to generating heat and just let the data centers provide hot water to the population. The problem we have with heat from data centers is that when we want it, there isn't enough of it, and when we don't want it, there's too much of it. That's the problem: we have seasonal variation, and we can't match digital service demand with heat demand very easily unless we move the workloads around. And I think Mats could say something about that.

Mats: Those are the kinds of algorithms, or kinds of scenarios, that we are of course looking at to create value with our technology, whether it's saving energy or moving workloads around by the sun, basically. Not to forget, while I remember it: we tend to talk a lot about the compute and forget about the energy consumed by moving data, from you and me to the data center, and between data centers. If you look at some of the reports you mentioned in the beginning, roughly the same amount of energy is consumed by data networks as by data centers, or at least it's in the same neighbourhood. So we should be aware that moving data is costly from an energy point of view.

Angelo: Yes, it is, for sure. And I want to get back to that point. Jon, I think you were going to say something?

Jon: I was just going to say, I think it was Google that once talked about a follow-the-moon policy. Their rationale was that it's easier to do free cooling at night than during the day. Because they have multiple data centers around the globe, you just ask, where's the moon? It wasn't always exactly where it was night, but it was where it was easy to get rid of the heat. This is the problem we have today. We use the term free cooling, and I think we shouldn't, but the industry uses it: air-side free cooling, just rejecting the heat outside because it's easy; you don't need anything mechanical, no compressor or evaporator. You can get rid of the heat straight outside, and that's a wasteful thing. The other thing that makes it difficult for people to understand this industry is that we use the term cloud computing. There's a classic statement from a government official who asked, why do we need all these data centers when you can put everything on the cloud? That kind of sums it up, doesn't it? 
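As a rough illustration of what the heat-pump path Jon describes can yield, here is a back-of-the-envelope sketch. The 1 MW load and the coefficient of performance are assumptions chosen for the example, not figures from the episode.

```python
# Rough sketch: upgrading data-center waste heat for district heating.
# Illustrative figures only; real systems vary with temperatures and season.

IT_LOAD_MW = 1.0      # nearly all electrical input leaves as low-grade heat
HEAT_PUMP_COP = 3.0   # assumed coefficient of performance (Q_out / W_in)

# Heat-pump energy balance: Q_out = Q_in + W_in, with COP = Q_out / W_in.
# Solving for Q_out when Q_in is the data center's waste heat:
q_out_mw = IT_LOAD_MW * HEAT_PUMP_COP / (HEAT_PUMP_COP - 1.0)
w_in_mw = q_out_mw / HEAT_PUMP_COP

print(f"waste heat recovered:            {IT_LOAD_MW:.2f} MW")
print(f"extra electricity for the pump:  {w_in_mw:.2f} MW")
print(f"district heat delivered:         {q_out_mw:.2f} MW")  # 1.50 MW
```

In other words, under these assumptions, half a megawatt of extra electricity turns one megawatt of otherwise-wasted heat into one and a half megawatts of usable district heat.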
Angelo: Another aspect that makes me think is that you can't always just vent the heat into the atmosphere. That might work in the Nordics, where it's cooler and you can vent heat out, but it won't work in a tropical area. In places like Texas and Florida, if you open things up and just pump the heat out there, it's probably hotter outside than inside anyway, so you're not actually cooling anything.

Jon: We sometimes say air economization to mean free cooling, using the outside air. But there's another economization, called water economization, where you evaporate water to cool an air stream down. That works if the air is not humid: if it's reasonably hot and dry, you can add water to the air, it cools the air stream down, and then you've got cool air to consume in your data center. So you avoid using these expensive compressors; and when I say expensive, I mean expensive to operate, because they consume a lot of energy. If the air is too humid for evaporation, that's when you need a refrigeration cycle, and hence the compressor. But then people forget that, wait a minute, data centers are consuming power, and now they're consuming water as well. These are all resources one has to consider, and there are metrics that look at this. One of them is called water usage effectiveness: how many liters of water you use per kilowatt hour of operation of your data center. Of course, the other way of doing this is to run things hotter, which goes against my earlier statement about running IT very hot. But you can run it hotter if you run with a liquid rather than air, and the elevated temperatures let you get rid of the heat more easily. There's a whole raft of technology here, which is actually very old technology, because it started in the 60s but has never really taken hold in the data center at full force, and that's called liquid cooling. That's where you get liquids very close to your microelectronics. Some people may say, look, water and microelectronics don't really mix, do they? And they don't. You have to be careful, yeah. 
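For reference, here is a minimal sketch of the water usage effectiveness metric Jon defines; the water and energy figures are illustrative assumptions.

```python
# Sketch: Water Usage Effectiveness (WUE), liters of water per kWh of IT energy.
# Numbers are illustrative assumptions, not from the episode.

def wue(liters_of_water: float, it_energy_kwh: float) -> float:
    return liters_of_water / it_energy_kwh

# An evaporatively cooled site using 200,000 L of water against 110,000 kWh:
print(f"WUE = {wue(200_000, 110_000):.2f} L/kWh")  # ~1.82 L/kWh
```

The point of the metric is the same as Jon's: a site that saves compressor energy through evaporation hasn't become free to run; it has traded one resource for another.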
Angelo: Yes, liquid cooling has been around for a very long time. In fact, it's still very popular in the enthusiast market, where people overclock their machines; they can cool them down with liquids much more easily than with air. I want to take a step back for a second and think about this a little differently. We have all these edge devices growing exponentially, right? Phones, sensors, all kinds of things sending data up to the cloud. I think Eric Schmidt of Google said that every two days we create as much information as we did from the beginning of human history up to 2003, and that's only going to get exacerbated. It becomes untenable, because we have to build bigger and bigger data centers; that's all the cloud is, these data centers. So what if we didn't have to do that at all? What if we could push these computations, or the data cleaning, down to the edge and let the edge devices, which are idle and consuming energy anyway, do some of that work? Kind of lift the burden away from the cloud. Could that have an effect?

Mats: Yeah. And then of course we need to think about what we mean by the edge, because it could be on the device itself, on my smartphone, but it could also be in a cell tower, or in a metro area where we build smaller, but still quite big, data centers. And we already see this with content caching: systems like Netflix have all their movies stored close to you, because they have the same problem, but downstream to us as consumers. So when we have this huge amount of data, it makes sense to process it before we send it upwards or consume it. Turn data into information, if you will; something that means something. And do that as close to the source as possible. For that matter, we may want to use cloud technologies for new applications, like controlling machines, things that have a control loop and need a latency that a faraway data center in the cloud won't support. This is a trend that is growing; there's a lot of talk about it, at least. Maybe it doesn't grow as much as people talk about it, but it will soon, I think.

Jon: It's a bit of a cliche to say that data is the new oil, but there are so many things producing data, and we need refineries; we need to refine that data. We talk about mega, giga, and tera, and they say the digital universe is in the yottabytes. And what the devil is a yottabyte? That's a one followed by 24 zeros. We're in the zettabyte age, a one followed by 21 zeros of data transmission on an annual basis. And then HP comes along and says, well, yotta is not going to be big enough; that's the largest prefix in the prefix dictionary. So they came up with bronto: a brontobyte is 10 to the 27, a thousand yottabytes. So we have all this data, and I think it does need processing, but I think edge is still kind of searching for the applications that really need it. One of the statistics we should look at is data transmission on the networks between the core and the end user: 80% of that is streaming data, coming from audio and video. The other 20% is all the other activities that are going on. And when we're pushing data up rather than pulling it down, to store our photographs, our health activities, whatever it is, some of that data needs to be sorted and made more accessible. First we had the four V's of big data, and now there are 27 V's, with things like viscosity and so on, because the data has value to somebody.

Angelo: Yes, exactly. And I guess now everybody has to jump on the V bandwagon and add a new one.

Mats: On the other hand, you can wonder whether that 80% figure will be sustained. I mean, how many 4K Netflix videos can I watch in a day? If this traffic continues to grow, is it that streaming, or something else? Perhaps.

Jon: The networks are obviously swamped by those 4K videos. Another interesting statistic, Mats, was streaming data between data centers: only 20% of that traffic is streaming, so the other 80% is a lot of other activities. Between data centers it's different; whether it's fail-over, caching, or moving those streams around, there are a lot of different applications going on.

Angelo: That's a very good point. We don't normally think about the inter-data-center load, for example. Data centers are really not made of one big building; they're made up of a cluster of buildings. 
And in order to have availability zones, they have to copy some of this data to other buildings or other locations, even miles away, so that we have redundancy. That way, if power takes out a building, the whole internet doesn't go down. The other thing to think about is that it isn't just data centers copying data for redundancy. Sometimes we move and copy data to data centers in different regions; we might take some data from the United States and copy it to Europe and to Asia, so that users have localized data and get better performance close to them. That's more traffic again, so it isn't just the edge nodes pushing up data, is it?

Mats: Yeah. That's why we exist; those are the kinds of challenges we started working on. There are many different aspects, from energy optimization to cost optimization, which is perhaps the bigger concept that energy is part of, to all the constraints on latency and performance that certain applications may need. And we need to be able to sort that out in an automatic way. We think it needs a machine to sort it out, because no human will be able to cover these thousands of data centers and hundreds of applications.

Angelo: And when you say machine, I take that to mean machine learning, AI. It makes sense, because we're able to identify patterns with machine learning; that's what it does very, very well, things that humans couldn't do. But there are other aspects to machine learning.

Jon: Things like predictive maintenance, I think, is another one, if you put sensors on your equipment. This has been done in the aircraft industry for many years: at the end of a flight, you download the time series of the rotational and vibrational characteristics of the engine through the flight. But wouldn't it be good to be able to do that while the aircraft is flying? Then you get streaming data, and it needs to be processed on the fly.

Angelo: What if we were able to build a micro cloud on the airplane? Then we could do computation on the plane itself. Of course, the more traditional method would be to send all this telemetry to a data center and let the data center run it. But what are your thoughts on this kind of thing?

Jon: As you say, the avionics that goes into aircraft is pretty powerful these days, so you're definitely doing a lot of processing on the device, on the aircraft itself, because it would be too much data to send over. And you could envisage that even in driverless cars you would expect a lot of compute to take place on the vehicle itself, not relying on a 5G connection to a series of edge data centers to do the processing. That could cause problems; it could cause accidents, couldn't it? 
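Picking up Mats's point that placement across thousands of sites needs a machine, here is a minimal sketch of the underlying decision: for each workload, choose the site that minimizes energy subject to a latency constraint. It is entirely illustrative; the site names, energy figures, and the greedy rule are assumptions for the example, not how Arctos Labs' optimizer actually works.

```python
# Sketch: greedy energy-aware workload placement under latency constraints.
# Entirely illustrative; real placement engines model far more than this.

from dataclasses import dataclass

@dataclass
class Site:
    name: str
    latency_ms: float         # round-trip latency from the data source
    joules_per_op: float      # energy per unit of work at this site
    transfer_j_per_op: float  # energy to move the data there (networks cost too)

@dataclass
class Workload:
    name: str
    max_latency_ms: float

SITES = [
    Site("on-device",   1.0, 9.0, 0.0),
    Site("metro-edge",  8.0, 5.0, 1.0),
    Site("central-dc", 60.0, 2.0, 3.0),
]

def place(w: Workload) -> Site:
    feasible = [s for s in SITES if s.latency_ms <= w.max_latency_ms]
    # Among sites that meet the latency bound, pick the lowest total energy per op.
    return min(feasible, key=lambda s: s.joules_per_op + s.transfer_j_per_op)

for w in [Workload("control-loop", 10.0), Workload("batch-analytics", 500.0)]:
    print(f"{w.name} -> {place(w).name}")
# control-loop -> metro-edge (the central DC is cheaper per op but too slow)
# batch-analytics -> central-dc (no tight latency bound, so cheapest energy wins)
```

Even this toy version shows why a human can't do it by hand: the answer flips per workload as latency bounds, compute efficiency, and data-transfer energy trade off against each other.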
Mats: I think this is interesting, because when we talk about edge computing we tend to forget the word cloud. We've had microprocessors in gadgets and in cars for a long, long time. We didn't call that edge computing, but it is edge computing. The difference, to me, is when you use the concept of cloud, meaning you have a number of compute resources, you pool them, and you dynamically put workloads on that pool, either for efficiency reasons or to be dynamic. That's where we start to use technologies like virtualization to share the underlying infrastructure between multiple pieces of software, multiple applications. And I think that is the concept that is moving into these new environments. So if we now have an app store for my car, relying on some kind of virtualization platform, I can download apps and mix and match the software in my car, or in my plane, or wherever. That's a completely different concept of how software comes into my devices, and that is actually a bigger difference than whether we have a certain number of megaflops of compute there or not.

Angelo: The idea of the car as a processing unit has been around for a very long time. Engines send telemetry to some central computer from sensors and temperature probes all over. Some are even very sophisticated: race cars send tire temperatures and all kinds of telemetry for computation. We get maps sent to our car's processing unit. But it still makes sense, I think, to send data out to a larger cloud; that would allow us to understand the relationships between cars. The trouble is that would mean a bigger and bigger data pipe to send more and more telemetry, and 90% of it we would throw away, because it would be meaningless; we're not all Formula 1 drivers trying to understand every slight temperature variation, right? So the telemetry is just a different problem. If you could process it at the edge, you could have a vastly improved scenario: still getting data from the car to the cloud, but only the meaningful data. And that makes it much more cost-effective.

Mats: I think that's a very important point, and one that lines up very well with the way we think. There are very few applications nowadays that are one piece of software and don't talk to anything, right? People refer to normal applications as microservices, or composable applications, meaning we divide them into components and spread out the components, meaning the functions. So what you just described is distributing the function that does the data processing, but we probably still have something in the cloud that collects and draws conclusions from multiple cars, or creates statistics for tire manufacturers on the mix of rubber in their tires, or whatever. You would normally need something more than the raw data processing to make up an application.

Angelo: I want to mention that there's a side effect to these micro clouds beyond the energy considerations, and we'll talk about this in future episodes: if you can virtualize this technology in a car, you can now do things like push new updates to vehicles from a central location. That opens up all kinds of doors. Of course, it also opens up security vulnerabilities, where people can hack into the car's cloud and start doing something nefarious. Well, Jon, Mats, I want to thank you both for joining me today. I really enjoyed the discussion and look forward to talking again through the years. Best of luck to both of you. And thank you all for joining us. I'm your host, Angelo Kastroulis, and this has been Counting Sand. Before you go, please take a moment to subscribe; we appreciate you listening to the podcast. 
Feel free to rate us so that others can find us, and follow us on Twitter @AngeloKastr. You can also follow my company @BallistaGroup, or reach out on LinkedIn at Angelok1. And take a look at the show notes to see how you can reach out to Mats and Jon.