Counting Sand

Inspired by Archimedes...Counting Sand

Episode Summary

How much sand would it take to fill the universe? And what does this 2,000-year-old question have to do with a podcast on today’s big data challenges? In this kick-off episode of the Counting Sand podcast, host Angelo Kastroulis, CEO of Carrera Group, explains how an early research paper by Archimedes of Syracuse has much in common with his own approach to today’s big questions in data science and how the paper provides not only a metaphor for how we can meld research and practice in tackling today’s big problems but also the inspiration for the perfect podcast name.

Episode Notes

In order to explain the origin of the name of this podcast, Angelo starts with a little history on Archimedes, as both a practical designer and a scientist interested in the theoretical underpinnings of mathematical principles.

Angelo then talks about some important research by Archimedes but begins by explaining what a research paper is, what the history of research papers is, and why anyone undertakes writing one. He then spends time talking about Archimedes’ paper that attempts to spell out how many grains of sand would be needed to fill the universe. Of course, to answer this, Archimedes needed to approximate the size of the universe and, in order to do that, he had to develop a new number system.

Angelo—who himself has both a Greek and entrepreneurial heritage—begins to draw parallels between Archimedes' approach to the sand problem and his own approach to understanding and addressing big problems today. He talks about his journey to find the balance of the theoretical and the practical, just as Archimedes did, applying a rigorous methodology, dealing with disappointment, and exercising patience. Angelo shares his first operating axiom: “When the solution isn’t readily apparent, be patient, keep researching; the solution will present itself.”

In his work as a data scientist and technologist best known for his high-performance computing and Health IT experience, Angelo uses this process time and again. In this episode he gives examples from his own research career and the applications he has developed. Ultimately he shares his axiom #2: “If you find yourself doing too much theory, do more application and it will make your theory better. If you find yourself doing too much application, do more theory and it will make your application better.”

As Angelo says, Counting Sand will be a bit different from other podcasts. We will talk about some big problems, discuss the theory behind potential solutions, and see how they can be applied to tackle real problems. We are excited to bring listeners along for the ride.

 

Citations

Bourne, S. (2004, December 6). A conversation with Bruce Lindsay. ACM Queue. Retrieved October 4, 2021, from https://queue.acm.org/detail.cfm?id=1036486.

Heath, T.G. (2020). The Sand-Reckoner of Archimedes (Vol. 1). Library of Alexandria.

Kastroulis, A. (2019). Towards Learned Access Path Selection: Using Artificial Intelligence to Determine the Decision Boundary of Scan vs Index Probes in Data Systems (Doctoral dissertation, Harvard University).

 

Further Reading

On Archimedes’ Sand Reckoner

Angelo Kastroulis’ Harvard master’s thesis

The Harvard Data Systems Lab

“Publish or Perish”

 

 

About the Host

Angelo Kastroulis is an award-winning technologist, inventor, entrepreneur, speaker, data scientist, and author best known for his high-performance computing and Health IT experience. He is the principal consultant, lead architect, and owner of Carrera Group, a consulting firm specializing in software modernization, event streaming (Kafka), big data, analytics (Spark, Elasticsearch, and graph), and high-performance software development on many technical stacks (Java, .NET, Scala, C++, and Rust). A data scientist at heart, trained at the Harvard Data Systems Lab, Angelo enjoys a research-driven approach to creating powerful, massively scalable applications and innovating new methods for superior performance. He loves to educate, discover, then see the knowledge through to practical implementation.

Host: Angelo Kastroulis

Executive Producer: Kerri Patterson; Producer: Leslie Jennings Rowley; Communications Strategist: Albert Perrotta

Music: "All Things Grow" by Oliver Worth

© 2021, Carrera Group

 

 

Episode Transcription

Angelo Kastroulis: How much sand would it take to fill the universe? Archimedes asked that 2,300 years ago, and he attempted to answer it. But before he could even try to solve a problem that big, he had to come up with an entirely new system of thinking about numbers. There is so much we can learn from him that applies to today's hard problems.

I'm your host, Angelo Kastroulis, and this is Counting Sand.

Who is Archimedes? Archimedes of Syracuse was an ancient Greek polymath. I'd say multitalented, but I think the Greek word is probably more appropriate: a mathematician, inventor, physicist, engineer, and astronomer. What makes Archimedes especially interesting is that he wasn't just an academic.

He was also very practical. His biography has long been lost, but we do know he had friends in the collegiate world of Alexandria. In fact, the Library of Alexandria was a hub of Greek academic circles. He theorized about astronomy, mathematics, and physics, but he also focused on practical application.

In fact, he was a prolific inventor. When ships attacked the Greek city-state of Syracuse, he built a contraption that may have used mirrors to reflect light beams in a parabola, creating a heat ray that burned the enemy ships.

He was so well known in shipbuilding that writings speak of his being commissioned by King Hiero II of Syracuse to build a ship to be used for luxury travel, transportation of goods, and as a warship. It would hold 600 people and include a gymnasium, garden decorations, and a temple. It was the largest ship in the world at that time, and such a large ship would leak water.

So he invented a device as a bilge pump. We know that device today as the Archimedes screw. Later, the Romans adapted that design to be used as an irrigation pump in the Hanging Gardens of Babylon, one of the seven ancient wonders of the world. Even today, we still use that device on assembly lines, in propellers, and in lots of other places.

My favorite story about Archimedes is his discovery of a method to determine the volume of an object with an irregular shape. Today we use that method, and we call it the Archimedes principle. Now, how exactly he solved this problem is up for a little bit of debate. But as the story goes, he was approached by the same King Hiero II, who feared that his shiny new gold crown was perhaps made with inferior materials. Maybe someone snuck some silver in there to dilute the purity of the gold. Archimedes couldn't just melt the crown down to see what it was made of. He needed to devise a non-destructive method of testing it. Well, that's complicated.

He noticed, though, that water levels rose in his bathtub when he got in and went down when he got out. Given that water is incompressible, if we compare the mass of the crown with the amount of water displaced when it was submerged, we could determine its density. And then we could compare it with something else, maybe the same quantity of pure gold, and see if they displaced the same amount of water.

If it was mixed with something less dense, like silver, it would displace a different amount of water. The story goes that he was so excited that he ran through the streets naked, crying "Eureka!", which in Greek means "I've found it!". As it turns out, the crown was indeed mixed with silver.

So never try to fool royalty who have someone like Archimedes working for them. Now, whether the story is true or whether he used a different method (in fact, there is another method that we refer to as Archimedes' principle today), we don't know. But in any case, he clearly saw the value of a research-driven approach.

Let's take a brief digression and define what we mean by a research paper. A research paper is a scholarly work that contributes some new knowledge to a field. That paper is usually peer-reviewed, meaning other scholars review it for its quality, and if there is some kind of agreement as to its quality, it is then published in an academic journal or maybe a conference.

The more prestigious the journal or conference the better, but it doesn't have to be published in a journal or a conference to be a valid research paper. You'll hear me talk a lot about the importance of papers and I'll talk about quite a few of them in detail. The reason they matter is that they bring some new knowledge forward.

Now, that new knowledge can be brand-new, original research, or it can shed light on some existing research, perhaps by introducing it in a different light, challenging some of its positions, or maybe extending its application. Papers usually follow a pattern.

First, they introduce the problem: they tell us why it is important and inform us as to the current research on the topic. Why should we care? Why is it an important problem? What have others tried? Why is it worthy of our attention?

Second, they present some hypothesis. What exactly will you gain by reading this paper? What's the solution it's presenting?

Then you'll get to the meat of the paper: the solution and results sections. This part takes months to write. It's where you painstakingly present the solution and provide enough detail so that others can reproduce your results. A paper that doesn't have reproducible results is worthless. Many peer-reviewed papers fail here and have to go through rounds of revision (if they're even accepted at all). It took me over a year to get my thesis right.

Finally, the paper closes with the conclusion. What did you determine the outcome was? What did you specifically exclude in your thinking process and what are some next steps? What are some limitations that this paper presents?

The oldest known modern research paper in English was published in the Philosophical Transactions of the Royal Society in 1665. I would argue that Archimedes' research is a modern research paper. It includes an introduction, a hypothesis, exploration of the prevailing research (complete with citations), a solution, and finally, a conclusion.

The only thing missing is that it wasn't peer reviewed and published in a prestigious journal. Or was it? It was addressed to King Gelon, the son of the King Hiero II mentioned earlier (not to be confused with a possible ancestor of the same name). Gelon was a ruler of Syracuse in Magna Graecia. Magna Graecia means "Great Greece," and it is in what is now known as Southern Italy and Sicily. He helped make Syracuse the greatest Greek city in the west. Gelon obviously had the aptitude to understand the paper, and for Archimedes to write to him directly indicates the prestige of the publication.

The paper was originally called "Psammites," translated today roughly as "The Sand Reckoner." What was the paper about? Well, he starts the paper by asking the question, "How much sand would it take to fill the universe?", noting that some people think sand is infinite. But if it were, wouldn't it eventually fill the universe?

So if it were infinite, it should fill everything. So I wouldn't exactly say that the paper is about filling the universe with sand, but we'll talk about that a little bit. Many at the time believed that sand was infinite. But as mentioned, if it were, would it not fill every crevice in the universe? How would there be room for other things, like people? How much sand, though, would it take to fill the universe?

Now, that question is very interesting. This paper changed the world. Why? I like to think about important papers that way: since we're trying to move that collective bar of knowledge a bit further, we are changing the world. Every bit of knowledge we add is a stepping stone for further advancement, some papers more than others, but this is one of the papers I think fits the criteria of being really world-changing. There are three key things in this paper that I think changed the world. First, we have to scrutinize our understanding of how big the universe is. Next, we have to prove that sand is not infinite. And finally, we have to invent a new number system.

And we'll talk about that one in a moment. Let's start by pondering the question: how big is the universe? He cited work done by Aristarchus that claimed, albeit incorrectly, that the earth was the center of the universe and that the sun was the outermost boundary. Now, that would create a sphere such that the earth was the center.

And if the sun was the outermost boundary, that sphere would represent the entire void of the universe. If it was a sphere, we could then compute its volume.
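As a quick modern aside (my own illustration, not from the paper; whatever radius estimate you have, such as an earth-to-sun distance, gets plugged in the same way):

```python
import math

def sphere_volume(radius: float) -> float:
    """Volume of a sphere: V = (4/3) * pi * r^3."""
    return (4.0 / 3.0) * math.pi * radius ** 3

# Unit sphere as a sanity check.
print(sphere_volume(1.0))
```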

The second thing he needed to establish was the size of a grain of sand. They didn't have a unit of measure that small. So what he did was line up poppy seeds, and he determined that 25 poppy seeds end to end measure about an inch.

Then he assumed that at most 10,000 grains of sand would fit inside one poppy seed. At that time, the largest representable number was a myriad, or 10,000. So a myriad grains of sand had the diameter of a poppy seed. It then follows that a one-inch by one-inch square would require 16 million grains of sand.

Now, again, they didn't have the million in their number system, so that would be 1,600 myriads. Despite the fact that the background research on the universe was wrong (the earth is not the center of the universe, and the universe extends far beyond the sun), it was still a solid starting point to challenge the notion that sand was infinite. Even more importantly, he established that even having enough sand to fill the earth's volume alone was problematic, but it wasn't problematic for the reason you think: there was no number big enough to represent that much sand.

Okay, so the largest number representable was 10,000, a myriad. You could represent a million as a hundred myriads. So it makes sense that, at the time, that could have been the ceiling for their number system. After all, how many troops would you have? How many people? How much land?

However, when we start talking about big numbers, universe-big, you need something new. We could continue counting: maybe hundreds of myriads (a hundred myriads being a million) or thousands of myriads. But suppose we had to count higher than a myriad of myriads, 10,000 times 10,000. To do that, he proposed a number system in which we could represent a myriad of myriads as an order. Today, we think of that as exponential notation, although we base ours on the unit 10, not 10,000. So a myriad-myriad would be represented by 10^8 today. That would be the first order. To go higher, we could take the myriad-myriad itself as a new counting unit, count up to a myriad-myriad of those (a myriad-myriad squared), and call that the second order, and so on.

In his paper, he went as far as eight orders. Although this system could support any number by adding orders infinitely, eight orders of this system reach 10^64 in today's number scheme. That was more than sufficient for what he was trying to do.
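In today's terms, the order system is just exponentiation with a base of 10^8; a quick sketch (my own illustration, not from the paper):

```python
# Archimedes' "orders": the first order runs up to a myriad-myriad
# (10,000 * 10,000 = 10^8); each subsequent order takes the previous
# ceiling as its new counting unit, so order n tops out at (10^8)^n.

MYRIAD = 10_000
MYRIAD_MYRIAD = MYRIAD * MYRIAD  # 10^8, the ceiling of the first order

def order_ceiling(n: int) -> int:
    """Largest number expressible within the nth order."""
    return MYRIAD_MYRIAD ** n

for n in range(1, 9):
    print(f"order {n}: up to 10^{8 * n}")

# Eight orders reach (10^8)^8 = 10^64.
assert order_ceiling(8) == 10 ** 64
```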

I was born a first-generation American of Greek immigrants. So something spoke to me about this paper from a heritage perspective. These historical individuals have a strong and deep connection to my heritage, much in the same way as I feel more connected to Benjamin Franklin or Thomas Edison as inventors, because I'm an American.

As I grew up, I found myself between these two worlds. A part of me was distinctly American and a part of me distinctly Greek. While I thrive in the culture of learning and the American way of life, I still long for the simplicity and natural connection of island life in Greece. I say island life because my father was born on the island of Chios in Greece.

That island itself has a very long history of entrepreneurship, its people being seafarers (in fact, I'm often asked if my father was a ship captain... and he did indeed spend a lot of time on ships, but those are stories for another day). The thing that impressed me most about Chios is that it has an evolutionary path much like that of the United States.

They value entrepreneurship, and they have a long historical connection to independence. There's a lot of history there worth exploring, and when I did, I found I could understand myself a bit better. Why am I driven in the directions I am? How do I satisfy this restlessness I have?

My father was the youngest of five children surviving Nazi occupation (and that's another story for another day). He became a seafarer and, after his time in the Navy, traveled the world and eventually settled in the United States. My father was the eternal optimist, choosing to see only the best in people despite having literally seen the worst. I didn't fully understand the value of that until my twenties.

In fact, I didn't really understand how important cultural heritage was at all. I started seeking to understand Greek history and language. I could speak Greek before English thanks to our home (English became my primary language once I went to school), but learning ancient Greek was a bit of a turning point for me.

I began to have this desire to visit the places that Aristotle, the apostle Paul, and Homer walked. In fact, Homer, the author of great works like the Iliad and the Odyssey, was also from the island of Chios. So reading papers like Archimedes' Sand Reckoner takes on a kind of different meaning when you know a bit about the man, his stories, his lineage, the world at the time, and his daily life. He takes on a different meaning when you realize he's not just some abstract genius.

He also had struggles to contend with, contemporaries to learn from, research to do, things to learn, and a family to feed. So how do you solve hard problems? What can we learn from Archimedes? A problem this big is fraught with landmines. He could have gotten lost in proving how large the universe was, making the whole exercise worthless.

And an undiscerning reader could have thrown out the entire paper as it sits, because that premise is wrong: as we mentioned, the earth is not the center of the universe, and the sun is not some kind of boundary, let alone even the boundary of just our solar system. When faced with a very hard problem, we may stare at a blank canvas and not know where to go.

For those of us who have gone through the thesis or dissertation process, we've had moments, or weeks, or months like that. The first step is to break down the problem into a set of smaller problems and attempt to solve each of those, just like Archimedes did. Then start thinking about what we do know. We're not trying to boil the proverbial ocean here.

Archimedes established the larger problem. He listed a set of smaller problems that he intended to solve. And then he discussed and referenced some standard of knowledge (what we do know), albeit not entirely correct. We can do the same thing. While certainly effective in academia, I have found this methodology effective in practice as well.

For example, I encountered a very hard problem in clinical decision support computation. A clinician just doesn't have the time to sit and ponder long, and in the decision process, time is of the essence. A supportive system would need to compute in a few hundred milliseconds, but if you consider the movement of the network, data, and various participants in this process, you literally only have a handful of milliseconds, or even microseconds, to actually compute some kind of decision to support the clinician.

Add to that that you may have millions of patients' worth of data, which is hundreds of gigabytes, or tens of thousands of rules to execute. It's a big problem. We could take the naive approach, try to cobble together something off the shelf, and just hope for the best. But as the saying goes, "an abstraction is an intersection of benefits and a union of problems."

You have to think differently sometimes. So we broke the problem down into smaller, still hard, problems and looked toward the research. We analyzed the tooling that was out there, including Drools-style RETE engines, but those suffer from memory explosion problems. If you've never heard of the memory explosion problem, Google it. Drools-style engines and RETE algorithms trade memory for performance.

What that means is they'll create in-memory structures to make computation a bit easier. Now, that works great if what you have can fit in memory. However, if you have, say, a lot of patients' worth of data, or a lot of rules to run, and it gets bigger than the memory, you're going to run into a problem.

In fact, the reason it's called the explosion problem is that the more rules and the more data you put into the system, the amount of memory you need grows exponentially. It explodes out of control; it is not linear. So we recommended what would terrify many: build your own data system. Now, I'm not a proponent of "not invented here," the idea that just because you don't have full control over something, you should rewrite it.
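As a toy illustration of that growth (my own sketch, not Drools or a real RETE network; the function and numbers are hypothetical):

```python
# Toy model of RETE-style partial matches: a rule with k conditions,
# each matching f facts, can materialize up to f**k intermediate
# combinations before final filtering. Memory grows multiplicatively,
# not linearly, as facts and conditions are added.

def partial_matches(facts_per_condition: int, conditions: int) -> int:
    """Worst-case intermediate combinations for one rule."""
    return facts_per_condition ** conditions

for f in (10, 100, 1000):
    print(f"{f} facts per condition, 3 conditions ->",
          partial_matches(f, 3), "partial matches")
```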

But let's consider what it means to make your own data system. A database is effectively a parser, a computation engine of some kind with an optimizer, and some kind of storage component. While that may sound scary, it's built on plenty of well-understood principles: compiler and parser theory is very well known, as is lambda calculus.

Those have been around for decades, granted, not usually applied to this particular problem. But the storage technology we were using was fairly new. We looked at state-of-the-art papers and systems, seeking to learn from them: Cassandra, Kafka, Spark, Bigtable, Dynamo. How were they doing it?

That's the heavy lifting. The result was an engine that computes in a handful of milliseconds.

While that is a case of success, sometimes we're met with disappointment. My original thesis was going to center around using a genetic algorithm to determine the order in which a database should perform a join. Now, a join is where you take several relations, or tables, and put them together. But what if you have a query that requires you to join, say, four relations?

How do you do that? Which two do you join first? And then, subsequently, which is the next one? Which algorithms do you use to join? There are several: we could loop the relations in an inner and an outer loop and just traverse both of them, or we could hash one of the relations and then stream the other one in to compare against the hash.

How many threads do you use? How much memory do you reserve for the results versus the computation versus the hash? All of these are tunable, and hard things to know. Now, we have some intuition on what to do through the years of research. However, it's not quite that easy. Databases normally keep statistics to help give them hints as to what to do. However, those are wrong more than you'd think.
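Those two join strategies, the inner/outer loop and the hash-and-stream approach, can be sketched in a few lines (a simplified illustration over lists of rows, not a real database executor):

```python
# Two classic join strategies over lists of (key, value) rows.

def nested_loop_join(left, right):
    # Inner/outer loop: compare every pair -- O(n * m) comparisons.
    return [(lk, lv, rv) for lk, lv in left for rk, rv in right if lk == rk]

def hash_join(left, right):
    # Hash one relation, then stream the other -- roughly O(n + m),
    # but the hash table must fit in memory.
    table = {}
    for k, v in left:
        table.setdefault(k, []).append(v)
    return [(k, lv, rv) for k, rv in right for lv in table.get(k, [])]

left = [(1, "a"), (2, "b")]
right = [(2, "x"), (3, "y")]
assert nested_loop_join(left, right) == hash_join(left, right) == [(2, "b", "x")]
```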

As I was continuing my research, though, I started to see that this approach wasn't going to work any better than simply brute-forcing our way through a grid search, testing maybe every kind of algorithm and every kind of memory configuration and trying to find the best one, or maybe testing every permutation of pairs to see which ones would be optimal. That wasn't going to work.

There just weren't enough factors to make the problem hard enough to take advantage of the randomness in a genetic algorithm. A genetic algorithm is kind of what you think it is. Our DNA is made up of genes, and these genes contain some pieces of information about us. And when a species procreates, half of the genes from one of the parents and half the genes from the other are expressed in the child.

So what happens is you could take half of the genes of the mother and half of the father and randomly mix them together. Then you'll have a new set of genes, and you do that for each child, and you'll get different permutations. Applying that in a genetic algorithm is similar.

Let's say each of these little decisions we have to make is a gene: how much memory, maybe that's a gene; what kind of memory, maybe that's another gene; and so on. Genetic algorithms work very interestingly. If you were to just cut half of the genes from each of the parents and splice them together, there is a possibility that you might not be related at all to one of your grandparents. But by randomly shuffling them and putting them together, you get complete randomness.

Now, you might think that over time the population would homogenize. In other words, we would all eventually look the same: if we all intermarried and there were enough generations, we would all look identical, because our variations would end up getting lost. But in fact, that is not the case.

What happens is that, every once in a while, at some rate of mutation, a gene will randomly change, preventing us from homogenizing. And the same is true in a genetic algorithm: we don't want to homogenize; we want to maintain that randomness. For example, let's say I was hiking, trying to climb some mountain range, and I wanted to get to the top of a mountain.

Now, if I randomly went in any direction in a range of mountains, I would eventually get to the top of one mountain. But is that the top of the highest mountain? I want to find the highest mountain. Every once in a while, you have to throw a little randomness in there to make sure that you go in a different direction.

So you don't get stuck always going in one direction, finding one peak, and then staying there; that's called getting stuck in a local optimum. And I noticed that my genetic algorithms seemed to be stuck in local optima often. So I switched my thinking a bit: I started thinking about the problem in terms of artificial neural networks. Those seemed very promising.
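A minimal genetic algorithm along those lines might look like this (a toy sketch of my own: bit-string genes, one-point crossover, and a mutation rate to keep the population from homogenizing; all names and parameters are illustrative, not from the thesis):

```python
import random

def evolve(fitness, genome_len=8, pop_size=30, generations=60,
           mutation_rate=0.05, seed=0):
    """Toy GA: bit-string genomes, one-point crossover, random mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]            # keep the fittest half
        children = []
        while len(children) < pop_size - len(parents):
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)    # one-point crossover
            child = mom[:cut] + dad[cut:]
            # Occasional mutation keeps variation alive.
            child = [g ^ 1 if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Trivial fitness: count of 1-bits; the optimum is all ones.
best = evolve(fitness=sum)
print(best)
```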

A few months into the research, a paper was published by Brandeis University that proposed a method of using a tiny neural network with a technique called reinforcement learning to determine optimal join order over time. Now, they didn't cover everything, such as the kinds of algorithms or how much memory, but it was pretty effective at determining the order of joins.

They beat me to publication, but it was also a fantastic approach and, honestly, a bit better than mine. My thesis director, Stratos Idreos, who is also the director of the lab that I was a part of at Harvard University (the Harvard Data Systems Lab), reminded me that even discovering that something doesn't work is worthy of publication, in that it pushes the boundary of knowledge forward.

Still, that's pretty unsatisfying when you're looking for a break. You start to wonder: do I even have anything interesting here? Is this just going to be lost somewhere? Nobody references papers for things that provably don't work all that often. Fortunately, the breakthrough did come. When I pivoted to thinking about the decision boundary between scanning a table or using an index, I was able to apply an artificial neural network there and significantly outperform Postgres' query optimizer, by 10 times. The thesis got published, it won an award, and it's now patent pending as I apply it to other technologies. It occurred to me, though, that while this idea started as a disappointment, maybe there's a disappointment-elation cycle you have to go through in doing breakthrough research.
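To make the idea of a decision boundary concrete, here is a deliberately crude caricature (this is not the thesis's neural network; the single selectivity feature and threshold value are made-up stand-ins for the learned boundary):

```python
# Caricature of the scan-vs-index access path decision: a single
# feature (estimated selectivity, the fraction of rows a predicate
# matches) and a fixed threshold standing in for a learned boundary.

def choose_access_path(selectivity: float, boundary: float = 0.02) -> str:
    """Pick an index probe for very selective predicates, else a scan."""
    return "index" if selectivity < boundary else "scan"

print(choose_access_path(0.001))  # few matching rows: probe the index
print(choose_access_path(0.30))   # most rows match anyway: just scan
```

The real question the thesis tackles is where that boundary lies for a given machine and workload, which is what a learned model estimates instead of a hand-picked constant.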

It also applies to a lot of the work that we do, but it has some distinctly academic aspects to it: academia is much more susceptible to it, and practical application is a bit less susceptible. Let me explain. Since the goal of the academic is to discover something that didn't exist before, or to understand something in a new way, it necessarily requires exploration and research that may or may not bear any fruit at all. In practice, though, we seek to understand new hypotheses and new applications. However, many times we seek to timebox that into a proof of concept that proves we can apply some component of it to a new problem.

It's really unusual for a project to spend an unbounded amount of money or resources on research just for the sake of discovery, untied to any business objective. It's not just about learning about the problem, come what may. We usually ask: "What's my return?" "How much will it cost me?" "Will there be profit?"

It's impossible to know whether you're going to find anything at all, or whether you're going to hit a dead end. So that's why you don't really see that in practice, and that's why I think that elation-disappointment cycle is probably a little more rare there. Instead, we usually seek to apply something that has worked somewhere before, where we have a hint, a little bit of intuition, that it might work, but in a slightly different way.

And we will want to just explore that a bit. In that sense, we can take the best of both worlds, keeping the best of research and academia and then applying that to the practical work at hand. Still, it's difficult to solve a hard problem when the solution isn't readily apparent. There may be a solution out there, and at times we may have this intuition that there must be something out there ("I can't be the first one to want to do this"), but we just haven't found it yet. To that, I like to apply a personal axiom: "When the solution isn't readily apparent, be patient, keep researching; the solution will present itself in time."

I know that this sounds like we're being hopeful, but Thomas Edison believed in it. Matthew Walker, a renowned sleep researcher, recounted a practice that Thomas Edison had. Thomas Edison was one of the most prolific inventors of all time. He rarely got a long night's sleep. Instead, he would sleep in bursts, in a chair in his study, holding steel ball bearings in his hand, and on the floor below his hand he would have a tin tray.

As he progressed from REM sleep to deep sleep, his muscles would relax and he would drop the balls, which would clang on the tray and wake him up. He would then immediately write down anything that came to his mind. Walker attributed Edison's ideation, his ability to create all these inventions, to the sleep process.

One of the purposes of REM is to test new neural pathways to find solutions to problems we've encountered. Perhaps there's something to this "sleep on it" axiom; letting a solution marinate has some value.

Another reason that Archimedes' paper spoke to me was that it reminded me of my own journey as a computer scientist. I've served in various executive roles (CTO, CEO, COO), and I loved what I did, but I never finished my degree. I started as an electronics engineer and drifted into computer science.

It wasn't that I particularly cared about completing my diploma as much as I felt a little restless in what I was doing. Was I doing the right thing? Was I finding my calling? Was this what I should be doing? I decided to put everything on the table and consider reinventing myself. First, I needed to finish my undergrad and then get to grad school.

And grad school would be a version 2.0. One day I was called for jury duty, and that further reinforced the nagging feeling I had, because it made me think: I bet I could do a pretty good job of being a lawyer. I didn't think the prosecuting attorney was really that good.

So the research began. Should I go to law school? Med school? I had so much already invested in computer science. I loved working with health technology and was fairly involved in the standards process. Then I realized that we'd been so focused on moving data, which was one of the biggest problems in healthcare; just Google "healthcare data blocking" and you'll see what I mean.

Eventually that data would indeed move, but someone needed to analyze it. Other industries were already doing this, but healthcare is really in its infancy. It's actually a really exciting time to be in healthcare, and in computer science in general. We're lucky because we're blazing the trail: we're training and learning and creating things.

The papers that we write are going to be the papers that are referenced in the future. Artificial intelligence and machine learning are still brand new. So I came to find Harvard. It has a great stats program, a great medical school, and a great, though fairly unknown in comparison to its law and medical schools, engineering and science school.

In fact, I'm often told, "Wait... Harvard has computer science?" From my second class, a class on database kernels, I was hooked, and I joined the Data Systems Lab, doing research in addition to my day job and raising kids. I'll never forget one of the quotes from Bruce Lindsay in one of my first classes. He said, "Three things are important in a data system: performance, performance, and performance."

In fact, you'll hear me say this a lot. It made quite an impression, and I'm sure that my team has heard it hundreds of times. I've heard it said that academia and practice tend to be at odds with each other, one looking at the other scornfully. That didn't make sense to me. There's no doubt that both are critical.

I found myself, again, the product of two worlds. Do I want to teach and be a professor? Discover, and continue working in the lab as a researcher? Or do I want to build something, a practical application people are going to use? I love all those things. Why do I need to choose between them? Although I love to write papers and do scientific research, I wasn't quite an academic at heart. "Publish or perish" seems to be at odds with searching for knowledge for knowledge's sake. Now, if you've never heard of "publish or perish," it's the practice that academics have to continue to publish in order to remain relevant. You don't get tenure unless you publish enough.

In fact, academia is extremely competitive. There are many people entering the field, and there aren't many spots. So you have to out-publish your competition. However, that metric is a bit odd. It isn't as though we're trying to make new discoveries as much as publish papers on them. That didn't feel right.

But although I love to build things that are really useful in the world, I wasn't quite an implementer either. I'm not someone who can find some off-the-shelf solution, not care about how it works, and live with whatever the drawbacks are just because I don't want to, or don't have time to, understand it. Or maybe I'm afraid to try it.

No, I was part researcher and part implementer: the person who loved the thrill of discovery but also wasn't satisfied until it was actually used somewhere. There's a saying in computer science: "If you find yourself doing too much theory, do more application and it will make your theory better. If you find yourself doing too much application, do more theory and it will make your application better."

That makes sense to me. At Harvard, I met countless extremely intelligent professors and mentors. One who stands out is Stratos Idreos, the professor who taught that second class I mentioned. After taking that class, I took every single thing he offered. He's also the head of the Data Systems Lab.

And I finally convinced him to be my thesis director; that's another story for another day. Manos Athanassoulis, now a professor at Boston University and a prolific researcher in his own right, was another. Archimedes' story certainly strikes a chord when I think about the Greek professors at Harvard, but it's still more than that.

This story of Archimedes felt like a familiar story of a person stuck between the worlds of academia and application. But as he successfully navigated them, he found his balance. I came to the conclusion that, like Archimedes, I'm a product of two worlds: research and practice.

So does this idea of combining these two worlds really work? Well, I've found in our work, time and again, that taking the time to apply research yields massive benefits. Let me give you another example. One of our clients had a very common problem: they needed to map one kind of data into another. That's a problem

I'm sure we've all seen millions of times. It's in every single data system; there's always some kind of translation we have to make. So we normally punt that problem and micro-optimize the system in other ways. But this time the system was so optimized that the cost of mapping made up almost all of the latency.

There was no other place left; if you wanted to continue to optimize, you either did that or nothing. Countless consultants and teams had tackled the problem but couldn't find a way to improve it. So, true to our nature, we found a paper that described a method of serialization using a technique called zero copy. Zero copy is very well known in the Linux kernel.

The idea is: why copy the data over and over? Let's just keep pointers to where the data is and what it looks like. Then each time, you don't actually traverse the data; you move through the pointers to see if you care about what's there. What was new about this particular paper was that it had a clever method of using SIMD processor instructions to determine the boundaries, the edges, of each piece of data, thus avoiding the penalty of reading and copying data.
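To make the pointer idea concrete, here's a minimal Python sketch of zero-copy field access. This is my own illustration, not the paper's or the client's code: instead of copying each field out of a buffer, we record delimiter offsets once and hand back `memoryview` slices that point into the original bytes.

```python
# Zero-copy field access: rather than copying each field out of the
# buffer, we store boundary offsets and return memoryview slices
# that reference the original bytes in place.
record = b"alpha,beta,gamma,delta"
view = memoryview(record)

# Find the delimiter positions once; these offsets are our "pointers".
bounds = [-1] + [i for i, b in enumerate(record) if b == ord(",")] + [len(record)]

# Each field is a slice of the view: no bytes are copied.
fields = [view[bounds[i] + 1 : bounds[i + 1]] for i in range(len(bounds) - 1)]

print([bytes(f) for f in fields])  # → [b'alpha', b'beta', b'gamma', b'delta']
```

Only when a field is actually needed (the `bytes(f)` call at the end) does any data get materialized; until then, scanning the record is just pointer arithmetic over the view.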

I won't get into the details of SIMD here; we'll put that in the show notes. But SIMD, single instruction, multiple data, is a way for the processor to operate on many pieces of data at once with a single instruction. Implementing that solution resulted in improvements orders of magnitude better than the others. And it was relatively simple.

Although it can be frightening to build something new, or to look at the math and think it's too complicated, it really isn't, and it's worth the business benefits.
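The paper's actual SIMD trick is processor-specific, but its flavor can be sketched portably using the SWAR idea ("SIMD within a register"): pack 8 bytes into one integer and test all 8 lanes for the delimiter at once, instead of one byte at a time. This is my own hedged illustration, not the paper's algorithm; a real implementation would use vector intrinsics and wider registers.

```python
def find_delimiters(data: bytes, delim: int = ord(",")) -> list[int]:
    """Return the index of every delimiter byte, scanning 8 bytes per step."""
    ONES = 0x0101010101010101
    LOW7 = 0x7F7F7F7F7F7F7F7F
    HIGH = 0x8080808080808080
    pattern = delim * ONES            # delimiter replicated into all 8 lanes
    out = []
    n = len(data)
    base = 0
    while base + 8 <= n:
        word = int.from_bytes(data[base:base + 8], "little")
        x = word ^ pattern            # lanes that matched the delimiter are now zero
        # Exact zero-lane test (no cross-lane carries): a bit in `mask` is
        # set at a lane's high position iff that lane of x is all zeros.
        mask = ~(((x & LOW7) + LOW7) | x) & HIGH
        while mask:
            lowest = mask & -mask                      # isolate lowest set bit
            out.append(base + lowest.bit_length() // 8 - 1)
            mask &= mask - 1                           # clear it and continue
        base += 8
    # Scalar tail for the final < 8 bytes.
    for i in range(base, n):
        if data[i] == delim:
            out.append(i)
    return out
```

The payoff is the same as in the paper's approach: boundary detection touches each byte only as part of a wide word, so the data itself never has to be walked or copied one element at a time.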

Naming a podcast is itself a hard problem. You're staring at a blank canvas. But when an intractably hard problem presents itself, what do we do? What would Archimedes do? Break it down into smaller problems, and then let them sit for a while.

So we let the problem sit for a while. Then, when we read this paper, we thought: that's it! Eureka! Well, not "Eureka," and not "Sand Reckoner." So, true to the form of bringing together theory and application, this podcast will be a little bit different from others. We'll talk about some big problems. We'll discuss the theoretical side, and we'll also talk about the practical applications, ways to solve them. So was Archimedes' paper really about sand? Or was that just a vehicle to a bigger idea? I like to think it's a formula for solving really hard problems, but it's also a formula for finding a balance and expressing yourself in a way that makes sense. I'm excited that you, the listener, are with us on this journey.

I’m Angelo Kastroulis and this has been Counting Sand. Please take a minute to follow, rate, and review the show on your favorite podcast platform so that others can find us. Thank you so much for listening.