Counting Sand

Angelo Kastroulis

Counting Sand is the podcast that tackles the hard problem of how to make meaning of all the data available today. Introducing the themes at the heart of big data, high performance, and computer science, the show highlights the most cutting-edge applications. Whether discussing the best designs for a complex data system or the social implications of bringing a diverse skillset to data science, each new episode will provide research-backed perspectives on today’s hardest problems.
Technology

Episodes

AI Hot Sauce Taste Test Challenge
26-07-2023
Key Topics
AI-optimized vs Commercially Available Hot Sauce: Angelo and Petter perform a blind taste test with three different hot sauces, one of which is AI-optimized, to see if they can determine which one is created by AI.
Background of the AI Hot Sauce Creators: A brief insight into the story of Shekeib and Shohaib, the two brothers who combined their passion for data science and business to create an AI-optimized hot sauce.
Understanding Bayesian Optimization: A comprehensive discussion on Bayesian Optimization, a technique that uses previous knowledge to influence future decisions, perfect for creating unique hot sauce recipes.
Discussion on Other Optimization Techniques: Petter invites Angelo to delve into the different types of optimization algorithms and their pros and cons.
Understanding Gradient Descent: Angelo gives a brief introduction to the concept of Gradient Descent, a popular optimization algorithm, explaining it as akin to finding a valley when on a mountain.

Recommendations
Check out the previous episode interviewing the creators of the AI-optimized hot sauce to understand their process better.
For tech enthusiasts interested in AI and its applications, further exploration into optimization techniques like Bayesian Optimization and Gradient Descent can be insightful.

Episode Quotes
"Hot sauces are part of my favorite start of the day, so it'd be interesting to see what AI could come up with here." - Petter Graff
"Bayesian is an optimization technique that centers around using your previous knowledge to influence the future and that works really well." - Angelo Kastroulis
"Bayesian can kind of skip a bunch of steps because you've got a better second try." - Angelo Kastroulis
"The algorithm of gradient descent basically goes like this. If you're trying to find from where you are to where you should go, imagine that you're on a mountain trying to find the valley." - Angelo Kastroulis

Host: Angelo Kastroulis
Executive Producer: Náture Kastroulis
Producer: Albert Perrotta
Communications Strategist: Albert Perrotta
Video/Audio Engineer: Ryan Thompson
Music: All Things Grow by Oliver Worth
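
The mountain-and-valley picture of gradient descent in the quote above can be sketched in a few lines of Python. This is only an illustration, not code from the episode: the function `height`, its derivative `slope`, the starting point, and the step size are all made-up assumptions, chosen so that the valley sits at x = 3.

```python
def height(x: float) -> float:
    """Hypothetical 'mountain': a bowl whose lowest point (the valley) is at x = 3."""
    return (x - 3.0) ** 2 + 1.0


def slope(x: float) -> float:
    """Derivative of height(x); its sign tells us which direction is uphill."""
    return 2.0 * (x - 3.0)


def gradient_descent(start: float, step_size: float = 0.1, steps: int = 200) -> float:
    """Repeatedly take a small step in the downhill direction."""
    x = start
    for _ in range(steps):
        x -= step_size * slope(x)  # move against the slope
    return x


# Start high up on the mountain side and walk down.
valley = gradient_descent(start=10.0)
print(f"valley found near x = {valley:.4f}, height = {height(valley):.4f}")
```

Each step uses only the local slope, which is why plain gradient descent can need many steps; that is the contrast with the Bayesian approach discussed in the episode, which uses everything learned so far to choose the next try.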
AI Hot Sauce Brothers Part 2
12-07-2023
Introduction:
Angelo and Shohaib discuss the inclusion of new ingredients in hot sauce batches.
Shohaib explains the process of introducing new ingredients and the excitement surrounding it.

Incorporating New Ingredients:
Angelo asks about the approach to incorporating new ingredients: creating new models or expanding the feature space.
Shohaib suggests keeping the base model and increasing the search space for new ingredients.
Both options are considered, including transferring the optimal features to another model.

Metaphorical Understanding:
Angelo highlights the advantage of using hot sauce as a metaphor for complex concepts.
Shekeib acknowledges the clarity provided by the hot sauce analogy and the opportunity to learn more.

Engaging with Mathematics:
Angelo expresses his enthusiasm for discussing the mathematical side of AI.
Shekeib shares his brother's interest in math and how it goes beyond his own understanding.
Shohaib emphasizes the subset of AI concepts being discussed and the value of conceptualizing them through hot sauce.

AI as an Expansive Field:
Angelo mentions that AI encompasses various subfields, such as machine learning, Bayesian optimization, and active learning.
Neural networks, deep learning, and reinforcement learning are discussed as additional branches of AI.
Shohaib highlights the similarities between Bayesian optimization and reinforcement learning.

Reinforcement Learning:
Angelo mentions the significance of reinforcement learning in solving video games and its applicability to different domains.
Shohaib shares his experience with reinforcement learning in an AI class, specifically using it to make Pac-Man play autonomously.

Specialization and Continuous Learning:
Angelo praises Shohaib's expertise in Bayesian optimization while acknowledging the vastness of AI knowledge.
The discussion emphasizes the complexity of AI and the continuous learning required to stay up to date.

Generative Pre-trained Transformers:
Angelo brings up the popularity of generative pre-trained transformers like ChatGPT.
The ensemble nature of these models and their unique combination of techniques is highlighted.
AI Hot Sauce Brothers - Part 1
21-06-2023
Introduction
Shekeib and Shohaib join the podcast as guests to talk about their experience with creating hot sauces using AI optimization.
They created a special hot sauce, named "Counting Sauce," specifically for the podcast hosts.

The Making of Counting Sauce
This is a unique hot sauce that includes pineapple and mango flavors.
The sauce was created as a token of appreciation for being featured on the podcast.

Journey through Different Versions of the Sauce
The hosts have tried versions 19, 20, 21, and they just received version 25.
There will be a blind taste test to determine if they can tell the difference between the different iterations and compare them with other sauces to tell which is AI-created.

Optimization Process
The process involves optimizing the amount of each ingredient.
They use a Gaussian process regression model and an acquisition function called Expected Improvement for the optimization.

Choice of Ingredients
The base hot sauce has five main ingredients: vinegar, pepper, jalapeno, and lime.
After 25 iterations, the differences in taste become so minute it becomes hard to tell the difference.

Subjective Taste Testing
Shekeib talks about how his taste tolerance changes after tasting hot sauces all day.
They involved family and friends in the tasting process and asked for ratings on a scale of one to ten.

The Learning Curve of the AI
Early on, the AI would try extreme variations like too much or too little salt.
It learned quickly from feedback and adjusted accordingly.

Strength of Bayesian Optimization
The AI can learn mathematically from feedback and apply the learnings, making the optimization process quicker and more efficient.
It was also able to tweak multiple ingredients simultaneously, unlike a human who might focus on one ingredient at a time.

No Prior Experience in Hot Sauce Making
Both brothers had no prior experience or generational knowledge in hot sauce making.
The AI managed to create a decent hot sauce in just five iterations.

Power of Bayesian Optimization with Human Expertise
The brothers emphasize the importance of having a human expert in the loop of Bayesian optimization.
The AI simulates the intuition and experience of a human expert, but having a real human guide the process further enhances the results.

Application Beyond Hot Sauce
They discuss the potential of their Bayesian optimization process in other areas such as drug discovery.
The process can be guided by human experts in the respective fields for even better results.
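
The optimization loop described in these notes (a Gaussian process regression model plus the Expected Improvement acquisition function) can be sketched in plain Python for a single ingredient. This is a toy under stated assumptions, not the brothers' actual code: the `taste_score` function stands in for human tasters, and the kernel settings, the 0-to-1 salt scale, and the candidate grid are all invented for illustration.

```python
import math

LENGTH_SCALE = 0.2   # assumed smoothness of the taste response
SIGNAL_VAR = 10.0    # assumed prior variance of the scores
NOISE = 1e-3         # small jitter so the kernel matrix stays invertible


def taste_score(salt: float) -> float:
    """Hypothetical taster: scores peak at 10 when salt = 0.6."""
    return 10.0 * math.exp(-((salt - 0.6) ** 2) / 0.05)


def kernel(a: float, b: float) -> float:
    """Squared-exponential (RBF) covariance between two recipes."""
    return SIGNAL_VAR * math.exp(-((a - b) ** 2) / (2.0 * LENGTH_SCALE ** 2))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def posterior(xs, ys, x_new):
    """Gaussian process posterior mean and standard deviation at x_new."""
    mean_y = sum(ys) / len(ys)  # center the scores around their average
    k_mat = [[kernel(a, b) + (NOISE if i == j else 0.0)
              for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [kernel(a, x_new) for a in xs]
    alpha = solve(k_mat, [y - mean_y for y in ys])
    weights = solve(k_mat, k_star)
    mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    var = kernel(x_new, x_new) - sum(k * w for k, w in zip(k_star, weights))
    return mu, math.sqrt(max(var, 1e-12))


def expected_improvement(mu, sigma, best):
    """How much better than the best score so far we expect this recipe to be."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf


# Start with two extreme batches, then let the model pick each next batch.
xs = [0.0, 1.0]
ys = [taste_score(x) for x in xs]
grid = [i / 100 for i in range(101)]
for _ in range(8):
    best = max(ys)
    ei = [expected_improvement(*posterior(xs, ys, g), best) for g in grid]
    next_salt = grid[ei.index(max(ei))]
    xs.append(next_salt)
    ys.append(taste_score(next_salt))

best_x = xs[ys.index(max(ys))]
print(f"best salt level tried: {best_x:.2f}, score: {max(ys):.2f}")
```

Every tasting updates the model's belief about the whole curve, so each new batch is chosen with all previous feedback in mind. A real recipe would have one dimension per ingredient rather than a single salt level, which is where tweaking several ingredients at once comes from.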
Dynamo: The Research Paper that Changed the World
05-07-2022
The cycle between research and application is often too long and can take decades to complete. People often ask which piece of research or technology is the most important. Before we can answer that question, I think it's important to take a step back and share the story of why we believe the Dynamo paper is so essential to our modern world and how we encountered it.

Citations:
DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., ... & Vogels, W. (2007). Dynamo: Amazon’s highly available key-value store. ACM SIGOPS operating systems review, 41(6), 205-220.
Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine, M., & Lewin, D. (1997, May). Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing (pp. 654-663).
Lamport, L. (2019). Time, clocks, and the ordering of events in a distributed system. In Concurrency: the Works of Leslie Lamport (pp. 179-196).
Merkle, R. C. (1987). A digital signature based on conventional encryption. In Proceedings of the USENIX Secur. Symp (pp. 369-378).

Our Team:
Host: Angelo Kastroulis
Executive Producer: Náture Kastroulis
Producer: Albert Perrotta
Communications Strategist: Albert Perrotta
Audio Engineer: Ryan Thompson
Music: All Things Grow by Oliver Worth
Galaxy Evolution
15-03-2022
The episode touches on the wondrous journey a galaxy undergoes as it evolves through its life cycle. Angelo starts off the episode by asking the question, what's an early-type galaxy? Paolo Bonfini explains that although you may think that early-type galaxies would be galaxies early in their evolution, they're not; they're galaxies a little later. They're the ultimate evolution of two galaxies coming together. Drawing on the topics in his paper, Paolo then explains the role that supermassive black holes play in galaxy evolution. Paolo explains, "thanks to the recent development in gravitational-wave astronomy, which opened a completely new window of exploration because it's not based on electromagnetic waves, but on gravitational waves, which are a completely different thing. We are now able to explore black holes in more detail and we're able to study when supermassive black holes merged to create a bigger one."

Relating to the idea of bringing new technology forward, Angelo asks whether any computer science techniques have helped Paolo model this or put it together. Paolo explains, "there are a lot of computations involved in this process. People have in mind the romantic view of the astronomer who just looks through the scope of the telescope and notes things down on a piece of paper, but modern astronomy is completely digitalized. And recently it has been even automated by a lot of procedures that track and scan the sky to create huge catalogs. Even the images themselves, they are captured on digital devices, like, the same as they appear in the phone, basically, the same technology, but just on a more refined scale. And the first process for which you will need a computer is to combine exposures. So you cannot expose a telescope on a specific direction in the sky for a very long time, for several reasons. The summary is that, in order to take an image of some patch in the sky, you will have to take multiple images and then combine them.
Now the modern telescopes, they are extremely accurate. So when you combine them, you need to align the stars to a sub-pixel resolution. That means that you have to find the center of the star, which is itself positioned within a single individual pixel. And when you combine images, you have to align them with a precision of, let's say, a third of a pixel, which sounds impossible because you're like, how can you do that? But there are some techniques that allow you to do that. And of course you need a lot of computational power for that. It can take several minutes to do this, even half an hour, let's say, to combine and produce the image that you see on famous websites, like the Hubble. I mean, this is just the first step. Then there is the thing you need to actually extract. In my case, for the study I was doing, in order to assess the lack of stars at the center of a given galaxy, you actually have to measure it. So what you have to do is, you have to trace the light profile, starting from the outskirts of the galaxy going gradually towards the center. In this way, you can draw a light curve, if you want. It's not exactly that, it's more like a light profile. So you have some intensity at the edge of the galaxy, which would be low intensity because the light is very diffuse, and towards the center it grows, grows, grows. And at some point you will see it doesn't grow as much. That's where you meet the depleted core, but you also need to quantify this because you want to actually extract the information about the amount of depleted mass, like how many stars you would expect there to be versus how many you actually measure. So, you have to fit the light profile. And this is done by, okay, in my case, I've been doing this with a fairly basic statistical technique, which is chi-squared fitting.
So you have a model and you just fit the model to the observation, and once you have the model, you can project only the outer part towards the center and compare it with the actual model that you fit. And from the difference between the two, you have the amount of stars that are missing. So you need to explore a lot of parameters and therefore you need to have this thing automated via computer technology. There is no chance you can get this information doing it by hand."

Referencing the famous space observatory, Hubble, Paolo explains what it was like to work with such a brilliant piece of machinery. He shares, "it's really amazing because the Hubble telescope was launched in the nineties and, just to give you an idea, it is roughly the size of a bus. There is a replica of it you can visit at, I think it's the Aerospace Museum in Washington, so if you're curious. The main mirror is 2.3 meters in diameter. Just to give you an idea, the larger the diameter, the higher the resolution you can achieve. On Earth, there are bigger telescopes. The biggest telescope we have on Earth is currently 10 meters. It's on the Canary Islands. On Earth you have the atmosphere on top of you and this makes everything flicker a little bit because, you know, there is air moving, and these big masses of atmosphere move and this shifts the path of the light and this causes the images to be more confused. If you are instead outside the atmosphere, you don't have that problem and you really achieve the limiting resolution of your instrument. So the Hubble Space Telescope is particularly famous because of its resolution. It doesn't have a large collecting area, it’s only two meters, let's say, so it doesn't collect a lot of light per second. So it doesn't have, let's say, the same contrast as ground-based telescopes, but it has extremely high resolution.
So when you open an image and you're saying, okay, I want to look at this galaxy and I will work on this, which is at the center of the field of view because you pointed there. But at the edges of it, you see a lot of tiny objects, and if you zoom in you can see the structure. Maybe you see a lot of spiral galaxies around the merging objects in the background. And it's not at the center of your research. You're looking at the big galaxy at the center that you're studying. But, you know, it's like a small pleasure, a small candy that you have for the eye. You're looking at these things around and you are like, well, man, this is incredible. There are so many things in the universe and I'm here focusing on these big galaxies at the center, but whatever else is happening in the background, and this is really the, I think it's the most impressive thing."

Angelo concludes the episode by discussing the ups and downs of crafting a research paper. Paolo touches on the rollercoaster of emotions one undergoes, from the sheer volume of work that needs to be done to the most rewarding aspect of writing such a paper. He explains, "you know that you are at the forefront of this research, and I think this is when the reward comes when you're actually presenting and you see the people being curious and asking you directly at the conference, “What is this?” “How did you get there?” “It's very interesting. Let's work together.” “This is an idea to make it even better” and so on."
Our Guest - Thank you!:
Paolo Bonfini - https://www.linkedin.com/in/paolo-bonfini-phd-085a6a179/

Paolo's Paper:
Connecting traces of galaxy evolution: the missing core mass-morphological fine structure relation

Our Team:
Host: Angelo Kastroulis
Executive Producer: Náture Kastroulis
Producer: Albert Perrotta
Communications Strategist: Albert Perrotta
Audio Engineer: Ryan Thompson
Music: All Things Grow by Oliver Worth
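
The fitting procedure Paolo describes (fit a model to the outer light profile with chi-squared, project it towards the center, and measure the shortfall) can be sketched roughly as follows. Everything here is an assumption made for illustration: the exponential `model` is a simple stand-in for the profile functions used in real galaxy studies, the data are synthetic and noise-free, and a brute-force grid search replaces a proper optimizer.

```python
import math


def model(r: float, i0: float, h: float) -> float:
    """Hypothetical exponential light profile: intensity falls off with radius r."""
    return i0 * math.exp(-r / h)


def chi_squared(radii, observed, sigma, i0, h) -> float:
    """Sum of squared, error-weighted differences between data and model."""
    return sum(((obs - model(r, i0, h)) / sigma) ** 2
               for r, obs in zip(radii, observed))


# Synthetic galaxy: follows the model outside the core, but is flat
# (depleted) inside CORE_RADIUS. All of these numbers are made up.
TRUE_I0, TRUE_H, CORE_RADIUS, SIGMA = 100.0, 2.0, 1.0, 1.0
radii = [i / 5 for i in range(1, 41)]  # 0.2, 0.4, ..., 8.0
observed = [model(max(r, CORE_RADIUS), TRUE_I0, TRUE_H) for r in radii]

# Step 1: fit only the outer part of the profile.
outer_r = [r for r in radii if r >= CORE_RADIUS]
outer_obs = [o for r, o in zip(radii, observed) if r >= CORE_RADIUS]

# Step 2: explore the parameter grid and keep the lowest chi-squared.
_, best_i0, best_h = min(
    (chi_squared(outer_r, outer_obs, SIGMA, i0, h), i0, h)
    for i0 in (50.0 + i for i in range(101))    # 50 .. 150
    for h in (0.5 + j / 20 for j in range(91))  # 0.5 .. 5.0
)

# Step 3: project the fitted model into the core and sum the shortfall.
deficit = sum(model(r, best_i0, best_h) - o
              for r, o in zip(radii, observed) if r < CORE_RADIUS)

print(f"fitted i0 = {best_i0}, h = {best_h}, light deficit in core = {deficit:.2f}")
```

Even this toy version evaluates more than nine thousand parameter combinations, which gives a sense of why Paolo says there is no chance of doing it by hand. The deficit here is in arbitrary intensity units; turning it into a missing stellar mass would need additional assumptions that are outside this sketch.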
The End of Moore's Law - What's Next?
01-03-2022
Angelo begins this episode with the predictions of Moore’s Law. In the early years, systems were limited by the CPU's ability to keep up. As CPUs continued to advance, the bottleneck shifted to data movement: moving data from disk to memory and from memory to cache became the big bottlenecks. Then, of course, disks got faster, and eventually machines had so much RAM that the bottleneck was just data movement within memory. Eventually, we believe the bottleneck will return to the CPU.

Quantum computing is on the rise, and that, we believe, is a game changer for Moore’s Law, because we're no longer talking about conventional computer chips and transistors but about something completely different.

Additionally, as machine learning and artificial intelligence systems advance, as in Angelo’s thesis on using AI to tune data systems, the gains in speed will be orders of magnitude over traditional systems.

Talent and staffing will also change as we adapt to the future. Angelo admires Google’s practice of hiring for ability over experience, because the problems we face tomorrow are different from today's. The key thing is to be able to make progress independently, because there isn't much room for babysitting. It's too hard to predict where the next fire will be. Angelo explains further why he hires for ability over experience every single time: someone who is brilliant and has the hunger to learn new things can be programmed like a stem cell. They can just inject themselves into whatever problem comes up.

Angelo transitions into his own personal story and his quest for fulfillment and happiness. He tells the story of a boy who was dying during the Nazi occupation of the island of Chios, Greece. A doctor took pity on this boy and secretly nursed him to health. We later learn that this boy is Angelo’s father.
Angelo shares, “My father grew up in a world much different than mine. His siblings related stories of famine and suffering, but he never ever spoke of those things. What he chose to relate were accounts of human triumph, perseverance, hope, aspiration. The sea was his salvation, carrying him from Chios as a sailor, eventually to the United States.”

So, what is our true potential? Intellectual achievements can be ignored or forgotten. But to be a successful family person, a husband, a father, a human, Angelo needed to be something more, something enduring.

Education builds the qualities of perseverance, hard work, and accomplishment. There is no doubt you'll accomplish many things, but think about what it is that you're really trying to do. You see, building technical solutions isn't just about doing interesting stuff. Ultimately we're building these things for a reason. For example, if you're building a healthcare application, it's going to touch somebody's life. That's the point of the breakthrough, right? Say you want to increase throughput in decision support, something Angelo spends a lot of time on: you want to build a system that can compute faster over bigger sets of data. Why are we doing that? Just because of the challenge of the data? No, we want to find out if a clinical intervention is working so that we can feed that information forward to those making the guidelines.

You see, that's the real reason behind doing this. The Great Resignation has shown us that people care more about what they're doing and why they're doing it than about whether the work is simply interesting. We owe it to our families to use our gifts, talents, and opportunities to the best of our ability, but to use them on something that matters.

Angelo is really excited that we're going to have interesting conversations around things like the universe, data centers, energy and how they work. There's a reason the hard problem exists.
Don't fixate on the fact that it's a problem, although there is joy in having a problem and solving it.

We're trying something a little bit new this season and we would love to hear which kinds of episodes you like most. Do you like interviews, or do you like some of the educational discussion episodes?

We're going to start a YouTube channel to help deep dive on topics like LSM Trees or RocksDB, which are better served with diagrams than with just voice. Seeing the math for yourself, or seeing the way that they operate for yourself on video, is much more helpful. We're going to have supplementary content, bonus material that you can find on our YouTube channel, and we'll also have some bonus podcast episodes. We look forward to your feedback. Tell us what you like about the show, which topics you prefer, and what you wish we would dive a little deeper on, and we'll really try to do that.

Citations
Gordon Moore, Co-Founder of Intel
Heisenberg, Uncertainty Principle
Powell, James. (2008). The Quantum Limit to Moore's Law. Proceedings of the IEEE. 96. 1247 - 1248. 10.1109/JPROC.2008.925411.
Merritt, Rick. (2013). Moore’s Law Dead by 2022, Expert Says. EETimes.
Atomic Hire (2019)

Further Reading
Moore’s Law Ending
Work and Culture at Google
Google Strategy to Hire

About the Host
Angelo Kastroulis is an award-winning technologist, inventor, entrepreneur, speaker, data scientist, and author best known for his high-performance computing and Health IT experience. He is the principal consultant, lead architect, and owner of Carrera Group, a consulting firm specializing in software modernization, event streaming (Kafka), big data, analytics (Spark, Elasticsearch, and Graph), and high-performance software development on many technical stacks (Java, .NET, Scala, C++, and Rust).
A Data Scientist at heart, trained at the Harvard Data Systems Lab, Angelo enjoys a research-driven approach to creating powerful, massively scalable applications and innovating new methods for superior performance. He loves to educate, discover, then see the knowledge through to practical implementation.

Host: Angelo Kastroulis
Executive Producer: Náture Kastroulis
Producer: Albert Perrotta
Communications Strategist: Albert Perrotta
Audio Engineer: Ryan Thompson
Music: All Things Grow by Oliver Worth
Bonus: Season 1 Recap
25-01-2022
Angelo begins this episode with reflections on history and what brought us to the AI Winter. Why do we need a balance between research and practice? You don’t want to rediscover what has already been discovered, or settle for something that could be better if you took the time to research a bit more. In Episode 4 we meet Angelo’s friend Andy Lee, who talks about computer science predicting our biological age. Andy actually met Greg Fahy, who talked about longevity. The study focused on injecting the thymus gland with a growth hormone that produced regeneration effects. The effects were measured with an epigenetic clock based on DNA methylation.

In Episode 6, Jim Shalaby talks with Angelo about how COVID-19 changed healthcare forever. Patients don’t have to wait in waiting rooms, they don’t have to find transportation to get there, and they still have access to their clinicians. In Episode 8 we talked about the hard problems associated with explainability in artificial neural networks. Angelo’s friend Nikos walked us through five classic problems, one of which is data privacy. Another big issue is that someone can develop a machine learning system to create adversarial attacks on the existing system.

In Episode 7, Angelo’s friend Manos shared how complicated it is for people to invoke their right to have their data removed from a system. Typically, those systems mark the data with tombstones and schedule the actual removal through a process called compaction.

What is on the horizon and what should we be paying attention to? We are going to run up against barriers of technology. For instance, Moore's law is coming to an end. What do we do about that? What is happening in the short term, and how do we get past this barrier to the next? And then how do we blow away all those barriers with moonshots like quantum computing?

Finally, wrapping up our first season, Angelo wants to reflect on gratitude. Gratitude for you, our listeners.
Thank you so much for joining us on this journey. We really want to hear about your thoughts. The show is evolving just as the world is and we want to make sure that we're covering topics that you're interested in.

We would love for you to follow, rate, and review the show on your favorite podcast platform so that others can find us too. Thank you so much for listening.

Our Guests - Thank you!:
Nikos Myrtakis on LinkedIn
Manos Athanassoulis on LinkedIn and Boston University
Jim Shalaby on Twitter and LinkedIn
Andy Lee on Twitter and LinkedIn

Host: Angelo Kastroulis
Executive Producer: Kerri Patterson
Producer: Albert Perrotta
Communications Strategist: Albert Perrotta
Audio Engineer: Ryan Thompson
Music: All Things Grow by Oliver Worth
Machine Learning: Your Right to Explainability
11-01-2022
How do we make the next generation of machine learning models explainable? How do you start finding new kinds of models that might be explainable? Where do you even start thinking about that process from a research perspective?

Nikos begins with a discussion on how we make decisions in general. In the scientific world, we mostly reason through statistical or cause-and-effect type scenarios. We can predict outcomes and train our models to produce the results we traditionally expect.

He then discusses early pioneers in this work. For example, back in the 70s, a rules engine was developed to help clinicians make diagnoses. It turns out that humans are very complex and hard to codify. Dr. Charles Forgy wrote his thesis on the Rete algorithm, which is what modern-day rules-based engines stem from.

After the AI winter period, neural networks were introduced that would encode the rules themselves. This became an issue for explainability, because it was no longer clear why a rule was created. A neural network builds a mathematical model of weighted data that is evaluated against the outcome. The challenge in explaining the results we see has been the inability to open up the network and determine why some data was weighted higher than other data. There is also a concern arising from the European Union General Data Protection Regulation (GDPR), under which a person has the right to obtain meaningful information about the logic involved, commonly interpreted as the right to an explanation.

We want to look at explainability from two points of view: a local one and a global one. The global objective is to extract a general summary that is representative of some specific data set, so we explain the whole model and not just local decisions. The local objective is to explain a single prediction for an individual observation in the data.
You have a decision made by a neural network, a classifier, or a regression algorithm, and the objective is to explain just that single observation.

There are five problems that present themselves in explainability: Instability, Transparency, Adversarial Attacks, Privacy, and Analyst Perspective.

For Instability, consider heat maps: they are very sensitive to hyperparameters, meaning the way the network was tuned. How we adjusted the sensitivity then affects the interpretation.

Transparency becomes more difficult the more accurate machine learning is. We call it a transparency problem because machine learning models such as neural networks are black boxes with very high dimensionality. What's interesting is that explainability tends to be inversely related to prediction accuracy.

For Adversarial Attacks, imagine that interpretability might enable people or programs to manipulate the system. If someone knows that, for instance, having three credit cards increases the chance of getting a loan, they can game the system, increasing their chance of getting the loan without really increasing the probability of repaying it.

Privacy can limit your access to the original data, especially in complex systems where boundaries exist between companies. You might not have the ability to access the original data.

Lastly, the Analyst Perspective. When a human gets involved to explain the system, important questions include where to start and how to ensure that the interpretation aligns with how the model actually behaved. In some systems the ML model serves multiple uses, and the human is trying to understand which use a given result applies to.

These are some specific sources of complexity and challenge that we have found in explaining machine learning models.

We continue to learn and adjust based on those learnings.
This is a very interesting and important topic that we will continue to explore.

Citations
Dr. Charles Forgy (1979), On The Efficient Implementation of Production Systems, Carnegie Mellon University, ProQuest Dissertations Publishing, 1979, 7919143
Nadia Burkart, Marco F. Huber (2020) A Survey on the Explainability of Supervised Machine Learning, arXiv:2011.07876 (cs)

Further Reading
https://openaccess.thecvf.com/content_CVPR_2019/papers/Pope_Explainability_Methods_for_Graph_Convolutional_Neural_Networks_CVPR_2019_paper.pdf
https://towardsdatascience.com/explainable-deep-neural-networks-2f40b89d4d6f

Nikos' Papers:
https://www.mdpi.com/2079-9292/8/8/832/htm
https://link.springer.com/article/10.1007/s11423-020-09858-2
https://arxiv.org/pdf/2011.07876.pdf
https://arxiv.org/pdf/2110.09467.pdf

Host: Angelo Kastroulis
Executive Producer: Kerri Patterson
Producer: Leslie Jennings Rowley
Communications Strategist: Albert Perrotta
Audio Engineer: Ryan Thompson
Music: All Things Grow by Oliver Worth
The Boundaries of Personal Data
28-12-2021
Angelo and Manos' connection began in CS265, the Big Data Systems course at Harvard University. This course inspired Angelo's thesis. The two discuss Manos' papers and how the future of Big Data is on the boundaries of Moore's Law. If you think about LSM trees (Log-Structured Merge Trees) and compacting data, what counts as acceptable deletion when users ask for their data to be removed? Is it good enough once the data is removed from the identifying user? In the analysis of Big Data Systems, considerations always lean towards performance. An extensive delete sequence will cause a significant disruption in the system. Most people would let current execution cycles complete, perhaps waiting for non-peak hours, and flag the data that is no longer valid. But what if your data starts to become dirty? How do you solve issues like privacy and requests under the "Right to be forgotten" or the "Right to erasure"? Manos speaks about the papers he has written, which you can read in the links below. He addresses the delete question and its boundaries with privacy in mind. Performance is a crucial factor, and looking at the issue holistically is just as important as encryption when protecting privacy.
Manos' Research Papers
https://dl.acm.org/doi/10.1145/3318464.3389757
https://disc-projects.bu.edu/lethe/
https://blogs.bu.edu/mathan/2020/06/29/lets-talk-about-deletes/
Further Reading
CS265: Big Data Systems - Spring 2020
Manos Athanassoulis homepage
California Consumer Privacy Act - BCLP California Consumer Protection Act Information
General Data Protection Regulation (GDPR) – Final text neatly arranged
Fast 21 Chen Hao
Host: Angelo Kastroulis
Executive Producer: Náture Kastroulis
Producer: Albert Perrotta
Communications Strategist: Albert Perrotta
Video/Audio Engineer: Ryan Thompson
Music: All Things Grow by Oliver Worth
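The delete question raised in the episode above is easier to see with a small sketch. The Python below is a toy, in-memory stand-in for an LSM tree, written only for illustration: the class `ToyLSM`, its method names, and the sample key are hypothetical, and it is not the design from Manos' papers. It shows the common tombstone approach, where a delete only writes a marker and the old value remains physically stored until compaction merges the runs.

```python
TOMBSTONE = object()  # marker meaning "this key was deleted"


class ToyLSM:
    """A toy LSM-style store: one mutable memtable plus immutable runs."""

    def __init__(self):
        self.memtable = {}
        self.runs = []  # flushed runs, newest first

    def put(self, key, value):
        self.memtable[key] = value

    def delete(self, key):
        # Logical delete: record a tombstone. Older copies of the value
        # in already-flushed runs are not touched.
        self.memtable[key] = TOMBSTONE

    def flush(self):
        """Freeze the memtable into a new sorted, immutable run."""
        if self.memtable:
            self.runs.insert(0, dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Search newest data first; a tombstone hides any older value.
        for table in [self.memtable, *self.runs]:
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        """Merge every run into one, keeping only the newest live values.

        Dropping tombstones is safe here only because all runs are merged,
        so no older copy can survive underneath them.
        """
        merged = {}
        for run in reversed(self.runs):  # oldest first, newer overwrite
            merged.update(run)
        live = {k: v for k, v in sorted(merged.items()) if v is not TOMBSTONE}
        self.runs = [live] if live else []

    def physically_stored(self, key):
        """True if any flushed run still holds a real value for the key."""
        return any(run.get(key, TOMBSTONE) is not TOMBSTONE for run in self.runs)


db = ToyLSM()
db.put("user:42", "alice@example.com")
db.flush()
db.delete("user:42")
db.flush()

print(db.get("user:42"))                # the user can no longer read the value
print(db.physically_stored("user:42"))  # but an old run still contains it
db.compact()
print(db.physically_stored("user:42"))  # only now is the value really gone
```

The gap between the logical delete and the compaction is where the privacy question lives: the user sees the data as gone, yet it persists until the system decides, usually for performance reasons, to compact.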
How did COVID-19 change the way doctors make decisions?
14-12-2021
Angelo begins this episode with a few questions about the changes caused by COVID-19, specifically around gathering patient data such as blood pressure. With telemedicine, how reliable is the data, who is legally responsible for the accuracy of the data gathered, and how exactly do clinical decision support (CDS) tools adjust to this change in the traditional clinician workflow? Angelo then explores IoT devices and the data they bring into medical decisions. Again, how accurate is the data from IoT devices, such as Fitbit scales, for a clinician to diagnose and treat from? Jim brings up some of the challenges that came with telemedicine, such as workflow within a clinic. If the clinician seeing a patient wants the dietitian to speak with the patient, that is harder to coordinate remotely than when everyone is within a few feet of each other. Another challenge relates to the security policies and privacy considerations patients need to agree to. To get into a virtual visit with a clinician, a patient has to follow a security protocol, which creates a barrier for some elderly and disabled patients. Lastly, there is the challenge of all the data a patient could be collecting on their IoT devices: how do you move that data into the EHR, or into some format a CDS tool could ingest? With the use of CDS, machine learning, and AI, the future is ripe for opportunity.
Further Reading
What is CDS - Health Gov IT
ResearchGate Publication on IoT in Health Care
Privacy-Preserving Single Decision Tree
Jim Shalaby on Twitter and LinkedIn
Host: Angelo Kastroulis
Executive Producer: Náture Kastroulis
Producer: Albert Perrotta
Communications Strategist: Albert Perrotta
Video/Audio Engineer: Ryan Thompson
Music: All Things Grow by Oliver Worth