PyData Global 2022

Start asking your data “Why?” - A Gentle Introduction To Causal Inference
12-01, 13:00–13:30 (UTC), Talk Track II

Correlation does not imply causation. It turns out, however, that with a few simple but ingenious tricks one can uncover causal relationships in standard observational data, without having to resort to expensive randomised controlled trials. Learn how to make the most of your data, avoid misinterpretation pitfalls and draw more meaningful conclusions by adding causal inference to your toolbox.


Are you interested in understanding Causal Inference but not sure where to start? In this talk I introduce the basic concepts in an accessible manner, using visualisations as well as Python scripts.

In particular, I illustrate how Graph Models help visualise the story behind the data, which makes it possible to go beyond correlations and make data-driven decisions based on causation.

You will also learn how to avoid data misinterpretation pitfalls such as Simpson’s Paradox, a situation in which a trend seen in the population as a whole is reversed within each of its cohorts. This will be demonstrated using pgmpy as well as an interactive Streamlit web app: bit.ly/simpson-calculator.
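To make the paradox concrete, here is a minimal sketch in plain Python. It is not the speaker's pgmpy code or the Streamlit app. The counts are illustrative, patterned on the widely cited kidney-stone example, and the helper names (`rate`, `cohort_share`, `adjusted_rate`) are hypothetical. The last step assumes that the cohort (stone size) is the only confounder, in which case weighting each cohort's rate by the cohort's share of the population gives the adjusted comparison.

```python
from fractions import Fraction

# (treatment, cohort) -> (recovered, total).
# Illustrative counts, patterned on the well-known kidney-stone example.
counts = {
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}


def rate(treatment, cohort=None):
    """Recovery rate for a treatment, within one cohort or pooled over all."""
    rows = [
        v for (t, c), v in counts.items()
        if t == treatment and (cohort is None or c == cohort)
    ]
    recovered = sum(r for r, _ in rows)
    total = sum(n for _, n in rows)
    return Fraction(recovered, total)


def cohort_share(cohort):
    """Fraction of all patients who belong to the given cohort."""
    n_cohort = sum(n for (_, c), (_, n) in counts.items() if c == cohort)
    n_all = sum(n for _, n in counts.values())
    return Fraction(n_cohort, n_all)


def adjusted_rate(treatment):
    """Cohort-weighted recovery rate (assumes cohort is the only confounder)."""
    return sum(rate(treatment, c) * cohort_share(c) for c in ("small", "large"))


for t in ("A", "B"):
    print(
        f"Treatment {t}: "
        f"small={float(rate(t, 'small')):.3f}, "
        f"large={float(rate(t, 'large')):.3f}, "
        f"pooled={float(rate(t)):.3f}, "
        f"adjusted={float(adjusted_rate(t)):.3f}"
    )
```

With these counts, treatment A has the higher recovery rate in each cohort, yet B looks better once the cohorts are pooled, because A was given mostly to the harder (large-stone) cases. The cohort-weighted rate removes that imbalance and favours A again, provided the single-confounder assumption holds.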

This talk is aimed at anyone, technical or managerial, who wants to improve how they make data-driven decisions. No prior knowledge of Python is required; basic statistics is desirable but not essential. My main message is that by adding causal thinking to your analytical toolbox you are likely to ask better questions of your data and ultimately get more insights from it.

For those inclined to learn about Causal Inference in more depth, I will close with advice on how to climb the "causal ladder", including suggestions for useful resources.


Prior Knowledge Expected

No previous knowledge expected

Ex-cosmologist turned data scientist with over 15 years' experience in solving challenging problems. I am motivated by intellectual challenges, highly detail-oriented and love visualising data results to communicate insights for better decisions within organisations.

My main drive as a data scientist is applying scientific approaches that result in practical and clear solutions. To accomplish this, I use whatever works, be it statistical/causal inference, machine/deep learning or optimisation algorithms. Being results-driven, I have a passion for quantifying the impact of interventions and communicating it to non-specialist audiences in an accessible manner.

My claim to fame is having lived on four different continents within a single decade (2004-2014), including in three tennis Grand Slam cities (NYC, Melbourne, London).
