PyData Global 2022

What-if? Causal reasoning meets Bayesian Inference
12-03, 12:30–13:00 (UTC), Talk Track I

We learn about the world from data, drawing on a broad array of statistical and inferential tools. The problem is that many of our questions require causal reasoning, yet few data scientists have this in their skill set. This talk will give a high-level introduction to aspects of causal reasoning and how it is complemented by Bayesian inference, followed by a worked example of how to answer what-if questions.


Core objectives:

  • Make the case that causal reasoning is required to answer many important questions in research and business.
  • Flesh out how causal reasoning and Bayesian inference complement each other.
  • Convey how some what-if questions can be answered using Synthetic Control methods.
  • Illustrate how to use Synthetic Control methods in practice through a worked example, with Python code snippets (using PyMC) and empirical results.
  • Introduce the new Python package CausalPy.
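To make the Synthetic Control idea concrete, here is a minimal sketch in plain NumPy. It does not use PyMC or CausalPy, and it rests on stated assumptions: the data are simulated, the helper names `project_to_simplex` and `fit_synthetic_control` are hypothetical, and the weights are fitted only on pre-intervention outcomes by constrained least squares (non-negative, summing to one) using projected gradient descent. Full Synthetic Control implementations may also match on covariates. The estimated causal effect is the gap between the observed treated series and its synthetic counterpart after the intervention.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto the simplex {w : w >= 0, sum(w) == 1}."""
    u = np.sort(v)[::-1]                      # sort descending
    cssv = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    cond = u - cssv / idx > 0                 # true for the first rho entries
    rho = idx[cond][-1]
    theta = cssv[cond][-1] / rho
    return np.maximum(v - theta, 0.0)


def fit_synthetic_control(y_pre, X_pre, n_iter=5000):
    """Simplex weights minimising ||y_pre - X_pre @ w||^2, via projected gradient descent."""
    n_controls = X_pre.shape[1]
    w = np.full(n_controls, 1.0 / n_controls)            # start from equal weights
    step = 1.0 / np.linalg.norm(X_pre, ord=2) ** 2       # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X_pre.T @ (X_pre @ w - y_pre)
        w = project_to_simplex(w - step * grad)
    return w


# Hypothetical data: 5 untreated "control countries" and one treated unit.
rng = np.random.default_rng(42)
n_pre, n_post, n_controls = 40, 20, 5
t = np.arange(n_pre + n_post)
controls = np.sin(np.outer(t, rng.uniform(0.05, 0.3, n_controls))
                  + rng.uniform(0, 2 * np.pi, n_controls))
controls += rng.normal(0, 0.3, controls.shape)
true_w = np.array([0.6, 0.3, 0.1, 0.0, 0.0])
treated = controls @ true_w + rng.normal(0, 0.1, len(t))
treated[n_pre:] -= 2.0                                   # hypothetical intervention effect

w_hat = fit_synthetic_control(treated[:n_pre], controls[:n_pre])
synthetic = controls @ w_hat                             # the "synthetic" treated unit
effect = treated[n_pre:] - synthetic[n_pre:]             # observed minus counterfactual
print("weights:", np.round(w_hat, 2))
print("mean post-intervention effect:", round(effect.mean(), 2))
```

Constraining the weights to the simplex keeps the synthetic unit an interpolation of real control units rather than an arbitrary extrapolation, which is the intuition behind building a "synthetic" treated unit from untreated ones.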

The talk will be a high-level overview, with very few (if any) equations. Instead, I will focus on conveying the intuition and the practical steps for answering what-if questions through concrete examples, and I will provide references for those wishing to deepen their understanding after the talk. This talk is aimed at a broad audience: anyone wanting to learn about the causal structure of the world, whether for fun or profit. Knowledge of causal inference is not assumed, but beginner to intermediate knowledge of data science would help. Some familiarity with Bayesian methods would also be beneficial, but is not required.

Talk structure:

  • I will provide an overview of ‘what-if?’ questions, such as “What would have happened to this patient if they had taken the drug rather than the placebo?” or “How much did an advertising campaign drive the change in user sign-ups?”
  • Establish why we cannot solve our problems with traditional statistical and data science methods in the absence of causal reasoning.
  • Describe how the Bayesian approach complements causal reasoning, namely by quantifying our uncertainty and by focusing on parameter estimation rather than hypothesis testing with p-values.
  • One main example will focus on how to approach the question “How did Brexit causally affect the United Kingdom’s GDP?”, even though Brexit was not a randomized experiment. I will explain intuitively how the Synthetic Control method works (by creating a synthetic United Kingdom as a weighted sum of other countries unaffected by Brexit) and how we can implement it, with PyMC code snippets.
  • I will summarize by: a) outlining the limits of Synthetic Control and when other approaches are called for, b) highlighting available Python and R packages (CausalImpact, tfcausalimpact, GeoLift, and a PyMC-based solution), and c) providing further reading and learning resources.
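The Bayesian side of the argument (quantifying uncertainty, and estimating how large an effect is rather than testing for its existence with a p-value) can be sketched without MCMC. The example below swaps in a conjugate Bayesian linear regression with a known noise standard deviation and a Gaussian prior on the control weights, so the posterior has a closed form. Unlike a typical Synthetic Control model, it does not constrain the weights to be non-negative or to sum to one. All data, parameter values, and the helper `posterior_counterfactual` are hypothetical. The output is a full posterior distribution over the average post-intervention effect, summarized by an interval and by the probability that the effect is negative.

```python
import numpy as np


def posterior_counterfactual(y_pre, X_pre, X_post, sigma=0.1, tau=1.0, n_draws=4000, seed=0):
    """Posterior-predictive draws of the untreated outcome after the intervention.

    Conjugate model (an assumption for this sketch):
        y_pre ~ Normal(X_pre @ beta, sigma^2), sigma known
        beta  ~ Normal(0, tau^2 I)
    """
    rng = np.random.default_rng(seed)
    k = X_pre.shape[1]
    precision = X_pre.T @ X_pre / sigma**2 + np.eye(k) / tau**2
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2                                   # guard against round-off asymmetry
    mean = cov @ X_pre.T @ y_pre / sigma**2
    betas = rng.multivariate_normal(mean, cov, size=n_draws)  # shape (n_draws, k)
    mu = betas @ X_post.T                                     # shape (n_draws, n_post)
    return mu + rng.normal(0.0, sigma, size=mu.shape)         # add observation noise


# Hypothetical data, simulated in the spirit of a Synthetic Control setting.
rng = np.random.default_rng(1)
n_pre, n_post, k = 40, 20, 5
t = np.arange(n_pre + n_post)
controls = np.sin(np.outer(t, rng.uniform(0.05, 0.3, k)) + rng.uniform(0, 2 * np.pi, k))
controls += rng.normal(0, 0.3, controls.shape)
treated = controls @ np.array([0.6, 0.3, 0.1, 0.0, 0.0]) + rng.normal(0, 0.1, len(t))
treated[n_pre:] -= 2.0                                        # hypothetical true effect

cf = posterior_counterfactual(treated[:n_pre], controls[:n_pre], controls[n_pre:])
avg_effect = (treated[n_pre:] - cf).mean(axis=1)              # one average effect per draw
lo, hi = np.percentile(avg_effect, [2.5, 97.5])
print(f"posterior mean effect: {avg_effect.mean():.2f}, 95% interval: [{lo:.2f}, {hi:.2f}]")
print(f"P(effect < 0) = {(avg_effect < 0).mean():.3f}")
```

A PyMC version would typically replace the closed-form posterior with MCMC sampling, which makes it straightforward to use priors such as a Dirichlet distribution over weights that must sum to one.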

References

  • Cunningham, S. (2021). Causal Inference: The Mixtape. Yale University Press.
  • Huntington-Klein, N. (2021). The Effect: An Introduction to Research Design and Causality. Chapman and Hall/CRC.
  • Facure, M. (2021). Causal Inference for The Brave and True. https://github.com/matheusfacure/python-causality-handbook

GitHub repository

A supporting GitHub repository, with notebooks, can be found at drbenvincent/pydata-global-2022.


Prior Knowledge Expected

No previous knowledge expected

I work in the Bayesian data analysis space. I spend much of my time solving real business problems as a Principal Data Scientist consulting for PyMC Labs. Before that, I held a permanent faculty position for 15 years, using Bayesian data analysis methods to research human decision-making.