Optimal causal learning using electronic health records

01 January 2020 → 31 August 2021
Regional and community funding: Special Research Fund
Research disciplines
  • Natural sciences
    • Statistics
Causal inference, semiparametric estimation, high-dimensional inference, statistical learning
Project description

Routinely collected health data are increasingly used to
infer the causal effect of a treatment of interest on an outcome. The
eventual goal is to use this knowledge to steer interventions.
However, because treated and untreated patients often have very
different characteristics, any statistical analysis must account for the
confounders that distort the treatment-outcome association. It is
tempting to do this using existing data-adaptive methods (e.g.
machine learning), but these are prone to regularization bias,
drastically complicate inference, and are not designed for effect
estimation, which makes them sub-optimal for that purpose. Furthermore, whilst
inferring population-level causal effects is useful, additional patient-specific
information is generally needed to inform decision making.
We will first investigate how to target machine learning methods
towards optimal estimation of, and inference for, the treatment
effect. Specifically, we will develop fitting strategies that yield efficient
treatment effect estimators along with honest confidence intervals
under weaker assumptions than current proposals. Because it is
unlikely that all confounders are measured, we will then propose novel
treatment effect estimators for case-only designs that automatically
control for time-fixed confounders. Moving back to real-world decision
making, we will develop a theoretical framework for making causal
predictions for new patients under specific, planned interventions.