- Natural sciences
  - Statistics not elsewhere classified
- Medical and health sciences
  - Biostatistics
  - Epidemiology
Whilst data-adaptive methods (e.g. statistical learning) have been successful in prediction problems, statisticians have been cautious about adopting them for inferring causal effects. This is because these techniques are prone to ‘plug-in’ bias and drastically complicate inference. Recent work has nevertheless shown how data-adaptive methods can be used in a first step when evaluating the effect of an intervention, whilst still yielding valid confidence intervals. These results hold only under strong conditions on the first-step methods, which are designed for prediction rather than causal effect estimation and are thus sub-optimal. Existing proposals also focus on simple causal inquiries (e.g. the effect of binary interventions) or rely on modelling assumptions to ‘summarise’ the causal effect, at the risk of bias from model misspecification.

We will first investigate how to target the data-adaptive methods so as to reduce the bias of the causal effect estimator; our fitting strategies will also yield valid confidence intervals under much weaker assumptions than current proposals. We will then develop new model-free causal estimands that are useful in a variety of contexts and facilitate nonparametric inference using data-adaptive techniques. Finally, we will use these insights to make optimal ‘counterfactual’ predictions at the individual level under specific, planned interventions.
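To illustrate the kind of first-step estimation the abstract refers to, the sketch below implements a cross-fitted, doubly robust (AIPW) estimator of an average treatment effect on simulated data. It is a minimal illustration, not the proposal's method: the simple logistic and least-squares nuisance fits stand in for arbitrary data-adaptive learners, and all variable names and the data-generating process are hypothetical. Cross-fitting plus the augmented estimating equation is what removes the ‘plug-in’ bias and restores valid confidence intervals.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_propensity(X, a, steps=25):
    # Newton-Raphson logistic regression with intercept — a stand-in for
    # any data-adaptive propensity-score learner.
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(steps):
        p = sigmoid(Z @ beta)
        grad = Z.T @ (a - p)
        H = (Z * (p * (1 - p))[:, None]).T @ Z + 1e-8 * np.eye(Z.shape[1])
        beta = beta + np.linalg.solve(H, grad)
    return lambda Xn: sigmoid(np.column_stack([np.ones(len(Xn)), Xn]) @ beta)

def fit_outcome(X, a, y):
    # OLS of Y on (A, X) — a stand-in for a flexible outcome regression.
    Z = np.column_stack([np.ones(len(X)), a, X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    def m(Xn, an):
        Zn = np.column_stack([np.ones(len(Xn)), np.full(len(Xn), an), Xn])
        return Zn @ coef
    return m

def aipw_ate(X, A, Y, n_folds=2):
    """Cross-fitted AIPW estimate of the ATE with a 95% confidence interval."""
    n = len(Y)
    idx = np.arange(n)
    psi = np.empty(n)
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)            # fit nuisances out-of-fold
        e = fit_propensity(X[train], A[train])
        m = fit_outcome(X[train], A[train], Y[train])
        eh = np.clip(e(X[fold]), 0.01, 0.99)       # trim extreme propensities
        m1, m0 = m(X[fold], 1), m(X[fold], 0)
        # Efficient influence function of the ATE, evaluated on held-out data.
        psi[fold] = (m1 - m0
                     + A[fold] / eh * (Y[fold] - m1)
                     - (1 - A[fold]) / (1 - eh) * (Y[fold] - m0))
    est = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(n)
    return est, (est - 1.96 * se, est + 1.96 * se)

# Hypothetical simulation: confounded treatment, true ATE = 1.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=n)
A = rng.binomial(1, sigmoid(0.5 * X))
Y = 1.0 * A + X + rng.normal(size=n)
est, ci = aipw_ate(X, A, Y)
```

Because the nuisance models are fitted on folds that exclude the evaluation observations, the estimator tolerates slower-converging, data-adaptive first-step fits whilst the influence-function average still admits a standard normal-theory interval.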