Modelling multiple outcomes using tree-based methods in causal inference

Lead Research Organisation: University College London
Department Name: Statistical Science

Abstract

In medical statistics, causal inference is the study of the direct effect of a treatment on a health outcome of interest. In practice, each patient receives only one treatment, so we must draw counterfactual conclusions about what would have been observed had a different treatment been given. Furthermore, treatment is often allocated not at random but on the basis of subjects' medical histories, which makes it difficult to separate the effect of disease severity from the effect of the treatment. These are some of the fundamental challenges of causal inference.

Statistical learning methods for inferring individual treatment effects have recently seen significant advances, partly due to the availability of large datasets such as electronic health records. These causal inference models typically study the effect of a treatment on a single outcome. However, treatment allocation is often decided on the basis of more than one outcome, commonly the main outcome of interest weighed against adverse side-effects of treatment: for example, the risk of heart disease versus the risk of bleeding (a side-effect of the treatment). For this reason, many studies collect data on more than one outcome, seeking to explore the effects of treatment on multiple outcomes simultaneously.

The project would build on current work on tree-based methods in causal inference to explore their application in modelling multiple outcomes simultaneously. The aim is to extend Bayesian Causal Forests to build a composite joint model for the outcomes, from which individual-level estimates of the effect of treatment can be inferred. One of the motivating examples is the use of blood-thinners (anti-platelet therapy) to mitigate heart disease risk in patients who might also be at high risk of bleeding (a side-effect of the treatment).

It is expected that the work will lead to a mix of methodological research and applied medical results. It would compare the performance of the methods on several healthcare examples using R, the most widely used language for statistical computing.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/V520263/1                                   01/10/2020  31/10/2025
2576152            Studentship   EP/V520263/1  01/10/2021  26/09/2025  Ilina Yozova