Developing Counterfactual Inference Methods for Clinical Trial Recruitment and Effective Integration of Weak Instrumental Variables.

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

This project falls within the following EPSRC research areas:

Healthcare Technologies
Statistics and Applied Probability
Digital Twins

A current research question in Computational Statistics and Machine Learning is whether algorithms can automatically infer the behaviours and patterns of systems whose underlying mechanisms are inherently causal. Humans often excel at identifying such relationships in nature (e.g. we know that the rising sun causes temperatures to rise rather than the other way round). Machines, however, generally struggle to learn these relationships directly, and one potential research avenue is to develop methods which can automatically identify potential causal relationships between variables in data. A number of methods (known as Causal Structure Learning algorithms) have been developed to learn these relationships automatically, and they perform well on datasets with few samples or features and on Gaussian data. However, they struggle to scale to larger, more complex datasets which are not linear-Gaussian. An avenue for research during my DPhil is to address this problem by developing scalable Structure Learning methods which can process large datasets without sacrificing the fidelity of the inferred relationships.
Even if the causal pathways describing a set of variables are known, accurately inferring the quantitative relationships in a model can be challenging. These problems are exacerbated by the presence of confounders, particularly if they are unobserved or even unknown. In a clinical setting, a confounder could be a variable such as sex or age which affects not only the treatment assigned by a doctor but also the effect of the treatment itself. The gold standard for accounting for these variables is to run Randomised Controlled Trials (RCTs), which intervene on treatment assignment and so control for these discrepancies across an artificially selected subpopulation. However, there are often cases where RCTs are unethical (e.g. with "treatments" that are known to be harmful) or too expensive. Additionally, there may be a wealth of freely available observational data (i.e. data not obtained from an RCT) which could be used in lieu of running a randomised trial.
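The effect of a confounder described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a method from the project: the variable names, the logistic treatment-assignment model, the linear outcome model, and every coefficient are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n=50_000, true_effect=1.0, seed=0):
    """Simulate observational data with a single observed confounder.

    All modelling choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    age = rng.normal(size=n)  # confounder, standardised
    # Older patients are more likely to be given the treatment.
    p_treat = 1.0 / (1.0 + np.exp(-2.0 * age))
    treated = rng.binomial(1, p_treat)
    # Age also affects the outcome directly, which confounds the comparison.
    outcome = true_effect * treated + 1.5 * age + rng.normal(size=n)
    return age, treated, outcome


def naive_estimate(treated, outcome):
    """Difference in mean outcome between treated and untreated groups."""
    return outcome[treated == 1].mean() - outcome[treated == 0].mean()


def adjusted_estimate(age, treated, outcome):
    """Least-squares coefficient on treatment, adjusting for the confounder."""
    design = np.column_stack([np.ones_like(age), treated, age])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef[1]


if __name__ == "__main__":
    age, treated, outcome = simulate()
    print("naive:   ", naive_estimate(treated, outcome))
    print("adjusted:", adjusted_estimate(age, treated, outcome))
```

In this setup the naive comparison overstates the treatment effect because treated patients are older on average, while adjusting for age recovers an estimate close to the simulated effect. The adjustment is only possible here because the confounder was recorded and the outcome model is correctly specified.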
Despite the size of observational datasets, there may be issues with their completeness and integrity. For example, there may be variables (i.e. potential confounders) which were simply not recorded. These variables might be known to us (in which case experts may be able to encode relationships between the present and absent features in our data), but they may also be unknown. This problem motivates another research direction, which looks at alternative approaches to capturing the behaviour of these confounders and reducing the bias in the model's inferences.
Another key roadblock affecting the adoption of causal methods is their ability to scale to large data/feature sets. The run time of current methods often scales extremely poorly to larger, more complex datasets. Another avenue for potential research would be to explore different types of approximations which would scale more efficiently without sacrificing much predictive power.
Whilst the above discusses the methodological angles for advancing causal methods, I also have a strong interest in applying these methods to biological and healthcare problems. In particular, how can Deep Generative models of clinical patients be combined with structured causal models to create causal Digital Twins of patients, rather than biased samples drawn from confounded generative models?

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/T517811/1                                   30/09/2020  29/09/2025
2747848            Studentship   EP/T517811/1  30/09/2022  09/10/2026  Daniel Manela
EP/W524311/1                                   30/09/2022  29/09/2028
2747848            Studentship   EP/W524311/1  30/09/2022  09/10/2026  Daniel Manela