Innovative Methods to Improve Causal Inference Using Diverse Data Resources
Lead Research Organisation:
University of Oxford
Abstract
Research in the medical sciences has grown substantially, covering a wide range of topics and using a range of study designs. Alongside the proliferation of clinical trials, numerous biobanks have recently been established, offering access to detailed observational datasets on thousands of people. However, despite the availability of these extensive datasets, challenges remain in how to best use these new resources.
Often studies on similar topics will be done in different contexts and using differing study designs. Without sophisticated methods to integrate these sources of evidence, the increase in new studies can lead to more uncertainty rather than better inferences. Study designs have different strengths, weaknesses and sources of bias, and without good analysis methods this can present a confusing picture. Improved statistical methods would allow us to make full use of new data resources.
Our proposal has three main aims: to improve methods for combining observational and randomised data, to better understand the theoretical limits of this process and to use principles from Bayesian Experimental Design to design more informative experiments.
We will describe these aims in more detail below, and highlight areas where we hope to develop new methods. This project falls within the EPSRC strategic priority of transforming health and healthcare.
Interest has recently grown in methods that combine evidence from randomised controlled trials with observational data. Such methods aim to leverage the precise but often biased estimates from observational data and combine them with the potentially noisy but internally valid estimates from randomised trials. In the mini-project preceding this PhD, we proposed a new method that combines aspects of two previously proposed solutions to this problem. This method first adjusts for the impact of hidden confounders in the observational dataset, and then incorporates the adjusted data to a degree that depends on how well this adjustment has worked. We achieved some encouraging results in early simulations but aim to further improve this method and test it in a wider range of contexts, including semi-synthetic and real-world data. There is also potential to combine other methods in new ways to improve inference.
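To make the trade-off between a noisy but unbiased trial estimate and a precise but possibly biased observational estimate concrete, the following is a minimal sketch of a generic shrinkage combination. It is not the method proposed in the mini-project; the function name `combine_estimates`, its arguments and the plug-in bias estimate are all illustrative assumptions. It assumes the two estimates are independent and that the trial estimate is unbiased.

```python
def combine_estimates(rct_est, rct_var, obs_est, obs_var):
    """Weighted combination of a trial estimate and an observational estimate.

    Hypothetical illustration only: assumes independent estimates and an
    unbiased trial estimate. Returns (combined_estimate, weight_on_obs).
    """
    # Plug-in estimate of the squared bias of the observational estimate.
    # E[(obs - rct)^2] = bias^2 + rct_var + obs_var, so subtract the variances
    # and truncate at zero.
    bias_sq = max((obs_est - rct_est) ** 2 - rct_var - obs_var, 0.0)

    # Weight that minimises the mean squared error of
    # (1 - w) * rct_est + w * obs_est when the bias is known.
    weight_obs = rct_var / (rct_var + obs_var + bias_sq)

    combined = (1.0 - weight_obs) * rct_est + weight_obs * obs_est
    return combined, weight_obs


# When the two sources agree, the more precise observational estimate dominates.
agree_est, agree_weight = combine_estimates(1.0, 1.0, 1.0, 0.25)

# When they disagree strongly, the observational estimate is largely ignored.
clash_est, clash_weight = combine_estimates(1.0, 1.0, 11.0, 0.25)
```

The weight on the observational estimate shrinks as the apparent bias grows, which mirrors the idea of incorporating observational data only to the extent that it appears trustworthy.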
As well as developing further methods to improve inference, our second aim is to strengthen the theoretical basis of this field, where methods often rely on substantial assumptions. A starting point would be to investigate whether it is possible, without further assumptions, to guarantee an improvement in the estimation of the conditional average treatment effect (CATE) by adding an arbitrarily biased observational dataset to a randomised dataset. Alternatively, can it be proven that this is impossible? Either finding would be a novel contribution to the field.
Finally, as well as improving inference using existing data, we would like to explore whether it is possible to collect data more efficiently to maximise our expected information gain using Bayesian Experimental Design. There is a range of existing research in this area, although application of these methods has lagged somewhat behind the statistical literature. Initially, we would like to explore whether it is possible to design more efficient randomised controlled trials which can achieve similar precision in CATE estimates with fewer participants. Eventually we hope that we might be able to use these methods to improve experimental design for causal discovery.
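As an illustration of what expected information gain means for trial design, the sketch below evaluates it in closed form for a deliberately simple case: a two-arm trial with a linear-Gaussian outcome model, independent zero-mean Gaussian priors on the baseline and the treatment effect, and known noise variance. The model, the function name `eig_two_arm` and the default variances are assumptions made for illustration. The example targets an average effect rather than a CATE and is not taken from the project itself.

```python
import math


def eig_two_arm(n_treated, n_control, prior_var=1.0, noise_var=1.0):
    """Expected information gain (in nats) about (baseline, treatment effect).

    Assumed model (hypothetical): y = baseline + effect * t + noise, with
    independent N(0, prior_var) priors on both parameters and
    N(0, noise_var) noise. For this model
        EIG = 0.5 * log det(I + (prior_var / noise_var) * X^T X),
    where X has columns [1, t].
    """
    n = n_treated + n_control
    ratio = prior_var / noise_var

    # X^T X = [[n, n_treated], [n_treated, n_treated]], so the 2x2 matrix
    # I + ratio * X^T X has the entries below.
    a = 1.0 + ratio * n
    b = ratio * n_treated
    d = 1.0 + ratio * n_treated

    return 0.5 * math.log(a * d - b * b)


# Compare a balanced and an unbalanced allocation of 20 participants.
eig_balanced = eig_two_arm(10, 10)
eig_unbalanced = eig_two_arm(18, 2)
```

Under these assumptions the balanced allocation yields the larger expected information gain, so a design criterion of this kind can rank candidate allocations before any data are collected.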
The rapid growth in medical research presents both opportunities and challenges. To fully utilise diverse data sources, advanced statistical methods are needed. Our aims of improving data combination techniques, understanding theoretical limits and applying Bayesian Experimental Design will help researchers draw more robust conclusions and advance the field, supporting the EPSRC's strategic healthcare priorities.
Organisations
| People | ORCID iD |
|---|---|
| Alexander Gruen (Student) | |
Studentship Projects
| Project Reference | Relationship | Related To | Start | End | Student Name |
|---|---|---|---|---|---|
| EP/S02428X/1 | | | 31/03/2019 | 29/09/2027 | |
| 2876277 | Studentship | EP/S02428X/1 | 30/09/2023 | 29/09/2027 | Alexander Gruen |