Statistical Methods for Improving Causal Analyses
Lead Research Organisation:
University of Bristol
Department Name: UNLISTED
Abstract
We are trying to develop statistical methods to help medical researchers in their search for causes of disease. A lot of medical research involves gathering data from people and using statistical models to tell us which factors affect later health. For example, we might ask a group of people about their diet as children, as teenagers and as adults, and use this to try to tell us whether poor diet causes cancer in later life.
There are several problems with these studies that make it hard to draw conclusions. One is that people agreeing to be in a study are often different from people who don’t agree. Another is that people tend to drop out of a study over time – and again the people who drop out are often different from the people who stay. A third problem is that people change as they go through life – and we might want to know whether the cause of a disease happens at birth, or during childhood, or whether there are chances to prevent the disease even in adults.
Statistical models may not give the right answers if any of these problems occur – and this could mean that the wrong health advice is given, or the wrong treatments developed. We aim to develop methods that can overcome these real-life problems, and help medical researchers to be more confident in their conclusions about causes of disease.
Technical Summary
Aim: The aim of this programme is to develop methods for causal inference that are robust to missing data and can investigate change over time, in order to draw unbiased conclusions about realistic problems, using complex observational data.
Importance: Causal inference methods - in particular instrumental variable (IV) and Mendelian randomization (MR) methods - are now straightforward to implement, can be used with summary data, and are widely used by epidemiologists and medical researchers. However, the majority of real-world clinical research settings are more complex than the standard methods allow: data are missing; samples are selected; exposures and outcomes evolve jointly over time; and data from a wide variety of sources need to be integrated. Methods for causal inference, including IV, may be biased by missing data, including individuals missing due to sample selection. Standard IV methods are not able to address complex (and possibly time-varying) relationships between exposure, covariates and outcome. Multiple studies may provide information about the same causal effect, or about different paths in a network of causal effects, and we need to develop better methods to integrate evidence from different study types in order to draw causal inferences.
Objectives:
1. Develop methods to minimise bias due to missing data
2. Develop methods to model complex exposures and outcomes
3. Develop IV methods to examine causal influences of multiple exposures
4. Integrate evidence to improve causal models
Research plans: Part 1 of this programme will develop methods to use study information, and information external to the study, to infer the missing data structure, to inform all types of causal analyses. We will then focus on methods to maximise the robustness of IV methods to different types of missing data. We will pay particular attention to two cases: two-sample IV (using individual or summary data), and the investigation of disease prognosis. Part 2 will extend current methods for modelling trajectories and variability of exposures and outcomes. We will then focus on overcoming some of the current limitations of IV methods, by using structural equation modelling (SEM) and multivariable IV to examine impacts of time-varying exposures. Finally, we will maximise the use of all research data by extending methods to combine and use external information to inform causal models and sensitivity analyses.
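The concern raised above, that IV estimates can be biased when individuals are missing because of sample selection, can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the programme's own method: the data-generating model, the effect size `beta`, the selection rule (keeping only individuals with exposure plus outcome above zero) and the function names `cov`, `wald_ratio` and `run_demo` are all hypothetical choices made for illustration.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def wald_ratio(g, x, y):
    """IV (Wald ratio) estimate of the effect of X on Y: cov(G, Y) / cov(G, X)."""
    return cov(g, y) / cov(g, x)


def run_demo(n=100_000, beta=0.3, seed=1):
    """Simulate one cohort and compare estimates before and after selection.

    Assumed (hypothetical) model: G is a genetic instrument, U an unmeasured
    confounder, X the exposure, Y the outcome, and beta the true causal effect.
    """
    rng = random.Random(seed)
    full = []
    for _ in range(n):
        g = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count 0, 1 or 2
        u = rng.gauss(0, 1)                               # unmeasured confounder
        x = g + u + rng.gauss(0, 1)                       # exposure
        y = beta * x + u + rng.gauss(0, 1)                # outcome
        full.append((g, x, y))

    # Assumed selection rule: participation depends on both exposure and outcome.
    selected = [row for row in full if row[1] + row[2] > 0]

    def estimates(rows):
        g, x, y = zip(*rows)
        iv = wald_ratio(g, x, y)
        ols = cov(x, y) / cov(x, x)  # naive regression slope, confounded by U
        return iv, ols

    iv_full, ols_full = estimates(full)
    iv_selected, ols_selected = estimates(selected)
    return {
        "true_effect": beta,
        "n_full": len(full),
        "n_selected": len(selected),
        "iv_full": iv_full,
        "ols_full": ols_full,
        "iv_selected": iv_selected,
        "ols_selected": ols_selected,
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value}")
```

In this simulated setting the Wald ratio in the full sample should lie close to `beta`, whereas the naive regression slope is inflated by the confounder. Restricting to the selected sample moves the IV estimate away from `beta`, because selecting on exposure and outcome induces an association between the instrument and the confounder.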
People |
ORCID iD |
Kate Tilling (Principal Investigator) |
Publications
Richardson TG
(2020)
Use of genetic variation to separate the effects of early and later life adiposity on disease risk: mendelian randomisation study.
in BMJ (Clinical research ed.)
Slaney C
(2023)
Association between inflammation and cognition: Triangulation of evidence using a population-based cohort and Mendelian randomization analyses.
in Brain, behavior, and immunity
Holmes MV
(2019)
Can Mendelian Randomization Shift into Reverse Gear?
in Clinical chemistry
O'Keeffe LM
(2019)
Data on trajectories of measures of cardiovascular health in the Avon Longitudinal Study of Parents and Children (ALSPAC).
in Data in brief
Hazewinkel A
(2022)
Mendelian randomization analysis of the causal impact of body mass index and waist-hip ratio on rates of hospital admission
in Economics & Human Biology
Watkins S
(2023)
Epigenetic clocks and research implications of the lack of data on whom they have been developed: a review of reported and missing sociodemographic characteristics
in Environmental Epigenetics
North T
(2019)
Using Genetic Instruments to Estimate Interactions in Mendelian Randomization Studies
in Epidemiology
Hughes RA
(2019)
Selection Bias When Estimating Average Treatment Effects Using One-sample Instrumental Variable Analysis.
in Epidemiology (Cambridge, Mass.)
Groenwold RHH
(2021)
To Adjust or Not to Adjust? When a "Confounder" Is Only Measured After Exposure.
in Epidemiology (Cambridge, Mass.)
Carlsen EØ
(2020)
Stumped by the Hump: The Curious Rise and Fall of Norwegian Birthweights, 1991-2007.
in Epidemiology (Cambridge, Mass.)
Mills HL
(2021)
Detecting Heterogeneity of Intervention Effects Using Analysis and Meta-analysis of Differences in Variance Between Trial Arms.
in Epidemiology (Cambridge, Mass.)
Lawton M
(2024)
Two sample Mendelian Randomisation using an outcome from a multilevel model of disease progression
in European Journal of Epidemiology
Yang Q
(2022)
Exploring and mitigating potential bias when genetic instrumental variables are associated with multiple non-exposure traits in Mendelian randomization.
in European journal of epidemiology
Staley JR
(2022)
A robust mean and variance test with application to high-dimensional phenotypes.
in European journal of epidemiology
Bowyer RCE
(2023)
Characterising patterns of COVID-19 and long COVID symptoms: evidence from nine UK longitudinal studies.
in European journal of epidemiology
Carter AR
(2021)
Mendelian randomisation for mediation analysis: current methods and challenges for implementation.
in European journal of epidemiology
Chong AHW
(2021)
Genetic Analyses of Common Infections in the Avon Longitudinal Study of Parents and Children Cohort.
in Frontiers in immunology
Costantini I
(2021)
Locus of Control and Negative Cognitive Styles in Adolescence as Risk Factors for Depression Onset in Young Adulthood: Findings From a Prospective Birth Cohort Study.
in Frontiers in psychology
Albers P
(2023)
Natural experiments for the evaluation of place-based public health interventions: a methodology scoping review
in Frontiers in Public Health
Cai S
(2022)
Adjusting for collider bias in genetic association studies using instrumental variable methods.
in Genetic epidemiology
Hemani G
(2022)
Collider bias from selecting disease samples distorts causal inferences.
in Genetic epidemiology
Tudball MJ
(2021)
Mendelian randomisation with coarsened exposures.
in Genetic epidemiology
O'Keeffe LM
(2020)
Age at period cessation and trajectories of cardiovascular risk factors across mid and later life.
in Heart (British Cardiac Society)
Related Projects
Project Reference | Relationship | Related To | Start | End | Award Value |
---|---|---|---|---|---|
MC_UU_00011/1 | | | 01/04/2018 | 31/03/2023 | £2,864,000 |
MC_UU_00011/2 | Transfer | MC_UU_00011/1 | 01/04/2018 | 31/03/2023 | £965,000 |
MC_UU_00011/3 | Transfer | MC_UU_00011/2 | 01/04/2018 | 31/03/2023 | £1,011,000 |
MC_UU_00011/4 | Transfer | MC_UU_00011/3 | 01/04/2018 | 31/03/2023 | £1,329,000 |
MC_UU_00011/5 | Transfer | MC_UU_00011/4 | 01/04/2018 | 31/03/2023 | £1,254,000 |
MC_UU_00011/6 | Transfer | MC_UU_00011/5 | 01/04/2018 | 31/03/2023 | £1,640,000 |
MC_UU_00011/7 | Transfer | MC_UU_00011/6 | 01/04/2018 | 31/03/2023 | £1,083,000 |
Description | Enhanced Statistical Rigour in Health Data Research |
Amount | £925,204 (GBP) |
Funding ID | 215408 |
Organisation | Wellcome Trust |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 09/2019 |
End | 09/2024 |
Description | Investigating non-response among young people in Understanding Society |
Amount | £43,000 (GBP) |
Organisation | Understanding Society |
Sector | Private |
Country | United Kingdom |
Start | 06/2023 |
End | 05/2024 |
Description | MR/V020641/1 Development of miDOC: an expert system and methodology for multiple imputation |
Amount | £317,762 (GBP) |
Funding ID | MR/V020641/1 |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 07/2021 |
End | 06/2023 |
Description | Selection Bias and Mental Health: Towards an Integrated Understanding of Risk Factors for Suicide and Poor Self-rated Mental Health |
Amount | £183,553 (GBP) |
Funding ID | MQF22\22 |
Organisation | MQ Mental Health Research |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 09/2023 |
End | 08/2026 |
Description | Understanding social transitions in emerging adulthood and pathways to later health outcomes |
Amount | £300,000 (GBP) |
Funding ID | 224114/Z/21/Z |
Organisation | Wellcome Trust |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 09/2022 |
End | 08/2026 |
Description | Exeter selection bias |
Organisation | University of Exeter |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Collaboration on work examining participation bias |
Collaborator Contribution | Collaboration on work examining participation bias |
Impact | Genetic predictors of participation in optional components of UK Biobank. Jessica Tyrrell, Jie Zheng, Robin Beaumont, Kathryn Hinton, Tom G Richardson, Andrew R Wood, George Davey Smith, Timothy M Frayling, Kate Tilling. bioRxiv 2020.02.10.941328; doi: https://doi.org/10.1101/2020.02.10.941328 |
Start Year | 2018 |
Description | Leicester IEB |
Organisation | University of Leicester |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Collaboration on methods to overcome index event bias |
Collaborator Contribution | Collaboration on methods to overcome index event bias |
Impact | Slope-Hunter: A robust method for index-event bias correction in genome-wide association studies of subsequent traits. Osama Mahmoud, Frank Dudbridge, George Davey Smith, Marcus Munafo, Kate Tilling. bioRxiv 2020.01.31.928077; doi: https://doi.org/10.1101/2020.01.31.928077 |
Start Year | 2019 |
Description | Swiss MI |
Organisation | ETH Zurich |
Country | Switzerland |
Sector | Academic/University |
PI Contribution | Collaborations on methods for missing data |
Collaborator Contribution | Collaborations on methods for missing data |
Impact | MSc thesis (submitted). |
Start Year | 2019 |
Description | EUROCIM 2020 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | I co-organised EUROCIM 2020 (European Causal Inference Meeting) when the in-person event was cancelled at short notice due to COVID-19. We hosted a 2-day virtual conference from the MRC IEU, with a workshop and speakers. We had >200 attendees and received positive feedback afterwards. |
Year(s) Of Engagement Activity | 2020 |
Description | JISCB2018 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Joint International Society for Clinical Biostatistics and Australian Statistical Conference (ISCB ASC), Melbourne, August 2018. Presenting "Selection bias in Instrumental Variable (IV) analyses" by Hughes RA, Davies NM, Davey Smith G, Tilling K. |
Year(s) Of Engagement Activity | 2018 |
Description | Mendelian randomization for African scientists |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Six researchers from the MRC IEU organised a five-day course on Mendelian randomization for African researchers in Kilifi. The aim was to teach participants how to implement Mendelian randomization (MR) and how to use the IEU-developed, open-source MR-Base software platform. The UK researchers and the African scientists also spent time discussing their own research interests, stimulating potential future collaborations. Participant feedback was extremely positive, with participants leaving with the skills and knowledge to apply MR in their own research. Some individuals are now planning research visits to the UK; one has since secured a visiting fellowship and another has made a funding application. |
Year(s) Of Engagement Activity | 2022 |
URL | https://ieureka.blogs.bristol.ac.uk/2023/01/27/genetic-epidemiology-african-scientists/ |
Description | RSS2018 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Royal Statistical Society 2018 International Conference, Cardiff, September 2018. Presenting "Selection bias in Instrumental Variable (IV) analyses" by Hughes RA, Davies NM, Davey Smith G, Tilling K. |
Year(s) Of Engagement Activity | 2018 |
Description | Variability Workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | 25 academics attended a workshop on Outcome Variability at MRC IEU, Bristol, with presentations from local and national researchers, and discussion about ways to take this research area forward. |
Year(s) Of Engagement Activity | 2019 |