Statistical Methods for Improving Causal Analyses
Lead Research Organisation:
University of Bristol
Department Name: UNLISTED
Abstract
We are trying to develop statistical methods to help medical researchers in their search for causes of disease. A lot of medical research involves gathering data from people and using statistical models to tell us which factors affect later health. For example, we might ask a group of people about their diet as children, as teenagers and as adults, and use this to try to tell us whether poor diet causes cancer in later life.
There are several problems with these studies that make it hard to draw conclusions. One is that people agreeing to be in a study are often different from people who don’t agree. Another is that people tend to drop out of a study over time – and again the people who drop out are often different from the people who stay. A third problem is that people change as they go through life – and we might want to know whether the cause of a disease happens at birth, or during childhood, or whether there are chances to prevent the disease even in adults.
Statistical models may not give the right answers if any of these problems occur – and this could mean that the wrong health advice is given, or the wrong treatments developed. We aim to develop methods that can overcome these real-life problems, and help medical researchers to be more confident in their conclusions about causes of disease.
Technical Summary
Aim: The aim of this programme is to develop methods for causal inference that are robust to missing data and can investigate change over time, in order to draw unbiased conclusions about realistic problems, using complex observational data.
Importance: Causal inference methods - in particular instrumental variable (IV) and Mendelian randomization (MR) methods - are now straightforward to implement, can be used with summary data, and are widely used by epidemiologists and medical researchers. However, the majority of real-world clinical research settings are more complex than the standard methods allow: data are missing; samples are selected; exposures and outcomes evolve jointly over time; and data from a wide variety of sources need to be integrated. Methods for causal inference, including IV, may be biased by missing data, including individuals missing due to sample selection. Standard IV methods are not able to address complex (and possibly time-varying) relationships between exposure, covariates and outcome. Multiple studies may provide information about the same causal effect, or about different paths in a network of causal effects, and we need to develop better methods to integrate evidence from different study types in order to draw causal inferences.
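The claim that IV estimates may be biased by sample selection can be illustrated with a small simulation. The sketch below is not the programme's method: it uses a standard Wald ratio estimator on simulated data, and every quantity in it (the effect size `true_effect`, the instrument strength, the selection rule `x + y < 1.0`, the sample size and the seed) is an assumption chosen purely for illustration. Under these assumptions, the naive regression should be pulled away from the assumed effect by the unmeasured confounder, the IV estimate in the full sample should sit close to it, and the IV estimate in the selected subsample should move away from it again.

```python
import numpy as np


def wald_ratio(z, x, y):
    """IV (Wald ratio) estimate of the effect of x on y using instrument z."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


rng = np.random.default_rng(2018)  # arbitrary seed, for reproducibility only
n = 200_000
true_effect = 0.5  # assumed causal effect of the exposure on the outcome

z = rng.binomial(2, 0.3, size=n).astype(float)  # instrument, e.g. an allele count
u = rng.normal(size=n)                          # unmeasured confounder
x = 0.5 * z + u + rng.normal(size=n)            # exposure
y = true_effect * x + u + rng.normal(size=n)    # outcome

# Naive regression slope of y on x: confounded by u.
ols_full = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# IV estimate using everyone: consistent for true_effect in this set-up.
iv_full = wald_ratio(z, x, y)

# Artificial, deliberately strong selection: only people with a low
# combined exposure and outcome value remain in the analysed sample.
selected = (x + y) < 1.0
iv_selected = wald_ratio(z[selected], x[selected], y[selected])

print(f"Assumed effect:       {true_effect:.2f}")
print(f"OLS, full sample:     {ols_full:.2f}")
print(f"IV, full sample:      {iv_full:.2f}")
print(f"IV, selected sample:  {iv_selected:.2f}")
```

Selection here depends on both the exposure and the outcome, which is the situation in which the instrument becomes associated with the confounder among those who remain. Real selection mechanisms are usually weaker and only partly observed.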
Objectives:
1. Develop methods to minimise bias due to missing data
2. Develop methods to model complex exposures and outcomes
3. Develop IV methods to examine causal influences of multiple exposures
4. Integrate evidence to improve causal models
Research plans: Part 1 of this programme will develop methods to use study information, and information external to the study, to infer the missing data structure, to inform all types of causal analyses. We will then focus on methods to maximise the robustness of IV methods to different types of missing data. We will pay particular attention to two cases: two-sample IV (using individual or summary data), and the investigation of disease prognosis. Part 2 will extend current methods for modelling trajectories and variability of exposures and outcomes. We will then focus on overcoming some of the current limitations of IV methods, by using structural equation modelling (SEM) and multivariable IV to examine impacts of time-varying exposures. Finally, we will maximise the use of all research data by extending methods to combine and use external information to inform causal models and sensitivity analyses.
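As a rough illustration of what multivariable IV with a time-varying exposure involves, the sketch below applies textbook two-stage least squares to simulated data with one exposure measured at two ages. It is a minimal example under stated assumptions, not the methods this programme will develop: the function `two_stage_least_squares`, the variable names, the two continuous instruments and all coefficient values are hypothetical.

```python
import numpy as np


def two_stage_least_squares(Z, X, y):
    """Textbook 2SLS: regress each column of X on Z, then y on the fitted X.

    Returns the second-stage coefficients for the columns of X
    (the intercept is dropped).
    """
    n = len(y)
    Z1 = np.column_stack([np.ones(n), Z])
    first_stage, *_ = np.linalg.lstsq(Z1, X, rcond=None)
    X_hat = Z1 @ first_stage
    X1 = np.column_stack([np.ones(n), X_hat])
    second_stage, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return second_stage[1:]


rng = np.random.default_rng(11)  # arbitrary seed
n = 200_000
effect_early, effect_late = 0.3, 0.2  # assumed direct effects on the outcome

z = rng.normal(size=(n, 2))  # two hypothetical instruments
u = rng.normal(size=n)       # unmeasured confounder of exposures and outcome

# The exposure at the later age depends on the exposure at the earlier age.
x_early = 0.6 * z[:, 0] + 0.1 * z[:, 1] + u + rng.normal(size=n)
x_late = 0.5 * x_early + 0.6 * z[:, 1] + u + rng.normal(size=n)
y = effect_early * x_early + effect_late * x_late + u + rng.normal(size=n)

X = np.column_stack([x_early, x_late])

# Multivariable IV estimates of the two direct effects.
mv_iv = two_stage_least_squares(z, X, y)

# Naive multivariable regression for comparison: confounded by u.
naive_ols, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)
naive_ols = naive_ols[1:]

print("Assumed direct effects:", (effect_early, effect_late))
print("Multivariable IV:      ", np.round(mv_iv, 2))
print("Naive regression:      ", np.round(naive_ols, 2))
```

The example works only because the two instruments predict the two exposure measurements in different proportions. If they predicted both equally, the separate effects could not be distinguished.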
People | ORCID iD |
---|---|
Kate Tilling (Principal Investigator) | |
Publications
Yarmolinsky J (2022). Genetically proxied therapeutic inhibition of antihypertensive drug targets and risk of common cancers: A mendelian randomization analysis. PLoS medicine.
Yang Q (2022). Exploring and mitigating potential bias when genetic instrumental variables are associated with multiple non-exposure traits in Mendelian randomization. European journal of epidemiology.
Wootton RE (2022). Decline in attention-deficit hyperactivity disorder traits over the life course in the general population: trajectories across five population birth cohorts spanning ages 3 to 45 years. International journal of epidemiology.
Winpenny EM (2021). Early adulthood socioeconomic trajectories contribute to inequalities in adult cardiovascular health, independently of childhood and adulthood socioeconomic position. Journal of epidemiology and community health.
Watkins SH (2023). Epigenetic clocks and research implications of the lack of data on whom they have been developed: a review of reported and missing sociodemographic characteristics. Environmental epigenetics.
Wang G (2022). Investigating a Potential Causal Relationship Between Maternal Blood Pressure During Pregnancy and Future Offspring Cardiometabolic Health. Hypertension (Dallas, Tex. : 1979).
Vogelezang S (2020). Novel loci for childhood body mass index and shared heritability with adult cardiometabolic traits. PLoS genetics.
Verhoef E (2021). Discordant associations of educational attainment with ASD and ADHD implicate a polygenic form of pleiotropy. Nature Communications.
Related Projects
Project Reference | Relationship | Related To | Start | End | Award Value |
---|---|---|---|---|---|
MC_UU_00011/1 | | | 31/03/2018 | 30/03/2023 | £2,864,000 |
MC_UU_00011/2 | Transfer | MC_UU_00011/1 | 31/03/2018 | 30/03/2023 | £965,000 |
MC_UU_00011/3 | Transfer | MC_UU_00011/2 | 31/03/2018 | 30/03/2023 | £1,011,000 |
MC_UU_00011/4 | Transfer | MC_UU_00011/3 | 31/03/2018 | 30/03/2023 | £1,329,000 |
MC_UU_00011/5 | Transfer | MC_UU_00011/4 | 31/03/2018 | 30/03/2023 | £1,254,000 |
MC_UU_00011/6 | Transfer | MC_UU_00011/5 | 31/03/2018 | 30/03/2023 | £1,640,000 |
MC_UU_00011/7 | Transfer | MC_UU_00011/6 | 31/03/2018 | 30/03/2023 | £1,083,000 |
Description | Enhanced Statistical Rigour in Health Data Research |
Amount | £925,204 (GBP) |
Funding ID | 215408 |
Organisation | Wellcome Trust |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 08/2019 |
End | 09/2024 |
Description | Investigating non-response among young people in Understanding Society |
Amount | £43,000 (GBP) |
Organisation | Understanding Society |
Sector | Private |
Country | United Kingdom |
Start | 05/2023 |
End | 05/2024 |
Description | MR/V020641/1 Development of miDOC: an expert system and methodology for multiple imputation |
Amount | £317,762 (GBP) |
Funding ID | MR/V020641/1 |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 06/2021 |
End | 06/2023 |
Description | Selection Bias and Mental Health: Towards an Integrated Understanding of Risk Factors for Suicide and Poor Self-rated Mental Health |
Amount | £183,553 (GBP) |
Funding ID | MQF22\22 |
Organisation | MQ Mental Health Research |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 08/2023 |
End | 08/2026 |
Description | Understanding social transitions in emerging adulthood and pathways to later health outcomes |
Amount | £300,000 (GBP) |
Funding ID | 224114/Z/21/Z |
Organisation | Wellcome Trust |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 08/2022 |
End | 08/2026 |
Description | Exeter selection bias |
Organisation | University of Exeter |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Collaboration on work examining participation bias |
Collaborator Contribution | Collaboration on work examining participation bias |
Impact | Genetic predictors of participation in optional components of UK Biobank Jessica Tyrrell, Jie Zheng, Robin Beaumont, Kathryn Hinton, Tom G Richardson, Andrew R Wood, George Davey Smith, Timothy M Frayling, Kate Tilling bioRxiv 2020.02.10.941328; doi: https://doi.org/10.1101/2020.02.10.941328 |
Start Year | 2018 |
Description | Leicester IEB |
Organisation | University of Leicester |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Collaboration on methods to overcome index event bias |
Collaborator Contribution | Collaboration on methods to overcome index event bias |
Impact | Slope-Hunter: A robust method for index-event bias correction in genome-wide association studies of subsequent traits Osama Mahmoud, Frank Dudbridge, George Davey Smith, Marcus Munafo, Kate Tilling bioRxiv 2020.01.31.928077; doi: https://doi.org/10.1101/2020.01.31.928077 |
Start Year | 2019 |
Description | Swiss MI |
Organisation | ETH Zurich |
Country | Switzerland |
Sector | Academic/University |
PI Contribution | Collaborations on methods for missing data |
Collaborator Contribution | Collaborations on methods for missing data |
Impact | MSc thesis (submitted). |
Start Year | 2019 |
Description | EUROCIM 2020 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | I co-organised EUROCIM 2020 (European Causal Inference Meeting) when the in-person event was cancelled at short notice due to COVID-19. We hosted a 2-day virtual conference from the MRC IEU, with a workshop and speakers. We had more than 200 attendees and received positive feedback afterwards. |
Year(s) Of Engagement Activity | 2020 |
Description | JISCB2018 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Joint International Society for Clinical Biostatistics and Australian Statistical Conference (ISCB ASC), Melbourne, August 2018. Presenting "Selection bias in Instrumental Variable (IV) analyses" by Hughes RA, Davies NM, Davey Smith G, Tilling K. |
Year(s) Of Engagement Activity | 2018 |
Description | Mendelian randomization for African scientists |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Six researchers from the MRC IEU ran a five-day course on Mendelian randomization for African researchers in Kilifi. The aim was to teach participants how to implement Mendelian randomization (MR) and how to use the IEU-developed, open-source MR-Base software platform. The UK researchers and the African scientists also spent time discussing their own research interests, stimulating potential future collaborations. Participant feedback was extremely positive, and participants left with the skills and knowledge to apply MR in their own research. Some individuals are now planning research visits to the UK: one has since secured a visiting fellowship and another has made a funding application. |
Year(s) Of Engagement Activity | 2022 |
URL | https://ieureka.blogs.bristol.ac.uk/2023/01/27/genetic-epidemiology-african-scientists/ |
Description | RSS2018 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Royal Statistical Society 2018 International Conference, Cardiff, September 2018. Presenting "Selection bias in Instrumental Variable (IV) analyses" by Hughes RA, Davies NM, Davey Smith G, Tilling K. |
Year(s) Of Engagement Activity | 2018 |
Description | Variability Workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | 25 academics attended a workshop on Outcome Variability at MRC IEU, Bristol, with presentations from local and national researchers, and discussion about ways to take this research area forward. |
Year(s) Of Engagement Activity | 2019 |