HOD: Handling missing data and time-varying confounding in causal inference for observational event history data
Lead Research Organisation:
University College London
Department Name: Institute of Child Health
Abstract
In medicine it is often important to obtain valid estimates of the effects (both beneficial and detrimental) of a new treatment. To do this, we typically compare outcomes in a group of patients who received the new treatment (treatment group) with those who did not receive the new treatment (control group). The randomised controlled trial (RCT) is the gold standard for obtaining these estimates of treatment effects because it fairly allocates patients to the two groups, which makes them likely to be comparable prior to the start of treatment, e.g. one group will not be older or younger, sicker or healthier and so on.
However, RCTs are very expensive and complicated to run, and are not necessarily appropriate for answering all questions about the effects of treatment. For example, a drug may cause cancer as a side-effect, but the cancers may only appear after several years of treatment. It is then unlikely that an RCT would be maintained for long enough to detect this effect. It would therefore be very useful to measure the effects of treatments by looking only at data about patients who received the treatments as part of their normal care (through "observational studies").
However, unlike in RCTs, investigators in observational studies have no control over the assignment of patients to different treatment regimens, so groups of patients given different drugs may differ in other ways as well. For example, patients with more severe disease may be more likely to be given drugs which are good at improving the disease but have unpleasant side-effects. If a difference in outcome is found between the groups, it is not clear whether it arises because the groups differ in ways beyond the drugs received, or whether it was really caused by the treatment (i.e. it was a "causal effect"). One widely used method to make groups more comparable when estimating the causal effect is to calculate propensity scores. For each patient, his/her propensity score is the predicted probability of receiving a particular treatment based on that patient's characteristics at the time the treatment decision is made. Groups of patients with the same propensity score but different treatments should, on average, be comparable for all of their measured characteristics, and any differences in outcome between the groups should therefore be attributable to treatment.
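The propensity score idea described above can be illustrated with a small simulation. The sketch below is not taken from the project: the cohort, the single binary characteristic (disease severity), the treatment probabilities and the true effect of +2 are all hypothetical values chosen for illustration, and the function names are invented here. It compares a naive difference in mean outcomes with a difference computed within groups of patients who share the same estimated propensity score.

```python
import random
from collections import defaultdict


def simulate(n=20000, seed=0):
    """Hypothetical cohort: sicker patients are treated more often and fare worse."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe = rng.random() < 0.4
        treated = rng.random() < (0.7 if severe else 0.2)
        # Assumed truth: treatment raises the outcome by 2, severity lowers it by 5.
        outcome = 10 + 2 * treated - 5 * severe + rng.gauss(0, 1)
        rows.append((severe, treated, outcome))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_difference(rows):
    """Treated minus untreated mean outcome, ignoring severity."""
    treated_y = [y for _, a, y in rows if a]
    control_y = [y for _, a, y in rows if not a]
    return mean(treated_y) - mean(control_y)


def propensity_scores(rows):
    """Estimate P(treated | severity) as the treated fraction at each severity level."""
    counts = defaultdict(lambda: [0, 0])  # severity -> [n_treated, n_total]
    for severe, treated, _ in rows:
        counts[severe][0] += treated
        counts[severe][1] += 1
    return {severe: n_t / n for severe, (n_t, n) in counts.items()}


def stratified_difference(rows):
    """Average the within-stratum differences, weighting each stratum by its size."""
    ps = propensity_scores(rows)
    strata = defaultdict(lambda: ([], []))  # score -> (treated outcomes, control outcomes)
    for severe, treated, y in rows:
        strata[ps[severe]][0 if treated else 1].append(y)
    total = len(rows)
    estimate = 0.0
    for treated_y, control_y in strata.values():
        weight = (len(treated_y) + len(control_y)) / total
        estimate += weight * (mean(treated_y) - mean(control_y))
    return estimate


if __name__ == "__main__":
    cohort = simulate()
    print("naive difference:     ", round(naive_difference(cohort), 2))
    print("stratified difference:", round(stratified_difference(cohort), 2))
```

In this simulated cohort the naive comparison makes the treatment look harmful, because treated patients are sicker on average, whereas the comparison within propensity score strata should land close to the assumed effect of +2. With many characteristics the score would normally come from a fitted model such as logistic regression rather than simple counting.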
The aim of this project is to extend standard methods for obtaining causal treatment effects so that they can be used when important information about patient characteristics is missing and when a patient's treatment changes over time. Both of these situations are common in observational studies, so it is important to have reliable and robust ways to deal with them. We propose a programme of methodological research to address these situations in observational studies, with a particular focus on the effect of treatments on the time to clinical events (e.g. how long a patient survives after surgery, or how soon after the start of a new treatment unpleasant side-effects start to appear). This project will provide a general framework and guidelines for practitioners who use observational data in medical research.
Technical Summary
Observational studies play an important role in the evaluation of treatment effects on long-term outcomes when randomised controlled trials are not feasible because of size, time, budget and ethical constraints. Because there is no randomisation in observational studies, it is crucial to adequately control potential confounding from various factors (time-invariant and time-varying) in order to obtain causal effects of treatments. There is a rich literature on how to control potential confounding in observational studies, for example using standard techniques such as propensity score (PS) methods. However, various important methodological issues have not been addressed adequately in the existing literature, including 1) partially missing confounder data in PS estimation; 2) sensitivity analysis for unmeasured confounding; 3) time-varying confounding; 4) multi-state treatment and outcome processes.
In the present application we propose a programme of methodological research to address the above issues for the analysis and interpretation of data from observational studies, with a particular focus on event history data. We will develop and validate diagnostic tools for measuring the balance between treatment groups in terms of both observed values of confounders and their missing data patterns. We will provide a detailed evaluation of different missing data methods and PS methods, using the balance diagnostic tools developed. We will develop general Monte Carlo sensitivity analysis methods for unmeasured confounding and non-ignorable missing data in measured confounders for common models for event history data analysis. We will develop robust time-varying PS methods for obtaining causal treatment effects when there are missing data in important time-varying confounders, and explore a multistate framework to handle time-varying confounding in more general treatment and outcome processes for observational event history data.
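As an illustration of what measuring balance on both observed confounder values and missing data patterns could look like, the sketch below computes a standardised mean difference (SMD) twice for one confounder: once on the observed values and once on an indicator of whether the value is missing. This is an assumption made for illustration, not the diagnostic tools developed in the project; the SMD is a commonly used balance measure, and the record layout, the function names and the `age` example are hypothetical.

```python
import math


def sample_variance(values):
    """Unbiased sample variance; defined as 0.0 for fewer than two values."""
    if len(values) < 2:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def standardized_mean_difference(treated, control):
    """SMD = (mean_t - mean_c) / sqrt((var_t + var_c) / 2)."""
    diff = sum(treated) / len(treated) - sum(control) / len(control)
    pooled_sd = math.sqrt((sample_variance(treated) + sample_variance(control)) / 2)
    if pooled_sd == 0:
        # No variation in either group: balanced if the means agree, else unbounded.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled_sd


def balance_for_confounder(records, name):
    """Return the SMD of the observed values and of the missingness indicator.

    `records` is a list of dicts holding a boolean 'treated' flag and the
    confounder under `name`, with None marking a missing value.
    """
    observed = {True: [], False: []}
    missing = {True: [], False: []}
    for rec in records:
        value = rec[name]
        missing[rec["treated"]].append(1.0 if value is None else 0.0)
        if value is not None:
            observed[rec["treated"]].append(float(value))
    return {
        "observed": standardized_mean_difference(observed[True], observed[False]),
        "missing": standardized_mean_difference(missing[True], missing[False]),
    }


if __name__ == "__main__":
    # Hypothetical records: age is missing for one treated patient.
    ages_treated = [50, 60, None, 70]
    ages_control = [40, 50, 60, 30]
    records = [{"treated": True, "age": a} for a in ages_treated]
    records += [{"treated": False, "age": a} for a in ages_control]
    print(balance_for_confounder(records, "age"))
```

An SMD near zero on both rows would suggest that the treatment groups are similar in the confounder's observed values and in how often it is missing; thresholds for acceptable imbalance are a matter of convention and are not specified in this summary.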
Planned Impact
The immediate beneficiaries of this project will be the academic community involved in using data from observational studies to obtain causal effects of treatments, interventions or exposures.
In addition, downstream beneficiaries will be clinicians and public health policy makers who wish to make healthcare decisions based on data from observational studies. Currently, there is no consensus in the causal inference community on how best to deal with missing data in propensity score estimation, complex time-varying confounding and unmeasured confounding in the analysis of observational event history data. The proposed project would provide a methodological framework for addressing these common problems and thus better inform healthcare decision making.
Organisations
- University College London (Lead Research Organisation)
- Cincinnati Children's Hospital Medical Center (Collaboration)
- University College London (Collaboration)
- University of Texas at Austin (Collaboration)
- KEELE UNIVERSITY (Collaboration)
- Wellcome Trust (Collaboration)
- Fudan University (Collaboration)
- UNIVERSITY OF CAMBRIDGE (Collaboration)
Publications
Hanly JG
(2016)
A Longitudinal Analysis of Outcomes of Lupus Nephritis in an International Inception Cohort Using a Multistate Model Approach.
in Arthritis & rheumatology (Hoboken, N.J.)
Ke Y
(2016)
Semi-varying coefficient multinomial logistic regression for disease progression risk prediction.
in Statistics in medicine
Li Q
(2018)
Accommodating informative dropout and death: a joint modelling approach for longitudinal and semi-competing risks data.
in Journal of the Royal Statistical Society. Series C, Applied statistics
Lin H
(2017)
Doubly robust estimation of generalized partial linear models for longitudinal data with dropouts.
in Biometrics
O'Keeffe AG
(2018)
Correlated multistate models for multiple processes: an application to renal disease progression in systemic lupus erythematosus.
in Journal of the Royal Statistical Society. Series C, Applied statistics
Description | Influenced training of practitioners or researchers - Book chapter 'Missing Confounder Data in Propensity Score Methods for Causal Inference' |
Geographic Reach | Multiple continents/international |
Policy Influence Type | Influenced training of practitioners or researchers |
Impact | book chapter in Statistical Causal Inferences and Their Applications in Public Health Research by Springer |
Description | Introductory statistics short courses at MRC Biostatistics Unit |
Geographic Reach | National |
Policy Influence Type | Influenced training of practitioners or researchers |
Impact | Organise the new introductory statistics short courses at the MRC Biostatistics Unit |
Description | Child Health Research PhD Programme - Statistical methods for missing data, linkage error, complex confounding in the causal pathway analysis using linked administrative data |
Amount | £56,589 (GBP) |
Organisation | Child Health Research Appeal Trust (CHRAT) |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 09/2016 |
End | 09/2019 |
Description | ESRC DTC PhD program - Time-varying confounders and unmeasured confounders in longitudinal causal analysis of administrative data |
Amount | £57,000 (GBP) |
Organisation | Economic and Social Research Council |
Sector | Public |
Country | United Kingdom |
Start | 08/2016 |
End | 09/2019 |
Description | MASTERPLANS |
Amount | £164,800 (GBP) |
Funding ID | MR/M01665X/1 |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2015 |
End | 03/2019 |
Description | Cincinnati Children's Hospital |
Organisation | Cincinnati Children's Hospital Medical Center |
Country | United States |
Sector | Hospitals |
PI Contribution | Develop statistical methods for pattern-mixture models and sensitivity analysis for non-response in two-phase sampling |
Collaborator Contribution | Expertise in missing data research |
Impact | No output yet. |
Start Year | 2016 |
Description | Fudan University |
Organisation | Fudan University |
Country | China |
Sector | Academic/University |
PI Contribution | Develop methods for causal inference using electronic health records data |
Collaborator Contribution | Develop methods for causal inference using electronic health records data |
Impact | No outcome yet |
Start Year | 2017 |
Description | Keele University |
Organisation | Keele University |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Develop methods for quantile estimation using complex survey data |
Collaborator Contribution | Expertise in nutritional research and growth chart estimation |
Impact | No outcome yet |
Start Year | 2015 |
Description | MASTERPLANS consortium in lupus |
Organisation | Wellcome Trust |
Department | Wellcome Trust Centre for Cell-Matrix Research |
Country | United Kingdom |
Sector | Charity/Non Profit |
PI Contribution | Co-investigator |
Collaborator Contribution | Principal Investigator of the funded consortium |
Impact | No outcome yet |
Start Year | 2015 |
Description | The University of Texas at Austin |
Organisation | University of Texas at Austin |
Department | Department of Statistics & Data Sciences |
Country | United States |
Sector | Academic/University |
PI Contribution | Develop methods for sensitivity analysis for analysing longitudinal data missing not at random |
Collaborator Contribution | Expertise in methods for handling missing data |
Impact | One publication in Statistics in Medicine in 2015 |
Start Year | 2015 |
Description | University College London |
Organisation | University College London |
Department | Institute of Neurology |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Develop multistate models for correlated processes |
Collaborator Contribution | Develop multistate models for correlated processes |
Impact | One manuscript submitted |
Start Year | 2016 |
Description | University of Cambridge, Department of Pathology |
Organisation | University of Cambridge |
Department | Department of Public Health and Primary Care |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Analyse the CPRD data for disease progression of iron deficiency following oral iron treatment in primary care of UK |
Collaborator Contribution | Expertise in nutritional research |
Impact | No outcome yet |
Start Year | 2016 |
Description | University of Cambridge, Department of Primary Care and Public Health |
Organisation | University of Cambridge |
Department | Department of Pathology |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Develop joint models for non-ignorable missing data |
Collaborator Contribution | Expertise in joint models of longitudinal and time-to-event data |
Impact | Ongoing research |
Start Year | 2017 |
Description | A talk in the 14th Armitage workshop in MRC Biostatistics Unit |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Dr Li Su gave a talk on 'Two-part model for longitudinal data' in the 14th Armitage workshop at MRC Biostatistics Unit on 17th November 2016. |
Year(s) Of Engagement Activity | 2016 |
Description | Armitage Lectures |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Annual workshop and lecture created and hosted by the MRC Biostatistics Unit, to honour the immense contributions of Professor Peter Armitage who was at the unit from 1947 to 1961, and whose work is recognised throughout the world as achieving a successful balance between methodological rigour and applied commonsense, to which all statisticians aspire. An eminent medical statistician visits for a week and works with members of the unit. The highlight is the Armitage Lecture, where more than 100 delegates attend. This event raises the unit research profile and creates new collaborations. |
Year(s) Of Engagement Activity | Pre-2006,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016 |
URL | https://www.mrc-bsu.cam.ac.uk/news-and-events/armitage-lectureships-and-workshops/ |
Description | Article in Brown University website |
Form Of Engagement Activity | A magazine, newsletter or online publication |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Undergraduate students |
Results and Impact | Li Su, Senior Investigator Statistician, contributed to alumni spotlight article in Brown University website https://www.brown.edu/academics/public-health/biostatistics/news/2016-05/alumni-spotlight |
Year(s) Of Engagement Activity | 2016 |
Description | CRiSM seminar at University of Warwick, Department of Statistics |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Dr Li Su gave a talk on 'Bayesian modeling of the covariance structure for irregular longitudinal data using the partial autocorrelation function' for the CRiSM seminar series at University of Warwick, Department of Statistics in January 2016. |
Year(s) Of Engagement Activity | 2016 |
Description | Present at Farr Institute of Health Informatics Research - London, UK, July 2016 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Dr Bo Fu was invited to give a seminar on "Causal analysis of administrative data and methodological challenges" at Farr Institute of Health Informatics Research - London, UK, July 2016 |
Year(s) Of Engagement Activity | 2016 |
Description | Present at MRC Biostatistics Unit at Cambridge |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Dr Bo Fu gave a seminar on "Causal analysis of administrative data and methodological challenges" at MRC Biostatistics Unit, Cambridge in Dec 2016 |
Year(s) Of Engagement Activity | 2016 |
Description | Present at School of Public Health, Fudan University, Shanghai, China, June 2016 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Dr Bo Fu gave a seminar at School of Public Health, Fudan University, Shanghai, China, June 2016 |
Year(s) Of Engagement Activity | 2016 |
Description | Present at the 8th International Conference of Computational and Methodological Statistics, London, UK, Dec 2015. |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Invited talk at the 8th International Conference of Computational and Methodological Statistics, London, UK, Dec 2015. |
Year(s) Of Engagement Activity | 2016 |
Description | Present at School of Public Health, Shanghai Jiaotong University, Shanghai, China, June 2016 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Dr Bo Fu gave a seminar at School of Public Health, Shanghai Jiaotong University, Shanghai, China, June 2016 |
Year(s) Of Engagement Activity | 2016 |