Exploring biases in propensity score analyses of Electronic Health Record (EHR) data
Lead Research Organisation:
London School of Hygiene & Tropical Medicine
Department Name: Epidemiology and Population Health
Abstract
Causal inference is often the main aim of medical investigations: as medical researchers, we want to be able to answer questions such as "Is this treatment effective?" and "How dangerous is this exposure?". The gold standard for these sorts of investigations has long been the Randomised Controlled Trial (RCT). However, EHRs (such as the UK Clinical Practice Research Datalink) are becoming more widely used. These are often very large and complex datasets that offer many opportunities to investigate important questions that we currently struggle to address adequately. One such use of EHRs is to obtain well-powered estimates of both the long-term and rare effects of medications, which is often not feasible in a trial. EHRs also have the advantage of being very affordable and are an efficient way to make use of routinely collected data. However, it is often difficult to make causal estimates of effects from routinely collected data, and in this project I want to investigate whether these inferences can be made and, if so, in what circumstances. The use of EHRs has expanded considerably in recent years, in part because the European Union has changed legislation to make it mandatory for pharmaceutical companies to conduct studies of either safety or effectiveness as part of drug licensing conditions. Whilst it is not specified that they should use routinely collected data, in many cases there is no other option. Because EHRs are a relatively new area, there is considerable demand for statistical methods to be explored and developed. It is also of great importance to explore the potential issues and biases that can arise through the use of such data, as these can affect the ability to make accurate inferences.
The overarching aim of this project is to identify reasons for contradictory results obtained from EHR data versus RCT analyses. It might be the case that the EHR analysis is biased, or we may have correct answers to two different questions. For instance, the treatment effect may differ across populations, in which case we are estimating separate underlying effects. Initially, I plan to investigate other possible causes of bias such as unmeasured confounding, time-varying exposures, missing data and treatment switching. I will investigate these possible biases using several real-world case studies, first replicating the study results before proceeding to more complicated analyses that investigate the sensitivity of our results. By accounting for the different potential sources of bias individually, we aim to isolate and identify the key methodological issues. We will then restrict our focus to these key issues and use simulation studies to assess the statistical methods developed. One such case study I will investigate is from Douglas et al. (2012) [1], where the self-controlled case series analysis and cohort study provided different conclusions about the association between proton pump inhibitors and myocardial infarction.
This project aims to provide quantitative skills in line with the MRC cross-cutting priorities for skills provision. We will examine existing techniques being used to analyse EHR data and explore the extent to which these might be biased. Statistical methods will then be developed to tackle important sources of bias. Throughout the project, particular emphasis will be put on the application of these methods to EHRs.
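As a minimal illustration of the kind of simulation study described above, the following Python sketch shows how a measured confounder can bias a naive treatment comparison and how inverse probability of treatment weighting, based on an estimated propensity score, can remove that bias. Everything in it is an assumption made for illustration and is not taken from the project: the function names, the single binary confounder, the treatment probabilities (0.2 and 0.8), the confounder effect (2.0) and the true treatment effect (1.0) are all hypothetical.

```python
import numpy as np


def simulate(n, true_effect=1.0, seed=0):
    """Simulate a cohort with one binary confounder (hypothetical set-up)."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n)          # confounder, e.g. a comorbidity
    p_treat = np.where(x == 1, 0.8, 0.2)      # confounder drives treatment
    t = rng.binomial(1, p_treat)              # treatment indicator
    y = true_effect * t + 2.0 * x + rng.normal(size=n)  # confounder drives outcome
    return x, t, y


def naive_estimate(t, y):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    return y[t == 1].mean() - y[t == 0].mean()


def iptw_estimate(x, t, y):
    """Weighted difference in means using inverse probability of treatment weights.

    The propensity score is estimated as the observed proportion treated
    within each level of the binary confounder.
    """
    ps = np.empty(len(x), dtype=float)
    for level in (0, 1):
        mask = x == level
        ps[mask] = t[mask].mean()
    w = np.where(t == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    treated_mean = np.average(y[t == 1], weights=w[t == 1])
    control_mean = np.average(y[t == 0], weights=w[t == 0])
    return treated_mean - control_mean


x, t, y = simulate(100_000)
naive = naive_estimate(t, y)
iptw = iptw_estimate(x, t, y)
print(f"naive estimate: {naive:.2f}")
print(f"IPTW estimate:  {iptw:.2f}")
```

Under these assumed parameters the naive comparison is expected to sit near 2.2 rather than the true effect of 1.0, while the weighted estimate should be close to 1.0. The sketch only covers a measured confounder; unmeasured confounding, one of the biases listed above, would not be corrected by this approach.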
To build upon the knowledge accumulated during my Mathematics undergraduate and Medical Statistics Master's degrees, I plan to attend courses on causal inference (LSHTM, November 2016) and pharmacoepidemiology (McGill University, May 2017 and LSHTM, September 2017). I will also be attending research groups and seminar series at LSHTM, St George's and the Farr Institute.
References
1. Douglas I, Evans S, Hingorani A, Grosso A, Timmis A, Hemingway H, Smeeth L. Clopidogrel and interaction with proton pump inhibitors: comparison between cohort and within person study designs. BMJ 2012.
People |
ORCID iD |
Elizabeth Williamson (Primary Supervisor) | |
John Tazare (Student) |
Publications
Tazare J
(2020)
Implementing high-dimensional propensity score principles to improve confounder adjustment in UK electronic health records.
in Pharmacoepidemiology and drug safety
Tazare J
(2022)
Transparency of high-dimensional propensity score analyses: Guidance for diagnostics and reporting.
in Pharmacoepidemiology and drug safety
OpenSAFELY Collaborative
(2022)
Comparison of methods for predicting COVID-19-related death in the general population using the OpenSAFELY platform.
in Diagnostic and prognostic research
Brown JP
(2021)
Proton pump inhibitors and risk of all-cause and cause-specific mortality: A cohort study.
in British journal of clinical pharmacology
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
MR/N013638/1 | | | 30/09/2016 | 29/09/2025 | |
1784702 | Studentship | MR/N013638/1 | 30/09/2016 | 30/03/2021 | John Tazare |
MR/R502273/1 | | | 30/09/2017 | 29/09/2021 | |
1784702 | Studentship | MR/R502273/1 | 30/09/2016 | 30/03/2021 | John Tazare |
Description | Diagnostics and reporting guidelines for high-dimensional propensity score analyses |
Organisation | Harvard University |
Country | United States |
Sector | Academic/University |
PI Contribution | We have set up a collaboration with the Division of Pharmacoepidemiology and Pharmacoeconomics at Harvard to develop guidelines for high-dimensional propensity score analyses. This has involved working with a group at Harvard who originally developed this method. We have proposed several visualisation tools and draft guidance based on our experience of high-dimensional propensity scores in UK electronic health records. |
Collaborator Contribution | The Division of Pharmacoepidemiology and Pharmacoeconomics has contributed data, as well as input on the suitability of our proposed tools, based on extensive experience of conducting high-dimensional propensity score analyses. |
Impact | The work is currently ongoing. |
Start Year | 2019 |