Exploring biases in propensity score analyses of Electronic Health Record (EHR) data

Lead Research Organisation: London School of Hygiene & Tropical Medicine
Department Name: Epidemiology and Population Health

Abstract

Causal inference is often the main aim of medical investigations: as medical researchers, we want to answer questions such as "Is this treatment effective?" and "How dangerous is this exposure?". The gold standard for these sorts of investigations has long been the Randomised Controlled Trial (RCT). However, EHRs (such as the UK Clinical Practice Research Datalink) are becoming more widely used. These are often very large and complex datasets that offer many opportunities to investigate important questions that we currently struggle to address adequately. One such use of EHRs is to obtain powerful estimates of both the long-term and rare effects of medications, which is often not feasible in a trial. EHRs are also affordable and make efficient use of routinely collected data. However, it is often difficult to make causal estimates of effects from routinely collected data, and in this project I want to investigate whether these inferences can be made and, if so, in what circumstances. The use of EHRs has expanded considerably in recent years, in part because the European Union has changed legislation to make it mandatory for pharmaceutical companies to conduct studies of either safety or effectiveness as part of drug licensing conditions. Whilst it is not specified that they should use routinely collected data, in many cases there is no other option. With EHRs being a relatively new area, there is a big demand for statistical methods to be explored and developed. It is also of great importance to explore potential issues and biases that can arise through the use of such data, as these can affect the ability to make accurate inferences.
The overarching aim of this project is to identify reasons for contradictory results obtained from analyses of EHR data and of RCTs. It might be the case that the EHR analysis is biased, or we may have correct answers to two different questions. For instance, the treatment effect may differ across populations, so that we are estimating separate underlying effects. Initially, I plan to investigate other possible causes of bias such as unmeasured confounding, time-varying exposures, missing data and treatment switching. I will investigate these possible biases using several real-world case studies, first replicating the study results before proceeding to more complicated analyses that investigate the sensitivity of our results. By accounting for the different potential sources of bias individually, we aim to isolate and identify the key methodological issues. We will then restrict our focus to these key issues and use simulation studies to assess the statistical methods developed. One such case study is Douglas et al. (2012) [1], in which the self-controlled case series analysis and the cohort study reached different conclusions about the association between proton pump inhibitors and myocardial infarction.
This project aims to provide quantitative skills in line with the MRC cross-cutting priorities for skills provision. We will examine existing techniques being used to analyse EHR data and explore the extent to which these might be biased. Statistical methods will then be developed to tackle important sources of bias. Throughout the project, particular emphasis will be put on the application of these methods to EHRs.
To build upon the knowledge accumulated during my undergraduate degree in Mathematics and my Master's degree in Medical Statistics, I plan to attend courses on causal inference (LSHTM, November 2016) and pharmacoepidemiology (McGill University, May 2017 and LSHTM, September 2017). I will also be attending research groups and seminar series at LSHTM, St George's and the Farr Institute.

References
1. Douglas I, Evans S, Hingorani A, Grosso A, Timmis A, Hemingway H, Smeeth L. Clopidogrel and interaction with proton pump inhibitors: comparison between cohort and within person study designs. BMJ 2012.


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
MR/N013638/1       -             -             01/10/2016  30/09/2025  -
1784702            Studentship   MR/N013638/1  01/10/2016  31/03/2021  John Tazare
MR/R502273/1       -             -             01/10/2017  30/09/2021  -
1784702            Studentship   MR/R502273/1  01/10/2016  31/03/2021  John Tazare
 
Description: Diagnostics and reporting guidelines for high-dimensional propensity score analyses
Organisation: Harvard University
Country: United States
Sector: Academic/University
PI Contribution: We have set up a collaboration with the Division of Pharmacoepidemiology and Pharmacoeconomics at Harvard to develop guidelines for high-dimensional propensity score analyses. This has involved working with the group at Harvard that originally developed this method. We have proposed several visualisation tools and draft guidance based on our experience of high-dimensional propensity scores in UK electronic health records.
Collaborator Contribution: The Division of Pharmacoepidemiology and Pharmacoeconomics has contributed data as well as input on the suitability of our proposed tools, based on its extensive experience of conducting high-dimensional propensity score analyses.
Impact: The work is currently ongoing.
Start Year: 2019