HMD: Missing data in propensity score analyses of Electronic Health Records Data

Lead Research Organisation: London School of Hygiene and Tropical Medicine

Department Name: Epidemiology and Population Health

Abstract

Electronic storage and linkage of routinely collected health data have opened up substantial opportunities to address important questions, not least those relating to the possible harms and benefits of long-term medication use. Such information is important to patients and health care professionals alike. Indeed, the expectation that EHR and related data will be used to measure medication effects is now written into EU legislation. Thus we expect the use of electronic health records for research to increase dramatically.

Using data taken from electronic health records to investigate medication effects raises substantial challenges. In particular, patients who are prescribed a particular medication will tend to be very different from those who are not. Disentangling these patient differences from effects of the medication is a key aim of observational epidemiology. This process of disentangling, which is challenging even when information concerning patient characteristics (such as their cholesterol level or age) is available, is greatly complicated when some information is unavailable.

A propensity score analysis is a statistical approach that is very useful in accounting for differing characteristics between patients prescribed a medication and those who are not, in order to measure effects of the medication. By modelling the process of medication prescription, propensity score methods attempt to identify patients prescribed the medication and patients not prescribed it who are otherwise comparable, and measure effects of the medication by comparing health outcomes between these patients. Methods for accounting for missing patient information within a propensity score analysis, however, remain poorly understood.

Failure to handle missing information adequately in an investigation of medication effects could lead to incorrect conclusions regarding the benefits, or harms, of the medication. To avoid this, it is vital to develop appropriate ways of dealing with missing information within propensity score analyses. There is an established literature on handling missing information within other types of analyses, particularly those that model the outcome as a function of the patient characteristics. However, outcome modelling and propensity score analyses use the patient characteristics in ways that differ in practically important respects. Thus the way in which missing data should be handled cannot be learnt directly from experience in the outcome regression modelling context.

Our proposal aims to develop guidelines for researchers undertaking these analyses to help them select an appropriate method for handling their missing data, and to understand the assumptions under which their conclusions regarding the effects of the medication are valid. As part of this, we will take sophisticated statistical methods for handling missing data that have proved themselves outside the propensity score setting, such as multiple imputation, and develop and apply them in a way that is consistent with the goals and structure of propensity score analyses.

Because medication use of a particular patient will often change over time, as will many of the patient's characteristics, it is often desirable to take this into account in the statistical analysis. This can be done through the application of an extension of the propensity score approach, called marginal structural models. A final aspect of our proposal, therefore, seeks to understand how to extend our proposed missing data methods to this setting.

Through our broad-based dissemination strategy (described elsewhere) our work will be relevant to a wide range of quantitative researchers in medical and social science, in academic, pharmaceutical, regulatory and policy settings.

Technical Summary

Propensity score methods are often used to assess treatment effects in observational data, particularly where a large number of confounding variables need to be accounted for. However, the confounding variables often have a non-trivial proportion of missing values. Methods for handling missing data within propensity score analyses are relatively under-investigated; while missing data methodology is now well established for a range of standard substantive scientific models, this does not necessarily directly translate to the propensity score context, due to the differing ways in which the confounders are used.
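As an illustration of the general idea only, the following minimal sketch estimates a treatment effect by inverse probability weighting on an estimated propensity score, using fully observed simulated data. Everything in it is an assumption made for the example rather than part of the project: the variable names (`age`, `chol`), the simulated data-generating model with a true effect of 1.0, the choice of weighting rather than matching or stratification, and the hand-written logistic regression fit.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def ipw_effect(confounders, treated, outcome):
    """Difference in weighted mean outcomes, weighting by the inverse estimated propensity."""
    X = np.column_stack([np.ones(len(treated)), confounders])
    ps = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, treated)))  # estimated propensity score
    w = treated / ps + (1.0 - treated) / (1.0 - ps)
    mean_treated = np.sum(w * treated * outcome) / np.sum(w * treated)
    mean_untreated = np.sum(w * (1.0 - treated) * outcome) / np.sum(w * (1.0 - treated))
    return mean_treated - mean_untreated


# Hypothetical simulated cohort: two confounders affect both prescription and outcome.
rng = np.random.default_rng(0)
n = 20000
age = rng.normal(0.0, 1.0, n)
chol = rng.normal(0.0, 1.0, n)
p_treat = 1.0 / (1.0 + np.exp(-(0.8 * age + 0.5 * chol)))
treated = rng.binomial(1, p_treat).astype(float)
outcome = 1.0 * treated + 1.5 * age + 1.0 * chol + rng.normal(0.0, 1.0, n)  # true effect: 1.0

naive = outcome[treated == 1].mean() - outcome[treated == 0].mean()
ipw = ipw_effect(np.column_stack([age, chol]), treated, outcome)
print(f"naive difference: {naive:.2f}, weighted estimate: {ipw:.2f}")
```

In this simulation the unadjusted comparison is confounded, whereas the weighted comparison should lie close to the simulated effect. The sketch assumes complete confounder data, which is exactly the assumption the project relaxes.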

Fundamental to analyses of partially observed data is the accessible framing of the additional assumptions entailed, and statistical methods for valid inference under these assumptions. This project brings this approach to this setting through (i) evaluation of missing value indicator methods; (ii) development of multiple imputation strategies consistent with propensity score methodologies; (iii) using these to lift the restrictions on our doubly robust estimators (robust to misspecification of confounding or substantive models) via doubly robust multiple imputation; and (iv) applying the methodologies to the practically important area of time-varying confounding.
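For orientation on the time-varying setting, marginal structural models are commonly fitted using stabilised inverse probability of treatment weights. The notation below is introduced here for illustration and does not come from the proposal: $A_{it}$ is the treatment of patient $i$ at visit $t$, $L_{it}$ the time-varying confounders, and an overbar denotes history up to and including that visit.

```latex
sw_i \;=\; \prod_{t=0}^{T}
  \frac{P\!\left(A_{it} \mid \bar{A}_{i,t-1}\right)}
       {P\!\left(A_{it} \mid \bar{A}_{i,t-1},\, \bar{L}_{it}\right)}
```

The denominator requires the confounder history $\bar{L}_{it}$, so any missing confounder value at any visit affects the weight, which is one reason missing data handling needs separate consideration in this setting.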

Through the dissemination strategy outlined elsewhere (including development of software) our research will enable practitioners to:
(1) Frame appropriate assumptions regarding missing data in the context of their data and research questions;
(2) Understand the impact of these on the validity of complete records and missing indicator type analyses;
(3) Choose appropriate multiple imputation models consistent with the substantive propensity score model;
(4) Apply methods with a degree of robustness to misspecification of key components of the propensity score analysis;
(5) Understand, and clearly report, the assumptions, strengths and limitations of the analyses performed.
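The missing indicator approach referred to above can be sketched as follows: each missing confounder value is replaced by a fixed constant and an extra 0/1 column records that the value was missing, and the propensity score model is then fitted to this augmented design. This is a minimal sketch under stated assumptions, not the project's recommended implementation: the fill value of 0, the function name `missing_indicator_design`, the simulated data and the completely-at-random missingness are all hypothetical choices for the example, and whether the resulting analysis is valid depends on assumptions of the kind this project sets out to clarify.

```python
import numpy as np


def missing_indicator_design(confounders):
    """Build [intercept, confounders with NaN set to 0, one indicator per partially missing column]."""
    confounders = np.asarray(confounders, dtype=float)
    missing = np.isnan(confounders)
    filled = np.where(missing, 0.0, confounders)
    has_missing = missing.any(axis=0)  # only add indicators where something is missing
    intercept = np.ones((confounders.shape[0], 1))
    return np.hstack([intercept, filled, missing[:, has_missing].astype(float)])


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Hypothetical simulated data: cholesterol is unrecorded for about 30% of patients.
rng = np.random.default_rng(1)
n = 5000
age = rng.normal(0.0, 1.0, n)
chol = rng.normal(0.0, 1.0, n)
p_treat = 1.0 / (1.0 + np.exp(-(0.8 * age + 0.5 * chol)))
treated = rng.binomial(1, p_treat).astype(float)
chol_observed = np.where(rng.random(n) < 0.3, np.nan, chol)

# Columns: intercept, age, cholesterol (0 where missing), cholesterol-missing indicator.
X = missing_indicator_design(np.column_stack([age, chol_observed]))
ps = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, treated)))  # propensity score per patient
print(X.shape, float(ps.min()), float(ps.max()))
```

Every patient receives a propensity score, including those with unrecorded cholesterol, which is the practical appeal of the approach; the sketch says nothing about whether the resulting effect estimate is unbiased.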

Planned Impact

Our proposed research tackles outstanding missing data issues hindering the widespread use of propensity scores for robust inference from electronic health record data. Although motivated by assessing long-term medication effects, our findings will be applicable across the range of medical and social sciences, where routinely collected data is increasingly being used to understand effects of treatment and/or policy interventions.

It is therefore of interest to, and stands to benefit, a broad range of stakeholders in this area, including pharmaceutical companies, academic researchers in health and social science, policy makers and bodies which are responsible for pharmacovigilance, such as the Medicines and Healthcare Products Regulatory Agency (MHRA).

Through the analyses conducted by researchers working for these bodies, drawing on the insights, methods and software arising from this project, we expect the research to in turn benefit clinicians and their patients.

Funded Value:

£392,299

Funded Period:

Jan 15 - Aug 18

Funder:

MRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

MR/M013278/1

Principal Investigator:

Elizabeth Williamson

Health Category:

Unclassified

Organisations

People	ORCID iD
Elizabeth Williamson (Principal Investigator)
Ian Douglas (Co-Investigator)
Harry Hemingway (Co-Investigator)
James Carpenter (Co-Investigator)
Shaun Seaman (Co-Investigator)
Ian White (Co-Investigator)	http://orcid.org/0000-0002-6718-7661
Liam Smeeth (Co-Investigator)

Publications

Ali MS (2019) Propensity Score Methods in Health Technology Assessment: Principles, Extended Applications, and Recent Advances. in Frontiers in pharmacology

Blake HA (2020) Estimating treatment effects with partially observed covariates using outcome regression with missing indicators. in Biometrical journal. Biometrische Zeitschrift

Blake HA (2020) Propensity scores using missingness pattern information: a practical guide. in Statistics in medicine

Chatton A (2022) G-computation and doubly robust standardisation for continuous-time data: A comparison with inverse probability weighting. in Statistical methods in medical research

Crellin E (2018) Trimethoprim use for urinary tract infection and risk of adverse outcomes in older patients: cohort study. in BMJ (Clinical research ed.)

Elze MC (2019) Evaluation in four cardiovascular studies. in JACC

Honeyford K (2020) Evaluating a digital sepsis alert in a London multisite hospital network: a natural experiment using electronic health record data. in Journal of the American Medical Informatics Association : JAMIA

Honeyford K (2019) Evaluating a digital sepsis alert in a London multi-site hospital network: a natural experiment using electronic health record data

Leyrat C (2021) Common Methods for Handling Missing Data in Marginal Structural Models: What Works and Why. in American journal of epidemiology

Collaboration
Software and Technical Products
Engagement Activities


Description	Australian propensity score work
Organisation	University of Melbourne
Department	Centre for Epidemiology & Biostatistics
Country	Australia
Sector	Academic/University
PI Contribution	We are advising our Australian collaborators on how to deal with the missing data issues in their data, which they are analysing using propensity score methods.
Collaborator Contribution	Our collaborators have a very interesting dataset, posing particular methodological challenges. This is helping to guide our methodological thoughts about how to handle missing data in this context.
Impact	We are still analysing the data.
Start Year	2015


Description	Brigham and Women's Hospital
Organisation	Harvard University
Department	Harvard T.H. Chan School of Public Health
Country	United States
Sector	Academic/University
PI Contribution	Sebastian Schneeweiss's group, Brigham and Women's Hospital, Division of Pharmacoepidemiology and Pharmacoeconomics at the Harvard School of Public Health, developed the High Dimensional Propensity Score (HDPS). Our collaboration with them grew out of an initial MRC MRP project grant and was further developed in a second; together we are further exploring the HDPS in UK electronic health record data.
Collaborator Contribution	A PhD student and early career fellow both visited our collaborators in Boston. We have regular teleconferences and email exchanges regarding our collaborative projects.
Impact	Publications in process
Start Year	2018


Description	Farr Institute
Organisation	Farr Institute of Health Informatics Research
Country	United Kingdom
Sector	Academic/University
PI Contribution	Members of our team have advised a number of researchers at the Farr institute about handling missing data within propensity score analyses.
Collaborator Contribution	Our collaborators at the Farr have a range of interesting real-life examples, which are throwing up unexpected methodological challenges and enabling us to broaden our focus in our methodological work.
Impact	Analyses underway
Start Year	2015


Description	GSK
Organisation	GlaxoSmithKline (GSK)
Country	Global
Sector	Private
PI Contribution	We are working with researchers at GSK to investigate the potential of a relatively novel design (the prevalent new user design), for analyses of UK electronic health record data.
Collaborator Contribution	They have hosted a doctoral student as an intern for a number of weeks.
Impact	NA
Start Year	2017


Description	Missing data case studies
Organisation	Maastricht University (UM)
Country	Netherlands
Sector	Academic/University
PI Contribution	Our collaborators in Maastricht University were applying propensity score methods in electronic health data and wished to understand how robust their results were to the missing data methods they were using. They have provided the case study, and analysis, for an investigation of various missing data methods.
Collaborator Contribution	We have provided the methodological underpinnings of the case study and advice on the analyses.
Impact	Abstract submitted to the 2016 Conference for the International Society for Pharmacoepidemiology.
Start Year	2015


Title	R package MatchThem
Description	This R package aims to facilitate the use of multiple imputation in propensity score matched analyses.
Type Of Technology	Webtool/Application
Year Produced	2020
Open Source License?	Yes
Impact	Not known yet.
URL	https://cran.r-project.org/web/packages/MatchThem/index.html


Description	PPI forum
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Public/other audiences
Results and Impact	We ran a public debate and discussion about missing data; how it arises, how it impacts on medical research and how we deal with it in analyses. It was called "Listening to the silence: What does unrecorded information in the electronic health record tell us?" There was a lively debate about how missing data arises, and how researchers should go about investigating and thinking about missingness mechanisms. We have written a report about the utility of patient and public involvement in such methodological questions and submitted an abstract to the 2016 International Population Data Linkage Conference.
Year(s) Of Engagement Activity	2016