HOD1: Comparative Effectiveness Research using Observational Data: Methodological Developments and a Roadmap (CER-OBS)
Lead Research Organisation:
University College London
Department Name: Institute of Child Health
Abstract
There is increased availability of linked coded patient data that are collected in the course of clinical care and held electronically in formats and within safe environments that protect their anonymity. This creates unique opportunities for research into the benefits and harms of treatments when prescribed in "real-world" clinical practice, as opposed to those identified (or not) when tested in clinical trials (which may involve limited numbers and/or types of patients). Evidence generated from routine data is, however, not immune from controversy, because it is based on information that is not collected for research purposes. Indeed, errors and incompleteness often affect these data, and the timing and frequency of their collection can also bias their analysis. Our research proposal attempts to address these potential biases by adopting and extending a novel approach to the analysis of routine data. It consists of emulating the ideal clinical trial for the efficacy of a treatment using routinely collected data (the "emulate the target trial" (ETT) approach). To make this happen, the patient population, treatment strategy, follow-up procedures and disease assessment need to be identified from the routine data, and the effectiveness measure that the target trial would pursue needs to be specified. To do this explicitly is not straightforward. Indeed, there are several examples where the design of studies based on routine data has introduced bias, mostly via incorrect definitions of the patient population or treatment received. Also, studying the effectiveness of treatments to be sustained over time requires estimation of effects that measure the impact of adherence to treatment, and not just its initiation. This poses analytical challenges, usually addressed by methods that are not necessarily the most suitable, in particular when data quality issues must also be addressed.
For these reasons, we propose to:
A) Create a roadmap for the assessment and generation of evidence of comparative effectiveness (or harms) formulated within the ETT framework.
B) Provide easy access to the most flexible of the estimation approaches for comparative effectiveness, g-estimation, and extend it to address the challenges posed by data quality, including facilitating sensitivity analyses.
C) Use exemplars from linked UK databases to illustrate the application of ETT to CER. The treatment regimens being examined are: intensive versus less intensive cardiovascular disease prevention and glycaemic control in type 2 diabetes patients; antibiotics in infancy and subsequent asthma risk in childhood; and palivizumab, a monoclonal antibody, in high risk infants and later hospitalization due to bronchiolitis.
These exemplars will illustrate the advantages of implementing the ETT approach when comparing treatment regimens in terms of estimating their benefits and harms (e.g. in the first example preventing heart disease but potentially increasing hypoglycaemic episodes). In comparison to more traditional uses of routine clinical data, ETT has greater transparency of purpose (addressing the same question as the target trial), which acts as a guide for the study design and analysis. The ability to 'enrol' large numbers of patients who receive their treatment in the "real-world", as delivered by the NHS, and who are followed for many years, as opposed to a limited period, is the great advantage of this approach over equivalent (i.e. pragmatic) randomised clinical trials.
Overall this research will provide tools to aid users of research results (e.g. NICE) in assessing the quality of the available evidence from observational studies, and to guide applied researchers in the implementation of the ETT framework, including designing the study that corresponds to the clinical question, and explicitly dealing with data quality issues while adopting flexible and robust estimation approaches.
Technical Summary
The potential gains for enhancing comparative effectiveness research (CER) by exploiting linked health-related databases are indisputable. However, the potential pitfalls that may arise from not properly accounting for the administrative nature of this information have fuelled a debate on its utility. A promising approach that addresses (some of) these concerns advocates the implementation of trial design principles when exploiting observational data in CER. It consists of emulating the trial that would be designed to study the effectiveness of a treatment by specifying the target population and target comparative measures, and then handling/analysing the observational data to replicate them. There are still several unmet challenges for a robust adoption of this approach, however:
Study design:
Erroneous decisions at any step of the construction of the emulated trial may affect the robustness of the reported evidence. Awareness of potential methodological pitfalls is essential for the interpretation and delivery of evidence.
Data quality:
The coarseness that often affects observational data, and the likely dependence of the timing of observations on factors related to disease evolution, both affect the robustness of the adopted estimation approach.
Estimation:
Most CER involves treatments sustained over time and requires implementing g-methods for estimation. G-estimation of structural nested models is the most flexible, particularly for dealing with multiple confounders and/or time-points, but has not been fully exploited in applications, nor extended to deal with data coarseness and dependent follow-up.
Given these challenges, this application aims to:
A) Create a roadmap for the assessment and robust delivery of evidence that adopts the "emulate the target trial" approach.
B) Extend g-estimation to deal with data coarseness and dependent follow-up.
C) Use clinically relevant exemplars to illustrate both approach and methodological developments.
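To make the idea behind g-estimation concrete, the following is a minimal sketch for the simplest possible setting: a single binary treatment, one binary confounder, and the structural mean model E[Y(a) - Y(0) | L, A = a] = psi * a. It is an illustration only, written in Python on simulated data; it is not the API of the project's R package, and the function names (`simulate`, `propensity_by_stratum`, `g_estimate`, `naive_difference`) and all simulation parameters are assumptions made for this example. The estimator looks for the value of psi at which the "treatment-free" outcome Y - psi * A is uncorrelated with treatment given the confounder, which here has a closed-form solution. It does not address time-varying treatments, data coarseness or dependent follow-up.

```python
import random


def simulate(n, psi_true=2.0, seed=0):
    """Simulate (L, A, Y) rows with confounding by a binary covariate L.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = 1 if rng.random() < 0.5 else 0
        # Treatment is more likely when L = 1 (confounding by indication).
        a = 1 if rng.random() < 0.2 + 0.6 * l else 0
        # Outcome depends on treatment (effect psi_true) and on L.
        y = 1.0 + psi_true * a + 3.0 * l + rng.gauss(0.0, 1.0)
        rows.append((l, a, y))
    return rows


def propensity_by_stratum(rows):
    """Estimate P(A = 1 | L) as the treated proportion within each level of L."""
    counts = {}
    for l, a, _ in rows:
        n_l, n_treated = counts.get(l, (0, 0))
        counts[l] = (n_l + 1, n_treated + a)
    return {l: n_treated / n_l for l, (n_l, n_treated) in counts.items()}


def g_estimate(rows):
    """Solve sum_i (A_i - e(L_i)) * (Y_i - psi * A_i) = 0 for psi."""
    e = propensity_by_stratum(rows)
    numerator = sum((a - e[l]) * y for l, a, y in rows)
    denominator = sum((a - e[l]) * a for l, a, _ in rows)
    return numerator / denominator


def naive_difference(rows):
    """Unadjusted difference in mean outcome between treated and untreated."""
    treated = [y for _, a, y in rows if a == 1]
    untreated = [y for _, a, y in rows if a == 0]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


if __name__ == "__main__":
    data = simulate(20000)
    print("naive difference in means:", round(naive_difference(data), 2))
    print("g-estimate of psi:", round(g_estimate(data), 2))
```

In this simulation the unadjusted comparison is distorted by the confounder, whereas the g-estimate should lie close to the simulated effect of 2. With treatments sustained over time the same principle is applied at each time point, which is where the extensions proposed in aim B become relevant.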
Planned Impact
This research proposal, if awarded, would have several beneficiaries. These include:
(a) The broader research community
This research will develop formal tools for the assessment of comparative effectiveness studies arising from electronic health records, as well as new tools appropriate for the type of linked health data now increasingly available. This will enhance the ability of all researchers, particularly those involved in systematic reviews and bioinformatics, to generate, evaluate and compare research outputs from different sources. This project will also demonstrate the feasibility of close collaborations between experts in different academic disciplines (statistical methodologists, epidemiologists, health informaticians, and clinicians) and will create a network of users via the dedicated website and planned short courses. This environment will also enhance the professional development of future generations of data science researchers.
(b) Evaluators and policy makers
This project will improve the exploitation of routine health records for the evaluation of evidence on clinical effectiveness (and/or harms) as experienced in health and other services (the "real world"). One additional gain will be the ability to reconcile inconsistencies in the available evidence when this arises from differences in key features of the design and/or analysis of the observational data, as highlighted in the proposal. The results from the three exemplars will have implications for policymakers involved in the prevention of premature mortality and CHD risk in T2 diabetic patients, understanding wider effects of antibiotic use and the potential for preventing onset of asthma in children, and the benefits of ensuring adherence to effective treatment of high risk infants hospitalised with acute bronchiolitis. These exemplars will also act as catalysts for setting up investigations into the effectiveness and harms of other treatments in different populations.
(c) Clinicians and patients
The guidelines, methodological developments and tools we plan to provide will help researchers address questions regarding the benefits and harms of treatments and interventions for which randomised controlled trials (RCTs) may be unfeasible, not yet available, or unethical. They will also allow investigation of larger and more diverse study populations than is usually achievable with RCTs, making the results relevant to the "real world" and hence to clinicians and patients in the very near future. Our exemplars target questions of clinical uncertainty that are of interest to patients and the public as well as to clinicians, commissioners and decision makers.
Publications
De Stavola B
(2022)
Framing Causal Questions in Life Course Epidemiology
in Annual Review of Statistics and Its Application
Goetghebeur E
(2020)
Formulating causal questions and principled statistical answers.
in Statistics in medicine
Tompsett D
(2023)
Target Trial Emulation and Bias Through Missing Eligibility Data: An Application to a Study of Palivizumab for the Prevention of Hospitalization Due to Infant Respiratory Illness.
in American journal of epidemiology
Tompsett D
(2022)
gesttools: General Purpose G-Estimation in R
in Observational Studies
Tompsett D
(2022)
Target Trial Emulation and Bias Through Missing Eligibility Data: An Application to a Study of Palivizumab for the Prevention of Hospitalization due to Infant Respiratory Illness.
in American Journal of Epidemiology
Zylbersztejn A
(2022)
Access to palivizumab among children at high risk of respiratory syncytial virus complications in English hospitals.
in British journal of clinical pharmacology
Title | General Purpose g-Estimation for End of Study or Time-Varying Outcomes |
Description | R Package available in GitHub |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | It is facilitating the application of causal inference methods in the presence of time varying exposures and time-varying confounding |
URL | https://cran.r-project.org/web/packages/gesttools/index.html |
Title | Gesttool- CRAN Manual |
Description | CRAN Manual for R Package |
Type Of Technology | Webtool/Application |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | Too early to say, but the need is there |
URL | https://cran.r-project.org/web/packages/gesttools/index.html |
Title | R package "gesttools" |
Description | This is an R software package, accepted onto CRAN, that allows for general purpose g-estimation of longitudinal datasets. |
Type Of Technology | Software |
Year Produced | 2020 |
Open Source License? | Yes |
Impact | Has been uploaded to CRAN and therefore may be used by the general public |
URL | https://cran.r-project.org/web/packages/gesttools/index.html |
Title | gestool |
Description | This is a suite of R functions that are relevant for the project- one seminar already given about this, beta test version shared, paper in draft |
Type Of Technology | Software |
Year Produced | 2020 |
Open Source License? | Yes |
Impact | Currently only used within the current project |
Description | ISCB 2021 Conference: Invited online poster session: "Target Trial Emulation and Missing Eligibility Data: A study of Palivizumab for child respiratory illness" |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | This was an invited poster session held over a live Zoom-like call. People would enter the chatroom and I would explain the poster to them. The poster briefly described the work on target trial emulation with missing eligibility data in the case of palivizumab prescriptions in newborns. The discussion that followed helped direct the subsequent research article. |
Year(s) Of Engagement Activity | 2021 |
URL | https://easychair.org/smart-program/ISCB2021/ |
Description | In what ways might target trial emulation improve your research? Some reflections |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | This was a seminar presented at the Statistical Group at UCL |
Year(s) Of Engagement Activity | 2022 |
Description | Invited talk (Data Integration and Modelling in Observational Studies Meeting, Royal Statistical Society) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | The invited talk given by Bianca De Stavola was entitled "Comparative Effectiveness Research using Observational Data". It presented the background for this new grant, and concerned a recently proposed framework, "target trial emulation (TTE)", for the investigation of comparative effectiveness based on observational data. The talk reviewed the essential components of this approach and discussed the challenges in implementing them, using as an exemplar the investigation of the effectiveness of a particular immunization strategy for the prevention of hospital admission in at-risk infants, based on linked hospital and prescription data. |
Year(s) Of Engagement Activity | 2019 |
Description | Invited Conference talk |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Many challenges posed by performing TTE have been discussed in the literature, in particular with respect to survival time bias. The talk highlighted additional ones: (i) missing eligibility data are common but often ignored, and dealing with the likely selection bias requires careful consideration; (ii) some emulation steps are often kept 'hidden' but may be very influential. The recommendation was that these issues should be acknowledged and added to the description of the emulated trial. |
Year(s) Of Engagement Activity | 2020 |
URL | https://rss.org.uk/training-events/events/rss-2020-online-conference/ |
Description | Invited Conference talk |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | International conference that involves methodological researchers from statistics and econometrics |
Year(s) Of Engagement Activity | 2021 |
URL | http://www.cmstatistics.org/conferences.php |
Description | Invited Conference talk |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | International Conference where I presented current issues with survival analysis |
Year(s) Of Engagement Activity | 2020 |
URL | https://iscb2020.info/ |
Description | Invited Conference talk |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | This was at an international conference on childhood cancer linked to a collaborative group of data resources |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.i4c2022.unito.it/ |
Description | Invited Online presentation: RSS conference 2021: Target Trial Emulation and Missing Eligibility Data: A study of Palivizumab for child respiratory illness Contributed: Missing data, 09:00-10:00 Thursday, 9 September, 2021 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | This was a 20-minute online presentation of work on target trial emulation and missing eligibility data in the context of palivizumab prescription. Its intended audience was mainly academics and researchers, but could also have included those in industry. The audience asked some questions about the technical details of the work. This helped in eventually producing a paper, which is in its final stages and will hopefully be published soon. |
Year(s) Of Engagement Activity | 2021 |
Description | Invited Talk at LSHTM |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | This was a prestigious invited lecture held yearly at the London School of Hygiene and Tropical Medicine dedicated to the work of Sir Austin Bradford Hill |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.lshtm.ac.uk/newsevents/events/30th-bradford-hill-memorial-lecture |
Description | Invited seminar at York University |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Professional Practitioners |
Results and Impact | This was a methodological seminar given by Bianca De Stavola that described target trial emulation (TTE), a framework that is increasingly adopted in comparative effectiveness research. TTE has multiple advantages, starting from the clarity of explicitly specifying the hypothetical target experimental trial for the questions of interest. However, the challenges of analysing observational data to address causal questions, and in particular of dealing with time-varying confounding and informative censoring, require careful selection of the estimand(s) of interest and, consequently, of the causal model from which the estimand can be identified, as well as of the estimation methods. The talk touched upon some of these challenges, using as illustration an investigation of the cardiovascular risk of type 2 diabetes patients undergoing 2nd line therapy, based on data on general practitioner consultations from 147 East London GPs from 2012 to 2017. |
Year(s) Of Engagement Activity | 2020 |
Description | Invited seminar at a NASH event (UCL) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Professional Practitioners |
Results and Impact | A talk entitled "G Estimation in R" given by Daniel Tompsett, reporting on new R functions written to perform g-estimation. |
Year(s) Of Engagement Activity | 2019 |
Description | Invited talk at Harvard University |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | This is a prestigious and highly sought seminar series |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.hsph.harvard.edu/epidemiology/epi-seminar-series/ |
Description | Invited talk at IARC in Lyon (France) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Policymakers/politicians |
Results and Impact | Talk to French epidemiologists who have been less aware of modern causal inference |
Year(s) Of Engagement Activity | 2020 |
URL | https://www.iarc.who.int/events/ |
Description | Invited talk at La Sapienza University (Roma, Italy) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | A seminar that introduced the ideas of target trial emulation to a new audience |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.dss.uniroma1.it/it |
Description | Invited talk at McGill University (Canada) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Title: Challenges in Emulating Target Trials Topic: The framework of target trial emulation (TTE) is increasingly adopted when researchers wish to address causal questions using observational data. |
Year(s) Of Engagement Activity | 2020 |
URL | http://www.mcgill.ca/epi-biostat-occh/news-events/seminars/biostatistics |
Description | Short Course on Target Trial Emulation |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Introduction to Target Trial Emulation for Comparative Effectiveness Research, 25 October 2022. Course description: This introductory course is for anyone wishing to understand how comparisons of the effectiveness of alternative therapies or interventions can be performed using real world data (RWD) when adopting the framework of target trial emulation (TTE). RWD are data on the everyday experiences of individuals that are collected through surveys, cohort studies, administrative and clinical. These data are observational, as opposed to experimental. Because of this, using them to address causal questions such as those of comparative effectiveness raises many concerns and difficulties. In this course we will describe the main sources of bias affecting RWD, describe how TTE can address some of them, and discuss its application in group discussions and computer practicals (in Stata and R). Learning objectives: To develop an understanding of the main challenges arising from using RWD in comparative effectiveness research and how to implement TTE to address at least some of them. Course structure: The course will consist of three live lectures and two practicals. Participants are expected to be familiar with directed causal diagrams and regression models. |
Year(s) Of Engagement Activity | 2022 |
Description | Three Step Latent Class Analysis in R and STATA; Talk at the UK Stata Conference 2022 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | We described the implementation of 3-step and 2-step latent class analysis using Stata and R |
Year(s) Of Engagement Activity | 2022 |