Enhancing the design and analysis of cluster randomised trials using machine learning

Lead Research Organisation: London School of Hygiene & Tropical Medicine
Department Name: Epidemiology and Population Health

Abstract

Trials are important for evaluating the safety and efficacy of new treatments or interventions. One type of trial, called a cluster randomised trial (CRT), uses pre-existing groups of individuals, known as clusters, which are randomly allocated to different treatments. This means that every member of the same cluster, for instance members of the same family or patients from the same hospital, will receive the same treatment. This type of trial is very useful in real-world settings where individual randomisation to treatments is not possible or the intervention is naturally applied to a whole cluster. However, it requires specific analysis methods to account for the similarity in the response to treatments among members of the same cluster. Moreover, cluster randomised trials are often prone to bias, which can be corrected at the analysis stage using additional information about the individuals and clusters included in the trial. This additional information can also be used to improve the precision of the trial, and to identify the individuals who will benefit most from the intervention being tested. Therefore, it is crucial to select the important variables and to use appropriate statistical methods that account for these variables in order to obtain an accurate estimate of treatment efficacy and safety. However, the best way to do so remains unknown, especially in an era where a large amount of medical information is available. In this fellowship I will draw on the emerging field of machine learning to address these methodological challenges and to determine how routinely collected medical data can best be used to improve the design of CRTs. I will use several existing trial datasets to achieve this, including CRTs in the fields of pharmacy and psychiatry, as well as data from the England Cancer Registry.
Drawing on these multiple datasets, and also using mathematical developments and computer-based simulation studies, I will develop and evaluate methods to improve the generalisability and precision of CRTs, and to bring us one step closer to a more personalised medicine approach for patients.

Technical Summary

Pragmatic trials emerged in response to concerns that clinical trials, traditionally designed to assess efficacy, were failing to inform clinical practice. Assessing benefits and harms in highly selected patient populations under well-trained, experienced clinical teams can lead to overestimation of benefit and underestimation of harm in practice. Pragmatic trials, conversely, aim to assess benefits, harms, and cost-effectiveness in real-world settings, and to identify subgroups for whom the intervention is most effective. Cluster Randomised Trials (CRTs) have emerged as an important design for pragmatic trials. However, a number of methodological challenges must be addressed for CRTs to fulfil their promise, including enhancing their generalisability, reducing bias, improving precision, and identifying individualised intervention effects. In this fellowship I will gain theoretical and practical training in the emerging area of machine learning (ML) to address the above challenges. To achieve this, I will:
1. Develop a framework to emulate CRTs from observational studies to evaluate the effect of cluster-level interventions, and compare parametric and non-parametric methods, including ML approaches, for estimating inverse probability weights to address confounding;
2. Provide practical guidance for researchers on how to exploit ML to improve the selection of, and adjustment for, covariates in CRTs, in order both to increase precision by reducing the intra-cluster correlation and to minimise bias by adjusting for confounding;
3. Propose and evaluate ML methods to study heterogeneity in intervention effects in CRTs and estimate subgroup effects while maintaining valid confidence intervals.
These objectives will be achieved by developing, extending and assessing ML methods using a mixture of theory, simulation studies and applications to case studies, including CRTs in pharmacy and psychiatry and observational data from Cancer Registries.
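
To illustrate why reducing the intra-cluster correlation (objective 2) matters for precision, here is a minimal Python sketch of the standard design-effect approximation for equal-sized clusters, DEFF = 1 + (m - 1) x ICC, where m is the cluster size. It is not part of the fellowship's methodology. The function names and the example numbers (30 clusters of 20 participants, ICC values of 0.05 and 0.02) are illustrative assumptions only.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation factor for equal-sized clusters: 1 + (m - 1) * ICC."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_clusters: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is 'worth'."""
    total = n_clusters * cluster_size
    return total / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical trial: 30 clusters of 20 participants each.
    # The two ICC values are made up for illustration; the lower one stands in
    # for the residual ICC after adjusting for informative covariates.
    for icc in (0.05, 0.02):
        deff = design_effect(20, icc)
        ess = effective_sample_size(30, 20, icc)
        print(f"ICC={icc:.2f}  design effect={deff:.2f}  effective n={ess:.0f}")
```

Under this approximation, a lower residual intra-cluster correlation gives a smaller design effect and therefore a larger effective sample size for the same number of recruited participants. Unequal cluster sizes would need a more general formula.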

Planned Impact

During this fellowship, I will develop novel statistical methodology using a variety of methods, including machine learning (ML), to enhance the design and analysis of cluster randomised trials (CRTs). This research will focus in particular on reducing bias and increasing the precision of CRTs, identifying the subgroups of patients who benefit most from the intervention, and using external routinely collected data to better inform the design of CRTs.

Immediate beneficiaries of the project include researchers involved in planning and running CRTs, as well as statisticians with an interest in the methodology of CRTs. This research will provide tools and information which will enable researchers to harness the potential of ML to improve the efficiency of their trials.

Patients would also be major beneficiaries of this research. Designing more efficient CRTs and gaining statistical power with the use of ML algorithms can lead to trials with smaller sample sizes, meaning that fewer patients would be exposed to potentially harmful interventions. Furthermore, smaller trials are usually shorter, which would reduce the delay between the start of the CRT and the moment the treatment or intervention studied is provided to patients in practice. As such, patients would be treated more quickly. Smaller and shorter trials are also less costly, which will benefit trial funders, such as research councils and charities. My third objective, which focuses on the identification of subgroup effects, will be a step towards more personalised medicine. Therefore, my research could allow patients to receive the optimal treatment based on their individual characteristics.

Through the development of a new framework for the emulation of CRTs from observational data, this work will take advantage of the wealth of routinely collected medical information to evaluate the effect of interventions in real-life settings, making use of existing and usually large datasets rather than collecting new information. This will allow researchers not only to obtain accurate results in a more timely manner but also to replace the expensive pilot CRTs usually conducted before the start of a larger-scale CRT.

Publications

Besançon L (2021) Open science saves lives: lessons from the COVID-19 pandemic. in BMC Medical Research Methodology

Billot L (2024) How should a cluster randomized trial be analyzed? in Journal of Epidemiology and Population Health

 
Description Member of the DSMC of a stepped-wedge cluster trial
Geographic Reach Europe 
Policy Influence Type Participation in a guidance/advisory committee
URL https://clinicaltrials.gov/ct2/show/NCT03892148
 
Description ROBEST: Ensuring robustness of evidence in public health research for increased policy impact: widened use of advanced causal inference techniques
Amount £420,279 (GBP)
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 04/2022 
End 03/2025
 
Description Collaboration - Dr Sophie Pilleron 
Organisation Luxembourg Institute of Health
Country Luxembourg 
Sector Academic/University 
PI Contribution I contributed to the computing part of the project and performed a simulation study to evaluate the impact of immortal-time bias among older and younger cancer patients.
Collaborator Contribution The partners developed the study concept and hypotheses and provided the data used for the analysis.
Impact Pilleron S, Maringe C, Morris EJA, Leyrat C. Immortal-time bias in older vs younger age groups: a simulation study with application to a population-based cohort of patients with colon cancer. Br J Cancer. 2023 Feb 9. doi: 10.1038/s41416-023-02187-0. Multi-disciplinary collaboration involving epidemiologists (SP and EJAM)
Start Year 2021
 
Title R package MatchThem 
Description This R package aims to facilitate the use of multiple imputation in propensity score matched analyses. 
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact Not known yet. 
URL https://cran.r-project.org/web/packages/MatchThem/index.html
 
Description ISCB conference presentation 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Following presentation at an international biostatistics conference:

LEYRAT C, DIAZORDAZ K, WILLIAMSON E. Covariate adjustment in randomised trials: when and how? 41st Annual Conference of the International Society for Clinical Biostatistics. August 2020, virtual conference.
Year(s) Of Engagement Activity 2020
URL https://iscb2020.info/
 
Description Live discussion on Open Science 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Participation in the live event "Open Science saves lives" on the channel LeGrandLabo on 29 September 2020.

This one-hour discussion focussed on the need for better research practices (following the Open Science principles) during the COVID-19 pandemic.
Year(s) Of Engagement Activity 2020
URL https://www.youtube.com/watch?v=lFtsB-9E5EU
 
Description Presentation at Hôpital Saint-Louis
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Presentation on emulated trials (around 20 attendees), followed by a constructive discussion on next steps and potential collaborations.
Year(s) Of Engagement Activity 2022
 
Description Webinar QuanTIM 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This webinar focused on emulated trials for observational studies and presented some of the work conducted during my fellowship. It led to an invitation to be a keynote speaker at an upcoming conference of the French Society for Pharmacology.
Year(s) Of Engagement Activity 2022
URL https://sesstim.univ-amu.fr/fr/content/webinar-quantim-clemence-leyrat
 
Description Website on cluster randomised trials 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Development and maintenance of a website dedicated to cluster randomised trials, hosted by QMUL, London:
https://clusterrandomisedtrials.qmul.ac.uk/who-we-are/

The aim is to help clinicians, trialists and statisticians design high quality cluster randomised trials.
Year(s) Of Engagement Activity 2020,2021
URL https://clusterrandomisedtrials.qmul.ac.uk/who-we-are/