Enhancing the design and analysis of cluster randomised trials using machine learning

Lead Research Organisation: London School of Hygiene and Tropical Medicine

Department Name: Epidemiology and Population Health

Abstract

Trials are important for evaluating the safety and efficacy of new treatments or interventions. One type of trial, called a cluster randomised trial (CRT), uses pre-existing groups of individuals, known as clusters, which are randomly allocated to different treatments. This means that every member of the same cluster, for instance members of the same family or patients from the same hospital, will receive the same treatment. This type of trial is very useful in real-world settings where individual randomisation to treatments is not possible or the intervention is naturally applied to a whole cluster. However, this type of trial requires the use of specific analysis methods to account for the similarity in the response to treatments among members of the same cluster. Moreover, cluster randomised trials are often prone to bias, which can be corrected at the analysis stage with the use of additional information about the individuals and clusters included in the trial. This additional information can also be used to improve the precision of the trial, and to identify the individuals who will most benefit from the intervention being tested. Therefore, it is crucial to select the important variables and use appropriate statistical methods accounting for these variables to obtain an accurate estimation of the treatment efficacy and safety. However, the best way to do so remains unknown, especially in an era where there is a large amount of medical information available. In this fellowship I will draw on the emerging field of machine learning to address these methodological challenges and to determine how routinely collected medical data can be best used to improve the design of CRTs. I will use several existing trial datasets to achieve this, including CRTs in the fields of pharmacy and psychiatry, as well as data from the England Cancer Registry.
Drawing on these multiple datasets, and also using mathematical developments and computer-based simulation studies, I will develop and evaluate methods to improve the generalisability and precision of CRTs, and to bring us one step closer to a more personalised approach to medicine for patients.

Technical Summary

Pragmatic trials emerged in response to concerns that clinical trials, traditionally designed to assess efficacy, were failing to inform clinical practice. Assessing benefits and harms in highly selected patient populations under well-trained, experienced clinical teams can lead to over-estimation of benefit and under-estimation of harm in practice. Pragmatic trials, conversely, aim to assess benefits, harms, and cost-effectiveness in real-world settings, and to identify subgroups for whom the intervention is most effective. Cluster Randomised Trials (CRTs) have emerged as an important design for pragmatic trials. However, there are a number of methodological challenges that must be addressed in order for CRTs to fulfil their promise, including enhancing their generalizability, reducing bias, improving precision, and identifying individualized intervention effects. In this fellowship I will gain theoretical and practical training in the emerging area of machine learning (ML) to address the above challenges. To achieve this, I will:
1. Develop a framework to emulate CRTs from observational studies to evaluate the effect of cluster-level interventions and compare parametric and non-parametric methods, including ML approaches, to estimate inverse-probability-weights to address confounding;
2. Provide practical guidance for researchers on how to exploit ML to improve the selection of, and adjustment for, covariates in CRTs, in order both to increase precision by reducing the intra-cluster correlation and to minimise bias by adjusting for confounding;
3. Propose and evaluate ML methods to study heterogeneity in intervention effects in CRTs and estimate subgroup effects while maintaining valid confidence intervals.
These objectives will be achieved by developing, extending and assessing ML methods using a mixture of theory, simulation studies and applications to case studies, including CRTs in pharmacy and psychiatry and observational data from Cancer Registries.
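
As a rough illustration of why reducing the intra-cluster correlation (objective 2) increases precision, the sketch below uses the standard design effect for equal cluster sizes, 1 + (m - 1) * ICC, where m is the cluster size. The function names and the numbers in the example (20 clusters of 50 participants, ICC values of 0.05 and 0.02) are hypothetical and chosen only for illustration; they are not taken from the fellowship or its datasets.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation of a CRT relative to individual randomisation.

    Uses the standard formula 1 + (m - 1) * icc, which assumes that all
    clusters have the same size m.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_clusters: int, cluster_size: float, icc: float) -> float:
    """Number of independent participants carrying the same information."""
    total = n_clusters * cluster_size
    return total / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical trial: 20 clusters of 50 participants each.
    for icc in (0.05, 0.02):
        deff = design_effect(50, icc)
        ess = effective_sample_size(20, 50, icc)
        print(f"ICC={icc:.2f}: design effect={deff:.2f}, effective n={ess:.0f}")
```

Under these assumed values, a lower residual ICC gives a smaller design effect and therefore a larger effective sample size for the same number of recruited participants. With unequal cluster sizes the formula is only an approximation.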

Planned Impact

During this fellowship, I will develop novel statistical methodology using a variety of methods, including machine learning (ML), to enhance the design and analysis of cluster randomised trials (CRTs). This research will focus in particular on reducing bias and increasing the precision of CRTs, identifying subgroups of patients benefitting the most from the intervention, and using external routinely collected data to better inform the design of CRTs.

Immediate beneficiaries of the project include researchers involved in planning and running CRTs, as well as statisticians with an interest in the methodology of CRTs. This research will provide tools and information which will enable researchers to harness the potential of ML to improve the efficiency of their trials.

Patients would also be major beneficiaries of this research. Designing more efficient CRTs and gaining statistical power with the use of ML algorithms can lead to trials conducted on smaller sample sizes, meaning that fewer patients would be exposed to potentially harmful interventions. Furthermore, smaller trials are usually shorter, which would reduce the delay between the start of the CRT and the moment the treatment or intervention studied is provided to the patients in practice. As such, patients would be treated more quickly. Smaller and shorter trials are also less costly, which will also be beneficial to trial funders, such as research councils and charities. My third objective, which focuses on the identification of subgroup effects, will be a step towards a more personalised medicine. Therefore, my research could allow patients to receive the optimal treatment based on their individual characteristics.

Through the development of a new framework for the emulation of CRTs from observational data, this work will take advantage of the wealth of medical information routinely collected to evaluate the effect of interventions in real-life settings, making use of existing and usually large datasets rather than collecting new information. This will allow researchers to obtain accurate results in a more timely manner, and also to replace expensive pilot CRTs usually conducted before the start of a larger-scale CRT.

Funded Value:

£301,682

Funded Period:

Mar 20 - Mar 24

Funder:

MRC

Project Status:

Closed

Project Category:

Fellowship

Project Reference:

MR/T032448/1

Principal Investigator:

Clemence Leyrat

Health Category:

Unclassified

Organisations

People	ORCID iD
Clemence Leyrat (Principal Investigator / Fellow)	http://orcid.org/0000-0002-4097-4577

Publications

Baptiste PJ (2022) Effects of ACE inhibitors and angiotensin receptor blockers: protocol for a UK cohort study using routinely collected electronic health records with validation against the ONTARGET trial. in BMJ open

Baptiste PJ (2024) Effectiveness and risk of ARB and ACEi among different ethnic groups in England: A reference trial (ONTARGET) emulation analysis using UK Clinical Practice Research Datalink Aurum-linked data. in PLoS medicine

Besançon L (2020) Open Science Saves Lives: Lessons from the COVID-19 Pandemic

Besançon L (2021) Open science saves lives: lessons from the COVID-19 pandemic. in BMC medical research methodology

Bettega F (2022) Application of Inverse-Probability-of-Treatment Weighting to Estimate the Effect of Daytime Sleepiness in Patients with Obstructive Sleep Apnea. in Annals of the American Thoracic Society

Bettega F (2024) Use and reporting of inverse-probability-of-treatment weighting for multicategory treatments in medical research: a systematic review. in Journal of clinical epidemiology

Bhate K (2023) Long-term oral antibiotic use in people with acne vulgaris in UK primary care: a drug utilization study. in The British journal of dermatology

Bhate K (2021) Is there an association between long-term antibiotics for acne and subsequent infection sequelae and antimicrobial resistance? A systematic review. in BJGP open

Bidulka P (2020) Stopping renin-angiotensin system blockers after acute kidney injury and risk of adverse outcomes: parallel population-based cohort studies in English and Swedish routine care. in BMC medicine

Policy Influence
Further Funding
Collaboration
Software and Technical Products
Engagement Activities


Description	Giens Workshop
Geographic Reach	Europe
Policy Influence Type	Participation in a guidance/advisory committee
URL	https://www.ateliersdegiens.org/


Description	Member of the DSMC of a stepped-wedge cluster trial
Geographic Reach	Europe
Policy Influence Type	Participation in a guidance/advisory committee
URL	https://clinicaltrials.gov/ct2/show/NCT03892148


Description	Policy document on Covid19 and science
Geographic Reach	Multiple continents/international
Policy Influence Type	Citation in other policy documents
Impact	Our paper provides recommendations and options for policy action to improve the resilience of national science systems


Description	ROBEST: Ensuring robustness of evidence in public health research for increased policy impact: widened use of advanced causal inference techniques
Amount	£420,279 (GBP)
Organisation	Medical Research Council (MRC)
Sector	Public
Country	United Kingdom
Start	03/2022
End	03/2025


Description	Collaboration - Dr Sophie Pilleron
Organisation	Luxembourg Institute of Health
Country	Luxembourg
Sector	Academic/University
PI Contribution	I contributed to the computing part of the project and performed a simulation study to evaluate the impact of immortal-time bias among older and younger cancer patients.
Collaborator Contribution	The partners came up with the concept of the study, the hypotheses as well as the data used for the analysis.
Impact	Pilleron S, Maringe C, Morris EJA, Leyrat C. Immortal-time bias in older vs younger age groups: a simulation study with application to a population-based cohort of patients with colon cancer. Br J Cancer. 2023 Feb 9. doi: 10.1038/s41416-023-02187-0. Multi-disciplinary collaboration involving epidemiologists (SP and EJAM)
Start Year	2021


Description	Collaboration - Duplicate^2
Organisation	University of Grenoble
Country	France
Sector	Academic/University
PI Contribution	This new collaboration aims to study the feasibility and reproducibility of target trial emulation (including cluster trial emulation) from the French SNDS data. I will provide methodological advice informed by my fellowship research.
Collaborator Contribution	The partners are leading this working group and provide the resources to conduct the research.
Impact	No output yet but grant application in preparation
Start Year	2023


Description	Collaboration - SAP for CRTs
Organisation	University of Birmingham
Country	United Kingdom
Sector	Academic/University
PI Contribution	Through this collaboration with multiple partners, I have contributed to the development of new guidelines for the statistical analysis plan of cluster randomised trials.
Collaborator Contribution	The main collaborators drafted the original guidelines and organised a series of Delphi rounds and consensus meetings to arrive at the final version.
Impact	Publication of the protocol: "Guidelines for the Content of Statistical Analysis Plans in Clinical Trials: Protocol for an Extension to Cluster Randomized Trials". Karla Hemming; Jacqueline Y Thompson; Richard L Hooper; Obioha C Ukoumunne; Fan Li; Agnes Caille; Brennan C Kahan; Clemence Leyrat; Michael J Grayling; Nuredin I Mohammed; Jennifer A Thompson; Bruno Giraudeau; Elizabeth L Turner; Samuel I Watson; Beatriz P Goulão; Jessica Kasza; Andrew B Forbes; Andrew J Copas; Monica Taljaard. 2025. Trials. In Press.
Start Year	2023


Title	R package MatchThem
Description	This R package aims to facilitate the use of multiple imputation in propensity score matched analyses.
Type Of Technology	Webtool/Application
Year Produced	2020
Open Source License?	Yes
Impact	Not known yet.
URL	https://cran.r-project.org/web/packages/MatchThem/index.html


Description	Expert panel member - Observational data for drug evaluation
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Policymakers/politicians
Results and Impact	Involvement in a panel of experts providing recommendations on how observational studies should be used by medicines agencies for drug licensing
Year(s) Of Engagement Activity	2021,2024


Description	INSERM workshop on cluster randomised trials
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Clémence Leyrat led a session on cluster randomised trials at a professional workshop of around 40 participants in Bordeaux, June 2023
Year(s) Of Engagement Activity	2023
URL	https://ateliersinserm.dakini-pco.com/en/workshop.274.cluster.randomized.trials.and.within.person.ra...


Description	ISCB conference presentation
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Presentation at an international biostatistics conference: LEYRAT C, DIAZORDAZ K, WILLIAMSON E. Covariate adjustment in randomised trials: when and how? 41st Annual Conference of the International Society for Clinical Biostatistics. August 2020, virtual conference.
Year(s) Of Engagement Activity	2020
URL	https://iscb2020.info/


Description	ISCB presentation
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Presentation at the 45th Annual Conference of the International Society for Clinical Biostatistics. Title: Emulation of target cluster trials of complex interventions: Estimands, methods and application
Year(s) Of Engagement Activity	2024


Description	Live discussion on Open Science
Form Of Engagement Activity	A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Public/other audiences
Results and Impact	Participation in the live event "Open Science saves lives" on the channel LeGrandLabo on 29 September 2020. This one-hour discussion focussed on the need for better research practices (following the Open Science principles) during the COVID-19 pandemic.
Year(s) Of Engagement Activity	2020
URL	https://www.youtube.com/watch?v=lFtsB-9E5EU


Description	Poster - Cluster trial emulation
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	Poster presentation on Target Trial Emulation at the Cancer Data Conference (CRUK) in Manchester, 27-28 February 2024
Year(s) Of Engagement Activity	2024


Description	Presentation at Hôpital Saint-Louis
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Professional Practitioners
Results and Impact	Presentation on emulated trials (around 20 attendees), followed by a constructive discussion on next steps and potential collaborations
Year(s) Of Engagement Activity	2022


Description	SIOG webinar
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	Webinar entitled "Making the most of observational data to estimate causal effects using target trial emulation: practical examples". Audience: International Society of Geriatric Oncology
Year(s) Of Engagement Activity	2024


Description	Seminar Target Trial Emulation - INSERM SHERE, Tours, France
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Professional Practitioners
Results and Impact	Invited presentation on cluster target trial emulation to ~20 researchers and PhD students
Year(s) Of Engagement Activity	2024


Description	Webinar QuanTIM
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	This webinar focused on emulated trials for observational studies and presented some of the work conducted during my fellowship. This led to my invitation to be a keynote speaker in an upcoming conference of the French Society for Pharmacology.
Year(s) Of Engagement Activity	2022
URL	https://sesstim.univ-amu.fr/fr/content/webinar-quantim-clemence-leyrat


Description	Website on cluster randomised trials
Form Of Engagement Activity	Engagement focused website, blog or social media channel
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Development and maintenance of a website dedicated to cluster randomised trials, hosted by QMUL, London: https://clusterrandomisedtrials.qmul.ac.uk/who-we-are/ The aim is to help clinicians, trialists and statisticians design high quality cluster randomised trials.
Year(s) Of Engagement Activity	2020,2021
URL	https://clusterrandomisedtrials.qmul.ac.uk/who-we-are/