Enhancing the design and analysis of cluster randomised trials using machine learning
Lead Research Organisation: London School of Hygiene & Tropical Medicine
Department Name: Epidemiology and Population Health
Abstract
Trials are important for evaluating the safety and efficacy of new treatments or interventions. One type of trial, called a cluster randomised trial (CRT), uses pre-existing groups of individuals, known as clusters, which are randomly allocated to different treatments. This means that every member of the same cluster, for instance members of the same family or patients from the same hospital, will receive the same treatment. This type of trial is very useful in real-world settings where individual randomisation to treatments is not possible or the intervention is naturally applied to a whole cluster. However, it requires specific analysis methods to account for the similarity in the response to treatments among members of the same cluster. Moreover, cluster randomised trials are often prone to bias, which can be corrected at the analysis stage using additional information about the individuals and clusters included in the trial. This additional information can also be used to improve the precision of the trial, and to identify the individuals who will benefit most from the intervention being tested. It is therefore crucial to select the important variables and to use appropriate statistical methods that account for them, in order to obtain an accurate estimate of treatment efficacy and safety. However, the best way to do so remains unknown, especially in an era where a large amount of medical information is available. In this fellowship I will draw on the emerging field of machine learning to address these methodological challenges and to determine how routinely collected medical data can best be used to improve the design of CRTs. I will use several existing datasets to achieve this, including CRTs in the fields of pharmacy and psychiatry, as well as data from the England Cancer Registry.
Drawing on these datasets, as well as on mathematical developments and computer-based simulation studies, I will develop and evaluate methods to improve the generalisability and precision of CRTs, bringing us one step closer to a more personalised medicine approach for patients.
Technical Summary
Pragmatic trials emerged in response to concerns that clinical trials, traditionally designed to assess efficacy, were failing to inform clinical practice. Assessing benefits and harms in highly selected patient populations under well-trained, experienced clinical teams can lead to overestimation of benefit and underestimation of harm in practice. Pragmatic trials, conversely, aim to assess benefits, harms, and cost-effectiveness in real-world settings, and to identify subgroups for whom the intervention is most effective. Cluster Randomised Trials (CRTs) have emerged as an important design for pragmatic trials. However, a number of methodological challenges must be addressed for CRTs to fulfil their promise, including enhancing their generalisability, reducing bias, improving precision, and identifying individualised intervention effects. In this fellowship I will gain theoretical and practical training in the emerging area of machine learning (ML) to address these challenges. To achieve this, I will:
1. Develop a framework to emulate CRTs from observational studies to evaluate the effect of cluster-level interventions and compare parametric and non-parametric methods, including ML approaches, to estimate inverse-probability-weights to address confounding;
2. Provide practical guidance for researchers on how to exploit ML to improve the selection and adjustment for covariates in CRTs, in order to both increase precision by reducing the intra-cluster correlation and to minimise bias by adjusting for confounding;
3. Propose and evaluate ML methods to study heterogeneity in intervention effects in CRTs and estimate subgroup effects while maintaining valid confidence intervals.
These objectives will be achieved by developing, extending and assessing ML methods using a mixture of theory, simulation studies and applications to case studies, including CRTs in pharmacy and psychiatry and observational data from Cancer Registries.
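As an illustration of the inverse-probability weighting mentioned in objective 1, the following is a minimal sketch rather than the fellowship's actual method. It assumes a hypothetical toy dataset with one row per cluster, a single binary cluster-level confounder (`urban`), and a non-parametric propensity estimate (the proportion of treated clusters within each confounder stratum). All names and numbers are invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy data, NOT taken from the fellowship datasets.
# One tuple per cluster: (urban, treated, mean_outcome).
# Outcomes were built so that treatment adds 2 and being urban adds 3.
CLUSTERS = [
    (1, 1, 15.0), (1, 1, 15.0), (1, 1, 15.0), (1, 0, 13.0),
    (0, 1, 12.0), (0, 0, 10.0), (0, 0, 10.0), (0, 0, 10.0),
]


def stratum_propensities(clusters):
    """Estimate P(treated | urban) as the treated proportion in each stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_treated, n_total]
    for urban, treated, _ in clusters:
        counts[urban][0] += treated
        counts[urban][1] += 1
    return {stratum: t / n for stratum, (t, n) in counts.items()}


def naive_difference(clusters):
    """Unadjusted difference in mean cluster outcomes (treated minus control)."""
    treated = [y for _, a, y in clusters if a == 1]
    control = [y for _, a, y in clusters if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_difference(clusters):
    """Weighted difference in means using inverse-probability weights.

    Assumes every stratum contains both treated and control clusters
    (positivity); otherwise a weight would be undefined.
    """
    propensity = stratum_propensities(clusters)
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # arm -> [sum of w*y, sum of w]
    for urban, treated, y in clusters:
        p = propensity[urban]
        weight = 1.0 / p if treated else 1.0 / (1.0 - p)
        sums[treated][0] += weight * y
        sums[treated][1] += weight
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]


if __name__ == "__main__":
    print("naive difference:", naive_difference(CLUSTERS))
    print("IPW difference:  ", ipw_difference(CLUSTERS))
```

In this constructed example the unadjusted comparison is distorted because urban clusters are more often treated, while the weighted comparison recovers the effect built into the toy data. With many covariates, the stratum proportions would typically be replaced by a parametric or ML-based propensity model, which is the comparison objective 1 describes.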
Planned Impact
During this fellowship, I will develop novel statistical methodology using a variety of methods, including machine learning (ML), to enhance the design and analysis of cluster randomised trials (CRTs). This research will focus in particular on reducing bias and increasing the precision of CRTs, identifying subgroups of patients who benefit most from the intervention, and using external routinely collected data to better inform the design of CRTs.
Immediate beneficiaries of the project include researchers involved in planning and running CRTs, as well as statisticians with an interest in the methodology of CRTs. This research will provide tools and information which will enable researchers to harness the potential of ML to improve the efficiency of their trials.
Patients would also be major beneficiaries of this research. Designing more efficient CRTs and gaining statistical power with the use of ML algorithms can lead to trials conducted on smaller sample sizes, meaning that fewer patients would be exposed to potentially harmful interventions. Furthermore, smaller trials are usually shorter, which would reduce the delay between the start of the CRT and the moment the treatment or intervention studied is provided to the patients in practice. As such, patients would be treated more quickly. Smaller and shorter trials are also less costly, which will also be beneficial to trial funders, such as research councils and charities. My third objective, which focuses on the identification of subgroup effects, will be a step towards a more personalised medicine. Therefore, my research could allow patients to receive the optimal treatment based on their individual characteristics.
Through the development of a new framework for the emulation of CRTs from observational data, this work will take advantage of the wealth of routinely collected medical information to evaluate the effect of interventions in real-life settings, making use of existing and usually large datasets rather than collecting new information. This will allow researchers to obtain accurate results in a more timely manner and to replace the expensive pilot CRTs usually conducted before the start of a larger-scale CRT.
People | Clemence Leyrat (Principal Investigator / Fellow) |
Publications
Besançon L
(2020)
Open Science Saves Lives: Lessons from the COVID-19 Pandemic
Besançon L
(2021)
Open science saves lives: lessons from the COVID-19 pandemic.
in BMC medical research methodology
Bettega F
(2022)
Application of Inverse-Probability-of-Treatment Weighting to Estimate the Effect of Daytime Sleepiness in Patients with Obstructive Sleep Apnea
in Annals of the American Thoracic Society
Bhate K
(2023)
Long-term oral antibiotic use in people with acne vulgaris in UK primary care: a drug utilization study.
in The British journal of dermatology
Billot L
(2024)
How should a cluster randomized trial be analyzed?
in Journal of Epidemiology and Population Health
Blake H
(2020)
Estimating treatment effects with partially observed covariates using outcome regression with missing indicators
in Biometrical Journal
Brown JP
(2023)
Association Between Fluoroquinolone Use and Hospitalization With Aortic Aneurysm or Aortic Dissection.
in JAMA cardiology
Description | Member of the DSMC of a stepped-wedge cluster trial |
Geographic Reach | Europe |
Policy Influence Type | Participation in a guidance/advisory committee |
URL | https://clinicaltrials.gov/ct2/show/NCT03892148 |
Description | ROBEST: Ensuring robustness of evidence in public health research for increased policy impact: widened use of advanced causal inference techniques |
Amount | £420,279 (GBP) |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 04/2022 |
End | 03/2025 |
Description | Collaboration - Dr Sophie Pilleron |
Organisation | Luxembourg Institute of Health |
Country | Luxembourg |
Sector | Academic/University |
PI Contribution | I contributed to the computing part of the project and performed a simulation study to evaluate the impact of immortal-time bias among older and younger cancer patients. |
Collaborator Contribution | The partners came up with the concept of the study, the hypotheses as well as the data used for the analysis. |
Impact | Pilleron S, Maringe C, Morris EJA, Leyrat C. Immortal-time bias in older vs younger age groups: a simulation study with application to a population-based cohort of patients with colon cancer. Br J Cancer. 2023 Feb 9. doi: 10.1038/s41416-023-02187-0. Multi-disciplinary collaboration involving epidemiologists (SP and EJAM) |
Start Year | 2021 |
Title | R package MatchThem |
Description | This R package aims to facilitate the use of multiple imputation in propensity score matched analyses. |
Type Of Technology | Webtool/Application |
Year Produced | 2020 |
Open Source License? | Yes |
Impact | Not known yet. |
URL | https://cran.r-project.org/web/packages/MatchThem/index.html |
Description | ISCB conference presentation |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation at an international biostatistics conference: LEYRAT C, DIAZORDAZ K, WILLIAMSON E. Covariate adjustment in randomised trials: when and how? 41st Annual Conference of the International Society for Clinical Biostatistics. August 2020, virtual conference. |
Year(s) Of Engagement Activity | 2020 |
URL | https://iscb2020.info/ |
Description | Live discussion on Open Science |
Form Of Engagement Activity | A broadcast e.g. TV/radio/film/podcast (other than news/press) |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | Participation in the live event "Open Science saves lives" on the channel LeGrandLabo on 29 September 2020. This one hour discussion focussed on the need for better research practices (following the Open Science principles) during the COVID-19 pandemic. |
Year(s) Of Engagement Activity | 2020 |
URL | https://www.youtube.com/watch?v=lFtsB-9E5EU |
Description | Presentation at Hôpital Saint-Louis |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation on emulated trials (around 20 attendees), followed by a constructive discussion on next steps and potential collaborations |
Year(s) Of Engagement Activity | 2022 |
Description | Webinar QuanTIM |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | This webinar focused on emulated trials for observational studies and presented some of the work conducted during my fellowship. This led to my invitation to be a keynote speaker in an upcoming conference of the French Society for Pharmacology. |
Year(s) Of Engagement Activity | 2022 |
URL | https://sesstim.univ-amu.fr/fr/content/webinar-quantim-clemence-leyrat |
Description | Website on cluster randomised trials |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Development and maintenance of a website dedicated to cluster randomised trials, hosted by QMUL, London: https://clusterrandomisedtrials.qmul.ac.uk/who-we-are/ The aim is to help clinicians, trialists and statisticians design high quality cluster randomised trials. |
Year(s) Of Engagement Activity | 2020,2021 |
URL | https://clusterrandomisedtrials.qmul.ac.uk/who-we-are/ |