Geostatistical design and analysis of randomised evaluations with a geographic basis

Lead Research Organisation: University of Birmingham

Department Name: Institute of Applied Health Research

Abstract

Studies that randomly allocate individual people to receive a treatment or an alternative comparator allow us to estimate what does and does not happen when patients receive a treatment and hence estimate its effect. The success of the randomised study has led to the development of studies that instead randomise groups of people, or "clusters", such as villages, classrooms of children, or residents of nursing homes, to receive an intervention together. Cluster-based studies are useful as in many contexts individuals in a cluster will likely be similar and interact with one another. An intervention applied to one individual in a cluster could have indirect effects on other members of that cluster, which would undermine studies that randomise individuals, but not cluster-based ones.

Many randomised studies observe study participants or clusters at multiple points in time, perhaps before and after an intervention is applied. The statistical literature has examined in depth how to handle data that change over time - measurements are likely to be less similar the further apart in time they are taken, for example. Capturing the effects of time is important for making sure our studies are designed well and analysed properly. However, for randomised studies there has been little analysis of how to deal with data that vary over space - the closer things are, the more similar they are likely to be - and so there is little guidance on the best design when this is likely to matter.

This project will consider how to design and analyse studies where a "cluster" is created based on where people live, typically by including people close to a possible intervention location. An example would be a study of the effect of installing new wells in a city in a low-income country, in which each cluster includes the people who live close to a possible well location. In these studies, space matters. Measurements of outcomes from people who live near one another are likely to be more similar than if they lived far apart as, for example, people can spread infectious disease to one another. However, we normally assume that it does not matter how far apart the people in a cluster are from one another nor how far from the intervention they are. While this approach does not necessarily lead to errors in the estimates of an intervention's effects, it can mean we are less precise than we need to be, requiring larger, more expensive studies. It also means we do not learn about how the effect of an intervention changes over space, an important consideration if we want to roll out the intervention in the real world.

We will adapt methods from the field of geospatial statistics to develop methods for the spatial design and analysis of cluster trials. Explicitly accounting for space also opens the door to a novel type of randomised study in which, instead of randomly assigning patients or clusters to receive an intervention, we randomly choose a location for an intervention. We call this a "spatial trial" and it has potential benefits for evaluating how well interventions work in places where natural clusters do not exist. An example would be a city rolling out new wells to numerous locations across its area.

Our work is primarily statistical and consists of analysing how different statistical models work in a randomised study design. To enable the use of the new methods we will produce software that will run in standard statistical packages and provide detailed documentation and examples that we will make available online. We see particular benefit for "implementation science" research, which aims to study what happens with "real-world" interventions. Our work will aid in designing ways these interventions can be rolled out so that their effects can be reliably measured. However, any academic field that designs studies of interventions over an area will benefit, including agriculture, economics, and ecology.

Technical Summary

This project will develop methods for the design and analysis of randomised trials with a geographical basis - studies that include individuals on the basis of their location and where an intervention has a specific location. Cluster randomised trials are the most common method of evaluation for these studies. Spatial effects are typically ignored in these designs: neither spatial correlation nor spatially varying intervention effects are considered. This can lead to inefficient cluster design, treatment effects that vary arbitrarily with cluster design, and the inefficient sampling of participants, among other problems. We will develop and evaluate methods for the design and analysis of cluster trials that explicitly account for spatial effects and develop and assess a novel study design in which intervention locations are randomised across a continuous area.

WP1 will adapt geostatistical models to the cluster randomised trial and develop methods for treatment effect estimation, power and sample size analysis, cluster design, and approaches to contamination. We will evaluate the methods using simulated data.

WP2 will extend the methods of WP1 to develop the "spatial trial" design in which interventions are randomly located or randomised to fixed locations or to a specific sequence of locations over time. We will document the different variations of spatial trial design and examine power and sample size, treatment effect estimators, restricted randomisation schemes, and efficient sampling methods. We will evaluate the methods using simulated data.

WP3 will develop a software package implementing the methods with accompanying documentation for public release.

WP4 will re-analyse a cluster trial of a sanitation intervention in urban and rural areas. We will also work with academic and implementation partners to provide training in these methods and develop proposals for trials using these methods.
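
To illustrate why ignoring spatial correlation can cost precision, the following Python sketch compares the variance of a cluster mean when the same number of people are sampled close together versus spread over a wider area. It assumes an exponential covariance function with illustrative parameter values (`sigma2`, `phi`, `nugget`) and hypothetical function names; it is not the project's software or model, only a minimal example of the general idea.

```python
import numpy as np


def exponential_covariance(coords, sigma2=1.0, phi=0.5):
    """Spatial covariance sigma2 * exp(-distance / phi) for all pairs of points.

    sigma2 and phi are illustrative values, not estimates from any study.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return sigma2 * np.exp(-dist / phi)


def variance_of_mean(cov, nugget=1.0):
    """Variance of the sample mean given spatial covariance plus independent noise."""
    n = cov.shape[0]
    total = cov + nugget * np.eye(n)
    # Var(mean) = 1' Sigma 1 / n^2
    return total.sum() / n ** 2


rng = np.random.default_rng(1)
n = 50
tight = rng.uniform(0.0, 0.2, size=(n, 2))   # participants sampled close together
spread = rng.uniform(0.0, 2.0, size=(n, 2))  # same number sampled over a wider area

var_tight = variance_of_mean(exponential_covariance(tight))
var_spread = variance_of_mean(exponential_covariance(spread))
# Reference value if outcomes were uncorrelated: (sigma2 + nugget) / n
var_independent = (1.0 + 1.0) / n

print(f"tightly sampled cluster: {var_tight:.3f}")
print(f"widely sampled cluster:  {var_spread:.3f}")
print(f"no spatial correlation:  {var_independent:.3f}")
```

Under these assumptions the tightly sampled cluster gives the least precise mean, because nearby participants carry overlapping information. This is one reason the spatial layout of a cluster can affect the required sample size.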

Funded Value: £484,706

Funded Period: Aug 21 - Aug 24

Funder: MRC

Project Status: Closed

Project Category: Research Grant

Project Reference: MR/V038591/1

Principal Investigator: Samuel Watson

Health Category: Unclassified

Organisations

People	ORCID iD
Samuel Watson (Principal Investigator)
Karla Hemming (Co-Investigator)	http://orcid.org/0000-0002-2226-6550
Richard Lilford (Co-Investigator)	http://orcid.org/0000-0002-0634-984X
Semira Manaseki-Holland (Co-Investigator)
Peter Diggle (Co-Investigator)	http://orcid.org/0000-0003-3521-5020

Publications

Hemming K (2025) Guidelines for the content of statistical analysis plans in clinical trials: protocol for an extension to cluster randomized trials. in Trials

I. Watson S (2021) Efficient design of geographically-defined clusters with spatial autocorrelation in Journal of Applied Statistics

Lilford R (2021) Methodological issues in economic evaluations of emergency transport systems in low-income and middle-income countries in BMJ Global Health

Napit IB (2021) An individual randomised efficacy trial of autologous blood products, leukocyte and platelet-rich fibrin (L-PRF), to promote ulcer healing in leprosy in Nepal: the TABLE trial protocol. in Trials

Rego R (2021) The Impact of Diarrhoea Measurement Methods for Under-Fives in Low and Middle Income Countries on Reported Diarrhoea Rates: A Systematic Review and Meta-Analysis of Methodological and Primary Empirical Studies in SSRN Electronic Journal

Shrestha D (2021) Evaluation of a self-help intervention to promote the health and wellbeing of marginalised people including those living with leprosy in Nepal: a prospective, observational, cluster-based, cohort study with controls. in BMC public health

Thompson J (2025) Estimating relative risks and risk differences in randomised controlled trials: a systematic review of current practice. in Trials

Watson S (2022) Low cost and real-time surveillance of enteric infection and diarrhoeal disease using rapid diagnostic tests: A pilot study

Watson S (2022) Evaluation of Combinatorial Optimisation Algorithms for c-Optimal Experimental Designs with Correlated Observations

Watson S (2024) Additional file 1 of Modelling wound area in studies of wound healing interventions

Collaboration

Description	D-STRESS
Organisation	King's College London
Department	King's Centre for Global Health
Country	United Kingdom
Sector	Academic/University
PI Contribution	I am the lead statistician on this NIHR funded project, which is a cluster randomised trial of a mental health intervention for people with diabetes. Our proposed trial design and analysis uses state-of-the-art, rigorous methods, which have been developed and refined by our research team.
Collaborator Contribution	Our collaborators provide the clinical and related non-statistical expertise to the partnership.
Impact	The funding has only just started so no outputs are available to report yet. It is multi-disciplinary including experts in endocrinology, qualitative and mixed research methods, and mental health.
Start Year	2023

Software and Technical Products

Title	clustertrial.app
Description	clustertrial.app is an online, standalone application that provides a suite of design and analysis tools for cluster randomised trials. Prior to this application, access to new methods to estimate power, sample size, or optimal designs for cluster randomised trials was limited to a large number of "Shiny" apps, each with a narrow aim. The new software unifies all the previous methods and provides small sample corrections, optimal design algorithms, and new tools not available elsewhere.
Type Of Technology	Webtool/Application
Year Produced	2023
Open Source License?	Yes
Impact	The software is available online to all researchers and ensures easy access to methods for the accurate estimation of sample size and power.
URL	http://www.clustertrial.app


Title	crctStepdown
Description	This is an R package that provides permutation tests with and without multiple test corrections for generalised linear mixed models and generalised linear models.
Type Of Technology	Software
Year Produced	2021
Open Source License?	Yes
Impact	This package provides a set of novel methods including permutation test based confidence intervals and multiple testing corrections for generalised linear mixed models.
URL	https://cran.r-project.org/web/packages/crctStepdown/index.html
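
For readers unfamiliar with permutation tests in cluster trials, the following Python sketch shows the basic idea: treatment labels are re-shuffled between whole clusters and the observed difference in cluster means is compared with the resulting distribution. It is a generic illustration on simulated data with hypothetical names (`cluster_permutation_test`, `treated_clusters`); it does not reproduce the crctStepdown API, its model-based test statistics, or its multiple testing corrections.

```python
import numpy as np


def cluster_permutation_test(y, cluster, treated_clusters, n_perm=2000, seed=0):
    """Two-sided permutation test using the difference in mean cluster means.

    Labels are permuted between clusters, not individuals, to mirror
    cluster-level randomisation.
    """
    rng = np.random.default_rng(seed)
    ids = np.unique(cluster)
    means = np.array([y[cluster == c].mean() for c in ids])
    is_treated = np.isin(ids, treated_clusters)

    def statistic(mask):
        return means[mask].mean() - means[~mask].mean()

    observed = statistic(is_treated)
    count = 0
    for _ in range(n_perm):
        permuted = rng.permutation(is_treated)  # shuffle cluster-level labels
        if abs(statistic(permuted)) >= abs(observed) - 1e-12:
            count += 1
    # Add one to numerator and denominator so the p-value is never zero
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value


# Simulated example: 20 clusters of 30 people, 10 clusters treated
rng = np.random.default_rng(42)
n_clusters, n_per_cluster = 20, 30
cluster = np.repeat(np.arange(n_clusters), n_per_cluster)
treated_clusters = np.arange(10)
cluster_effect = rng.normal(0.0, 0.5, size=n_clusters)[cluster]
treatment = np.isin(cluster, treated_clusters).astype(float)
y = 5.0 * treatment + cluster_effect + rng.normal(0.0, 1.0, size=cluster.size)

observed, p_value = cluster_permutation_test(y, cluster, treated_clusters)
print(f"observed difference: {observed:.2f}, permutation p-value: {p_value:.4f}")
```

Permuting at the cluster level preserves the within-cluster correlation under the null hypothesis, which is why the test remains valid without a parametric model for that correlation.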


Title	glmmrBase
Description	glmmrBase is a package for the R statistical programming language, which is available on CRAN. The package provides a suite of tools for the specification, analysis, and fitting of generalised linear mixed models.
Type Of Technology	Software
Year Produced	2022
Open Source License?	Yes
Impact	The software provides the first general implementation of Markov Chain Monte Carlo Maximum Likelihood model fitting for generalised linear mixed models as well as unifying a range of tools for design analysis for a broad range of models.
URL	https://cran.r-project.org/web/packages/glmmrBase/index.html


Title	glmmrOptim
Description	glmmrOptim is an R package that builds on glmmrBase, and that implements a wide range of algorithms to identify c-optimal experimental designs for generalised linear mixed models.
Type Of Technology	Software
Year Produced	2023
Open Source License?	Yes
Impact	This is the first package to implement algorithms to identify optimal experimental designs for studies with correlation between experimental units as well as other tools.
URL	https://cran.r-project.org/web/packages/glmmrOptim/index.html


Title	rbobyqa
Description	This is a package for R that provides C++ headers that implement the BOBYQA algorithm for derivative free function minimisation.
Type Of Technology	Software
Year Produced	2022
Open Source License?	Yes
Impact	Used in a wide variety of other packages involving model fitting.
URL	https://cran.r-project.org/web/packages/rbobyqa/index.html


Title	rts2
Description	There are a wide variety of sources of case data that are finely spatially-resolved and time stamped to monitor disease epidemiology in real-time. These data may include daily hospital admissions, positive tests, or calls to public health telephone services. Geospatial statistical models provide a principled basis for generating predictions of disease epidemiology and its spatial and temporal distribution from these data. rts2 provides a set of functions to conduct disease surveillance using a real-time feed of spatial or spatio-temporal data of cases. The package includes Bayesian and maximum likelihood methods resulting in 20 different approaches to estimating the model. It also provides new functionality for spatially aggregated data. The software is now available on CRAN as version 0.7.2.
Type Of Technology	Webtool/Application
Year Produced	2020
Open Source License?	Yes
Impact	The software is currently being used by local authorities in the Midlands and North East to support disease surveillance platforms for Covid-19. We are discussing with local and national partners in Sub-Saharan Africa (Kenya, Nigeria, and Malawi) and South Asia (Bangladesh) about incorporation of the software into local disease surveillance platforms for malaria and other conditions. The software has been used by our team for the analysis of Covid-19 data in Birmingham and to support other researchers looking into these data. We have used the software to analyse malnutrition outcomes in Birmingham for some forthcoming research. These tools also form the basis for a proposed project to map leprosy incidence in India.
URL	https://cran.r-project.org/web/packages/rts2/index.html


Title	sparseChol
Description	This is an R package that provides a C++ implementation of algorithms to generate the Cholesky decomposition for sparse matrices.
Type Of Technology	Software
Year Produced	2022
Open Source License?	Yes
Impact	This package supports sparse matrix methods in several other R packages.
URL	https://cran.r-project.org/web/packages/sparseChol/index.html
