Geostatistical design and analysis of randomised evaluations with a geographic basis
Lead Research Organisation:
University of Birmingham
Department Name: Institute of Applied Health Research
Abstract
Studies that randomly allocate individual people to receive either a treatment or an alternative comparator allow us to compare what happens when patients do and do not receive the treatment, and hence to estimate its effect. The success of the randomised study has led to the development of studies that instead randomise groups of people, or "clusters", such as villages, classrooms of children, or residents of nursing homes, to receive an intervention together. Cluster-based studies are useful because, in many contexts, individuals in a cluster are likely to be similar and to interact with one another. An intervention applied to one individual in a cluster could have indirect effects on other members of that cluster. Such effects would undermine a study that randomises individuals, but not one that randomises whole clusters.
Many randomised studies observe study participants or clusters at multiple points in time, perhaps before and after an intervention is applied. The statistical literature has analysed in detail how to handle data that change over time. For example, measurements are likely to be less similar the further apart in time they are taken. Capturing the effects of time is important for making sure our studies are designed well and analysed properly. However, for randomised studies there has been little analysis of how to handle data that vary over space, where the closer things are, the more similar they are likely to be. As a result, there is little guidance on the best design when this is likely to matter.
This project will consider how to design and analyse studies where a "cluster" is created based on where people live, typically by including people close to a possible intervention location. An example would be a study of the effect of installing new wells in a city in a low-income country, with each cluster made up of people who live close to a possible well location. In these studies, space matters. Outcomes measured in people who live near one another are likely to be more similar than if they lived far apart because, for example, people can spread infectious disease to one another. However, we normally assume that it does not matter how far apart the people in a cluster are from one another, nor how far they are from the intervention. This assumption does not necessarily lead to errors in the estimates of an intervention's effects. It can, however, make our estimates less precise than they need to be, requiring larger, more expensive studies. It also means we do not learn how the effect of an intervention changes over space, an important consideration if we want to roll out the intervention in the real world.
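Geostatistical models usually formalise "the closer, the more similar" with a correlation function that decays with distance. The following is a minimal sketch that assumes an exponential correlation function, which is one common choice, and a hypothetical range parameter of 200 metres. Neither of these is a value taken from this project.

```python
import math

def exponential_correlation(distance: float, range_param: float) -> float:
    """Correlation between two outcomes measured `distance` apart.

    Assumes corr(d) = exp(-d / range_param): equal to 1 at d = 0 and
    decaying towards 0 as the two people live further apart.
    """
    if distance < 0 or range_param <= 0:
        raise ValueError("distance must be >= 0 and range_param > 0")
    return math.exp(-distance / range_param)

# Hypothetical range parameter of 200 metres.
for d in (0, 50, 200, 1000):
    print(f"{d:>5} m apart -> correlation {exponential_correlation(d, 200):.3f}")
```

In this model, a cluster whose members live close together has a high average pairwise correlation. Each extra member therefore adds less new information than they would if they lived far apart.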
We will adapt methods from the field of geospatial statistics to develop methods for the spatial design and analysis of cluster trials. Explicitly accounting for space also opens the door to a novel type of randomised study. Instead of randomly assigning patients or clusters to receive an intervention, we randomly choose a location for the intervention. We call this a "spatial trial". It has potential benefits for evaluating how well interventions work in places where natural clusters do not exist, for example where a city is rolling out new wells to numerous locations across its area.
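The following is a minimal sketch of the simplest form of spatial trial. It assumes a rectangular study area with intervention sites placed uniformly at random. Once locations are randomised, a participant's exposure can be summarised by a continuous quantity, such as distance to the nearest site, rather than by a cluster label. The area size, the number of sites, and the function names are hypothetical.

```python
import math
import random

def random_locations(n_sites, width, height, rng):
    """Draw intervention sites uniformly at random over a width x height rectangle."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n_sites)]

def distance_to_nearest(point, sites):
    """Euclidean distance from a participant's home to the closest intervention site."""
    return min(math.dist(point, site) for site in sites)

rng = random.Random(2023)  # fixed seed so the random allocation can be reproduced
sites = random_locations(5, width=1000.0, height=1000.0, rng=rng)
homes = [(rng.uniform(0, 1000), rng.uniform(0, 1000)) for _ in range(3)]
for home in homes:
    print(round(distance_to_nearest(home, sites), 1))
```

A restricted version of the design, for example one that enforces a minimum spacing between sites, would replace the uniform draw in `random_locations`.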
Our work is primarily statistical and consists of analysing how different statistical models perform in randomised study designs. To enable the use of the new methods, we will produce software that runs in standard statistical packages. We will also make detailed documentation and examples available online. We see particular benefit for "implementation science" research, which studies what happens when interventions are delivered in the "real world". Our work will help to design ways of rolling out these interventions so that their effects can be reliably measured. More broadly, any academic field that designs studies of interventions over an area will benefit, including agriculture, economics, and ecology.
Technical Summary
This project will develop methods for the design and analysis of randomised trials with a geographical basis. These are studies that include individuals on the basis of their location and in which an intervention has a specific location. Cluster randomised trials are the most common design for these evaluations. Spatial effects are typically ignored in these designs: neither spatial correlation nor spatially varying intervention effects are considered. This can lead to inefficient cluster design, treatment effects that vary arbitrarily with cluster design, and inefficient sampling of participants, among other problems. We will develop and evaluate methods for the design and analysis of cluster trials that explicitly account for spatial effects. We will also develop and assess a novel study design in which intervention locations are randomised across a continuous area.
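The design effect of a cluster mean shows why cluster design matters under spatial correlation. Suppose R is the correlation matrix among the m people sampled. The cluster mean then has variance σ²·ΣᵢΣⱼRᵢⱼ/m², so its design effect relative to independent sampling is ΣᵢΣⱼRᵢⱼ/m. When every pair of people has the same correlation ρ, this reduces to the familiar 1 + (m - 1)ρ.

The sketch below assumes that a fixed share of outcome variance is spatially structured with exponential correlation. It compares two hypothetical 25-home layouts. It illustrates the idea only and is not the estimator or design method developed in WP1.

```python
import math
from itertools import product

def design_effect(locations, range_param, spatial_share):
    """Variance inflation of a cluster mean relative to independent sampling.

    Assumed correlation between distinct people i and j:
        spatial_share * exp(-d_ij / range_param)
    Each person's correlation with themselves is 1. The cluster mean then has
    variance (sigma^2 / m^2) * sum_ij R_ij, so the design effect is sum_ij R_ij / m.
    """
    m = len(locations)
    total = 0.0
    for i, a in enumerate(locations):
        for j, b in enumerate(locations):
            if i == j:
                total += 1.0
            else:
                total += spatial_share * math.exp(-math.dist(a, b) / range_param)
    return total / m

def grid(n_side, spacing):
    """Homes on an n_side x n_side square grid with the given spacing (metres)."""
    return [(i * spacing, j * spacing) for i, j in product(range(n_side), repeat=2)]

compact = grid(5, spacing=25.0)     # 25 homes within a 100 m square
dispersed = grid(5, spacing=250.0)  # 25 homes within a 1 km square
print(design_effect(compact, range_param=200.0, spatial_share=0.3))
print(design_effect(dispersed, range_param=200.0, spatial_share=0.3))
```

Under these assumptions, the compact layout has the larger design effect. Sampling people who live further apart yields more independent information per cluster. How far from an intervention people can usefully be sampled, however, also depends on how its effect changes with distance.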
WP1 will adapt geostatistical models to the cluster randomised trial and develop methods for treatment effect estimation, power and sample size analysis, cluster design, and approaches to contamination. We will evaluate the methods using simulated data.
WP2 will extend the methods of WP1 to develop the "spatial trial" design in which interventions are randomly located or randomised to fixed locations or to a specific sequence of locations over time. We will document the different variations of spatial trial design and examine power and sample size, treatment effect estimators, restricted randomisation schemes, and efficient sampling methods. We will evaluate the methods using simulated data.
WP3 will develop a software package implementing the methods with accompanying documentation for public release.
WP4 will re-analyse a cluster trial of a sanitation intervention in urban and rural areas. We will also work with academic and implementation partners to provide training in these methods and develop proposals for trials using these methods.
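The restricted randomisation schemes mentioned for WP2 limit the random draw to allocations that meet a balance criterion. One widely used version for cluster trials is covariate-constrained randomisation. It is sketched below for fixed candidate sites and a single baseline covariate. The covariate values, the 10% cut-off, and the function name are hypothetical, and this is not a scheme specified by the project.

```python
import random
from itertools import combinations

def constrained_allocations(baseline, n_treated, keep_fraction):
    """Enumerate all ways to treat n_treated units and keep the best-balanced ones.

    Balance is the absolute difference in mean baseline covariate between arms.
    """
    units = range(len(baseline))
    scored = []
    for treated in combinations(units, n_treated):
        t = set(treated)
        treated_mean = sum(baseline[i] for i in t) / n_treated
        control_mean = sum(baseline[i] for i in units if i not in t) / (len(baseline) - n_treated)
        scored.append((abs(treated_mean - control_mean), treated))
    scored.sort()  # best balance first
    n_keep = max(1, int(len(scored) * keep_fraction))
    return [treated for _, treated in scored[:n_keep]]

# Hypothetical baseline prevalence (%) at eight candidate intervention sites.
baseline = [12.0, 15.5, 9.8, 20.1, 14.2, 11.7, 18.3, 10.4]
acceptable = constrained_allocations(baseline, n_treated=4, keep_fraction=0.1)
rng = random.Random(42)
chosen = rng.choice(acceptable)  # the actual random draw, from the acceptable set only
print(len(acceptable), chosen)
```

Enumerating every allocation is only feasible for a small number of sites. With more sites, one would sample allocations at random instead. The analysis should then reflect the restricted set of allocations, for example through a permutation test over that set.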
Publications
Watson SI (2022). Efficient design of geographically-defined clusters with spatial autocorrelation. Journal of Applied Statistics.
Lilford R (2021). Methodological issues in economic evaluations of emergency transport systems in low-income and middle-income countries. BMJ Global Health.
Watson SI (2023). Evaluation of combinatorial optimisation algorithms for c-optimal experimental designs with correlated observations. Statistics and Computing.
Watson SI (2023). Optimal study designs for cluster randomised trials: An overview of methods and results. Statistical Methods in Medical Research.
Watson SI (2021). Design and analysis of three-arm parallel cluster randomized trials with small numbers of clusters. Statistics in Medicine.
Description | D-STRESS |
Organisation | King's College London |
Department | King's Centre for Global Health |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | I am the lead statistician on this NIHR-funded project, which is a cluster randomised trial of a mental health intervention for people with diabetes. Our proposed trial design and analysis use state-of-the-art, rigorous methods that have been developed and refined by our research team. |
Collaborator Contribution | Our collaborators provide the clinical and other non-statistical expertise to the partnership. |
Impact | The funding has only just started, so no outputs are available to report yet. The collaboration is multidisciplinary and includes experts in endocrinology, qualitative and mixed-methods research, and mental health. |
Start Year | 2023 |
Title | clustertrial.app |
Description | clustertrial.app is an online, standalone application that provides a suite of design and analysis tools for cluster randomised trials. Before this application, new methods to estimate power, sample size, or optimal designs for cluster randomised trials were available only through a large number of "Shiny" apps, each with a narrow aim. The new software unifies all of these earlier methods. It adds analyses such as small-sample corrections and optimal design algorithms, as well as new tools not available elsewhere. |
Type Of Technology | Webtool/Application |
Year Produced | 2023 |
Open Source License? | Yes |
Impact | The software is available online to all researchers and ensures easy access to methods for the accurate estimation of sample size and power. |
URL | http://www.clustertrial.app |
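The following sketch illustrates the kind of calculation that tools like clustertrial.app automate; the app's own methods are not reproduced here. The function gives the approximate power of a two-arm parallel cluster trial comparing means. It assumes a normal approximation and the standard design effect 1 + (m - 1)ρ for clusters of size m with intracluster correlation ρ. It omits the small-sample corrections that matter when there are few clusters. The function name and the parameter values are hypothetical.

```python
from statistics import NormalDist

def crt_power(delta, sd, icc, clusters_per_arm, cluster_size, alpha=0.05):
    """Approximate power of a two-arm parallel cluster trial comparing means.

    Normal approximation only: no small-sample (e.g. t-based) correction.
    """
    design_effect = 1 + (cluster_size - 1) * icc
    n_per_arm = clusters_per_arm * cluster_size
    se = (2 * sd**2 * design_effect / n_per_arm) ** 0.5  # SE of the difference in means
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)          # two-sided critical value
    return NormalDist().cdf(abs(delta) / se - z_crit)

# Hypothetical scenario: effect of 0.3 SD, ICC 0.05, 20 people per cluster.
for k in (5, 10, 20):
    print(k, round(crt_power(delta=0.3, sd=1.0, icc=0.05, clusters_per_arm=k, cluster_size=20), 3))
```

Setting `cluster_size=1` and `icc=0` recovers the familiar individually randomised calculation. For example, 64 people per arm give roughly 80% power for a difference of 0.5 SD.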
Title | crctStepdown |
Description | This is an R package that provides permutation tests, with and without multiple testing corrections, for generalised linear mixed models and generalised linear models. |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | This package provides a set of novel methods including permutation test based confidence intervals and multiple testing corrections for generalised linear mixed models. |
URL | https://cran.r-project.org/web/packages/crctStepdown/index.html |
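The following is a minimal sketch of the idea behind a permutation test in a cluster trial. The treatment labels are re-randomised across clusters, and the observed difference is compared with the resulting distribution. The sketch works on hypothetical cluster-level means and assumes simple (unrestricted) randomisation of clusters. It is not crctStepdown's interface, which works with fitted generalised linear (mixed) models and adds multiple testing corrections.

```python
import random
from statistics import mean

def permutation_test(cluster_means, treated, n_perm, rng):
    """Two-sided Monte Carlo permutation p-value for a difference in cluster means.

    The observed treated-minus-control difference is compared with its
    distribution over random re-allocations of the treatment labels.
    """
    def diff(labels):
        t = [y for y, z in zip(cluster_means, labels) if z]
        c = [y for y, z in zip(cluster_means, labels) if not z]
        return mean(t) - mean(c)

    observed = diff(treated)
    labels = list(treated)  # copy, so the caller's allocation is not shuffled
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # Small tolerance so floating-point ties with the observed value count as extreme.
        if abs(diff(labels)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)

cluster_means = [0.42, 0.38, 0.51, 0.47, 0.30, 0.29, 0.35, 0.33]  # hypothetical outcomes
treated = [True, True, True, True, False, False, False, False]
obs, p = permutation_test(cluster_means, treated, n_perm=2000, rng=random.Random(1))
print(round(obs, 4), round(p, 3))
```

Adding 1 to the numerator and denominator keeps the sampled p-value strictly positive. This is a standard convention when permutations are drawn at random rather than enumerated.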
Title | glmmrBase |
Description | glmmrBase is a package for the R statistical programming language, which is available on CRAN. The package provides a suite of tools for the specification, analysis, and fitting of generalised linear mixed models. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | The software provides the first general implementation of Markov chain Monte Carlo maximum likelihood (MCML) model fitting for generalised linear mixed models. It also unifies a range of design analysis tools for a broad range of models. |
URL | https://cran.r-project.org/web/packages/glmmrBase/index.html |
Title | glmmrOptim |
Description | glmmrOptim is an R package that builds on glmmrBase, and that implements a wide range of algorithms to identify c-optimal experimental designs for generalised linear mixed models. |
Type Of Technology | Software |
Year Produced | 2023 |
Open Source License? | Yes |
Impact | This is the first package to implement algorithms to identify optimal experimental designs for studies with correlation between experimental units as well as other tools. |
URL | https://cran.r-project.org/web/packages/glmmrOptim/index.html |
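A c-optimal design minimises the variance of a chosen linear combination c'β of model parameters. In the sketch below, this is the variance of the treatment effect, c'(X'Σ⁻¹X)⁻¹c with c = (0, 1). The sketch uses one simple combinatorial heuristic, a "reverse greedy" search. It starts from all candidate observations and repeatedly drops the one whose removal increases that variance least. It assumes a linear model with exponentially correlated errors plus a nugget, rather than a GLMM, and uses plain Python linear algebra. The candidate positions and all function names are hypothetical. This is not glmmrOptim's code or interface.

```python
import math

def cholesky(a):
    """Dense lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def solve(L, b):
    """Solve (L L^T) v = b by forward then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    v = [0.0] * n
    for i in reversed(range(n)):
        v[i] = (y[i] - sum(L[k][i] * v[k] for k in range(i + 1, n))) / L[i][i]
    return v

def treatment_variance(idx, positions, treat, range_param, nugget):
    """c'(X' S^-1 X)^-1 c with c = (0, 1) for y = b0 + b1 * treat + error."""
    cov = [[math.exp(-abs(positions[i] - positions[j]) / range_param)
            + (nugget if i == j else 0.0) for j in idx] for i in idx]
    L = cholesky(cov)
    x = [float(treat[i]) for i in idx]
    w0, w1 = solve(L, [1.0] * len(idx)), solve(L, x)  # S^-1 1 and S^-1 x
    m00, m01 = sum(w0), sum(w1)                        # 1'S^-1 1 and 1'S^-1 x
    m11 = sum(a * b for a, b in zip(x, w1))            # x'S^-1 x
    det = m00 * m11 - m01 * m01
    return math.inf if det <= 1e-12 else m00 / det     # inf if one arm is missing

def reverse_greedy(positions, treat, size, range_param, nugget):
    """Drop, one at a time, the candidate whose removal raises the variance least."""
    current = list(range(len(positions)))
    while len(current) > size:
        drop = min(current, key=lambda r: treatment_variance(
            [i for i in current if i != r], positions, treat, range_param, nugget))
        current.remove(drop)
    return current, treatment_variance(current, positions, treat, range_param, nugget)

positions = [float(p) for p in range(10)]  # hypothetical candidate sites on a transect
treat = [0] * 5 + [1] * 5                  # sites 5-9 lie in the intervention area
design, var = reverse_greedy(positions, treat, size=6, range_param=3.0, nugget=0.5)
full_var = treatment_variance(list(range(10)), positions, treat, 3.0, 0.5)
print(design, round(var, 4), round(full_var, 4))
```

Greedy heuristics such as this one are fast but are not guaranteed to find the globally optimal subset.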
Title | rbobyqa |
Description | This is an R package that provides C++ headers implementing the BOBYQA algorithm for derivative-free function minimisation. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | Used in a wide variety of other packages involving model fitting. |
URL | https://cran.r-project.org/web/packages/rbobyqa/index.html |
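BOBYQA (Bound Optimization BY Quadratic Approximation, due to M. J. D. Powell) minimises a function subject to box bounds using only function values. It does this by building quadratic models within a trust region. To illustrate derivative-free minimisation in a few lines, the sketch below swaps in a much simpler method, bounded compass (coordinate pattern) search. This method is not BOBYQA and is far less efficient. The function name and its defaults are hypothetical.

```python
def compass_search(f, x0, lower, upper, step=0.5, tol=1e-6, max_iter=10_000):
    """Minimise f within box bounds using only function evaluations (no gradients)."""
    clip = lambda v, lo, hi: min(max(v, lo), hi)
    x = [clip(v, lo, hi) for v, lo, hi in zip(x0, lower, upper)]
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            # Try a step up, then down, along coordinate i, staying inside the bounds.
            for direction in (1.0, -1.0):
                trial = list(x)
                trial[i] = clip(x[i] + direction * step, lower[i], upper[i])
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
                    break
        if not improved:
            step /= 2  # no coordinate move helped: refine the step size
    return x, fx

# Example: a smooth bowl with its minimum at (1, -2).
x_best, f_best = compass_search(lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2,
                                x0=[0.0, 0.0], lower=[-5.0, -5.0], upper=[5.0, 5.0])
print(x_best, f_best)
```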
Title | rts2 |
Description | There is a wide variety of sources of finely spatially resolved, time-stamped case data that can be used to monitor disease epidemiology in real time. These data may include daily hospital admissions, positive tests, or calls to public health telephone services. Geospatial statistical models provide a principled basis for generating predictions of disease epidemiology and its spatial and temporal distribution from these data. rts2 provides a set of functions to conduct disease surveillance using a real-time feed of spatial or spatio-temporal case data. The package includes Bayesian and maximum likelihood methods, giving 20 different approaches to estimating the model. It also provides new functionality for spatially aggregated data. The software is now available on CRAN as version 0.7.2. |
Type Of Technology | Webtool/Application |
Year Produced | 2020 |
Open Source License? | Yes |
Impact | The software is currently being used by local authorities in the Midlands and the North East to support disease surveillance platforms for Covid-19. We are in discussion with local and national partners in Sub-Saharan Africa (Kenya, Nigeria, and Malawi) and South Asia (Bangladesh) about incorporating the software into local disease surveillance platforms for malaria and other conditions. Our team has used the software to analyse Covid-19 data in Birmingham and to support other researchers working with these data. We have also used it to analyse malnutrition outcomes in Birmingham for forthcoming research. These tools also form the basis for a proposed project to map leprosy incidence in India. |
URL | https://cran.r-project.org/web/packages/rts2/index.html |
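Surveillance models of this kind are commonly fitted to case counts aggregated to a grid or to areas, rather than to raw case points. The following is a minimal, hypothetical pre-processing step that bins time-stamped case locations into counts per grid cell per day. The projected coordinates in metres, the 250 m cell size, and the daily time step are assumptions, and this is not rts2's interface.

```python
from collections import Counter
from datetime import date

def grid_counts(cases, cell_size):
    """Count cases per (grid cell, day) from (x, y, day) records.

    x and y are assumed to be projected coordinates in metres; the cell index is
    the floor of each coordinate divided by the cell size.
    """
    counts = Counter()
    for x, y, day in cases:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[(cell, day)] += 1
    return counts

# Hypothetical case records: (easting, northing, date of report).
cases = [
    (120.0, 80.0, date(2021, 1, 4)),
    (140.0, 95.0, date(2021, 1, 4)),
    (610.0, 330.0, date(2021, 1, 4)),
    (130.0, 60.0, date(2021, 1, 5)),
]
for (cell, day), n in sorted(grid_counts(cases, cell_size=250.0).items()):
    print(day.isoformat(), cell, n)
```

Cells and days with no cases are simply absent from the `Counter`. A model fitted to these counts would usually need them filled in as explicit zeros.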
Title | sparseChol |
Description | This is an R package that provides a C++ implementation of algorithms to compute the Cholesky decomposition of sparse matrices. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | This package supports sparse matrix methods in several other R packages. |
URL | https://cran.r-project.org/web/packages/sparseChol/index.html |
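The Cholesky decomposition factors a symmetric positive-definite matrix A as LLᵀ, where L is lower triangular. A common use in geostatistics is simulating spatially correlated outcomes: if z is a vector of independent standard normal draws, Lz has covariance A. This kind of simulated data is used to evaluate the methods. The sketch below uses a dense textbook algorithm in plain Python and assumes an exponential covariance function and hypothetical locations. It is not sparseChol's sparse implementation, which exploits the zero entries of large sparse matrices to save time and memory.

```python
import math
import random

def cholesky(a):
    """Dense lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def simulate_field(points, range_param, rng):
    """One draw of a zero-mean Gaussian field with exponential covariance, as L z."""
    cov = [[math.exp(-math.dist(p, q) / range_param) for q in points] for p in points]
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in points]
    # L is lower triangular, so row i only needs z[0..i].
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(points))]

points = [(0.0, 0.0), (50.0, 0.0), (500.0, 0.0)]  # hypothetical locations in metres
field = simulate_field(points, range_param=200.0, rng=random.Random(7))
print([round(v, 3) for v in field])
```

Across repeated draws, the values at the two points 50 m apart tend to be much more alike than either is to the point 500 m away.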