Geostatistical design and analysis of randomised evaluations with a geographic basis

Lead Research Organisation: University of Birmingham
Department Name: Institute of Applied Health Research

Abstract

Studies that randomly allocate individual people to receive a treatment or an alternative comparator allow us to compare what happens when patients do and do not receive the treatment, and hence to estimate its effect. The success of the randomised study has led to the development of studies that instead randomise groups of people, or "clusters", such as villages, classrooms of children, or residents of nursing homes, to receive an intervention together. Cluster-based studies are useful because, in many contexts, individuals in a cluster are likely to be similar and to interact with one another. An intervention applied to one individual in a cluster could have indirect effects on other members of that cluster, which would undermine studies that randomise individuals, but not cluster-based ones.

Many randomised studies observe study participants or clusters at multiple points in time, perhaps before and after an intervention is applied. The statistical literature contains a great deal of work on how to handle the way data change over time: measurements are likely to be less similar the further apart in time they are taken, for example. Capturing the effects of time is important for making sure our studies are designed well and analysed properly. However, for randomised studies there has been little work on how to handle data that vary over space, where the closer things are the more similar they are likely to be, and so there is little guidance on the best design when this is likely to matter.

This project will consider how to design and analyse studies where a "cluster" is created based on where people live, typically by including people close to a possible intervention location. An example would be a study of the effect of installing new wells in a city in a low-income country, in which each cluster includes the people who live close to a possible well location. In these studies, space matters. Measurements of outcomes from people who live near one another are likely to be more similar than those from people who live far apart because, for example, people can spread infectious disease to one another. However, we normally assume that it does not matter how far apart the people in a cluster are from one another, nor how far they are from the intervention. While this approach does not necessarily lead to errors in the estimates of an intervention's effects, it can mean we are less precise than we need to be, requiring larger, more expensive studies. It also means we do not learn how the effect of an intervention changes over space, an important consideration if we want to roll out the intervention in the real world.

We will adapt methods from the field of geospatial statistics to develop methods for the spatial design and analysis of cluster trials. Explicitly accounting for space also opens the door to a novel type of randomised study in which, instead of randomly assigning patients or clusters to receive an intervention, we randomly choose a location for an intervention. We call this a "spatial trial" and it has potential benefits for evaluating how well interventions work in places where natural clusters do not exist, for example when a city is rolling out new wells to numerous locations across its area.

Our work is primarily statistical and consists of analysing how different statistical models work in a randomised study design. To enable the use of the new methods we will produce software that will run in standard statistical packages and provide detailed documentation and examples that we will make available online. We see particular benefit for "implementation science" research, which aims to study what happens with "real-world" interventions. Our work will aid in designing ways these interventions can be rolled out so that their effects can be reliably measured. However, any academic field that designs studies of interventions over an area will benefit, including agriculture, economics, and ecology.

Technical Summary

This project will develop methods for the design and analysis of randomised trials with a geographical basis: studies that include individuals on the basis of their location and where an intervention has a specific location. Cluster randomised trials are the most common evaluation design for such studies. Spatial effects are typically ignored in these designs: neither spatial correlation nor spatially varying intervention effects are considered. This can lead to inefficient cluster design, treatment effects that vary arbitrarily with cluster design, and the inefficient sampling of participants, among other problems. We will develop and evaluate methods for the design and analysis of cluster trials that explicitly account for spatial effects and develop and assess a novel study design in which intervention locations are randomised across a continuous area.
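
To illustrate why ignoring spatial correlation can make participant sampling inefficient, the following is a minimal sketch rather than the project's method. It assumes an exponential correlation function, a common choice in geostatistics, with a made-up range parameter `phi`, unit variance, and hypothetical participant coordinates. It compares the variance of a cluster mean when four participants live close together with the variance when they are spread out.

```python
import math

def exp_corr(d, phi):
    """Exponential correlation: 1 at distance 0, decaying towards 0 as d grows."""
    return math.exp(-d / phi)

def mean_variance(points, sigma2, corr):
    """Variance of the sample mean of outcomes observed at `points`,
    given a common variance sigma2 and a pairwise correlation function."""
    n = len(points)
    total = 0.0
    for x1, y1 in points:
        for x2, y2 in points:
            d = math.hypot(x1 - x2, y1 - y2)
            total += sigma2 * corr(d)
    return total / n ** 2

# Hypothetical clusters of four people on a line (arbitrary distance units).
tight = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0)]
spread = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]

phi = 0.5  # assumed range parameter, chosen only for illustration
var_tight = mean_variance(tight, 1.0, lambda d: exp_corr(d, phi))
var_spread = mean_variance(spread, 1.0, lambda d: exp_corr(d, phi))
var_indep = 1.0 / len(tight)  # variance of the mean if outcomes were independent

print(var_tight, var_spread, var_indep)
```

Under these assumptions the tightly packed cluster gives the least precise mean, because its members carry largely overlapping information. A real geostatistical model would usually add further components, such as individual-level noise and covariates.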

WP1 will adapt geostatistical models to the cluster randomised trial and develop methods for treatment effect estimation, power and sample size analysis, cluster design, and approaches to contamination. We will evaluate the methods using simulated data.

WP2 will extend the methods of WP1 to develop the "spatial trial" design in which interventions are randomly located or randomised to fixed locations or to a specific sequence of locations over time. We will document the different variations of spatial trial design and examine power and sample size, treatment effect estimators, restricted randomisation schemes, and efficient sampling methods. We will evaluate the methods using simulated data.
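
As a rough illustration of the "spatial trial" idea, the sketch below draws intervention locations uniformly at random over a rectangular study area and records each person's distance to the nearest intervention. Everything here is an assumption made for illustration and is not taken from the project: the rectangular area, the uniform randomisation, the function names, and the exponential distance-decay effect with parameters `beta` and `delta`.

```python
import math
import random

def randomise_locations(n_sites, width, height, rng):
    """Draw intervention locations uniformly at random over a width x height area."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n_sites)]

def nearest_distance(person, sites):
    """Euclidean distance from a person to the closest intervention site."""
    return min(math.hypot(person[0] - sx, person[1] - sy) for sx, sy in sites)

def effect_at(distance, beta=1.0, delta=2.0):
    """Hypothetical treatment effect that decays exponentially with distance."""
    return beta * math.exp(-distance / delta)

rng = random.Random(2024)  # fixed seed so the allocation is reproducible
sites = randomise_locations(3, 10.0, 10.0, rng)

# Hypothetical residents scattered over the same 10 x 10 area.
people = [(rng.uniform(0, 10.0), rng.uniform(0, 10.0)) for _ in range(200)]
distances = [nearest_distance(p, sites) for p in people]
effects = [effect_at(d) for d in distances]
```

In a design of this kind, exposure is a continuous function of distance rather than a binary cluster-level indicator, which is what would allow a spatially varying effect to be estimated.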

WP3 will develop a software package implementing the methods with accompanying documentation for public release.

WP4 will re-analyse a cluster trial of a sanitation intervention in urban and rural areas. We will also work with academic and implementation partners to provide training in these methods and develop proposals for trials using these methods.

Publications

Watson SI (2023) Optimal study designs for cluster randomised trials: An overview of methods and results. Statistical Methods in Medical Research.

 
Title crctStepdown 
Description This is an R package that provides permutation tests with and without multiple test corrections for generalised linear mixed models and generalised linear models. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact This package provides a set of novel methods including permutation test based confidence intervals and multiple testing corrections for generalised linear mixed models. 
URL https://cran.r-project.org/web/packages/crctStepdown/index.html
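
The general idea behind a permutation test in a cluster trial can be sketched as follows. This is not the crctStepdown interface, and it does not fit a generalised linear mixed model. It is a simplified illustration in Python that uses a difference in cluster-level means as the test statistic, with made-up data and hypothetical function names. Treatment labels are re-randomised at the cluster level, and the p-value is the share of re-randomisations that give a statistic at least as extreme as the one observed.

```python
import random
from statistics import mean

def diff_in_means(cluster_means, arms):
    """Difference between the mean outcome of treated (1) and control (0) clusters."""
    treated = [m for m, a in zip(cluster_means, arms) if a == 1]
    control = [m for m, a in zip(cluster_means, arms) if a == 0]
    return mean(treated) - mean(control)

def permutation_p_value(cluster_means, arms, n_perm=2000, seed=1):
    """Two-sided Monte Carlo permutation p-value, permuting arms across clusters."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(cluster_means, arms))
    shuffled = list(arms)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # re-randomise whole clusters, keeping arm sizes fixed
        if abs(diff_in_means(cluster_means, shuffled)) >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)

# Made-up cluster-level mean outcomes for eight clusters, four per arm.
cluster_means = [2.1, 2.4, 1.9, 2.6, 3.4, 3.1, 3.6, 2.9]
arms = [0, 0, 0, 0, 1, 1, 1, 1]

p = permutation_p_value(cluster_means, arms)
print(p)
```

Permuting at the cluster level rather than the individual level mirrors how the trial was randomised, which is what justifies the test without distributional assumptions.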
 
Title glmmrBase 
Description glmmrBase is a package for the R statistical programming language, which is available on CRAN. The package provides a suite of tools for the specification, analysis, and fitting of generalised linear mixed models. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact The software provides the first general implementation of Markov Chain Monte Carlo Maximum Likelihood model fitting for generalised linear mixed models as well as unifying a range of tools for design analysis for a broad range of models. 
URL https://cran.r-project.org/web/packages/glmmrBase/index.html
 
Title glmmrOptim 
Description glmmrOptim is an R package that builds on glmmrBase, and that implements a wide range of algorithms to identify c-optimal experimental designs for generalised linear mixed models. 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact This is the first package to implement algorithms to identify optimal experimental designs for studies with correlation between experimental units as well as other tools. 
URL https://cran.r-project.org/web/packages/glmmrOptim/index.html
 
Title rbobyqa 
Description This is a package for R that provides C++ headers that implement the BOBYQA algorithm for derivative free function minimisation. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact Used in a wide variety of other packages involving model fitting. 
URL https://cran.r-project.org/web/packages/rbobyqa/index.html
 
Title sparseChol 
Description This is an R package that provides a C++ implementation of algorithms to generate the Cholesky decomposition for sparse matrices. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact This package supports sparse matrix methods in several other R packages. 
URL https://cran.r-project.org/web/packages/sparseChol/index.html
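
For readers unfamiliar with the decomposition, the sketch below computes a Cholesky factor in plain Python. It is a dense, textbook version written for illustration only. It is not the sparse algorithm implemented in sparseChol and it does not use that package's interface. It factorises a symmetric positive-definite matrix A into a lower-triangular L with A = L Lᵀ, using a small made-up matrix that contains a zero entry.

```python
import math

def cholesky(a):
    """Dense Cholesky factorisation: returns lower-triangular L with a = L L^T.
    Assumes `a` is a symmetric positive-definite matrix given as a list of rows."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)      # diagonal entry
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]     # below-diagonal entry
    return L

# Made-up symmetric positive-definite matrix with a structural zero at (2, 0).
a = [[4.0, 2.0, 0.0],
     [2.0, 5.0, 3.0],
     [0.0, 3.0, 10.0]]

L = cholesky(a)

# Rebuild a from L to check the factorisation.
rebuilt = [[sum(L[i][k] * L[j][k] for k in range(3)) for j in range(3)]
           for i in range(3)]
```

In this example the zero in the input carries through to the factor. Sparse methods are designed to exploit such zeros and avoid storing or computing entries that are known to be zero, which the dense loop above does not attempt.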