Efficient geostatistical sampling to estimate the fraction of the population recovered from Covid-19
Lead Research Organisation: University of Birmingham
Department Name: Institute of Applied Health Research
Abstract
The Covid-19 pandemic has a long course to run. Its successful management by governments and other international agencies will require statistical tools for real-time monitoring of the evolution of the pandemic over space and time. How Covid-19 spreads across an urban area over time is poorly understood: for example, whether there are small or large numbers of clusters, how large they are spatially, and how rapidly they grow. Understanding local phenomena can also support other research programmes and provide evidence to support future lockdown policies, for example how localised lockdowns need to be (city-wide versus neighbourhoods) and for how long. Local authorities may also use this evidence in support of highly targeted partial lockdown policies (such as differential application of the national Covid alert scale for different areas). Data sources that identify the location of cases can be used to generate predictions of the spread of Covid-19 cases over time and space, which will facilitate the implementation of localised policies to contain the spread of the virus. The aim of this project is to adapt statistical methods for this purpose and develop software for their implementation.
This project will develop software for the real-time surveillance of Covid-19 that can be used with any georeferenced and time-stamped data. We will use data on hospital attendances and admissions for Covid-19 to develop, calibrate, and test our software and models. We will build on state-of-the-art geostatistical software developed by the co-applicants to produce estimates and predictions of incidence or the "R number" across an area of interest based on available data sources. These outputs can also support the design of schemes to sample the population for testing when such programmes are rolled out, for which we will also include functionality.
Technical Summary
The Covid-19 pandemic has a long course to run. Its successful management by governments and other international agencies will require statistical tools for real-time monitoring of the evolution of the pandemic over space and time. The aims of this project are to develop statistical tools and software for real-time Covid-19 surveillance using georeferenced and time-stamped data from Covid-19 cases or testing programmes.
To summarise our statistical approach, we assume that across our area of interest there is an underlying, unobserved prevalence of people who have had the disease. At a single time-point we observe spatially-referenced data, which may include only cases from routine hospital records, or testing data with dichotomous outcomes. Where the data provide both numerators and denominators we will use the binomial generalised geostatistical model. Where only numerators are available we will use a Poisson approximation to the binomial with small-area population counts as offsets. We will include a set of spatially-varying covariates and the unobserved process, which is specified as a spatio-temporal Gaussian process.
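One way to write down the two model variants described above is sketched here. The notation is assumed for illustration and is not taken from the project documents: Y_i is the observed case count at location x_i and time t, n_i the number tested, N_i the small-area population count, d_i the covariate vector, and k an unspecified spatio-temporal covariance function.

```latex
% Illustrative notation only; symbols are assumptions, not the project's own.
\begin{align*}
\text{Binomial (numerators and denominators):}\quad
  & Y_i \mid S \sim \mathrm{Binomial}(n_i, p_i), \\
  & \log\frac{p_i}{1 - p_i} = d_i^\top \beta + S(x_i, t), \\
\text{Poisson (numerators only):}\quad
  & Y_i \mid S \sim \mathrm{Poisson}(\mu_i), \\
  & \log \mu_i = \log N_i + d_i^\top \beta + S(x_i, t), \\
\text{Latent process:}\quad
  & S \sim \mathrm{GP}\bigl(0,\; k\{(x, t), (x', t')\}\bigr).
\end{align*}
```

In the Poisson form the term log N_i enters with a fixed coefficient of one, which is what makes it an offset: the remaining linear predictor then describes the log rate per person, and the approximation to the binomial is reasonable when that rate is small.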
The outputs of our analyses will be predictions of incidence, cumulative incidence, and the effective reproductive rate (the "R number") and their spatial distribution. The output will identify areas where there is a high probability of high reproductive rates (R>1, for example). Unbiased estimates of prevalence at a particular point in time from nationwide testing programmes will be used to correct for the spatial biases that may result from the use of routine, non-randomised data. We will also provide functionality to support random sampling schemes.
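The idea of identifying areas with a high probability of R>1 can be sketched as an exceedance-probability calculation over posterior draws. The example below is a minimal illustration under stated assumptions: the function names, the 0.8 probability cut-off, and the draws themselves are hypothetical and made up for the example, and it does not reproduce the project's software.

```python
def exceedance_probability(draws, threshold=1.0):
    """Fraction of posterior draws strictly above the threshold."""
    if not draws:
        raise ValueError("need at least one posterior draw")
    return sum(1 for r in draws if r > threshold) / len(draws)


def flag_areas(draws_by_area, threshold=1.0, min_probability=0.8):
    """Return {area: P(R > threshold)} for areas meeting the probability cut-off."""
    probs = {
        area: exceedance_probability(draws, threshold)
        for area, draws in draws_by_area.items()
    }
    return {area: p for area, p in probs.items() if p >= min_probability}


# Hypothetical posterior draws of R for three illustrative areas (made-up numbers).
draws_by_area = {
    "area_A": [1.3, 1.1, 1.2, 0.9, 1.4],
    "area_B": [0.7, 0.8, 1.05, 0.9, 0.6],
    "area_C": [1.0, 1.2, 0.95, 1.1, 1.3],
}

flagged = flag_areas(draws_by_area, threshold=1.0, min_probability=0.8)
print(flagged)  # {'area_A': 0.8}
```

Reporting the probability that R exceeds a threshold, rather than a point estimate alone, carries the uncertainty in the prediction through to the decision about which areas to flag.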
Publications
Futter A (2020). Nuclear war, public health, the COVID-19 epidemic: Lessons for prevention, preparation, mitigation, and education. Bulletin of the Atomic Scientists.
Watson S (2020). Randomised evaluation of government health programmes does present a challenge to standard research ethics frameworks. Journal of Medical Ethics.
Watson SI (2021). Real time disease surveillance in R: rts2 vignette.
Watson SI (2020). Revising ethical guidance for the evaluation of programmes and interventions not initiated by researchers. Journal of Medical Ethics.
Watson SI (2021). Spatio-temporal analysis of the first wave of Covid-19 hospitalisations in Birmingham, UK. BMJ Open.
Watson SI (2022). Evaluations of water, sanitation and hygiene interventions should not use diarrhoea as (primary) outcome. BMJ Global Health.
Description | The main outputs of this work have been new statistical methods and tools to support real-time disease surveillance using routinely-collected case data. We have continued to develop and expand the methods developed in this project, and the resulting statistical software is publicly available. We also provided proof-of-principle research showing that routinely collected case data in the NHS could be used to monitor disease risk evolving on short time scales. |
Exploitation Route | We see the outcomes of this research being taken forward in two ways. Academically, the methods used in this project have raised a number of questions about the best statistical approaches to predicting disease risk efficiently. Few comparisons of these approaches currently exist, so future research will be able to better inform practitioners about how to predict risk efficiently with routinely collected data. Non-academically, we wish to apply the statistical and computational methods of this project to real-world health system data to support public health surveillance efforts. We are aiming for our first real-world application to focus on a country in Sub-Saharan Africa where there is a high risk of epidemics of many diseases, and where new electronic health record systems are being integrated into existing services. |
Sectors | Healthcare; Pharmaceuticals and Medical Biotechnology |
URL | https://cran.r-project.org/web/packages/rts2/index.html |
Description | We have investigated integrating the methods and results into public platforms for disease surveillance. We have been discussing with partners in Malawi the development of integrated disease surveillance with routine health data collection, and are seeking funding to support this. We have substantially expanded the software released as part of this project to incorporate novel estimation methods that provide further efficiencies. This research has also contributed to new methodology around using spatially-aggregated data and estimating spatial effects of public health interventions. |
First Year Of Impact | 2021 |
Sector | Healthcare |
Impact Types | Policy & public services |
Description | Integration into Local Public Health Platforms |
Geographic Reach | Local/Municipal/Regional |
Policy Influence Type | Implementation circular/rapid advice/letter to e.g. Ministry of Health |
Impact | The software created as part of this project provides a reliable and efficient method of analysing real-time data on disease incidence. We have supported the ongoing integration of these tools into the surveillance systems for Birmingham and in the North East. The aim is to produce public-facing outputs as well as analyses to support public health policy. |
Description | Geostatistical design and analysis of randomised evaluations with a geographic basis |
Amount | £484,706 (GBP) |
Funding ID | MR/V038591/1 |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 08/2021 |
End | 08/2024 |
Title | rts2 |
Description | A wide variety of finely spatially-resolved and time-stamped sources of case data can be used to monitor disease epidemiology in real time. These data may include daily hospital admissions, positive tests, or calls to public health telephone services. Geospatial statistical models provide a principled basis for generating predictions of disease epidemiology and its spatial and temporal distribution from these data. rts2 provides a set of functions to conduct disease surveillance using a real-time feed of spatial or spatio-temporal data of cases. The package includes Bayesian and maximum likelihood methods resulting in 20 different approaches to estimating the model. It also provides new functionality for spatially aggregated data. The software is now available on CRAN as version 0.7.2. |
Type Of Technology | Webtool/Application |
Year Produced | 2020 |
Open Source License? | Yes |
Impact | The software is currently being used by local authorities in the Midlands and North East to support disease surveillance platforms for Covid-19. We are in discussions with local and national partners in Sub-Saharan Africa (Kenya, Nigeria, and Malawi) and South Asia (Bangladesh) about incorporating the software into local disease surveillance platforms for malaria and other conditions. The software has been used by our team for the analysis of Covid-19 data in Birmingham and to support other researchers looking into these data. We have used the software to analyse malnutrition outcomes in Birmingham for some forthcoming research. These tools also form the basis for a proposed project to map leprosy incidence in India. |
URL | https://cran.r-project.org/web/packages/rts2/index.html |