Novel Bayesian methods for comparing and evaluating infectious disease models in the light of partially observed data.

Lead Research Organisation: University of Warwick
Department Name: Statistics

Abstract

Mathematical models for the spread of infectious diseases are used to make predictions of the size of the epidemic, to improve understanding of the mechanisms of transmission and to develop effective strategies for control. For such models to provide useful insights it is vital that they are appropriate for the disease being modelled and well supported by data. This project will develop the techniques necessary to compare and evaluate infectious disease models, even when not all of the information has been observed, as is typically the case in practice.

Historically it has been extremely challenging to fit epidemic models to data because key characteristics of disease transmission, such as infection times and who infected whom, are rarely known. The advent of data imputation techniques such as Markov chain Monte Carlo (MCMC) has made it possible to reconstruct this missing information and successfully fit a single epidemic model to data, albeit at the cost of considerable computational effort. The challenging task of developing tools that enable us to learn from data which models are the most appropriate for each disease is the subject of this project. Three aspects of this question will be addressed: A) model comparison, B) model evaluation and influence, and C) the illustration of these methods through a series of applications.

A. Model comparison techniques allow statisticians to quantify the evidence in favour of competing scientific hypotheses, where each hypothesis can be represented as a different model. For example, we might wish to learn whether every individual in a population is equally infectious or whether there is heterogeneity in infectivity between individuals. Each of these hypotheses can be represented by an epidemic model, and the model that is best supported by the data indicates which hypothesis we should believe. Unfortunately, for models fitted using MCMC it is technically very challenging to calculate the evidence in favour of each model, and so new methods specific to epidemics, as well as refinements to existing generic methods, are needed.

B. Model evaluation techniques can be used to quantify how well (and in what areas) a single model fits the data. Without proper model assessment, model comparison can be misleading if, for example, none of the models being considered adequately explains the data. Influential data are observations that have a disproportionate influence over the fitted model parameters, and therefore over any subsequent predictions made using the fitted model. Currently there are few quantitative tools for identifying influential data in epidemics, even though during the SARS epidemic of 2002-2003 it was observed that a small number of super-spreading events played a key role in the transmission dynamics. Being able to identify when such events occur can have major repercussions for the effectiveness of control policies. This project will develop methods to identify influential individuals and time periods in epidemic data by extending the role of the recently developed Bayesian latent residuals and adapting existing methods used for other types of statistical models.

C. Finally, several notable applications will be considered. The new methodology is particularly important for emerging data, which allows new epidemiological dimensions (e.g. pathogen genetics) to be explored; for emerging diseases, when little is known about the epidemiology of infection; and for uncovering seasonal drivers of infection. Examples from all three application areas will be considered using existing datasets that could yield new insights when analysed with the new methodology. These applications will help to disseminate the new techniques and will ensure that any new methodology can be readily applied in practice.

Technical Summary

Modern computational statistical methods, such as Markov chain Monte Carlo (MCMC), have revolutionised our ability to perform parameter inference for infectious disease models from partially observed data. This project will translate these advances into improvements in methodology for model comparison and model evaluation.

In a Bayesian framework, effective estimation of the marginal likelihood or model evidence is the key requirement to be able to perform model comparisons. This research project will develop importance sampling techniques that efficiently exploit the hierarchical structure of epidemic models to calculate the model evidence from partially observed data. In pilot work using discrete time models and small populations, this approach proved to be an order of magnitude more efficient than the leading alternatives. Important outcomes will be a computationally scalable version of this approach to large populations and a comparison with generic methods.
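As a rough illustration of the idea of estimating model evidence by importance sampling, the following minimal sketch uses a deliberately simple toy model rather than the hierarchical epidemic models described above. Everything in it is an assumption made for illustration: k of n exposed individuals become infected, each independently with probability p; p has a Uniform(0, 1) prior; and the proposal is a hand-picked Beta(5, 8) distribution. In this toy case the evidence is known exactly (1 / (n + 1)), which allows the estimate to be checked. The function names are hypothetical and are not taken from the project's software.

```python
import math
import random


def log_binomial_pmf(k, n, p):
    """Log-probability of k infections among n exposed individuals."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def log_beta_pdf(p, a, b):
    """Log-density of the Beta(a, b) proposal distribution."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(p) + (b - 1) * math.log1p(-p)


def estimate_evidence(k, n, a, b, n_samples, rng):
    """Importance sampling estimate of p(k | model) under a Uniform(0, 1) prior.

    Samples p from the Beta(a, b) proposal and averages the weights
    likelihood * prior / proposal (the uniform prior density is 1).
    """
    total = 0.0
    for _ in range(n_samples):
        p = rng.betavariate(a, b)
        log_weight = log_binomial_pmf(k, n, p) - log_beta_pdf(p, a, b)
        total += math.exp(log_weight)
    return total / n_samples


# Toy data (assumed): 7 of 20 exposed individuals infected.
rng = random.Random(1)
estimate = estimate_evidence(k=7, n=20, a=5.0, b=8.0, n_samples=20000, rng=rng)
exact = 1.0 / 21.0  # closed form for a uniform prior: 1 / (n + 1)
print(f"importance sampling estimate: {estimate:.5f}, exact: {exact:.5f}")
```

The proposal is chosen to be somewhat wider than the posterior so that the importance weights stay bounded. For partially observed epidemics the unobserved infection times would also have to be integrated out, which is where most of the difficulty lies and which this sketch does not attempt.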

Effective model evaluation methods for epidemics are an essential requirement if policy recommendations are to be underpinned by data. This project will extend the recent concept of Bayesian latent residuals to enable the concept of influential data to be defined for partially observed epidemics. Furthermore a direct link will be established between individual data and the posterior distribution by reweighting samples from the posterior. Together these approaches to quantifying the influence of single observations, individuals or time periods will provide a better understanding of how infectious disease model predictions depend on data.
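One common way to link a single observation to the posterior by reweighting is case deletion: posterior samples are given weights proportional to 1 / p(y_i | theta), which approximates the posterior that would have been obtained without observation y_i. The sketch below illustrates this on a toy model and is not the project's method. The assumptions are: daily case counts are independent Poisson draws with a common rate, the rate has a Gamma(1, 1) prior, and the "MCMC output" is replaced by direct draws from the conjugate Gamma posterior. The data and function names are hypothetical.

```python
import math
import random


def poisson_log_pmf(y, lam):
    """Log-probability of count y under a Poisson distribution with rate lam."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def case_deleted_means(samples, data):
    """Estimate, for each observation, the posterior mean with it removed.

    Each posterior sample is reweighted by 1 / p(y_i | lam); the weights are
    self-normalised, and shifted by their maximum in log space for stability.
    """
    means = []
    for y in data:
        log_w = [-poisson_log_pmf(y, lam) for lam in samples]
        shift = max(log_w)
        w = [math.exp(lw - shift) for lw in log_w]
        means.append(sum(wi * lam for wi, lam in zip(w, samples)) / sum(w))
    return means


# Toy data (assumed): 20 daily counts, one unusually large (index 9).
daily_cases = [3, 2, 4, 3, 2, 3, 4, 2, 3, 9, 3, 2, 4, 3, 2, 3, 4, 3, 2, 3]

# Stand-in for MCMC output: exact draws from the conjugate Gamma posterior.
rng = random.Random(2)
post_shape = 1.0 + sum(daily_cases)
post_rate = 1.0 + len(daily_cases)
samples = [rng.gammavariate(post_shape, 1.0 / post_rate) for _ in range(20000)]

full_mean = sum(samples) / len(samples)
deleted = case_deleted_means(samples, daily_cases)
influence = [abs(full_mean - m) for m in deleted]
most_influential = max(range(len(daily_cases)), key=lambda i: influence[i])
print(f"most influential day: {most_influential}, "
      f"posterior mean without it: {deleted[most_influential]:.3f}")
```

In this conjugate toy model the case-deleted posterior mean is available in closed form, so the reweighted estimate can be checked directly. The weights become very variable when an observation is highly influential, so in practice the effective sample size of the weights should be monitored.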

To ensure that the proposed methodology remains relevant to epidemiologists, applications in emerging data, emerging diseases and determining the seasonal drivers of infection will be pursued. Specific examples (pneumococcal infection, pandemic influenza and soil-transmitted helminths) will illustrate the research and enable us to disseminate our results widely.

Planned Impact

The cornerstone of the pathways to impact for this multidisciplinary research project is to translate novel statistical methodology for the comparison and evaluation of infectious disease models into insights into the epidemiology of infection. This will be achieved through close collaboration between the project team and leading researchers working on important applications in the areas of emerging data, emerging diseases and the discovery of seasonal drivers of infection. Specific applications will be used to illustrate the new techniques and to produce novel epidemiological insights into these disease systems. These applications will include Streptococcus pneumoniae infection in children, pandemic influenza and soil-transmitted helminths. The investigators will also seek out further applications during the project. The software produced during the project will be made publicly available in the form of an R package. Subsequent applications of the new methodology by other researchers mean that this research has the potential to benefit the study of many infectious diseases of humans, animals and plants. In addition, the new statistical techniques developed in this project may find applications in fields other than epidemiology.

As well as an improved epidemiological understanding of the processes of infection, a more data-driven approach to epidemic modelling will allow more appropriate models to be developed, with more accurate predictions of the likely extent of an outbreak. These improvements in epidemic modelling will enable the design of more effective control strategies, which would ultimately lead to public health benefits, such as a reduced burden of disease in human, animal and plant populations, and greater food security. Finally, by establishing the concept of influence in epidemic data, and by providing tools to quantify the influence of different individuals or different time periods on the inferred disease parameters, this project has the potential to lead to more efficient data collection and improved design of future infectious disease studies.

Publications

Description: Seminar presentation (Oxford BDI)
Form Of Engagement Activity: A talk or presentation
Part Of Official Scheme?: No
Geographic Reach: National
Primary Audience: Professional Practitioners
Results and Impact: Promote novel methodology
Year(s) Of Engagement Activity: 2018