Novel simulation-based statistical inference with applications to epidemic models

Lead Research Organisation: Lancaster University
Department Name: Mathematics and Statistics


Parametric models play a key role in statistical modelling. A parametric model assumes that the observed data arise from an underlying model and depend upon certain parameters and random quantities. For example, for the spread of a disease, the model parameters dictate the infectiveness of the disease, but who becomes infected will depend upon the model setup and on randomness. In practice, we rarely know the parameters of the model, and a key element of statistics is obtaining good estimates of them. In Bayesian statistics the parameters have a posterior distribution, which quantifies the uncertainty in the parameters of the model. By studying the posterior distribution we can calculate any summary statistics of the parameters we are interested in. However, a major drawback of Bayesian statistics is that the posterior distribution is rarely available in a form which we can easily use. There are a number of approaches for obtaining samples from the posterior distribution, the most common of which is Markov chain Monte Carlo (MCMC). Recently a range of practical problems in statistical genetics have been identified where MCMC either cannot be used or is particularly difficult to use. A solution has been provided in the form of the ABC (approximate Bayesian computation) algorithm. The ABC algorithm estimates the parameters by simulating from the model with parameter values chosen via an appropriate mechanism, often the prior distribution. (The prior distribution represents our prior beliefs about the model parameters.) The ABC algorithm formalises the idea that we simulate from the model with different parameter values, accepting those values which lead to simulated data in close agreement with the observed data.
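The accept/reject idea behind ABC can be sketched in a few lines of Python. This is a minimal sketch of basic rejection ABC, not the project's algorithm: the toy model (each of N = 20 individuals independently infected with probability p, with the number infected as the data), the uniform prior and all function names are assumptions chosen for illustration.

```python
import random

def abc_rejection(observed, simulate, sample_prior, distance, epsilon, n_samples, rng):
    """Basic rejection ABC: keep prior draws whose simulated data
    lie within epsilon of the observed data."""
    accepted = []
    while len(accepted) < n_samples:
        theta = sample_prior(rng)          # propose a parameter value from the prior
        simulated = simulate(theta, rng)   # simulate a data set from the model
        if distance(simulated, observed) <= epsilon:
            accepted.append(theta)         # close agreement with the data: accept
    return accepted

# Hypothetical toy model: each of N individuals is independently infected
# with probability p; the data are the number infected.
N = 20

def simulate_infections(p, rng):
    return sum(rng.random() < p for _ in range(N))

def uniform_prior(rng):
    return rng.random()  # p ~ Uniform(0, 1)

samples = abc_rejection(
    observed=8,
    simulate=simulate_infections,
    sample_prior=uniform_prior,
    distance=lambda x, y: abs(x - y),
    epsilon=0,
    n_samples=1000,
    rng=random.Random(1),
)
posterior_mean = sum(samples) / len(samples)
```

Because the toy data are discrete, an exact match (`epsilon=0`) is feasible, and the accepted values are then draws from the exact posterior, Beta(9, 13). For continuous or high-dimensional data a positive tolerance, usually applied to summary statistics, is needed instead.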

Both MCMC and ABC are iterative algorithms that produce a single parameter value from the posterior distribution at each iteration. Recently the investigator has introduced a new ABC algorithm which produces a set of parameter values from the posterior distribution at each iteration. This new ABC algorithm has been shown to be considerably more efficient than standard ABC algorithms, and has been applied straightforwardly to the analysis of epidemic models for the spread of infectious diseases. The aim of the proposed research is two-fold. Firstly, to develop more efficient MCMC and ABC algorithms which obtain sets of values from the posterior distribution. This should lead to more robust parameter estimation, with lower uncertainty in parameter estimates for a sample of a given size. Secondly, to apply the new methods to a range of epidemic models to gain a better understanding of the spread of infectious diseases, and in particular to develop procedures that are easy for non-experts to use and interpret.
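The idea of obtaining a whole set of parameter values from one iteration can be illustrated with a toy model in which it is easy to see. This is a hypothetical illustration of the set-valued idea only, not the investigator's coupled ABC algorithm for household epidemics. Assume each of n individuals is independently infected with probability 1 - exp(-λ), driven by a uniform random number U_i. Individual i is then infected exactly when λ > T_i = -log(1 - U_i), so a single draw of the randomness determines the entire interval of rates λ that reproduce the observed number of infections k. Under a flat (improper) prior on λ, the posterior density is proportional to the chance that λ falls in that interval, so averaging over simulations gives posterior summaries. All names below are assumptions.

```python
import math
import random

def consistent_interval(n, k, rng):
    """One simulation of the randomness; returns the interval (lo, hi] of
    rates lambda for which exactly k of the n individuals are infected."""
    if not 0 <= k < n:
        raise ValueError("need 0 <= k < n so that the interval is bounded")
    # Individual i is infected iff lambda > T_i = -log(1 - U_i).
    thresholds = sorted(-math.log(1.0 - rng.random()) for _ in range(n))
    lo = thresholds[k - 1] if k > 0 else 0.0
    hi = thresholds[k]
    return lo, hi

def posterior_mean_rate(n, k, iterations, rng):
    """Flat prior on lambda > 0: the posterior density is proportional to
    E[indicator(lambda in interval)], so the posterior mean is
    E[length * midpoint] / E[length] over simulated intervals."""
    num = den = 0.0
    for _ in range(iterations):
        lo, hi = consistent_interval(n, k, rng)
        length = hi - lo
        num += length * 0.5 * (lo + hi)
        den += length
    return num / den

n, k = 20, 8   # hypothetical data: 8 of 20 individuals infected
estimate = posterior_mean_rate(n, k, 5000, random.Random(2))
# Exact posterior mean for this toy model under the flat prior,
# for comparison: the sum of 1/j for j = n - k, ..., n.
exact = sum(1.0 / j for j in range(n - k, n + 1))
```

Every simulation contributes information about a continuum of parameter values rather than a single accept/reject decision, which is where the efficiency gain over basic rejection ABC comes from in this toy setting.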

Planned Impact

One of the key aims of the research is to design algorithms which, as well as being efficient, are easy to implement. For example, although the coupled ABC algorithm for household epidemics is non-trivial to code, it is straightforward to apply, in that no fine-tuning is required from the practitioner. The algorithm does the hard work of identifying the sets of parameter values consistent with the data, with the data as the only required input. A prior distribution can be specified, but the coupled ABC algorithm is particularly effective with a uniform (improper) prior. The key goal of this project is to design easy-to-use algorithms wherever possible; that is, a non-expert will be able to input the data and obtain meaningful output without needing to understand the underlying mechanism of the algorithm. The code will also be made available and transparent, so that statisticians can edit and develop it for their own problem-specific applications. In summary, a black-box approach is likely to be helpful to public health practitioners, whereas for more expert users the accessibility of the computer code will be helpful.

The epidemic examples have driven forward the research into the coupled ABC algorithm and will continue to do so here. Public interest in the spread of infectious diseases is substantial, as is readily seen in the extensive media coverage of swine flu, foot-and-mouth disease, SARS and HIV/AIDS, to name but a few. There is considerable interest in, and need for, the development of statistical approaches to analyse the spread of infectious diseases, in particular for the control of epidemic outbreaks and the eradication of endemic diseases. Efficient and accurate estimation of epidemic model parameters can lead to improved and more effective control measures. In other words, a better understanding of disease spread allows cost-effective control strategies to be devised, for example targeted vaccination programmes (swine flu) and targeted culling strategies (foot-and-mouth disease). Those interested in public and animal health include various areas of government, such as DEFRA and local authorities, health services and the Health Protection Agency. Parameter estimation can be informative in its own right, but for many epidemics, especially for diseases in progress (mid-outbreak), what is of greater interest is the long-term behaviour of the disease: how severe will the outbreak be, and will the disease, left unchecked, become endemic? An effective mechanism for assessing the long-term behaviour of a disease is forward simulation of the epidemic process given the model, the parameters and the current state of the population. It is straightforward to extend the current research to incorporate forward simulation of future behaviour, since the parameter estimation techniques are themselves based upon (forward) simulation of the epidemic. Furthermore, with a small amount of additional work, problem-specific code can be developed which also assesses the effects of various control strategies. This will require more input from the practitioner, but the additional inputs will typically be well-known or easily interpretable quantities, such as vaccine efficacy or the details of a new culling or quarantine strategy.
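Forward simulation from the current state of an outbreak can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's code: the model is assumed to be the Markovian SIR (general stochastic epidemic) model with infection rate beta*S*I/N and removal rate gamma*I, the posterior draws are hypothetical placeholders, and vaccination is modelled crudely by moving susceptibles straight to the removed class (that is, assuming a perfectly effective vaccine).

```python
import random

def forward_new_infections(beta, gamma, s, i, r, rng):
    """Simulate a Markovian SIR epidemic forward from the current state
    (s susceptible, i infective, r removed) until no infectives remain,
    and return the number of new infections from now on.
    Only the order of events affects this count, so the embedded jump
    chain is simulated and event times are not tracked. Assumes gamma > 0."""
    n = s + i + r
    s0 = s
    while i > 0:
        infection_rate = beta * s * i / n
        removal_rate = gamma * i
        if rng.random() * (infection_rate + removal_rate) < infection_rate:
            s, i = s - 1, i + 1   # next event is an infection
        else:
            i, r = i - 1, r + 1   # next event is a removal
    return s0 - s

def predictive_new_infections(posterior_draws, state, rng):
    """One forward simulation per posterior draw (beta, gamma), so the
    predictions reflect parameter uncertainty as well as chance."""
    s, i, r = state
    return [forward_new_infections(b, g, s, i, r, rng) for b, g in posterior_draws]

rng = random.Random(3)
draws = [(1.5, 1.0)] * 200   # hypothetical posterior draws, for illustration only
baseline = predictive_new_infections(draws, (990, 10, 0), rng)
# Crude control scenario assuming a perfectly effective vaccine:
# move 400 susceptibles directly into the removed class before simulating.
vaccinated = predictive_new_infections(draws, (590, 10, 400), rng)
prob_large_baseline = sum(z > 100 for z in baseline) / len(baseline)
prob_large_vaccinated = sum(z > 100 for z in vaccinated) / len(vaccinated)
```

Comparing summaries such as `prob_large_baseline` and `prob_large_vaccinated` is one simple way of turning parameter estimates into statements about outbreak severity under different control strategies.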

The impact on the understanding of infectious disease spread will follow quickly from the development of the research. The impact on other fields of research, and hence on the public, is likely to be more long-term. As mentioned in the case for support, the ABC algorithm was introduced for, and has primarily been used in, the analysis of statistical problems in genetics. The proposed work has the potential to assist in the development of efficient algorithms for genetic problems, and so to have an impact there too. The proposed research should also help to make simulation-based inferential methods even more popular, increasing the range of (statistical) applications and thus the impact of the methodology on society.


Description

We have developed algorithms for performing statistical analysis of complex epidemic data sets. The methods developed include algorithms for models which have previously proved problematic to analyse, and efficient methods for analysing temporally observed epidemics. Although the work is motivated by epidemic data sets, the generic algorithms are more widely applicable.

The work has motivated the exploration of independence samplers for epidemic models. This in turn has led to some important new findings on the scaling (that is, the choice) of independence samplers.
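An independence sampler is a Metropolis-Hastings algorithm whose proposal distribution does not depend on the current state of the chain. The following is a minimal generic sketch, not the project's epidemic samplers: the standard normal target, the normal proposal and its scale `sigma` are assumptions for illustration, with `sigma` playing the role of the proposal scaling.

```python
import math
import random

def independence_sampler(log_target, sample_proposal, log_proposal, x0, n_iter, rng):
    """Independence Metropolis-Hastings: every proposal is drawn from the same
    fixed distribution q, regardless of the current state of the chain."""
    x = x0
    log_w = log_target(x) - log_proposal(x)  # log of pi(x) / q(x)
    chain, n_accepted = [], 0
    for _ in range(n_iter):
        y = sample_proposal(rng)
        log_w_y = log_target(y) - log_proposal(y)
        # Accept with probability min(1, w(y) / w(x)), where w = pi / q.
        if log_w_y >= log_w or rng.random() < math.exp(log_w_y - log_w):
            x, log_w = y, log_w_y
            n_accepted += 1
        chain.append(x)
    return chain, n_accepted / n_iter

# Assumed toy target: standard normal (log density up to a constant).
def log_std_normal(x):
    return -0.5 * x * x

# Assumed proposal: N(0, sigma^2), where sigma is the scaling choice.
sigma = 1.5

def sample_proposal(rng):
    return rng.gauss(0.0, sigma)

def log_proposal(x):
    return -0.5 * (x / sigma) ** 2 - math.log(sigma)

chain, acceptance_rate = independence_sampler(
    log_std_normal, sample_proposal, log_proposal, 0.0, 20000, random.Random(4)
)
mean = sum(chain) / len(chain)
variance = sum((x - mean) ** 2 for x in chain) / len(chain)
```

A proposal wider than the target, as here, keeps the weights pi/q bounded; a proposal narrower than the target can leave the chain stuck for long periods in the tails, which is why the choice of scaling matters.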

The project has led to five papers, two of which have been published, with two more accepted for publication.

Exploitation Route

The newly developed algorithms are widely applicable to epidemic data sets. In particular, the efficient MCMC algorithms for temporally observed epidemics are designed to be applied by non-experts (non-statisticians), as they automatically tune themselves to give an efficient algorithm. The forward simulation MCMC algorithm has significant potential as an alternative to the ABC algorithm for a variety of models. A number of further improvements and refinements to these algorithms are possible.
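What "automatically tune themselves" can mean in practice is illustrated by a generic adaptive random-walk Metropolis sampler. This is a minimal sketch and not the project's algorithm: the Robbins-Monro style update of the log step size, the 0.44 target acceptance rate (a common choice for one-dimensional targets), the standard normal target and the function name are all assumptions.

```python
import math
import random

def adaptive_rwm(log_target, x0, n_iter, rng, target_accept=0.44):
    """Random-walk Metropolis whose step size tunes itself: the log step size
    is nudged up after high acceptance probabilities and down after low ones,
    with a gain that shrinks over time (Robbins-Monro style)."""
    x, lp = x0, log_target(x0)
    log_step = 0.0                      # start with step size 1
    chain, n_accepted = [], 0
    for t in range(1, n_iter + 1):
        y = x + math.exp(log_step) * rng.gauss(0.0, 1.0)
        lp_y = log_target(y)
        accept_prob = 1.0 if lp_y >= lp else math.exp(lp_y - lp)
        if rng.random() < accept_prob:
            x, lp = y, lp_y
            n_accepted += 1
        # Diminishing adaptation: the gain 1 / t**0.6 decays towards zero.
        log_step += (accept_prob - target_accept) / t ** 0.6
        chain.append(x)
    return chain, math.exp(log_step), n_accepted / n_iter

# Assumed toy target: standard normal (log density up to a constant).
chain, step, acceptance_rate = adaptive_rwm(
    lambda x: -0.5 * x * x, 0.0, 20000, random.Random(5)
)
```

The user supplies only the target and a starting value; the step size, which would otherwise need hand-tuning, is chosen by the algorithm as it runs.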

The work on independence samplers offers fresh insight into these algorithms, which are applicable in a wide range of settings.

Sectors

Agriculture, Food and Drink; Communities and Social Services/Policy; Environment; Healthcare