Combining epidemiological and phylogenetic models of infectious disease dynamics
Lead Research Organisation:
University of Cambridge
Department Name: Veterinary Medicine
Abstract
Viruses such as influenza A and HIV-1 mutate extremely rapidly, so the viruses in one individual are genetically different from those in another. While this presents significant hurdles to developing effective vaccines, the genetic variation can be used to determine how closely viruses in different individuals are related, and to generate a 'family tree', or phylogeny, of viruses. Viral phylogenies contain a great deal of information about the past spread of the virus, but this information is difficult to extract, as many factors can affect viral transmission, such as the probability of infection per contact and the duration of infection.
This project will combine epidemiological models of infectious diseases, which are commonly used tools to study how incidence and prevalence change over time, with models of viral evolution. We will endeavour to make our models as biologically realistic as possible, allowing us to consider the different pathways by which viruses may spread over a geographic area, and helping us to understand how genetic changes in the virus may affect its transmission.
Technical Summary
Coalescent models are commonly used to infer the population dynamics of viruses from viral sequence data. This approach is attractive epidemiologically, as information on the past transmission dynamics of a virus can be obtained even from a single, cross-sectional sample of viruses. Coalescent models are also appealing computationally, as only the evolutionary history of the sampled sequences needs to be considered, rather than that of the whole population from which the sample was drawn. However, these coalescent models originate from the study of single populations or species, and while they can be fitted to viral sequence data, the resulting parameter estimates are extremely hard to interpret, as they are far removed from meaningful epidemiological quantities. For example, we have previously shown that the 'effective population size' of a viral epidemic is not, as is commonly assumed, proportional to the number of infected individuals; rather, it depends on both the incidence and the prevalence.
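To illustrate the last point, one widely used approximation for SIR-type epidemics (assumed here; the exact result referred to above may be formulated differently) has each pair of sampled lineages coalescing at rate 2f(t)/I(t)^2, where f(t) is the incidence and I(t) the prevalence, so that the effective population size (strictly, its product with the generation time) behaves like Ne(t) = I(t)^2 / (2f(t)). The minimal Python sketch below integrates a deterministic SIR model with made-up parameters (`BETA`, `GAMMA` and `N` are hypothetical) and reports Ne alongside I.

```python
"""Effective population size versus prevalence in a deterministic SIR model.

Assumes the approximation Ne(t) = I(t)**2 / (2 * f(t)), where
f(t) = BETA * S * I / N is the incidence. Parameter values are illustrative.
"""

BETA, GAMMA, N = 0.5, 0.25, 10_000   # hypothetical rates (per day) and population size


def simulate_sir(i0=10, t_max=60.0, dt=0.01):
    """Integrate the SIR equations by forward Euler; return (t, S, I) samples."""
    s, i, t = float(N - i0), float(i0), 0.0
    out = [(t, s, i)]
    for _ in range(int(round(t_max / dt))):
        incidence = BETA * s * i / N
        recovery = GAMMA * i
        s -= incidence * dt
        i += (incidence - recovery) * dt
        t += dt
        out.append((t, s, i))
    return out


def effective_size(s, i):
    """Coalescent effective size Ne = I^2 / (2 f), with incidence f = BETA*S*I/N."""
    incidence = BETA * s * i / N
    return i * i / (2.0 * incidence)


if __name__ == "__main__":
    for t, s, i in simulate_sir()[::1000]:
        ne = effective_size(s, i)
        print(f"t={t:5.1f}  I={i:8.1f}  Ne={ne:8.1f}  Ne/I={ne / i:5.2f}")
```

Under these assumptions Ne/I equals N/(2 BETA S), which grows as susceptibles are depleted, so Ne is proportional to prevalence only while S remains roughly constant.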
We propose to develop evolutionary models that incorporate explicit models of viral transmission, considering factors such as the geographic spread of viruses, differences in sampling effort, demographic stochasticity and selection. While progress has recently been made in some of these areas, we argue that failing to consider the details of the transmission process may lead to incorrect conclusions. The flexibility needed for greater biological realism comes at a cost: the evolutionary dynamics of the entire population must be considered, using simulation-based techniques. Part of the research towards the main aims therefore involves reducing this computational cost, partly by adopting approaches that parallelize easily and partly through algorithmic development.
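The proposal does not name a specific simulation-based technique. As one illustration, rejection approximate Bayesian computation (ABC) fits a stochastic model by simulating it from many prior draws and keeping the draws whose simulated summary lies close to the observed one; because each simulation is independent, the loop maps directly onto parallel workers. In the Python sketch below, a Reed-Frost chain-binomial model stands in for a full transmission model, and the observed final size, the Uniform(0, 0.05) prior and the tolerance are all hypothetical.

```python
"""Rejection ABC for a Reed-Frost chain-binomial epidemic (illustrative sketch)."""
import random


def reed_frost_final_size(p, n=100, i0=1, rng=random):
    """Simulate a Reed-Frost epidemic; return the total number ever infected."""
    s, i, total = n - i0, i0, i0
    while i > 0 and s > 0:
        q = 1.0 - (1.0 - p) ** i          # per-susceptible infection probability
        new = sum(rng.random() < q for _ in range(s))
        s -= new
        i = new                            # infecteds last a single generation
        total += new
    return total


def _one_draw(args):
    """Draw p from a hypothetical Uniform(0, 0.05) prior and simulate once."""
    seed, n = args
    rng = random.Random(seed)              # independent random stream per draw
    p = rng.uniform(0.0, 0.05)
    return p, reed_frost_final_size(p, n=n, rng=rng)


def abc_rejection(observed, n_draws=2000, tol=5, n=100, seed=1, map_fn=map):
    """Keep prior draws whose simulated final size is within tol of observed.

    Each draw depends only on its own seed, so map_fn can be replaced by a
    parallel map without changing the accepted sample.
    """
    tasks = [(seed * 1_000_003 + k, n) for k in range(n_draws)]
    return [p for p, size in map_fn(_one_draw, tasks) if abs(size - observed) <= tol]


if __name__ == "__main__":
    accepted = abc_rejection(observed=60)  # hypothetical observed final size
    if accepted:
        print(f"{len(accepted)} accepted; posterior mean p = "
              f"{sum(accepted) / len(accepted):.4f}")
```

To run the simulations in parallel, `map_fn` could be the `map` method of a `concurrent.futures.ProcessPoolExecutor` (created under an `if __name__ == "__main__":` guard); since that method preserves order and each draw is seeded independently, the accepted sample matches the serial run.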
Planned Impact
Impact Summary
Development of the models will improve our understanding and interpretation of epidemiological dynamics obtained using viral sequence data. The proposed models will be designed to scale up to large sequence datasets using high-performance, parallel computing facilities, yet will provide a simple interface to scientists for their data analysis. Results will be disseminated through the usual publication routes and conferences, and we will also discuss them directly with our collaborators on other projects.
Wider impact
The emergence or re-emergence of viral species or strains, whether through chance natural events, zoonotic transmission, or selection pressure due to therapeutic drug use or vaccination, can present a significant risk to human and/or animal populations. Consequently, it is important to track the transmission of viral pathogens through both space and time, and the proposed programme enables this. Ultimately, epidemiological insights from analyses with the proposed software can be used by medical professionals and policy makers. By increasing understanding of pathogen evolution, the project may also have an economic impact, helping to reduce the detrimental effects of infectious disease on human and animal health and productivity.
Impact timescales
The immediate impact of this research is likely to be a re-evaluation of previous results. During the methodological development associated with the project, we will also present new results on the molecular epidemiology of HIV, hepatitis C, and West Nile virus. We will release software implementing the methods on a regular, incremental basis over the course of the project, to maximise the exposure of these approaches. We are confident that these methods will be well received by the scientific community, and towards the end of the project we will provide training in their use. Such sessions are likely to be important in assessing the ongoing needs of the scientific community for these methods, beyond those of our own projects.
Publications
Bourret V
(2017)
Adaptation of avian influenza virus to a swine host.
in Virus Evolution
Brayne AB
(2017)
Genotype-Specific Evolution of Hepatitis E Virus.
in Journal of Virology
Dearlove BL
(2016)
Rapid host switching in generalist Campylobacter strains erodes the signal for tracing human infections.
in The ISME Journal
Dearlove BL
(2015)
Measuring Asymmetry in Time-Stamped Phylogenies.
in PLoS Computational Biology
Dearlove BL
(2017)
Biased phylodynamic inferences from analysing clusters of viral sequences.
in Virus Evolution
Eames K
(2015)
Six challenges in measuring contact networks for use in modelling.
in Epidemics
Description | Molecular Epidemiology of Viruses Course |
Geographic Reach | Europe |
Policy Influence Type | Influenced training of practitioners or researchers |
Impact | A course was run at the Gulbenkian Institute near Lisbon, Portugal, in 2015 to train researchers in using the R programming language for molecular epidemiological analysis of viral sequence data. |
URL | http://sdwfrost.github.io/mevr/ |
Description | Cambridge-Africa Alborada Research Fund |
Amount | £12,991 (GBP) |
Organisation | University of Cambridge |
Department | Alborada Research Fund |
Sector | Academic/University |
Country | United Kingdom |
Start | 09/2016 |
End | 09/2017 |
Description | Genetics Society Summer Studentship |
Amount | £1,712 (GBP) |
Organisation | The Genetics Society |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 06/2014 |
End | 09/2014 |
Description | International Exchanges Scheme |
Amount | £6,250 (GBP) |
Funding ID | IE160720 |
Organisation | The Royal Society |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 12/2016 |
End | 11/2017 |
Description | Isaac Newton Trust |
Amount | £34,500 (GBP) |
Funding ID | 16.07(d) |
Organisation | University of Cambridge |
Department | Isaac Newton Trust |
Sector | Academic/University |
Country | United Kingdom |
Start | 06/2016 |
End | 06/2017 |
Description | UK-Indonesia Joint Health Research |
Amount | £266,705 (GBP) |
Funding ID | MR/P017541/1 |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 09/2017 |
End | 09/2019 |
Title | Gillespie.jl |
Description | This is a Julia library to simulate stochastic models (e.g. epidemiological models) in the Julia programming language. It is notable for both its simplicity and speed. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2016 |
Provided To Others? | Yes |
Impact | The paper describing the release of this tool has already been cited twice, and is becoming increasingly used for research purposes. |
URL | http://github.com/sdwfrost/Gillespie.jl |
Title | PDMP.jl |
Description | This is a Julia library for simulating piecewise deterministic Markov processes, a general class of stochastic processes that allows one to simulate, for example, seasonally forced stochastic epidemiological models. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2016 |
Provided To Others? | Yes |
Impact | None as yet |
URL | http://github.com/sdwfrost/PDMP.jl |
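PDMP.jl's own algorithms are not reproduced here. As a minimal Python sketch of the seasonally forced example mentioned above, the code below uses thinning, a standard exact method for jump processes whose rates vary deterministically between events. The cosine forcing function and all parameter values are assumptions for illustration, and the model has no continuous (flow) component, which a general PDMP would allow.

```python
"""Seasonally forced stochastic SIR model simulated exactly by thinning (sketch)."""
import math
import random


def beta_t(t, beta0=0.3, eps=0.2, period=365.0):
    """Hypothetical seasonal transmission rate: beta0 * (1 + eps * cos(2 pi t / period))."""
    return beta0 * (1.0 + eps * math.cos(2.0 * math.pi * t / period))


def forced_sir(n=1000, i0=5, gamma=0.1, beta0=0.3, eps=0.2, t_max=365.0, seed=42):
    """Return the list of (t, S, I, R) states after each accepted event.

    S and I are constant between events, so beta0*(1+eps)*S*I/n + gamma*I
    bounds the total event rate until the next event. Candidate times are
    drawn at that bound and accepted with probability (true rate) / bound.
    """
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    rng = random.Random(seed)
    t, s, i, r = 0.0, n - i0, i0, 0
    events = [(t, s, i, r)]
    while i > 0:
        bound = beta0 * (1.0 + eps) * s * i / n + gamma * i
        t += rng.expovariate(bound)
        if t > t_max:
            break
        infection = beta_t(t, beta0, eps) * s * i / n
        u = rng.random() * bound
        if u < infection:
            s, i = s - 1, i + 1
        elif u < infection + gamma * i:
            i, r = i - 1, r + 1
        else:
            continue                        # rejected candidate: state unchanged
        events.append((t, s, i, r))
    return events


if __name__ == "__main__":
    history = forced_sir()
    print(len(history) - 1, "events; final (t, S, I, R) =", history[-1])
```

Thinning keeps the simulation exact without integrating the time-varying rate, at the price of some rejected candidate events when the forcing is strong.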
Title | distributions.nim |
Description | This is a library for the Nim programming language that provides the basic building blocks for simulating stochastic processes, namely random numbers and distributions. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2016 |
Provided To Others? | Yes |
Impact | None at present |
URL | http://github.com/sdwfrost/distributions |
Title | liblsoda |
Description | This is a refactoring of the widely used LSODA solver for the numerical solution of ordinary differential equations. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2016 |
Provided To Others? | Yes |
Impact | The library has already been incorporated into higher level libraries in R and Julia by other researchers. |
URL | http://github.com/sdwfrost/liblsoda |
Title | libtn93 |
Description | This is a portable C library for calculating genetic distances between sequences according to the TN93 model of sequence evolution. It is notable for its speed and for being usable from high-level languages such as Python, R, and Julia. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2017 |
Provided To Others? | Yes |
Impact | None as yet |
URL | http://github.com/sdwfrost/libtn93 |
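For reference, the TN93 distance can be written in terms of the base frequencies and the observed proportions of A-G transitions (P1), C-T transitions (P2) and transversions (Q). The plain-Python sketch below implements that standard formula; it is not libtn93's interface, and its choices (base frequencies pooled over both sequences, skipping gaps and ambiguity codes, returning infinity for saturated pairs) are assumptions.

```python
"""Tamura-Nei (TN93) distance between two aligned DNA sequences (sketch)."""
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def tn93(seq1, seq2):
    """TN93 distance; sites where either sequence is not A/C/G/T are skipped."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    # Base frequencies pooled over both sequences at the compared sites.
    counts = dict.fromkeys("ACGT", 0)
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    pa, pc, pg, pt = (counts[base] / (2 * len(pairs)) for base in "ACGT")
    if min(pa, pc, pg, pt) == 0:
        raise ValueError("TN93 needs all four bases to be present")
    pr, py = pa + pg, pc + pt
    # P1: A<->G transitions, P2: C<->T transitions, Q: transversions.
    p1 = sum({a, b} == PURINES for a, b in pairs) / len(pairs)
    p2 = sum({a, b} == PYRIMIDINES for a, b in pairs) / len(pairs)
    q = sum((a in PURINES) != (b in PURINES) for a, b in pairs) / len(pairs)
    a1 = 1 - pr * p1 / (2 * pa * pg) - q / (2 * pr)
    a2 = 1 - py * p2 / (2 * pc * pt) - q / (2 * py)
    a3 = 1 - q / (2 * pr * py)
    if min(a1, a2, a3) <= 0:
        return math.inf                     # too divergent: distance undefined
    return (-(2 * pa * pg / pr) * math.log(a1)
            - (2 * pc * pt / py) * math.log(a2)
            - 2 * (pr * py - pa * pg * py / pr - pc * pt * pr / py) * math.log(a3))


if __name__ == "__main__":
    print(tn93("ACGTTGCAACGT", "ACGTCGCAATGT"))   # hypothetical sequences
```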
Title | saphy: sequential analysis of phylogenies |
Description | This is an R software library for analysing phylogenetic data in an 'on-line' fashion, with taxa added sequentially over time. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2016 |
Provided To Others? | Yes |
Impact | None at present |
URL | http://github.com/hackout3/saphy |
Title | treeImbalance |
Description | Phylogenetic trees of viruses sampled from different individuals provide clues to the dynamics of transmission. The extent to which the tree is asymmetric may be influenced by biological factors such as differences in infectiousness or contact rates between individuals, but also by nuisance factors such as the pattern of sampling. We have devised a simple statistical test for asymmetry, which controls for sampling patterns and potentially complex temporal dynamics by conditioning on the sampling and coalescence times in a phylogeny, and can also detect whether specific clades in the phylogeny drive patterns of asymmetry. We have developed an open-source R package for detecting asymmetry in time-sampled phylogenetic trees using this test. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2015 |
Provided To Others? | Yes |
Impact | None as yet. |
URL | https://github.com/bdearlove/treeImbalance |
Title | treedater |
Description | This is a method to infer time-calibrated phylogenies from sequence data, developed in collaboration with Erik Volz at Imperial College London. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2017 |
Provided To Others? | Yes |
Impact | None at present |
URL | http://github.com/emvolz/treedater |
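treedater's own estimator is not reproduced here. As a simpler, commonly used first check of clock-like evolution, root-to-tip regression fits root-to-tip genetic distance against sampling time: the slope estimates the substitution rate and the x-intercept the date of the root. The Python sketch below assumes the distances have already been read off a rooted tree; the values in the usage block are hypothetical.

```python
"""Root-to-tip regression: a quick clock-rate and root-date estimate (sketch)."""


def root_to_tip_regression(sample_times, root_to_tip):
    """Least-squares fit of distance = rate * (time - t_root).

    Returns (rate, t_root): the slope (substitutions per site per unit time)
    and the x-intercept, read as an estimate of the date of the root.
    """
    n = len(sample_times)
    if n < 2 or n != len(root_to_tip):
        raise ValueError("need at least two paired observations")
    mean_t = sum(sample_times) / n
    mean_d = sum(root_to_tip) / n
    sxx = sum((t - mean_t) ** 2 for t in sample_times)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(sample_times, root_to_tip))
    if sxx == 0 or sxy == 0:
        raise ValueError("no temporal signal: times and distances do not co-vary")
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    return rate, -intercept / rate


if __name__ == "__main__":
    times = [2001.5, 2004.0, 2007.2, 2010.8, 2013.1]   # hypothetical sampling dates
    dists = [0.031, 0.038, 0.041, 0.052, 0.057]        # hypothetical distances
    rate, t_root = root_to_tip_regression(times, dists)
    print(f"rate = {rate:.2e} subs/site/year, root date ~ {t_root:.1f}")
```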
Title | PANGEA-HIV methods comparison |
Description | The PANGEA-HIV consortium, funded by the Bill and Melinda Gates Foundation, is investigating the dynamics of HIV transmission in sub-Saharan Africa, and will generate a large number of full-length HIV genomes to provide insights into transmission. As a prelude to the release of the data, simulated datasets have been generated by researchers at Imperial College London and the University of Edinburgh to provide a testbed for different methods. We have generated a database of analyses and 'metadata', such as reconstructed phylogenetic trees. |
Type Of Material | Database/Collection of data |
Year Produced | 2014 |
Provided To Others? | Yes |
Impact | The results of our analyses will be presented at an upcoming meeting in December, together with results from other groups. |
URL | https://github.com/sdwfrost/pangea |
Title | Artificial Neural Networks for Viral Lineages (ANVIL) |
Description | ANVIL is a software package to identify viral lineages using supervised learning on a labelled set of sequences. ANVIL uses neural networks to infer the genotype of short segments of sequence, from which it infers the genotype of the whole virus and determines whether inter-subtype recombination has occurred. |
Type Of Technology | Software |
Year Produced | 2017 |
Open Source License? | Yes |
Impact | None as yet |
URL | https://github.com/asmmhossain/ANVIL |
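The Python sketch below illustrates only the window-based logic described above: a query aligned to reference sequences is cut into windows, each window is assigned a genotype, the majority label is reported, and disagreement between windows is flagged as possible recombination. A nearest-reference classifier (p-distance) stands in for ANVIL's neural networks, and the window size, step and reference set are hypothetical.

```python
"""Window-based genotype calling with a recombination flag (illustrative sketch).

A nearest-reference classifier stands in for ANVIL's neural networks.
"""
from collections import Counter


def p_distance(a, b):
    """Proportion of differing sites between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def classify_windows(query, references, window=10, step=10):
    """Label each window with the closest reference genotype.

    Assumes the query and every reference are aligned to the same coordinates.
    """
    labels = []
    for start in range(0, len(query) - window + 1, step):
        segment = query[start:start + window]
        best = min(references,
                   key=lambda g: p_distance(segment, references[g][start:start + window]))
        labels.append(best)
    return labels


def call_genotype(query, references, window=10, step=10):
    """Return (majority genotype, True if windows disagree, i.e. possible recombinant)."""
    labels = classify_windows(query, references, window, step)
    if not labels:
        raise ValueError("query is shorter than one window")
    majority, _ = Counter(labels).most_common(1)[0]
    return majority, len(set(labels)) > 1


if __name__ == "__main__":
    refs = {"subtype_X": "ACGTACGTAC" * 4, "subtype_Y": "TGCATGCATG" * 4}  # hypothetical
    query = refs["subtype_X"][:20] + refs["subtype_Y"][20:]
    print(call_genotype(query, refs))
```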
Title | Gillespie.jl |
Description | Gillespie.jl is a library for the Julia programming language that implements Gillespie's stochastic simulation algorithm, a widely used approach for simulating stochastic models such as epidemiological models. |
Type Of Technology | Software |
Year Produced | 2014 |
Open Source License? | Yes |
Impact | None at present; the library is being used for research purposes |
URL | https://github.com/sdwfrost/Gillespie.jl |
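As a minimal sketch of the algorithm that Gillespie.jl implements (written in Python rather than Julia, and not reflecting the package's API), Gillespie's direct method repeatedly draws an exponential waiting time from the total event rate and then picks which event occurs in proportion to its rate. The SIR parameters in the usage block are illustrative.

```python
"""Gillespie's direct method, with a stochastic SIR model as the example (sketch)."""
import random


def gillespie(x0, propensities, changes, t_max, seed=None):
    """Simulate a continuous-time Markov chain by Gillespie's direct method.

    x0: initial state (sequence of ints); propensities: functions mapping the
    state to an event rate; changes: the state increment for each event.
    Returns the list of (time, state) pairs, starting at time 0.
    """
    rng = random.Random(seed)
    t, x = 0.0, list(x0)
    path = [(t, tuple(x))]
    while True:
        rates = [f(x) for f in propensities]
        total = sum(rates)
        if total <= 0:
            break                               # absorbing state: nothing can happen
        t += rng.expovariate(total)             # waiting time ~ Exponential(total)
        if t > t_max:
            break
        u = rng.random() * total                # event j chosen w.p. rates[j] / total
        j = 0
        while j < len(rates) - 1 and u >= rates[j]:
            u -= rates[j]
            j += 1
        x = [xi + d for xi, d in zip(x, changes[j])]
        path.append((t, tuple(x)))
    return path


if __name__ == "__main__":
    beta, gamma, n = 0.3, 0.1, 1000             # illustrative parameter values
    sir_rates = [lambda x: beta * x[0] * x[1] / n,   # infection: S + I -> 2I
                 lambda x: gamma * x[1]]             # recovery: I -> R
    sir_changes = [(-1, 1, 0), (0, -1, 1)]
    path = gillespie([990, 10, 0], sir_rates, sir_changes, t_max=365.0, seed=1)
    print(len(path) - 1, "events; final (S, I, R) =", path[-1][1])
```

The `j < len(rates) - 1` guard protects against floating-point round-off pushing the selection past the last event.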
Title | OutbreakTools |
Description | OutbreakTools is an R package for the analysis and visualisation of epidemiological data. |
Type Of Technology | Software |
Year Produced | 2014 |
Open Source License? | Yes |
Impact | None as yet |
URL | https://sites.google.com/site/therepiproject/r-pac/about |
Title | PDMP.jl |
Description | PDMP.jl is a library written in the Julia programming language to perform simulation of piecewise deterministic Markov processes; examples of this include stochastic simulations with time-varying rates and hybrid discrete/continuous systems. |
Type Of Technology | Software |
Year Produced | 2016 |
Open Source License? | Yes |
Impact | None as yet |
URL | https://github.com/rveltz/PDMP.jl |
Title | Phlow |
Description | We are developing a workflow to streamline phylogenetic analyses of viral sequence datasets. |
Type Of Technology | Webtool/Application |
Year Produced | 2014 |
Impact | None at present; a publication describing the software is in preparation |
URL | https://github.com/asmmhossain/phlow |
Title | Pipelign |
Description | An automated pipeline for generating multiple sequence alignments of viral sequences. |
Type Of Technology | Software |
Year Produced | 2016 |
Open Source License? | Yes |
Impact | None as yet |
URL | https://github.com/asmmhossain/pipelign |
Title | epiwidgets |
Description | A collection of dynamic 'widgets' useful for visualising epidemiological data. |
Type Of Technology | Software |
Year Produced | 2015 |
Open Source License? | Yes |
Impact | None as yet |
URL | https://github.com/sdwfrost/epiwidgets |
Title | mathmodels |
Description | A collection of dynamic widgets for demonstrating mathematical models of epidemiology, genetics, and ecology. |
Type Of Technology | Software |
Year Produced | 2015 |
Open Source License? | Yes |
Impact | This was used in a sixth-form extension course for BSix College in Hackney. |
URL | https://github.com/sdwfrost/mathmodels |
Title | merlin |
Description | merlin is a software library for the R programming language to aid in the analysis of sequence data, particularly viral sequence data, in molecular epidemiology studies. |
Type Of Technology | Software |
Year Produced | 2013 |
Impact | This package was used as part of a training course in bioinformatics held in Cambridge. |
URL | https://r-forge.r-project.org/projects/merlin/ |
Title | nextHIV |
Description | nextHIV is a platform for real-time HIV surveillance that combines clinical data with HIV sequence data, automatically processes the data, and generates an interactive report that can be shared with public health officials and policy makers. |
Type Of Technology | Software |
Year Produced | 2017 |
Open Source License? | Yes |
Impact | A new collaboration with researchers at UNC-Chapel Hill has begun to use nextHIV as a platform for an intervention trial to determine whether HIV sequence data can be harnessed for guiding prevention. |
URL | https://github.com/sdwfrost/nextHIV |
Title | treeImbalance |
Description | treeImbalance is a software library for the R programming language to calculate measures of imbalance in phylogenetic trees, and assess their statistical significance using permutation tests. |
Type Of Technology | Software |
Year Produced | 2014 |
Open Source License? | Yes |
Impact | None as yet; a manuscript employing this approach is currently in preparation |
URL | https://github.com/bdearlove/treeImbalance |
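As an illustration of a tree imbalance measure (not treeImbalance's own API, nor the time-conditioned test described above), the Python sketch below computes the Colless index, the sum over internal nodes of the absolute difference in tip counts between the two daughter clades, and compares it with tree shapes simulated under the Yule model, for which the number of tips on one side of the root split is uniform on 1, ..., n-1. Trees are nested Python 2-tuples; this representation and the Monte Carlo reference distribution are assumptions, and branch lengths are ignored.

```python
"""Colless imbalance index with a Yule-model Monte Carlo reference (sketch)."""
import random


def colless(tree):
    """Return (Colless index, number of tips); trees are nested 2-tuples, tips anything else."""
    if not isinstance(tree, tuple):
        return 0, 1
    left, right = tree
    c_left, n_left = colless(left)
    c_right, n_right = colless(right)
    return c_left + c_right + abs(n_left - n_right), n_left + n_right


def random_yule_tree(n_tips, rng):
    """Sample a tree shape under the Yule model (root split size uniform on 1..n-1)."""
    if n_tips == 1:
        return "tip"
    k = rng.randint(1, n_tips - 1)
    return (random_yule_tree(k, rng), random_yule_tree(n_tips - k, rng))


def colless_p_value(tree, n_sim=999, seed=0):
    """One-sided Monte Carlo p-value for imbalance at least as large as observed."""
    rng = random.Random(seed)
    observed, n_tips = colless(tree)
    hits = sum(colless(random_yule_tree(n_tips, rng))[0] >= observed
               for _ in range(n_sim))
    return observed, (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    caterpillar = ((((("t1", "t2"), "t3"), "t4"), "t5"), "t6")   # hypothetical tree
    print(colless_p_value(caterpillar))
```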