A theory of how epidemic dynamics shape pathogen phylogenies

Lead Research Organisation: Imperial College London
Department Name: Dept of Mathematics


Next-generation sequencing technology is now enabling the genomes of many organisms, including many infectious agents, to be sequenced quickly and inexpensively. Vast amounts of sequence data are being generated for important infectious diseases, including viruses such as HIV and influenza and now also a number of bacteria, including C. difficile, S. aureus and M. tuberculosis. These data will be used to understand how pathogens are spreading and evolving, including how they are adapting to the introduction of vaccines and how, where, and how quickly drug resistance is emerging.
While there is a mathematical theory of epidemiology which is the primary modelling tool for the study of infections, it has largely been focussed on understanding the spreading dynamics of a single, static pathogen, for example in terms of its basic reproduction number (defined as the expected number of secondary infections caused by a single infectious case in a fully susceptible population). But this theory says very little about pathogen evolution, and still less about how to predict, model and interpret the vast quantities of genetic data that are becoming available. While there are mathematical models focussing on trait evolution, these do not address complex epidemic dynamics, nor do they explicitly predict features of sequence data. In contrast, there is a long tradition of mathematical population genetics aiming to answer questions such as how allele frequencies are maintained, but this work does not account for the kinds of structure that arise when the pathogen population is constrained by epidemic dynamics in human hosts. In this proposal, I aim to bridge this gap, and develop a theory of how pathogen sequences are related to the underlying epidemic dynamics.
Characterising the relationship between epidemic dynamics and pathogen sequence data will require new mathematics, first in the form of the construction and analysis of models describing both, and then in the form of theorems relating the structure of epidemic dynamics to quantitative features of the phylogenetic trees used to summarise sequence data. These theoretical tools will be crucial as genomic data become increasingly available. This proposal provides an approach to making substantial progress in this direction through four core Objectives, moving from relating competition between several distinct, fixed pathogen strains to the resulting phylogenies, to explicitly modelling the relationship between host transmission and pathogen evolution. I will also develop better quantitative measures to identify similarities between phylogenetic trees from different datasets. This work will provide a theoretical underpinning linking epidemic population dynamics to sequence data, and will have a wide range of applications in addition to its new theoretical developments. Sequence data for many pathogens are currently being generated rapidly, and the analysis of these data is expected to benefit not only our understanding of pathogen evolution, but also our ability to intervene for the benefit of public health. For example, the UK CRC Modernising Medical Microbiology Consortium is using whole-genome sequencing technology combined with population-based sampling, focusing on M. tuberculosis, norovirus, C. difficile and S. aureus. I have also been asked to provide modelling expertise for a community-based study in which HIV sequence data will be collected alongside individuals' sexual network data, providing a unique dataset for linking spreading patterns to viral sequences.
My work under this grant would provide new insights into what these data mean for how these pathogens are spreading, which in turn will provide researchers and health policy makers with information to design improved prevention measures.

Planned Impact

The main beneficiaries of this work are academic, although I will be involved with a number of public engagement activities as well. The proposal's impact spans a number of academic fields, including mathematics and statistics, epidemiology, evolutionary biology and genomics, as outlined in the Academic Beneficiaries section. Furthermore, this work will ultimately have the potential for application in clinical settings, as personalised medicine develops. Novel methods for the interpretation of pathogen genome sequence data will not only provide crucial insight into pathogen evolution but will also inform efforts to design vaccines and develop new drugs. While the core of this proposal is mathematical, lying in the development of new theoretical tools, the impact will be much broader. To ensure that the relevant experts are engaged with this work, the results will be communicated to the appropriate research communities through publications and attendance at conferences as well as through personal correspondence. The PneumoCarr community, for example, has a mandate to reduce pneumococcal carriage using new vaccines. Multiple forms of pneumococcus exist and compete with each other; how the pathogen population will respond to the vaccine is far from clear. For any new vaccine, a strong theoretical and empirical understanding of how the serotype composition of the pneumococcal population responds to increasing vaccine coverage will be critical in reducing carriage in the long term, and it is here that the results of my work will be relevant. I have already made contact with researchers at the Health Protection Agency and at PneumoCarr who are working on vaccine development and vaccine policy recommendations. The PneumoCarr project is only one example of the wide range of academic beneficiaries that I will approach.
The grant includes funds for UK travel and some of this will be used to visit beneficiaries in Oxford, Cambridge and Edinburgh to ensure that I am able to communicate my findings directly, in addition to publications and conference presentations. Internationally, such communication will be done during visits to Harvard funded by both this proposal and additional support from T. Cohen, and through colleagues including C. Fraser and M. Lipsitch who work closely with the Health Protection Agency and the US Centers for Disease Control and Prevention.
Infectious diseases are of great relevance to the general public and to public health, and are frequently active topics in the media. I believe that it is important and enjoyable to engage the public, and have found that doing so presents the opportunity to share my own excitement about mathematics, epidemiology and evolution in diverse ways. Over the course of the grant, I will give a Twilight Talk aimed at sharing with the public new advances in sequencing and what these mean for understanding disease. Twilight Talks are an opportunity to share research in a relaxed setting over a glass of wine, and to continue the discussion informally. Science Cafes are another informal approach to sharing new research with the public and I plan to participate in at least one such event. I will also pursue public engagement activities in an educational context. For example, I recently designed a workshop for a science outreach event aimed at young women, called Skirting Science, held on April 30, 2010, and will participate again in 2011 and/or 2012, incorporating ideas about genes and sequences for pathogens in the future. The workshop presents the mathematics of networks in a fun format, using mazes and friendship networks. The last part of the workshop is a game simulating the spread of H1N1 "swine" flu over a network that the students make themselves.


Description: I developed a theory of how epidemic dynamics shape phylogenies, which links to the field of phylodynamics. We considered how the theory should be adapted to account for the dynamics of host contact networks. We explored the role of ecological competition between pathogen strains and found that it can strongly affect the long-term levels of antibiotic resistance, as well as pathogens' evolution in the face of human intervention. We also found that phylogenetic tree shape contains information about the spreading process, and developed a classifier to read tree shape information and predict patterns of spread.
Exploitation Route: The work has implications for predictive models, which are used to inform policy for management of pathogens.
Sectors: Environment, Healthcare, Pharmaceuticals and Medical Biotechnology

Description: The British Columbia Centre for Disease Control used the results of this work in classifying tuberculosis outbreaks. Ongoing analysis of their TB sequence data is proceeding.
First Year Of Impact: 2013
Sector: Healthcare
Impact Types: Policy & public services

Description: EPSRC Fellowship
Amount: £1,002,244 (GBP)
Funding ID: EP/K026003/1
Organisation: Engineering and Physical Sciences Research Council (EPSRC)
Sector: Public
Country: United Kingdom
Start: 10/2013
End: 09/2018

Description: Topological features of TB phylogenies
Organisation: Provincial Health Services Authority (PHSA)
Department: British Columbia Centre for Disease Control (BCCDC)
Country: Canada
Sector: Public
PI Contribution: We are collaborating on developing novel approaches to the analysis of whole-genome sequence data for TB outbreaks. Dr Gardy has provided access to whole-genome sequence data for TB outbreaks in British Columbia, together with expertise on SNP calling and data analysis, TB epidemiology and the role of phylogenetic data in public health.
Start Year: 2012

Title: OutbreakTools
Description: R package for outbreak analysis
Type Of Technology: Webtool/Application
Year Produced: 2014
Impact: Improves outbreak analysis by allowing multiple data sources, visualisation and statistical modelling in one platform.
URL: http://sites.google.com/site/therepiproject/r-pac/about