Integrating genomic surveillance and ecological modelling to maximise pneumococcal vaccine efficacy

Lead Research Organisation: Imperial College London
Department Name: School of Public Health

Abstract

Streptococcus pneumoniae, or the pneumococcus, is a bacterium found harmlessly living inside the noses of around half of young children in the UK. When they reach other parts of the body, pneumococci can cause harmful infections such as pneumonia, sepsis or meningitis. This is largely attributable to these bacteria having a capsule that protects them from the immune system. In 2010, the UK introduced a vaccine (called PCV13) that protected against 13 of the approximately 100 known capsule types. This eliminated most of these 13 capsule types from both disease and harmless carriage in infants. However, S. pneumoniae strains not affected by the vaccine increased in number to replace the lost capsule types. These strains did not cause disease in infants so frequently, and therefore PCV13 has reduced the amount of childhood pneumococcal disease. However, the replacing strains appear to be more likely to cause disease in adults, who catch the bacteria from healthy children. Hence the amount of adult pneumococcal disease has gone up since PCV13.

Public Health England (PHE) lead the evaluation of PCV13 in the UK, and their surveillance of pneumococcal disease means they have the largest collection of well-characterised S. pneumoniae bacteria in the world. This project would select isolates from this collection to study using whole genome sequencing, to understand how the genetics of the bacterial population changed before and after vaccines (PCV13, and similar earlier versions) were introduced. New methods of DNA sequence analysis would be employed to merge UK data with that from research work around the world. These would enable the global migration patterns of S. pneumoniae strains to be traced, identifying the main origins of strains that have recently emerged in the UK.

These genetic data will also enable mathematical modelling of the changes in circulating strains caused by PCV13. We have specific hypotheses about the genetics that underlies the changes after vaccination, but previously these have only been tested against bacteria collected from healthy children. This project will expand our models to incorporate the most harmful strains, which are rarely found in the nose. This is critical for understanding why PCV13 had effects on adult disease in the UK that were not common in other countries. This project will also test how accurately these models forecast ongoing trends in S. pneumoniae disease as new surveillance data are collected by PHE. This will help predict whether trends in the overall level of disease are likely to change in the next few years.

We would make the model easily accessible to other scientists, such that they could improve and update it. By continually improving the model, we hope to use it as a tool for identifying the risks associated with each of the next generation of vaccines against S. pneumoniae, which are currently being developed. This would help ensure the UK made the right choice to avoid unintended consequences, as occurred with PCV13, and minimise the national burden of S. pneumoniae disease in both infants and adults. Scientists involved in the project serve on UK and international bodies that advise on the use of these vaccines, and therefore our results will be communicated to relevant agencies around the world. We will also explore whether our models and methods could be helpful in analysing other bacteria, particularly focusing on those for which new vaccines are being developed.

Technical Summary

Streptococcus pneumoniae (the pneumococcus) is a nasopharyngeal commensal and respiratory pathogen that causes thousands of invasive pneumococcal disease (IPD) cases in the UK annually. Polysaccharide conjugate vaccines (PCVs) have reduced the incidence of infant IPD by eliminating most vaccine-type pneumococcal serotypes. However, their replacement with non-vaccine serotypes has caused the UK IPD incidence in the elderly to rise in recent years. This project will employ a combination of genomics and mathematical modelling to understand these contrasting demographic trends, which are not commonly observed in other countries. The underlying hypothesis is that carried S. pneumoniae population dynamics will be governed both by PCV-induced herd immunity and by multi-locus negative frequency-dependent selection (NFDS), a model of bacterial ecology in which selection maintains common accessory genome loci at their pre-vaccination 'equilibrium frequencies'. These equilibrium frequencies will be established by sequencing historic IPD isolates from infants and adults at timepoints matched to carriage surveys, for which genomic data will also be available. The multi-locus NFDS model will be reimplemented to combine carriage and IPD data, enabling population dynamic modelling that includes the highly invasive strains driving post-PCV13 disease trends in UK adults. Recently developed scalable genomic surveillance methods will be used to integrate these data with global research collections, to understand the contribution of global strain migration patterns to post-PCV population restructuring, and with ongoing IPD surveillance, to enable evaluation of the models' forecasting accuracy. The model and data will be made flexible and open source, giving other researchers the opportunity to test alternative ecological model structures. Ultimately this work aims to aid policy decisions with regard to the introduction of PCV formulations, and guide the design of superior vaccines.
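
The multi-locus NFDS idea described above can be illustrated with a minimal sketch. This is not the project's implementation: it is a simplified, deterministic toy in which each strain's fitness rises when the accessory loci it carries are below their equilibrium frequencies, and vaccine-type strains pay an extra cost. The function name `nfds_step`, the parameters `sigma` (strength of NFDS) and `v` (vaccine-induced fitness cost), and the three-strain example data are all assumptions made for illustration.

```python
import numpy as np

def nfds_step(freqs, genotypes, eq_locus_freq, sigma=0.1, vaccine_type=None, v=0.0):
    """Advance strain frequencies by one generation under toy multi-locus NFDS.

    freqs:          strain frequencies, shape (n_strains,), summing to 1
    genotypes:      0/1 matrix, shape (n_strains, n_loci); 1 = strain carries locus
    eq_locus_freq:  equilibrium (pre-vaccination) locus frequencies, shape (n_loci,)
    sigma:          hypothetical strength of NFDS
    vaccine_type:   optional boolean mask of vaccine-type strains
    v:              hypothetical fitness cost applied to vaccine-type strains
    """
    # Current frequency of each accessory locus in the population.
    locus_freq = freqs @ genotypes
    # Per-strain deviation: positive when the strain's loci are rarer than at equilibrium.
    deviation = genotypes @ (eq_locus_freq - locus_freq)
    fitness = (1.0 + sigma) ** deviation
    if vaccine_type is not None:
        fitness = fitness * np.where(vaccine_type, 1.0 - v, 1.0)
    updated = freqs * fitness
    return updated / updated.sum()

# Hypothetical three-strain, three-locus population (illustrative values only).
G = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0]])
f0 = np.array([0.5, 0.3, 0.2])
e = f0 @ G  # treat the starting population as the pre-vaccination equilibrium

# Without vaccination the population stays at equilibrium.
unchanged = nfds_step(f0, G, e)

# With the first strain targeted by a vaccine, its frequency falls.
after = nfds_step(f0, G, e, vaccine_type=np.array([True, False, False]), v=0.2)
```

Iterating `nfds_step` shows the qualitative behaviour the hypothesis predicts: once a vaccine-type strain declines, the remaining strains that carry its now-depleted loci gain a fitness advantage, pulling locus frequencies back towards their equilibrium values.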

Planned Impact

Public health agencies will benefit from the research by improving their capacity to undertake genomic surveillance and monitor the population dynamics of high-risk strains. The project will involve collaborations to improve current sequence analysis techniques and build on databases generated by research projects analysing global bacterial populations. This will enable research methods to be developed such that they are appropriate for application to national vaccine surveillance programmes. Once validated and implemented through the online pathogen.watch platform, these methods will be quick and convenient for infectious disease epidemiologists to use for analysis and sharing of genomic data between countries. WHO working groups, workshops and online tutorials will be used to disseminate information on how to use these new approaches. The species-agnostic nature of these methods means they can easily be deployed across other bacterial pathogens with complex population structures. Hence, during the project, the work will have a valuable role in developing and communicating improved molecular epidemiology methods.

Those working in healthcare systems will benefit from the research through the development of models that forecast pneumococcal disease trends. This project will aim to develop open-source, easy-to-use ecological models that can be integrated with, and validated against, surveillance data. These can be used to predict trends, such as whether the presently observed rise in adult pneumococcal disease will plateau, or continue rising. Any advance warnings of substantial changes can trigger awareness campaigns or health system preparedness, if appropriate. As these models may be applicable to similarly diverse bacterial pathogens (e.g. they have been applied to understanding the emergence of multidrug-resistant Escherichia coli), they promise to provide early warning of a variety of infectious disease challenges.

Policy makers at the national and international level will benefit from the research in the coming years, when the next generation of pneumococcal vaccines are introduced. The application of these predictive models as a tool for evaluating different available vaccine formulations will enable the selection of the option most likely to minimise disease in each country, in spite of the problematic serotype replacement process. This is critical for reducing the burden of bacterial disease. The research team includes representatives of both UK and WHO vaccine policy advisory groups, to ensure the efficacy of modelling approaches can be communicated to those who might find them useful, to improve the effectiveness of policy-making within the UK and across other national public health agencies.

Over longer timescales, the commercial private sector will also benefit from the research. The ecological modelling itself will allow the identification of hypothetical vaccine formulations that are optimal, based on the UK S. pneumoniae population. This project will therefore recommend which antigens should be prioritised for inclusion in the next generation of pneumococcal vaccines. More generally, the models developed in this work will enable pharmaceutical companies to consider the ecology of complex bacterial species in the design of partial-coverage or strain-specific vaccines from the outset, reducing the risks of unforeseen consequences that may result from the introduction of such interventions.

Ultimately, the wider public will benefit from the research through improvements to the nation's health. This should initially result from better-informed policy-making, and later should reflect the manufacture of improved vaccine designs. During the project, any public communication regarding the important issues raised by this research will be managed by PHE, to ensure any potentially controversial messages are delivered in an appropriate manner.
 
Description Presentation to JCVI subcommittee on pneumococcal vaccination
Geographic Reach National 
Policy Influence Type Participation in a guidance/advisory committee
 
Title Gubbins phylogenetic software 
Description Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions. Simulations demonstrate the algorithm generates highly accurate reconstructions under realistic models of short-term bacterial evolution, and can be run in only a few hours on alignments of hundreds of bacterial genome sequences. 
Type Of Material Computer model/algorithm 
Year Produced 2020 
Provided To Others? Yes  
Impact The new version updates the algorithm to use the latest phylogenetic software, allowing continued use of this tool, which has already been cited >800 times. 
URL https://github.com/sanger-pathogens/gubbins
 
Title PopPUNK 
Description PopPUNK is software for bacterial genomic epidemiology. It can rapidly calculate core and accessory distances between genomes, use these distances to cluster genomes, assign clusters to new genomes using an existing database, and produce visualisations of these outputs. 
Type Of Material Computer model/algorithm 
Year Produced 2020 
Provided To Others? Yes  
Impact Release of a new version that extends the application of this algorithm to larger bacterial datasets, and to datasets of non-bacterial pathogens. 
URL https://github.com/johnlees/PopPUNK
 
Title progressionEstimation Rstan package 
Description This package uses Bayesian models implemented in Stan to estimate the rates at which microbes progress from carriage to disease using case and carrier data. 
Type Of Material Computer model/algorithm 
Year Produced 2022 
Provided To Others? Yes  
Impact Estimates from this package are being used to evaluate alternative vaccination policies. 
URL https://github.com/nickjcroucher/progressionEstimation
 
Description Genomic epidemiology methods development 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution Co-development of software (https://github.com/bacpop/PopPUNK) and web resources (https://beebop.dide.ic.ac.uk).
Collaborator Contribution Advising and contributing to software development and data analysis.
Impact Online tool for genomic epidemiology (https://beebop.dide.ic.ac.uk) and associated software.
Start Year 2022
 
Description Pneumococcal protein vaccine feasibility 
Organisation University of Liverpool
Country United Kingdom 
Sector Academic/University 
PI Contribution Identification of candidate antigens to use in a protein vaccine against pneumococci.
Collaborator Contribution They are identifying the diversity of these proteins in circulating pneumococcal populations and ascertaining the immunogenicity of these antigens in an animal model of disease.
Impact Award of a grant from Bactivac that funds the experimental work.
Start Year 2022
 
Description Pneumococcal vaccine modelling collaboration 
Organisation London School of Hygiene and Tropical Medicine (LSHTM)
Country United Kingdom 
Sector Academic/University 
PI Contribution We are now working jointly with LSHTM, alongside PHE, to model the serotype replacement processes expected to occur when new pneumococcal conjugate vaccines are introduced.
Collaborator Contribution Prof. Elizabeth Miller and Dr. Stefan Flasche provide expert knowledge of pneumococcal epidemiology and transmission dynamic modelling.
Impact The generation of a pneumococcal genomic dataset, processed in collaboration between Imperial, PHE and LSHTM: https://www.mdpi.com/2073-4425/10/9/687/htm.
Start Year 2020
 
Title BeeBOP - online surveillance tool 
Description Online surveillance tool for pathogen genomic epidemiology. 
Type Of Technology Webtool/Application 
Year Produced 2022 
Impact Tool used globally for surveillance of pneumococcal disease. 
URL https://beebop.dide.ic.ac.uk/
 
Title Gubbins 
Description Since the introduction of high-throughput, second-generation DNA sequencing technologies, there has been an enormous increase in the size of datasets being used for estimating bacterial population phylodynamics. Although many phylogenetic techniques are scalable to hundreds of bacterial genomes, methods which have been used for mitigating the effect of horizontal sequence transfer on phylogenetic reconstructions cannot cope with these new datasets. Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions. Simulations demonstrate the algorithm generates highly accurate reconstructions under realistic models of short-term bacterial evolution, and can be run in only a few hours on alignments of hundreds of bacterial genome sequences. 
Type Of Technology Webtool/Application 
Year Produced 2022 
Open Source License? Yes  
Impact Gubbins has been cited in over 1000 publications on the genomic epidemiology of bacterial pathogens by other research groups globally. 
 
Title progressionEstimation 
Description This package uses Bayesian models implemented in Stan to estimate the rates at which microbes progress from carriage to disease using case and carrier data. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact Not yet recognised 
 
Description Pneumococcal genomic surveillance workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Described the development of a new genomic epidemiology tool, demonstrated its intended capabilities, and asked for feedback to guide future development.
Year(s) Of Engagement Activity 2022
URL https://isppd.kenes.com/workshops-2022/
 
Description Seminars at Vienna University and Drexel University 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Participation in invited seminars for postgraduate students to discuss applications of genomic epidemiology tools and models. This sparked discussions and questions from audience members, and positive feedback was received from participants and organisers.
Year(s) Of Engagement Activity 2021