Understanding the transmission of tuberculosis using Mycobacterium tuberculosis sequence data

Lead Research Organisation: London School of Hygiene and Tropical Medicine
Department Name: Infectious and Tropical Diseases

Abstract

Tuberculosis is an infectious disease that causes a high public health burden. The World Health Organisation estimates that there are nine million new cases and nearly two million deaths each year. Establishing who transmits to whom and where is fundamental to disease control. By comparing the genetic profiles of tuberculosis-causing bacteria within large population-based studies we can identify likely transmissions based on the similarity of the strains. Our work will focus initially on a district of Malawi where tuberculosis disease prevalence has been high, and data and tuberculosis samples from nearly 2500 patients have been collected over twenty years. The inferred genetic profiles will use a higher proportion of the genome than used previously, allowing more accurate reconstructions of transmission chains. Factors influencing transmissibility will then be assessed directly by looking at the patterns of data within each transmission chain. These factors may be genetic variations within the bacteria themselves, or differences in the human hosts, including age, sex, HIV infection and treatment, and contact patterns. We will attempt to confirm any insights into these factors using data collected and generated from other populations. Ultimately, an improved understanding of the genetic and other processes underlying transmissibility could lead to the development of improved control measures. These could include novel drug or vaccine targets, or identification of geographical or socially determined hotspots of transmission.

Technical Summary

Tuberculosis, caused by Mycobacterium tuberculosis, is an important global public health issue. Understanding the factors underlying transmission is essential for disease control, but surprisingly little is known about the role of pathogen genomic variation, where most infections occur, the importance of host factors, or who transmits to whom. M. tuberculosis can be grouped into seven lineages or phylogenetic clades, and further into sub-lineages, which may vary in their propensity to transmit and cause disease. By sequencing M. tuberculosis in a population-based setting it is possible to construct transmission chains using phylogeny-based algorithms: strains with near-identical genomes are most likely to be linked by a transmission event, and will appear on the same branch of the phylogenetic tree. However, there are few large population-based studies in high tuberculosis prevalence areas that can apply long-term, large-scale whole genome sequencing. Here we focus on the Karonga study in northern Malawi, where substantial epidemiological data have been collected and ~2500 sequenced samples are available spanning a 20-year period. Sequencing studies to date have been limited to single nucleotide polymorphisms and have ignored insertions and deletions and highly variable regions such as the pe/ppe gene families. By including all these variations and considering all the tuberculosis cases in the Karonga population, it will be possible to build a probabilistic model of transmissions, and hence assess the effects of pathogen variation and host factors (age, sex, HIV status, contact patterns, proximity) on transmissibility using regression-based methods. Further, it will be possible to adopt a genome-wide approach to identify loci in the bacteria that are associated with, or under evolutionary selective pressure from, a transmissibility phenotype. Loci identified as being associated with transmissibility will be validated using sequence data from other populations.
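The idea that strains with near-identical genomes are likely to be linked by transmission can be illustrated with a minimal sketch. This is not the probabilistic model described in the summary: it swaps in a much simpler technique, single-linkage clustering on pairwise SNP distance, and the function names, the toy sequences and the threshold of 1 SNP are all assumptions made for illustration only.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count differing positions between two aligned sequences.

    Positions where either sequence has an ambiguous base (N) or a gap (-)
    are skipped. Assumes the sequences are already aligned.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x not in "N-" and y not in "N-"
    )


def cluster_isolates(seqs, threshold):
    """Group isolates into putative transmission clusters.

    Two isolates are linked if their SNP distance is <= threshold, and
    clusters are the connected components of those links (single linkage).
    Implemented with a small union-find structure.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy alignment; real isolates would be compared genome-wide.
isolates = {
    "P1": "ACGTACGTAC",
    "P2": "ACGTACGTAT",  # 1 SNP from P1
    "P3": "ACGTTCGTAT",  # 1 SNP from P2, 2 SNPs from P1
    "P4": "TTGAACCTGC",  # distant from all others
}

clusters = cluster_isolates(isolates, threshold=1)
print(clusters)
```

With single linkage, P1 and P3 end up in the same cluster through P2 even though they differ by 2 SNPs, which mirrors how a chain of successive transmissions accumulates variation. A threshold-based grouping like this gives no direction or probability for each link; those would come from the fuller model and the epidemiological data.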

Planned Impact

The economy
Advances in sequencing technology now allow the genomic characterization of M. tuberculosis, the cause of tuberculosis (TB), on an unprecedented scale, and have the potential to greatly accelerate research aimed at understanding the biology of the bacterium, its phylogeny and the epidemiology of the disease. The knowledge generated in the project and application of the research could ultimately benefit the pharmaceutical industry and those developing TB diagnostics and vaccines, as well as communities both in the UK and overseas exposed to the disease. Ultimately, through reduced occurrence of TB, the knowledge gained in this study could improve health and wealth both nationally and globally. The methods used in this project could have application beyond TB, and so help more widely in the control and prevention of infectious diseases in both humans and animals, with associated economic benefits.

The general public
M. tuberculosis is a major cause of disease, killing ~2 million people globally each year, and drug-resistant forms of TB, together with HIV, are making control difficult. Genomic insights into transmission could ultimately lead to improved control measures adopted globally. The project therefore specifically addresses the MRC strategic aim to impact positively on global health, and to assist with bringing the health impacts of fundamental research to people more quickly.

Academic and industrial organisations
New sequencing technologies have the ability to generate vast amounts of data, but there is a need to translate this information into knowledge useable by other research scientists and industry. Our work will provide tools useful for genomic data analysis and modeling, which can be utilized across infectious diseases and in different settings. An understanding of genomic variation underlying transmission could lead to laboratory experiments on M. tuberculosis pathogenesis and host interaction, improved tests for detecting transmissible M. tuberculosis, and insights for academics involved in policy formulation. Scientific developments arising from the work would benefit the commercial private sector producing diagnostics, vaccines and other control measures. We have links with some of these companies (e.g. GSK) and where required will use licensing agreements through the LSHTM technology transfer office to ensure pipelines to vaccine or other translation tool production and exploitation are in place. Developing a basic understanding of the genomic pathways in this study will not only be important for understanding virulence and transmission mechanisms in M. tuberculosis, but also has practical applications for other mycobacteria including M. bovis, a cause of TB in humans and cows.
Any technology developed may have enormous implications for policy makers for future disease outbreaks and impact on exports.

Training opportunities
The proposal will employ, train and develop a scientist with diverse experience and an 'omic mentality that can be applied in academia, the public sector and industry. The multidisciplinary project team will add to the UK science base in an important and economically vital research area. The researchers working on the project will develop team working and project management skills, which they can apply in all employment sectors. Importantly, the scope for multidisciplinary interactions in this proposal should not be underestimated. The researcher employed to carry out the planned activities will have unique opportunities for engagement with experts (e.g. in the LSHTM TB Centre) in TB biology, biotechnology, clinical care, genomic epidemiology, and public health. Thus, our proposal will help develop skilled researchers who could subsequently be employed in challenging interdisciplinary projects in industry, academia and government.

Publications

Benavente ED (2018) A reference genome and methylome for the Plasmodium knowlesi A1-H.1 line. International Journal for Parasitology

 
Description BBSRC UK-Philippines Swine & Poultry Research Initiative
Amount £600,000 (GBP)
Funding ID BB/R013063/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 05/2018 
End 03/2021
 
Description MRC - NSTDA - Newton: UK-Thailand Joint Initiative on Infectious Diseases
Amount £600,000 (GBP)
Funding ID MR/R020973/1 
Organisation Medical Research Council (MRC) 
Sector Academic/University
Country United Kingdom
Start 04/2018 
End 04/2021
 
Description MRC Newton UK-PCHRD Joint Research Health Call. Using host-responses and pathogen genomics to improve TB diagnostics
Amount £750,000 (GBP)
Funding ID N/A 
Organisation Newton Fund 
Sector Public
Country United Kingdom
Start 10/2018 
End 09/2021
 
Description Newton Institutional Links Grant
Amount £279,000 (GBP)
Funding ID 261868591 
Organisation British Council 
Sector Charity/Non Profit
Country United Kingdom
Start 04/2017 
End 04/2019
 
Description Newton Researcher Links Workshop Grants (Infectious Disease 'Omics (Philippines))
Amount £60,000 (GBP)
Funding ID 2017-RLWK8-10671 
Organisation Newton Fund 
Sector Public
Country United Kingdom
Start 01/2018 
End 12/2018
 
Description Newton Researcher Links Workshop Grants (Infectious Disease 'Omics (Philippines))
Amount £60,000 (GBP)
Funding ID Ref. 2017-RLWK9-110970 
Organisation Newton Fund 
Sector Public
Country United Kingdom
Start 04/2018 
End 12/2018
 
Title Algorithm for disentangling mixed infections using Mycobacterium tuberculosis whole genome sequencing data 
Description Using whole genome sequencing data we developed an algorithm for detecting mixed infections, estimating the multiplicity, and inferring parental sequences. 
Type Of Material Improvements to research infrastructure 
Year Produced 2017 
Provided To Others? No  
Impact This will have impact as mixed infections are important. A manuscript is under review, and the software will be made available shortly. 
 
Title Methods to infer mixed infections 
Description This is an approach to characterising the extent of mixed infection in a sample. 
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? Yes  
Impact Others are citing our work and we are providing software tools for implementation on TB and other pathogens with genomic data. 
 
Title Tools for inferring transmission chains 
Description This is a tool to infer transmission chains from whole genome sequence data, and identify mutations associated with transmissibility. 
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? No  
Impact This will have impact as disease transmission is an important research area. A manuscript is in preparation, and software will be made available. 
 
Title Karonga sequencing database 
Description All raw and processed sequence data across ~3000 Mycobacterium tuberculosis isolates. 
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
Impact Scientific publications. 
 
Description Sequencing - GIS 
Organisation Agency for Science, Technology and Research (A*STAR)
Department Genome Institute of Singapore
Country Singapore 
Sector Academic/University 
PI Contribution Samples for PacBio sequencing.
Collaborator Contribution Sequencing data.
Impact Sequence data, and scientific publications.
Start Year 2016
 
Description A genomics workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Other audiences
Results and Impact A three-day genomics capacity-building workshop and two-day symposium (July 2017) at the University of the Philippines, attended by more than 150 people.
Year(s) Of Engagement Activity 2017
URL https://www.up.edu.ph/index.php/ups-genome-center-holds-international-workshop-on-epidemiology-of-in...
 
Description Capacity-building workshop in genomics at LSHTM 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Study participants or study members
Results and Impact Thirty people attended a workshop on TB genomics, which covered the analysis of transmission chains, phylogenies and association studies. A follow-up grant application covering Malawi TB host and pathogen genomics was submitted.
Year(s) Of Engagement Activity 2018