Prediction of phenotype from genotype with respect to bacterial infection

Lead Research Organisation: University of Nottingham
Department Name: School of Veterinary Medicine and Sci

Abstract

The DNA sequence of a bacterial genome (genotype) encodes all the traits (phenotypes) displayed by an individual organism. DNA sequences can now be read quickly and cost-effectively, providing new opportunities to exploit this information in the control of bacterial diseases.
This project will produce the information required to successfully predict the ability of a bacterium to cause disease (virulence) directly from its genome. To achieve this, it will be necessary to identify the bacterial components required for virulence, characterise their variation and determine how such variation affects the virulence phenotype.
This investigation will focus on bovine mastitis, the most common infectious disease of dairy cattle (~1 million cases per year in the UK). Mastitis typically results from intramammary infection by one of a variety of bacterial species; the most common in the UK is Streptococcus uberis (~30% of all cases), which will be the subject of this study.
Signs of mastitis result from spiralling inflammation due to the inability of the immune defences to clear the infection. The disease results in damage to milk-producing tissue, reduced milk production and milk that is unfit for human consumption. In the UK alone, mastitis results in lost milk production equivalent to ~1M tonnes. The endemic nature of the disease creates major inefficiencies within dairy production, leading to substantial economic impact and significant effects on animal welfare and the environmental sustainability of dairy farming. Control (prevention and treatment) of mastitis at its current level in the UK requires ~3 tonnes of antibiotic per year, and new, more effective methods of prevention and treatment are required urgently.
In this project, a combination of mutation and DNA sequencing technology will be used to identify genes in S. uberis that contribute to the initial stages of disease. Genetic lesions (mutations) introduced into the bacterium (creating mutants) will be located on the genome by DNA sequencing. Analysis of 100,000 bacterial mutants, each carrying a different mutation, before and after colonisation of the mammary gland will permit identification of genes required for colonisation. The same technology will be used to identify genes required at specific stages of colonisation (growth in milk, resistance to immune defences) in laboratory models.
Not all strains of S. uberis are equally virulent. Some readily cause mastitis, while others only transiently infect the mammary gland in low numbers. We will sequence the genomes of 600 strains of S. uberis from diseased and non-diseased animals and quantify the ability of 500 of these to colonise the mammary gland, grow in milk and resist the mammary gland immune defences. These two data sets will be analysed in a variety of ways. A detailed comparison of the genes previously identified as important for virulence will determine which individual gene variants, and which combinations of variants, align with virulence. In addition, a computer-based analysis will compare short lengths of each genome sequence with all other genome sequences to provide a profile of DNA sequences that distinguishes more virulent from less virulent strains in a quantifiable manner.
A combination of these analyses will be used to predict the virulence of 100 S. uberis strains directly from their genome sequence and these strains will be tested in the relevant model system to determine the accuracy of the prediction.
Conducting this investigation will not only provide an example of how genome sequences may be related to virulence in an effective and useful manner, but will also generate a robust platform of information to underpin the scientifically sound development of interventions to control a major infectious disease with substantial implications for animal welfare and environmentally sustainable food production.

Technical Summary

A combination of insertional mutation mapping and detailed genomic analysis will be used to identify genetic variation with causal links to virulence in the major mastitis pathogen, Streptococcus uberis.
Sequences essential for the initial stages of pathogenesis: bacterial colonisation of the bovine mammary gland (growth in milk and survival/proliferation in the presence of the resident mammary gland leukocytes) will be determined using the mutation mapping system (PIMMS).
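The before-and-after comparison of a mutant pool can be illustrated with a minimal sketch. It is not the PIMMS implementation: it assumes that sequencing reads have already been assigned to the gene carrying each insertion, and it scores each gene by the log2 change in its share of reads between the input and output pools. The function names, the pseudocount and the threshold are all hypothetical choices made for illustration.

```python
import math


def gene_fitness(input_counts, output_counts, pseudocount=1.0):
    """Return log2(output share / input share) of insertion reads per gene.

    Assumption: each dict maps a gene name to the number of reads from
    mutants carrying an insertion in that gene. A pseudocount avoids
    division by zero when a mutant disappears from the output pool.
    """
    genes = sorted(set(input_counts) | set(output_counts))
    n = len(genes)
    total_in = sum(input_counts.values()) + pseudocount * n
    total_out = sum(output_counts.values()) + pseudocount * n
    fitness = {}
    for gene in genes:
        share_in = (input_counts.get(gene, 0) + pseudocount) / total_in
        share_out = (output_counts.get(gene, 0) + pseudocount) / total_out
        fitness[gene] = math.log2(share_out / share_in)
    return fitness


def candidate_genes(fitness, threshold=-1.0):
    """List genes whose mutants were depleted beyond the (arbitrary) threshold."""
    return sorted(g for g, score in fitness.items() if score < threshold)


# Illustrative read counts only; not data from the project.
before = {"geneA": 500, "geneB": 500}
after = {"geneA": 900, "geneB": 100}
scores = gene_fitness(before, after)
depleted = candidate_genes(scores)
```

Genes whose mutants are strongly depleted after the selection step (here `geneB`) would be candidates for being required under that condition; a real analysis would also need replicates and a statistical test.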
The same models of virulence will be used to determine the phenotypes of 500 strains of S. uberis. As bacterial fitness (number) is the readout for each phenotype, this procedure will be streamlined by pooling strains that may be distinguished by sequencing strain-specific alleles. A Phenotypic Index (PIx) will be generated for each strain by comparison of sequence read number to that obtained from an internal standard strain.
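As a rough illustration of the PIx calculation, the sketch below expresses each strain's read count relative to that of the internal standard. The text specifies only a comparison with the standard; the use of a log2 ratio, the pseudocount and the function name are assumptions, and any correction for the composition of the starting pool is left out.

```python
import math


def phenotypic_index(read_counts, standard, pseudocount=1.0):
    """Return a log2 ratio of each strain's reads to the standard's reads.

    Assumption: read_counts maps strain name -> reads assigned to that
    strain via its strain-specific allele. The log2 scale is a choice
    made here, not something stated in the project description.
    """
    if standard not in read_counts:
        raise KeyError(f"internal standard {standard!r} not in read counts")
    reference = read_counts[standard] + pseudocount
    return {
        strain: math.log2((count + pseudocount) / reference)
        for strain, count in read_counts.items()
        if strain != standard
    }


# Illustrative counts only; pseudocount set to 0 to keep the numbers round.
counts = {"standard": 1000, "strain_A": 4000, "strain_B": 250}
pix = phenotypic_index(counts, "standard", pseudocount=0.0)
```

On this scale a positive value means the strain outgrew the standard in the pooled assay and a negative value means it was outcompeted.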
Sequence variation relating to PIx will be analysed in a number of ways. Phenotype specific MLST (phMLST) will be established on PubMLST by selection of sequences deemed to be essential for each phenotype. Indexation of alleles of essential genes using the principles of MLST will facilitate identification of those that segregate with virulence and permit strain comparisons by phMLST within a portable system. Further investigation of sequences in relation to virulence will be conducted using the GWAS tool, pyseer. Genomes fragmented into k-mers (DNA words of length k) will be aligned within the 500 genome sequenced and phenotyped strains. Selection by PIx will identify sequences that define a phenotype and quantify the effect of sequence variation within the population.
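The k-mer idea can be shown with a toy example: each genome is broken into words of length k, the strains carrying each word are recorded, and the mean PIx of carriers and non-carriers is compared. This is a sketch only and does not reproduce pyseer's statistical models; the function names are hypothetical, reverse complements are ignored, and the sequences and PIx values are invented for illustration.

```python
def kmers(sequence, k):
    """Return the set of all words of length k in a DNA sequence."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_table(genomes, k):
    """Map each k-mer to the set of strains whose genome contains it."""
    table = {}
    for strain, seq in genomes.items():
        for word in kmers(seq, k):
            table.setdefault(word, set()).add(strain)
    return table


def mean_pix_difference(table, pix):
    """For each variable k-mer, mean PIx of carriers minus non-carriers."""
    strains = set(pix)
    effects = {}
    for word, carriers in table.items():
        have = carriers & strains
        lack = strains - have
        if not have or not lack:
            continue  # k-mer present in all or no phenotyped strains
        mean_have = sum(pix[s] for s in have) / len(have)
        mean_lack = sum(pix[s] for s in lack) / len(lack)
        effects[word] = mean_have - mean_lack
    return effects


# Invented toy data, far shorter than real genomes.
genomes = {"s1": "ACGTAC", "s2": "ACGTTT", "s3": "GGGTAC"}
pix = {"s1": 1.0, "s2": 3.0, "s3": -1.0}
table = presence_table(genomes, k=3)
effects = mean_pix_difference(table, pix)
```

A large positive or negative difference flags a k-mer whose presence tracks the phenotype in this toy data; with real data, relatedness between strains and multiple testing would have to be handled before any such difference could be interpreted.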
Using a combination of the outputs from pyseer, phMLST and PIMMS, virulence phenotypes will be predicted directly from genome sequences. Strains with different predicted phenotypes will be analysed in the in vitro and in vivo models to determine accuracy and direct refinement of the predictions.
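One simple way to express the accuracy of such predictions is the fraction of tested strains whose predicted phenotype class matches the class measured in the model system. The sketch below assumes phenotypes have been reduced to categorical labels; the labels, strain names and function name are hypothetical, and other measures (for example a correlation between predicted and measured PIx) may suit the project better.

```python
def prediction_accuracy(predicted, measured):
    """Return the fraction of shared strains whose labels agree."""
    shared = sorted(set(predicted) & set(measured))
    if not shared:
        raise ValueError("no strains in common between predictions and measurements")
    correct = sum(predicted[s] == measured[s] for s in shared)
    return correct / len(shared)


# Invented labels for illustration only.
predicted = {"s1": "high", "s2": "low", "s3": "high", "s4": "low"}
measured = {"s1": "high", "s2": "low", "s3": "low", "s4": "low"}
accuracy = prediction_accuracy(predicted, measured)
```

Strains for which prediction and measurement disagree (here `s3`) are the ones that would direct refinement of the predictive model.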

Planned Impact

This programme of research will deliver impact to different groups of beneficiaries throughout the project and beyond, through a route of exploitation. These beneficiaries include the scientific community, the pharmaceutical and diagnostics industries, the veterinary profession, the science funding bodies and the general public.
Scientific community - Benefit will accrue via the establishment and dissemination of accessible new technologies, reagents and data. This project will provide a practical example of how the current bottleneck between the ability to generate genome sequence data and the functional evaluation of genetic variation may be overcome. By sharing data and tools through a proven route of data dissemination (PubMLST), it will provide evidence of how to catalogue and evaluate such data within an existing and well-established framework. Application of the described technology will accelerate knowledge generation and, as the use of functional genomic data becomes more widespread, it will feed directly into the growth of artificial intelligence analysis systems.
Pharmaceutical and diagnostic industries - Benefits will be acquired through the knowledge gained on host-pathogen interaction. Objective 1 of this project will provide a list of candidates against which vaccines and/or new therapeutics may be developed. Objective 2 will refine these data by indicating the potential variation of such targets within the population, and objectives 2 and 3 will produce information regarding the relevance of these variants to the phenotype, thereby indicating enhanced or diminished functionality. The ability to predict virulence accurately will affect the potential use and development of diagnostics. Currently, intramammary infection can be detected readily, and early treatment (prior to signs of disease) would enhance animal welfare. However, early treatment of infections that are likely to resolve (without disease) would significantly increase the use of antibiotics. Because milk from treated animals must be discarded, such "over treatment" would further reduce the efficiency and sustainability of milk production. Therefore, the ability to confidently predict the outcome of an infection would permit re-evaluation of the current approach to diagnosis and management of intramammary infection of the dairy cow.
Veterinary profession - Benefit will be realised from the commercial development of new therapies, vaccines, treatments and diagnostics, increasing the array of available tools and treatment options for disease management. This would start to reduce the use of antibiotics and so lessen the selective pressure for antimicrobial resistance.
Science funding bodies - Establishing the utility of this approach to rapidly and effectively functionalise genomic data in the context of a particular phenotype will enable funding to be targeted at key problems relating to bacterial infection. This would enable scientific research to contribute directly to key objectives, including those published by the UKRI, relating to AMR, disease prevention, sustainable food production and enhancing animal welfare.
General public - Successful completion of this project and initiation of product development (vaccines, novel therapies, disease-specific diagnostics) will have impact on the population at several levels. Firstly, it will provide knowledge that scientific funding is directed at endeavour and achievement of value in the real world. Secondly, the products that may ensue will have significance with respect to animal welfare (prevention of disease or the ability to diagnose and treat disease before welfare is affected) and environmental sustainability (enhanced productivity by reducing disease-associated losses, reduction of non-productive greenhouse gas emission, decreased use of antibiotics in the food chain).

Publications

Description We have completed the identification of the genes required by Streptococcus uberis to grow in bovine milk, in milk in the presence of bovine serum (which leaks into the mammary gland during infection) and in the presence of the immune cells naturally present in bovine milk within the mammary gland. This was achieved by comparing the abilities of the original strain and of mutants carrying a lesion in each gene. These data will contribute to our understanding of key aspects of how this pathogen causes disease.
We have completed the genome sequencing of ~600 isolates of Streptococcus uberis, and this has permitted comparison of genome sequence with the phenotypic traits of each strain. This has enabled identification of genes, and specific regions of genes, that are associated with the ability of individual strains to perform better in a given situation. These data were compared with those described above (on the identification of genes essential for a given phenotype). This comparison showed good concordance, indicating that variation of certain essential genes is likely to underpin strain variation with respect to individual traits.
These data will be used to determine whether such sequences can be used to predict the phenotypes of bacterial strains directly from genome sequence data and thus enhance the practical information content of genomic data, in this specific case by providing information pertaining to bacterial virulence.
Exploitation Route The resources, data and techniques described will be of value to researchers wanting to investigate bacterial phenomics. In this case, the ability to extract such information from a relevant pathogen has enabled interaction with the animal health pharmaceutical sector with a view to future development of vaccines. The final interpretation of these data and replication of our studies in vivo (under subsequent objectives of the current project) are likely to identify mechanisms of disease pathogenesis that can be exploited for disease control.
Sectors Agriculture, Food and Drink, Pharmaceuticals and Medical Biotechnology