Prediction of phenotype from genotype with respect to bacterial infection

Lead Research Organisation: University of Nottingham

Department Name: School of Veterinary Medicine and Sci

Abstract

The DNA sequence of a bacterial genome (genotype) encodes all the traits (phenotypes) displayed by an individual organism. DNA sequences can now be read quickly and cost-effectively, providing new opportunities to exploit this information in the control of bacterial diseases.
This project will produce the information required to predict the ability of a bacterium to cause disease (virulence) directly from its genome. To achieve this, it will be necessary to identify the bacterial components required for virulence, characterise their variation and determine how such variation affects the virulence phenotype.
This investigation will focus on bovine mastitis, the most common infectious disease of dairy cattle (~1 million cases/year in the UK). Mastitis typically results from intramammary infection by one of a variety of bacterial species. The most common in the UK is Streptococcus uberis (~30% of all cases), and this bacterium will be the subject of this study.
Signs of mastitis result from spiralling inflammation due to the inability of the immune defences to clear the infection. The disease damages milk-producing tissue, reduces milk yield and results in milk that is unfit for human consumption. In the UK alone, mastitis results in lost milk production equivalent to ~1M tonnes. The endemic nature of the disease creates major inefficiencies within dairy production, leading to substantial economic impact and significant effects on animal welfare and the environmental sustainability of dairy farming. Control (prevention and treatment) of mastitis at its current level in the UK requires ~3 tonnes of antibiotic/year, and new, more effective methods of prevention and treatment are urgently required.
In this project, a combination of mutation and DNA sequencing technology will be used to identify genes in S. uberis that contribute to the initial stages of disease. Genetic lesions (mutations) introduced into the bacterium (producing a mutant) will be located on the genome by DNA sequencing. Analysis of 100,000 bacterial mutants, each carrying a different mutation, before and after colonisation of the mammary gland will permit identification of genes required for colonisation. The same technology will be used to identify genes required at specific stages of colonisation (growth in milk, resistance to immune defences) in laboratory models.
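The before-and-after comparison of a mutant pool can be illustrated with a minimal sketch. It assumes that sequencing reads from insertion sites have already been summed per gene for the input pool and the recovered pool, and it scores each gene by a log2 fold change of reads scaled to counts per million. The function names, gene names, read counts and threshold are hypothetical and chosen for illustration only; this is not the project's own analysis pipeline.

```python
import math


def log2_fold_changes(input_counts, output_counts, pseudocount=1.0):
    """Return per-gene log2(output / input) using reads scaled to counts per million."""
    in_total = sum(input_counts.values())
    out_total = sum(output_counts.values())
    if in_total == 0 or out_total == 0:
        raise ValueError("both pools must contain at least one read")
    changes = {}
    for gene, in_reads in input_counts.items():
        out_reads = output_counts.get(gene, 0)
        in_cpm = in_reads / in_total * 1e6
        out_cpm = out_reads / out_total * 1e6
        # The pseudocount avoids taking log2 of zero when a gene has no reads.
        changes[gene] = math.log2((out_cpm + pseudocount) / (in_cpm + pseudocount))
    return changes


def depleted_genes(changes, threshold=-2.0):
    """Genes whose mutants were strongly depleted, i.e. candidate colonisation genes."""
    return sorted(gene for gene, lfc in changes.items() if lfc <= threshold)


# Hypothetical read counts, for illustration only.
before = {"geneA": 500, "geneB": 500, "geneC": 500}
after = {"geneA": 700, "geneB": 10, "geneC": 790}
changes = log2_fold_changes(before, after)
print(depleted_genes(changes))
```

In this toy data, mutants in geneB are almost lost after colonisation, so geneB is the only gene flagged. A real analysis would also need replicate pools and a statistical test before a gene is called essential.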
Not all strains of S. uberis are equally virulent. Some readily cause mastitis, while others only transiently infect the mammary gland in low numbers. We will sequence the genomes of 600 strains of S. uberis from diseased and non-diseased animals and quantify the ability of 500 of these to colonise the mammary gland, grow in milk and resist the mammary gland immune defences. These two data sets will be analysed in a variety of ways. A detailed comparison of those genes previously identified as important for virulence will determine which individual gene variants, and which combinations of variants, align with virulence. In addition, a computer-based analysis will compare short lengths of each genome sequence to all other genome sequences to provide a profile of DNA sequences that distinguishes more virulent from less virulent strains in a quantifiable manner.
A combination of these analyses will be used to predict the virulence of 100 S. uberis strains directly from their genome sequence and these strains will be tested in the relevant model system to determine the accuracy of the prediction.
Conducting this investigation will not only provide an example study of how genome sequences may be related to virulence in an effective and useful manner, but will also generate a robust platform of information to underpin the scientifically sound development of interventions to control a major infectious disease with substantial implications for animal welfare and environmentally sustainable food production.

Technical Summary

A combination of insertional mutation mapping and detailed genomic analysis will be used to identify genetic variation with causal links to virulence in the major mastitis pathogen, Streptococcus uberis.
Sequences essential for the initial stages of pathogenesis: bacterial colonisation of the bovine mammary gland (growth in milk and survival/proliferation in the presence of the resident mammary gland leukocytes) will be determined using the mutation mapping system (PIMMS).
The same models of virulence will be used to determine the phenotypes of 500 strains of S. uberis. As bacterial fitness (number) is the readout for each phenotype, this procedure will be streamlined by pooling strains that may be distinguished by sequencing strain-specific alleles. A Phenotypic Index (PIx) will be generated for each strain by comparison of sequence read number to that obtained from an internal standard strain.
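The summary does not give a formula for PIx. As a minimal sketch, the code below assumes the simplest reading: PIx is the number of reads assigned to a strain divided by the number assigned to the internal standard strain in the same pool, so the standard scores 1.0. The function name, strain labels and read counts are hypothetical.

```python
def phenotypic_index(read_counts, standard):
    """Return reads per strain relative to the internal standard strain.

    Assumption: PIx = strain reads / standard reads within one pool.
    """
    if standard not in read_counts:
        raise KeyError(f"standard strain {standard!r} not found in pool")
    standard_reads = read_counts[standard]
    if standard_reads <= 0:
        raise ValueError("standard strain must have at least one read")
    return {strain: reads / standard_reads for strain, reads in read_counts.items()}


# Hypothetical pooled read counts after growth in one model, for illustration only.
pool = {"standard": 200, "strain_1": 400, "strain_2": 100}
pix = phenotypic_index(pool, "standard")
print(pix)
```

Under this assumption a value above 1.0 indicates a strain that outgrew the standard and a value below 1.0 a strain that was outcompeted. If strains were not inoculated in equal numbers, the ratio would also need correcting for the input pool.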
Sequence variation relating to PIx will be analysed in a number of ways. Phenotype-specific MLST (phMLST) will be established on PubMLST by selection of sequences deemed to be essential for each phenotype. Indexation of alleles of essential genes using the principles of MLST will facilitate identification of those that segregate with virulence and permit strain comparisons by phMLST within a portable system. Further investigation of sequences in relation to virulence will be conducted using the GWAS tool, pyseer. Genomes fragmented into k-mers (DNA words of length k) will be aligned within the 500 genome-sequenced and phenotyped strains. Selection by PIx will identify sequences that define a phenotype and quantify the effect of sequence variation within the population.
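To make the k-mer idea concrete, the sketch below splits each genome into its set of DNA words of length k and records which strains carry each word. It is plain Python written for illustration and does not use or reproduce pyseer. The function names, strain labels and toy sequences are hypothetical, and real analyses use much longer k-mers and specialised counting tools.

```python
def kmers(sequence, k):
    """Return the set of all overlapping DNA words of length k in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def presence_matrix(genomes, k):
    """Map each k-mer to a dict recording its presence or absence in every strain."""
    per_strain = {name: kmers(seq, k) for name, seq in genomes.items()}
    all_kmers = sorted(set().union(*per_strain.values()))
    return {
        kmer: {name: kmer in found for name, found in per_strain.items()}
        for kmer in all_kmers
    }


# Toy sequences, for illustration only.
genomes = {"strain_1": "ACGT", "strain_2": "ACGA"}
matrix = presence_matrix(genomes, k=3)
for kmer, pattern in matrix.items():
    print(kmer, pattern)
```

A presence/absence table of this kind is the sort of input that an association analysis can test against a quantitative phenotype such as PIx. Here "ACG" is shared by both strains, while "CGT" and "CGA" each mark one strain.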
Using a combination of the outputs from pyseer, phMLST and PIMMS, virulence phenotypes will be predicted directly from genome sequences. Strains with different predicted phenotypes will be analysed in the in vitro and in vivo models to determine the accuracy of the predictions and to direct their refinement.
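One way the accuracy of such predictions might be summarised is sketched below. It assumes each strain is assigned one of two phenotype classes and compares the predicted class with the class observed in the model system. The class labels, function name and example data are hypothetical, and the summary does not state which accuracy measures the project uses.

```python
def evaluate_predictions(predicted, observed, positive="virulent"):
    """Return accuracy, sensitivity and specificity for two-class predictions."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("at least one strain is required")
    pairs = list(zip(predicted, observed))
    tp = sum(p == positive and o == positive for p, o in pairs)
    tn = sum(p != positive and o != positive for p, o in pairs)
    fp = sum(p == positive and o != positive for p, o in pairs)
    fn = sum(p != positive and o == positive for p, o in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # None marks a measure that is undefined because its class never occurred.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Hypothetical predictions and observations for four strains, for illustration only.
predicted = ["virulent", "virulent", "avirulent", "avirulent"]
observed = ["virulent", "avirulent", "avirulent", "avirulent"]
print(evaluate_predictions(predicted, observed))
```

Reporting sensitivity and specificity alongside overall accuracy shows whether errors are mostly missed virulent strains or false alarms, which matters when predictions are used to decide whether an infection should be treated.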

Planned Impact

This programme of research will deliver impact to different groups of beneficiaries throughout the project and beyond, through a route of exploitation. These beneficiaries include the scientific community, the pharmaceutical and diagnostics industries, the veterinary profession, the science funding bodies and the general public.
Scientific community - Benefit will accrue via establishment and dissemination of accessible new technologies, reagents and data. This project will provide a practical example of how the current bottleneck between the ability to generate genome sequence data and functional evaluation of genetic variation may be overcome. By sharing data and tools through a proven route of data dissemination (PubMLST) it will provide evidence of how to catalogue and evaluate such data within an existing and well-established framework. Application of the described technology will accelerate knowledge generation, and as the use of functional genomic data becomes more widespread it will feed directly into the growth in artificial intelligence analysis systems.
Pharmaceutical and diagnostic industries - Benefits will be acquired through the knowledge gained on host-pathogen interaction. Objective 1 of this project will provide a list of candidates against which vaccines and/or new therapeutics may be developed. Objective 2 will refine these data, indicating the potential variation of such targets within the population, and objectives 2 and 3 will produce information regarding the relevance of these variants to the phenotype, thereby indicating enhanced or diminished functionality. The ability to predict virulence accurately will impact the potential use and development of diagnostics. Currently, intramammary infection can be detected readily and early treatment (prior to signs of disease) would enhance animal welfare. However, early treatment of infections that are likely to resolve (without disease) would significantly increase the use of antibiotics. Due to the requirement to discard milk from treated animals, such "over treatment" would further reduce the efficiency and sustainability of milk production. Therefore, the ability to confidently predict the outcome of an infection would permit re-evaluation of the current diagnostic approach and management of intramammary infection of the dairy cow.
Veterinary profession - Benefit will be realised from the commercial development of new therapies, vaccines, treatments and diagnostics, increasing the array of available tools and treatment options for disease management. This would start to reduce the use of antibiotics and so reduce the selective pressure for antimicrobial resistance.
Science funding bodies - Establishing the utility of this approach to rapidly and effectively functionalise genomic data in the context of a particular phenotype will enable funding to be targeted at key problems relating to bacterial infection. This would enable scientific research to contribute directly to key objectives, including those published by UKRI, relating to AMR, disease prevention, sustainable food production and enhancing animal welfare.
General public - Successful completion of this project and initiation of product development (vaccines, novel therapies, disease-specific diagnostics) will have impact on the population at several levels. Firstly, it will demonstrate that scientific funding is directed at work of real-world value. Secondly, the products that may ensue would have a significant impact on animal welfare (prevention of disease, or the ability to diagnose and treat disease before welfare is affected) and on environmental sustainability (enhanced productivity through reduced disease-associated losses, reduction of non-productive greenhouse gas emission, decreased use of antibiotics in the food chain).

Funded Value:

£532,840

Funded Period:

Dec 19 - Dec 23

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/T001933/1

Principal Investigator:

James Leigh

Research Subject:

Agri-environmental science (30%)

Animal science (40%)

Microbial sciences (20%)

Tools, technologies & methods (10%)

Research Topic:

Agricultural systems (30%)

Animal diseases (40%)

Bioinformatics (10%)

Microorganisms (20%)

People	ORCID iD
James Leigh (Principal Investigator)	http://orcid.org/0000-0002-0307-2814
Sharon Egan (Co-Investigator)
Keith Jolley (Co-Investigator)
Richard Emes (Co-Investigator)	http://orcid.org/0000-0001-6855-5481
Adam Blanchard (Co-Investigator)	http://orcid.org/0000-0001-6991-7210
Tracey Coffey (Co-Investigator)

Publications

Blanchard A (2024) PIMMS-Dash: Accessible analysis, interrogation, and visualisation of high-throughput transposon insertion sequencing (TIS) data in Computational and Structural Biotechnology Journal

Whiley D (2024) A core genome multi-locus sequence typing scheme for Streptococcus uberis: an evolution in typing a genetically diverse pathogen in Microbial Genomics

Key Findings


Description	We have used modern molecular techniques to identify the genes required by Streptococcus uberis to (i) grow in bovine milk, (ii) grow in bovine milk in the presence of serum (which leaks into the bovine mammary gland during bacterial colonisation) and (iii) colonise the bovine mammary gland. These data contribute to our understanding of how this pathogen causes disease and will facilitate future development of both therapeutic and prophylactic interventions to reduce the disease and its consequences for the sustainability of the dairy industry. We have developed a core genome typing scheme for S. uberis, allowing those investigating this pathogen to evaluate their findings against a global bacterial population. This will provide insight into the development and effect of intervention strategies.
Exploitation Route	The data produced during this study will result in a number of scientific papers (5 currently in preparation), the first of which is undergoing minor revision prior to publication. The typing scheme developed during this project will provide a valuable tool for those in the field, enabling detailed epidemiology and global evaluation of pathogen variation. The bacterial components identified as vital to colonisation will become candidates for investigation as vaccines.
Sectors	Agriculture, Food and Drink; Pharmaceuticals and Medical Biotechnology