Machine-learning to predict and understand the zoonotic threat of E. coli O157 isolates

Lead Research Organisation: University of Edinburgh
Department Name: The Roslin Institute

Abstract

Enterohemorrhagic Escherichia coli (EHEC) O157 are bacteria that have their main reservoir in food production animals, predominantly cattle, and can be responsible for serious and life-threatening infections in humans. There are specific factors that define EHEC O157, including a micro-injection (type 3 secretion) system and production of specific Shiga toxins. However, we have known for nearly twenty years that not all subtypes represent the same threat to human health and significant effort has gone into understanding why this is the case. One key reason is that there are different Shiga toxin types, some potentially more toxic than others, and their production levels differ between isolates. This variability comes from the fact that Shiga toxins are introduced into the bacteria by infection with bacterial viruses, known as bacteriophages. These integrate their DNA into the bacterial genome in a 'prophage' state. When the bacterial cell is threatened this can activate the prophage to produce copies of itself and new bacteriophages. From whole genome sequencing of E. coli we are now aware that multiple prophages are present in E. coli genomes, some in different states of decay, but they can impact on each other and recombine to produce new variants. Many of the differences between E. coli O157 isolates are down to their prophage content, yet sequence identification methods generally use only 'core' genes for epidemiological studies.
We have recently applied machine-learning approaches to examine whole genome sequences of E. coli O157 from cattle and humans. We use these as training sets and then ask the trained model to predict which group other E. coli O157 isolates should be assigned to. Surprisingly, it assigns only a small proportion (<10%) of isolates from cattle to the human grouping, indicating that only this small subset may be more of a threat to human health. This grant is to investigate the biological basis of this selection process. We know that the machine-learning assignment is based on discriminatory protein variants predicted to be expressed from mainly prophage genes, so this fits with our understanding of the variation present in these isolates. The proposed work will be a combination of bioinformatics research and 'wet' infection biology research. For the bioinformatics we can use subjective and objective approaches to swap gene variants, including whole prophages, between isolate sequences and re-calculate their host prediction scores. This will allow us to define the most important combinations of genes being used for the prediction of zoonotic potential. It may also highlight specific genes to simplify the identification process. In the laboratory we will initially compare isolates that are very similar at the core genome level but differ markedly in their prediction scores. We will examine their gene expression profiles, metabolic profiles and key phenotypes such as Shiga toxin production, cellular interactions and pathology in a mouse model. Then we will swap or mutate genes identified by the bioinformatics and test these strain variants in the same laboratory assays.
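The in silico swap described above can be illustrated with a minimal sketch. It assumes each isolate is reduced to a set of protein variant (PV) identifiers and that the trained model can be stood in for by a simple logistic scorer; the PV names, weights and bias are hypothetical placeholders and are not taken from the study.

```python
import math
from typing import Dict, Set

# Hypothetical per-PV weights standing in for a trained host-prediction model.
# Positive weights push the score towards "human", negative towards "cattle".
WEIGHTS: Dict[str, float] = {
    "pv_stx2a_phage_01": 1.5,
    "pv_prophage_tail_07": 0.8,
    "pv_core_metab_03": -0.4,
    "pv_prophage_int_12": -1.1,
}
BIAS = -0.5


def human_score(pvs: Set[str]) -> float:
    """Logistic score in (0, 1); higher means more human-associated."""
    z = BIAS + sum(WEIGHTS.get(pv, 0.0) for pv in pvs)
    return 1.0 / (1.0 + math.exp(-z))


def swap_region(recipient: Set[str], donor: Set[str], region: Set[str]) -> Set[str]:
    """Copy the recipient, replacing its PVs in `region` with the donor's."""
    return (recipient - region) | (donor & region)


# Two toy isolates that share a core PV but differ in prophage content.
cattle_isolate = {"pv_core_metab_03", "pv_prophage_int_12"}
human_isolate = {"pv_stx2a_phage_01", "pv_prophage_tail_07", "pv_core_metab_03"}
prophage_region = {"pv_stx2a_phage_01", "pv_prophage_tail_07", "pv_prophage_int_12"}

chimera = swap_region(cattle_isolate, human_isolate, prophage_region)
print(f"score before swap: {human_score(cattle_isolate):.2f}")
print(f"score after swap:  {human_score(chimera):.2f}")
```

Repeating the swap for each candidate region and comparing the change in score is one way to rank regions by their contribution to the prediction; a real analysis would re-score with the trained model rather than with fixed weights.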
The research should help validate this exciting new approach to understanding bacterial virulence and identify genes involved in the zoonotic threat of this dangerous pathogen. We should then be able to develop simpler approaches to identifying these specific variants on farms and intervene with, for example, a vaccine to reduce the threat to human health. The approach may also work to predict differences in virulence between human isolates and this could have repercussions for how specific outbreaks are managed. This research is timely as it builds on our recent and unique application of machine learning to predict zoonotic potential and access to fully annotated PacBio sequences of UK cattle and human E. coli O157 isolates generated in partnership with Dr James Bono (USDA, Nebraska).

Technical Summary

Enterohaemorrhagic E. coli (EHEC) O157 lysogenized with Shiga toxin 2a (Stx2a)-encoding bacteriophages have become prevalent in cattle in the UK in the last 30 years and this timeframe matches the emergence of serious EHEC-associated human disease. Cattle are an asymptomatic primary reservoir for this zoonosis, which can cause bloody diarrhoea and kidney/brain damage in humans. Whole genome sequencing has demonstrated the mosaic nature of the E. coli O157 genome, and multiple prophages contribute to the diversity of this serotype. Based on whole genome sequences, we have recently used a support vector machine (SVM), a machine-learning algorithm, to predict the zoonotic potential of cattle isolates. The main conclusion was that only a small fraction of the bovine isolates (<10%) may be a threat to human health, even within previously defined pathogenic lineages. The prediction probabilities are based primarily on prophage-associated differential protein variants (PVs) extracted from sequence assemblies. The proposed study will combine bioinformatics and laboratory research to define key prophage regions important for prediction and investigate how they impact on pathogen biology. The computer-based studies will focus on in silico recombination, decision trees/random forests and genetic algorithms to define critical combinations of PVs. The laboratory work will initially study paired isolates with similar core genomes but markedly different prediction scores. Transcriptomic analysis, metabolic profiling, Shiga toxin production, cellular interactions and toxin pathology in a mouse model will be studied. Isogenic mutants of prophage regions identified by the bioinformatics analyses will then be characterised in the same laboratory assays. The research aims to identify differential genes responsible for zoonotic potential and use this information to simplify assessment of farm isolates to allow targeted interventions.
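As an illustration of how a decision-tree style analysis might rank PVs, the sketch below scores each PV by the reduction in Gini impurity obtained when isolates are split on its presence or absence, which is the quantity a single decision-tree split typically optimises. The isolates, PV names and host labels are invented toy data, and this is not the pipeline used in the study.

```python
from typing import List, Sequence, Set, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def rank_pvs(isolates: List[Tuple[Set[str], str]]) -> List[Tuple[str, float]]:
    """Rank PVs by the impurity reduction of a presence/absence split."""
    parent = gini([label for _, label in isolates])
    all_pvs = sorted(set().union(*(pvs for pvs, _ in isolates)))
    ranked = []
    for pv in all_pvs:
        present = [label for pvs, label in isolates if pv in pvs]
        absent = [label for pvs, label in isolates if pv not in pvs]
        weighted = (
            len(present) * gini(present) + len(absent) * gini(absent)
        ) / len(isolates)
        ranked.append((pv, parent - weighted))
    # Largest reduction first; ties broken alphabetically for reproducibility.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Toy data: (set of PVs present, isolation host). Purely illustrative.
toy_isolates = [
    ({"pv_A", "pv_B"}, "human"),
    ({"pv_A"}, "human"),
    ({"pv_A", "pv_C"}, "human"),
    ({"pv_B", "pv_C"}, "cattle"),
    ({"pv_C"}, "cattle"),
    ({"pv_B"}, "cattle"),
]

ranking = rank_pvs(toy_isolates)
for pv, gain in ranking:
    print(f"{pv}: impurity reduction {gain:.3f}")
```

A random forest repeats this kind of split on bootstrapped samples and random feature subsets, so PVs that are informative only in combination can also surface; the single-split ranking here captures only marginal effects.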

Planned Impact

The Edinburgh EHEC grouping has been growing since 1999 and now has links to an extensive network of scientists nationally and internationally covering epidemiology, molecular biology, health impact and possible interventions; this includes collaborations with basic research groups, animal scientists, diagnostic and public health laboratories (PHE, SERL, HPS). A good relationship with FSS/FSA further links us through to food producers, politicians and the general public. Knowledge exchange will be maintained with these groups by twice-yearly meetings which currently occur under our FSS programme (end 08/17) but this momentum will be maintained under this grant. We will also host a specific symposium at the Roslin Institute in 2019 to discuss the application of machine learning to interrogate both zoonotic potential and host source. We are currently helping SERL with the installation of a bioinformatics pipeline based on core SNP differences (developed at PHE) and we aim for this to be expanded to include prophage profiles and host prediction scores. A longer-term objective is to work with Health Protection Scotland (HPS) to understand how the SVM prediction scores may relate to patient pathology, with a direct impact on outbreak management. Our group works hard to ensure we obtain published outputs from our research and we have a good track record in this and delivery of seminars across the country and abroad. We currently are part of two international partnership awards, one with researchers in Argentina, which has some of the highest rates of EHEC disease in the world. This application builds on this award by using their in vivo infection model and further researcher exchange. Another partnership award is focused on vaccine development against bacterial zoonoses originating from livestock with groups in the USA and our partnership with Jim Bono (USDA) will further develop this network.
Another important impact is through the training of veterinary undergraduates at the University of Edinburgh through lectures and tutorials that benefit from the advances made in this research. This feeds through to the important role that veterinary clinicians have in working with commercial producers and the public to raise awareness of such infections.
We envisage the possibility of herd testing and application of a vaccine or alternative intervention based on identification of strains with high zoonotic potential using the SVM method. Development of a multiplex PCR or alternative test could be applied to screen herds to identify those that should be targeted. Impact for this is through a commercial partner potentially allied to ongoing research on E. coli O157 vaccines. We are currently in negotiation over the licensing of our vaccine patents, with the intention of testing the vaccine in a feedlot trial in collaboration with the USDA (Dr Jim Bono, Nebraska). The machine-learning (SVM) method is also proving accurate for predicting the isolation host of E. coli in general (not just EHEC) and this could have important repercussions for food, health and environmental sampling. We will work to achieve this with our dedicated business development operatives from Edinburgh Research and Innovation (ERI), a non-profit subsidiary company of the University of Edinburgh, who are based at the Roslin Institute.
EHEC O157:H7 and other Stx-associated infections generate considerable public interest and we are committed to disseminating the findings as widely as possible. The Roslin Institute provides information about our research through our web site (http://www.roslin.ac.uk/), talks and discussion groups and direct interaction with the media. Each investigator & PDRA on the grant will be expected to spend ~2 days/yr in direct engagement with the public & schools including participation in our yearly 'open doors' events. Direct impact is also achieved through training of these staff in diverse skills including in bioinformatics and molecular biology.

Publications

Description We aimed to use machine learning methods to predict both the host source of bacterial isolates and their likely threat to human health. This grant is focused on E. coli O157 isolates that transmit from cattle as the primary reservoir to humans and so this specific research is about threat analysis and understanding the underpinning reasons at the sequence level. Our previous work had identified that specific subsets of isolates from cattle have higher scores for human association. We infer from this that these isolates are more likely to cause human disease and so we are trying to understand the genetic basis of this. This crosses over with our epidemiological information where we know that a specific subset of UK E. coli O157 strains are more of a threat to human health, i.e. only certain subtypes are associated with the most severe infections. We are currently trying to obtain more metadata in relation to human outbreaks against which genomic features can be correlated. A key part of this grant has been investigating large genome rearrangements that we have detected in strains, how these duplications and inversions alter the predictive scores, and how the rearrangements may change bacterial 'behaviour', i.e. phenotypes that are important for cattle and human colonisation/disease. Our hypothesis is that such changes occur routinely and give strains greater plasticity to survive transitions between hosts and potentially to infect hosts. We are investigating exactly which phenotypes relating to virulence may be affected, including production of Shiga toxin, the main factor associated with pathology in humans.
As well as introducing genes that can contribute to the virulence of a strain, prophages can enable the generation of large chromosomal rearrangements (LCRs) by homologous recombination. This work examines the types and frequencies of LCRs across the major lineages of the O157 serogroup and defines the phenotypic consequences of specific structural variants. We demonstrate that LCRs are a major source of genomic variation across all lineages of E. coli O157 and, by using both optical mapping and ONT long-read sequencing, demonstrate that LCRs are generated in laboratory cultures started from a single colony and that particular variants are selected during animal colonisation. LCRs are biased towards the terminus region of the genome and are bounded by specific prophages that share large regions of sequence homology associated with the recombinational activity. RNA transcriptional profiling and phenotyping of specific structural variants indicated that important virulence phenotypes such as Shiga toxin production, type 3 secretion and motility are affected by LCRs. In summary, E. coli O157 has acquired multiple prophage regions over time that act as genome engineers to continually produce structural variants of the genome. This structural variation is a form of epigenetic regulation that generates sub-population phenotypic heterogeneity with important implications for bacterial adaptation and survival.
Exploitation Route We continue to examine how machine learning approaches can be used to predict both the source of an isolate and its health threat. We are collaborating with both Public Health England and the Food Safety and Inspection Service (USDA) to examine how best to apply these approaches for risk management.
Sectors Agriculture, Food and Drink, Healthcare

URL https://www.ed.ac.uk/roslin/community-engagement/ag100/current-projects/machine-learning-fight-disease
 
Description Infections with bacteria encoding Shiga toxins can be lethal and are also associated with long-term morbidity, often as a result of kidney damage requiring repeated dialysis treatments and an eventual transplant. The aim of this work was to define the genetic regions of isolates predicted to have higher zoonotic potential. Routine whole genome sequencing of human EHEC isolates was carried out at Public Health England (PHE, now UKHSA) and the Scottish E. coli Reference Laboratory (SERL). We have helped SERL with the installation of a bioinformatics pipeline for providing EHEC outbreak isolates with unique identifiers based on core SNP differences (initially developed by Dr Tim Dallman when at PHE); this has been expanded to consider their prophage profiles and host prediction scores. A longer-term objective is to work with Health Protection Scotland (HPS) to understand how our scores may relate to levels of pathology in patients, with the hypothesis that isolates with more bovine-associated scores may be less virulent. If this is the case then it could alter how specific infections are handled depending on the perceived threat of the isolate. We still need to develop better ways of understanding the biological significance of the most important features, especially those that lead to strains with high levels of toxicity by impacting on Shiga toxin expression levels. Tracking the source of infections is often critical to outbreak investigations. We consider that the machine learning approaches will have value in epidemiology investigations and will be of interest to FSA/FSS and their stakeholders, from farmers, food producers and packagers to consumers and politicians. The machine-learning (initially SVM but now Random Forest) methods are proving accurate for predicting the isolation host of E. coli in general (not just EHEC) and this could have important repercussions for food, health and environmental sampling.
These approaches are now being applied to source attribution for Salmonella serovars including Typhimurium. Understanding the genetic basis of our zoonotic prediction scores should provide confidence in the approach at a commercial level. At this point we would envisage the possibility of herd testing and application of a vaccine or alternative intervention based on identification of strains with high zoonotic potential. Currently, a whole genome sequencing approach is probably too expensive, although the costs are generally falling, and so an important aim of our research is to define a limited number of protein variants (PVs) that work well as a proxy for the SVM scoring. At this point a multiplex PCR or alternative test could be applied to screen herds to identify those that should be targeted. The pathway for this is allied to our ongoing research on E. coli O157 vaccines and this continues to have commercial interest. In 2019, Roslin Technologies invested in our vaccine, funded production of the antigens in the USA and supported a feedlot trial in collaboration with USDA in 2020-21 (Dr Jim Bono, Nebraska). The data from this trial is now being analysed for future commercial decisions.
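One way to think about reducing the full scoring to a limited PV panel is as a covering problem. The sketch below is a hypothetical greedy selection, not the method used in this work: it assumes each isolate has already been called high or low by the full model, and it picks PVs that are absent from every low-scoring isolate and together mark as many high-scoring isolates as possible. Isolate names and PV names are invented.

```python
from typing import Dict, List, Set, Tuple


def greedy_panel(
    isolates: Dict[str, Set[str]], high: Set[str], max_size: int
) -> Tuple[List[str], Set[str]]:
    """Pick up to `max_size` PVs found only in high-scoring isolates.

    Returns the panel and the high-scoring isolates it fails to mark.
    """
    low_pvs: Set[str] = set().union(
        *(pvs for name, pvs in isolates.items() if name not in high)
    )
    candidates = set().union(*(isolates[name] for name in high)) - low_pvs
    uncovered = set(high)
    panel: List[str] = []
    while uncovered and candidates and len(panel) < max_size:
        # Choose the PV marking the most still-uncovered isolates
        # (alphabetical order breaks ties so the result is reproducible).
        best = max(
            sorted(candidates),
            key=lambda pv: sum(pv in isolates[name] for name in uncovered),
        )
        newly_covered = {name for name in uncovered if best in isolates[name]}
        if not newly_covered:
            break
        panel.append(best)
        uncovered -= newly_covered
        candidates.discard(best)
    return panel, uncovered


# Toy PV profiles; c1 to c3 are assumed to be called "high" by the full model.
toy_profiles = {
    "c1": {"pv1", "pv4"},
    "c2": {"pv1", "pv2"},
    "c3": {"pv3", "pv4"},
    "c4": {"pv4", "pv5"},
    "c5": {"pv5"},
}
panel, missed = greedy_panel(toy_profiles, high={"c1", "c2", "c3"}, max_size=3)
print("panel:", panel)
print("high-scoring isolates not marked by the panel:", sorted(missed))
```

In this framing an isolate would test positive if any panel PV is detected, which maps naturally onto a multiplex PCR; in practice the panel would need validating against the full model's scores on independent isolates.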
First Year Of Impact 2017
Sector Agriculture, Food and Drink, Healthcare
Impact Types Policy & public services

 
Description Advanced phage therapy for multidrug resistant E. coli associated with canine urinary tract infections
Amount £162,630 (GBP)
Organisation Dogs Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 01/2021 
End 12/2022
 
Description Precision bacteriophage identification through machine learning for mitigating persistent colonization of Shiga toxin-producing Escherichia coli O157:H7 in cattle
Amount $365,900 (USD)
Organisation U.S. Department of Agriculture USDA 
Department National Institute for Food and Agriculture
Sector Public
Country United States
Start 04/2021 
End 03/2023
 
Description Sub-award from BBSRC Impact Accelerator Account BB/S506722/1. Professor David Gally 'Using bacteriophage to remove Escherichia coli O157:H7 from cattle colonised at the terminal rectum'
Amount £33,439 (GBP)
Funding ID BB/S506722/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 04/2018 
End 03/2021
 
Description Collaboration with Public Health England 
Organisation Public Health England
Country United Kingdom 
Sector Public 
PI Contribution Provision of animal and human STEC for sequencing, working with PHE to analyse strain phylogeny and epidemiology. We have contributed through further analysis of long read strain sequences to understand changes in strains that occur during outbreaks. We have co-supervised 2 PhD students on STEC bioinformatics projects.
Collaborator Contribution Reduced rate sequencing of STEC, analysis of data, provision of metadata. Co-publication
Impact Publications as in main list
Start Year 2013
 
Description Collaborative research with Public Health England 
Organisation Public Health England
Country United Kingdom 
Sector Public 
PI Contribution We provide samples and analysis of bacteria that are present in livestock that are the potential source of infections in humans
Collaborator Contribution PHE analyse bacterial infections in humans and so by working in partnership we can improve our capacity to determine the source of particular infections in humans and hopefully prevent or limit these to improve public health
Impact Outputs are publications as defined elsewhere as well as improved value to PHE surveillance
Start Year 2015
 
Description EHEC O157 research groups in Argentina 
Organisation National Scientific and Technical Research Council (Argentina)
Country Argentina 
Sector Public 
PI Contribution This award is a 'partnering award' and it has been successful in leading to research exchange trips and discussion between our laboratory and several groups in Argentina, primarily Dr Marina Palermo, CONICET, Buenos Aires and Dr Angel Cataldi, Instituto de Biotecnología, Hurlingham. We have provided genomics and gene expression expertise to aid their analysis of Argentinian E. coli O157 isolates.
Collaborator Contribution They have provided strains, immunological expertise and access to a mouse model to investigate Shiga toxin release and pathology
Impact Manuscript as presented in main section. The partnering award allows me to travel to Argentina, present our work and initiate discussions with Dr Angel Cataldi at an institute separate from CONICET.
Start Year 2013
 
Description Artificial Intelligence (AI) workshop at Earlham 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Inter-institute workshop that discussed main applications of AI in their fields and potential for further research collaboration
Year(s) Of Engagement Activity 2018
 
Description International workshop on Shiga toxin-producing Escherichia coli at The Roslin Institute 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact A two-day international workshop was held at The Roslin Institute on Shiga toxin-producing Escherichia coli (STEC), funded partly by this award (for travel of US collaborators) and partly by the Food Standards Agency of Scotland via a £2m award for collaborative research by a consortium led by Professor Gally. The workshop attracted leading academics working on E. coli O157 and other STEC from the US (Jim Bono, Guy Loneragan, Tom Edrington), Canada (Tim McAllister, Kim Stanford), Germany (Christian Menge), Belgium (Eric Cox), Sweden (Erik Eriksson, Lena-Mari Tamminen, Robert Soderlund) and the United Kingdom (Claire Jenkins, Tim Dallman, Dominic Mellor, Norval Strachan [Chief Scientific Advisor for FSA Scotland]). The workshop shared the latest advances in understanding of the biology of E. coli O157 and other STEC, including epidemiology, genomics, virulence, super-shedding and control strategies.
Year(s) Of Engagement Activity 2017
 
Description Invited speaker at Genome Science 2019 Edinburgh 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact Presentation on the genomics of zoonotic E. coli O157
Year(s) Of Engagement Activity 2019
URL https://www.ed.ac.uk/edinburgh-infectious-diseases/news/events-archive/genome-science-2019-edinburgh
 
Description Speaker at an International Conference on One Health Antimicrobial Resistance - Leiden 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact I presented a talk on 'alternative approaches to tackling antimicrobial resistance'. A number of scientists at RI have now contributed to a review in this area to be published in 2020.
Year(s) Of Engagement Activity 2019
URL http://www.icohar2019.org/icohar2019/organization.html