Machine-learning to predict and understand the zoonotic threat of E. coli O157 isolates

Lead Research Organisation: University of Edinburgh
Department Name: The Roslin Institute

Abstract

Enterohaemorrhagic Escherichia coli (EHEC) O157 are bacteria that have their main reservoir in food production animals, predominantly cattle, and can be responsible for serious and life-threatening infections in humans. There are specific factors that define EHEC O157, including a micro-injection (type 3 secretion) system and production of specific Shiga toxins. However, we have known for nearly twenty years that not all subtypes represent the same threat to human health, and significant effort has gone into understanding why this is the case. One key reason is that there are different Shiga toxin types, some potentially more toxic than others, and their production levels differ between isolates. This variability comes from the fact that Shiga toxins are introduced into the bacteria by infection with bacterial viruses, known as bacteriophages. These integrate their DNA into the bacterial genome in a 'prophage' state. When the bacterial cell is threatened, this can activate the prophage to produce copies of itself and new bacteriophages. From whole genome sequencing of E. coli we are now aware that multiple prophages are present in E. coli genomes, some in different states of decay, but they can impact on each other and recombine to produce new variants. Much of the difference between E. coli O157 isolates is down to their prophage content, yet sequence identification methods generally use only 'core' genes for epidemiological studies.
We have recently applied machine-learning approaches to examine whole genome sequences of E. coli O157 from cattle and humans. We use these sequences as training sets and then ask the trained model to predict which group other E. coli O157 isolates should be assigned to. Surprisingly, the model assigns only a small proportion (<10%) of isolates from cattle to the human grouping, indicating that only this small subset may be more of a threat to human health. This grant is to investigate the biological basis of this selection process. We know that the machine-learning assignment is based on discriminatory protein variants predicted to be expressed from mainly prophage genes, so this fits with our understanding of the variation present in these isolates. The proposed work will be a combination of bioinformatics research and 'wet' infection biology research. For the bioinformatics we can use subjective and objective approaches to swap gene variants, including whole prophages, between isolate sequences and re-calculate their host prediction scores. This will allow us to define the most important combinations of genes being used for the prediction of zoonotic potential. It may also highlight specific genes to simplify the identification process. In the laboratory we will initially compare isolates that are very similar at the core genome level but differ markedly in their prediction scores. We will examine their gene expression profiles, metabolic profiles and key phenotypes such as Shiga toxin production, cellular interactions and pathology in a mouse model. Then we will swap or mutate genes identified by the bioinformatics and test these strain variants in the same laboratory assays.
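The in silico swapping step described above can be illustrated with a minimal Python sketch. It is not the project's pipeline: it assumes a simple linear scoring model over protein-variant (PV) presence, and every PV name, prophage name, weight and the bias value is invented for illustration. It shows only the general idea of replacing one prophage's PVs in a recipient isolate with those of a donor and then re-calculating the prediction score.

```python
import math

# Hypothetical linear model: one weight per protein variant (PV).
# Positive weights push the score towards the "human" group.
# All names and numbers here are invented for illustration.
PV_WEIGHTS = {
    "pv_stx2a": 2.0,
    "pv_phageA_1": 1.5,
    "pv_phageA_2": 1.0,
    "pv_phageB_1": -1.0,
    "pv_core_1": 0.2,
}
BIAS = -2.5

# Hypothetical map from each prophage to the PVs it can carry.
PROPHAGE_PVS = {
    "phageA": {"pv_phageA_1", "pv_phageA_2"},
    "phageB": {"pv_phageB_1"},
}


def human_score(pvs):
    """Map the linear decision value to a pseudo-probability in (0, 1)."""
    z = BIAS + sum(PV_WEIGHTS.get(pv, 0.0) for pv in pvs)
    return 1.0 / (1.0 + math.exp(-z))


def swap_prophage(recipient, donor, prophage):
    """Return the recipient's PV set with one prophage's PVs taken from the donor."""
    region = PROPHAGE_PVS[prophage]
    return (recipient - region) | (donor & region)


# Two toy isolates described by the PVs they carry.
cattle_isolate = {"pv_core_1", "pv_phageB_1"}
human_isolate = {"pv_core_1", "pv_stx2a", "pv_phageA_1", "pv_phageA_2"}

before = human_score(cattle_isolate)
chimera = swap_prophage(cattle_isolate, human_isolate, "phageA")
after = human_score(chimera)
print(f"score before swap: {before:.3f}, after swap: {after:.3f}")
```

Repeating such swaps for each prophage region, and comparing the change in score, is one simple way to rank regions by their contribution to the prediction under the assumptions stated above.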
The research should help validate this exciting new approach to understanding bacterial virulence and identify genes involved in the zoonotic threat of this dangerous pathogen. We should then be able to develop simpler approaches to identifying these specific variants on farms and intervene with, for example, a vaccine to reduce the threat to human health. The approach may also work to predict differences in virulence between human isolates, and this could have repercussions for how specific outbreaks are managed. This research is timely as it builds on our recent and unique application of machine learning to predict zoonotic potential and on our access to fully annotated PacBio sequences of UK cattle and human E. coli O157 isolates generated in partnership with Dr James Bono (USDA, Nebraska).

Technical Summary

Enterohaemorrhagic E. coli (EHEC) O157 lysogenized with Shiga toxin 2a (Stx2a)-encoding bacteriophages have become prevalent in cattle in the UK in the last 30 years and this timeframe matches the emergence of serious EHEC-associated human disease. Cattle are an asymptomatic primary reservoir for this zoonosis, which can cause bloody diarrhoea and kidney/brain damage in humans. Whole genome sequencing has demonstrated the mosaic nature of the E. coli O157 genome, and multiple prophages contribute to the diversity of this serotype. Based on whole genome sequences, we have recently used a support vector machine (SVM), a machine-learning algorithm, to predict the zoonotic potential of cattle isolates. The main conclusion was that only a small fraction of the bovine isolates (<10%) may be a threat to human health, even within previously defined pathogenic lineages. The prediction probabilities are based primarily on prophage-associated differential protein variants (PVs) extracted from sequence assemblies. The proposed study will combine bioinformatics and laboratory research to define key prophage regions important for prediction and investigate how they impact on pathogen biology. The computer-based studies will focus on in silico recombination, decision trees/random forests and genetic algorithms to define critical combinations of PVs. The laboratory work will initially study paired isolates with similar core genomes but markedly different prediction scores. Transcriptomic analysis, metabolic profiling, Shiga toxin production, cellular interactions and toxin pathology in a mouse model will be studied. Isogenic mutants of prophage regions identified by the bioinformatics analyses will then be characterised in the same laboratory assays. The research aims to identify differential genes responsible for zoonotic potential and use this information to simplify assessment of farm isolates to allow targeted interventions.
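As a rough illustration of the SVM-based host prediction summarised above, the following Python sketch trains a linear SVM on a toy PV presence/absence matrix. It is a sketch under stated assumptions, not the study's method: the summary does not specify the kernel, feature encoding or training procedure, so a linear SVM fitted by plain subgradient descent on the hinge loss is assumed here, and the PV names, data and hyperparameters are invented.

```python
# Toy presence/absence matrix: rows are isolates, columns are protein variants (PVs).
# Labels: +1 = human-derived isolate, -1 = cattle-derived isolate.
# All names and values are invented for illustration.
PV_NAMES = ["pv_a", "pv_b", "pv_c", "pv_d"]
X = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 1],
]
y = [1, 1, 1, -1, -1, -1]


def decision_value(w, b, x):
    """Signed score of a linear SVM: positive means 'human-like'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit w, b by per-sample subgradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * decision_value(w, b, xi)
            # Regularisation step: shrink the weights slightly (bias is not shrunk).
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge step: only samples inside the margin move the boundary.
            if margin < 1.0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


w, b = train_linear_svm(X, y)
predictions = [1 if decision_value(w, b, xi) > 0 else -1 for xi in X]

# Rank PVs by absolute weight as a crude indicator of their influence.
ranked = sorted(zip(PV_NAMES, w), key=lambda item: abs(item[1]), reverse=True)
for name, weight in ranked:
    print(f"{name}: {weight:+.2f}")
```

In this toy setting the learned weights indicate which PVs separate the two groups. An SVM's decision values are not probabilities in themselves; they would need a further calibration step to be reported as prediction probabilities.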

Planned Impact

The Edinburgh EHEC grouping has been growing since 1999 and now has links to an extensive network of scientists nationally and internationally covering epidemiology, molecular biology, health impact and possible interventions; this includes collaborations with basic research groups, animal scientists, diagnostic and public health laboratories (PHE, SERL, HPS). A good relationship with FSS/FSA further links us through to food producers, politicians and the general public. Knowledge exchange will be maintained with these groups through twice-yearly meetings, which currently take place under our FSS programme (ending 08/17); this momentum will be carried forward under this grant. We will also host a specific symposium at the Roslin Institute in 2019 to discuss the application of machine learning to interrogate both zoonotic potential and host source. We are currently helping SERL with the installation of a bioinformatics pipeline based on core SNP differences (developed at PHE) and we aim for this to be expanded to include prophage profiles and host prediction scores. A longer-term objective is to work with Health Protection Scotland (HPS) to understand how the SVM prediction scores may relate to patient pathology, with a direct impact on outbreak management. Our group works hard to ensure we obtain published outputs from our research and we have a good track record in this and in the delivery of seminars across the country and abroad. We are currently part of two international partnership awards, one with researchers in Argentina, which has some of the highest rates of EHEC disease in the world. This application builds on this award by using their in vivo infection model and further researcher exchange. Another partnership award is focused on vaccine development against bacterial zoonoses originating from livestock with groups in the USA, and our partnership with Jim Bono (USDA) will further develop this network.
Another important impact is through the training of veterinary undergraduates at the University of Edinburgh through lectures and tutorials that benefit from the advances made in this research. This feeds through to the important role that veterinary clinicians have in working with commercial producers and the public to raise awareness of such infections.
We envisage the possibility of herd testing and application of a vaccine or alternative intervention based on identification of strains with high zoonotic potential using the SVM method. Development of a multiplex PCR or alternative test could be applied to screen herds to identify those that should be targeted. Impact for this is through a commercial partner potentially allied to ongoing research on E. coli O157 vaccines. We are currently in negotiation over the licensing of our vaccine patents, with the intention of testing the vaccine in a feedlot trial in collaboration with the USDA (Dr Jim Bono, Nebraska). The machine-learning (SVM) method is also proving accurate for predicting the isolation host of E. coli in general (not just EHEC) and this could have important repercussions for food, health and environmental sampling. We will work to achieve this with our dedicated business development operatives from Edinburgh Research and Innovation (ERI), a non-profit subsidiary company of the University of Edinburgh, who are based at the Roslin Institute.
EHEC O157:H7 and other Stx-associated infections generate considerable public interest and we are committed to disseminating the findings as widely as possible. The Roslin Institute provides information about our research through our web site (http://www.roslin.ac.uk/), talks and discussion groups and direct interaction with the media. Each investigator & PDRA on the grant will be expected to spend ~2 days/yr in direct engagement with the public & schools, including participation in our yearly 'open doors' events. Direct impact is also achieved through training of these staff in diverse skills, including bioinformatics and molecular biology.

Publications

Description Collaboration with Public Health England 
Organisation Public Health England
Country United Kingdom 
Sector Public 
PI Contribution Provision of animal and human STEC for sequencing, working with PHE to analyse strain phylogeny and epidemiology. We have contributed through further analysis of long read strain sequences to understand changes in strains that occur during outbreaks. We have co-supervised 2 PhD students on STEC bioinformatics projects.
Collaborator Contribution Reduced rate sequencing of STEC, analysis of data, provision of metadata. Co-publication
Impact Publications as in main list
Start Year 2013
 
Description International workshop on Shiga toxin-producing Escherichia coli at The Roslin Institute 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact A two-day international workshop was held at The Roslin Institute on Shiga toxin-producing Escherichia coli (STEC), funded partly by this award (for travel of US collaborators) and partly by the Food Standards Agency of Scotland via a £2m award for collaborative research by a consortium led by Professor Gally. The workshop attracted leading academics working on E. coli O157 and other STEC from the US (Jim Bono, Guy Loneragan, Tom Edrington), Canada (Tim McAllister, Kim Stanford), Germany (Christian Menge), Belgium (Eric Cox), Sweden (Erik Eriksson, Lena-Mari Tamminen, Robert Soderlund) and the United Kingdom (Claire Jenkins, Tim Dallman, Dominic Mellor, Norval Strachan [Chief Scientific Advisor for FSA Scotland]). The workshop shared the latest advances in understanding of the biology of E. coli O157 and other STEC, including epidemiology, genomics, virulence, super-shedding and control strategies.
Year(s) Of Engagement Activity 2017