PhytoBacExplorer: A Phylogenomic Resource for the Phytobacterial Community

Lead Research Organisation: University of Warwick
Department Name: Warwick Medical School

Abstract

Bacteria that infect plants are diverse and can be transmitted via air, water, and soil, or via vectors (the organisms that carry them) like insects or nematodes. When introduced to new areas their spread is often highly invasive. Bacteria are difficult to control with chemicals and frequently able to infect a large number of plant hosts. They have the potential to cause widespread and severe economic, environmental and social impacts on landscapes and ecosystems, both rural and urban.

These phytobacteria can evolve rapidly, and as is evident from recent outbreaks, the UK economy and environment are continually threatened by emerging strains that infest or infect our food crops, forests, gardens, parks and nurseries. This is further aggravated by climate change and greater movement of potentially contaminated plant materials by trade. Ideally, we should be proactive and identify new variants posing risks early and take measures to limit their spread. However, our ability to do so is currently limited.

For bacteria that infect humans, the medical community has benefited immensely from EnteroBase, a computational system for processing and analysis of bacterial DNA sequences. It is used by labs around the world to promptly identify the causes of outbreaks, track multi-country outbreaks, inform prompt public health decision making and subsequent recall of contaminated products in increasingly global food chains. We will use the EnteroBase platform to develop "PhytoBacExplorer", a new system dedicated to plant bacteria.

PhytoBacExplorer will initially target genome sequencing for 5 of the top 6 plant bacterial pathogens and develop connections to feed in related data from other sources. We will provide curation to ensure rich, consistent, and high-quality data, including geographic information, host ranges, and genes underlying pathogenicity. The system will inherit many useful features from EnteroBase, including processing of raw data, identification of the shared "core" genome in a population, reconstruction of the phylogenetic (ancestry) tree based on core genomes, and powerful interactive visualisations. Guided by engagement with the user community, we will add tools for evolutionary analyses of gene families, tools to detect functional differences between groups of strains, and a capacity for metagenomic analysis.
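To make the idea of a shared "core" genome concrete, the following is a minimal sketch, not the EnteroBase or PhytoBacExplorer implementation. It assumes each genome has already been reduced to a set of gene-family names; the function name `core_genome`, the `min_fraction` parameter, and the toy strain data are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def core_genome(gene_content: Dict[str, Set[str]], min_fraction: float = 1.0) -> Set[str]:
    """Return the gene families present in at least min_fraction of the genomes.

    min_fraction=1.0 gives a strict core (present in every genome);
    lower values give a relaxed "soft core".
    """
    if not gene_content:
        return set()
    # Count in how many genomes each gene family occurs.
    counts: Dict[str, int] = {}
    for genes in gene_content.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    threshold = min_fraction * len(gene_content)
    return {gene for gene, n in counts.items() if n >= threshold}


# Toy, made-up presence data for three hypothetical strains.
strains = {
    "strain_A": {"gyrB", "rpoD", "hrpL", "avrE"},
    "strain_B": {"gyrB", "rpoD", "hrpL"},
    "strain_C": {"gyrB", "rpoD", "avrE"},
}

core = core_genome(strains)                          # genes found in all three strains
soft_core = core_genome(strains, min_fraction=0.6)   # genes found in at least two strains
```

In practice, deciding which genes belong to the same family requires sequence clustering or orthology inference, which this sketch takes as given.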

Users can upload raw sequencing data for PhytoBacExplorer to process and integrate with the full data set. Users will learn whether their strains are new species or have been previously observed, and can identify their likely functional characteristics. This will enable monitoring of ongoing pathogen evolution and significantly reduce the time spent on data processing tasks.

Users will have input into the development of the system: there will be a monthly opportunity to talk with the project team in an online meeting room, as well as online tutorials and conference presentations. The project has a budget for sequencing to cover poorly represented interspecific clades. We will include further pathogenic bacteria and beneficial bacteria later in the project.

The system will enable identification of DNA sequences specific to particular strains and so enable faster development of more precise molecular tests. This will greatly help regulatory bodies and Plant Health field officers to track new threats to the environment, and contribute to precision agriculture.
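One simple way to think about "DNA sequences specific to particular strains" is as short subsequences (k-mers) that occur in every target strain and in no other strain. The sketch below illustrates that idea only; it is an assumption about how such a search could work, not a description of the project's method. The function names and toy sequences are hypothetical, and a real tool would also need to consider reverse complements, sequencing errors, and probe design constraints.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def specific_kmers(targets: Iterable[str], background: Iterable[str], k: int) -> Set[str]:
    """Return k-mers present in every target sequence and absent from all background sequences."""
    target_sets = [kmers(s, k) for s in targets]
    if not target_sets:
        return set()
    shared = set.intersection(*target_sets)
    seen_elsewhere = set().union(*(kmers(s, k) for s in background))
    return shared - seen_elsewhere


# Toy, made-up sequences: two strains from the group of interest and one other strain.
target_strains = ["ACGTTGCA", "TTACGTTG"]
other_strains = ["ACGTAAAA"]

candidates = specific_kmers(target_strains, other_strains, k=4)
```

Here `candidates` holds the 4-mers shared by both target strains but not seen in the other strain, which is the kind of signature a molecular test could be built around.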

PhytoBacExplorer will be designed for interoperability with future computational infrastructure. After the project, it will continue to be supported by the University of Warwick via the Bioinformatics Research Technology Platform, which will provide hardware and maintenance support. The system will continue to grow its databases automatically through daily scans of public databases and will play an important role in enabling early detection of emerging pathogens and thus upcoming threats to food security.

Technical Summary

PhytoBacExplorer is a phylogenomic resource for integrative analyses of the most intensively studied phytobacteria and will greatly facilitate the identification and characterisation of new and emerging threats to food security. It will inform academics and practitioners about the ongoing evolution of plant pathogens under the pressures of modern agriculture and global change. It will also serve the community as a central resource, focusing efforts on research rather than technical tasks, and enabling sharing of consistently processed, richly annotated, high-quality data checked by curators. It will build on the impactful and mature EnteroBase system, which has >4,200 users in the medical community, including a number of national reference laboratories. EnteroBase features include daily checks for new genome sequencing data in public repositories, enabling automated growth; core genome identification; core genome-based Multi-Locus Sequence Typing analyses; and phylogenetic reconstruction and visualisation tools. The system will be enhanced by popular tools of the phytobacterial community, including population genetic metrics, identification of virulence factors, metabolic pathways, Ks test, gene family co-evolution, and design of molecular probes/markers for the detection of specific pathogens. A query tool for metagenomic data sets will be added. The data will be enriched by curation, including effectors, host ranges, and geographic data, and by targeted sequencing of poorly represented interspecific clades. The resource will initially focus on Pseudomonas spp., Ralstonia spp., and Xanthomonadaceae, covering 5 of the top 6 plant bacterial pathogens, but will be designed for expansion to other classes as user demand dictates. A powerful API will enable integration with other systems and the programmatic access needed for machine learning and AI applications. PhytoBacExplorer will play an important role in enabling early detection of emerging pathogens and thus upcoming threats to food security.
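As an illustration of how core genome-based multi-locus typing supports comparison of strains, the sketch below computes a simple allelic distance between two profiles: the number of loci at which both strains have a called allele and the alleles differ. This is a generic, simplified formulation and an assumption for illustration; the summary does not specify how EnteroBase or PhytoBacExplorer handle missing data or compute distances. The locus names, allele numbers, and function name are hypothetical.

```python
from typing import Dict, Optional

# An allele profile maps each core locus to an allele number, or None if uncalled.
Profile = Dict[str, Optional[int]]


def allelic_distance(a: Profile, b: Profile) -> int:
    """Count loci where both profiles have a called allele and the alleles differ."""
    shared_loci = a.keys() & b.keys()
    return sum(
        1
        for locus in shared_loci
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


# Toy, made-up profiles for two hypothetical strains.
profile_1: Profile = {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": None}
profile_2: Profile = {"locus1": 1, "locus2": 5, "locus3": 3, "locus4": 7}

distance = allelic_distance(profile_1, profile_2)
```

Pairwise distances of this kind can feed clustering or tree-building methods; ignoring uncalled loci, as done here, is only one of several possible conventions.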
