PhytoPath, an infrastructure for hundreds of plant pathogen genomes

Lead Research Organisation: EMBL - European Bioinformatics Institute
Department Name: Ensembl Genomes

Abstract

Pathogen-mediated disease is a major cause of damage to crops, with considerable economic impact and consequences for food security. Global demand for food is rising because of population growth, increasing affluence and changing diets. In a typical cropping year, yield losses in each field caused by infections with pathogenic microbes are rarely below 5% and are more typically in the range 10-15%. In recent years, new possibilities for the study (and ultimately the control) of pathogens have opened up through the application of high-throughput technologies for determining the molecular nature of life. These include genome sequencing, which reveals the genetic code that determines the inherited properties of cells, and extend to monitoring the varied cellular contents at different stages of life. PhytoPath is a resource designed to capture broad molecular information from plant pathogenic species and combine it with descriptive information about the process of infection. This includes more specific molecular information, e.g. about the pathogen and host proteins that interact during infection and the phenotype of the interaction outcome. The former, new knowledge on pathogen genomes, patterns of gene expression and potential interacting partners, is housed using the Ensembl platform, which contains a comprehensive suite of software for the management and display of genome-scale data. The latter, new phenotypic knowledge on experimentally verified genes required for the disease-causing abilities of each pathogenic species, is curated by members of the scientific community into the Pathogen-Host Interactions database (PHI-base). New interfaces, both within and between these two resources, support the joint querying and visualisation of genomic and phenotypic data.

This is an application to renew BBSRC funding for PhytoPath, which commenced in late 2010. We propose expanding the resource, scaling it up to handle hundreds of fungal and oomycete phytopathogens (and associated data about population-wide polymorphisms), deploying new tools for community curation, and improving the facilities for comparative analysis within the resource. For example, published RNA-seq data will be used to predict new gene models and modify existing ones, and a new tool for community curation of gene models will allow further expert revisions to be captured. Tools will be in place for users to compare the reference genome of each species with datasets from genome re-sequencing projects on additional strains of the same species that differ in their disease-causing abilities, host genotype ranges and/or ability to produce harmful mycotoxins or other metabolites. A new curation focus on phenotypic information will increase the detail recorded about the molecular interactions between the repertoires of small effector proteins produced by pathogens and their initial targets within the crop hosts. New links will allow users to move freely between the genomes of plant and pathogen. Simple visualisation tools will also be provided to display protein partnerships, emerging sections of pathways and local network 'hubs'. We intend to capture molecular and phenotypic information on ~200 pathogenic species with a wide range of pathogenic lifestyles in cereal and non-cereal hosts. This will further increase the power of comparative analyses and evolutionary studies. The existing Plant-Associated Microbe Gene Ontology (PAMGO) terms will gradually be introduced into the curation process.
We will continue to engage with the large and active UK research community in this field to identify its new requirements and address its current needs, through focused workshops and visits by PhytoPath team members to universities and institutes. The resource will also serve the larger global academic, industrial and government community, which is increasingly concerned with the same scientific and societal problems.

Technical Summary

We will maintain PhytoPath, a resource for high-throughput molecular biology data from important phytopathogens and for pathogenic phenotypes. It is built on the Ensembl software infrastructure for genome analysis and display and on PHI-base, the leading resource describing phenotypes of pathogenic infection. Information will be stored in a relational database and made available through a number of public interfaces, including a genome browser, a query-optimised data warehouse, and bulk data download. Services will be operated as an integrated part of the EBI's suite of public services, and integrated with other services offering access to genome-scale data from other species (e.g. the plant hosts of pathogen-mediated disease). Species and data will be selected for incorporation according to the current research priorities of the community. Data types of interest to PhytoPath include genome sequence, variation information, functional and regulatory assays, RNA-seq, transcriptomic and proteomic data. In this funding cycle, PhytoPath will include a new tool for the community curation of existing gene models, to supplement the tool developed in the current funding round for the community curation of host-pathogen interactions. PHI-base will be developed to include information on the primary host targets of pathogen effectors and their cellular locations, and to provide a sequence search function; the WikiPathways tool will be included within the user interface. We will develop methods for population-scale variation analysis and for comparative genomic analysis between pathogenic and related non-pathogenic species, and will include the results in each release of the database. The activities of PhytoPath are overseen by a management board comprising key members of the UK phytopathogen research community.

Planned Impact

The driving rationale for the project, as well as its greatest potential for societal impact, lies in sustainably increasing crop yields by assisting the development of strategies for pesticide development and plant breeding. Crucially, this depends on an understanding of gene function (effectors and their targets, and other downstream biological functions dependent on these), which determines the range of possible pesticide targets, the total genetic reservoir available to plant breeders, and possible side effects (in terms of the impact on plant growth, development and overall health). Massively reduced sequencing costs have led to a sharp (and continuing) increase in the number of sequenced phytopathogens. The power of such technologies, however, is critically limited by the quality of reference information (determined by individual scientific studies, and used to infer information about less well-studied systems) and by the need to describe the role of given functional elements in the life of the organism. PhytoPath seeks to address this by providing a system for the management of genome-scale data interfaced with a repository of information about pathogenic phenotypes. Potential beneficiaries therefore include not only academic researchers working in this field, but also companies developing pesticides or breeding new varieties of pathogen-resistant plants. More generally, farmers and the wider global population will benefit from improved strategies for disease control, although they are not expected to be direct users of the results.
The PIs at both institutions will engage with society, the media and policy makers to make the case for the importance of research into plant pathogens in the context of rising global concern about food and energy security, and of the potential benefits of genomics in addressing these concerns. But the main thrust of impact activities will be aimed at raising (academic and commercial) user awareness of the resource.
Impact will be achieved through attendance at relevant conferences, publication in the database issue of Nucleic Acids Research and other suitable journals, and through the use of our Scientific Advisory Board to gain feedback from critical members of the community about needs and use cases. Furthermore, we are directly seeking support for frequent visits to important BBSRC-funded UK laboratories for direct discussions with the staff employed there. Such visits will also involve offering informatics support for the submission of data to PhytoPath, and members of such laboratories will additionally be invited to EBI for short working visits where this will facilitate progress. We will also hold two training courses for members of the community, following on from the successful course already held in September 2012. Rothamsted Research (RRes) has an extremely strong track record in crop science and in the study of plant pathogens, and is at the heart of a leading network of academic and commercial groups operating in these domains. Its BBSRC-funded 20:20 Wheat ISPG has the ambitious objective of raising potential wheat yields to 20 tonnes per hectare through 20 years of research. Work package 2 of 20:20 Wheat, 'Protecting the yield potential of wheat', has several collaborative activities with academic and/or industrial partners that will benefit directly from the growing data and species available in PhytoPath. The EBI is Europe's leading bioinformatics service centre, is ideally placed to exploit synergies between PhytoPath and other informatics activities, and is coordinating the ELIXIR project. The EBI also has a long-established industry programme which helps guide future developments in accordance with commercial needs. Jointly, the two organisations are well placed to meet the needs of the user base in an efficient and sustainable manner.

Publications

 
Description We have further developed the PhytoPath web portal (http://phytopathdb.org) and its underlying databases. We have increased the number of genomes represented in the resource to 306 (fungi, protists and bacteria) by implementing a new pipeline for automatic import of data from the molecular data archives, and have implemented a new interface for querying data across several species. Links to PHI-base (with information on pathogenic phenotypes) have been improved through a new system for sequence matching. We have also initiated programmes for community gene curation in several species (providing infrastructure and data for annotation and integration): a complete re-annotation of the Botrytis cinerea genome has been published, and the re-annotation of Blumeria graminis is complete and awaiting publication before final integration into the resource. Work on Zymoseptoria tritici is expected to commence soon.
Exploitation Route In 2017, Ensembl Fungi had 59,938 unique visitors, Ensembl Protists had 32,780 and the PhytoPath portal had 39,236. A rich suite of interactive and programmatic interfaces makes many types of biological data readily available to the research community, organised around the genomic sequence. An understanding of the biology of an organism at the genome scale is essential for understanding the mechanisms of infection and developing strategies for pathogen control, and PhytoPath makes it easy for users to access this information.
Sectors Agriculture, Food and Drink, Chemicals

URL http://phytopathdb.org
 
Description The Ensembl Fungi and Ensembl Protists databases have been used in the agrochemical industry, by companies developing fungicides for use against plant pathogens in agricultural contexts.
First Year Of Impact 2013
Sector Agriculture, Food and Drink, Chemicals
Impact Types Economic

 
Title Ensembl Fungi 
Description A database offering access to genome-scale data from fungal species. PhytoPath grants have paid for the introduction of data from plant pathogenic species into this database. The latest release includes 635 genome sequences from 374 distinct species. 
Type Of Material Database/Collection of data 
Year Produced 2009 
Provided To Others? Yes  
Impact 59,938 unique visitors (IP addresses) were recorded to the site in 2017. 
URL http://fungi.ensembl.org
 
Title Ensembl Protists 
Description A database offering access to genome-scale data from protist species. PhytoPath grants have funded the inclusion of data from plant pathogenic species. The latest release includes 177 genomes from 113 species. 
Type Of Material Database/Collection of data 
Year Produced 2009 
Provided To Others? Yes  
Impact 32,780 unique visitors (IP addresses) were recorded to the site in 2017. 
URL http://protists.ensembl.org
 
Title PhytoPath 
Description A new database bringing together content from three existing resources (Ensembl Fungi, Ensembl Protists and PHI-base), including an integrated data mining tool. 113 fungi, 25 protists and 137 bacteria were present in the resource as of April 2017. 
Type Of Material Database/Collection of data 
Year Produced 2011 
Provided To Others? Yes  
Impact 39,236 unique visitors (IP addresses) were recorded to the site in 2017. 
URL http://www.phytopathdb.org
 
Description Blumeria graminis genome re-annotation 
Organisation RWTH Aachen University
Country Germany 
Sector Academic/University 
PI Contribution EMBL-EBI is providing curation tools, training, and data integration. Ensembl Fungi will be the final home of the newly annotated data.
Collaborator Contribution The partner has organised the research community to re-annotate the genome using the tools we have provided. The annotation process is now complete and the data is being prepared for release on publication in 2018.
Impact The genome has been completely re-annotated, increasing the number of protein-coding genes from 6,470 to 7,118.
Start Year 2016
 
Description Botrytis genome community annotation 
Organisation University of Wageningen
Department Wageningen Food & Biobased Research
Country Netherlands 
Sector Academic/University 
PI Contribution We have provided software and instruction for the collection of community annotation data, and integrated the outputs within the Ensembl Fungi database.
Collaborator Contribution Collaborators at Wageningen University have organised the Botrytis research community to re-annotate the Botrytis cinerea genome.
Impact The B. cinerea genome was completely re-annotated, with 11,646 gene structures modified and 32 novel genes added to the set. An account of the annotation has been published at http://onlinelibrary.wiley.com/doi/10.1111/mpp.12384/full.
Start Year 2015
 
Description Zymoseptoria tritici genome annotation 
Organisation AgroParisTech
Country France 
Sector Academic/University 
PI Contribution We provided expertise and an annotation platform to support the re-annotation of the Zymoseptoria tritici genome by the community.
Collaborator Contribution Our collaborators are assessing the data and re-annotating the genome. Twenty-five people participated in training (provided by us) on how to use the tool.
Impact Still awaiting the completion of the annotation.
Start Year 2015
 
Title Software for performing multi-genome queries for phytopathogenic species 
Description An application for selecting and downloading data from plant pathogenic species. Implemented as a user-facing/data aggregation layer on top of a BioMart data warehouse. 
Type Of Technology Webtool/Application 
Year Produced 2015 
Impact Users are able to query for genes involved in pathogenesis across many species, and download genomic data linked to phenotypic query terms. The PhytoPath website had 9,310 unique visitors (measured by IP address) in 2015. 
URL http://www.phytopathdb.org/query/builder
 
Description Dothideomycetes Comparative Genomics workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation of the PhytoPath project and associated tools to the Dothideomycetes community.
Year(s) Of Engagement Activity 2017
URL https://www.google.co.uk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0ahUKEwiU9tPu6KXZAhWKCcAKHbOCC...
 
Description Invited Presentation at the de.NBI symposium in Bielefeld, Germany, October 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact An invited presentation was given to the annual symposium of the de.NBI bioinformatics network in Germany.
Year(s) Of Engagement Activity 2017
 
Description OECD meeting 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? Yes
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact The talk sparked questions and discussion afterwards, and informed the OECD as it evolves its policy on open data and open science.
Year(s) Of Engagement Activity 2013
URL https://community.oecd.org/community/openscience
 
Description Presentation at the Conference "The Future of Science: The Digital Revolution: What is changing for humankind" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact A presentation at a conference attended mostly by undergraduate and high-school students, focused on far-reaching changes in scientific practice.
Year(s) Of Engagement Activity 2016
URL http://www.futureofscience.org/press/first-world-conference-on-the-future-of-science-science-and-soc...
 
Description Webinar on the annotation of the Zymoseptoria genome 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Through a webinar, 25 Zymoseptoria researchers were introduced to online tools and resources for improving the annotation of the reference Zymoseptoria genome.
Year(s) Of Engagement Activity 2017
 
Description Wellcome Trust Advanced Course on Fungal Pathogen Genomics 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The Wellcome Trust has funded a recurring 5-day advanced course on Fungal Pathogen Genomics at Hinxton, UK, starting in 2017. I was invited to co-organise the course in 2017 and 2018 and the team presented Ensembl Fungi, Ensembl Protists and PhytoPath (all resources developed under funding from this award) during the course.
Year(s) Of Engagement Activity 2017,2018
URL https://coursesandconferences.wellcomegenomecampus.org/events/item.aspx?e=629