PhytoPath: an integrated resource for comparative phytopathogen genomics

Lead Research Organisation: EMBL - European Bioinformatics Institute
Department Name: Ensembl Genomes


Food security has emerged as one of the most significant challenges for humankind in the 21st century. Food shortages, high energy costs, conflicting demands on crop production for biofuel generation and the soaring demand for food from east and south-east Asia are combining to drive food prices to their highest levels for many years. A significant constraint on crop productivity is disease, which accounts for 10-20% losses in yield every year. Controlling plant diseases furthermore represents a significant cost to farmers both in time and resources. The development of new durable disease control strategies that can be deployed at low cost therefore represents one of the best means of ensuring sustainable food production. Plant pathogens (and other species, including the plants they afflict) are increasingly studied through the use of high throughput, automated experimental approaches that generate large quantities of data. For over ten years, the determination of the sequence of complete genomes (that is, all the information that determines the heritable characteristics of a species) has been possible. More recently, advances in technology have reduced the costs of genome sequencing drastically, and made possible the determination of individual genome sequences (thus allowing the sampling of populations to determine their characteristics). Similar improvements in technology have increased the quantity of data produced describing the expression of genes and proteins in a variety of experimental conditions. However, while public repositories exist for certain types of these new experimental data, there is no integrative resource available that unifies these to facilitate their interpretation. In the absence of such a resource, there is (at worst) a danger that data generated by new technologies is lost; or alternatively that every scientist wishing to exploit such data has to tediously and wastefully integrate and correct information from different data sets. 
For many scientists, the determination of a coherent, up-to-date body of data from different experiments is a near-impossible challenge, and a distraction from the challenge of using such information to solve real scientific problems. The Ensembl software platform comprises a suite of tools for the analysis, integration and display of data from complete genomes. It includes modules for the handling of population-wide genome variation amongst individuals, and the evolutionary comparison between species. The platform has been used to capture genomes from many species including vertebrates and plants. We now propose creating PhytoPath, a new resource based on Ensembl technology to capture data from phytopathogen genomes, in response to the increased interest in food security and the concomitant increase in high throughput data available for pathogens of interest. PhytoPath will be run by the EBI, Europe's leading bioinformatics service centre, but will take its scientific direction from members of the UK phytopathogen research community, who are directly engaged in producing and exploiting these data. The use of the Ensembl platform is not only cost-effective (taking advantage of solutions already developed for use in other contexts), but also offers the exciting prospect of providing access to host, pathogen (and vector) genomes through a common interface. In particular, this will facilitate the development of a new type of resource, correlating phenotype (i.e. the symptoms of pathogen-mediated disease) with genotype (i.e. individual genome sequence) in both host and pathogen. The leading current resource for plant disease phenotypes is the pathogen-host interactions database (PHI-base), maintained by Rothamsted Research. We will develop a new interface for supervised community curation of PHI-base and integrate PHI-base tightly within PhytoPath to ensure that the pathogen phenotype can be studied in its genomic context.

Technical Summary

We will establish a resource (PhytoPath) for high throughput molecular biology data from important phytopathogens and pathogenic phenotypes, based on the Ensembl software infrastructure for genome analysis and display and on PHI-base, the leading resource describing phenotypes of pathogenic infection. Information will be stored in a relational database and made available through a number of public interfaces, including a genome browser, a query-optimised data warehouse, and bulk data download. Services will be operated as an integrated part of the EBI's suite of public services, and integrated with other services offering access to genome-scale data from other species (e.g. the plant hosts of pathogen-mediated disease). PhytoPath will be run by a management board comprising key members of the U.K. phytopathogen research community, and will initially prioritise the incorporation of data from selected fungal and oomycete pathogens of particular interest in the U.K. Data types of interest to PhytoPath include genome sequence, variation information, functional and regulatory assays, ESTs, transcriptomic and proteomic data. Priority species for inclusion in the first release are Magnaporthe grisea, Mycosphaerella graminicola, and Phytophthora infestans. Subsequently, data will be selected for incorporation according to the current research priorities of the community. We will develop methods for population-scale variation analysis, and comparative genomic analysis between pathogenic and related non-pathogenic species (building on the domain-specific expertise of project partners), and include the results in each release of the database. We will also develop a new interface to support annotation of host-pathogen interactions by community users, and develop a new interface linking genotypes in host and pathogen species with the disease phenotypes.

Planned Impact

As pressures on global food supplies grow, agriculture is of increasing economic value, and of huge significance to the large proportion of the world's population likely to face hunger if food production cannot be increased. PhytoPath will be extensively used by the agricultural biotechnology/agrochemical industry, which has considerable demand for a unified, centrally-curated database of genomic information for key plant pathogenic species. The development of an integrated resource aligning genomic, transcriptomic and proteomic data from both pathogen and host will provide a platform for researchers wishing to apply a systems approach to pathogen-mediated disease. PhytoPath will provide a framework for managing these data, a single home for community-centric curation of the scientific literature, and bidirectional links between the two. It will permit the rapid identification of the conserved and non-conserved evolutionary themes of biology. Other potential uses include enabling reviews on plant/animal pathogenesis, analysing the results of forward and reverse genetics experiments, validating gene models (against EST libraries or RNA-seq data), analysing proteomics and protein-protein interaction studies, and designing and developing diagnostic markers with biological relevance for each species. All of these uses are invaluable to the agrochemical industry. Links to the Fungicide Resistance Action Committee's website will integrate information about known target sites on top of the genomic/phenotypic information in PhytoPath itself. All data and software from PhytoPath will be available for universal re-use without restriction, allowing companies to integrate their own private information with that already in the public domain within a secure environment (although we will encourage companies to release pre-competitive information through the public site).
In a wider sense, the potential benefits of applying ultra-high throughput sequencing technologies to plant pathogens for target identification include the potential for significant reduction in the cost of disease control and increases in achievable yields (benefiting both food and biofuel production). The economic and quality-of-life benefits of such advances are massive. While the development of PhytoPath is in itself only a small step towards a new green revolution, the proper management and integration of genome scale data is crucial for its correct interpretation and downstream exploitation. The partners will provide training to industrial as well as academic users. Additionally, both partners have strong track records for engagement with industry, both through specific collaborations and also through the EBI's Industry Programme and Small Medium Enterprise Forum, which provide an opportunity for the commercial sector to convey their current and future requirements of EBI services. We will additionally promote the project to the general public (through science fairs and press releases, and through inviting a student to create educational materials based on the project), and to potential downstream beneficiaries (e.g. meetings aimed at the agricultural community).

Description We have developed web-accessible database resources providing access to biological information about the genomes and biology of the causative agents of important plant diseases.
Exploitation Route The information available has the potential for use in crop breeding, pesticide development and agronomy.
Sectors Agriculture, Food and Drink, Healthcare

Description Many researchers (over 37,000 in 2013) have accessed the 3 web portals developed (the PhytoPath site, Ensembl Fungi and Ensembl Protists), and each of them has had new, accurate information delivered.
First Year Of Impact 2010
Sector Agriculture, Food and Drink, Chemicals
Impact Types Economic

Description Biotechnology and Biological Resource Fund
Amount £600,000 (GBP)
Funding ID BB/K020102/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 11/2013 
End 11/2016

Description Global Development Program
Amount $199,290 (USD)
Funding ID OPPGD1491 
Organisation Bill and Melinda Gates Foundation 
Sector Charity/Non Profit
Country United States
Start 06/2010 
End 10/2014

Title Ensembl Fungi
Description A database offering access to genome scale data from fungal species. PhytoPath grants have paid for the introduction of data from plant pathogenic species into this database. The latest release includes 635 genome sequences from 374 distinct species. 
Type Of Material Database/Collection of data 
Year Produced 2009 
Provided To Others? Yes  
Impact 59,938 unique visitors (IP addresses) were recorded to the site in 2017. 

Title Ensembl Protists
Description A database offering access to genome-scale data from protist species. PhytoPath grants have funded the inclusion of data from plant pathogenic species. The latest release includes 177 genomes from 113 species. 
Type Of Material Database/Collection of data 
Year Produced 2009 
Provided To Others? Yes  
Impact 32,780 unique visitors (IP addresses) were recorded to the site in 2017.

Title PhytoPath
Description A new database bringing together content from 3 additional resources: Ensembl Fungi, Ensembl Protists and PHI-base, including an integrated data mining tool. 113 fungi, 25 protists and 137 bacteria were present in the resource as of April 2017. 
Type Of Material Database/Collection of data 
Year Produced 2011 
Provided To Others? Yes  
Impact 39,236 unique visitors (IP addresses) were recorded to the site in 2017. 

Description Fusarium oxysporum genome annotation
Organisation University of Amsterdam
Country Netherlands 
Sector Academic/University 
PI Contribution We provided analytic expertise and computational capacity.
Collaborator Contribution They provided raw data, and will submit the analysed data that results from the collaboration to the PhytoPath database.
Impact Analysed data from the Fusarium oxysporum genome has been generated and will be made publicly available through PhytoPath.
Start Year 2013

Description G8 Open Data meeting
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? Yes
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact Talk sparked questions and discussion afterwards

G8 committed to ongoing pursuit of the open data agenda following the conclusion of the meeting (which contained many presentations including my own)
Year(s) Of Engagement Activity 2013