PhytoPath, an infrastructure for hundreds of plant pathogen genomes

Lead Research Organisation: Rothamsted Research
Department Name: Biointeractions and Crop Protection

Abstract

Pathogen-mediated disease is a major cause of damage to crops, with considerable economic impact and consequences for food security. Global demand for food is rising because of population growth, increasing affluence and changing diets. In a typical cropping year, yield losses in each field through infections caused by pathogenic microbes are rarely below 5% and are more typically in the range 10-15%. In recent years, new possibilities for the study (and ultimately control) of pathogens have opened up through the application of high-throughput technologies for determining the molecular nature of life. These include genome sequencing, which reveals the genetic code that determines the inherited properties of cells, and extend to monitoring the varied cellular contents at different stages of life. PhytoPath is a resource designed to capture broad molecular information from plant pathogenic species, and combine it with descriptive information about the process of infection, including more specific molecular information, e.g. about the pathogen and host proteins that interact during infection and the phenotype of the interaction outcome. The former knowledge, on pathogen genomes, patterns of gene expression and potential interacting partners, is housed using the Ensembl platform, which contains a comprehensive suite of software for the management and display of genome-scale data. The latter phenotypic knowledge, on experimentally verified genes required for the disease-causing abilities of each pathogenic species, is curated by members of the scientific community into the Pathogen-Host Interactions database (PHI-base). New interfaces within and between these two resources support the joint querying and visualisation of genomic and phenotypic data.

This is an application for a renewal of BBSRC funding (which commenced in late 2010). We propose expanding the resource, scaling it up to handle hundreds of fungal and oomycete phytopathogens (and associated data about population-wide polymorphisms), deploying new tools for community curation, and improving the facilities for comparative analysis from within the resource. For example, published RNA-seq data will be used to predict new gene models and modify existing ones, and a new tool for community curation of gene models will allow further expert revisions to be captured. Tools will be in place for users to compare the reference genome of each species with the datasets arising from genome re-sequencing projects involving additional strains of a single species with different disease-causing abilities, host genotype ranges and/or abilities to produce different harmful mycotoxins/metabolites. A new curation focus on the phenotypic information will increase the detail recorded about the molecular interactions between the repertoires of small effector proteins produced by pathogens and their initial targets within the crop hosts. New links will be provided allowing users to move freely between the genomes of both plant and pathogen. Simple visualisation tools will also be provided to display protein partnerships, and emerging sections of pathways and local network 'hubs'. We intend to capture molecular and phenotypic information on ~200 pathogenic species with a wide range of pathogenic lifestyles in cereal and non-cereal species. This will further increase the power of comparative analyses and evolutionary studies. The use of the existing Plant-Associated Microbe Gene Ontology (PAMGO) terms will gradually be introduced into the curation process.
We will continue to engage with the large and active UK research community in this field, to find out their new requirements and to address their current needs through focussed workshops and University/Institute visits by specific PhytoPath team members. This resource will also serve the larger, global academic, industry and government based community increasingly concerned by the same scientific and societal problems.

Technical Summary

We will maintain the resource (PhytoPath) for high-throughput molecular biology data from important phytopathogens and pathogenic phenotypes, based on the Ensembl software infrastructure for genome analysis and display and on PHI-base, the leading resource describing phenotypes of pathogenic infection. Information will be stored in a relational database and made available through a number of public interfaces, including a genome browser, a query-optimised data warehouse, and bulk data download. Services will be operated as an integrated part of the EBI's suite of public services, and integrated with other services offering access to genome-scale data from other species (e.g. the plant hosts of pathogen-mediated disease). Species and data will be selected for incorporation according to the current research priorities of the community. Data types of interest to PhytoPath include genome sequence, variation information, functional and regulatory assays, RNA-seq, transcriptomic and proteomic data. In this funding cycle, PhytoPath will include a new tool for the community curation of existing gene models, to supplement the tool developed in the current funding round for the community curation of host-pathogen interactions. PHI-base will be developed to include information on the primary host targets of pathogen effector(s) and their cellular location, and to provide a sequence search function; the WikiPathways tool will be included within the user interface. We will develop methods for population-scale variation analysis, and for comparative genomic analysis between pathogenic and related non-pathogenic species, and include the results in each release of the database. The activities of PhytoPath are overseen by a management board comprising key members of the UK phytopathogen research community.

Planned Impact

The driving rationale for the project, as well as its greatest potential for societal impact, is in sustainably increasing the yields of crop plants, through assisting the development of strategies for pesticide development and plant breeding. Crucially, this depends on an understanding of gene function (effectors and their targets, and other downstream biological functions dependent on these), which determine the range of possible pesticide targets, the total genetic reservoir available to plant breeders, and possible side effects (in terms of the impact on plant growth, development and overall health). Massively reduced sequencing costs have led to a sharp (and continuing) increase in the number of sequenced phytopathogens. The power of such technologies, however, is critically limited by the quality of reference information (determined by individual scientific studies, and used to infer information about less well studied systems) and by the need to describe the role of given functional elements in the life of the organism. PhytoPath seeks to address this through providing a system for the management of genome-scale data interfaced with a repository of information about pathogenic phenotypes. Potential beneficiaries thus include not only academic researchers working in this field, but also companies developing pesticides, or attempting to breed new varieties of pathogen-resistant plants. More generally, farmers and the wider global population will benefit from improved strategies for disease control, although they are not expected to be among the direct users of the result.

The PIs at both institutions will engage with society, the media and policy makers to make the case for the importance of research into plant pathogens in the context of rising global concern about food and energy security, and of the potential benefits of genomics in addressing these concerns. But the main thrust of impact activities will be aimed at raising (academic and commercial) user awareness of the resource.

Impact will be achieved through attendance at relevant conferences, publication in the database issue of Nucleic Acids Research and other suitable journals, and through the use of our Scientific Advisory Board to gain feedback from critical members of the community about needs and use cases. Furthermore, we are directly seeking support for frequent visits to important BBSRC-funded UK laboratories for direct discussions with the staff employed there. Such visits will also involve offering informatics support for the submission of data to PhytoPath, and members of such laboratories will additionally be invited to visit the EBI for short working visits where this will facilitate progress. We will additionally hold 2 training courses for members of the community, following on from the successful course already held in September 2012. Rothamsted Research (RRes) has an extremely strong track record in crop science and in the study of plant pathogens, and is at the heart of a leading network of academic and commercial groups operating in these domains. Its BBSRC-funded 20:20 wheat ISPG has the ambitious objective of raising potential wheat yields to 20 tonnes per hectare through 20 years of research. Work package 2 in 20:20 wheat, 'Protecting the yield potential of wheat', has several collaborative activities with academic and/or industrial partners that will directly benefit from the growing data and species available in PhytoPath. The EBI is Europe's leading bioinformatics service centre, is ideally placed to exploit synergies between the activities of PhytoPath and other informatics activities, and is coordinating the ELIXIR project. The EBI also has a long-established industry programme which helps guide future developments in accordance with commercial needs. Jointly, the two organisations are well placed to meet the needs of the user base in an efficient and sustainable manner.

Title: A collection of postcards
Description: A collection of postcards highlighting the importance of Fusarium and the research ongoing at Rothamsted was prepared for the Fusarium one-day event held at Rothamsted in July 2016. These postcards were disseminated to the wider public and key stakeholders through this event and follow-on activities.
Type Of Art: Artwork
Year Produced: 2016
Impact: These postcards have made the Fusarium disease problems on crop plants, and the subsequent detrimental effects on the food and feed industries, far more accessible to the general public, politicians and the agri-industry.
 
Description: All the project objectives have been met, and many exceeded. The ongoing collaboration between the EBI Ensembl team, the Rothamsted Research PHI-base team, the curation company Molecular Connections and the University of Cambridge single organism literature curation PomBase team has been significantly strengthened.

• The incorporation and prioritisation of large numbers of fungal, protist and bacterial genomes into Ensembl. Since the start of the project in 2014, the number of pathogen genomes available rose from 20 to over 320, covering more than 160 fungal, 40 protist, and 120 bacterial pathogenic species. These genomes were overlaid with variation data, RNA-Seq data, and whole genome alignment data, then embedded within the Ensembl comparative genomics pipelines for numerous pathogenic and non-pathogenic species. This integration enables simultaneous querying of multiple sources of data, and enables findings from well-studied pathogens to be applied to underfunded, under-studied species. Three data releases have taken place each year.

• Whole genome alignments have been performed - using the programs LASTZ and translated BLAT - between almost every pair of protist pathogen genomes (including non-phytopathogenic relatives), and between most pairs of fungi in the Ashbya/Saccharomyces, Colletotrichum/Verticillium, Fusarium/Trichoderma, Gaeumannomyces/Magnaporthe, Puccinia and basidiomycete groups. A total of 43 pairwise comparisons involving fungal phytopathogens and 632 pairwise comparisons involving protist phytopathogens are available. Synteny was predicted for closely related species. Evolutionary trees have been inferred for many fungal and protist species. A total of 886,676 proteins from 83 phytopathogenic species were placed in 307,073 phylogenetic trees, together with 2,041,570 other proteins from non-phytopathogenic species. In addition, six phytopathogenic species were included in a broad-range, pan-taxonomic comparative analysis.

• The development of a PHI-base mapping tool to regularly annotate Ensembl microbial genes with the rich, manually curated data in PHI-base. This permitted approximately 98.5% of PHI-base entries to be successfully mapped onto their corresponding genome. The annotations are exposed through the genome browser, BioMart and programmatic APIs. This activity has continued beyond the lifetime of the PhytoPath project. As of PHI-base version 4.6, a total of 3,299 microbial genes in Ensembl Genomes have PHI-base annotations mapped onto them.

• The bringing together of multiple fungal communities to collaborate on improving the gene sets of key pathogens. This created well-curated gene sets for Botrytis cinerea and Blumeria graminis f. sp. tritici, and both genomes are now published. In addition, a reliable infrastructure and training material were developed and are being exploited by other communities wishing to improve the gene models available for key species of interest. For example, six research groups within the global Zymoseptoria tritici community have, since summer 2018, been collectively editing all gene models in transposon/repeat-rich regions of the genome using the Apollo tools.

• Regular manual curation of 40 peer-reviewed articles per month into PHI-base.

• Biannual updates of data in PHI-base, each May and November, with this data published on the PHI-base website (www.phi-base.org).

• Increased the volume of data in PHI-base. The growth from PHI-base version 3.6 (released 6 May 2014) to version 4.6 (released 1 November 2018) is as follows, with each percentage giving the version 4.6 count relative to the version 3.6 count: genes, 2,875 to 6,438 (224%); interactions, 4,102 to 11,340 (276%); pathogenic species, 166 to 263 (158%) (now capped at approximately 200 species for curation purposes); hosts, 110 to 194 (176%); diseases, 181 to 510 (281%); and references, 1,243 to 3,011 (242%). Curation of the literature on effectors rose considerably: from fewer than 50 entries to 1,963 entries.

• Launched a new interface for PHI-base in September 2015, providing a faceted view of the data, and permitting 81 different data types to be searched for and displayed (only 50 data types were searchable in 2014).

• Creation of a user-friendly web portal for PHI-base version 4, including a unique protein-to-phenotype service (PHIB-BLAST), launched in May 2016.

• Fully implemented the four FAIR data principles (findability, accessibility, interoperability, and reusability) for PHI-base in late 2015. The publicly accessible PHI-base GitHub repository (https://github.com/PHI-base) provides exports of data from PHI-base, plus source code for the PHI-base parser and website.

• Placed PHI-base on the ELIXIR 'Data for Life' roadmap in 2016. Following an open competition, PHI-base was also successfully included in the European life-science infrastructure for biological information initiative, via membership of the UK node of ELIXIR since May 2017. All the legal documentation between the UK partners and the UK Hub (University of Manchester) was signed off by November 2017.

• Since mid-2018, all the PHI-base phenotype data sets have been available within FungiDB, the Fungal and Oomycete Genomics resource (https://fungidb.org/fungidb/). This has further exposed the data in PHI-base to the USA research community, as well as to various model organism and human pathogen communities.

• Development of ontologies, including a species-neutral ontology called PHIPO (Pathogen-Host Interactions Phenotype Ontology). PHIPO contains descriptive phenotype terms which enable improved phenotype searches across many pathogen and host species that are important for animal and human health and agriculture. PHIPO currently includes phenotype terms capable of relating the 250 pathogen species and 160 host species contained in PHI-base. PHIPO was formally registered with the OBO Foundry in November 2018 (https://www.obofoundry.org/ontology/phipo.html), and is available on GitHub (https://github.com/PHI-base/phipo/blob/master/phipo.owl). A corresponding disease ontology called PHIDO (Pathogen-Host Interactions Disease Ontology) has also been developed, and is also available on GitHub (https://github.com/PHI-base/config/blob/master/ontologies/phido.owl).

• Development of PHI-Canto: a community curation web application (www.phi-canto.org) that aims to encourage scientists and journals to capture scientific data at its source (usually upon immediate publication of a peer-reviewed article).

• Developed a way to capture published, experimentally verified first host targets of a pathogen effector gene product. This was done with a small test set of interactions. EnsemblFungi/EnsemblProtists genome IDs were captured for the pathogen gene product, and EnsemblPlants genome IDs were captured for the corresponding plant first host target. This cross-connection of different Ensembl databases, via specific gene entries, is a novel development for Ensembl.

• Captured ChEBI identifiers, CAS numbers, and documented modes of action for 149 commercial antifungal chemistries, for use by authors from the fungicide community using the PHI-Canto tool.

• Three training workshops were held at the EBI, with 25-44 participants at each.

• PhytoPath and PHI-base have maintained up-to-date Wikipedia entry pages, with external links to other relevant resources.

• A Twitter account, @PHI_base, was established for PHI-base in April 2017 to further improve communication with users of PHI-base, and to widen awareness of PHI-base in the scientific community.

• Increasing user numbers for both the PhytoPath and PHI-base resources, with totals per year rising as follows: 2014 - 59,515; 2015 - 82,240; 2016 - 116,697; 2017 - 142,539; 2018 - 146,163. Downloads of the entire PHI-base database for use in bioinformatics studies also rose overall, although not in every year: 2014 - 396; 2015 - 576; 2016 - 612; 2017 - 444; 2018 - 672.

• Increased number of citations for PHI-base, from 103 at the start of 2014, to a total of 255 by the end of 2018. All publications citing PHI-base use are listed on the PHI-base website in publication year order.
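
The version 3.6 to 4.6 growth figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the counts are copied from the text, while the function names are invented here. It shows that the quoted percentages match the version 4.6 count expressed as a percentage of the version 3.6 count, rather than the percentage increase. The diseases ratio works out at about 281.8%, which the text gives as 281%.

```python
# Counts quoted in the text for PHI-base version 3.6 (May 2014)
# and version 4.6 (November 2018).
counts = {
    "genes": (2875, 6438),
    "interactions": (4102, 11340),
    "pathogenic species": (166, 263),
    "hosts": (110, 194),
    "diseases": (181, 510),
    "references": (1243, 3011),
}


def relative_size(old, new):
    """Return the new count as a percentage of the old count."""
    return 100.0 * new / old


def percent_increase(old, new):
    """Return the growth from old to new as a percentage of old."""
    return 100.0 * (new - old) / old


# Map each data type to (relative size, percentage increase).
summary = {
    name: (relative_size(old, new), percent_increase(old, new))
    for name, (old, new) in counts.items()
}

for name, (rel, inc) in summary.items():
    print(f"{name}: {rel:.1f}% of the v3.6 count ({inc:.1f}% increase)")
```

On this reading, the number of curated genes in version 4.6 is roughly 2.2 times the version 3.6 figure, i.e. an increase of about 124%.
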

Exploitation Route:

• The PhytoPath and PHI-base resources are of immediate benefit to all researchers in the medical, crop plant, animal, and model organism biosciences working on diseases caused by fungi, protists and bacteria. Both resources remove bottlenecks to new discoveries caused by data sets being unavailable, non-integrated, and/or incompatible with simple queries and complex analyses. To ensure use of the data by others, priority infectious microbes have been selected and included according to the interests of industrial and academic researchers in the UK. These two inter-connected resources provide standardised annotation, more powerful comparative analyses, and greater data access through interactive interfaces and new tools.

• Potential users include agricultural-chemical companies developing pesticides; plant breeders engaged in breeding new varieties of pathogen-resistant plants; and pharmaceutical companies developing new health care products to stop or minimise infectious microbes, both in general human populations and within hospitals. Other potential users of the data are diagnostic companies, following the emergence of problematic pathogenic microbes.

• The interpretation of genome-scale molecular biology and phenotyping data is a key component in the development of novel strategies for sustainable disease control in humans, cropped plants, and farmed animals. This biology has considerable academic, economic, social, and ecological value. Specifically, these resources have been developed to organise genome sequences, genetic variation data, and phenotypic data, and to make such data widely accessible. The resources permit users from multiple disciplines to perform genome-wide enquiries across wide taxonomic distances, whilst at the same time linking these analyses directly to pathogenic phenotypes associated with gene mutations that have been curated from peer-reviewed literature. This immediate access speeds up analysis, and allows hypotheses to be rapidly confirmed or refuted. All previous studies on the same gene can be identified immediately, which makes it easier to add further information to a new discovery, and prevents repetition of prior successful experiments.

• The availability of negative experimental data sets is particularly useful when modelling the mechanisms and pathways underlying infectious processes, and the genes, proteins and metabolites involved in these processes. The inclusion of negative data was requested by PHI-base users in the mid 2000s, and curation of negative data is ongoing. As of version 4.6, PHI-base provides 2877 unaffected pathogenicity entries, constituting 25% of the total data.

• The full data downloads available for both PhytoPath and PHI-base mean that industry-based users have complete access to these datasets for commercial purposes.

• The greatest potential for societal impact from PhytoPath and PHI-base is in two targeted industry sectors. The first is sustainably increasing the yields of crop plants, through assisting the development of strategies for pesticide development and plant breeding. Crucially, this depends on an understanding of gene function (effectors and their host targets, and other downstream biological functions dependent on these), which determines the range of possible pesticide targets, the total genetic reservoir available to plant breeders, and possible side effects (in terms of the impact on plant growth, development and overall health of the natural ecosystem). The second targeted sector is human health, and medical interventions to ensure healthy ageing throughout the life course. Understanding pathogen gene function, host targets and downstream biological functions will aid novel drug discoveries, and help to track clinical efficacy.

• Emerging infectious diseases (EIDs). Academic, industry and governmental organisations are increasingly monitoring alleles in pathogen genomes that, when mutated or transferred between species (by horizontal gene transfer or conjugation), lead to enhanced disease formation. Alleles of the former type (those that enhance disease when mutated) are often the negative regulators in a biological system. As of version 4.6, PHI-base has 323 single gene entries classified as increased virulence (hypervirulence), covering 514 interactions and 98 pathogenic species.

• Pathogen effectors are increasingly being used successfully by plant breeders in traditional breeding programmes to remove susceptible loci, thereby reducing overall levels of a particular disease (for example, glume blotch disease on wheat caused by the fungus Parastagonospora nodorum). Alternatively, plant breeders are now using effectoromics screening with purified effector proteins - rather than direct pathogen tests - to save both time and money when identifying the presence of specific disease resistance (R) genes in germplasm collections, as well as in introgression, pedigree and hybrid breeding. This identification is done alongside marker-assisted selection, to ensure the retention of functional R alleles, and the absence of additional suppressor loci. As of version 4.6, PHI-base has 1965 entries termed 'effector', constituting 17% of all data.

• First host targets of pathogen effectors are also curated into PHI-base, and tagged in EnsemblPlants. These coupled datasets - although currently low in number at approximately 20 entries - could be used immediately by agricultural-biotechnology and plant breeding companies to develop novel disease control strategies: either through gene editing, or through introgression and hybrid breeding. It would also be possible to use this information to track the closest sequence-related alleles in taxonomically related plant species that are attacked by related pathogenic species.

• The potential discovery of novel anti-infective targets by users. There are 199 lethal entries in PHI-base, of which 83 are plant pathogen entries. Comparative genomics analyses can be done using the Ensembl BioMart tool to immediately infer that the same gene is likely to be lethal in a related species. Follow-up studies exploring sequence variation and protein modelling - using a range of pathogenic and non-pathogenic species - could then be used to determine if regions of these proteins could be targeted by specific chemistries. This analysis could provide pathogen control, while not affecting beneficial species.

• Improved general information and access to all known fungicide targets. These targets are all listed in PHI-base and marked up on the pathogen genome browser in PhytoPath. If reduced fungicide efficacy is noted, the UniProt ID can be downloaded, PCR primers designed, and these primers can be used to check problematic isolates for the presence of novel variant sequences in the coding, non-coding and promoter sequences of the predicted gene model.

• With a future increase in the number of wheat host target genes curated in PHI-base (1350 interaction entries; 12% of total data), the resource can be linked to CerealsDB (hosted at Bristol), allowing users to monitor the presence or absence of host resistance genes in more than 500 wheat lines: a considerable number of which are current or former commercial varieties. This could become a regular activity within the BBSRC ISP Designing Future Wheat (DFW).

• Increased use of PhytoPath and PHI-base in predictive biology. Both academic and industry-based researchers are increasingly moving from exploring a single reference genome for a pathogenic species, to exploring the genomes of multiple isolates from a single species with different biological phenotypes, and going on to predict the core and variable parts of the pan-genome. Researchers want to know what has newly arrived in a population or species that previously was absent, and what has changed (for example, expanded or contracted gene families). Researchers often start by describing what gene content was found in all the older isolates (n = 100); then, as new isolates with different phenotypes arise in a region (anywhere in the world), these isolates are collected and sequenced, and their genomes are compared with the reference core and variable pan-genome. What are the shifts? Is there evidence of horizontal gene transfer or the arrival of new lineages? Researchers can also explore where in the genome the new genes are found, to identify the hot spots, and these can be the focus of other genome surveys. The gene function datasets available from PHI-base are already becoming highly useful for these and other types of pan-genome analysis.

• The development of the PHI-Canto author curation tool has included the development of a new concept called the 'metagenotype': a combination of a pathogen genotype and a host genotype. This concept could be adapted for use by many other academic research communities when exploring interactions between two or more organisms. All the source code for PHI-Canto is licensed under a free software license (specifically the GNU General Public License v3.0; see https://github.com/pombase/canto/blob/master/LICENSE). Once the PHI-Canto publication appears in 2019, we are confident that other communities will adapt this new code for their own needs.

• PHIPO and PHIDO. PHIPO provides a generic, extensible solution to describe pathogen-host interaction phenotypes. All terms continue to be logically defined in terms of external ontologies, and interoperability with other phenotype ontologies will be enabled by adopting standards defined by the Phenotype Ontologies Reconciliation Effort (which formalises phenotype ontology design across species) (https://github.com/obophenotype/upheno/wiki/Phenotype-Ontologies-Reconciliation-Effort). PHIPO is easily accessible, due to its inclusion in a catalogue of ontologies hosted by the OBO Foundry. PHIDO is an initial effort to collect together disease names found on a variety of hosts including plants, animals and humans. This data has been collected from current PHI-base data and is available on our GitHub page (https://github.com/phi-base).
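
Several of the points above quote an entry count together with its share of "total data" for PHI-base version 4.6. The text does not say which total is meant. The sketch below assumes the denominator is the 11,340 interactions reported for version 4.6; the dictionary labels are shorthand introduced here. Under that assumption, all three quoted shares are reproduced.

```python
# Assumed denominator: the interaction count reported for PHI-base 4.6.
# The text itself only says "total data", so this is an inference.
TOTAL_INTERACTIONS = 11340

# Entry counts quoted in the text (labels are shorthand for this sketch).
entry_counts = {
    "unaffected pathogenicity": 2877,  # quoted as 25% of the total data
    "effector": 1965,                  # quoted as 17% of all data
    "wheat interactions": 1350,        # quoted as 12% of total data
}

# Share of each category as a percentage of the assumed total.
shares = {
    label: 100.0 * count / TOTAL_INTERACTIONS
    for label, count in entry_counts.items()
}

for label, share in shares.items():
    print(f"{label}: {share:.1f}% of interactions")
```
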

Sectors: Aerospace, Defence and Marine; Agriculture, Food and Drink; Chemicals; Creative Economy; Environment; Healthcare; Pharmaceuticals and Medical Biotechnology; Other

URL: http://www.phytopathdb.org/
 
Description The AgroChemical industry and in particular the Fungicide Resistance Action Committee (FRAC) have become considerably more interested in this database and how it can in the future be further developed to help their R and D activities. PHI-base provides improved general information and access to all known fungicide targets. These targets are all listed in PHI-base and marked up on the pathogen genome browser in PhytoPath. If reduced fungicide efficacy is noted, the Uniprot ID can be downloaded, PCR primers designed, and these primers can be used to check problematic isolates for the presence of novel variant sequences in the coding, non-coding and promoter sequences of the predicted gene model. This database is being used in the Smart Crop Protection (SCP) strategic programme (BBS/OS/CP/000001) funded through Biotechnology and Biological Sciences Research Council's Industrial Strategy Challenge Fund. This database is being used by diagnostic companies, following the emergence of problematic pathogenic microbes. The PhytoPath and PHI-base resources are of immediate benefit to all researchers in the medical, crop plant, animal, and model organism biosciences working on diseases caused by fungi, protists and bacteria. Both resources remove bottlenecks to new discoveries caused by data sets being unavailable, non-integrated, and/or incompatible with simple queries and complex analyses. To ensure use of the data by others, priority infectious microbes have been selected and included according to the interests of industrial and academic researchers in the UK. These two inter-connected resources provide standardised annotation, more powerful comparative analyses, and greater data access through interactive interfaces and new tools. The interpretation of genome-scale molecular biology and phenotyping data is a key component in the development of novel strategies for sustainable disease control in humans, cropped plants, and farmed animals. 
This biology has considerable academic, economic, social, and ecological value. Specifically, these resources have been developed to organise genome sequences, genetic variation data, and phenotypic data, and to make such data widely accessible. The resources permit users from multiple disciplines to perform genome-wide enquiries across wide taxonomic distances, whilst at the same time linking these analyses directly to pathogenic phenotypes associated with gene mutations, that have been curated from peer-reviewed literature. This immediate access by users speeds up analysis, and allows hypotheses to be rapidly confirmed or refuted. All previous studies on the same gene can be identified immediately, which eases adding further information to a new discovery, and prevents repetition of prior successful experiments. The availability of negative experimental data sets is particularly useful when modelling the mechanisms and pathways underlying infectious processes, and the genes, proteins and metabolites involved these processes. The inclusion of negative data was requested by PHI-base users in the mid 2000s, and curation of negative data is ongoing. As of version 4.6, PHI-base provides 2877 unaffected pathogenicity entries, constituting 25% of the total data. The full data downloads available for both PhytoPath and PHI-base means that industry-based users have complete access to these datasets for commercial purposes. Emerging infectious diseases (EIDs). Academic, industry and governmental organisations are increasingly monitoring alleles in pathogen genomes that, when mutated or transferred between species (by horizontal gene transfer or conjugation), lead to enhanced disease formation. The former type (mutations) are often the negative regulators in a biological system. As of version 4.6, PHI-base has 323 single gene entries classified as increased virulence (hypervirulence) covering 514 interactions and 98 pathogenic species. 
Pathogen effectors are increasingly being used successfully by plant breeders in traditional breeding programmes to remove susceptible loci, thereby reducing overall levels of a particular disease (for example, glume blotch disease on wheat caused by the fungus Parastagonospora nodorum). Alternatively, plant breeders are now using effectoromics screening with purified effector proteins, rather than direct pathogen tests, to identify the presence of specific disease resistance (R) genes in germplasm collections, as well as in introgression, pedigree and hybrid breeding; this saves both time and money. This identification is done alongside marker-assisted selection, to ensure the retention of functional R alleles, and the absence of additional suppressor loci. As of version 4.6, PHI-base has 1965 entries termed 'effector', constituting 17% of all data. First host targets of pathogen effectors are also curated into PHI-base, and tagged in EnsemblPlants. These coupled datasets, although currently low in number at approximately 20 entries, are being used immediately by agricultural-biotechnology and plant breeding companies to develop novel disease control strategies: either through gene editing, or through introgression and hybrid breeding. It would also be possible to use this information to track the closest sequence-related alleles in taxonomically related plant species that are attacked by related pathogenic species. The potential discovery of novel anti-infective targets by users. There are 199 lethal entries in PHI-base, of which 83 are plant pathogen entries. Comparative genomic analysis is being done using the Ensembl BioMart tool to immediately infer that the same gene is likely to be lethal in a related species. Follow-up studies exploring sequence variation and protein modelling, using a range of pathogenic and non-pathogenic species, are then used by industry to determine if regions of these proteins could be targeted by specific chemistries. 
This analysis could provide pathogen control, while not affecting beneficial species. Increased use of PhytoPath and PHI-base in predictive biology. Both academic and industry based researchers are increasingly moving from exploring a single reference genome for a pathogenic species, to exploring the genomes of multiple isolates from a single species with different biological phenotypes, and going on to predict the core and variable parts of the pan-genome. Researchers want to know what has newly arrived in a population or species that previously was absent, and what has changed (for example, expanded or contracted gene families). Researchers often start by describing what gene content was found in all the older isolates (n = 100), then as new isolates with different phenotypes arise in a region (anywhere in the world), these isolates are collected and sequenced, then their genomes are compared with the reference core and variable pan-genome. What are the shifts? Is there evidence of horizontal gene transfer or the arrival of new lineages? Researchers can also explore where in the genome the new genes are found, to identify the hot spots, and these can be the focus of other genome surveys. The gene function datasets available from PHI-base are already becoming highly useful for these and other types of pan-genome analyses. PHIPO and PHIDO. PHIPO provides a generic, extensible solution to describe pathogen-host interaction phenotypes. All terms continue to be logically defined in terms of external ontologies, and interoperability with other phenotype ontologies will be enabled by adopting standards defined by the Phenotype Ontologies Reconciliation Effort (which formalises phenotype ontology design across species) (https://github.com/obophenotype/upheno/wiki/Phenotype-Ontologies-Reconciliation-Effort). PHIPO is easily accessible, due to its inclusion in a catalogue of ontologies hosted by the OBO Foundry. 
Once in the OBO Foundry, this ongoing ontology development is immediately accessible to other users in the global ontology development community.
First Year Of Impact 2015
Sector Aerospace, Defence and Marine; Agriculture, Food and Drink; Chemicals; Creative Economy; Digital/Communication/Information Technologies (including Software); Environment; Healthcare; Pharmaceuticals and Medical Biotechnology; Other
Impact Types Societal, Economic, Policy & public services

 
Description BBSRC - DTP Studentship (Walker)
Amount £108,000 (GBP)
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 10/2015 
End 09/2019
 
Description BBSRC Future Leaders Fellowship (Neil Brown)
Amount £372,256 (GBP)
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 04/2016 
End 03/2019
 
Description BBSRC Future Leaders Fellowship - Rothamsted contribution
Amount £100,000 (GBP)
Organisation Rothamsted Research 
Sector Academic/University
Country United Kingdom
Start 04/2016 
End 03/2018
 
Description EMBRAPA personal fellowship (Gilvan)
Amount £16,000 (GBP)
Organisation Brazilian Agricultural Research Corporation 
Sector Public
Country Brazil
Start 07/2015 
End 07/2016
 
Description GRDC Australia with Australian National University, Canberra
Amount £144,000 (GBP)
Organisation Grains Research and Development Corporation 
Sector Public
Country Unknown
Start 01/2016 
End 12/2016
 
Description Industry - Syngenta
Amount £588,887 (GBP)
Organisation Syngenta International AG 
Sector Public
Country Global
Start 04/2016 
End 03/2020
 
Description Innovation Centres UK (CHAP)
Amount £2,469,000 (GBP)
Organisation Department for Business, Energy & Industrial Strategy 
Sector Public
Country United Kingdom
Start 03/2016 
End 03/2018
 
Description Rothamsted Research Fellowship
Amount £300,000 (GBP)
Organisation Rothamsted Research 
Sector Academic/University
Country United Kingdom
Start 01/2014 
End 12/2016
 
Description Work Solutions - James Seager
Amount £4,160 (GBP)
Organisation Hertfordshire County Council 
Sector Public
Country United Kingdom
Start 04/2018 
End 07/2019
 
Title Ensembl Invertebrates, especially Ensembl Fungi, Ensembl Protists and Ensembl Bacteria 
Description Ensembl Invertebrates provides annotated genomes and a range of simple and advanced query tools to explore the genomes of numerous pathogenic micro-organisms. The genes curated into the Pathogen Host Interactions database (PHI-base) are directly available within the genome browsers of individual species. These data are colour coded to show the phenotypic outcome from wet biology experimentation and are linked back to the full curated data sets available within PHI-base. The PHI genes can also be searched for within the BioMart tool across multiple species and by using nine published high level phenotype terms. 
Type Of Material Database/Collection of data 
Year Produced 2013 
Provided To Others? Yes  
Impact Improved comparative genomic analysis of multiple pathogenic and non-pathogenic species. Hypothesis testing. Providing up-to-date novel functional data into poorly annotated genomes. 
URL https://fungi.ensembl.org/index.html
 
Title FgMutantDB 
Description FgMutantDb was designed as a simple spreadsheet that is accessible globally on the web that will function as a centralized source of information on F. graminearum mutants. FgMutantDb aids in the maintenance and sharing of mutants within a research community. It will serve also as a platform for disseminating prepublication results as well as negative results that often go unreported. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact Through the use of FgMutantDB, missing annotations were fed back into larger multispecies fungal genomic databases, including FungiDB, Ensembl and PHI-base. 
URL https://www.sciencedirect.com/science/article/pii/S1087184518300021
 
Title PHI-base: Pathogen-Host Interactions Database 
Description PHI-base (www.phi-base.org) is a knowledge database accessed by researchers in over 125 countries. PHI-base contains expertly curated molecular and biological information on genes proven to affect the outcome of pathogen-host interactions reported in peer reviewed research articles. Genes not affecting the disease interaction phenotype are also curated. PHI-base data is linked to the genome browsers and advanced query tools in ENSEMBL and FungiDB. The data content provided comes from >3000 manually curated references and reports information on 6438 genes from 263 pathogens tested on 194 hosts (plant, animal, others) in 11340 interactions. Direct targets of pathogen effector proteins are also included. Recently the PHI-base team, in collaboration with the PomBase team based at the University of Cambridge, has developed an online author curation tool called PHI-Canto. 
Type Of Material Database/Collection of data 
Provided To Others? Yes  
Impact Over 250 peer reviewed publications have cited the use of PHI-base, each citing one or more of the PHI-base references. 
URL http://www.phi-base.org
 
Title Pathogen-Host Interactions Database 
Description PHI-base database contains expertly curated molecular and biological information on genes proven to affect the outcome of pathogen-host interactions. Information is also given on the target sites of some anti-infective chemistries 
Type Of Material Database/Collection of data 
Provided To Others? Yes  
Impact Since 2012, PHI-base has been awarded National Capability status by the BBSRC. Over 130 peer reviewed publications have cited PHI-base use as part of their in silico analysis of pathogen-host interactions; these are all cited on the website. The database is accessed by researchers located in 91 countries. Version 4.2 of the database was launched in September 2016 and allows all the curated information to be displayed and searched from within PHI-base. 90% of the plant pathogen entries in PHI-base can also be identified / searched for in another BBSRC sponsored resource called Phytopathdb run by the EBI Cambridge. PHI-base has been invited to write an 'Expression of Interest' to join the ELIXIR project in 2016. 
URL http://www.phi-base.org
 
Description ELIXIR - Data for Life project 
Organisation ELIXIR
Department ELIXIR UK
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution Pathogen Host Interactions Database (PHI-base) is an Agrigenomics data resource provider into the ELIXIR project via the UK node of ELIXIR
Collaborator Contribution The UK Node of ELIXIR. ELIXIR-UK aims to incorporate and represent the widest possible range of UK activities in bioinformatics in ELIXIR. It has three areas of focus: (i) enhancing training capacity and capability via the ELIXIR Training Platform; (ii) playing a leading role in the ELIXIR Interoperability Platform; and (iii) providing the link between UK bioinformatics Tools and Data Resources and the wider ELIXIR ecosystem. In 2016, ELIXIR-UK added the following resources to its portfolio, as a first step towards incorporating a wider range of the UK's bioinformatics activities into ELIXIR. The tools and data resources are classified under four strategic themes: 1. Protein Structure & Function: Phyre2; CATH-Gene3D; Jalview, and the Dundee Resource for Sequence Analysis and Structure Prediction. 2. Imaging and Atlases: Biomedical Atlas Centre. 3. Human Health and Disease: IUPHAR/BPS Guide to PHARMACOLOGY. 4. Agri-science: Pathogen Host Interactions Database (PHI-base); ENSEMBL Farmed and Domesticated Animals. ELIXIR-UK will continue to add new resources to its portfolio over time. As part of this process, it has identified and road-mapped resources for future inclusion: EuPathDB, Ionomics Hub, SignaLink, BioCatalogue and Collaborative Open Plant Omics.
Impact PHI-base has been invited to apply for ELIXIR Core funding as a data resource provider. Outcome of application will be known Q2 2017.
Start Year 2016
 
Description EMBL-EBI ENSEMBL 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution Highly curated information on the genes in multiple pathogens shown to be required for the disease causing ability. Monthly curation of the peer reviewed literature, to link gene sequence information to phenotypic information for 250 pathogenic species.
Collaborator Contribution The quarterly mapping of the single gene - phenotype information onto the pathogen genomes within ENSEMBL. Also the creation of a bespoke genome portal called PhytoPath to display the genomes of hundreds of plant pathogens, within which this gene-to-phenotype information is displayed on the genome browser and can be searched for via the BioMart tool.
Impact Several joint publications, quarterly joint data releases since 2013 via ENSEMBL, PhytoPath and PHI-base. New funding since 2017 via the Smart Crop Protection (SCP) strategic programme (BBS/OS/CP/000001) funded through Biotechnology and Biological Sciences Research Council's Industrial Strategy Challenge Fund to explore the possibility of curating new data types into PHI-base relating to fungicide insensitivity, resistance linked to pathogen target information.
Start Year 2010
 
Description John Jones 
Organisation James Hutton Institute
Department Cell and Molecular Sciences
Country United Kingdom 
Sector Public 
PI Contribution The cyst nematode group at the James Hutton Institute is particularly interested in predicting and characterising putative secreted effectors and their first host targets. With this in mind, the Rothamsted team within the BBSRC funded PhytoPath project has started to curate this literature and to modify the PHI-base database schema to hold information on the first targets of effectors in plants.
Collaborator Contribution Bioinformatics analyses of the sequenced nematode genome to predict the repertoire of secreted effectors
Impact One verified cyst nematode effector, namely Gr-VAP1, has been successfully entered into the PHI-base database.
Start Year 2014
 
Description PHI-CANTO 
Organisation University of Cambridge
Country United Kingdom 
Sector Academic/University 
PI Contribution The PHI-base team at Rothamsted Research is providing knowledge on pathogen interactions with hosts, pathogen species and strain information, descriptions of scientific terms used to describe phenotypic outcomes and knowledge on experimental methodologies.
Collaborator Contribution The CANTO Team at the University of Cambridge is providing expertise in Perl scripting and gene ontologies to convert the single species curation tool CANTO into the multi-species curation tool called PHI-Canto
Impact The overall idea to co-develop the PHI-Canto author curation tool is described in one section of a recent joint publication: Urban, M., Cuzick, A., Rutherford, K., Irvine, A., Pedro, H., Pant, R., Sadanadan, V., Khamari, L., Billal, S., Mohanty, S. and Hammond-Kosack, K.E. (2017) PHI-base: A new interface and further additions for the multi-species pathogen-host interactions database. Nucleic Acids Res 45, D604-D610. PMID:27915230
Start Year 2015
 
Description PHI-base collaboration with PomBase (PHI-CANTO and PHI-PO) 
Organisation University of Cambridge
Country United Kingdom 
Sector Academic/University 
PI Contribution Since Sept 2017 the PomBase team at the University of Cambridge and the PHI-base team at Rothamsted Research have held weekly meetings (by Skype) as well as occasional face-to-face meetings to develop a new multi-species author curation tool called PHI-Canto, as well as a new pathogen-host interaction ontology called PHI-PO and a new pathogen-host disease ontology called PHI-DO. The Rothamsted team have provided the biological, wet biology experimental and literature knowledge into this collaboration.
Collaborator Contribution The PomBase team had already developed a highly successful single organism author curation tool called Canto. The PomBase team also bring a wealth of ontology development expertise into this collaborative project.
Impact Two joint posters will be given at the International Ontology Development conference to be held in Cambridge UK in April 2019. The presenting authors will be Dr Alayne Cuzick and Dr Val Wood.
Start Year 2017
 
Description PHIbase - FungiDB collaboration 
Organisation University of Liverpool
Country United Kingdom 
Sector Academic/University 
PI Contribution The PHI-base team at Rothamsted Research is providing host-pathogen information including mutant phenotypes as functional gene annotation for the pathogen genomes available in FungiDB.
Collaborator Contribution FungiDB is displaying hyperlinks to all pathogen genes available in PHI-base.
Impact FungiDB/EuPathDB is an NIH Bioinformatics Resource Center for Infectious Diseases. FungiDB focusses on integrating functional genomic databases for the kingdom Fungi and PHI-base focusses on providing manually curated mutant phenotypes to microbial pathogen genome databases. Baldwin, T.T., Basenko, E., Harb, O., Brown, N.A., Urban, M., Hammond-Kosack, K.E., and Bregitzer, P.P. (2018). Sharing mutants and experimental information prepublication using FgMutantDb (https://scabusa.org/FgMutantDb). Fungal Genet Biol. doi: 10.1016/j.fgb.2018.01.002.
Start Year 2017
 
Description PHIbase - FungiDB collaboration 
Organisation University of Pennsylvania
Country United States 
Sector Academic/University 
PI Contribution The PHI-base team at Rothamsted Research is providing host-pathogen information including mutant phenotypes as functional gene annotation for the pathogen genomes available in FungiDB.
Collaborator Contribution FungiDB is displaying hyperlinks to all pathogen genes available in PHI-base.
Impact FungiDB/EuPathDB is an NIH Bioinformatics Resource Center for Infectious Diseases. FungiDB focusses on integrating functional genomic databases for the kingdom Fungi and PHI-base focusses on providing manually curated mutant phenotypes to microbial pathogen genome databases. Baldwin, T.T., Basenko, E., Harb, O., Brown, N.A., Urban, M., Hammond-Kosack, K.E., and Bregitzer, P.P. (2018). Sharing mutants and experimental information prepublication using FgMutantDb (https://scabusa.org/FgMutantDb). Fungal Genet Biol. doi: 10.1016/j.fgb.2018.01.002.
Start Year 2017
 
Description Rothamsted - Syngenta Alliance - RoSy 
Organisation Syngenta International AG
Country Global 
Sector Public 
PI Contribution An industry:academia collaboration turning excellence in wheat science into cutting edge technology for UK and global farmers. Recognising the strength and quality of wheat research available at Rothamsted Research, Syngenta is making a multi-million pound collaborative investment into a set of projects aimed at translating our excellence in wheat science into cutting edge technology for farmers. The capacity at Rothamsted was built following years of funding from the Biotechnology and Biological Sciences Research Council (BBSRC), and Rothamsted has been developing new knowledge and tools to increase UK wheat yield potential to 20 tonnes per hectare. Currently the average farm yield of wheat in the UK is 8.4 tonnes per hectare, dropping to just 3 tonnes per hectare world-wide. Additionally, the rate of yearly increase in wheat yields has declined since 1980. Wheat provides a fifth of human calories. The BBSRC-funded research is improving our understanding of how best to maximise and protect yield potential, determine soil resource interactions and use modelling approaches to support crop improvement. This strategic alliance with Syngenta enables Rothamsted to apply its scientific knowledge and skills to develop outputs that can be used by the company to develop new solutions for farmers in the UK and beyond. Altogether, across the 10 projects funded under this alliance, we have 27 scientists (from 9 global Syngenta sites), with backgrounds in breeding and crop protection, interacting with 29 scientists from Rothamsted, from molecular biologists to modellers. For Dr Malcolm Hawkesford, 20:20 Wheat® Programme Lead, the level of knowledge exchange enabled by this alliance will lead not only to new products and better advice being developed for farmers, but also to even more relevant science being undertaken at Rothamsted. 
Specifically, the wheat pathogenomics research team at Rothamsted Research has been involved in four collaborative projects with Syngenta at three of their research sites, exploring wheat, specific fungal pathogens and/or the development of novel functional genomics tools.
Collaborator Contribution "The objectives of this alliance are totally aligned with those of the Syngenta Cereals Strategy and provide integrated solutions to help growers maximise the yields they can get from their crop in a sustainable way", said Dr James Melichar, Head Product Selection Cereals EAME - Seeds Product Development in Syngenta. "Furthermore, although most of the activities funded under the alliance are focused on wheat, the breakthroughs from projects may be applicable to each of the cereal crops that Syngenta breeds". Syngenta is contributing technical expertise, new knowledge, and specific plant genotypes and fungal pathogens for detailed analysis to the four pathology projects.
Impact Peer reviewed publications, a submitted patent and full funding for the training of a PhD student.
Start Year 2013
 
Description Attendance at International Biocuration Conference (Stanford University, CA, USA) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This conference provides a forum for curators and developers of biological databases to discuss their work, promote collaboration and foster a sense of community in this very active and growing area of research. Attendance at this meeting and presentation of a PHI-base/PhytoPath poster enabled a wide international audience to learn and ask questions about the PHI-base and PhytoPath projects. The provision of a DOI for the poster also allowed researchers to have access to the poster after the event. The conference was an excellent opportunity to make contact with biocuration experts, view posters and listen to talks on other biological databases and ontology development. Several contacts were made and follow-up emails exchanged.
Year(s) Of Engagement Activity 2017
URL https://med.stanford.edu/biocuration.html
 
Description Cereal Show 2016 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact A Fusarium Head Blight exhibit was situated at the Rothamsted Research stand (Cereals 2016). This exhibit portrayed the impact of Fusarium on wheat production and the associated risk of mycotoxin contamination. It highlighted the need for new approaches to tackle this hazardous fungal disease. During the two day event, knowledge of the approaches taken at Rothamsted, including those within the associated fellowship, was described to farmers, agronomists, the press and industry. This exhibit commonly promoted the discussion of the use of GM and non-GM mediated approaches to control fungal diseases.

This two day event is attended by > 25,000 visitors (approximately 10% from overseas) from the AgIndustries, AgriFood and Farming sectors, as well as the media, politicians and NGOs.
Year(s) Of Engagement Activity 2016
 
Description Fusarium event 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact This one day Fusarium event, hosted at Rothamsted Research (July 2016), educated the general public and interested stakeholders in the impact of Fusarium-borne diseases and the associated risk of mycotoxin contamination. It highlighted deficiencies in current approaches to prevent Fusarium-borne diseases and the need for new approaches to tackle this hazardous fungal disease. During this event, knowledge of the approaches taken at Rothamsted, including those within the associated fellowship, was described to the general public, farmers and agronomists. This exhibit commonly promoted the discussion of the use of GM and non-GM mediated approaches to control fungal diseases, and also described the background behind the use of host-induced gene silencing as a GM approach to fight fungal disease.

This event ran from 4 to 8 pm and included a wheat and soybean field tour to visit and discuss three experiments, the running of a virtual laboratory for fungal pathogen transformation and analysis, a bioinformatics display and game to analyse sequenced fungal genomes, poster and live exhibits on the effect of fungal pathogens on various crop plant species and post-harvest fruits, and a poster display and talk on using GM and non-GM approaches to fight fungal diseases.
Year(s) Of Engagement Activity 2016
 
Description PHI-base entry into ELIXIR annual newsletter - 2017 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Information about the latest developments and data to have been uploaded into the Pathogen Host Interactions database was included in the annual electronic newsletter generated by ELIXIR UK.
Year(s) Of Engagement Activity 2017
 
Description Rothamsted Research Annual Review book 2015/2016 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact An article was written for the Institute's annual report entitled 'Healthy Crops - Healthy Food'. This was used to focus attention on the 15% annual loss of crop productivity and harvested food due to plant disease problems. In the article we described the novel results that had been published on the Zymoseptoria tritici-wheat leaf interaction, via a combined transcriptome and metabolome analysis and virus induced gene silencing (VIGS). These are providing leads for the development of novel disease control options in wheat. The article also explained the continuous updating of an open access internet resource called PHI-base, which is being used increasingly to provide new insights for agricultural, biomedical, and ecological research.

Year(s) Of Engagement Activity 2016
URL http://www.PHI-base.org
 
Description STEM article targeting 11-16 year age group 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact A joint team involving the four member PHI-base team at Rothamsted Research and three members of the EBI Ensembl Invertebrates team has devised a STEM article on Big Data and Plant Health to attract more school pupils into a career in science. The six-page article plus a one-page activities sheet is entitled 'Saving Plants from Disease'. The article will be published in March 2019 in the second issue of a new outreach journal called Futurum Careers, published by Sci-Comm Consulting, UK. This company found the Abstract of the BBSRC BBR grant 'PhytoPath, an infrastructure for hundreds of plant pathogen genomes' (PI Kim Hammond-Kosack) and also consulted with two members of the BBSRC to identify a science group specifically working with big data and wet biology.
Year(s) Of Engagement Activity 2018,2019