A FAIR community resource for pathogens, hosts and their interactions to enhance global food security and human health

Lead Research Organisation: Rothamsted Research
Department Name: Protecting Crops and the Environment

Abstract

Infectious microbes continue to impose major costs on the UK farming and food industry and increasingly threaten global food security, commercial and ornamental tree health and ecosystem resilience. Similarly, due to the rise in resistance to antimicrobial compounds and increased globalisation of trade and travel, infectious microbes impose ever greater costs on public and private UK medical and veterinary providers and threaten human and animal health and wellbeing across the lifecourse. There is a substantial and diverse UK and international bioscience research community whose needs are addressed by this resource. As the biosciences become an increasingly data-intensive discipline and mega-scale data analyses become the new norm, building and maintaining community resources that ensure the Findability, Accessibility, Interoperability, and Reusability of data (i.e. are FAIR) will benefit many different bioscience disciplines.

In recent years, new possibilities for the study (and ultimately control) of pathogens have opened up through the application of high-throughput technologies for determining the molecular nature of life. These include genome sequencing - which reveals the genetic code that determines inherited properties of cells - and extend to monitoring the varied cellular contents at different stages of life and disease. This FAIR community resource is designed to capture broad molecular information from pathogenic organisms and to combine it with descriptive information about the process of infection, including more specific molecular information, e.g. about the pathogen and host proteins that interact during infection and the phenotype of the interaction outcome, and to flag which pathogen proteins are already targeted by anti-infective chemicals. The new knowledge on pathogen genomes, patterns of gene expression and potential interacting partners is housed using the Ensembl platform. Ensembl contains a comprehensive suite of software for the management and display of genome-scale data. The new phenotypic knowledge on experimentally verified genes required for the disease-causing abilities of each pathogenic species will increasingly be curated by members of the scientific community into the Pathogen Host Interactions (PHI-base) database using a newly developed tool called PHI-Canto. A new curation focus will increase the details recorded about (i) the molecular interactions between the repertoires of small effector proteins produced by pathogens and their initial targets within each host species, and (ii) the pathogen targets for anti-infective chemistries. To support the ongoing curation efforts, new generic PHIPO ontologies (controlled definitions) will be developed to accurately describe the depth and breadth of pathogen-host interactions.

Further development of the interfaces within and between Ensembl Genomes, PHI-base and other key e-science data/information providers will support the joint querying and visualisation of genomic and phenotypic data. We will also deploy new and existing tools (graphical and non-graphical) to improve inter-species comparative analysis and the integration of different large data types, in order to speed up analyses and enable new discoveries on the evolutionary origin of genes, mutations important in the process of infection, and genes/gene networks conferring host resistance, pathogen virulence or resistance to anti-infective chemicals.
We will continue to engage with the large and active UK research community in the biosciences to identify their current needs and emerging requirements through University/Institute visits, and will conduct training activities to demonstrate the potential use of the resource. We will engage with academic and industry based scientists in other countries by attending and presenting this FAIR community resource and its uses at international conferences and workshops.

Technical Summary

PHI-base is the phenotype data source provider. We will continue to curate the literature for ~200 pathogenic species and include emerging problematic species. New advanced curation will include (a) first host plant targets of pathogen effectors, (b) anti-infective targets and variant sequences causing chemical insensitivity, (c) ~8 specific genome landscape features. We will further develop the multi-species PHI-Canto tool to enable rapid, accurate and comprehensive publication-based author curation. PHI-base data is to be made available in emerging data exchange formats (e.g. Phenopackets) to increase interoperability and use. The new PHIPO ontologies to underpin this curation will be built using Protégé, adhering to the strict ontology development principles outlined by the OBO Foundry.

The PHI-phenotype information will be mapped onto microbial genes in Ensembl Genomes, an established platform combining a relational database back-end for persistent, non-redundant storage of data with web-based tools, programmatic interfaces (including RESTful APIs) and the ability to export and upload (local or remote) annotation files in standard file formats (e.g. BAM, CRAM, VCF). Genomes are overlaid with variation/transcriptome data along with whole-genome alignments and pan-species comparative relationships, allowing extrapolation of functional annotation, e.g. from well-understood pathogens to under-studied, under-funded pathogens.

To provide a bigger context, we will functionally advance the Knetminer open-source software to integrate the PHI-data and ontologies with biological pathway (BioCyc) and protein-protein interaction data (BioGrid, IntAct) from eight model organisms to elucidate the cascading processes triggered by pathogen effectors and their first targets in the host. This will allow multi-species, cross-kingdom network visualisation and analysis. We will create biannual releases of the integrated knowledge base in FAIR compliant RDF and Neo4j graph formats.

Planned Impact

This FAIR community resource is aligned with the BBSRC fundamental and strategic research priorities to achieve sustainable global food security, and improve human and animal health and wellbeing across the life course.
This resource is of immediate benefit to all researchers in the medical, crop plant, animal and model organism biosciences working on diseases caused by fungi, protists and bacteria, and will remove bottlenecks to new discoveries caused by data sets being unavailable, non-integrated and/or incompatible for simple queries/complex analyses. Priority infectious microbes have previously been selected and included according to UK industrial and academic researcher interests. This project will provide standardised annotation, more powerful comparative analyses, and greater data access through interactive interfaces and new tools.
The interpretation of genome-scale molecular biology and phenotyping data is a key component in the development of novel strategies for sustainable disease control in humans, crop plants and farmed animals, and has considerable academic, economic, social and ecological value. Specifically, this FAIR resource will organise genome sequence, genetic variation and phenotypic data and make it widely accessible through a new set of interfaces and new tools to permit genome-wide enquiries, linked to literature-curated pathogenic phenotypes associated with gene mutations.
The driving rationale for the project, as well as its greatest potential for societal impact, is in two targeted sectors. Firstly, sustainably increasing the yields of crop plants, through assisting the development of strategies for pesticide development and plant breeding. Crucially, this depends on an understanding of gene function (effectors and their targets, and other downstream biological functions dependent on these), which determine the range of possible pesticide targets, the total genetic reservoir available to plant breeders, and possible side effects (in terms of the impact on plant growth, development and overall health). This FAIR resource and the associated new tools will provide access to existing and new knowledge for numerous phytopathogenic species. The second targeted sector is human health and medical interventions to ensure healthy ageing throughout the life course. Understanding pathogen gene function, host targets and downstream biological functions will aid novel drug discoveries, track clinical efficacy and help diagnostic companies follow emerging problematic pathogenic microbes.
The main route to achieving impact will be through raising (academic and commercial) user awareness and use of the resource. Potential beneficiaries include AgCompanies developing pesticides or attempting to breed new varieties of pathogen-resistant plants and pharmaceutical companies developing new healthcare products to stop/ minimise infectious microbes in the general human populations and within hospitals. More generally, farmers and the wider global population will benefit from improved strategies for disease control, although they are not expected to be among the direct users of the database. The PIs at each organisation will engage with society, the media and policy makers to make the case for the importance of research into crop plant and medically important pathogens in the context of rising global concern about food and energy security, human health, farmed animal health, ecosystem resilience and of the potential benefits of genomics in addressing these concerns.
The five project objectives have been chosen in the light of the above observations. Collectively, the objective is to put the increasing quantities of data being generated back in the hands of researchers in as useful a form as possible, and to allow them to see the full spectrum of experimental results - from the study of an individual mutant phenotype to information about gene expression or its variance in a population - in an integrated fashion.
 
Description Overview

This narrative reports progress on developing a community resource for pathogens, hosts and their interactions. This project brings together five groups with demonstrable expertise in capturing, integrating and interrogating useful data from the literature, data that would otherwise remain dispersed. Highlighted are advances in our data capture tools and ontologies, and continuous updates to the open-access data available to the research community. The researchers involved are based at Rothamsted, the Ensembl team at EMBL-EBI (Cambridge), the University of Cambridge, and the company Molecular Connections, located in Bangalore, India. The outputs from the Rothamsted team are highlighted below.

Releases of data to the community

PHI-base makes a data release twice a year. The current release, version 4.8, was made on 16 September 2019 and contains 6,780 genes with 13,801 interactions described using nine high-level PHI-base phenotypic categories (Urban et al. (2015), Frontiers in Plant Science 6, 605, doi: 10.3389/fpls.2015.00605). This data has come from 268 pathogen species and 210 host species and has been manually biocurated from 3,454 peer-reviewed publications. Typically we curate 400-440 publications each year. The PHI-base resource was cited 59 times in peer-reviewed articles in 2019. In total, 98% of the gene-to-phenotype entries in PHI-base are also available from the genome browsers in Ensembl Fungi, Protists or Bacteria, as well as FungiDB. In 2019, PHI-base had 10,960 users in 130 countries, of which 4,639 were based in Europe and 3,183 were based in the UK.

New PHI-base gene-centric display

Molecular Connections created the current PHI-base user interface (version 4) in 2015. A new user interface is required to display future data curated by PHI-Canto, which will be far richer in content. We have decided on a two-step process to update the user interface. Firstly, to display the current 15 years' worth of PHI-base version 4 data within the new gene-centric pages; and secondly, to display the new PHI-Canto data in the same format.
A draft version of the first step is available at http://poc.molecularconnections.com/Phibase/#/home
Future public releases, from May 2020 onwards, will involve running the existing PHI-base user interface and the new gene-centric version in parallel. The gene-centric display is required to logically display phenotypes reported in multiple publications for one gene, as well as all gene synonyms and identifiers. The UniProt ID for the gene is provided on each gene page. A similar gene-centric approach is used by Pombase (pombase.org). In addition, the gene-centric display allows us to provide multiple pathogen-host interaction phenotypes reported for different tissues or hosts on a single page, together with all the assigned references.

Curation tool and ontology development

Robust curation systems are critical for interpreting, recording, sharing and analysing phenotypic observations. Our overall aim is to develop an open, generic, extensible infrastructure for the curation of pathogen-host interactions, which can be applied to any pathogen or host. We will use this infrastructure to create high-quality shareable annotations in the pathogen-host interaction curation space. Moreover, we will also enable the community to participate in the curation of their own publications, via a proven community curation system.

The PHI-Canto community curation tool - developed in collaboration with the University of Cambridge

We have developed PHI-Canto, a multi-species community curation tool for the PHI-base database (canto.phi-base.org). PHI-Canto is a web-based tool that enables professional curators and publication authors to curate peer-reviewed papers using terms from biological ontologies. PHI-Canto is an extension of the Canto (curation.pombase.org) software developed by PomBase (pombase.org), the model organism database for fission yeast (Rutherford et al., (2014) Bioinformatics 30: 1791-1792).
Canto required major development to support curation for PHI-base. Firstly, Canto was originally defined for single-species curation, and could therefore only be configured with a set of single-species identifiers. PHI-Canto adds support for UniProt identifiers, which cover gene products from many species. Using UniProt allows PHI-Canto to automatically retrieve information about the gene product required by PHI-base, including its description, sequence, cross-references to other databases, taxonomic lineage, etc. The curator is no longer required to enter any of this information manually, which reduces curation overhead and room for error.
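The retrieval step described above can be illustrated with a minimal sketch. It is not PHI-Canto's own code: it assumes the public UniProt REST service at rest.uniprot.org and the JSON field names that service is understood to use (primaryAccession, organism, proteinDescription, sequence, uniProtKBCrossReferences), which should be checked against the UniProt documentation before use. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed endpoint of the public UniProt REST service.
UNIPROT_URL = "https://rest.uniprot.org/uniprotkb/{accession}.json"


def fetch_uniprot_entry(accession: str) -> dict:
    """Download one UniProtKB entry as JSON (requires network access)."""
    with urlopen(UNIPROT_URL.format(accession=accession), timeout=30) as response:
        return json.load(response)


def summarise_entry(entry: dict) -> dict:
    """Extract the fields a curator would otherwise have to enter by hand."""
    organism = entry.get("organism", {})
    recommended = entry.get("proteinDescription", {}).get("recommendedName", {})
    return {
        "accession": entry.get("primaryAccession"),
        "description": recommended.get("fullName", {}).get("value"),
        "species": organism.get("scientificName"),
        "taxon_id": organism.get("taxonId"),
        "lineage": organism.get("lineage", []),
        "sequence": entry.get("sequence", {}).get("value"),
        # Keep cross-references as simple (database, identifier) pairs.
        "cross_references": [
            (xref.get("database"), xref.get("id"))
            for xref in entry.get("uniProtKBCrossReferences", [])
        ],
    }
```

Calling summarise_entry(fetch_uniprot_entry(accession)) with a real accession would return the description, sequence, cross-references and lineage in one step. Missing fields come back as None or empty lists rather than raising an error, so an incomplete record can still be shown to the curator.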

Canto supported phenotype annotation, but only for a single species phenotype-genotype combination. PHI-Canto extended this to allow annotations on a pathogen-host interaction genotype, which is modelled as a composition of a pathogen genotype and a host genotype (referred to as a metagenotype). Curators can curate allele variants in either the host or the pathogen, or both, to permit maximum flexibility for each interaction.

PHI-Canto phenotype annotation also allows the curation of one or more strains for each species in a curation session. Note that the term 'strain' in PHI-Canto refers to any variation below the species level (including subspecies, pathovars, etc.); this matches the historical use of 'strain' in PHI-base. Changes introduced into strains (laboratory strain) are described in the genotype, as well as any background mutations, for example Ku-70.
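
As an illustration only, the metagenotype model could be sketched with the data structures below. The class and field names are hypothetical and do not reflect PHI-Canto's internal schema; the species, strain, gene and publication values are placeholders.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Allele:
    gene_id: str        # e.g. a UniProt accession (placeholder values below)
    description: str    # e.g. "deletion" or "wild type"


@dataclass(frozen=True)
class Genotype:
    species: str
    strain: str                        # any variation below the species level
    alleles: Tuple[Allele, ...] = ()   # empty when no allele variants are curated
    background: str = ""               # background mutations, if any


@dataclass(frozen=True)
class Metagenotype:
    """A pathogen genotype composed with a host genotype."""
    pathogen: Genotype
    host: Genotype


@dataclass(frozen=True)
class PhenotypeAnnotation:
    metagenotype: Metagenotype
    phenotype_term: str
    publication: str


# Allele variants can be curated in the pathogen, the host, or both.
pathogen = Genotype(
    species="Pathogen species A",
    strain="strain 1",
    alleles=(Allele("GENE1", "deletion"),),
)
host = Genotype(species="Host species B", strain="cultivar X")

annotation = PhenotypeAnnotation(
    metagenotype=Metagenotype(pathogen=pathogen, host=host),
    phenotype_term="decreased extent of pathogen-associated host lesions",
    publication="PMID:0000000",  # placeholder identifier
)
```

Making the classes immutable means two annotations that describe the same pathogen-host pairing compare equal, which is convenient when grouping annotations by interaction.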

Canto contains built-in help pages to assist users in the curation process. Much of this documentation was only applicable to fission yeast curation; we are in the process of adapting and extending the help text to cover pathogen-host curation.

The Pathogen-Host Interaction Phenotype Ontology

We are developing the Pathogen-Host Interactions Phenotype Ontology (PHIPO): a logically defined, pre-composed phenotype ontology comprising two branches that describe either the single-species phenotypes of pathogens or hosts, or the phenotypes of multi-species pathogen-host interactions. PHIPO has been accepted into the OBO Foundry to fill this domain space. As of February 2020, PHIPO contains 1,068 terms, of which 993 have text definitions and 447 have logical definitions.
PHIPO is built using the Ontology Development Kit, which aims to standardise ontology development according to the OBO Library principles. International efforts to improve interoperability between existing phenotype ontologies - by standardising their underlying formal design patterns (logical definitions) - are ongoing; PHIPO is part of this effort, and is in the process of being integrated with the Unified Phenotype Ontology (uPheno), which aims to establish the logical correspondence between PHIPO and phenotypes in other ontologies. PHIPO is freely available under the Creative Commons Attribution 3.0 Unported license (CC BY 3.0), and is hosted both on GitHub and the OBO Foundry.

Relating PHI-base phenotypes to PHIPO phenotypes

PHI-base contains thousands of records annotated with the set of nine high-level phenotype terms (Urban et al., (2015) Frontiers in Plant Science 6, 605. doi: 10.3389/fpls.2015.00605). To enable us to continue to summarise the data using these classifiers, and to map legacy data to the new system, it was necessary to establish a correspondence between the legacy high-level terms and the new terms contained in PHIPO. Three methods were used to achieve this.

Terms describing chemical resistance and sensitivity have been directly mapped to new terms in the single species branch of PHIPO, specifically 'increased resistance to chemical' (PHIPO:0000022) and 'increased sensitivity to chemical' (PHIPO:0000021). These phenotypes contain subclasses describing resistance or sensitivity to a particular chemical entity (logically defined using the ChEBI ontology), for example 'resistance to ampicillin'.

Terms describing changes in pathogenicity, virulence, and mutualism are now captured using an annotation extension in PHI-Canto, that applies to all terms in the pathogen-host interaction phenotype branch. This allows an interaction to be annotated with a phenotypic outcome (called 'infective ability' in PHI-Canto), in addition to an observed phenotype. For example, an observed phenotype 'decreased extent of pathogen-associated host lesions' can be annotated with the extension 'reduced virulence'.

Terms describing effector phenotypes are now annotated using GO terms: first, the relevant pathogen gene is annotated with its molecular function, then the molecular function is related to the biological process 'effector-mediated modulation of host process' (GO:0140418). In cases where the molecular function is unknown, the pathogen gene is annotated with the biological process directly.

Following this new process, it is possible to capture information on the phenotypes observed when an effector is involved in a host interaction; this was not possible with the curation method previously used for PHI-base.
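
The three mapping methods above can be summarised as a small lookup, sketched here under stated assumptions. The PHIPO and GO identifiers are those quoted in the text. The legacy labels are illustrative spellings that cover only some of the nine high-level terms, and the method names and function are hypothetical.

```python
from typing import Dict

# Method 1: chemical terms map directly to single-species PHIPO terms.
CHEMICAL_TERMS: Dict[str, str] = {
    "resistance to chemical": "PHIPO:0000022",   # 'increased resistance to chemical'
    "sensitivity to chemical": "PHIPO:0000021",  # 'increased sensitivity to chemical'
}

# Method 2: pathogenicity/virulence outcomes become an annotation extension
# ('infective ability') on a pathogen-host interaction phenotype.
INFECTIVE_ABILITY_TERMS = {
    "loss of pathogenicity",
    "reduced virulence",
    "increased virulence",
    "unaffected pathogenicity",
}

# Method 3: effector phenotypes are captured with GO annotations.
EFFECTOR_TERMS = {"effector"}
EFFECTOR_PROCESS = "GO:0140418"  # 'effector-mediated modulation of host process'


def map_legacy_term(term: str) -> Dict[str, str]:
    """Return the mapping method and target for one legacy high-level term."""
    key = term.strip().lower()
    if key in CHEMICAL_TERMS:
        return {"method": "phipo_term", "target": CHEMICAL_TERMS[key]}
    if key in INFECTIVE_ABILITY_TERMS:
        return {"method": "annotation_extension", "target": key}
    if key in EFFECTOR_TERMS:
        return {"method": "go_annotation", "target": EFFECTOR_PROCESS}
    raise KeyError(f"No mapping defined for legacy term: {term!r}")
```

For example, map_legacy_term("Reduced virulence") reports that the term is carried as an annotation extension, while an unrecognised label raises an error so that unmapped legacy records are not silently dropped.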

Publications selected for curation to improve PHIPO

A significant curation effort started in autumn 2019 and is still being progressed by Rothamsted and the University of Cambridge. We have partially or fully curated 31 publications, generating 678 annotations (as of February 2020). The papers selected for this effort cover a range of biological areas of interest to pathogen-host interactions: early-acting pathogen virulence proteins, receptor decoys, R-Avr interactions, secondary metabolite clusters required for pathogen virulence, first host targets of pathogen effectors, fungal toxins, bacteria-human and fungal-human interactions, and antifungal targets.

These papers were used to seed the ontology with a basic range of terms required for curating pathogen-host phenotypes. Curating the papers allowed us to check the applicability of these ontology terms to real publications, and also allowed us to assess the suitability of PHI-Canto itself. Specifically, the initial curations have already revealed a need to extend Canto with new annotation types: for example, interspecies complementation, and protein-protein interactions between pathogen and host.

Validating strain and disease names

Due to a former lack of use of controlled vocabularies, PHI-base has numerous data quality issues, particularly with regards to strain names and disease names. Work is ongoing to clean and validate the list of strains in PHI-base, cross-referencing against external authorities where possible (including the ATCC, and model organism databases such as MGI and FlyBase). The disease name list in PHI-base has also undergone cleaning and cross-referencing against the Mondo disease ontology, with the revised list used to create a supplementary ontology called PHIDO (the Pathogen-Host Interactions Disease Ontology). These changes are planned to be applied to PHI-base in a future release.

AI and curation of the anti-infective literature

Text mining is increasingly used to extract important information from research articles. A major challenge here is to handle the heterogeneity, varied quality, and diverse identifiers of the data. Studies by Pletscher-Frankild et al. (Methods: 74, 83-89, 2015) suggest that automated data extraction of disease-gene associations from biomedical abstracts can assist and shorten the work of biocurators. For PHI-base, Molecular Connections has carried out a pilot study focusing on identifying all the publications over the past 10 years relevant to fungicides and their targets in pathogens of plants and humans. To seed the first AI study, Rothamsted created a 'gold' set corpus consisting of both relevant and contextually irrelevant documents. The positive corpus consists of 45 research articles from the MARDy database (downloaded from mardy.net on 4 October 2019) and 14 articles from PHI-base version 4.8. The negative corpus consists of 3,439 curated articles from PHI-base where no fungicide chemistry is reported. The PubMed ID, author, chemistry, organism, and gene details will be programmatically extracted from the identified articles and used to prefill a PHI-Canto curation session for the benefit of future curators.

KnetMiner updates

KnetMiner (www.knetminer.org) is a digital research assistant that has a Google-like search interface, and makes use of predictive graph algorithms and interactive features to help scientists tell the stories of complex traits and diseases in any species. KnetMiner knowledge graphs follow FAIR principles, by modelling the data using standardised ontologies (where possible), and making the graph database accessible through standardised Cypher and SPARQL endpoints.
The first aim of the KnetMiner work package (WP) is to develop an integrated pathogen-host knowledge graph for major crop, pathogen and model organism genomes, enriched with manually curated data from PHI-base and ChEBI, and with speculative information extracted from the scientific literature corpus or obtained through other means. The second aim of the WP is to customise KnetMiner to the needs of pathogen-host scientists, improve its FAIRness, and tightly integrate KnetMiner and PHI-base to enable a more advanced user experience.

To achieve these two aims, we have started with improvements to the KnetMiner API endpoints and wrapper scripts (https://github.com/josephhearnshaw/genelist-api), to demonstrate a use case for programmatic access to analyse a large gene dataset, e.g. differentially expressed genes from an RNA-Seq experiment. In December 2019, we started the development of a major new platform (KnetSpace), which is tightly coupled with KnetMiner. It will allow scientists to store and collaborate on the curation of knowledge networks. Users will be able to share their knowledge graph data with other research groups, encouraging collaboration, and thus increasing the potential outreach of PHI-base and PHI-Canto data.

PHI-base links:
1. PHI-base version 4: www.phi-base.org
2. PHI-base GitHub page: http://github.com/PHI-base
3. PHIB-BLAST: http://phi-blast.phi-base.org
4. Gene-centric PHI-base 5 (beta version): http://poc.molecularconnections.com/Phibase/#/home
5. PHI-base Wikipedia page: http://en.wikipedia.org/wiki/PHI-base
6. PHI-Canto (multi-species community annotation tool): http://canto.phi-base.org/

2021 report
Over the past twenty years, sequencing techniques have improved dramatically, enabling us to interpret the genomic makeup of any species with increasing accuracy and speed. This has paved the way for deeper biological explorations: for instance, how exactly species interact with each other at the molecular level, how these interactions influence the outcome, and what factors can change them. This grant has allowed multiple groups specialising in standardised information capture and the representation of genomic data to work together on these interactions between pathogens and their hosts (animals, plants, insects and humans). This work enables the development of ontologies (precise definitions) to describe these interactions, intuitive interfaces that allow scientists to curate their experimental findings, and software and databases that represent these findings so that they can be queried by anyone, freely, around the world to make predictions and develop testable hypotheses for the laboratory. The results of this grant can (and will) extend well beyond pathogens and hosts, into species in a variety of ecosystems (human gut, soil, water), and the applications of these efforts range from the discovery of new therapeutics to agriculture and understanding the impact of climatic fluctuations.

Data Releases and improving interoperability
PHI-base makes a data release twice a year. The current release, version 4.10, was made on 2 November 2020 and contains 7,681 genes with 15,928 interactions described using nine high-level PHI-base phenotypic categories (Urban et al. (2015), Frontiers in Plant Science 6, 605, doi: 10.3389/fpls.2015.00605). This data has come from 274 pathogen species and 216 host species and has been manually biocurated from 3,914 peer-reviewed publications. Molecular Connections curated 457 publications in 2020. Of these, 253 publications came from single-gene studies, 110 from two-gene studies and 94 from complicated studies (three or more genes involved in the study). The PHI-base resource was cited 104 times in peer-reviewed articles from 2019 to 5 March 2021.

The relevant genomes in Ensembl have been annotated with interaction data from PHI-base using an improved pipeline that matches proteins in Ensembl using both names and sequence similarity. More than 95% of the entries in PHI-base now map directly to the corresponding protein in Ensembl, ensuring a continuously high level of interoperability between these two data resources.
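
The pipeline itself is not described in detail here, so the following is only a sketch of the general idea of matching by name first and falling back to sequence similarity. It uses Python's difflib.SequenceMatcher as a simple stand-in for a proper sequence alignment tool. The function name, the 0.9 threshold and the example records are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

Protein = Tuple[str, str]  # (name, amino-acid sequence)


def match_protein(
    query: Protein,
    candidates: Dict[str, Protein],
    min_similarity: float = 0.9,  # assumed cut-off, not taken from the pipeline
) -> Optional[str]:
    """Return the identifier of the best-matching candidate, or None."""
    name, sequence = query

    # Step 1: prefer an exact (case-insensitive) name match.
    for candidate_id, (candidate_name, _) in candidates.items():
        if candidate_name.lower() == name.lower():
            return candidate_id

    # Step 2: otherwise take the most similar sequence above the cut-off.
    best_id, best_score = None, 0.0
    for candidate_id, (_, candidate_sequence) in candidates.items():
        score = SequenceMatcher(None, sequence, candidate_sequence, autojunk=False).ratio()
        if score > best_score:
            best_id, best_score = candidate_id, score
    return best_id if best_score >= min_similarity else None


# Placeholder records keyed by a hypothetical Ensembl-style identifier.
ensembl_proteins = {
    "E1": ("abc1", "MKTAYIAKQR"),
    "E2": ("xyz2", "MSSHEGGKKK"),
}
```

A production pipeline would use a dedicated alignment tool and would record which route produced each match, so that sequence-only matches can be reviewed separately.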

The microbial team within Ensembl are developing a data schema to model a test data set of microbial genes linked to first host target genes provided by Rothamsted Research. This will be accompanied by updates to their visualisation and query strategies. Initially these first host target entries will appear in Ensembl Plants.

New PHI-base gene-centric display
Molecular Connections created the current PHI-base user interface (version 4) in 2015. A new user interface is required to display the richer future PHI-Canto curation. We are implementing a two-stage process to update the user interface: firstly, to display the existing knowledge corpus (curated over the past 15 years) within the new (version 5) gene-centric pages; and secondly, to display the new, richer PHI-Canto data in the same display. The first version of the PHI-base user interface (version 5) was developed in 2020 by Molecular Connections under guidance from Rothamsted Research, and is currently under further revision and testing. We expect the new interface to be completed by November 2021. From then onwards, public releases will involve running the existing PHI-base 4 user interface and the new gene-centric PHI-base 5 version in parallel, until the user community becomes familiar with the new interface.

The PHI-Canto community annotation tool
We have further developed PHI-Canto, a multi-species community curation tool for the PHI-base database (canto.phi-base.org).
In the past year we have improved on and tested the following:
- Added support for strain synonyms in PHI-Canto: each primary name of a strain is now linked to its synonymous names, and curators can search for strains using these synonyms. This process required manual review of all unique strain names in PHI-base, plus revisions to reduce data redundancy and inaccuracy.
- Added support for curating experimental controls, through use of an extension on a 'pathogen-host interaction phenotype' annotation that relates it to a control interaction (usually wild type). The control interaction can also be annotated with phenotypes describing the 'normal' or wild type interaction outcome.
- Added a new 'gene-for-gene phenotype' annotation type to simplify curation of gene-for-gene experiments. The annotation type also links to an ontology of predefined interaction outcomes (PHIPO_EXT) for ease of use by the curator, which also supports inverse gene-for-gene interactions.
- Added a new disease annotation type, which replaces the previous process of curating the disease as an extension of every phenotype annotation. This reduces manual work for the curator since the disease only needs to be curated once for each interaction.
- Added a dedicated field for curating the Figure in the publication that relates to a particular annotation. Previously, this was captured in the annotation comments.
- Enabled the curation of wild type RNA and protein expression levels.
- Added a shortcut to quickly curate wild type alleles for any gene in the curation session.
- Formalised the curation of the delivery mechanism in an experiment as an experimental condition (as opposed to an independent data type).
- At least 81 issues closed on the PHI-Canto tracker, covering feature requests, bug fixes, and discussion.
The original single species curation tool Canto contains built-in help pages to assist users in the curation process. Much of this documentation was only applicable to fission yeast curation; over the past year we have been adapting and extending the help text to cover pathogen-host curation in PHI-Canto.

Disease name curation
Initially, information pertaining to the disease caused by a pathogen-host interaction was captured by an annotation extension. The disadvantage of this approach was that disease information had to be related to every annotation on every metagenotype, despite the relevant information (host species, pathogen species, and infected host tissue) being unique to the metagenotype, and not its annotations. This resulted in excessive manual curation of redundant information. In response to these problems, we decided to add a dedicated annotation type for disease information. The annotation captures the host and pathogen species involved (as a metagenotype) and the infected host tissue (as an 'infected tissue' annotation extension). We expect this will reduce the amount of redundant information captured. During 2020, we flagged up all disease synonyms and aligned these to the predominantly used current disease name.
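
The saving in curation effort can be illustrated with a small sketch. The class names, identifiers and example values are hypothetical and are not taken from PHI-Canto. The sketch compares how many disease entries a curator records when the disease is an extension on every phenotype annotation with how many are needed when it is a single annotation per metagenotype.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PhenotypeAnnotation:
    metagenotype_id: str   # placeholder identifier for a pathogen-host pairing
    phenotype_term: str


@dataclass(frozen=True)
class DiseaseAnnotation:
    metagenotype_id: str
    disease: str
    infected_tissue: str   # stands in for the 'infected tissue' extension


def disease_entries_as_extension(annotations: List[PhenotypeAnnotation]) -> int:
    """Earlier approach: the disease is repeated on every phenotype annotation."""
    return len(annotations)


def disease_entries_as_annotation(annotations: List[PhenotypeAnnotation]) -> int:
    """Dedicated annotation type: one disease entry per metagenotype."""
    return len({a.metagenotype_id for a in annotations})


annotations = [
    PhenotypeAnnotation("metagenotype-1", "phenotype A"),
    PhenotypeAnnotation("metagenotype-1", "phenotype B"),
    PhenotypeAnnotation("metagenotype-1", "phenotype C"),
    PhenotypeAnnotation("metagenotype-2", "phenotype A"),
    PhenotypeAnnotation("metagenotype-2", "phenotype D"),
]

disease = DiseaseAnnotation("metagenotype-1", "example disease", "leaf")

print(disease_entries_as_extension(annotations))   # 5
print(disease_entries_as_annotation(annotations))  # 2
```

In this toy session the curator records the disease twice instead of five times, and the gap widens as more phenotypes are annotated on each interaction.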

The Pathogen-Host Interaction Phenotype Ontology
We have continued to develop the Pathogen-Host Interactions Phenotype Ontology (PHIPO) which is maintained on the OBO Foundry. PHIPO provides terms with logical definitions that are already composed from other ontology terms (pre-composition). As of February 2021, PHIPO contains 1,109 terms, of which 1,035 have text definitions, and 438 have logical definitions. PHIPO is freely available under the Creative Commons Attribution 3.0 Unported license (CC BY 3.0), and is hosted both on GitHub and the OBO Foundry.

Relating PHI-base phenotypes to PHIPO phenotypes
PHI-base contains thousands of records annotated with the set of nine high-level phenotype terms (Urban et al. (2015) Frontiers in Plant Science 6, 605. doi:10.3389/fpls.2015.00605). To enable us to continue to summarise the data using these classifiers, and to map legacy data to the new system, it was necessary to establish a correspondence between the legacy high-level terms and the new terms contained in PHIPO. Three methods were used to achieve this.
Terms describing chemical resistance and sensitivity have been directly mapped to new terms in the single species branch of PHIPO, specifically 'increased resistance to chemical' (PHIPO:0000022) and 'increased sensitivity to chemical' (PHIPO:0000021). These phenotypes contain subclasses describing resistance or sensitivity to a particular chemical entity (logically defined using the ChEBI ontology), for example 'resistance to ampicillin'.
Terms describing changes in pathogenicity, virulence, and mutualism are now captured using an annotation extension in PHI-Canto that applies to all terms in the 'pathogen-host interaction phenotype' branch. This allows an interaction to be annotated with a phenotypic outcome (called 'infective ability' in PHI-Canto), in addition to an observed phenotype. For example, an observed phenotype 'decreased extent of pathogen-associated host lesions' can be annotated with the extension 'reduced virulence'.
Terms describing effector phenotypes are now annotated using GO terms: first, the relevant pathogen gene is annotated with its molecular function, then the molecular function is related to the biological process 'effector-mediated modulation of host process by symbiont' (GO:0140418). In cases where the molecular function is unknown, the pathogen gene is annotated with the biological process directly.
Following this new process, it is possible to capture information on the phenotypes observed when an effector is involved in a host interaction; this was not possible with the curation method previously used for PHI-base.
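The three mapping methods described above can be sketched as a simple lookup. This is a minimal illustration only: the PHIPO and GO identifiers are those quoted in the text, but the legacy label spellings, the `Mapping` class and the `map_legacy_term` function are hypothetical and do not reproduce the actual PHI-base migration code.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Identifiers quoted in the text above.
PHIPO_INCREASED_RESISTANCE = "PHIPO:0000022"
PHIPO_INCREASED_SENSITIVITY = "PHIPO:0000021"
GO_EFFECTOR_PROCESS = "GO:0140418"


@dataclass(frozen=True)
class Mapping:
    method: str  # which of the three methods applies
    target: str  # ontology ID or extension value


# Legacy labels here are assumed spellings, used only for illustration.
LEGACY_TO_NEW: Dict[str, Mapping] = {
    # Method 1: direct mapping to a single species PHIPO term.
    "resistance to chemical": Mapping("direct PHIPO term", PHIPO_INCREASED_RESISTANCE),
    "sensitivity to chemical": Mapping("direct PHIPO term", PHIPO_INCREASED_SENSITIVITY),
    # Method 2: captured as an 'infective ability' annotation extension.
    "reduced virulence": Mapping("annotation extension", "reduced virulence"),
    # Method 3: captured with a GO biological process annotation.
    "effector": Mapping("GO annotation", GO_EFFECTOR_PROCESS),
}


def map_legacy_term(label: str) -> Optional[Mapping]:
    """Return the mapping for a legacy high-level term, or None if unmapped."""
    return LEGACY_TO_NEW.get(label.strip().lower())


print(map_legacy_term("Resistance to chemical"))
print(map_legacy_term("effector"))
```

Returning `None` for unknown labels keeps unmapped legacy records visible for manual review rather than silently assigning them a term.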

Publications selected for curation
A significant curation effort is now in progress by Rothamsted Research and the University of Cambridge, and we have partially or fully curated 34 publications, generating 846 annotations (as of January 2021). The papers selected for this effort cover a range of biological areas of interest to pathogen-host interactions: early-acting pathogen virulence proteins, receptor decoys, R-Avr interactions, secondary metabolite clusters required for pathogen virulence, first host targets of pathogen effectors, fungal toxins, bacteria-human and fungal-human interactions, and antifungal targets.

KnetMiner updates
Meetings have been held between the KnetMiner and PHI-base teams to ascertain how to model the PHI-base data and PHIPO ontologies as an RDF and Linked Property Graph. A prototype parser for PHI-Canto data has been developed to model and integrate the data into KnetMiner. We have started developing fungal knowledge graphs for key species including Fusarium culmorum, Fusarium graminearum and Zymoseptoria tritici.

We have started improving the KnetMiner API endpoints and wrapper scripts (https://github.com/josephhearnshaw/genelist-api), to demonstrate a use case for programmatic access to analyse a large gene dataset, e.g. differentially expressed genes from an RNA-Seq experiment.
We started the development of a major new platform (KnetSpace), which is tightly coupled with KnetMiner. KnetSpace will allow scientists to store and collaborate on the curation of knowledge networks. Users will be able to share their knowledge graph data with other research groups, encouraging collaboration, and thus increasing the potential outreach of PHI-base and PHI-Canto data. KnetSpace follows the FAIR principles by making user networks easily findable and accessible, and by enabling networks to be reused and to interoperate with other web applications (such as PHI-base) via APIs. We aim to make KnetSpace beneficial to PHI-base and PHI-Canto data curators by facilitating their curation effort through simple access to a rich set of auto-generated annotations and links to relevant literature.


PHI-base data releases: The latest release of PHI-base, version 4.12, was on 2 September 2021 (http://www.phi-base.org/releaseNote.htm). Compared to version 4.7, released on 27 May 2019 (i.e. before the start of this funding), the data increased as follows: genes from 6304 to 8411 (33%), interactions from 12,467 to 18,190 (46%), pathogenic species from 266 to 279 (5%), host species from 199 to 228 (15%), diseases from 490 to 533 (9%), and references (publications) from 3216 to 4387 (36%). PHI-base phenotyping data has been freely available at FungiDB since 2019. UniProtKB has linked to PHI-base records since 2020.
PHI-base gene-centric display: PHI-base, in collaboration with Molecular Connections, developed a new version of the PHI-base website (available at www.phi5.phi-base.org) that aggregates all data in PHI-base by its related gene, i.e. one page per gene per species. Each gene is assigned a stable and unique identifier that is cross-referenced with identifiers from other genetic databases. Currently, the new interface only displays data curated with PHI-Canto; the remaining data in PHI-base will be migrated in Q1 2022. We extended the search interface of the new website to allow querying the new data types added by PHI-Canto curation.
PHI-Canto: PHI-base, in collaboration with PomBase, developed PHI-Canto: a free and open source online biocuration tool that allows the research community to curate and annotate pathogen-host literature with terms from biological ontologies. PHI-Canto is derived from Canto, the community annotation tool developed by PomBase. We extended Canto with the following features: support for any gene with an accession in UniProtKB; the ability to curate and annotate pathogen genotypes, host genotypes, and pathogen-host interactions involving mutant pathogens and mutant or wild-type hosts; support for annotating tissue type, disease caused, relation to wild-type controls, and outcomes of gene-for-gene interactions; and the ability to specify one or more strains for any species, either from a controlled list (which includes strain synonyms) or as free text. We documented these features in an online manual included with the software. So far, 36 publications have been fully or partially curated in PHI-Canto by professional curators.
PHI-base ontology development: We developed the Pathogen-Host Interaction Phenotype Ontology (PHIPO) to enable annotation of the outcomes of pathogen-host interactions across multiple species, and single species phenotypes for pathogens and hosts. We integrated PHIPO with the OBO Foundry and the Unified Phenotype Ontology (uPheno): the latter aligns our ontology semantics with other phenotype ontologies. As of 14 July 2021, PHIPO contains 920 terms, of which 536 (58%) have logical equivalence and interoperability with other ontologies, following a standard format defined by uPheno. We also developed supplementary ontologies to enable annotation of experimental conditions and diseases most relevant to pathogen-host experiments.
Identification of relevant anti-infective literature: Molecular Connections developed a bespoke text mining approach written in Java to identify candidate peer reviewed publications containing information on 150 commercial and/or experimental antifungal and antimicrobial chemistries. This literature (approx. 3000 articles from 1975 to 2020) is now available for further triaging and community curation using PHIPO ontology terms in PHI-Canto.
KnetMiner development: KnetMiner released version 5.0 of the KnetMiner software, developed a new knowledge graph for Fusarium culmorum and updated existing knowledge graphs for Fusarium graminearum and Zymoseptoria tritici. The knowledge graphs now integrate protein-phenotype relations from PHI-base. In addition, a COVID-19 KnetMiner knowledge graph was developed with endpoints for data access (https://f1000research.com/articles/10-703). KnetMiner was promoted to a full ELIXIR-UK service.
Ensembl Genomes developments: We have made 3-4 Ensembl microbial releases every year and now host 1505 fungal, 237 protist and 31,332 prokaryotic genomes. Our fungal genome coverage has increased significantly (+491 genomes) due to a fresh import of data from the public archives and 15 genomes originating from VEuPathDB's fungal database, FungiDB. Our bacterial resources have adopted UniProt's redundancy definitions, resulting in the removal of over 12,000 genomes whilst maintaining coverage across 527 bacterial families. Where appropriate, genomes have been annotated with interaction phenotypic data from PHI-base using an improved pipeline that matches proteins in Ensembl by both name and sequence similarity, and we have used the Ontology Lookup Service to facilitate standardised queries and vocabulary for our data. In support of objective 4, we have developed a new, extensible data schema to capture multiple types of interactors (genes, proteins, synthetic molecules) and create links across species. This will enable researchers to browse a fungal effector gene in Ensembl Fungi and navigate to its first host target in Ensembl Plants (and vice versa). A new Python pipeline has been written to verify and capture curated protein-protein interaction data from PHI-base into this model. Fungal and protist protein-coding genes have continued to be used to infer evolutionary trees by determining orthology and paralogy, and several species have pairwise whole genome alignments, allowing information from well-studied pathogens to help elucidate mechanisms and processes occurring in novel or understudied pathogens. Key bacterial species have continued to be embedded in Ensembl's pan-taxonomic compara.
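To make the idea of an extensible, cross-species interactor schema more concrete, the following is a minimal sketch of how such a model might look. It is an assumption-laden illustration, not the Ensembl schema: the class names, field names and placeholder identifiers are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InteractorType(Enum):
    # The three interactor types named in the text.
    GENE = "gene"
    PROTEIN = "protein"
    SYNTHETIC_MOLECULE = "synthetic molecule"


@dataclass(frozen=True)
class Interactor:
    identifier: str             # placeholder ID, not a real accession
    kind: InteractorType
    species: Optional[str]      # None for synthetic molecules


@dataclass(frozen=True)
class Interaction:
    source: Interactor
    target: Interactor
    evidence: str               # e.g. a reference to the curated record

    def is_cross_species(self) -> bool:
        # True when both interactors belong to different species.
        return (
            self.source.species is not None
            and self.target.species is not None
            and self.source.species != self.target.species
        )


effector = Interactor("EFFECTOR_1", InteractorType.PROTEIN, "Zymoseptoria tritici")
host_target = Interactor("HOST_TARGET_1", InteractorType.PROTEIN, "Triticum aestivum")
chemical = Interactor("MOLECULE_1", InteractorType.SYNTHETIC_MOLECULE, None)

link = Interaction(effector, host_target, evidence="placeholder PHI-base record")
chem_link = Interaction(chemical, effector, evidence="placeholder PHI-base record")
print(link.is_cross_species(), chem_link.is_cross_species())
```

Keeping the interactor type as an enumerated field, rather than a separate table per type, is one way to let new interactor kinds be added without changing the shape of an interaction record.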
UK and international resource usage trends: Unique visitors to Ensembl microbial portals, PHI-base and KnetMiner continue to increase. For 2021, the total usage figures were: Ensembl microbial portals > 97,000 (protists 12%, fungi 35% and bacteria 53%), PHI-base > 22,000 and > 900 full database downloads, KnetMiner > 7,000 unique website users and > 11,000 API users. Total users > 137,000.
Project publications: We now have a total of six peer reviewed publications: one on community curation (Pedro, 2019), one on using PHI-base data in multi-pathogen species network analyses (Janowska-Sejda, 2019), two PHI-base database updates (Urban 2020, 2022) and two Ensembl Genomes non-vertebrate updates (Howe, 2020; Yates, 2022). In particular, the two 2020 NAR articles are being well cited: Howe (265 citations, Google Scholar (GS); 167 citations, Web of Science (WoS)) and Urban (88 citations, GS; 91 citations, WoS).

Janowska-Sejda et al. (2019) Front. Microbiol. 10, 2721; Pedro et al. (2019) Front. Microbiol. 10, 2477; Howe et al. (2020) Nucleic Acids Research 48, D689-D695; Urban et al. (2020) Nucleic Acids Research 48, D613-D620; Urban et al. (2021) "PHI-base in 2022: a multi-species phenotype database for Pathogen-Host Interactions." Nucleic Acids Research database issue, doi: 10.1093/nar/gkab1037; Yates et al. "Ensembl Genomes 2022: an expanding genome resource for non-vertebrates." Nucleic Acids Research database issue, doi: 10.1093/nar/gkab1007.

2023 entry

PHI-base data releases: The latest release of PHI-base, version 4.14, was on 1 November 2022 (http://www.phi-base.org/releaseNote.htm). Compared to version 4.7, released on 27 May 2019 (i.e. before the start of this funding), the data increased as follows: genes from 6304 to 8993 (43%), interactions from 12,467 to 19,881 (59%), pathogenic species from 266 to 283 (6%), host species from 199 to 234 (18%), diseases from 490 to 542 (11%), and references (publications) from 3216 to 4847 (51%). PHI-base phenotyping data has been freely available at FungiDB since 2019. UniProtKB has linked to PHI-base records since 2020.
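The percentages quoted for version 4.14 correspond to the increase relative to the version 4.7 baseline, rounded to the nearest whole number. The short sketch below reproduces that arithmetic from the counts given above; the dictionary and function names are illustrative only.

```python
# Counts for PHI-base 4.7 and 4.14, as quoted in the text above.
V4_7 = {
    "genes": 6304,
    "interactions": 12467,
    "pathogenic species": 266,
    "host species": 199,
    "diseases": 490,
    "references": 3216,
}
V4_14 = {
    "genes": 8993,
    "interactions": 19881,
    "pathogenic species": 283,
    "host species": 234,
    "diseases": 542,
    "references": 4847,
}


def percent_increase(old: int, new: int) -> int:
    """Increase from old to new as a whole-number percentage of old."""
    return round(100 * (new - old) / old)


increases = {key: percent_increase(V4_7[key], V4_14[key]) for key in V4_7}
for key, value in increases.items():
    print(f"{key}: {V4_7[key]} -> {V4_14[key]} ({value}%)")
```

For example, genes grew by 8993 - 6304 = 2689 entries, and 2689 / 6304 is approximately 0.427, which rounds to 43%.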
PHI-base gene-centric display: PHI-base, in collaboration with Molecular Connections, developed a new version of the PHI-base website (available at www.phi5.phi-base.org) that aggregates all data in PHI-base by its related gene, i.e. one page per gene per species. Each gene is assigned a stable and unique identifier that is cross-referenced with identifiers from other genetic databases. Currently, the new interface only displays data curated with PHI-Canto; data from previous versions of PHI-base are still to be migrated (discussed below). We extended the search interface of the new website to allow querying the new data types added by PHI-Canto curation.
PHI-base 4 data migration: We are developing a unified data import format for the new version of the PHI-base database that can merge data exported from PHI-Canto with data from a previous version of PHI-base (version 4). Of the 18,984 records in PHI-base 4.13, 15,483 records (82%) were found to be compatible with this new import format. So far, 16,556 phenotype annotations have been extracted from these records using an automated data pipeline. A small subset of this data has been confirmed to successfully load into the new PHI-base database and can be displayed on the new PHI-base website. Still to be migrated are annotations for disease names, Gene Ontology terms, and protein-protein interactions.
PHI-base data cleaning: Starting with PHI-base version 4.9, the list of pathogen and host strains in PHI-base has been manually reviewed to standardise nomenclature, remove redundancy, and correct curation errors. From version 4.9 to version 4.14, 965 pathogen strains and 1,500 host strains were amended. Host strains use the nomenclature of model organism databases wherever possible (e.g. MGI, WormBase, and TAIR). Cross-references were added to Expasy Cellosaurus and the Brenda Tissue Ontology for host cell lines. Disease names in PHI-base were also manually reviewed, starting with 568 names in version 4.10, of which 506 were amended, up to a total of 633 disease names amended as of version 4.14.
PHI-Canto: PHI-base, in collaboration with PomBase, developed PHI-Canto: a free and open source online biocuration tool that allows the research community to curate and annotate pathogen-host literature with terms from biological ontologies. PHI-Canto is derived from Canto, the community annotation tool developed by PomBase. We extended Canto with the following features: support for any gene with an accession in UniProtKB; the ability to curate and annotate pathogen genotypes, host genotypes, and pathogen-host interactions involving mutant pathogens and mutant or wild-type hosts; support for annotating tissue type, disease caused, relation to wild-type controls, and outcomes of gene-for-gene interactions; and the ability to specify one or more strains for any species, either from a controlled list (which includes strain synonyms) or as free text. We documented these features in an online manual included with the software. As of 7 March 2023, 36 publications have been fully curated and approved in PHI-Canto by professional curators. Eight subtitled videos have been made covering the different aspects of the PHI-Canto curation process, and these are accompanied by three subtitled PHI-base videos explaining how to search the database and retrieve information for inclusion in other types of analyses. All 11 videos are available on YouTube.
PHI-base ontology development: We developed the Pathogen-Host Interaction Phenotype Ontology (PHIPO) to enable annotation of the outcomes of pathogen-host interactions across multiple species, and single species phenotypes for pathogens and hosts. We integrated PHIPO with the OBO Foundry and the Unified Phenotype Ontology (uPheno): the latter aligns our ontology semantics with other phenotype ontologies. As of 7 March 2023, PHIPO contains 952 terms, of which 532 (56%) have logical equivalence and interoperability with other ontologies, following a standard format defined by uPheno. We also developed supplementary ontologies to enable annotation of experimental conditions and diseases most relevant to pathogen-host experiments.
Identification of relevant anti-infective literature: Molecular Connections developed a bespoke text mining approach written in Java to identify candidate peer reviewed publications containing information on 150 commercial and/or experimental antifungal and antimicrobial chemistries. This literature (approx. 3000 articles from 1975 to 2020) is now available for further triaging and community curation using PHIPO ontology terms in PHI-Canto.
KnetMiner development: KnetMiner released version 5.0 of the KnetMiner software and developed a new combined knowledge graph for nine ascomycete fungi, which are either pathogenic species of global agricultural or medical importance or key non-pathogenic model species: Aspergillus fumigatus, Aspergillus nidulans, Candida albicans, Fusarium culmorum, Fusarium graminearum, Magnaporthe oryzae, Neurospora crassa, Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Zymoseptoria tritici. This knowledge graph also now integrates protein-phenotype relations from PHI-base. In addition, a COVID-19 KnetMiner knowledge graph was developed with endpoints for data access (https://f1000research.com/articles/10-703). KnetMiner was promoted to a full ELIXIR-UK service.
Ensembl Genomes developments: We have made 3-4 Ensembl microbial releases every year and now host 1505 fungal, 237 protist and 31,332 prokaryotic genomes. Our fungal genome coverage has increased significantly (+491 genomes) due to a fresh import of data from the public archives and 15 genomes originating from VEuPathDB's fungal database, FungiDB. Our bacterial resources have adopted UniProt's redundancy definitions, resulting in the removal of over 12,000 genomes whilst maintaining coverage across 527 bacterial families. Where appropriate, genomes have been annotated with interaction phenotypic data from PHI-base using an improved pipeline that matches proteins in Ensembl by both name and sequence similarity, and we have used the Ontology Lookup Service to facilitate standardised queries and vocabulary for our data. In support of objective 4, we have developed a new, extensible data schema to capture multiple types of interactors (genes, proteins, synthetic molecules) and create links across species. This will enable researchers to browse a fungal effector gene in Ensembl Fungi and navigate to its first host target in Ensembl Plants (and vice versa). A new Python pipeline has been written to verify and capture curated protein-protein interaction data from PHI-base into this model. Fungal and protist protein-coding genes have continued to be used to infer evolutionary trees by determining orthology and paralogy, and several species have pairwise whole genome alignments, allowing information from well-studied pathogens to help elucidate mechanisms and processes occurring in novel or understudied pathogens. Key bacterial species have continued to be embedded in Ensembl's pan-taxonomic compara.
UK and international resource usage trends: Unique visitors to Ensembl microbial portals, PHI-base and KnetMiner continue to increase. For 2022, the total usage figures were: Ensembl microbial portals > 87,900 (47.45% Bacteria, 38.15% Fungi, 14.39% Protists), PHI-base > 25,800 and > 1007 full database downloads, KnetMiner > 6,000 unique users and > 400 registered API users. Total users > 119,700.
Project publications: We now have a total of six peer reviewed publications: one on community curation (Pedro, 2019), one on using PHI-base data in multi-pathogen species network analyses (Janowska-Sejda, 2019), two PHI-base database updates (Urban 2020, 2022) and two Ensembl Genomes non-vertebrate updates (Howe, 2020; Yates, 2022). In particular, the two 2020 NAR articles are being well cited: Howe (265 citations, Google Scholar (GS); 167 citations, Web of Science (WoS)) and Urban (88 citations, GS; 91 citations, WoS). A seventh manuscript, on the PHI-Canto author curation tool and the Pathogen-Host Interaction Phenotype Ontology (PHIPO), has been placed on bioRxiv whilst under review at eLife (Cuzick et al., 2022): https://biorxiv.org/cgi/content/short/2022.12.15.520601v1.

Janowska-Sejda et al. (2019) Front. Microbiol. 10, 2721; Pedro et al. (2019) Front. Microbiol. 10, 2477; Howe et al. (2020) Nucleic Acids Research 48, D689-D695; Urban et al. (2020) Nucleic Acids Research 48, D613-D620; Urban et al. (2021) "PHI-base in 2022: a multi-species phenotype database for Pathogen-Host Interactions." Nucleic Acids Research database issue, doi: 10.1093/nar/gkab1037; Yates et al. "Ensembl Genomes 2022: an expanding genome resource for non-vertebrates." Nucleic Acids Research database issue, doi: 10.1093/nar/gkab1007.

A nine month no cost extension was awarded to this project to cover the period 1st July 2022 - 31st March 2023 to catch up on several planned activities that were either delayed or suspended due to Covid-19. The project was further extended to 30th June 2023. We are reporting here on the period 1st April 2023 to 30th June 2023.

PHI-base data releases: The latest release of PHI-base, version 4.16, was on 30 November 2023 (http://www.phi-base.org/releaseNote.htm). Compared to version 4.15, released on 2 May 2023, the increase in data is: 289 genes, 726 interactions, 217 publications, 9 new pathogens, 8 new hosts, and 7 diseases. The current total amount of data in PHI-base is: 8,993 genes, 19,881 interactions, 283 pathogens, 234 hosts, 542 diseases, and 4,847 publications. We continue to freely share phenotyping data with Ensembl (since 2011), FungiDB (since 2019) and UniProtKB (since 2020).
PHI-base data cleaning: Data cleaning for version 4.16 of PHI-base led to updates to 257 pathogen strain names (out of 2,032), 489 host strain names (out of 1,672), and 89 disease names (out of 581). Data values were manually reviewed to standardise nomenclature, remove redundancy, and correct curation errors. Host strains were updated to use the nomenclature of model organism databases wherever possible (e.g. MGI, WormBase, and TAIR), and cross-references were added to Expasy Cellosaurus and the Brenda Tissue Ontology for host cell lines. The running total of updated names is: pathogen strains: 1,452; host strains: 2,429; diseases: 844.
Curation: Six months of additional pathogen-host interaction literature (May 2023 - Oct 2023) has been curated by professional curators at Molecular Connections using the spreadsheet curation workflow, and is currently under review by the PHI-base team for the upcoming release of PHI-base 4.17 (due May 2024). Curation by Molecular Connections continues at a rate of 20 articles per month (starting from November 2023), with a focus on plant pathogens. Professional curators in the PHI-base team have curated and approved 56 publications using the PHI-Canto curation workflow, bringing the total to 83 approved publications, with curation of 47 publications still in progress.
PHI-base 5: development work on a beta version of the PHI-base website during this period included a) bug fixes and improvements to the text and layout of the website; b) enabling secure connections to the website and ensuring the PHI-base 5 system is compliant with security requirements at Rothamsted Research; c) improving the speed at which the system loads data; and d) creating a full download format for the database.
PHI-base 4 data migration: the following improvements were made to the data pipeline that migrates data from version 4 to version 5 of the PHI-base database: a) adding support for migrating 'in vitro growth' annotations from PHI-base 4; b) adding automated tests to assert the correctness of the pipeline's functionality; and c) improving a command-line interface to the pipeline to make it easier to use for technical users.
PHI-Canto: we made the following improvements to the PHI-Canto curation tool: a) added 46 pathogen strains and 6 host strains to the strain list that is available for curators to choose from, and b) updated the JSON schema file that describes the JSON export format produced by PHI-Canto, both to fix errors and describe new properties, such as ORCID IDs for curators.
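As an illustration of the kind of check that a curator identifier in an export might undergo, the sketch below validates the format and check digit of an ORCID iD using the ISO 7064 MOD 11-2 algorithm that ORCID documents for its identifiers. The JSON key names (`curator`, `name`, `orcid`) are hypothetical and are not taken from the PHI-Canto JSON schema; the iD shown is ORCID's published sample identifier.

```python
import json
import re

# Four groups of four characters; the last character may be 'X'.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_digit(first_15_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for an ORCID iD."""
    total = 0
    for ch in first_15_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check both the layout and the check digit of an ORCID iD."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


# Hypothetical fragment of an exported curation session.
record = json.loads(
    '{"curator": {"name": "Example Curator", "orcid": "0000-0002-1825-0097"}}'
)
print(is_valid_orcid(record["curator"]["orcid"]))
```

A check of this kind complements a JSON schema, which can constrain the layout of the identifier with a pattern but cannot verify its check digit.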
Ontology development: we published 6 releases of the Pathogen-Host Interaction Phenotype Ontology (PHIPO) to Zenodo (most recently https://doi.org/10.5281/zenodo.10600650), and added 120 terms to the ontology, for a total of 1092 terms. The number of PHIPO ontology terms with a machine-readable logical definition remains unchanged at 532 terms. We also added 36 terms to the PHI-base Experimental Conditions Ontology (PHI-ECO; https://github.com/PHI-base/phi-eco), for a total of 304 terms created by PHI-base (additional terms were previously created by PomBase).
PHI-Canto fungicide resistance literature curation project: a project to curate fungicide resistance literature using PHI-Canto is underway, with a new methodology developed to select suitable publications and curate them. A novel annotation extension (AE) called 'alteration in archetype' has been developed in collaboration with Nichola Hawkins at NIAB. The purpose of this AE is to record the archetype species for a chemistry target site and the equivalent position of a residue alteration between the experimental species and the archetype species. To date (5 March 2024), 46 out of 82 curated publication sessions have been checked and approved and, of these 46, 29 contain the novel AE 'alteration in archetype'.
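The notion of an "equivalent position" between an experimental species and an archetype species can be illustrated with a pairwise alignment. The sketch below is a toy example under stated assumptions: the function name and the two short aligned sequences are invented, and the code does not represent how curators or PHI-Canto determine archetype positions in practice.

```python
from typing import Optional


def map_to_archetype_position(
    aligned_experimental: str, aligned_archetype: str, experimental_pos: int
) -> Optional[int]:
    """Map a 1-based residue position in the experimental sequence to the
    equivalent 1-based position in the archetype sequence.

    Both inputs are rows of the same pairwise alignment, with '-' for gaps.
    Returns None when the residue is aligned to a gap in the archetype.
    """
    if len(aligned_experimental) != len(aligned_archetype):
        raise ValueError("aligned sequences must have equal length")

    exp_count = 0   # residues seen so far in the experimental sequence
    arch_count = 0  # residues seen so far in the archetype sequence
    for exp_char, arch_char in zip(aligned_experimental, aligned_archetype):
        if exp_char != "-":
            exp_count += 1
        if arch_char != "-":
            arch_count += 1
        if exp_char != "-" and exp_count == experimental_pos:
            return arch_count if arch_char != "-" else None
    raise ValueError("position is outside the experimental sequence")


# Invented toy alignment: the archetype has one extra residue (L) and
# lacks the residue aligned to Y.
experimental = "MK-TAYG"
archetype = "MKLTA-G"

print(map_to_archetype_position(experimental, archetype, 3))
print(map_to_archetype_position(experimental, archetype, 5))
```

In this toy alignment, residue 3 of the experimental sequence corresponds to residue 4 of the archetype, while residue 5 has no archetype equivalent because it is aligned to a gap.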
Tutorial videos: we uploaded 5 more tutorials to YouTube on the PHI-Canto curation process, for a total of 13 tutorials. The new tutorials covered an introduction to PHI-base, benefits of contributing curation, how to choose a publication to curate, and instructions on curating inverse and normal gene-for-gene interactions. Subtitles were added and manually reviewed for all videos.
PHI-Canto: Introduction to PHI-base
https://youtu.be/5GlKRM9qshk

PHI-Canto: Choosing a suitable publication
https://youtu.be/J6z4MlU8Bkc

PHI-Canto: Why should I contribute?
https://youtu.be/QQGZBnUYZcA

PHI-Canto: Annotating a gene-for-gene interaction
https://youtu.be/_og3umfy4WU

PHI-Canto: Annotating an inverse gene-for-gene interaction
https://youtu.be/KKhomX11TMs

Project publications: ... Within the current reporting period we have a peer-reviewed publication describing our new community curation tool PHI-Canto (Cuzick et al., 2023) and an associated eLife Digest (https://elifesciences.org/digests/84658/all-in-one-place).
Cuzick A, Seager J, Wood V, Urban M, Rutherford K, Hammond-Kosack KE. A framework for community curation of interspecies interactions literature. eLife. 2023 Jul 4;12:e84658. doi: 10.7554/eLife.84658. PMID: 37401199; PMCID: PMC10319440.
KnetMiner
The main KnetMiner 2021 publication has been cited 64 times (March 2024). KnetMiner had over 5,500 unique guest users and a total of 450 registered users in 2023. A major achievement has been the standardization of our data pipeline and the extension to fungal species. We have developed a framework to build multi-species knowledge graphs for genomes available in Ensembl Compara. The backend of KnetMiner is in the process of being redeveloped using Neo4j to improve scalability, interoperability and reusability of our knowledge graphs.
UK and international resource usage trends: PHI-base > 25,800 and > 1007 full database downloads, > 6,000 unique users and > 400 registered API users. Total users > 119,700 over the period.
Exploitation Route In October 2019, PHI-base successfully applied to remain a gold standard provider of Agrigenomics data within the UK node of the ELIXIR 'Data for Life' project.

In 2020, KnetMiner successfully applied to be a provider of Agrigenomics and genomics data within the UK node of the ELIXIR 'Data for Life' project.

In 2021, 2022 and 2023 we engaged with various scientific journals to determine whether they would become early adopters of the PHI-Canto author curation tool. We are now working with the journal New Phytologist to capture at source all the information from newly published articles that are within the PHI-base scope.
Sectors Agriculture

Food and Drink

Chemicals

Healthcare

Pharmaceuticals and Medical Biotechnology

URL http://www.PHI-base.org
 
Description This database provides information to help SMEs and larger companies develop diagnostics for specific genes of interest in key pathogenic species and to explore new intervention targets. The same data can be used in the public sector to develop diagnostic markers for key pathogens and key alleles. The agrochemical industry, and in particular the Fungicide Resistance Action Committee (FRAC), has become considerably more interested in this database and in how it can be further developed to support their R&D activities. The database has been used in the Smart Crop Protection (SCP) strategic programme (BBS/OS/CP/000001) funded through the Biotechnology and Biological Sciences Research Council's Industrial Strategy Challenge Fund. The curation of first host targets of pathogen effectors into PHI-base and then Ensembl Plants is useful for plant breeding companies seeking to remove or modify susceptibility alleles in breeding programmes.
First Year Of Impact 2020
Sector Agriculture, Food and Drink, Chemicals, Digital/Communication/Information Technologies (including Software), Environment, Healthcare
Impact Types Economic

 
Description AgroServ: Integrated SERVices supporting a sustainable AGROecological transition
Amount £13,000,000 (GBP)
Funding ID 10054008 
Organisation European Commission H2020 
Sector Public
Country Belgium
Start 04/2023 
End 05/2028
 
Description BBSRC - IPG - Large Equipment - QRT-PCR machine
Amount £2,837,000 (GBP)
Funding ID IGP20-023 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 07/2020 
End 03/2021
 
Description DFW - Designing Future Wheat - Work package 2 (WP2) - Added value and resilience
Amount £7,068,842 (GBP)
Funding ID BBS/E/C/000I0250 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 03/2017 
End 03/2023
 
Description Delivering Sustainable Wheat: Delivering Resilience to Biotic Stress (Rothamsted Research)
Amount £575,550 (GBP)
Funding ID BBS/E/RH/230001B 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 03/2023 
End 03/2028
 
Description Jade Smith - Investigating fungal pathogen effector localisation within plant cells - SWBioDTP 2023-2027
Amount £130,000 (GBP)
Funding ID 229139594 SWBio DTP Rothamsted studentship - University of Bath 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 08/2023 
End 09/2027
 
Description SW-BioDTP - Victoria Armer - Exploring communication mechanisms between fungal pathogens and plant cells
Amount £120,000 (GBP)
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 08/2020 
End 09/2024
 
Description SWBio-DTP - Erika Kroll - Fusarium disease of wheat - exploring tissue specific host-pathogen interactions using a systems biology approach.
Amount £120,000 (GBP)
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 08/2020 
End 09/2024
 
Description The Genetics Society (UK) - Travel award to attend the IS-MPMI Congress, Glasgow, 14-18th July 2019
Amount £750 (GBP)
Organisation Rothamsted Research 
Sector Academic/University
Country United Kingdom
Start 06/2019 
End 07/2019
 
Description UKRI/BBSRC-NSF/BIO Determining the Roles of Fusarium Effector Proteases in Plant Pathogenesis
Amount £813,377 (GBP)
Funding ID BB/X012131/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 02/2023 
End 01/2027
 
Description US Wheat and Barley Scab Initiative project - Engineering Gene-for-Gene Resistance to Fusarium Head Blight in Wheat and Barley. (Scofield, Helm and Innes)
Amount $57,200 (USD)
Organisation U.S. Department of Agriculture USDA 
Department Agricultural Research Service
Sector Public
Country United States
Start 06/2020 
End 09/2022
 
Title PHI-Canto 
Description This community curation tool and framework permits authors of in-scope peer-reviewed publications to manually enter all the data and findings from their publication into the open access PHI-base database, using controlled ontologies and evidence codes. 
Type Of Material Improvements to research infrastructure 
Year Produced 2023 
Provided To Others? Yes  
Impact Highly fragmented data sets on any pathogen-host interaction, or any chemical interaction with a pathogen species, published in peer-reviewed articles are converted to FAIR data. 
URL https://doi.org/10.7554/eLife.84658
 
Title PHI-Canto: Annotating a gene-for-gene interaction 
Description PHI-Canto: Annotating a gene-for-gene interaction. This is a specific type of host-pathogen interaction that determines the overall outcome of the interaction namely the resistance or susceptibility of the host and the ability of the pathogen to cause disease or not. 
Type Of Material Improvements to research infrastructure 
Year Produced 2023 
Provided To Others? Yes  
Impact The authors of a relevant peer reviewed publication can curate their complicated data sets directly into the PHI-base database using the PHI-Canto tool. 
URL https://youtu.be/_og3umfy4WU
 
Title PHI-Canto: Annotating an inverse gene-for-gene interaction 
Description PHI-Canto: Annotating an inverse gene-for-gene interaction into the PHI-base database. https://youtu.be/KKhomX11TMs 
Type Of Material Improvements to research infrastructure 
Year Produced 2023 
Provided To Others? Yes  
Impact The authors of a relevant peer-reviewed publication can curate their complicated data sets directly into the PHI-base database using the PHI-Canto tool. Inverse gene-for-gene interactions are genetically rarer than gene-for-gene interactions. 
URL https://youtu.be/KKhomX11TMs
 
Title PHI-Canto: Why should I contribute? 
Description PHI-Canto: the personal and professional benefits to authors of contributing a peer reviewed article using this community curation tool https://youtu.be/QQGZBnUYZcA 
Type Of Material Improvements to research infrastructure 
Year Produced 2023 
Provided To Others? Yes  
Impact Soon after publication, more authors will spend the time curating their research articles via PHI-Canto into the PHI-base database 
URL https://youtu.be/QQGZBnUYZcA
 
Title PHI-NETS 
Description Protein-Protein interaction Networks have been developed and published for 13 pathogenic ascomycete species that infect cereal and non-cereal species 
Type Of Material Model of mechanisms or symptoms - non-mammalian in vivo 
Year Produced 2019 
Provided To Others? Yes  
Impact The new PHI-NET resource permits, for the first time, comparative genomics analyses between multiple ascomycete fungal species by network analysis. 
URL http://www.phi-base.org/
 
Title The Pathogen-Host Interactions Database (PHI-base) 
Description This is described in the Wikipedia entry - https://en.wikipedia.org/wiki/PHI-base 
Type Of Material Improvements to research infrastructure 
Provided To Others? Yes  
Impact PHI-base makes two data releases per annum, in May and November. In the reporting period the following has occurred: the latest release of PHI-base, version 4.15, was on 2 May 2023 (http://www.phi-base.org/releaseNote.htm). Compared to version 4.14, released on 1 November 2022, the increase in data is: 384 genes, 1,069 interactions, 237 publications, 2 new pathogens, and 2 new hosts. The number of diseases was reduced by one, due to data cleaning efforts removing redundant and incorrect disease names. We continue to freely share phenotyping data with Ensembl (since 2011), FungiDB (since 2019) and UniProtKB (since 2020). UK and international resource usage trends: PHI-base > 25,800 and > 1,007 full database downloads, > 6,000 unique users and > 400 registered API users. Total users > 119,700 during the reporting period April 2023-March 2024. 
URL http://www.phi-base.org/
 
Title Video - PHI-Canto: Choosing a suitable publication 
Description PHI-Canto: Choosing a suitable publication https://youtu.be/J6z4MlU8Bkc 
Type Of Material Improvements to research infrastructure 
Year Produced 2023 
Provided To Others? Yes  
Impact How to select an in scope publication for author curation using the community curation tool - PHI-Canto 
URL https://youtu.be/J6z4MlU8Bkc
 
Title Video - PHI-Canto: Introduction to PHI-base 
Description https://youtu.be/5GlKRM9qshk 
Type Of Material Improvements to research infrastructure 
Year Produced 2023 
Provided To Others? Yes  
Impact Improved understanding of what PHI-base is about for first time users of the database 
URL https://youtu.be/5GlKRM9qshk
 
Title Additional file 17: of Inter-genome comparison of the Quorn fungus Fusarium venenatum and the closely related plant infecting pathogen Fusarium graminearum 
Description Fusarium venenatum presence of TRI6 Fusarium graminearum binding sites predicted by Nasmith et al. [39]. Fusarium venenatum BLASTP alignment percentages were added to identify presence or absence. (XLS 61 kb) 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact Placing the chromosome scale, fully annotated F. venenatum genome in the public domain has increased the power of comparative genomics for cereal infecting Fusarium species. 
URL https://springernature.figshare.com/articles/Additional_file_17_of_Inter-genome_comparison_of_the_Qu...
 
Title COVID-19 KG 
Description First release of COVID-19 Knowledge Graph for KnetMiner. Available in OXL, RDF and Neo4j format. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact Increased citations and collaboration requests for KnetMiner. 
URL https://f1000research.com/articles/10-703
 
Title PHI-base successfully reapplied to be a member of the UK node of the Europe wide Elixir, Data for Life project 
Description PHI-base (www.phi-base.org) is a knowledge database accessed by researchers in over 125 countries. PHI-base contains expertly curated molecular and biological information on genes proven to affect the outcome of pathogen-host interactions reported in peer reviewed research articles. Genes not affecting the disease interaction phenotype are also curated. PHI-base data is linked to the genome browsers and advanced query tools in ENSEMBL and FungiDB. The data content provided comes from >3400 manually curated references. PHI-base makes a data release twice a year in May and September. The current release version 4.8 was made on 16th September 2019 and contains 6,780 genes with 13,801 interactions described using nine high-level PHI-base phenotypic categories (Urban et al. (2015), Frontiers Plant Science, doi: 10.1093/nar/gku1165). This data has come from 268 pathogen species (bacteria, fungi and protists) and 210 host species (plant, animal, others) and has been manually biocurated from 3,454 peer-reviewed publications. Typically we curate 400-440 publications each year. The PHI-base resource has been cited over 330 times with 59 citations appearing in peer reviewed articles in 2019. These citations are all listed in the about section of the database. Direct targets of pathogen effector proteins are also included. Recently the PHI-base team in collaboration with the PomBase team based at the University of Cambridge have developed an online author curation tool called PHI-Canto which is under beta testing amongst the UK community. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact Since 2016, PHI-base has provided gold-standard agrigenomics information on plant pathogens and their hosts to the UK node of the Elixir Data for Life project. In 2019 we successfully re-applied for PHI-base to remain a member of the UK node of Elixir. Over 330 peer-reviewed publications have cited their use of PHI-base, citing one or more of the PHI-base references. Fifty-nine PHI-base citations appeared in peer-reviewed articles in 2019. 
URL http://www.PHI-base.org
 
Title PHI-base: the Pathogen-Host Interactions Database, version 5.0 
Description The Pathogen-Host Interactions Database (PHI-base) is an online database that catalogues experimentally-verified pathogenicity, virulence and effector genes from fungal, oomycete, and bacterial pathogens, which infect animal, plant, fungal, and insect hosts. PHI-base is a valuable resource in the discovery of genes in medically and agronomically important pathogens, which may be potential targets for chemical intervention. Information in PHI-base is manually curated by domain experts and is supported by strong experimental evidence (for example, gene disruption and gene complementation experiments), as well as references to the literature in which the original experiments are described. Annotations are made using terms from ontologies and controlled vocabularies, including the Gene Ontology (GO), Brenda Tissue Ontology (BTO), and the Pathogen-Host Interaction Phenotype Ontology (PHIPO). PHI-base 5 includes data that was curated using a new curation process described in Cuzick et al. (2023). Data releases for PHI-base 5 do not use the same schema as data releases from PHI-base 4, but all data records from PHI-base 4 that can be made compatible with the new schema are included with this release. Data releases from PHI-base 4 and PHI-base 5 will occur in parallel until such time that all data from PHI-base 4 can be migrated to PHI-base 5. The PHI-base 4 data releases are available on Zenodo at https://zenodo.org/doi/10.5281/zenodo.5356870. Data content: phi-base_v5.0.xlsx: the PHI-base dataset as an Excel spreadsheet. This format follows the layout of the PHI-base 5 website, with sheets corresponding to the sections of gene pages on the website. This format is designed for use by non-technical users. phi-base_v5.0.json: the PHI-base dataset in JSON format. This format is closer to the data format that is exported by PHI-Canto, the curation tool used by PHI-base. This format is primarily intended for programmatic usage and has additional data (e.g. metadata for curation sessions) that is not included in the spreadsheet format. phi-base.schema.json: a JSON Schema file for the JSON format of the dataset. This is included as documentation for the fields in the JSON file, but can also be used to validate the dataset. 
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
URL https://zenodo.org/doi/10.5281/zenodo.10722192
 
Title PHI-base: the Pathogen-Host Interactions Database, version 5.0 
Description The Pathogen-Host Interactions Database (PHI-base) is an online database that catalogues experimentally-verified pathogenicity, virulence and effector genes from fungal, oomycete, and bacterial pathogens, which infect animal, plant, fungal, and insect hosts. PHI-base is a valuable resource in the discovery of genes in medically and agronomically important pathogens, which may be potential targets for chemical intervention. Information in PHI-base is manually curated by domain experts and is supported by strong experimental evidence (for example, gene disruption and gene complementation experiments), as well as references to the literature in which the original experiments are described. Annotations are made using terms from ontologies and controlled vocabularies, including the Gene Ontology (GO), Brenda Tissue Ontology (BTO), and the Pathogen-Host Interaction Phenotype Ontology (PHIPO). PHI-base 5 includes data that was curated using a new curation process described in Cuzick et al. (2023). Data releases for PHI-base 5 do not use the same schema as data releases from PHI-base 4, but all data records from PHI-base 4 that can be made compatible with the new schema are included with this release. Data releases from PHI-base 4 and PHI-base 5 will occur in parallel until such time that all data from PHI-base 4 can be migrated to PHI-base 5. The PHI-base 4 data releases are available on Zenodo at https://zenodo.org/doi/10.5281/zenodo.5356870. Data content: phi-base_v5.0.xlsx: the PHI-base dataset as an Excel spreadsheet. This format follows the layout of the PHI-base 5 website, with sheets corresponding to the sections of gene pages on the website. This format is designed for use by non-technical users. phi-base_v5.0.json: the PHI-base dataset in JSON format. This format is closer to the data format that is exported by PHI-Canto, the curation tool used by PHI-base. This format is primarily intended for programmatic usage and has additional data (e.g. metadata for curation sessions) that is not included in the spreadsheet format. phi-base.schema.json: a JSON Schema file for the JSON format of the dataset. This is included as documentation for the fields in the JSON file, but can also be used to validate the dataset. 
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
URL https://zenodo.org/doi/10.5281/zenodo.10722193
 
Title Pathogen-Host Interactions database 
Description Pathogen-Host Interactions database 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact Two new releases: version 4.15 (2 May 2023) and version 4.14 (1 November 2022). 
URL http://www.phi-base.org
 
Title The Pathogen-Host Interactions Database, version 4.13 
Description PHI-base is an online database (available at phi-base.org) that catalogues experimentally verified pathogenicity, virulence and effector genes from fungal, oomycete and bacterial pathogens, which infect animal, plant, fungal and insect hosts. PHI-base is a valuable resource in the discovery of genes in medically and agronomically important pathogens, which may be potential targets for chemical intervention. Each entry in PHI-base is curated by domain experts and is supported by strong experimental evidence (for example, gene disruption and gene complementation experiments), as well as literature references in which the original experiments are described. Each gene in PHI-base is presented with its nucleotide sequence and deduced amino acid sequence (available in a FASTA file), as well as a detailed description of the predicted protein's function during the host infection process. To facilitate data interoperability, we have annotated genes using ontologies, controlled vocabularies, and links to external sources (including UniProt, Gene Ontology, Enzyme Commission, NCBI Taxonomy, EMBL, PubMed and FRAC). This PHI-base dataset is a Frictionless Data Package that contains an export of the PHI-base database in CSV format (comma-separated values), plus a FASTA file with sequences for each gene in the database. This version of the dataset, version 4.13, contains information from 4,611 publications, covering 18,982 pathogen-host interactions and 8,708 pathogen genes across 282 pathogen species and 233 host species. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://zenodo.org/doi/10.5281/zenodo.12800133
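
The FASTA file mentioned in the description above can be read with a few lines of standard-library Python. The following is a minimal sketch, not the PHI-base team's own tooling: the function name `read_fasta` and the sample headers are hypothetical, and the real PHI-base header layout is not given here and may differ.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = ""
        elif header is not None:
            records[header] += line  # join wrapped sequence lines
    return records


# Hypothetical two-record sample; real PHI-base headers may be laid out differently.
sample = """>example_gene_1
ATGGCC
TTGA
>example_gene_2
ATGAAATAG
""".splitlines()

sequences = read_fasta(sample)
for name, seq in sequences.items():
    print(name, len(seq))
```

For a downloaded release, the same function could be applied to an open file handle in place of the in-memory sample.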
 
Title The Pathogen-Host Interactions Database, version 4.14 
Description PHI-base is an online database (available at phi-base.org) that catalogues experimentally verified pathogenicity, virulence and effector genes from fungal, oomycete and bacterial pathogens, which infect animal, plant, fungal and insect hosts. PHI-base is a valuable resource in the discovery of genes in medically and agronomically important pathogens, which may be potential targets for chemical intervention. Each entry in PHI-base is curated by domain experts and is supported by strong experimental evidence (for example, gene disruption and gene complementation experiments), as well as literature references in which the original experiments are described. Each gene in PHI-base is presented with its nucleotide sequence and deduced amino acid sequence (available in a FASTA file), as well as a detailed description of the predicted protein's function during the host infection process. To facilitate data interoperability, we have annotated genes using ontologies, controlled vocabularies, and links to external sources (including UniProt, Gene Ontology, Enzyme Commission, NCBI Taxonomy, EMBL, PubMed and FRAC). This PHI-base dataset is a Frictionless Data Package that contains an export of the PHI-base database in CSV format (comma-separated values), plus a FASTA file with sequences for each gene in the database. This version of the dataset, version 4.14, contains information from 4,847 publications, covering 19,881 pathogen-host interactions and 8,993 pathogen genes across 283 pathogen species and 234 host species. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://zenodo.org/doi/10.5281/zenodo.13480920
 
Title The Pathogen-Host Interactions Database, version 4.15 
Description PHI-base is an online database (available at phi-base.org) that catalogues experimentally verified pathogenicity, virulence and effector genes from fungal, oomycete and bacterial pathogens, which infect animal, plant, fungal and insect hosts. PHI-base is a valuable resource in the discovery of genes in medically and agronomically important pathogens, which may be potential targets for chemical intervention. Each entry in PHI-base is curated by domain experts and is supported by strong experimental evidence (for example, gene disruption and gene complementation experiments), as well as literature references in which the original experiments are described. Each gene in PHI-base is presented with its nucleotide sequence and deduced amino acid sequence (available in a FASTA file), as well as a detailed description of the predicted protein's function during the host infection process. To facilitate data interoperability, we have annotated genes using ontologies, controlled vocabularies, and links to external sources (including UniProt, Gene Ontology, Enzyme Commission, NCBI Taxonomy, EMBL, PubMed and FRAC). This PHI-base dataset is a Frictionless Data Package that contains an export of the PHI-base database in CSV format (comma-separated values), plus a FASTA file with sequences for each gene in the database. This version of the dataset, version 4.15, contains 5,084 publications, covering 20,950 pathogen-host interactions and 9,377 pathogen genes across 285 pathogen species and 236 host species. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
URL https://zenodo.org/doi/10.5281/zenodo.13483233
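
The totals quoted in the version 4.14 and 4.15 descriptions can be cross-checked against the increases reported for the 4.15 release (384 genes, 1,069 interactions, 237 publications, 2 new pathogens and 2 new hosts). The short sketch below does that arithmetic; the dictionary keys and the helper name `release_delta` are illustrative choices, not fields of the PHI-base export.

```python
# Release statistics as quoted in the version 4.14 and 4.15 descriptions.
v4_14 = {
    "publications": 4847,
    "interactions": 19881,
    "genes": 8993,
    "pathogen_species": 283,
    "host_species": 234,
}
v4_15 = {
    "publications": 5084,
    "interactions": 20950,
    "genes": 9377,
    "pathogen_species": 285,
    "host_species": 236,
}


def release_delta(old: dict, new: dict) -> dict:
    """Return the per-category increase from one release to the next."""
    return {key: new[key] - old[key] for key in old}


delta = release_delta(v4_14, v4_15)
for key, value in delta.items():
    print(f"{key}: +{value}")
```

The computed differences agree with the increases stated for the version 4.15 release.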
 
Title The Pathogen-Host Interactions Database, version 4.16 
Description PHI-base is an online database (available at phi-base.org) that catalogues experimentally verified pathogenicity, virulence and effector genes from fungal, oomycete and bacterial pathogens, which infect animal, plant, fungal and insect hosts. PHI-base is a valuable resource in the discovery of genes in medically and agronomically important pathogens, which may be potential targets for chemical intervention. Each entry in PHI-base is curated by domain experts and is supported by strong experimental evidence (for example, gene disruption and gene complementation experiments), as well as literature references in which the original experiments are described. Each gene in PHI-base is presented with its nucleotide sequence and deduced amino acid sequence (available in a FASTA file), as well as a detailed description of the predicted protein's function during the host infection process. To facilitate data interoperability, we have annotated genes using ontologies, controlled vocabularies, and links to external sources (including UniProt, Gene Ontology, Enzyme Commission, NCBI Taxonomy, EMBL, PubMed and FRAC). This PHI-base dataset is a Frictionless Data Package that contains an export of the PHI-base database in CSV format (comma-separated values), plus a FASTA file with sequences for each gene in the database. This version of the dataset, version 4.16, contains 5,301 publications, covering 21,676 pathogen-host interactions and 9,666 pathogen genes across 294 pathogen species and 244 host species. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
URL https://zenodo.org/doi/10.5281/zenodo.13484596
 
Title The Pathogen-Host Interactions Database, version 4.17 
Description PHI-base is an online database (available at phi-base.org) that catalogues experimentally verified pathogenicity, virulence and effector genes from fungal, oomycete and bacterial pathogens, which infect animal, plant, fungal and insect hosts. PHI-base is a valuable resource in the discovery of genes in medically and agronomically important pathogens, which may be potential targets for chemical intervention. Each entry in PHI-base is curated by domain experts and is supported by strong experimental evidence (for example, gene disruption and gene complementation experiments), as well as literature references in which the original experiments are described. Each gene in PHI-base is presented with its nucleotide sequence and deduced amino acid sequence (available in a FASTA file), as well as a detailed description of the predicted protein's function during the host infection process. To facilitate data interoperability, we have annotated genes using ontologies, controlled vocabularies, and links to external sources (including UniProt, Gene Ontology, Enzyme Commission, NCBI Taxonomy, EMBL, PubMed and FRAC). This PHI-base dataset is a Frictionless Data Package that contains an export of the PHI-base database in CSV format (comma-separated values), plus a FASTA file with sequences for each gene in the database. This version of the dataset, version 4.17, contains 5,521 publications, covering 22,408 pathogen-host interactions and 9,973 pathogen genes across 296 pathogen species and 249 host species. 
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
URL https://zenodo.org/doi/10.5281/zenodo.5356870
 
Title The Pathogen-Host Interactions Database, version 4.17 
Description PHI-base is an online database (available at phi-base.org) that catalogues experimentally verified pathogenicity, virulence and effector genes from fungal, oomycete and bacterial pathogens, which infect animal, plant, fungal and insect hosts. PHI-base is a valuable resource in the discovery of genes in medically and agronomically important pathogens, which may be potential targets for chemical intervention. Each entry in PHI-base is curated by domain experts and is supported by strong experimental evidence (for example, gene disruption and gene complementation experiments), as well as literature references in which the original experiments are described. Each gene in PHI-base is presented with its nucleotide sequence and deduced amino acid sequence (available in a FASTA file), as well as a detailed description of the predicted protein's function during the host infection process. To facilitate data interoperability, we have annotated genes using ontologies, controlled vocabularies, and links to external sources (including UniProt, Gene Ontology, Enzyme Commission, NCBI Taxonomy, EMBL, PubMed and FRAC). This PHI-base dataset is a Frictionless Data Package that contains an export of the PHI-base database in CSV format (comma-separated values), plus a FASTA file with sequences for each gene in the database. This version of the dataset, version 4.17, contains 5,521 publications, covering 22,408 pathogen-host interactions and 9,973 pathogen genes across 296 pathogen species and 249 host species. 
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
URL https://zenodo.org/doi/10.5281/zenodo.13485488
 
Description EMBL-EBI ENSEMBL 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution Highly curated information on the genes in multiple pathogens shown to be required for disease-causing ability. Monthly curation of the peer-reviewed literature to link gene sequence information to phenotypic information for 250 pathogenic species.
Collaborator Contribution The quarterly mapping of the single gene-phenotype information onto the pathogen genomes with ENSEMBL. Also the creation of a bespoke genome portal called PhytoPath to display hundreds of plant pathogen genomes, within which this gene-to-phenotype information is displayed on the genome browser and can be searched for via the BioMart tool.
Impact Several joint publications, quarterly joint data releases since 2013 via ENSEMBL, PhytoPath and PHI-base. New funding since 2017 via the Smart Crop Protection (SCP) strategic programme (BBS/OS/CP/000001) funded through Biotechnology and Biological Sciences Research Council's Industrial Strategy Challenge Fund to explore the possibility of curating new data types into PHI-base relating to fungicide insensitivity, resistance linked to pathogen target information.
Start Year 2010
 
Description EMBRAPA Brazil - Bioinformatics Laboratory, Cenargen Brasilia 
Organisation Brazilian Agricultural Research Corporation
Country Brazil 
Sector Public 
PI Contribution The genomes of 16 well-characterised Fusarium graminearum (Fg) (15-ADON) isolates, eight each from Parana and Rio Grande do Sul states, were sequenced by Illumina paired end reads. The highly virulent isolate CML3066, with the best sequence coverage (x180), was nominated as the Brazilian reference isolate. We have subsequently created the pan genome for Fg using this data and an additional six global Fg strains including the global reference strain PH-1 originally from the USA. The focus at Rothamsted was then the characterisation of genes predicted to code for small secreted proteins or predicted to reside within discrete secondary metabolite clusters. The sequence variation in the known Fg pathogenicity and virulence genes documented in the PHI-base database has also been explored. To complement these comparative genome analyses, the relative disease-causing ability of the 16 Brazilian isolates compared to the global reference strain has been explored in detail.
Collaborator Contribution The EMBRAPA bioinformatics team have applied their expertise in transmembrane spanning proteins to explore the predicted G-protein coupled receptor, 7 transmembrane spanning protein superfamily. This superfamily contains > 100 genes and some of these are now known to be required for the disease-causing ability of Fg.
Impact Three publications have already arisen from the initial joint genome data analysis on the Fg PH-1 genome, which was done in preparation for the main project. Bresso, E., Leroux, V., Urban, M., Hammond-Kosack, K.E., Maigret, B. and Martins, N.F. (2016) Structure-based virtual screening of hypothetical inhibitors of the enzyme longiborneol synthase, a possible target to reduce Fusarium head blight disease. Journal of Molecular Modeling 22, 1-13. Martins, N.F., Bresso, E., Togawa, R.C., Urban, M., Antoniw, J., Maigret, B. and Hammond-Kosack, K.E. (2016) Searching for novel targets to control wheat head blight disease. I- Protein identification, 3D modeling and virtual screening. Advances in Microbiology 6 (11), 811-830. doi: 10.4236/aim.2016.611079. Bresso, E., Togawa, R.C., Hammond-Kosack, K.E., Urban, M., Maigret, B. and Martins, N.F. (2016) GPCRs from Fusarium graminearum, detection, modeling and virtual screening - the search for new routes to control head blight disease. BMC Bioinformatics 17 (18), 39. PMID: 28105916. These joint studies were multi-disciplinary and involved bioinformatics and protein modelling.
Start Year 2013
 
Description EMBRAPA Brazil - Passo fundo - Trigo (wheat) team 
Organisation Embrapa Trigo
Country Brazil 
Sector Private 
PI Contribution The Rothamsted Team has sequenced the genomes of 16 Brazilian strains of the Fusarium head blight disease-causing species F. graminearum (Fg) and has now created a pan-genome for this species by comparing with the available Fg genomes for six additional global isolates including the reference isolate PH-1. This has indicated that the Fg pan-genome is relatively closed. The Rothamsted Team has stably transformed the Brazilian commercial wheat cultivar Guaramin for the first time.
Collaborator Contribution The EMBRAPA Trigo team have generated various transgenic Arabidopsis harboring different Fg HIGS constructs. The EMBRAPA Team have also screened and identified lettuce cultivars that are fully susceptible to the nominated reference Brazilian strain.
Impact A major display at the annual Cereals event held in Cambridgeshire in June 2016 on the new HIGS and SIGS technologies for the control of FHB disease in wheat. An open evening public event entitled "Healthy Crops - Healthy Food" held at Rothamsted Research in July 2016.
Start Year 2014
 
Description KnetMiner ELIXIR-UK Service 
Organisation ELIXIR
Department ELIXIR UK
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution We applied for KnetMiner to become a full ELIXIR-UK service.
Collaborator Contribution Following the Scientific Development Group meeting, which was held on 3rd September 2021, we are delighted to tell you that KnetMiner scored 1 - which means the panel considered that this service meets all the ELIXIR-UK criteria and is Infrastructure ready and we are delighted to include it as an ELIXIR-UK service.
Impact Invitation to participate in ELIXIR Implementation Study for Plant Sciences. Invitation to join a EU HORIZON-INFRA grant proposal.
Start Year 2021
 
Description Molecular and biological characterisation of Fusarium species and isolates collected from infected wheat fields in Southern Brazil 
Organisation Federal University of Viçosa
Country Brazil 
Sector Academic/University 
PI Contribution The Rothamsted team has used a next generation sequencing approach to explore the genomes of the five Fusarium Head Blight causing species in Southern Brazil, namely F. graminearum, F. meridionale, F. cortaderiae, F. austroamericanum and F. asiaticum. The Rothamsted team is currently focussing on investigating and defining the core and variable parts of the pan genome of F. graminearum. The Rothamsted team also assembled and annotated the F. meridionale genome and has given this data to the University team for further analysis.
Collaborator Contribution The University team had collected field isolates during 2009 - 2012 and provided the 24 Fusarium isolates covering the five required species. The University team had also characterised the disease causing ability of each isolate on the floral spikes of various Brazilian wheat genotypes. Currently, the University team is exploring the genomes of the various F. meridionale,
Impact The two main outputs delivered so far have been (1) the biological characterisation of the collection of 24 isolates from five species for disease-causing ability on both Brazilian and non-Brazilian wheat genotypes and (2) the 24 newly assembled and annotated genomes covering the five most important FHB causing species in Southern Brazil.
Start Year 2014
 
Description PHI-Canto_curation of fungicide literature_Nichola Hawkins_NIAB 
Organisation National Institute Of Agricultural Botany
Country United Kingdom 
Sector Private 
PI Contribution The PHI-Canto community curation tool for literature curation for multiple pathogenic species. Detailed literature searches to identify the in-scope publications. Development of controlled vocabularies, ontology terms and evidence codes to curate the literature on fungicide targets and resistance mechanisms.
Collaborator Contribution In-depth knowledge of the relevant fungicide resistance literature for key species, especially for those infecting wheat and barley. In-depth knowledge of a new system to identify orthologous function mutations in protein structure across different species even when the amino acid sequence changes are not identical.
Impact This joint project has been a success and a joint manuscript is in preparation
Start Year 2022
 
Description PHI-base collaboration with PomBase (PHI-CANTO and PHI-PO) 
Organisation University of Cambridge
Country United Kingdom 
Sector Academic/University 
PI Contribution Since Sept 2017 the PomBase team at the University of Cambridge and the PHI-base team at Rothamsted Research have held weekly meetings (by Skype) as well as occasional face-to-face meetings to develop a new multi-species author curation tool called PHI-Canto as well as a new pathogen-host interaction ontology called PHI-PO and a new pathogen-host disease ontology called PHI-DO. The Rothamsted team has provided the biological, wet biology experimental and literature knowledge to this collaboration.
Collaborator Contribution The PomBase team had already developed a highly successful single organism author curation tool called Canto. The PomBase team also bring a wealth of ontology development expertise into this collaborative project.
Impact Two joint posters will be given at the International Ontology Development conference to be held in Cambridge UK in April 2019. The presenting authors will be Dr Alayne Cuzick and Dr Val Wood.
Start Year 2017
 
Description Providing pathogen-host interaction data to UniprotKB 
Organisation ExPASy, Swiss Institute of Bioinformatics (SIB)
Country Switzerland 
Sector Academic/University 
PI Contribution Providing pathogen-host interaction data to UniprotKB
Collaborator Contribution Providing pathogen-host interaction data to UniprotKB
Impact Providing pathogen-host interaction data to UniprotKB
Start Year 2019
 
Title KnetMiner 4.0 
Description KnetMiner 4.0 was released with a new feature to save gene networks to KnetSpace 
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact User experience is more rewarding now. Increased number of users. Increased number of citations. 
URL https://knetminer.com/news/knetminer-4-release.html
 
Title KnetMiner 5.6 
Description Next iteration of KnetMiner with new and enhanced functionality 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact Wheat researchers are now able to search and visualise genomics and literature data more easily. 
URL https://github.com/Rothamsted/knetminer/releases
 
Title KnetMiner 5.7 
Description KnetMiner 5.7 includes minor enhancements and bug fixes. We have made small changes to the user search limits incentivizing users to register for a free account by offering higher gene list limits to registered users. Additionally, the API and backend have seen upgrades for better performance and usability. 
Type Of Technology Webtool/Application 
Year Produced 2024 
Open Source License? Yes  
Impact KnetMiner 5.7 signifies our last major release at Rothamsted Research. In November 2023, we established the Knetminer spin-out company to build the next generation of the KnetMiner SaaS platform with a team of experienced designers, developers, data scientists and bioinformaticians. While we remain committed to open source, to safeguard our IP and our expanding customer base, 5.7 is our final completely open-source version. 
URL https://knetminer.com/knetminer-5-7_newspost/
 
Company Name Knetminer Limited 
Description KnetMiner develops software that helps scientists search and analyze large volumes of biological literature and data to better understand complex traits and diseases. 
Year Established 2023 
Impact The company has been established to leverage new types of funding and revenue to sustain the product development and maintenance of KnetMiner in the long term.
Website https://knetminer.com/
 
Description FuturumCareers.com: Saving plants from disease, for 11-19 year-olds (July 2019) 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Schools
Results and Impact An article was written for a new publication produced by the London based company called FuturumCareers that is specifically focussing on increasing the number of school pupils in the age group 11-19 to become interested in a STEAM subject and to go on and select an appropriate University Course. The PHI-base team at Rothamsted in collaboration with the ENSEMBL team at EMBL-EBI in Cambridge developed both a comprehensive magazine style article and a quiz to enhance the understanding of pathogen - plant host interactions and the increased use of big data to explore plant diseases.
Year(s) Of Engagement Activity 2019
URL https://futurumcareers.com/saving-plants-from-disease
 
Description Press release 'hypervirulence genes' in London Metro (Nov 2019) 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact As a result of the initial press release issued by Rothamsted Research, this 2nd press release appeared in the London Metro Magazine, which is distributed and read throughout the London Underground.

This 2nd press release also highlighted the increasing number of PHI-base data entries that have recorded an increase in virulence (hypervirulence) phenotype as a result of a deliberate single gene change, typically a gene deletion or point mutation, in pathogenic species that infect either plant, animal or human hosts. Collectively, these results highlight that the pathogenic process is controlled by an increasing diversity of negative feedback loops that can be altered through mutation and result in viable new strains that can cause increased disease symptoms and/or in-host pathogen burden. In this article the effects on human pathogens were particularly highlighted.
Year(s) Of Engagement Activity 2019
URL https://www.metro.news/superbug-is-one-mutation-away-from-killing-us-all/1812832/
 
Description Blog post on FAIR Knowledge Graphs 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A blog article by Marco Brandizi describing the power of standardised and FAIR knowledge graphs under the hood of KnetMiner.
Year(s) Of Engagement Activity 2020
URL https://knetminer.com/cases/the-power-of-standardised-and-fair-knowledge-graphs.html
 
Description Fusarium secretome 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact This was an invited seminar and discussion period with the company Corteva. The scientific audience was based at several of their sites in the USA and was primarily focussed on maize and soybean improvement and the threat of Fusarium diseases and mycotoxin contamination. The seminar was entitled 'The secretome of Fusarium: The known and unknown'. The audience included bioinformaticians, plant biotechnologists, molecular plant pathologists and plant breeders.
Year(s) Of Engagement Activity 2021
 
Description Government Open Access article (Jan 2020), page 370, entitled 'Fighting infectious diseases: Protecting the global wheat crop with big data analysis and knowledge networks' 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact This article was written to increase awareness of the importance of crop plant health amongst UK and European politicians, leaders of industry and leaders of third sector organisations. The article had a focus on the effects of disease causing pathogens on the No. 1 arable crop in Europe, namely wheat, and how experimentation is increasingly becoming predictive through using big data sets and network analyses. The two BBSRC funded resources highlighted in this online and hard copy article are the Pathogen-Host Interactions database (PHI-base) and the knowledge graphical visualisation tool Knetminer.
Year(s) Of Engagement Activity 2020
URL https://edition.pagesuite-professional.co.uk/html5/reader/production/default.aspx?pubname=&edid=e7e6...
 
Description Invited guest lecture - University of Bath - Healthy humans, healthy animals, healthy crops, healthy food and healthy natural ecosystems. What's the problem? 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact Invited lecture focussing on the importance of different beneficial and detrimental fungi to society and the global economy. Examples were given for the major crop species, in particular fungi that cause diseases on wheat and maize crops, how fungicide resistance emerges in plants, humans and farmed animals and how fungi can be used to produce a wide range of high value metabolites and proteins. The PHI-base database was also explained as a way to keep track of the genes required by fungi to cause disease. This special lecture concluded with a lively debate with the students, many of whom were international MSc students, on how to improve plant health and make crop yield more resilient. We also discussed career pathways in science and the differences and similarities between doing research in a company, for a charity or at an academic institution.
Year(s) Of Engagement Activity 2020
 
Description Invited oral presentation given at Plant Health at the Age of Metagenomics, Scientific Colloquium, Paris, France, September 2019, by Kim Hammond-Kosack 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited oral presentation entitled 'Using a FAIR database and bioinformatics analyses to improve plant, human, animal and ecosystem health' at Plant Health at the Age of Metagenomics, given at a Scientific Colloquium organised by the European and Mediterranean Plant Protection Organization and the Euphresco network for phytosanitary research coordination and funding, UNESCO, Paris, France, 26th September 2019.

This scientific colloquium focussed on ways to improve phytosanitation methods and the detection of emerging plant pathogen threats across the 31 different country borders throughout Europe.

Link to the video-recordings - http://bit.ly/350Hm1P
Link to the recorded presentation - https://zenodo.org/record/3471776#.XZY8z0Yzbcs
Year(s) Of Engagement Activity 2019
URL https://zenodo.org/record/3471776#.XZY8z0Yzbcs
 
Description Invited oral presentation given at the 6th Plant Genomics and Gene Editing Conference, (May 2019, Rotterdam, The Netherlands) by Kim Hammond-Kosack 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This invited oral presentation and discussion was given to increase awareness of the PHI-base resource to the European Plant Health community that are focussed on delivering improvements to either the arable or horticultural sectors.
Year(s) Of Engagement Activity 2019
 
Description Invited speaker at IB2023 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited speaker at the Integrative Bioinformatics Conference in Poland.
Year(s) Of Engagement Activity 2023
URL https://ib2023.port.org.pl/
 
Description Invited talk on the PHI-base database given at the European Fusarium workshop, in Rome, Italy, February 2020 by Martin Urban 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The invited talk was entitled 'The Pathogen-Host Interactions database (PHI-base): a phenotype database for pathogens, hosts and their interactions to enhance global food security and human health'. This talk was given to raise awareness in the international Fusarium community of the very large number of database entries that have arisen from gene function studies in a wide range of Fusarium species infecting various cereal and non-cereal plant species. The talk also highlighted that the data in the PHI-base resource is also available and searchable from within the ENSEMBL fungal genomes resource hosted by the EMBL-EBI, Cambridge, as well as the FungiDB resource.
Year(s) Of Engagement Activity 2020
 
Description M Urban conference poster 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Poster 'Harnessing community expertise to determine anti-infective target sites' presented at Resistance '19, Rothamsted UK. 16-18th Sept 2019
Year(s) Of Engagement Activity 2019
 
Description New Scientist Live Excel London - Oct 2022 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact The Fusarium team in collaboration with the PHI-base, Knetminer, EMBL-EBI, the BSPP and the Wellcome Connecting Science teams devised and presented a large display at New Scientist Live (ExCel London, October 2022), providing posters, hands-on interactive activities, and career advice on the topics of disease and mycotoxin control, DNA extraction, genomics, biocuration, and network analysis. On display table No. 1 were wheat plants infected with Fusarium head blight, infected and non-infected grains, the chemical structure of the DON mycotoxin as well as wheat plants infected with the Take-all fungus, petri dishes with fungal cultures and a binocular microscope to aid detailed viewing of infected plant material and/or the fungus. On display table 6 was a newly devised interactive game that allowed the visitor to learn about current disease control strategies and future NextGen options based around genomics, functional genomics and/or effector biology for ten globally important arable crop, horticultural and animal husbandry disease problems. FHB disease of wheat was one of the disease problems that could be selected. In total, 21,500 visitors attended the event (one day for school-age children and two days for the general public), and ~1,000 visitors explored our display, with ~20% staying for 1-2 hours. We had a diverse team consisting of post-docs, PhD students and undergraduates across six nationalities. Included in the display team for all 3 days from this project were Martin Urban and Kim Hammond-Kosack.
Year(s) Of Engagement Activity 2022
URL https://live.newscientist.com/
 
Description New ways to combat infectious diseases, article for Laboratory News 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This article was written to increase awareness of the importance of crop plant health, human health and ecosystem health amongst UK, European and international researchers actively engaged in other research disciplines. The article has a focus on the effects of disease causing pathogens (fungi, protists and bacteria) on different host species and how experimentation is increasingly becoming predictive through using big data sets and network analyses. The two BBSRC funded resources highlighted in this online and hard copy article are the Pathogen-Host Interactions database (PHI-base) and the knowledge graphical visualisation tool Knetminer.
Year(s) Of Engagement Activity 2020
URL https://www.labnews.co.uk/
 
Description Poster presentation given on PHI-base at the 15th European Fungal Genetics conference (Feb 2020, Rome, Italy) given by Martin Urban 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A poster entitled 'PHI-base, a multi-species phenotype database for pathogens, hosts and their interactions to enhance global food security and human health' was given at this international conference to raise awareness of this resource and to 'sign up' authors to test the newly developed PHI-Canto author curation tool.
Year(s) Of Engagement Activity 2020
URL https://www.ecfg15.org/
 
Description Press Release on PHI-base Nucleic Acids Research publication Jan 2025 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Rothamsted press release for the published article:
Martin Urban, Alayne Cuzick, James Seager, Nagashree Nonavinakere, Jashobanta Sahu, Pallavi Sahu, Vijay Laksmi Iyer, Lokanath Khamari, Manuel Carbajo Martinez and Kim E. Hammond-Kosack (2025) PHI-base - The multi-species Pathogen-Host Interaction Database in 2025. Nucleic Acids Research (database issue) 53, D826-D838. Available from: https://doi.org/10.1093/nar/gkae1084.
Year(s) Of Engagement Activity 2025
URL https://www.rothamsted.ac.uk/news/pathogen-host-database-refocuses-genes
 
Description Press release - KnetMiner re-purposed for COVID-19 research 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Taking time off from their own research, the Rothamsted team repurposed a tool they had originally developed to help crop scientists, to provide medical researchers with quick and intuitive access to all documented linkages between genes, medicines, and the virus.
Year(s) Of Engagement Activity 2020
URL https://www.rothamsted.ac.uk/news/rothamsted-answers-white-house-call-coronavirus-data-help
 
Description Press release on PHI-base update article Urban et al. Nucleic Acids Research article (Nov 2019) 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This press release highlighted the increasing number of PHI-base entries that have recorded an increase in virulence (hypervirulence) phenotype as a result of a deliberate single gene change, typically a gene deletion or point mutation, in pathogenic species that infect either plant, animal or human hosts. Collectively, these results highlight that the pathogenic process is controlled by an increasing diversity of negative feedback loops that can be altered through mutation and result in viable new strains that can cause increased disease symptoms and/or in-host pathogen burden.
Year(s) Of Engagement Activity 2019
URL https://www.rothamsted.ac.uk/news/gene-data-suggests-superbug-threat-underestimated
 
Description UK Civil Service Fast-stream programme _Nov 2023_Rothamsted 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact 30 min talk + 10 min Q&A on global food security and the biotic threats encountered, both historic and recent. The talk then focussed on the current biotic threats to the UK wheat crop and the types of research approaches ongoing in DFW and DSW to help provide alternative pest and pathogen control strategies and novel targets for intervention. Also discussed were various UK policies which are in place to minimise the biotic threat risk, whilst pointing out that airborne threats are the most challenging for policy makers.
Year(s) Of Engagement Activity 2023