A FAIR community resource for pathogens, hosts and their interactions to enhance global food security and human health

Lead Research Organisation: Rothamsted Research
Department Name: Protecting Crops and the Environment

Abstract

Infectious microbes continue to impose major costs on the UK farming and food industry and increasingly threaten global food security, commercial and ornamental tree health and ecosystem resilience. Similarly, due to the rise in resistance to antimicrobial compounds and increased globalisation of trade and travel, infectious microbes impose ever greater costs on public and private UK medical and veterinary providers and threaten human and animal health and wellbeing across the lifecourse. There is a substantial and diverse UK and international bioscience research community whose needs are addressed by this resource. As the biosciences become an increasingly data-intensive discipline and mega-scale data analyses become the new norm, building and maintaining community resources that ensure the Findability, Accessibility, Interoperability, and Reusability of data (i.e. are FAIR) will benefit many different bioscience disciplines.

In recent years, new possibilities for the study (and ultimately control) of pathogens have opened up through the application of high-throughput technologies for determining the molecular nature of life. These include genome sequencing, which reveals the genetic code that determines inherited properties of cells, and extend to monitoring the varied cellular contents at different stages of life and disease. This FAIR community resource is designed to capture broad molecular information from pathogenic organisms and combine it with descriptive information about the process of infection, including more specific molecular information, e.g. about the pathogen and host proteins that interact during infection and the phenotype of the interaction outcome, and to flag up which pathogen proteins are already targeted by anti-infective chemicals. The new knowledge on pathogen genomes, patterns of gene expression and potential interacting partners is housed using the Ensembl platform. Ensembl contains a comprehensive suite of software for the management and display of genome-scale data. The new phenotypic knowledge on experimentally verified genes required for the disease-causing abilities of each pathogenic species will increasingly be curated by members of the scientific community into the Pathogen Host Interactions (PHI-base) database using a newly developed tool called PHI-Canto. A new curation focus will increase the details recorded about (i) the molecular interactions between the repertoires of small effector proteins produced by pathogens and their initial targets within each host species, and (ii) the pathogen targets for anti-infective chemistries. To support the ongoing curation efforts, new generic PHIPO ontologies (controlled definitions) will be developed to accurately describe the depth and breadth of pathogen-host interactions.

Further developing the interfaces within and between Ensembl Genomes, PHI-base and other key e-science data/information providers will support the joint querying and visualisation of genomic and phenotypic data. We will also deploy new and existing tools (graphical and non-graphical) to improve inter-species comparative analysis and the integration of different large data types, to speed up analyses and make new discoveries on the evolutionary origin of genes, mutations important in the process of infection and genes/gene networks conferring host resistance, pathogen virulence or resistance to anti-infective chemicals.
We will continue to engage with the large and active UK research community in the biosciences to identify their current needs and emerging requirements through University/Institute visits, and will conduct training activities to demonstrate the potential use of the resource. We will engage with academic and industry based scientists in other countries by attending and presenting this FAIR community resource and its uses at international conferences and workshops.

Technical Summary

PHI-base is the phenotype data source provider. We will continue to curate the literature for ~200 pathogenic species and include emerging problematic species. New advanced curation will include (a) first host plant targets of pathogen effectors, (b) anti-infective targets and variant sequences causing chemical insensitivity, (c) ~8 specific genome landscape features. We will further develop the multi-species PHI-Canto tool to enable rapid, accurate and comprehensive publication-based author curation. PHI-base data will be made available in emerging data exchange formats (e.g. Phenopackets) to increase interoperability and use. The new PHIPO ontologies to underpin this curation will be built using Protégé, adhering to the strict ontology development principles outlined by the OBO Foundry.

The PHI-phenotype information will be mapped onto microbial genes in Ensembl Genomes, an established platform combining a relational database back-end for persistent, non-redundant storage of data with web-based tools, programmatic interfaces (including RESTful APIs) and the ability to export and upload (local or remote) annotation files in standard file formats (e.g. BAM, CRAM, VCF). Genomes are overlaid with variation/transcriptome data along with whole genome alignments and pan-species comparative relationships, allowing extrapolation of functional annotation, e.g. from well understood pathogens to under-studied, under-funded pathogens.
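
As an illustration of the programmatic access mentioned above, the following is a minimal Python sketch of how a gene record might be requested over a RESTful interface. It assumes the public Ensembl REST server at rest.ensembl.org and its lookup/id endpoint; neither is named in this summary, and the stable ID used is a placeholder rather than a real gene.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Assumption: the public Ensembl REST server; not stated in the summary above.
ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_url(stable_id: str, expand: bool = False) -> str:
    """Build the URL of the 'lookup/id' endpoint for one stable identifier."""
    params = {"content-type": "application/json"}
    if expand:
        # Ask the server to include child features (e.g. transcripts).
        params["expand"] = "1"
    return f"{ENSEMBL_REST}/lookup/id/{quote(stable_id)}?{urlencode(params)}"


def fetch_gene(stable_id: str, timeout: float = 30.0) -> dict:
    """Fetch one gene record as a dict. Requires network access."""
    request = Request(lookup_url(stable_id), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder identifier: replace with a real stable ID before fetching.
    print(lookup_url("GENE_ID_PLACEHOLDER"))
```

Separating URL construction from the network call keeps the first part testable offline; fetch_gene is only a thin wrapper and would need error handling and rate limiting in real use.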

To provide a bigger context, we will functionally advance the KnetMiner open-source software to integrate the PHI-data and ontologies with biological pathway (BioCyc) and protein-protein interaction data (BioGRID, IntAct) from eight model organisms to elucidate the cascading processes triggered by pathogen effectors and their first targets in the host. This will allow multi-species, cross-kingdom network visualisation and analysis. We will create biannual releases of the integrated knowledge base in FAIR-compliant RDF and Neo4j graph formats.

Planned Impact

This FAIR community resource is aligned with the BBSRC fundamental and strategic research priorities to achieve sustainable global food security, and improve human and animal health and wellbeing across the life course.
This resource is of immediate benefit to all researchers in the medical, crop plant, animal and model organism biosciences working on diseases caused by fungi, protists and bacteria, and will remove bottlenecks to new discoveries caused by data sets being unavailable, non-integrated and/or incompatible for simple queries/complex analyses. Priority infectious microbes have previously been selected and included according to UK industrial and academic researcher interests. This project will provide standardised annotation, more powerful comparative analyses, and greater data access through interactive interfaces and new tools.
The interpretation of genome-scale molecular biology and phenotyping data is a key component in the development of novel strategies for sustainable disease control in humans, crop plants and farmed animals, and has considerable academic, economic, social and ecological value. Specifically, this FAIR resource will organise genome sequence, genetic variation and phenotypic data and make it widely accessible through a new set of interfaces and new tools to permit genome-wide enquiries, linked to literature-curated pathogenic phenotypes associated with gene mutations.
The driving rationale for the project, as well as its greatest potential for societal impact, is in two targeted sectors. Firstly, sustainably increasing the yields of crop plants, through assisting the development of strategies for pesticide development and plant breeding. Crucially, this depends on an understanding of gene function (effectors and their targets, and other downstream biological functions dependent on these), which determine the range of possible pesticide targets, the total genetic reservoir available to plant breeders, and possible side effects (in terms of the impact on plant growth, development and overall health). This FAIR resource and the associated new tools will provide access to existing and new knowledge for numerous phytopathogenic species. The second targeted sector is human health and medical interventions to ensure healthy ageing throughout the life course. Understanding pathogen gene function, host targets and downstream biological functions will aid novel drug discoveries, track clinical efficacy and help diagnostic companies follow emerging problematic pathogenic microbes.
The main route to achieving impact will be through raising (academic and commercial) user awareness and use of the resource. Potential beneficiaries include AgCompanies developing pesticides or attempting to breed new varieties of pathogen-resistant plants, and pharmaceutical companies developing new healthcare products to stop or minimise infectious microbes in the general human population and within hospitals. More generally, farmers and the wider global population will benefit from improved strategies for disease control, although they are not expected to be among the direct users of the database. The PIs at each organisation will engage with society, the media and policy makers to make the case for the importance of research into crop plant and medically important pathogens in the context of rising global concern about food and energy security, human health, farmed animal health and ecosystem resilience, and of the potential benefits of genomics in addressing these concerns.
The five project objectives have been chosen in the light of the above observations. Collectively, the objective is to put the increasing quantities of data being generated back in the hands of researchers in as useful a form as possible, and to allow them to see the full spectrum of experimental results - from the study of an individual mutant phenotype to information about gene expression or its variance in a population - in an integrated fashion.

Publications

Description Overview

This narrative reports progress on developing a community resource for pathogens, hosts and their interactions. This project brings together five groups with demonstrable expertise in capturing, integrating and interrogating useful data from the literature; data that would otherwise remain dispersed. Highlighted are advances in our data capture tools and ontologies, and continuous updates to open-access data available to the research community. The researchers involved are based at Rothamsted, the ENSEMBL team at EMBL-EBI Cambridge, the University of Cambridge and the company Molecular Connections, located in Bangalore, India. The outputs from the Rothamsted team are highlighted below.

Releases of data to the community

PHI-base makes a data release twice a year. The current release, version 4.8, was made on 16 September 2019 and contains 6,780 genes with 13,801 interactions described using nine high-level PHI-base phenotypic categories (Urban et al. (2015), Frontiers in Plant Science, doi: 10.3389/fpls.2015.00605). This data has come from 268 pathogen species and 210 host species and has been manually biocurated from 3,454 peer-reviewed publications. Typically we curate 400-440 publications each year. The PHI-base resource was cited 59 times in peer-reviewed articles in 2019. In total, 98% of the gene-to-phenotype entries available in PHI-base are also available from the genome browsers in ENSEMBL fungi, protists or bacteria, as well as FungiDB. In 2019, PHI-base had 10,960 users in 130 countries, of whom 4,639 were based in Europe and 3,183 were based in the UK.

New PHI-base gene-centric display

Molecular Connections created the current PHI-base user interface (version 4) in 2015. A new user interface is required to display future data curated by PHI-Canto, which will be far richer in content. We have decided on a two-step process to update the user interface. Firstly, to display the current 15 years' worth of PHI-base version 4 data within the new gene-centric pages; and secondly, to display the new PHI-Canto data in the same format.
A draft version of the first step is available at http://poc.molecularconnections.com/Phibase/#/home
Future public releases, from May 2020 onwards, will involve running the existing PHI-base user interface and the new gene-centric version in parallel. The gene-centric display is required to logically display phenotypes reported in multiple publications for one gene, as well as all gene synonyms and identifiers. The UniProt ID for the gene is provided on each gene page. A similar gene-centric approach is used by Pombase (pombase.org). In addition, the gene-centric display allows us to provide multiple pathogen-host interaction phenotypes reported for different tissues or hosts on a single page, together with all the assigned references.

Curation tool and ontology development

Robust curation systems are critical for interpreting, recording, sharing and analysing phenotypic observations. Our overall aim is to develop an open, generic, extensible infrastructure for the curation of pathogen-host interactions, which can be applied to any pathogen or host. We will use this infrastructure to create high-quality shareable annotations in the pathogen-host interaction curation space. Moreover, we will also enable the community to participate in the curation of their own publications, via a proven community curation system.

The PHI-Canto community curation tool - developed in collaboration with the University of Cambridge

We have developed PHI-Canto, a multi-species community curation tool for the PHI-base database (canto.phi-base.org). PHI-Canto is a web-based tool that enables professional curators and publication authors to curate peer-reviewed papers using terms from biological ontologies. PHI-Canto is an extension of the Canto (curation.pombase.org) software developed by PomBase (pombase.org), the model organism database for fission yeast (Rutherford et al., (2014) Bioinformatics 30: 1791-1792).
Canto required major development to support curation for PHI-base. Firstly, Canto was originally defined for single-species curation, and could therefore only be configured with a set of single-species identifiers. PHI-Canto adds support for UniProt identifiers, which cover gene products from many species. Using UniProt allows PHI-Canto to automatically retrieve information about the gene product required by PHI-base, including its description, sequence, cross-references to other databases, taxonomic lineage, etc. The curator is no longer required to enter any of this information manually, which reduces curation overhead and room for error.
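
The automatic retrieval described above can be sketched in Python as follows. This is not PHI-Canto's own code: the endpoint URL and the JSON field names are assumptions based on the public UniProt REST service, and the sample record contains placeholder values only.

```python
from dataclasses import dataclass

# Assumption: UniProt's public REST endpoint for a single entry in JSON.
UNIPROT_ENTRY_URL = "https://rest.uniprot.org/uniprotkb/{accession}.json"


@dataclass(frozen=True)
class GeneProduct:
    """The subset of a UniProt entry a curation session might prefill."""
    accession: str
    description: str
    organism: str
    taxon_id: int
    sequence_length: int


def entry_url(accession: str) -> str:
    """URL from which the entry for one accession could be fetched."""
    return UNIPROT_ENTRY_URL.format(accession=accession)


def parse_entry(record: dict) -> GeneProduct:
    """Extract curation-relevant fields from a decoded UniProt JSON entry."""
    name = (
        record.get("proteinDescription", {})
        .get("recommendedName", {})
        .get("fullName", {})
        .get("value", "")
    )
    organism = record.get("organism", {})
    return GeneProduct(
        accession=record["primaryAccession"],
        description=name,
        organism=organism.get("scientificName", ""),
        taxon_id=int(organism.get("taxonId", 0)),
        sequence_length=int(record.get("sequence", {}).get("length", 0)),
    )


# Hypothetical record with placeholder values, shaped like a UniProt entry.
SAMPLE = {
    "primaryAccession": "X00000",
    "proteinDescription": {"recommendedName": {"fullName": {"value": "Example effector"}}},
    "organism": {"scientificName": "Example pathogen", "taxonId": 1},
    "sequence": {"value": "MKT", "length": 3},
}

if __name__ == "__main__":
    print(entry_url("X00000"))
    print(parse_entry(SAMPLE))
```

Because every field is derived from the accession, the curator only has to supply the identifier, which is the reduction in manual entry the paragraph above describes.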

Canto supported phenotype annotation, but only for a single species phenotype-genotype combination. PHI-Canto extended this to allow annotations on a pathogen-host interaction genotype, which is modelled as a composition of a pathogen genotype and a host genotype (referred to as a metagenotype). Curators can curate allele variants in either the host or the pathogen, or both, to permit maximum flexibility for each interaction.
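
A metagenotype, as described above, can be pictured as a simple composite data structure. The Python sketch below is illustrative only: the class and field names are hypothetical and do not reflect PHI-Canto's internal schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Allele:
    """One allele of one gene; 'wild type' means no change was introduced."""
    gene_id: str
    allele_type: str = "wild type"


@dataclass(frozen=True)
class Genotype:
    """A single-species genotype: species, strain and the curated alleles."""
    species: str
    strain: str
    alleles: Tuple[Allele, ...] = ()

    def is_wild_type(self) -> bool:
        return all(a.allele_type == "wild type" for a in self.alleles)


@dataclass(frozen=True)
class Metagenotype:
    """A pathogen genotype composed with a host genotype."""
    pathogen: Genotype
    host: Genotype

    def altered_partners(self) -> Tuple[str, ...]:
        """Which side(s) of the interaction carry non-wild-type alleles."""
        sides = (("pathogen", self.pathogen), ("host", self.host))
        return tuple(role for role, genotype in sides if not genotype.is_wild_type())


if __name__ == "__main__":
    # Placeholder species, strains and gene identifiers.
    pathogen = Genotype("Example pathogen", "strain 1", (Allele("geneP", "deletion"),))
    host = Genotype("Example host", "cultivar 1", (Allele("geneH"),))
    print(Metagenotype(pathogen, host).altered_partners())
```

Keeping the two genotypes as separate objects means the same pathogen genotype can be paired with several host genotypes without being re-entered.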

PHI-Canto phenotype annotation also allows the curation of one or more strains for each species in a curation session. Note that the term 'strain' in PHI-Canto refers to any variation below the species level (including subspecies, pathovars, etc.); this matches the historical use of 'strain' in PHI-base. Changes introduced into strains (laboratory strain) are described in the genotype, as well as any background mutations, for example Ku-70.

Canto contains built-in help pages to assist users in the curation process. Much of this documentation was only applicable to fission yeast curation; we are in the process of adapting and extending the help text to cover pathogen-host curation.

The Pathogen-Host Interaction Phenotype Ontology

We are developing the Pathogen-Host Interactions Phenotype Ontology (PHIPO): a logically defined, pre-composed phenotype ontology comprising two branches, one describing the single-species phenotypes of pathogens or hosts and the other describing the phenotypes of multi-species pathogen-host interactions. PHIPO has been accepted into the OBO Foundry to fill this domain space. As of February 2020, PHIPO contains 1,068 terms, of which 993 have text definitions and 447 have logical definitions.
PHIPO is built using the Ontology Development Kit, which aims to standardise ontology development according to the OBO Library principles. International efforts to improve interoperability between existing phenotype ontologies - by standardising their underlying formal design patterns (logical definitions) - are ongoing; PHIPO is part of this effort, and is in the process of being integrated with the Unified Phenotype Ontology (uPheno), which aims to establish the logical correspondence between PHIPO and phenotypes in other ontologies. PHIPO is freely available under the Creative Commons Attribution 3.0 Unported license (CC BY 3.0), and is hosted both on GitHub and the OBO Foundry.

Relating PHI-base phenotypes to PHIPO phenotypes

PHI-base contains thousands of records annotated with the set of nine high-level phenotype terms (Urban et al., (2015) Frontiers in Plant Science 6, 605. doi: 10.3389/fpls.2015.00605). To enable us to continue to summarise the data using these classifiers, and to map legacy data to the new system, it was necessary to establish a correspondence between the legacy high-level terms and the new terms contained in PHIPO. Three methods were used to achieve this.

Terms describing chemical resistance and sensitivity have been directly mapped to new terms in the single species branch of PHIPO, specifically 'increased resistance to chemical' (PHIPO:0000022) and 'increased sensitivity to chemical' (PHIPO:0000021). These phenotypes contain subclasses describing resistance or sensitivity to a particular chemical entity (logically defined using the ChEBI ontology), for example 'resistance to ampicillin'.

Terms describing changes in pathogenicity, virulence, and mutualism are now captured using an annotation extension in PHI-Canto, that applies to all terms in the pathogen-host interaction phenotype branch. This allows an interaction to be annotated with a phenotypic outcome (called 'infective ability' in PHI-Canto), in addition to an observed phenotype. For example, an observed phenotype 'decreased extent of pathogen-associated host lesions' can be annotated with the extension 'reduced virulence'.

Terms describing effector phenotypes are now annotated using GO terms: first, the relevant pathogen gene is annotated with its molecular function, then the molecular function is related to the biological process 'effector-mediated modulation of host process' (GO:0140418). In cases where the molecular function is unknown, the pathogen gene is annotated with the biological process directly.

Following this new process, it is possible to capture information on the phenotypes observed when an effector is involved in a host interaction; this was not possible with the curation method previously used for PHI-base.
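
The three mapping methods above can be summarised as a small dispatch routine. The Python sketch below uses the PHIPO and GO identifiers quoted in the text; the legacy label strings, other than 'reduced virulence', are illustrative assumptions and may not match the exact wording stored in PHI-base.

```python
from typing import Dict

# Identifiers quoted in the text above.
CHEMICAL_TERMS: Dict[str, str] = {
    "resistance to chemical": "PHIPO:0000022",   # 'increased resistance to chemical'
    "sensitivity to chemical": "PHIPO:0000021",  # 'increased sensitivity to chemical'
}
EFFECTOR_PROCESS = "GO:0140418"

# Assumption: illustrative legacy labels; only 'reduced virulence' is quoted in the text.
INFECTIVE_ABILITY_TERMS = {"reduced virulence", "increased virulence", "loss of pathogenicity"}


def map_legacy_term(legacy_term: str) -> Dict[str, str]:
    """Return the mapping method and target for one legacy high-level term."""
    term = legacy_term.strip().lower()
    if term in CHEMICAL_TERMS:
        # Method 1: direct mapping to a term in the single-species branch.
        return {"method": "direct PHIPO mapping", "target": CHEMICAL_TERMS[term]}
    if term in INFECTIVE_ABILITY_TERMS:
        # Method 2: recorded as an annotation extension on an observed phenotype.
        return {"method": "annotation extension", "target": f"infective ability: {term}"}
    if term.startswith("effector"):
        # Method 3: annotated with GO rather than with a phenotype term.
        return {"method": "GO annotation", "target": EFFECTOR_PROCESS}
    raise KeyError(f"no mapping rule for legacy term: {legacy_term!r}")


if __name__ == "__main__":
    for label in ("Resistance to chemical", "reduced virulence", "effector"):
        print(label, "->", map_legacy_term(label))
```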

Publications selected for curation to improve PHIPO

A significant curation effort started in autumn 2019 and is still being progressed by Rothamsted and the University of Cambridge. We have partially or fully curated 31 publications, generating 678 annotations (as of February 2020). The papers selected for this effort cover a range of biological areas of interest to pathogen-host interactions: early-acting pathogen virulence proteins, receptor decoys, R-Avr interactions, secondary metabolite clusters required for pathogen virulence, first host targets of pathogen effectors, fungal toxins, bacteria-human and fungal-human interactions, and antifungal targets.

These papers were used to seed the ontology with a basic range of terms required for curating pathogen-host phenotypes. Curating the papers allowed us to check the applicability of these ontology terms to real publications, and also allowed us to assess the suitability of PHI-Canto itself. Specifically, the initial curations have already revealed a need to extend Canto with new annotation types: for example, interspecies complementation, and protein-protein interactions between pathogen and host.

Validating strain and disease names

Because controlled vocabularies were not used in the past, PHI-base has numerous data quality issues, particularly with regard to strain names and disease names. Work is ongoing to clean and validate the list of strains in PHI-base, cross-referencing against external authorities where possible (including the ATCC, and model organism databases such as MGI and FlyBase). The disease name list in PHI-base has also undergone cleaning and cross-referencing against the Mondo disease ontology, with the revised list used to create a supplementary ontology called PHIDO (the Pathogen-Host Interactions Disease Ontology). These changes are planned to be applied to PHI-base in a future release.

AI and curation of the anti-infective literature

Text mining is increasingly used to extract important information from research articles. A major challenge here is to handle the heterogeneity, varied quality, and diverse identifiers of the data. Studies by Pletscher-Frankild et al. (Methods: 74, 83-89, 2015) suggest that automated data extraction of disease-gene associations from biomedical abstracts can assist and shorten the work of biocurators. For PHI-base, a pilot study has been carried out by Molecular Connections, focusing on identifying all the publications over the past 10 years relevant to fungicides and their targets in pathogens of plants and humans. To seed the first AI study, Rothamsted created a 'gold' set corpus consisting of both relevant and contextually irrelevant documents. The positive corpus consists of 45 research articles from the MARDy database (downloaded from mardy.net on 4 October 2019) and 14 articles from PHI-base version 4.8. The negative corpus consists of 3,439 curated articles from PHI-base where no fungicide chemistry is reported. The PubMed ID, author, chemistry, organism, and gene details will be programmatically extracted from the identified articles and used to prefill a PHI-Canto curation session for the benefit of future curators.
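
The corpus sizes given above imply a strongly imbalanced training set. The short Python calculation below, using only the counts stated in the text, shows the proportion of positive documents and the accuracy a trivial classifier would reach by labelling every document as irrelevant; the choice of these two summary figures is ours and is not part of the pilot study.

```python
# Counts taken from the text above.
POSITIVE_SOURCES = {"MARDy": 45, "PHI-base 4.8": 14}
NEGATIVE_COUNT = 3439

positives = sum(POSITIVE_SOURCES.values())
total = positives + NEGATIVE_COUNT

# Share of the gold corpus that is relevant to fungicides and their targets.
positive_fraction = positives / total

# Accuracy of a classifier that always predicts 'irrelevant'.
majority_baseline_accuracy = NEGATIVE_COUNT / total

if __name__ == "__main__":
    print(f"positive documents: {positives} of {total}")
    print(f"positive fraction: {positive_fraction:.2%}")
    print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.2%}")
```

With so few positive documents, plain accuracy would say little about a text-mining model; measures such as precision and recall on the positive class would be more informative.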

KnetMiner updates

KnetMiner (www.knetminer.org) is a digital research assistant that has a Google-like search interface, and makes use of predictive graph algorithms and interactive features to help scientists tell the stories of complex traits and diseases in any species. KnetMiner knowledge graphs follow FAIR principles, by modelling the data using standardised ontologies (where possible), and making the graph database accessible through standardised Cypher and SPARQL endpoints.
The first aim of the KnetMiner work package (WP) is to develop an integrated pathogen-host knowledge graph for major crop, pathogen and model organism genomes, enriched with manually curated data from PHI-base and ChEBI and with speculative information extracted from the scientific literature corpus or obtained through other means. The second aim of the WP is to customise KnetMiner to the needs of pathogen-host scientists, improve its FAIRness, and tightly integrate KnetMiner and PHI-base to enable a more advanced user experience.

To achieve these two aims, we have started with improvements to the KnetMiner API endpoints and wrapper scripts (https://github.com/josephhearnshaw/genelist-api), to demonstrate a use case for programmatic access to analyse a large gene dataset, e.g. differentially expressed genes from an RNA-Seq experiment. In December 2019, we started the development of a major new platform (KnetSpace), which is tightly coupled with KnetMiner. It will allow scientists to store and collaborate on the curation of knowledge networks. Users will be able to share their knowledge graph data with other research groups, encouraging collaboration, and thus increasing the potential outreach of PHI-base and PHI-Canto data.

PHI-base links:
1. PHI-base version 4: www.phi-base.org
2. PHI-base GitHub page: http://github.com/PHI-base
3. PHIB-BLAST: http://phi-blast.phi-base.org
4. Gene-centric PHI-base 5 (beta version): http://poc.molecularconnections.com/Phibase/#/home
5. PHI-base Wikipedia page: http://en.wikipedia.org/wiki/PHI-base
6. PHI-Canto (multi-species community annotation tool): http://canto.phi-base.org/

2021 report
Over the past twenty years, sequencing techniques have improved dramatically, enabling us to interpret the genomic makeup of any species with increasing accuracy and speed. This has paved the way for deeper biological explorations; for instance, how exactly these species interact with each other at the molecular level, how these interactions influence the outcome and what factors can change them. This grant has allowed multiple groups specialising in standardised information capture and the representation of genomic data to work together to focus on these interactions between pathogens and their hosts (animals, plants, insects and humans). This work enables the development of ontologies (precise definitions) to describe these interactions, intuitive interfaces to allow scientists to curate their experimental findings, and software and databases to represent these findings so that they can be queried by anyone, freely, around the world to make predictions and develop testable hypotheses for the laboratory. The results of this grant can (and will) extend well beyond pathogens and hosts, into species in a variety of ecosystems (human gut, soil, water), and the applications of these efforts range from the discovery of new therapeutics to agriculture and understanding the impact of climatic fluctuations.

Data Releases and improving interoperability
PHI-base makes a data release twice a year. The current release, version 4.10, was made on 2 November 2020 and contains 7,681 genes with 15,928 interactions described using nine high-level PHI-base phenotypic categories (Urban et al. (2015), Frontiers in Plant Science, doi: 10.3389/fpls.2015.00605). This data has come from 274 pathogen species and 216 host species and has been manually biocurated from 3,914 peer-reviewed publications. Molecular Connections curated 457 publications in 2020. Of these, 253 publications came from single-gene studies, 110 from two-gene studies and 94 from complicated studies (three or more genes involved in the study). The PHI-base resource was cited 104 times in peer-reviewed articles from 2019 to 5 March 2021.

The relevant genomes in Ensembl have been annotated with interaction data from PHI-base using an improved pipeline that matches proteins in Ensembl using both names and sequence similarity. More than 95% of the entries in PHI-base now map directly to the corresponding protein in Ensembl, ensuring a continuously high level of interoperability between these two data resources.
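
The idea of matching on names first and falling back to sequence similarity can be sketched as below. This is not the Ensembl pipeline: the function, the threshold and the use of Python's difflib as a stand-in for a proper sequence aligner are all assumptions made for illustration, and the identifiers and sequences are toy values.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def match_protein(
    name: str,
    sequence: str,
    candidates: Dict[str, Tuple[str, str]],
    min_ratio: float = 0.9,
) -> Optional[str]:
    """Return the ID of the best-matching candidate, or None.

    candidates maps a protein ID to a (name, sequence) pair. An exact,
    case-insensitive name match wins outright; otherwise the candidate with
    the highest sequence similarity is accepted if it reaches min_ratio.
    """
    wanted = name.strip().casefold()
    for protein_id, (candidate_name, _) in candidates.items():
        if candidate_name.strip().casefold() == wanted:
            return protein_id

    best_id, best_ratio = None, 0.0
    for protein_id, (_, candidate_sequence) in candidates.items():
        # autojunk=False stops difflib discarding frequent residues in long sequences.
        ratio = SequenceMatcher(None, sequence, candidate_sequence, autojunk=False).ratio()
        if ratio > best_ratio:
            best_id, best_ratio = protein_id, ratio
    return best_id if best_ratio >= min_ratio else None


if __name__ == "__main__":
    toy = {"E1": ("geneA", "MKTAYIAKQR"), "E2": ("geneB", "MSSLLPWWQE")}
    print(match_protein("GENEA", "", toy))
    print(match_protein("renamed", "MKTAYIAKQK", toy, min_ratio=0.8))
```

A production pipeline would use an alignment tool designed for proteins; the two-stage structure is the point of the sketch.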

The microbial team within Ensembl are developing a data schema to model a test data set of microbial genes linked to first host target genes provided by Rothamsted Research. This will be accompanied by updates to their visualisation and query strategies. Initially these first host target entries will appear in Ensembl Plants.

New PHI-base gene-centric display
Molecular Connections created the current PHI-base user interface (version 4) in 2015. A new user interface is required to display the richer future PHI-Canto curation. We are implementing a two-stage process to update the user interface: firstly, to display the existing knowledge corpus (curated over the past 15 years) within the new (version 5) gene-centric pages; and secondly, to display the new, richer PHI-Canto data in the same display. The first version of the PHI-base user interface (version 5) was developed in 2020 by Molecular Connections under guidance from Rothamsted Research, and is currently under further revision and testing. We expect the new interface to be completed by November 2021. From then onwards, public releases will involve running the existing PHI-base 4 user interface and the new gene-centric PHI-base 5 version in parallel, until the user community becomes familiar with the new interface.

The PHI-Canto community annotation tool
We have further developed PHI-Canto, a multi-species community curation tool for the PHI-base database (canto.phi-base.org).
In the past year we have improved on and tested the following:
- Added support for strain synonyms in PHI-Canto: each primary name of a strain is now linked to its synonymous names, and curators can search for strains using these synonyms. This process required manual review of all unique strain names in PHI-base, plus revisions to reduce data redundancy and inaccuracy.
- Added support for curating experimental controls, through use of an extension on a 'pathogen-host interaction phenotype' annotation that relates it to a control interaction (usually wild type). The control interaction can also be annotated with phenotypes describing the 'normal' or wild type interaction outcome.
- Added a new 'gene-for-gene phenotype' annotation type to simplify curation of gene-for-gene experiments. The annotation type also links to an ontology of predefined interaction outcomes (PHIPO_EXT) for ease of use by the curator, which also supports inverse gene-for-gene interactions.
- Added a new disease annotation type, which replaces the previous process of curating the disease as an extension of every phenotype annotation. This reduces manual work for the curator since the disease only needs to be curated once for each interaction.
- Added a dedicated field for curating the Figure in the publication that relates to a particular annotation. Previously, this was captured in the annotation comments.
- Enabled the curation of wild type RNA and protein expression levels.
- Added a shortcut to quickly curate wild type alleles for any gene in the curation session.
- Formalised the curation of the delivery mechanism in an experiment as an experimental condition (as opposed to an independent data type).
- At least 81 issues closed on the PHI-Canto tracker - covering feature requests, bug-fixes, and discussion.
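
The strain synonym support listed above amounts to a lookup from any known name to a single primary name. A minimal Python sketch follows; the function names and the strain names are hypothetical and are not taken from PHI-Canto or PHI-base.

```python
from typing import Dict, Iterable, Optional


def _normalise(name: str) -> str:
    """Collapse whitespace and ignore case so that near-identical names match."""
    return " ".join(name.split()).casefold()


def build_synonym_index(strains: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Map every primary name and synonym to its primary strain name."""
    index: Dict[str, str] = {}
    for primary, synonyms in strains.items():
        for name in (primary, *synonyms):
            existing = index.setdefault(_normalise(name), primary)
            if existing != primary:
                # The same name pointing at two strains needs manual review.
                raise ValueError(f"{name!r} is claimed by both {existing!r} and {primary!r}")
    return index


def find_strain(index: Dict[str, str], query: str) -> Optional[str]:
    """Return the primary strain name for a query, or None if unknown."""
    return index.get(_normalise(query))


if __name__ == "__main__":
    # Placeholder strain names for illustration only.
    index = build_synonym_index({"Strain A": ["strain-a", "SA 001"], "Strain B": ["SB 002"]})
    print(find_strain(index, "  sa   001 "))
```

Raising an error on conflicting synonyms mirrors the manual review step mentioned in the list: ambiguity is surfaced rather than silently resolved.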
The original single species curation tool Canto contains built-in help pages to assist users in the curation process. Much of this documentation was only applicable to fission yeast curation; over the past year we have been adapting and extending the help text to cover pathogen-host curation in PHI-Canto.

Disease name curation
Initially, information pertaining to the disease caused by a pathogen-host interaction was captured by an annotation extension. The disadvantage of this approach was that disease information had to be related to every annotation on every metagenotype, despite the relevant information (host species, pathogen species, and infected host tissue) being unique to the metagenotype, and not its annotations. This resulted in excessive manual curation of redundant information. In response to these problems, we decided to add a dedicated annotation type for disease information. The annotation captures the host and pathogen species involved (as a metagenotype) and the infected host tissue (as an 'infected tissue' annotation extension). We expect this will reduce the amount of redundant information captured. During 2020, we flagged up all disease synonyms and aligned these to the predominantly used current disease name.
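
The saving from the dedicated disease annotation type can be shown with a small count. The Python sketch below assumes one disease per metagenotype and uses invented annotation counts; it only illustrates the difference between recording the disease once per annotation and once per metagenotype.

```python
from typing import Dict


def disease_entries_as_extension(annotations_per_metagenotype: Dict[str, int]) -> int:
    """Old approach: the disease is repeated on every phenotype annotation."""
    return sum(annotations_per_metagenotype.values())


def disease_entries_as_annotation_type(annotations_per_metagenotype: Dict[str, int]) -> int:
    """New approach: the disease is recorded once per metagenotype."""
    return len(annotations_per_metagenotype)


if __name__ == "__main__":
    # Hypothetical curation session: metagenotype label -> number of phenotype annotations.
    session = {"metagenotype 1": 6, "metagenotype 2": 3, "metagenotype 3": 1}
    print(disease_entries_as_extension(session), "entries as extensions")
    print(disease_entries_as_annotation_type(session), "entries as disease annotations")
```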

The Pathogen-Host Interaction Phenotype Ontology
We have continued to develop the Pathogen-Host Interactions Phenotype Ontology (PHIPO) which is maintained on the OBO Foundry. PHIPO provides terms with logical definitions that are already composed from other ontology terms (pre-composition). As of February 2021, PHIPO contains 1,109 terms, of which 1,035 have text definitions, and 438 have logical definitions. PHIPO is freely available under the Creative Commons Attribution 3.0 Unported license (CC BY 3.0), and is hosted both on GitHub and the OBO Foundry.
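
The term counts reported for February 2020 and February 2021 can be turned into definition coverage figures. The Python snippet below uses only the numbers stated in the two reports; the notion of 'coverage' as a percentage of all terms is our own summary.

```python
# (total terms, terms with text definitions, terms with logical definitions)
PHIPO_COUNTS = {
    "February 2020": (1068, 993, 447),
    "February 2021": (1109, 1035, 438),
}


def coverage(total: int, text_defs: int, logical_defs: int) -> tuple:
    """Percentage of terms with text and with logical definitions, to one decimal place."""
    return (round(100 * text_defs / total, 1), round(100 * logical_defs / total, 1))


if __name__ == "__main__":
    for snapshot, counts in PHIPO_COUNTS.items():
        text_pct, logical_pct = coverage(*counts)
        print(f"{snapshot}: {text_pct}% text definitions, {logical_pct}% logical definitions")
```

On these figures, text definition coverage stayed at about 93% while the ontology grew by 41 terms, and the number of logically defined terms fell slightly from 447 to 438.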

Relating PHI-base phenotypes to PHIPO phenotypes
PHI-base contains thousands of records annotated with a set of nine high-level phenotype terms (Urban et al. (2015) Frontiers in Plant Science 6, 605. doi:10.3389/fpls.2015.00605). To enable us to continue to summarise the data using these classifiers, and to map legacy data to the new system, it was necessary to establish a correspondence between the legacy high-level terms and the new terms contained in PHIPO. Three methods were used to achieve this.
Terms describing chemical resistance and sensitivity have been directly mapped to new terms in the single species branch of PHIPO, specifically 'increased resistance to chemical' (PHIPO:0000022) and 'increased sensitivity to chemical' (PHIPO:0000021). These phenotypes contain subclasses describing resistance or sensitivity to a particular chemical entity (logically defined using the ChEBI ontology), for example 'resistance to ampicillin'.
Terms describing changes in pathogenicity, virulence, and mutualism are now captured using an annotation extension in PHI-Canto that applies to all terms in the 'pathogen-host interaction phenotype' branch. This allows an interaction to be annotated with a phenotypic outcome (called 'infective ability' in PHI-Canto), in addition to an observed phenotype. For example, an observed phenotype 'decreased extent of pathogen-associated host lesions' can be annotated with the extension 'reduced virulence'.
Terms describing effector phenotypes are now annotated using GO terms: first, the relevant pathogen gene is annotated with its molecular function, then the molecular function is related to the biological process 'effector-mediated modulation of host process by symbiont' (GO:0140418). In cases where the molecular function is unknown, the pathogen gene is annotated with the biological process directly.
Following this new process, it is possible to capture information on the phenotypes observed when an effector is involved in a host interaction; this was not possible with the curation method previously used for PHI-base.
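The three mapping routes described above can be sketched as a small lookup. This is a minimal illustration only, not the PHI-base migration code: the spellings of the legacy terms, the `MappedTerm` structure, the route labels and the function name are all assumptions made for the example. Only the two PHIPO identifiers and the GO identifier are taken from the text, and the effector route is simplified to the case where the biological process is annotated directly.

```python
from dataclasses import dataclass
from typing import Optional

# Identifiers quoted in the text.
PHIPO_INCREASED_RESISTANCE = "PHIPO:0000022"   # 'increased resistance to chemical'
PHIPO_INCREASED_SENSITIVITY = "PHIPO:0000021"  # 'increased sensitivity to chemical'
GO_EFFECTOR_PROCESS = "GO:0140418"  # 'effector-mediated modulation of host process by symbiont'

# Assumed spellings of legacy high-level terms, grouped by mapping route.
DIRECT_PHIPO_TERMS = {
    "resistance to chemical": PHIPO_INCREASED_RESISTANCE,
    "sensitivity to chemical": PHIPO_INCREASED_SENSITIVITY,
}
INFECTIVE_ABILITY_TERMS = {
    "loss of pathogenicity",
    "reduced virulence",
    "unaffected pathogenicity",
    "increased virulence",
}
EFFECTOR_TERMS = {"effector"}


@dataclass(frozen=True)
class MappedTerm:
    """Hypothetical result record: which route applies and what it yields."""
    route: str                       # "phipo_term", "annotation_extension" or "go_annotation"
    term_id: Optional[str] = None    # ontology identifier, when the route produces one
    extension: Optional[str] = None  # 'infective ability' extension value, when applicable


def map_legacy_term(legacy_term: str) -> MappedTerm:
    """Pick the mapping route for one legacy high-level phenotype term."""
    term = legacy_term.strip().lower()
    if term in DIRECT_PHIPO_TERMS:
        # Route 1: direct mapping to a single-species PHIPO term.
        return MappedTerm(route="phipo_term", term_id=DIRECT_PHIPO_TERMS[term])
    if term in INFECTIVE_ABILITY_TERMS:
        # Route 2: recorded as an extension on an observed interaction phenotype.
        return MappedTerm(route="annotation_extension", extension=term)
    if term in EFFECTOR_TERMS:
        # Route 3 (simplified): annotate the GO biological process directly.
        return MappedTerm(route="go_annotation", term_id=GO_EFFECTOR_PROCESS)
    raise KeyError(f"no mapping defined for legacy term: {legacy_term!r}")


if __name__ == "__main__":
    for legacy in ("Resistance to chemical", "reduced virulence", "effector"):
        print(legacy, "->", map_legacy_term(legacy))
```

Keeping the route explicit in the result makes it straightforward to summarise legacy and newly curated records with the same high-level classifiers.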

Publications selected for curation
A significant curation effort is now in progress by Rothamsted Research and the University of Cambridge, and we have partially or fully curated 34 publications, generating 846 annotations (as of January 2021). The papers selected for this effort cover a range of biological areas of interest to pathogen-host interactions: early-acting pathogen virulence proteins, receptor decoys, R-Avr interactions, secondary metabolite clusters required for pathogen virulence, first host targets of pathogen effectors, fungal toxins, bacteria-human and fungal-human interactions, and antifungal targets.

KnetMiner updates
Meetings have been held between the KnetMiner and PHI-base teams to determine how to model the PHI-base data and PHIPO ontologies as an RDF and Linked Property Graph. A prototype parser for PHI-Canto data has been developed to model and integrate the data into KnetMiner. We have started the development of fungal knowledge graphs for key species including Fusarium culmorum, Fusarium graminearum and Zymoseptoria tritici.

We have begun improving the KnetMiner API endpoints and wrapper scripts (https://github.com/josephhearnshaw/genelist-api), to demonstrate a use case for programmatic access to analyse a large gene dataset, e.g. differentially expressed genes from an RNA-Seq experiment.
We started the development of a major new platform (KnetSpace), which is tightly coupled with KnetMiner. KnetSpace will allow scientists to store and collaborate on the curation of knowledge networks. Users will be able to share their knowledge graph data with other research groups, encouraging collaboration and thus increasing the potential outreach of PHI-base and PHI-Canto data. KnetSpace follows the FAIR principles by making user networks easily findable and accessible, and by enabling networks to be reused and to interoperate with other web applications (such as PHI-base) via APIs. We aim to make KnetSpace beneficial to PHI-base and PHI-Canto data curators by facilitating their curation effort through simple access to a rich set of auto-generated annotations and links to relevant literature.


PHI-base data releases: The latest release of PHI-base, version 4.12, was on 2 September 2021 (http://www.phi-base.org/releaseNote.htm). Compared to version 4.7, released on 27 May 2019 (i.e. before the start of this funding), the data increased as follows: genes from 6304 to 8411 (33%), interactions from 12,467 to 18,190 (46%), pathogenic species from 266 to 279 (5%), host species from 199 to 228 (15%), diseases from 490 to 533 (9%), and references (publications) from 3216 to 4387 (36%). PHI-base phenotyping data has been freely available at FungiDB since 2019. UniProtKB has linked to PHI-base records since 2020.
PHI-base gene-centric display: PHI-base, in collaboration with Molecular Connections, developed a new version of the PHI-base website (available at www.phi5.phi-base.org) that aggregates all data in PHI-base by its related gene, i.e. one page per gene per species. Each gene is assigned a stable and unique identifier that is cross-referenced with identifiers from other genetic databases. Currently, the new interface only displays data curated with PHI-Canto; the remaining data in PHI-base will be migrated in Q1 2022. We extended the search interface of the new website to allow querying the new data types added by PHI-Canto curation.
PHI-Canto: PHI-base, in collaboration with PomBase, developed PHI-Canto: a free and open source online biocuration tool that allows the research community to curate and annotate pathogen-host literature with terms from biological ontologies. PHI-Canto is derived from Canto, the community annotation tool developed by PomBase. We extended Canto with the following features: support for any gene with an accession in UniProtKB; the ability to curate and annotate pathogen genotypes, host genotypes, and pathogen-host interactions involving mutant pathogens and mutant or wild-type hosts; support for annotating tissue type, disease caused, relation to wild-type controls, and outcomes of gene-for-gene interactions; and the ability to specify one or more strains for any species, either from a controlled list (which includes strain synonyms) or as free text. We documented these features in an online manual included with the software. So far, 36 publications have been fully or partially curated in PHI-Canto by professional curators.
PHI-base ontology development: We developed the Pathogen-Host Interaction Phenotype Ontology (PHIPO) to enable annotation of the outcomes of pathogen-host interactions across multiple species, and single species phenotypes for pathogens and hosts. We integrated PHIPO with the OBO Foundry and the Unified Phenotype Ontology (uPheno): the latter aligns our ontology semantics with other phenotype ontologies. As of 14 July 2021, PHIPO contains 920 terms, of which 536 (58%) have logical equivalence and interoperability with other ontologies, following a standard format defined by uPheno. We also developed supplementary ontologies to enable annotation of experimental conditions and diseases most relevant to pathogen-host experiments.
Identification of relevant anti-infective literature: Molecular Connections developed a bespoke text mining approach written in Java to identify candidate peer-reviewed publications containing information on 150 commercial and/or experimental antifungal and antimicrobial chemistries. This literature (approx. 3000 articles published between 1975 and 2020) is now available for further triaging and community curation using PHIPO ontology terms in PHI-Canto.
KnetMiner development: KnetMiner released version 5.0 of the KnetMiner software, developed a new knowledge graph for Fusarium culmorum and updated existing knowledge graphs for Fusarium graminearum and Zymoseptoria tritici. The knowledge graphs now integrate protein-phenotype relations from PHI-base. In addition, a COVID-19 KnetMiner knowledge graph was developed with endpoints for data access (https://f1000research.com/articles/10-703). KnetMiner was promoted to a full ELIXIR-UK service.
Ensembl Genomes developments: We have made 3-4 Ensembl microbial releases every year and now host 1505 fungal, 237 protist and 31,332 prokaryotic genomes. Our fungal genome coverage has increased significantly (+491 genomes) due to a fresh import of data from the public archives and 15 genomes originating from VEuPathDB's fungal database, FungiDB. Our bacterial resources have adopted UniProt's redundancy definitions, resulting in the removal of over 12,000 genomes whilst maintaining coverage across 527 bacterial families. Where appropriate, genomes have been annotated with interaction phenotype data from PHI-base using an improved pipeline that matches proteins in Ensembl by both name and sequence similarity, and we have used the Ontology Lookup Service to standardise the queries and vocabulary of our data. In support of objective 4, we have developed a new extensible data schema that captures multiple types of interactors (genes, proteins, synthetic molecules) and can create links across species. This will enable researchers to browse a fungal effector gene in Ensembl Fungi and navigate to its first host target in Ensembl Plants (and vice versa). A new Python pipeline has been written to verify and capture curated protein-protein interaction data from PHI-base into this model. Fungal and protist protein-coding genes have continued to be used to infer evolutionary trees by determining orthology and paralogy, and several species have pairwise whole genome alignments, allowing information from well-studied pathogens to help elucidate mechanisms and processes occurring in novel or understudied pathogens. Key bacterial species have continued to be embedded in Ensembl's pan-taxonomic compara.
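To make the idea of the cross-species interactor schema described above more concrete, the sketch below shows one way such a model could be expressed. It is purely illustrative and is not the Ensembl schema: every class, field and identifier here is hypothetical, and the example identifiers are placeholders rather than real accessions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InteractorType(Enum):
    """The interactor kinds named in the text."""
    GENE = "gene"
    PROTEIN = "protein"
    SYNTHETIC_MOLECULE = "synthetic molecule"


@dataclass(frozen=True)
class Interactor:
    """One participant in an interaction (hypothetical structure)."""
    identifier: str                 # placeholder ID, not a real accession
    interactor_type: InteractorType
    species: Optional[str] = None   # None for synthetic molecules


@dataclass(frozen=True)
class Interaction:
    """A curated link between two interactors, possibly in different species."""
    source: Interactor
    target: Interactor
    source_db: str                  # where the curated evidence came from

    def is_cross_species(self) -> bool:
        """True when both interactors have a species and the species differ."""
        a, b = self.source.species, self.target.species
        return a is not None and b is not None and a != b


# Placeholder example: a fungal effector protein and its first host target.
effector = Interactor("EFFECTOR_1", InteractorType.PROTEIN, "Fusarium graminearum")
host_target = Interactor("HOST_TARGET_1", InteractorType.PROTEIN, "Triticum aestivum")
link = Interaction(effector, host_target, source_db="PHI-base")

if __name__ == "__main__":
    print(link.is_cross_species())
```

Treating the interactor type as an enumeration keeps the model extensible: a new kind of interactor can be added without changing how interactions are stored or traversed.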
UK and international resource usage trends: Unique visitors to Ensembl microbial portals, PHI-base and KnetMiner continue to increase. For 2021, the total usage figures were: Ensembl microbial portals >97,000 (protists 12%, fungi 35%, bacteria 53%); PHI-base >22,000 and >900 full database downloads; KnetMiner >7,000 unique website users and >11,000 API users. Total users >137,000.
Project publications: We now have a total of six peer-reviewed publications: one on community curation (Pedro, 2019), one on using PHI-base data in multi-pathogen species network analyses (Janowska-Sejda, 2019), two PHI-base database updates (Urban, 2020, 2022) and two Ensembl Genomes non-vertebrate updates (Howe, 2020; Yates, 2022). In particular, the two 2020 NAR articles are being well cited: Howe (265 citations, Google Scholar (GS); 167 citations, Web of Science (WoS)) and Urban (88 citations, GS; 91 citations, WoS).

Janowska-Sejda et al. (2019) Front. Microbiol. 10, 2721; Pedro et al. (2019) Front. Microbiol. 10, 2477; Howe et al. (2020) Nucleic Acids Research 48, D689-D695; Urban et al. (2020) Nucleic Acids Research 48, D613-D620; Urban et al. (2021) "PHI-base in 2022: a multi-species phenotype database for Pathogen-Host Interactions." Nucleic Acids Research database issue, doi: 10.1093/nar/gkab1037; Yates et al. "Ensembl Genomes 2022: an expanding genome resource for non-vertebrates." Nucleic Acids Research database issue, doi: 10.1093/nar/gkab1007.

2023 entry

PHI-base data releases: The latest release of PHI-base, version 4.14, was on 1 November 2022 (http://www.phi-base.org/releaseNote.htm). Compared to version 4.7, released on 27 May 2019 (i.e. before the start of this funding), the data increased as follows: genes from 6304 to 8993 (43%), interactions from 12,467 to 19,881 (59%), pathogenic species from 266 to 283 (6%), host species from 199 to 234 (18%), diseases from 490 to 542 (11%), and references (publications) from 3216 to 4847 (51%). PHI-base phenotyping data has been freely available at FungiDB since 2019. UniProtKB has linked to PHI-base records since 2020.
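The percentages in the version 4.14 comparison above are increases relative to the version 4.7 baseline, i.e. (new - old) / old. The short sketch below reproduces them from the counts given in the text; the function and variable names are illustrative only.

```python
def percent_increase(old: int, new: int) -> int:
    """Return the increase from old to new as a whole-number percentage of old."""
    return round(100 * (new - old) / old)


# (version 4.7 count, version 4.14 count), as quoted in the text.
COUNTS = {
    "genes": (6304, 8993),
    "interactions": (12467, 19881),
    "pathogenic species": (266, 283),
    "host species": (199, 234),
    "diseases": (490, 542),
    "references": (3216, 4847),
}

if __name__ == "__main__":
    for name, (old, new) in COUNTS.items():
        print(f"{name}: {old} -> {new} (+{percent_increase(old, new)}%)")
```

Using the baseline as the denominator keeps the figures comparable from one release to the next.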
PHI-base gene-centric display: PHI-base, in collaboration with Molecular Connections, developed a new version of the PHI-base website (available at www.phi5.phi-base.org) that aggregates all data in PHI-base by its related gene, i.e. one page per gene per species. Each gene is assigned a stable and unique identifier that is cross-referenced with identifiers from other genetic databases. Currently, the new interface only displays data curated with PHI-Canto; data from previous versions of PHI-base are still to be migrated (discussed below). We extended the search interface of the new website to allow querying the new data types added by PHI-Canto curation.
PHI-base 4 data migration: We are developing a unified data import format for the new version of the PHI-base database that can merge data exported from PHI-Canto with data from a previous version of PHI-base (version 4). Of the 18,984 records in PHI-base 4.13, 15,483 records (82%) were found to be compatible with this new import format. So far, 16,556 phenotype annotations have been extracted from these records using an automated data pipeline. A small subset of this data has been confirmed to successfully load into the new PHI-base database and can be displayed on the new PHI-base website. Still to be migrated are annotations for disease names, Gene Ontology terms, and protein-protein interactions.
PHI-base data cleaning: Starting with PHI-base version 4.9, the lists of pathogen and host strains in PHI-base have been manually reviewed to standardise nomenclature, remove redundancy, and correct curation errors. From version 4.9 to version 4.14, 965 pathogen strains and 1,500 host strains were amended. Host strains use the nomenclature of model organism databases wherever possible (e.g. MGI, WormBase, and TAIR). Cross-references were added to Expasy Cellosaurus and the Brenda Tissue Ontology for host cell lines. Disease names in PHI-base were also manually reviewed, starting with 568 names in version 4.10, of which 506 were amended, up to a total of 633 disease names amended as of version 4.14.
PHI-Canto: PHI-base, in collaboration with PomBase, developed PHI-Canto: a free and open source online biocuration tool that allows the research community to curate and annotate pathogen-host literature with terms from biological ontologies. PHI-Canto is derived from Canto, the community annotation tool developed by PomBase. We extended Canto with the following features: support for any gene with an accession in UniProtKB; the ability to curate and annotate pathogen genotypes, host genotypes, and pathogen-host interactions involving mutant pathogens and mutant or wild-type hosts; support for annotating tissue type, disease caused, relation to wild-type controls, and outcomes of gene-for-gene interactions; and the ability to specify one or more strains for any species, either from a controlled list (which includes strain synonyms) or as free text. We documented these features in an online manual included with the software. As of 7 March 2023, 36 publications have been fully curated and approved in PHI-Canto by professional curators. Eight subtitled videos have been made covering the different aspects of the PHI-Canto curation process, accompanied by three subtitled PHI-base videos explaining how to search the database and retrieve information for inclusion in other types of analyses. All 11 videos are available on YouTube.
PHI-base ontology development: We developed the Pathogen-Host Interaction Phenotype Ontology (PHIPO) to enable annotation of the outcomes of pathogen-host interactions across multiple species, and single species phenotypes for pathogens and hosts. We integrated PHIPO with the OBO Foundry and the Unified Phenotype Ontology (uPheno): the latter aligns our ontology semantics with other phenotype ontologies. As of 7 March 2023, PHIPO contains 952 terms, of which 532 (56%) have logical equivalence and interoperability with other ontologies, following a standard format defined by uPheno. We also developed supplementary ontologies to enable annotation of experimental conditions and diseases most relevant to pathogen-host experiments.
Identification of relevant anti-infective literature: Molecular Connections developed a bespoke text mining approach written in Java to identify candidate peer-reviewed publications containing information on 150 commercial and/or experimental antifungal and antimicrobial chemistries. This literature (approx. 3000 articles published between 1975 and 2020) is now available for further triaging and community curation using PHIPO ontology terms in PHI-Canto.
KnetMiner development: KnetMiner released version 5.0 of the KnetMiner software and developed a new combined knowledge graph for nine ascomycete fungi, which are either pathogenic species of global agricultural or medical importance or key non-pathogenic model species: Aspergillus fumigatus, Aspergillus nidulans, Candida albicans, Fusarium culmorum, Fusarium graminearum, Magnaporthe oryzae, Neurospora crassa, Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Zymoseptoria tritici. This knowledge graph now also integrates protein-phenotype relations from PHI-base. In addition, a COVID-19 KnetMiner knowledge graph was developed with endpoints for data access (https://f1000research.com/articles/10-703). KnetMiner was promoted to a full ELIXIR-UK service.
Ensembl Genomes developments: We have made 3-4 Ensembl microbial releases every year and now host 1505 fungal, 237 protist and 31,332 prokaryotic genomes. Our fungal genome coverage has increased significantly (+491 genomes) due to a fresh import of data from the public archives and 15 genomes originating from VEuPathDB's fungal database, FungiDB. Our bacterial resources have adopted UniProt's redundancy definitions, resulting in the removal of over 12,000 genomes whilst maintaining coverage across 527 bacterial families. Where appropriate, genomes have been annotated with interaction phenotype data from PHI-base using an improved pipeline that matches proteins in Ensembl by both name and sequence similarity, and we have used the Ontology Lookup Service to standardise the queries and vocabulary of our data. In support of objective 4, we have developed a new extensible data schema that captures multiple types of interactors (genes, proteins, synthetic molecules) and can create links across species. This will enable researchers to browse a fungal effector gene in Ensembl Fungi and navigate to its first host target in Ensembl Plants (and vice versa). A new Python pipeline has been written to verify and capture curated protein-protein interaction data from PHI-base into this model. Fungal and protist protein-coding genes have continued to be used to infer evolutionary trees by determining orthology and paralogy, and several species have pairwise whole genome alignments, allowing information from well-studied pathogens to help elucidate mechanisms and processes occurring in novel or understudied pathogens. Key bacterial species have continued to be embedded in Ensembl's pan-taxonomic compara.
UK and international resource usage trends: Unique visitors to Ensembl microbial portals, PHI-base and KnetMiner continue to increase. For 2022, the total usage figures were: Ensembl microbial portals >87,900 (47.45% bacteria, 38.15% fungi, 14.39% protists); PHI-base >25,800 and >1,007 full database downloads; KnetMiner >6,000 unique users and >400 registered API users. Total users >119,700.
Project publications: We now have a total of six peer-reviewed publications: one on community curation (Pedro, 2019), one on using PHI-base data in multi-pathogen species network analyses (Janowska-Sejda, 2019), two PHI-base database updates (Urban, 2020, 2022) and two Ensembl Genomes non-vertebrate updates (Howe, 2020; Yates, 2022). In particular, the two 2020 NAR articles are being well cited: Howe (265 citations, Google Scholar (GS); 167 citations, Web of Science (WoS)) and Urban (88 citations, GS; 91 citations, WoS). A seventh manuscript, on the PHI-Canto author curation tool and the Pathogen-Host Interaction Phenotype Ontology (PHIPO), has been posted on bioRxiv while under review at eLife (Cuzick et al., 2022): https://biorxiv.org/cgi/content/short/2022.12.15.520601v1.

Janowska-Sejda et al. (2019) Front. Microbiol. 10, 2721; Pedro et al. (2019) Front. Microbiol. 10, 2477; Howe et al. (2020) Nucleic Acids Research 48, D689-D695; Urban et al. (2020) Nucleic Acids Research 48, D613-D620; Urban et al. (2021) "PHI-base in 2022: a multi-species phenotype database for Pathogen-Host Interactions." Nucleic Acids Research database issue, doi: 10.1093/nar/gkab1037; Yates et al. "Ensembl Genomes 2022: an expanding genome resource for non-vertebrates." Nucleic Acids Research database issue, doi: 10.1093/nar/gkab1007.

A nine-month no-cost extension was awarded to this project, covering the period 1st July 2022 - 31st March 2023, to catch up on several planned activities that were either delayed or suspended due to Covid-19.
Exploitation Route In October 2019, PHI-base successfully applied to remain a gold standard provider of Agrigenomics data within the UK node of the ELIXIR 'Data for Life' project.

In 2020, KnetMiner successfully applied to be a provider of Agrigenomics and genomics data within the UK node of the ELIXIR 'Data for Life' project.

In 2021 we engaged with various scientific journals to determine whether they would become early adopters of the PHI-Canto author curation tool. We are now working with the journal New Phytologist to capture at source all the information from newly published articles that are within the PHI-base scope.
Sectors Agriculture, Food and Drink,Chemicals,Healthcare,Pharmaceuticals and Medical Biotechnology

URL http://www.PHI-base.org
 
Description This database provides information to help SMEs and larger companies develop diagnostics for specific genes of interest in key pathogenic species and to explore new intervention targets. The same data can be used in the public sector to develop diagnostic markers for key pathogens and key alleles. The agrochemical industry, and in particular the Fungicide Resistance Action Committee (FRAC), has become considerably more interested in this database and in how it can be further developed to support their R&D activities. The database has been used in the Smart Crop Protection (SCP) strategic programme (BBS/OS/CP/000001) funded through the Biotechnology and Biological Sciences Research Council's Industrial Strategy Challenge Fund. The curation of first host targets of pathogen effectors into PHI-base and then Ensembl Plants is useful for plant breeding companies seeking to remove or modify susceptibility alleles in breeding programmes.
First Year Of Impact 2020
Sector Agriculture, Food and Drink,Chemicals,Digital/Communication/Information Technologies (including Software),Environment,Healthcare
Impact Types Economic

 
Description Jade Smith - Investigating fungal pathogen effector localisation within plant cells - SWBioDTP 2023-2027
Amount £130,000 (GBP)
Funding ID 229139594 SWBio DTP Rothamsted studentship - University of Bath 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 09/2023 
End 09/2027
 
Description SWBio-DTP - Erika Kroll - Fusarium disease of wheat - exploring tissue specific host-pathogen interactions using a systems biology approach.
Amount £120,000 (GBP)
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 09/2020 
End 09/2024
 
Description UKRI/BBSRC-NSF/BIO Determining the Roles of Fusarium Effector Proteases in Plant Pathogenesis
Amount £813,377 (GBP)
Funding ID BB/X012131/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 02/2023 
End 01/2027
 
Title PHI-NETS 
Description Protein-Protein interaction Networks have been developed and published for 13 pathogenic ascomycete species that infect cereal and non-cereal species 
Type Of Material Model of mechanisms or symptoms - non-mammalian in vivo 
Year Produced 2019 
Provided To Others? Yes  
Impact The new PHI-NET resource permits, for the first time, comparative genomics analyses between multiple ascomycete fungal species using network analyses. 
URL http://www.phi-base.org/
 
Title COVID-19 KG 
Description First release of COVID-19 Knowledge Graph for KnetMiner. Available in OXL, RDF and Neo4j format. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact Increased citations and collaboration requests for KnetMiner. 
URL https://f1000research.com/articles/10-703
 
Title PHI-base successfully reapplied to be a member of the UK node of the Europe wide Elixir, Data for Life project 
Description PHI-base (www.phi-base.org) is a knowledge database accessed by researchers in over 125 countries. PHI-base contains expertly curated molecular and biological information on genes proven to affect the outcome of pathogen-host interactions reported in peer-reviewed research articles. Genes not affecting the disease interaction phenotype are also curated. PHI-base data is linked to the genome browsers and advanced query tools in Ensembl and FungiDB. The data content provided comes from >3400 manually curated references. PHI-base makes a data release twice a year, in May and September. The current release, version 4.8, was made on 16th September 2019 and contains 6,780 genes with 13,801 interactions described using nine high-level PHI-base phenotypic categories (Urban et al. (2015) Frontiers in Plant Science 6, 605. doi:10.3389/fpls.2015.00605). This data has come from 268 pathogen species (bacteria, fungi and protists) and 210 host species (plant, animal, others) and has been manually biocurated from 3,454 peer-reviewed publications. Typically we curate 400-440 publications each year. The PHI-base resource has been cited over 330 times, with 59 citations appearing in peer-reviewed articles in 2019. These citations are all listed in the about section of the database. Direct targets of pathogen effector proteins are also included. Recently the PHI-base team, in collaboration with the PomBase team based at the University of Cambridge, developed an online author curation tool called PHI-Canto, which is under beta testing amongst the UK community. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact PHI-base has since 2016 provided gold-standard Agrigenomics information on plant pathogens and their hosts to the UK node of the ELIXIR 'Data for Life' project. In 2019 we successfully re-applied for PHI-base to remain a member of the UK node of ELIXIR. Over 330 peer-reviewed publications have cited their use of PHI-base, citing one or more of the PHI-base references. Fifty-nine PHI-base citations appeared in peer-reviewed articles in 2019. 
URL http://www.PHI-base.org
 
Title Pathogen-Host Interactions database 
Description Pathogen-Host Interactions database 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact New release - (version 4.8) 
URL http://www.phi-base.org
 
Description Fusarium graminearum effector characterisation using global pangenome analyses 
Organisation University of Bath
Department Department of Biology and Biochemistry
Country United Kingdom 
Sector Academic/University 
PI Contribution We designed this fungal effector PhD project based on soon-to-be-published Fusarium graminearum pangenome analyses arising from an ongoing, now unfunded, collaboration with EMBRAPA in Brazil. This included detailed bioinformatics analyses and wet biology verification of the in planta destination of small secreted candidate effector proteins.
Collaborator Contribution The two partners at the University of Bath will provide specialist transcriptome analyses (Dr Neil Brown) and genome analyses (Hans-Wilhelm Nützmann) to this PhD project.
Impact Hiring of the four-year PhD student Jade Smith
Start Year 2022
 
Description KnetMiner ELIXIR-UK Service 
Organisation ELIXIR
Department ELIXIR UK
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution We applied for KnetMiner to become a full ELIXIR-UK service.
Collaborator Contribution Following the Scientific Development Group meeting, which was held on 3rd September 2021, we are delighted to tell you that KnetMiner scored 1 - which means the panel considered that this service meets all the ELIXIR-UK criteria and is Infrastructure ready and we are delighted to include it as an ELIXIR-UK service.
Impact Invitation to participate in ELIXIR Implementation Study for Plant Sciences. Invitation to join a EU HORIZON-INFRA grant proposal.
Start Year 2021
 
Description SWBio DTP - PhD student Erika Kroll 
Organisation University of Bath
Department Department of Chemical Engineering
Country United Kingdom 
Sector Academic/University 
PI Contribution The PHI-base databases; unpublished transcriptome data sets on the interaction between the fungal pathogen Fusarium graminearum and wheat plants; PHI-Nets datasets, which report predicted protein-protein interactions for Fusarium graminearum; and a recent review article on Fusarium-cereal plant interactions covering Fusarium genomics, transcriptomics, secondary metabolite gene clusters, virulence genes, effector genomes, cell biologies in different plant species and tissue types, and novel approaches to disease control by targeting Fusarium virulence requirements.
Collaborator Contribution Unpublished transcriptome data sets on the interaction between the fungal pathogen Fusarium graminearum and wheat plants; unpublished single gene deletion mutants in Fusarium graminearum.
Impact During the first rotation project, the PhD student successfully deleted three different predicted secreted proteases. One of these single gene deletion mutants had reduced virulence. This data will feed into another recently funded project being carried out jointly with two US-based groups, for which Rothamsted receives funding directly from the US.
Start Year 2020
 
Title KnetMiner 4.0 
Description KnetMiner 4.0 was released with a new feature to save gene networks to KnetSpace 
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact User experience is now more rewarding. Increased number of users. Increased number of citations. 
URL https://knetminer.com/news/knetminer-4-release.html
 
Title KnetMiner 5.6 
Description Next iteration of KnetMiner with new and enhanced functionality 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact Wheat researchers are now able to search and visualise genomics and literature data more easily. 
URL https://github.com/Rothamsted/knetminer/releases
 
Description FuturumCareers.com: Saving plants from disease, for 11-19 year-olds (July 2019) 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Schools
Results and Impact An article was written for a new publication produced by the London-based company FuturumCareers, which specifically aims to increase the number of school pupils aged 11-19 who become interested in a STEAM subject and go on to select an appropriate university course. The PHI-base team at Rothamsted, in collaboration with the Ensembl team at EMBL-EBI in Cambridge, developed both a comprehensive magazine-style article and a quiz to enhance the understanding of pathogen-plant host interactions and the increased use of big data to explore plant diseases.
Year(s) Of Engagement Activity 2019
URL https://futurumcareers.com/saving-plants-from-disease
 
Description Press release 'hypervirulence genes' in London Metro (Nov 2019) 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact As a result of the initial press release issued by Rothamsted Research, this 2nd press release appeared in the London Metro Magazine, which is distributed and read throughout the London Underground.

This 2nd press release also highlighted the increasing number of PHI-base data entries that have recorded an increase in virulence (hypervirulence) phenotype as a result of a deliberate single gene change, typically a gene deletion or point mutation, in pathogenic species that infect plant, animal or human hosts. Collectively, these results highlight that the pathogenic process is controlled by an increasing diversity of negative feedback loops that can be altered through mutation and result in viable new strains that can cause increased disease symptoms and/or in-host pathogen burden. In this article the effects on human pathogens were particularly highlighted.
Year(s) Of Engagement Activity 2019
URL https://www.metro.news/superbug-is-one-mutation-away-from-killing-us-all/1812832/
 
Description Blog post on FAIR Knowledge Graphs 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A blog article by Marco Brandizi describing the power of standardised and FAIR knowledge graphs under the hood of KnetMiner.
Year(s) Of Engagement Activity 2020
URL https://knetminer.com/cases/the-power-of-standardised-and-fair-knowledge-graphs.html
 
Description Fusarium secretome 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact This was an invited seminar and discussion period with the company Corteva. The scientific audience, based at several of the company's sites in the USA, was primarily focussed on maize and soybean improvement and the threat of Fusarium diseases and mycotoxin contamination. The seminar was entitled 'The secretome of Fusarium: The known and unknown'. The audience included bioinformaticians, plant biotechnologists, molecular plant pathologists and plant breeders.
Year(s) Of Engagement Activity 2021
 
Description Government Open Access article (Jan 2020), page 370, entitled 'Fighting infectious diseases: Protecting the global wheat crop with big data analysis and knowledge networks' 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact This article was written to increase awareness of the importance of crop plant health amongst UK and European politicians, leaders of industry and leaders of third sector organisations. The article focused on the effects of disease-causing pathogens on wheat, the No 1 arable crop in Europe, and on how experimentation is increasingly becoming predictive through the use of big data sets and network analyses. The two BBSRC funded resources highlighted in this online and hard copy article are the Pathogen-Host Interactions database (PHI-base) and the knowledge graphical visualisation tool KnetMiner.
Year(s) Of Engagement Activity 2020
URL https://edition.pagesuite-professional.co.uk/html5/reader/production/default.aspx?pubname=&edid=e7e6...
 
Description Invited guest lecture - University of Bath - Healthy humans, healthy animals, healthy crops, healthy food and healthy natural ecosystems. What's the problem? 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact Invited lecture focussing on the importance of different beneficial and detrimental fungi to society and the global economy. The lecture covered examples from the major crop species, in particular fungi that cause diseases on wheat and maize crops, how fungicide resistance emerges in plants, humans and farmed animals, and how fungi can be used to produce a wide range of high-value metabolites and proteins. The PHI-base database was also explained as a way to keep track of the genes required by fungi to cause disease. This special lecture concluded with a lively debate with the students, many of whom were international MSc students, on how to improve plant health and make crop yields more resilient. We also discussed career pathways in science and the differences and similarities between doing research in a company, for a charity or at an academic institution.
Year(s) Of Engagement Activity 2020
 
Description Invited oral presentation given at Plant Health at the Age of Metagenomics, Scientific Colloquium, Paris, France, September 2019, by Kim Hammond-Kosack 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited oral presentation entitled 'Using a FAIR database and bioinformatics analyses to improve plant, human, animal and ecosystem health' given at 'Plant Health at the Age of Metagenomics', a scientific colloquium organised by the European and Mediterranean Plant Protection Organization and the Euphresco network for phytosanitary research coordination and funding, UNESCO, Paris, France, 26th September 2019.

This scientific colloquium focussed on ways to improve phytosanitation methods and the detection of emerging plant pathogen threats across the 31 different country borders throughout Europe.

Link to the video-recordings - http://bit.ly/350Hm1P
Link to the recorded presentation - https://zenodo.org/record/3471776#.XZY8z0Yzbcs
Year(s) Of Engagement Activity 2019
URL https://zenodo.org/record/3471776#.XZY8z0Yzbcs
 
Description Invited oral presentation given at the 6th Plant Genomics and Gene Editing Conference (May 2019, Rotterdam, The Netherlands) by Kim Hammond-Kosack 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This invited oral presentation and discussion was given to increase awareness of the PHI-base resource within the European plant health community, which is focussed on delivering improvements to either the arable or horticultural sectors.
Year(s) Of Engagement Activity 2019
 
Description Invited talk on the PHI-base database given at the European Fusarium workshop, in Rome, Italy, February 2020 by Martin Urban 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The invited talk was entitled 'The Pathogen-Host Interactions database (PHI-base): a phenotype database for pathogens, hosts and their interactions to enhance global food security and human health'. This talk was given to raise awareness in the international Fusarium community of the very large number of database entries that have arisen from gene function studies in a wide range of Fusarium species infecting various cereal and non-cereal plant species. The talk also highlighted that the data in the PHI-base resource is available and searchable from within the ENSEMBL fungal genomes resource hosted by the EMBL-EBI, Cambridge, as well as the FungiDB resource.
Year(s) Of Engagement Activity 2020
 
Description M Urban conference poster 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Poster 'Harnessing community expertise to determine anti-infective target sites' presented at Resistance '19, Rothamsted UK. 16-18th Sept 2019
Year(s) Of Engagement Activity 2019
 
Description New Scientist Live Excel London - Oct 2022 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact The Fusarium team, in collaboration with the PHI-base, KnetMiner, EMBL-EBI, BSPP and Wellcome Connecting Science teams, devised and presented a large display at New Scientist Live (ExCel London, October 2022), providing posters, hands-on interactive activities, and career advice on the topics of disease and mycotoxin control, DNA extraction, genomics, biocuration, and network analysis. On display table 1 were wheat plants infected with Fusarium head blight, infected and non-infected grains, the chemical structure of the DON mycotoxin, as well as wheat plants infected with the Take-all fungus, petri dishes with fungal cultures and a binocular microscope to aid detailed viewing of infected plant material and/or the fungus. On display table 6 was a newly devised interactive game that allowed the visitor to learn about current disease control strategies and future NextGen options based around genomics, functional genomics and/or effector biology for ten globally important arable crop, horticultural and animal husbandry disease problems. FHB disease of wheat was one of the disease problems that could be selected. In total, 21,500 visitors attended the event (one day for school-age children and two days for the general public), and ~1,000 visitors explored our display, with ~20% staying for 1-2 hours. We had a diverse team consisting of post-docs, PhD students and undergraduates across six nationalities. Included in the display team for all 3 days from this project were Martin Urban and Kim Hammond-Kosack.
Year(s) Of Engagement Activity 2022
URL https://live.newscientist.com/
 
Description New ways to combat infectious diseases, article for Laboratory News 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This article was written to increase awareness of the importance of crop plant health, human health and ecosystem health amongst UK, European and international researchers actively engaged in other research disciplines. The article focuses on the effects of disease-causing pathogens (fungi, protists and bacteria) on different host species and on how experimentation is increasingly becoming predictive through the use of big data sets and network analyses. The two BBSRC funded resources highlighted in this online and hard copy article are the Pathogen-Host Interactions database (PHI-base) and the knowledge graphical visualisation tool KnetMiner.
Year(s) Of Engagement Activity 2020
URL https://www.labnews.co.uk/
 
Description Poster presentation on PHI-base at the 15th European Fungal Genetics conference (Feb 2020, Rome, Italy) given by Martin Urban 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A poster entitled 'PHI-base, a multi-species phenotype database for pathogens, hosts and their interactions to enhance global food security and human health' was presented at this international conference to raise awareness of this resource and to 'sign up' authors to test the newly developed PHI-Canto author curation tool.
Year(s) Of Engagement Activity 2020
URL https://www.ecfg15.org/
 
Description Press release - KnetMiner re-purposed for COVID-19 research 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Taking time off from their own research, the Rothamsted team repurposed a tool they had originally developed to help crop scientists, to provide medical researchers with quick and intuitive access to all documented linkages between genes, medicines, and the virus.
Year(s) Of Engagement Activity 2020
URL https://www.rothamsted.ac.uk/news/rothamsted-answers-white-house-call-coronavirus-data-help
 
Description Press release on the PHI-base update article by Urban et al. in Nucleic Acids Research (Nov 2019) 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This press release highlighted the increasing number of PHI-base entries that have recorded an increase in virulence (hypervirulence) phenotype as a result of a deliberate single gene change, typically a gene deletion or point mutation, in pathogenic species that infect plant, animal or human hosts. Collectively, these results highlight that the pathogenic process is controlled by an increasing diversity of negative feedback loops that can be altered through mutation and result in viable new strains that can cause increased disease symptoms and/or in-host pathogen burden.
Year(s) Of Engagement Activity 2019
URL https://www.rothamsted.ac.uk/news/gene-data-suggests-superbug-threat-underestimated