A FAIR community resource for pathogens, hosts and their interactions to enhance global food security and human health

Lead Research Organisation: University of Cambridge
Department Name: Biochemistry

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

PHI-base is the phenotype data source provider. We will continue to curate the literature for ~200 pathogenic species and include emerging problematic species. New advanced curation will include (a) first host plant targets of pathogen effectors, (b) anti-infective targets and variant sequences causing chemical insensitivity, and (c) ~8 specific genome landscape features. We will further develop the multi-species PHI-Canto tool to enable rapid, accurate and comprehensive publication-based author curation. PHI-base data will be made available in emerging data exchange formats (e.g. Phenopackets) to increase interoperability and use. The new PHIPO ontologies that underpin this curation will be built using Protégé and will adhere to the strict ontology development principles outlined by the OBO Foundry.

The PHI-phenotype information will be mapped onto microbial genes in Ensembl Genomes, an established platform that combines a relational database back-end for persistent, non-redundant storage of data with web-based tools, programmatic interfaces (including RESTful APIs) and the ability to export and upload (local or remote) annotation files in standard file formats (e.g. BAM, CRAM, VCF). Genomes are overlaid with variation/transcriptome data along with whole-genome alignments and pan-species comparative relationships, allowing extrapolation of functional annotation, e.g. from well-understood pathogens to under-studied, under-funded pathogens.

To provide a broader context, we will functionally advance the KnetMiner open-source software to integrate the PHI-data and ontologies with biological pathway data (BioCyc) and protein-protein interaction data (BioGRID, IntAct) from eight model organisms to elucidate the cascading processes triggered by pathogen effectors and their first targets in the host. This will allow multi-species, cross-kingdom network visualisation and analysis. We will create biannual releases of the integrated knowledge base in FAIR-compliant RDF and Neo4j graph formats.

Planned Impact

This FAIR community resource is aligned with the BBSRC fundamental and strategic research priorities to achieve sustainable global food security, and improve human and animal health and wellbeing across the life course.
This resource is of immediate benefit to all researchers in the medical, crop plant, animal and model organism biosciences working on diseases caused by fungi, protists and bacteria, and will remove bottlenecks to new discoveries caused by data sets being unavailable, non-integrated and/or incompatible for simple queries/complex analyses. Priority infectious microbes have previously been selected and included according to UK industrial and academic researcher interests. This project will provide standardised annotation, more powerful comparative analyses, and greater data access through interactive interfaces and new tools.
The interpretation of genome-scale molecular biology and phenotyping data is a key component in the development of novel strategies for sustainable disease control in humans, crop plants and farmed animals, and has considerable academic, economic, social and ecological value. Specifically, this FAIR resource will organise genome sequence, genetic variation and phenotypic data and make it widely accessible through a new set of interfaces and new tools to permit genome-wide enquiries, linked to literature-curated pathogenic phenotypes associated with gene mutations.
The driving rationale for the project, as well as its greatest potential for societal impact, lies in two targeted sectors. The first is sustainably increasing the yields of crop plants by assisting the development of strategies for pesticide development and plant breeding. Crucially, this depends on an understanding of gene function (effectors and their targets, and other downstream biological functions dependent on these), which determines the range of possible pesticide targets, the total genetic reservoir available to plant breeders, and possible side effects (in terms of the impact on plant growth, development and overall health). This new resource and the associated new tools will provide access to existing and new knowledge for numerous phytopathogenic species. The second targeted sector is human health and medical interventions to ensure healthy ageing throughout the life course. Understanding pathogen gene function, host targets and downstream biological functions will aid novel drug discoveries, track clinical efficacy and help diagnostic companies follow emerging problematic pathogenic microbes.
The main route to achieving impact will be through raising (academic and commercial) user awareness and use of the resource. Potential beneficiaries include agricultural companies developing pesticides or attempting to breed new varieties of pathogen-resistant plants, and pharmaceutical companies developing new health care products to stop or minimise infectious microbes in the general human population and within hospitals. More generally, farmers and the wider global population will benefit from improved strategies for disease control, although they are not expected to be among the direct users of the database. The PIs at each organisation will engage with society, the media and policy makers to make the case for the importance of research into crop plant and medically important pathogens in the context of rising global concern about food and energy security, human health, farmed animal health and ecosystem resilience, and of the potential benefits of genomics in addressing these concerns.
The five project objectives have been chosen in the light of the above observations. Collectively, the objective is to put the increasing quantities of data being generated back in the hands of researchers in as useful a form as possible, and to allow them to see the full spectrum of experimental results - from the study of an individual mutant phenotype to information about the expression of a gene or its variance in populations - in an integrated fashion.

Publications

 
Description This award has provided novel curation tools and ontologies with which to store, codify, and analyse data relating to the interaction between pathogenic fungi and crop plants, thus establishing a vital resource for plant pathologists in academia and the agrichemical industry. This is an important resource for the maintenance of the nation's food security. Moreover, it has enabled us to make a major input to the international Gene Ontology Consortium's work on the whole range of pathogen-host interactions, thus impacting not just on plant protection, but also human and animal health.
First Year Of Impact 2019
Sector Agriculture, Food and Drink, Chemicals, Environment, Pharmaceuticals and Medical Biotechnology
Impact Types Economic, Policy & public services

 
Description Gene Ontology Multispecies Working Group on Pathogen-Host Interactions
Geographic Reach Multiple continents/international 
Policy Influence Type Contribution to new or improved professional practice
Impact The development of appropriate descriptors of plant-pathogen interactions has improved the ability of plant pathologists to model such interactions, study the genetics of these interactions, and to improve agricultural practice.
 
Title Improvements to PHI-Base 
Description Tools and resources that enabled this important database of pathogen-host interactions in plants to accurately codify, model, and represent those interactions.
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact Developed methods for gene annotation tailored to plants and plant pathogens. Constructed an ontology for representing plant-pathogen-host interactions. 
URL http://www.phi-base.org/
 
Title PHI-Canto 
Description We have developed PHI-Canto, a multi-species community curation tool for the PHI-base database (canto.phi-base.org) (Rutherford et al. (2014) Bioinformatics 30: 1791-1792). PHI-Canto is a web-based tool that enables professional curators and publication authors to curate peer-reviewed papers using terms from biological ontologies. PHI-Canto is an extension of the Canto (curation.pombase.org) software developed by PomBase (pombase.org), the model organism database for fission yeast. Canto required major development to support curation for PHI-base. Firstly, Canto was originally designed for single-species curation, and could therefore only be configured with a set of single-species identifiers. PHI-Canto adds support for UniProt identifiers, which cover gene products from many species. Using UniProt allows PHI-Canto to automatically retrieve information about the gene product required by PHI-base, including its description, sequence, cross-references to other databases, taxonomic lineage, etc. The curator is no longer required to enter any of this information manually, which reduces curation overhead and room for error. Secondly, Canto supported phenotype annotation, but only for a single-species phenotype-genotype combination. PHI-Canto extended this to allow annotations on a pathogen-host interaction genotype, which is modelled as a composition of a pathogen genotype and a host genotype (referred to as a metagenotype). Curators can curate allele variants in either the host or the pathogen, or both, to permit maximum flexibility for each interaction. PHI-Canto phenotype annotation also allows the curation of one or more strains for each species in a curation session. Note that the term 'strain' in PHI-Canto refers to any variation below the species level (including subspecies, pathovars, etc.); this matches the historical use of 'strain' in PHI-base. Changes introduced into a laboratory strain are described in the genotype, as are any background mutations (for example, Ku-70).
Canto contains built-in help pages to assist users in the curation process. Much of this documentation was only applicable to fission yeast curation; we are in the process of adapting and extending the help text to cover pathogen-host curation. We also plan to create tutorial videos to introduce curators to the tool. Initially, these videos will cover the curation process for common areas of research, such as fungicide resistance, early-acting pathogen virulence proteins, and the first host targets of pathogen effectors.
The Gene Ontology (GO) is a major resource that describes the normal functions of genes and gene products. Unfortunately, the pathogen-host branch of the Gene Ontology has been developed in an ad hoc manner over the past two decades. Consequently, a large proportion of the existing terms were either unsuitable for use, described redundantly in another area of the ontology, or were otherwise problematic. These issues made this area of the GO difficult to use in annotation; curators tended to use very general terms which are mostly uninformative about the actual functions or processes studied (e.g. 'response to fungus'). Val Wood (University of Cambridge) has been working with the senior GO editor Pascale Gaudet (Swiss Institute of Bioinformatics) to revise this part of the ontology. As part of this work, over 125 issues (problems) related to pathogen-host GO terms have been logged on the GO ontology tracker, of which 95 are now closed. Many redundant terms have been merged, over 200 terms have been obsoleted, and many new terms have been created to describe normal pathogen biology. This work was critical for two reasons. Firstly, GO is the main ontology used to logically define biological aspects of PHIPO phenotype terms; thus, to describe abnormal phenotypes, we require clear and unambiguous descriptions of normal biology. Secondly, the PHI-base curation process includes the annotation of normal functions, processes, and cellular locations using GO terms (including those which describe experiments not directly related to pathogenicity).
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact This curation tool and the associated ontologies are all publicly available and are used by researchers studying fungal diseases of crop plants in both the agrichemical industry and academia. 
URL https://canto.phi-base.org/
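
The metagenotype described in the PHI-Canto entry above (a pathogen genotype composed with a host genotype, with allele variants allowed on either side) can be pictured with a minimal sketch. This is an illustration only and is not taken from the PHI-Canto code base: the class names, field names, the `varied_sides` helper and the placeholder values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Allele:
    """One allele of a gene (hypothetical structure for illustration)."""
    uniprot_id: str   # placeholder accession, not a real PHI-base record
    description: str  # free-text allele description, e.g. "deletion"


@dataclass(frozen=True)
class Genotype:
    """A single-species genotype: species, strain, alleles and background."""
    species: str
    strain: str                       # any variation below the species level
    alleles: Tuple[Allele, ...] = ()  # curated allele variants, possibly none
    background: str = ""              # background mutations of the lab strain


@dataclass(frozen=True)
class Metagenotype:
    """A pathogen genotype composed with a host genotype."""
    pathogen: Genotype
    host: Genotype

    def varied_sides(self) -> Tuple[str, ...]:
        """Return which side(s) of the interaction carry curated alleles."""
        sides = []
        if self.pathogen.alleles:
            sides.append("pathogen")
        if self.host.alleles:
            sides.append("host")
        return tuple(sides)


# Hypothetical example: a pathogen deletion mutant on an unmodified host.
pathogen = Genotype(
    species="Example pathogen species",
    strain="example strain",
    alleles=(Allele("P00000", "deletion"),),
)
host = Genotype(species="Example host species", strain="example cultivar")
interaction = Metagenotype(pathogen=pathogen, host=host)
```

Keeping the two genotypes as separate objects mirrors the statement that variants may be curated in the host, the pathogen, or both; a phenotype annotation would then attach to the `Metagenotype` as a whole rather than to either species alone.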
 
Title PHIPO Ontology 
Description An ontology to enable pathogen-host interactions to be accurately represented in databases.
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact More accurate representation of detailed pathogen-host interactions. 
URL https://obofoundry.org/ontology/phipo.html