DiseaseNetMiner - A novel tool for mining integrated biological networks of host and pathogen interaction

Lead Research Organisation: Rothamsted Research
Department Name: Computational and Analytical Sciences

Abstract

Modern society is increasingly under threat from a plethora of microscopic fungal pathogens, which cause diseases in agricultural and horticultural crops, and in farmed animals. Many of these diseases have a significant, detrimental impact on global and local food security. Furthermore, there is a worrying tendency for pathogens to become more aggressive towards their hosts, to cause more disease (increased virulence) and for the anti-fungal chemicals (called fungicides) that are often used to control fungal pathogen outbreaks to become less effective.

In the recent past (during the genomics era), scientists developed technologies to sequence and assemble all the chromosomes of an organism and predict its gene content (i.e. obtain its complete genome blueprint). Now, in the post-genomic era, next-generation sequencing technologies have led to an explosion of genomic data alongside a wealth of gene expression, protein expression, genetic and biological data, which scientists use to describe pathogen-host interaction phenotypes and disease outcomes. However, for many scientists with expertise in biology, biochemistry or genetics, this 'omics' data explosion is often seen as a burden, 'an infinite data soup of varying qualities' that only those with specialist computing-based interpretation skills (called bioinformatics), but often only minimal specialist biological knowledge, can penetrate. New computer-based tools therefore urgently need to be developed to allow researchers to connect, explore and compare all the large- and small-scale datasets available for pathogenic species. Only once we fully understand how fungal pathogens cause disease, and how host species try to defend themselves, will it be possible to manipulate these processes and mechanisms and devise new ways to reduce disease levels and thereby improve global food security.

In this project, we will develop a novel software tool, called DiseaseNetMiner, which will be user-friendly and usable by many different types of scientists to explore integrated biological networks that can predict processes controlling the disease-causing abilities of fungal pathogens. DiseaseNetMiner will deliver understandable outputs from diverse and complex large-scale data inputs. DiseaseNetMiner will allow researchers without specialist bioinformatics skills to explore and compare this wealth of existing data from multiple species with their own latest cutting-edge results to permit rapid progress and new discoveries. This fundamental tool will effectively connect different data types and then return the results in an accessible, explorable and scalable format that can be easily manipulated, displayed and interrogated. DiseaseNetMiner will create a novel research environment from which new scientific insights and biological discoveries can be made.

The UK research community has been at the very forefront of research and discovery in this field. The initiative we propose will be an exceptionally useful and cost-effective way of ensuring that the leadership shown by the UK research community will continue in the decade ahead. We expect this to yield outcomes with huge impact in our field and beyond, to meet the grand challenges of our age.

Technical Summary

DiseaseNetMiner will provide a user-friendly tool for candidate gene prioritisation and hypothesis generation from large host-pathogen knowledge networks. To ensure efficient and cost-effective delivery, we will make use of the free and open-source Ondex and QTLNetMiner frameworks that we have previously developed at Rothamsted Research. DiseaseNetMiner will include significant, new functional advances and provide a timely and novel tool for the plant and fungal research communities.

Objective 1: Use Ondex to integrate public multi-omics datasets, phenotype information, homology data and functional gene annotations for the key fungal model species S. cerevisiae, N. crassa and A. nidulans, and the pathogenic fungi F. graminearum and Z. tritici. This will deliver a semantically integrated knowledge network (graph data warehouse) of interlinked fungal species, to which we will provide access through the QTLNetMiner web application (FungiNetMiner).

Objective 2: Create a novel, combined pathogen-host network by integrating an existing wheat-Arabidopsis-rice knowledge network with the new fungal network, based on annotations to cross-species ontologies, curated pathogen-host interaction databases, text mining using gene names and disease/phenotype ontologies, and a statistical correlation approach using data from long time-series dual RNA-seq experiments of infected plants. This knowledge network is estimated to have about 750,000 nodes and 4,000,000 edges, including all wheat, Arabidopsis and fungal genes.
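The statistical correlation approach is not specified further here. As a minimal sketch, assuming one expression profile per gene sampled at the same time points for host and pathogen, candidate cross-species edges could be proposed wherever the Pearson correlation between a host profile and a pathogen profile exceeds a chosen threshold. The function names, the 0.9 threshold and the gene identifiers below are hypothetical; a real analysis would also need normalisation and multiple-testing control.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def correlation_edges(host, pathogen, threshold=0.9):
    """Hypothetical helper: return (host_gene, pathogen_gene, r) for |r| >= threshold."""
    edges = []
    for (h, hx), (p, px) in product(host.items(), pathogen.items()):
        r = pearson(hx, px)
        if abs(r) >= threshold:
            edges.append((h, p, round(r, 3)))
    return edges


# Toy profiles over five hypothetical time points (not real data)
host = {"hostA": [1, 2, 3, 4, 5], "hostB": [5, 5, 5, 5, 5]}
pathogen = {"pathX": [2, 4, 6, 8, 10], "pathY": [10, 8, 6, 4, 2]}
print(correlation_edges(host, pathogen))
# [('hostA', 'pathX', 1.0), ('hostA', 'pathY', -1.0)]
```

Keeping negative correlations as well as positive ones reflects the assumption that a host gene switched off as a pathogen gene switches on may be just as informative for a knowledge network as coordinated induction.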

Objective 3: Develop the DiseaseNetMiner web application by extending the QTLNetMiner client-server framework to fully support combined plant-pathogen networks and user-provided SNP/GWAS input data. This will require the development of new graph queries to explore combined plant-pathogen networks and novel tools to visualise SNPs within biological interaction networks and to include SNP consequences in the QTLNetMiner gene prioritisation algorithm.
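The QTLNetMiner gene prioritisation algorithm is not described in this record, so the following is only an illustrative sketch of one way SNP consequences could be folded into an existing ranking. It assumes each gene already has a network-derived score and that SNPs arrive annotated with Sequence Ontology consequence terms. The severity weights, the multiplicative boost and the `alpha` parameter are hypothetical choices, not the project's method.

```python
# Hypothetical severity weights for a few Sequence Ontology consequence terms
CONSEQUENCE_WEIGHT = {
    "stop_gained": 1.0,
    "frameshift_variant": 1.0,
    "missense_variant": 0.5,
    "synonymous_variant": 0.1,
    "intron_variant": 0.05,
}


def rerank(base_scores, snps, alpha=1.0):
    """Boost network scores by the most severe SNP consequence seen in each gene.

    base_scores: gene_id -> score from a knowledge-network search
    snps: iterable of (gene_id, consequence_term) pairs
    alpha: strength of the SNP boost (hypothetical tuning parameter)
    """
    severity = {}
    for gene, term in snps:
        weight = CONSEQUENCE_WEIGHT.get(term, 0.0)  # unknown terms add nothing
        severity[gene] = max(severity.get(gene, 0.0), weight)
    combined = {
        gene: round(score * (1 + alpha * severity.get(gene, 0.0)), 3)
        for gene, score in base_scores.items()
    }
    # Highest combined score first
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


scores = {"G1": 2.0, "G2": 3.0, "G3": 1.0}
variants = [("G1", "stop_gained"), ("G2", "synonymous_variant"), ("G1", "missense_variant")]
print(rerank(scores, variants))
# [('G1', 4.0), ('G2', 3.3), ('G3', 1.0)]
```

A multiplicative boost means a SNP can only promote genes that the network search already links to the query; an additive term would instead let high-impact SNPs surface genes with no network evidence.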

Planned Impact

Modern society is increasingly under threat from a plethora of fungal pathogens, many of which have a significant, detrimental impact on food security. In the post-genomic era, the explosion of 'omics, genetic and phenotypic data is often seen as a burden, 'an infinite data soup of varying qualities' that only those with specialist bioinformatics skills, but often only minimal specialist biological knowledge, can penetrate.

In this 15-month pilot project, we will focus on developing new in silico tools to help scientists in academia, industry, NGOs and/or government departments connect, explore and compare large-scale fungal and plant datasets to identify the generic and species-specific biological processes that control pathogenesis, fungicide resistance and avirulence, while connecting this information to the general growth and development of eukaryotic organisms. The new tool, called DiseaseNetMiner, will allow researchers without specialist bioinformatics skills to explore and compare this wealth of existing data from multiple species with their own latest cutting-edge results, permitting rapid progress, new insights and new discoveries.

DiseaseNetMiner will contain and connect information in a combined network for three well-studied fungal species (S. cerevisiae, N. crassa and A. nidulans) and two economically important fungal pathogens of wheat crops worldwide, namely Fusarium graminearum and Zymoseptoria tritici. This fungal network will be connected to a plant network combining information from Arabidopsis, rice and wheat.

Academics will use DiseaseNetMiner to interrogate and prioritise candidate gene lists for both pathogens by gaining additional annotation from model species, the responses of the host plant and the occurrence of SNPs. Researchers can then test the function of the most promising candidate genes with reverse genetics experiments. Researchers will use the combined plant and pathogen networks to explore the functions of genes and their interactions. Researchers involved in sequencing large numbers of natural or laboratory-generated strains of F. graminearum and Z. tritici with subtly different phenotypes, e.g. increased aggressiveness, can use DiseaseNetMiner for the biological interpretation of genome-wide association studies. This should reveal the domains, hubs and/or proteins in the network most likely to be causally linked to the phenotypic shift.

In the short term, the agrochemical and plant breeding industries will use DiseaseNetMiner to connect publicly available datasets on pathogens, crop plants and model species to their own proprietary datasets. This will give new insights into where product failure (of an agrochemical or a resistant wheat cultivar) may be occurring as populations shift or when individual isolates in a population mutate and rise in abundance. In the longer term, these connected datasets, when viewed over yearly time series, will help to guide the identification of novel target sites for intervention and, therefore, guide innovative agrochemical product development and influence disease resistance breeding strategies, such as identifying the most effective R gene stacks.

NGOs and government departments will use DiseaseNetMiner to investigate and predict the possibility of emerging disease threats in the UK and elsewhere by examining the genomic changes present in newly virulent and/or fungicide-resistant strains, and go on to provide advice to farmers and the agri-industry.

To deliver the above impacts, we plan to develop a community of researchers in the UK and elsewhere to beta-test the tools with their own unpublished data, publicise the existence of DiseaseNetMiner to the wider scientific community at national/international conferences, hold virtual and face-to-face workshops, give seminars to various industries/consortia and undertake various media activities.

Publications

 
Title KnetMiner overview and demo at PAG 2017 
Description Overview of KnetMiner 
Type Of Art Film/Video/Animation 
Year Produced 2017 
Impact Increased visibility of KnetMiner 
URL https://www.youtube.com/watch?v=PvxHTems3pA&feature=youtu.be&a
 
Title Lightdrawings of networks by Hugo Dalton 
Description When science meets art, or new meets old. KnetMiner has inspired the artist Hugo Dalton. He has created lightdrawings of our networks and projected them onto old sculptures to bring them back to life. 
Type Of Art Artwork 
Year Produced 2018 
Impact Our work was displayed for 3 months to visitors of the Fitzwilliam Museum. 
URL https://www.instagram.com/p/BflFDaHlwdv/
 
Title Slideshare Presentation of Knetminer 
Description This is an online presentation introducing the KnetMiner software application. 
Type Of Art Film/Video/Animation 
Year Produced 2017 
Impact It supports presentations to potential collaborators. 
URL https://www.slideshare.net/KeywanHassaniPak/knetminer-overview-oct-2017
 
Title Video Introducing KnetMiner 
Description A short 90-sec clip introducing KnetMiner to the general public. What is it? Who developed it? Who uses it? 
Type Of Art Film/Video/Animation 
Year Produced 2017 
Impact Video inspires general public, students and scientists visiting the KnetMiner website. 
URL https://www.youtube.com/watch?v=4aOv5QXqvLI
 
Description New features and data are now available in KnetMiner for wheat. New features include improved search of gene lists, interactive filtering of search results and improved compatibility with Cytoscape Desktop. New datasets include wheat gene expression studies from EBI and GWAS data from The Triticeae Toolbox. Wheat-specific publications were added to the knowledge graph and their abstracts were scanned for gene-phenotype information. We created knowledge networks and KnetMiner instances for two major disease-causing wheat pathogens, Fusarium and Zymoseptoria, added disease-related RNA-seq studies to wheat KnetMiner and developed RDF and Neo4j versions of the networks. These were demonstrated to the wheat pathogenomics group at Rothamsted. We also developed a new approach to transform tabular data into knowledge networks, which can be used to load a variety of data sources into a form that the DiseaseNetMiner tools can easily mine.
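How the tabular-to-network approach works is not described in this report. The sketch below only illustrates the general idea under stated assumptions: a tab-separated table with one column per entity type, where each row becomes two typed nodes (analogous to Ondex concepts) joined by one typed edge (analogous to an Ondex relation), and repeated values collapse onto a single node. The function name, column names, relation label and sample rows are all hypothetical.

```python
import csv
import io


def table_to_network(text, source_col, target_col, relation, source_type, target_type):
    """Hypothetical converter: each table row -> two typed nodes and one typed edge."""
    nodes = {}  # (node type, name) -> integer id; repeated values reuse one node
    edges = []

    def node_id(kind, name):
        key = (kind, name)
        if key not in nodes:
            nodes[key] = len(nodes)
        return nodes[key]

    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        src = node_id(source_type, row[source_col].strip())
        tgt = node_id(target_type, row[target_col].strip())
        edges.append((src, relation, tgt))
    return nodes, edges


# Made-up gene-phenotype table for illustration only
table = (
    "gene\tphenotype\n"
    "geneA\treduced virulence\n"
    "geneB\treduced virulence\n"
    "geneA\tsporulation defect\n"
)
nodes, edges = table_to_network(table, "gene", "phenotype", "has_phenotype", "Gene", "Phenotype")
print(len(nodes), edges)
# 4 [(0, 'has_phenotype', 1), (2, 'has_phenotype', 1), (0, 'has_phenotype', 3)]
```

Keying nodes on (type, name) rather than name alone keeps, for example, a gene and a phenotype that happen to share a label from being merged.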

Ported and restructured the Ondex code base from SourceForge to GitHub. This will make it much easier to release software more frequently in the future.

Created a first version of a knowledge network for yeast based on the latest data releases from SGD. Yeast is one of our three model species identified as highly relevant to pathogenic fungi.
Exploitation Route We have submitted a BBSRC BBR-GCRF proposal to develop KnetMiner 2.0 for mining pangenomic knowledge networks of crop phenotypes and diseases. We expect two research papers and one software note to be published on our R&D work on KnetMiner and DiseaseNetMiner. Academic partners such as IRRI in the Philippines and INRA in France are interested in connecting KnetMiner and our knowledge networks with their own genetics or genomics resources. In addition to these academic activities, we are in contact with several companies that have shown interest in, or are evaluating, in-house versions of KnetMiner deployed with their own private datasets.
Sectors Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software), Pharmaceuticals and Medical Biotechnology

 
Description KnetMiner has had over 1,500 unique users since April 2017 (+200% compared with the previous year), most coming from the UK (30%), followed by the US (24%), and including users from developing countries such as Malaysia (4%), the Philippines (2%) and India (2%). The average session duration was 5:22 minutes (down 35%, an indication that users can get information faster). We ran a training course for wheat researchers and breeders at the Wheat Bioinformatics Workshop in Bologna and have since started a collaboration with the Tuberosa group to rank genes in wheat QTLs using KnetMiner and our new Neo4j graph database. We were invited to present KnetMiner at PAG2018 and had follow-up interactions with the Quesneville group at INRA to add KnetMiner to the federated WheatIS index. KnetMiner has been integrated into the commercial Genestack platform for use by companies, allowing them to add their own private data and deploy KnetMiner in a secure cloud environment. We embrace openness, and all software and resources developed under this proposal will be freely available to non-profit users in accordance with FAIR principles. All source code will be freely available under a permissive open-source license. However, a business model is also needed to ensure the sustainability and longevity of the resource. For that reason, full downloads of the knowledge graph in different formats can be requested; these will be free for academics, but commercial users, e.g. breeding or agrochemical companies, will be charged for access. This will be a not-for-profit activity, and all revenue generated will be directed to the further development of the resource. We already have one breeding company paying under this model and hope to generate further income that can contribute towards sustaining the platform in the future.
First Year Of Impact 2017
Sector Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Description Bioinformatics to advance wheat breeding
Geographic Reach Europe 
Policy Influence Type Influenced training of practitioners or researchers
Impact Trained researchers to use evidence-based practices in biological decision making.
URL http://www.wheatinitiative.org/events/durum-ewg-workshop-bioinformatics-advance-wheat-breeding
 
Title Wheat Knowledge Network - Release Nov 2017 
Description Integrated database of wheat genome, genotype, phenotype and homology information (Hassani-Pak et al, 2016) 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Impact This database powers the KnetMiner application 
URL http://knetminer.rothamsted.ac.uk/
 
Title Wheat pathogens knowledge network - Release Nov 2017 
Description Knowledge networks of wheat pathogens Fusarium and Zymoseptoria 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Impact Will help researchers to understand wheat diseases 
URL http://knetminer.rothamsted.ac.uk/
 
Description University of Bologna - Tuberosa 
Organisation University of Bologna
Country Italy 
Sector Academic/University 
PI Contribution This collaboration with Roberto Tuberosa's group aimed to rank genes in hundreds of QTL using a Neo4j wheat knowledge network, in order to test the gene ranking methods we have developed and to demonstrate the potential of KnetMiner to this group.
Collaborator Contribution The collaborators provided data and expertise of wheat genetics.
Impact Research is still ongoing; no impact is yet evident
Start Year 2018
 
Description WheatIS Group interaction with INRA Versailles 
Organisation French National Institute of Agricultural Research
Department INRA Versailles
Country France 
Sector Public 
PI Contribution At INRA Versailles, the URGI group is the technical hub of the WheatIS bioinformatics team. We are working with them to better integrate KnetMiner with the developing WheatIS infrastructure.
Collaborator Contribution The URGI group have been contributing expertise and access to software interfaces.
Impact Too early in the collaboration for impact to be recognised
Start Year 2018
 
Title KnetMiner release 1.0 - March 2018 
Description Helps users to analyse biological experiments and put findings into the context of published knowledge. KnetMiner release 1.0 (March 2018) includes the new pathogen species Fusarium graminearum and Zymoseptoria tritici. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact Accelerates gene discovery and plant breeding. 
URL http://knetminer.rothamsted.ac.uk/
 
Title OWL Ontology for Bioknowledge Networks 
Description This ontology is used as part of a suite of tools to represent KnetMiner knowledge networks using RDF. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact Sharing KnetMiner systems through semantic web technology will facilitate access to the data from other tools. 
URL https://github.com/Rothamsted/bioknet-onto
 
Title Ondex to RDF Exporter 
Description Ondex components and applications that are necessary for building genome-scale knowledge networks used in projects like KnetMiner. It includes the Ondex base, CLI, workflow engine and a set of plugins (parsers, mappers, transformers, filters and exporters) that are relevant for building genome-scale knowledge networks 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact It is part of a suite of tools that help KnetMiner networks, including those developed for wheat, to be shared through linked open data methods. 
URL https://github.com/Rothamsted/ondex-knet-builder/tree/master/modules/rdf-export-2
 
Title RDF to Neo4j converter 
Description RDF-Neo4j converter and configuration to load KnetMiner data 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact Exports our biological knowledge networks into formats that can be more easily re-used. 
 
Title Wheat KnetMiner - Release Nov 2017 
Description Wheat KnetMiner - Release Nov 2017. Added disease related RNA-seq studies and wheat GWAS data. 
Type Of Technology Webtool/Application 
Year Produced 2017 
Open Source License? Yes  
Impact Helps wheat researchers in gene discovery and knowledge visualization. 
URL http://knetminer.rothamsted.ac.uk/Triticum_aestivum/
 
Description Blog - The rise of the genomes 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Blog article about the highlights of the Plant and Animal Genomes Conference in San Diego
Year(s) Of Engagement Activity 2018
URL https://www.rothamsted.ac.uk/articles/rise-genomes
 
Description Connecting Crop Phenotype Data Workshop - PAG 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Several groups (INRA France, IPK Germany, Cornell University) approached us to develop programmatic interfaces (APIs) for KnetMiner. This will enable and facilitate tool integration and collaboration.
Year(s) Of Engagement Activity 2017
URL https://pag.confex.com/pag/xxv/webprogram/Session4249.html
 
Description Durum EWG Workshop: Bioinformatics to advance wheat breeding 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A 2-day workshop organised by Roberto Tuberosa and Luigi Cattivelli and attended by 100 wheat breeders, geneticists and researchers to learn about cutting-edge bioinformatics tools and resources available for wheat. The KnetMiner training led to a collaboration with Roberto Tuberosa's lab to identify potential candidate genes in hundreds of wheat QTL using KnetMiner networks and APIs.
Year(s) Of Engagement Activity 2017
URL http://www.wheatinitiative.org/events/durum-ewg-workshop-bioinformatics-advance-wheat-breeding
 
Description Industry Workshop - PAG 2017 (California) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Engaged with several food and agri companies that showed interest in the KnetMiner tools and approaches for knowledge mining
Year(s) Of Engagement Activity 2017
URL https://pag.confex.com/pag/xxv/webprogram/Session4338.html
 
Description Invited talk at the Plant and Animal Genomes Conference, San Diego 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Invited speaker at the largest agri-genomics conference in the world which attracts over 3,000 participants. Presentation sparked follow-up discussions with the International Rice Research Institute (Philippines) and INRA URGI (France).
Year(s) Of Engagement Activity 2018
 
Description KnetMiner training course (EBI 2017 - Introduction to omics data integration) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact 30 scientists from academia and industry were trained to use KnetMiner for the discovery of genes involved in specific diseases. The audience was very engaged and found our approaches very useful, but also requested some new features that would improve KnetMiner.
Year(s) Of Engagement Activity 2017
URL https://www.ebi.ac.uk/training/events/2017/introduction-omics-data-integration
 
Description Press release on KnetMiner software and collaboration with Genestack 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Press release introducing the KnetMiner software developed in the Hassani-Pak lab at Rothamsted and a recent collaboration to make it available as an App in the Genestack bioinformatics platform. News covered by Rothamsted, Genestack, BBSRC, Aafarmer, Farmbusiness and other websites.
Year(s) Of Engagement Activity 2017
URL https://www.rothamsted.ac.uk/news/visualising-data-connections-promises-faster-discoveries
 
Description Revival Exhibition by Hugo Dalton at Fitzwilliam Museum, Cambridge 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Hugo Dalton's lightdrawings were inspired by two major innovations, i.e. Omega3 plants and KnetMiner software, from Rothamsted Research. His light projections were on display at the Fitzwilliam Museum in Cambridge from Nov 2017 - Feb 2018.
Year(s) Of Engagement Activity 2017,2018
URL http://www.fitzmuseum.cam.ac.uk/calendar/whatson/hugo-dalton-revival-lightdrawings
 
Description Software Evaluation and Licensing - 2018 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Presentations were made to an international company, which cannot be named for reasons of business confidentiality, to introduce it to the potential of KnetMiner as a software platform in its crop science research division.
Year(s) Of Engagement Activity 2018
 
Description Software licensing and evaluation - 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact This was a small webinar for a company that does not wish to be named but was interested in licensing our KnetMiner software. Several company scientists joined the call. It led to a small commercial contract that enabled the company to undertake a more extensive in-house evaluation of the software.
Year(s) Of Engagement Activity 2017
 
Description Training for IRRI Collaborator 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This was a small-scale training session over several days for IRRI research staff. The training covered the construction and use of the bioinformatics software tool KnetMiner.
Year(s) Of Engagement Activity 2017