QTLNetMiner - Mining Gene Networks from QTL Intervals

Lead Research Organisation: University of Edinburgh
Department Name: The Roslin Institute

Abstract

Discovering which genes determine a particular biological trait in crops, animals or humans is a very important goal. There are many applications of such knowledge, including: identifying new biomarkers for animal or human diseases, which can lead to new diagnostics; designing screens for new drugs; and helping to select new varieties of crops or livestock animals with improved productivity or resistance to stresses such as disease. Searching for these candidate genes in a crop or animal genome is, however, like searching for a needle in a haystack, and gathering the evidence that supports the choice of one gene over another is even more daunting. This is because the evidence is scattered among different internet databases and in incompatible forms that are not easily linked together or integrated. One very important class of information used by biologists to begin their search for candidate genes is genetics. Classical genetics methods use studies of populations and families and employ statistical methods to identify the genome segments most likely to influence a trait, known as Quantitative Trait Loci (QTL). The nature of complex traits, however, means that many QTLs may be identified for a particular trait. For example, a recent study in Brassica napus identified 47 QTLs that were relevant for seed yield, and studies in pig have discovered in total more than 400 QTLs related to fatness. For many years, the study of complex traits in crops and livestock animals has been an important adjunct to their improvement through selective breeding. Until recently, QTL mapping has been based on genetic maps constructed using relatively small numbers (hundreds) of genetic markers separated by quite large genetic distances. By linking the genetic maps with newly obtained genome sequence information it is now possible to list the genes that underlie each QTL.
These studies show that typical QTLs in both plants and animals generally encompass quite sizeable parts of the genome, typically several hundred genes. While genetics improves the chances of finding the right gene (or genes), reducing the options from the 22,000 or so found in a typical genome to hundreds of genes for a particular QTL, it is still a daunting and expensive task to evaluate each candidate gene in the laboratory. Furthermore, as is becoming apparent in diseases such as cancer, a complex phenotype may be the consequence of groups of seemingly independent genes interacting through a network of different biological relationships. The software we plan to develop in this project builds on previously funded BBSRC research in which we have developed general methods for integrating different sources of biological information and exploring the relationships among genes and proteins using network-based approaches. Our methods help biologists mine the networks of information and interactions among genes in order to make better-informed judgments about which gene or gene networks are involved in a particular trait. In this project we will further develop the software and adapt our methods to create prototypes of biologist-friendly web sites for four species representing important crop and farm animal species where genetics and QTL data can be combined with other data resources, including the scientific literature. These species have been chosen because of their importance to the BBSRC and national priorities around improving the security of our food and energy (bioenergy) supplies. In particular, we will develop integrated data and network biology resources as web sites for use by the farm animal research community, thus translating the applications of our data integration research into a new area of BBSRC-funded biology.
In addition to developing several novel resources for biologists, we wish to demonstrate that the Ondex data integration platform can be adapted to new areas of biological research in a cost-effective manner.

Technical Summary

The Ondex data integration platform uses graph-based methods to integrate biological data sources based on concepts of semantic mapping. In a novel development of the Ondex system we will create a client-server application framework based on Web Services and a lightweight Ondex user interface and visualisation framework (QTLNetMiner) that will support biologists mining genetics and genomics data resources for the networks of information and interactions that can be used to select functional candidate genes from QTL studies. We will re-use and substantially adapt components from the Ondex data integration framework developed in a BBSRC SABR project to create integrated knowledge bases constructed from genetics, genome sequence, trait and gene function ontologies, protein family, biochemical pathway and enzyme function information, together with Medline abstracts, for four sequenced species of agronomic interest. We have selected the tree species Poplar (used as a model for Willow as a bioenergy crop) and three livestock species (Pig, Cow and Chicken) to demonstrate that our data integration methods and software architecture can be adapted to completely new areas of biology in a cost-effective way. Our objective will be to create four separate novel biologist-friendly web resources as working prototypes that will enable users to search the crop and livestock genetics datasets in three ways: by visualising QTL data in a third-party application (GViewer), by keyword and regular expression, and through inference over the network of relationships in each species-specific knowledge base. The subnetworks retrieved will be displayed using a new and extended web-enabled version of the Ondex client application, which will allow the user to perform visualisation and interactive analysis of the relevant subsection of the knowledge base to support network-based evaluation of the sets of positional candidates derived from genetics studies.
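The search workflow described above (positional candidates from a QTL interval, then a keyword or regular expression search over the network around each candidate) can be illustrated with a minimal sketch. This is not the Ondex or QTLNetMiner implementation: the gene names, coordinates, graph edges and every function name below are hypothetical, and a plain breadth-first search stands in for whatever inference the real knowledge bases perform.

```python
import re
from collections import deque

# Hypothetical toy data; identifiers and coordinates are illustrative only.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 900, 1500),
    "geneC": ("chr1", 4000, 4800),
    "geneD": ("chr2", 200, 700),
}

# Hypothetical knowledge-graph edges linking genes to other concepts.
EDGES = [
    ("geneA", "proteinA"),
    ("proteinA", "pathway:lipid metabolism"),
    ("geneB", "publication:fat deposition study"),
    ("geneC", "pathway:lipid metabolism"),
    ("geneD", "pathway:cell wall biosynthesis"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of node pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def positional_candidates(genes, chrom, start, end):
    """Return genes whose coordinates overlap the QTL interval."""
    return sorted(
        name
        for name, (c, s, e) in genes.items()
        if c == chrom and s <= end and e >= start
    )


def evidence_paths(graph, source, pattern, max_depth=2):
    """Breadth-first search from one gene; keep the shortest path to each
    node whose label matches the regular expression within max_depth."""
    regex = re.compile(pattern, re.IGNORECASE)
    paths = {}
    seen = {source}
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node != source and regex.search(node):
            paths[node] = path
        if len(path) - 1 >= max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return paths


def mine_qtl(genes, graph, qtl, pattern, max_depth=2):
    """Map each positional candidate to its matching evidence paths."""
    chrom, start, end = qtl
    hits = {}
    for gene in positional_candidates(genes, chrom, start, end):
        paths = evidence_paths(graph, gene, pattern, max_depth)
        if paths:
            hits[gene] = paths
    return hits


if __name__ == "__main__":
    graph = build_graph(EDGES)
    for gene, paths in mine_qtl(GENES, graph, ("chr1", 0, 2000), "lipid|fat").items():
        for path in paths.values():
            print(gene, "->", " -> ".join(path[1:]))
```

The depth limit keeps each retrieved subnetwork small enough to inspect visually; a real knowledge base would presumably also distinguish relation types, which this sketch ignores.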

Planned Impact

The impact plan for this short sTRDF project focuses on promoting the potential benefits to crop and animal pre-breeding and genetics research through using the tools we are developing to exploit integrated bioinformatics data resources and network analysis methods in the translation of knowledge about model species (e.g. Arabidopsis, Human, Mouse) to trait resolution underpinning pre-breeding research in crop and animal species. This project will develop working and useful prototypes that could be extended for the benefit of other crop and livestock species. We are thus creating a proof of principle for a general approach and platform bioinformatics technology that could be deployed more broadly, with an associated increase in impact. We also consider that the resources being created by QTLNetMiner will be a showcase for how data-intensive biology can be exploited for the benefit of plant and animal science. The impact of the project in the crop science community will be addressed through engagement by the Rothamsted group with the commercial and academic partners in the BBSRC Sustainable Bioenergy Centre consortium. The Poplar knowledge base will be used and tested by the Willow geneticists and breeders at Rothamsted. Any potentially valuable candidate genes identified with the willow genetics team would be evaluated for suitable intellectual property protection by Rothamsted or in partnership with PBL Technology. The Rawlings group is also involved with other crop science communities (wheat and brassica) and during this project we will be participating in stakeholder events (e.g. the WGIN and OREGIN genetic improvement networks) where we will look to demonstrate the potential impact of the QTLNetMiner tools and resources. We will be using the working prototypes developed in this project to engage with the livestock and veterinary research communities.
The recent incorporation of the Roslin Institute with the Royal (Dick) School of Veterinary Studies provides excellent opportunities to engage with a wider range of animal and veterinary scientists in the private sector. Andy Law will lead this aspect of our impact activities. Roslin scientists have particular sectoral expertise and industry engagement activities, not just with commercial animal breeding companies but with the pharmaceutical sector who use animal models of human diseases in their research programmes. We will work through the Roslin Division of Genetics and Genomics and Business Development Executive to identify where best to focus our outreach activities given the limited resources and duration of this project.

Publications

Hassani-Pak, Keywan (2012) QTLNetMiner - Cow in -

Hassani-Pak, Keywan (2012) QTLNetMiner - Pig in -

Hassani-Pak, Keywan (2012) QTLNetMiner - Chicken in -