GeneFriends: An RNA-seq co-expression tool for functional annotation and candidate gene prioritization

Lead Research Organisation: University of Liverpool
Department Name: Institute of Integrative Biology

Abstract

Over the past decade, and in part thanks to the sequencing of the human genome, research in genetics has expanded rapidly, leading to the identification of many genes associated with multiple human diseases and traits. Even though science has gained a broader understanding of most human diseases, many questions remain unanswered, and the genetic basis of most common human diseases and most human traits is only partly understood.

This project aims to construct a new online tool to help researchers infer the functions of unknown genes as well as identify new candidate players in diseases and biological processes. This tool will be based on data obtained from state-of-the-art RNA sequencing technology (RNA-seq). This technology allows researchers to measure the activity of genes in a given sample with greater accuracy than previous methods and is capable of measuring genes that do not encode proteins. A growing number of researchers currently employ RNA-seq, yet many genes associated with diseases and processes have unknown functions, which impedes understanding of the mechanisms involved. To address this issue, our proposed tool, entitled GeneFriends, applies a guilt-by-association methodology to the analysis of genes. This approach identifies which genes tend to be activated simultaneously across multiple samples. Based on the idea that genes that are activated at the same time are functionally related, it is possible to predict the functions of previously unstudied genes from the functions of the known genes they are co-activated with. To determine how genes tend to be co-activated, thousands of publicly available samples in which gene levels were previously measured using RNA-seq will be combined. In this way, a map will be generated that describes which genes generally tend to act together.
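The idea of a co-activation map can be illustrated with a minimal sketch. It assumes that co-activation is scored with the Pearson correlation of expression levels across samples; the text above does not state which measure GeneFriends uses, so this choice, the function names and the toy expression values are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.
    Assumes neither profile is constant (otherwise the denominator is zero)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_map(expr):
    """Map each gene to its correlation with every other gene.
    `expr` maps gene name -> list of expression values, one per sample."""
    genes = list(expr)
    return {g: {h: pearson(expr[g], expr[h]) for h in genes if h != g}
            for g in genes}

def top_partners(cmap, query, k=2):
    """Return the k genes most strongly co-expressed with `query`."""
    ranked = sorted(cmap[query].items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]

# Toy data (invented for illustration): 4 genes measured in 5 samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises and falls with geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],  # moves opposite to geneA
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],  # only weakly related to geneA
}
cmap = coexpression_map(expr)
best = top_partners(cmap, "geneA", k=1)
```

Under the guilt-by-association idea, if geneA were unstudied and geneB well annotated, geneB's known functions would be the first hypothesis for geneA. A real map would be built from thousands of samples and would need normalisation steps that this sketch omits.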

GeneFriends will allow scientists and clinicians to infer the function of unknown genes. Additionally, GeneFriends can be used to associate new genes with diseases and processes based on the fact that they are active at the same time as other genes previously identified as important for that disease or process. The tool created in this project will then allow researchers to relate new factors to diseases and identify candidate drug targets, potentially leading to new types of diagnosis and treatment. GeneFriends will initially be created for humans and mice and will be made freely available online for researchers, clinicians and commercial companies to use.

Technical Summary

Sequencing the transcriptome (RNA-seq) is a powerful and emerging technology that allows researchers to measure differential expression of genes more accurately than with microarrays. One major advantage of RNA-seq is that it allows the measurement of different splice variants as well as non-coding RNAs (ncRNAs), which can play important biological roles and be involved in various diseases. However, a bottleneck in RNA-seq analyses is that, even though many transcripts can be found to be differentially expressed, most have often not been well studied, and it is unclear what functions ncRNAs, for example, may have. The complexity of the already challenging analysis of RNA-seq data would be greatly reduced by the availability of information on the putative functions of all these transcripts.

Co-expression tools created from microarray data have successfully allowed researchers to assign putative functions to poorly annotated genes and identify candidate genes related to various diseases and biological processes using a guilt-by-association approach. In this project, we will create the first RNA-seq-based co-expression tool, focused on mice and humans. It will be entitled GeneFriends and it will allow researchers to assign putative functions to new and unstudied transcripts, such as ncRNAs and splice variants. Employing a guilt-by-association approach, GeneFriends will also allow researchers to identify new candidates for a role in a given disease or process. Given the growing importance of next-generation sequencing, GeneFriends will greatly benefit the research community. Overall, GeneFriends will be a new approach for the analysis and interpretation of genetics and genomics data from different types of studies. The resulting RNA-seq-based co-expression tool will be made freely available online for everyone to use.

Planned Impact

Although many diseases have been well characterized at the molecular level, the underlying mechanisms are often unknown. The tool created in this project will allow researchers to relate previously unstudied transcripts to functions, diseases and biological processes, allowing candidate genes to be identified and testable hypotheses to be generated. It will also help infer the function of many unstudied transcripts.

This project will primarily benefit a wide range of researchers, as described in the Academic Beneficiaries section. Given the growing number of high-throughput technologies, including next-generation sequencing, employed in clinical genetic testing, inferring putative functions and interaction partners of poorly studied genes will have applications in the clinic. Numerous companies may also benefit from GeneFriends. Companies focused on personalized medicine services may benefit from having an online tool to infer the function(s) of unknown genes. In addition, numerous pharmaceutical companies are focused on identifying novel targets for drug development. By providing and prioritizing candidates for further studies, GeneFriends may have applications in industry.
 
Description: In this project, we have expanded GeneFriends, an online database that allows users to identify genes co-expressed with one or more user-defined genes. This expansion entails an RNA-seq-based co-expression map that includes genes and transcripts that are not present in the microarray-based co-expression maps, including over 10,000 non-coding RNAs. The results users obtain from GeneFriends include a co-expression network as well as a summary of the functional enrichment among the co-expressed genes. Novel insights can be gathered from this database for different splice variants and ncRNAs, such as microRNAs and lincRNAs. Furthermore, our updated tool allows candidate transcripts to be linked to diseases and processes using a guilt-by-association approach. GeneFriends is freely available from http://www.GeneFriends.org and can be used to quickly identify and rank candidate targets relevant to the process or disease under study.
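The two kinds of result described above, a ranking of candidates against user-defined genes and a functional enrichment summary, can be sketched as follows. This is an illustration under stated assumptions, not the GeneFriends implementation: candidates are scored by their mean co-expression with the seed genes, enrichment is scored with a one-sided hypergeometric test, and all names and numbers are hypothetical.

```python
from math import comb

def rank_candidates(cmap, seeds):
    """Rank non-seed genes by mean co-expression with the seed genes.
    `cmap` maps gene -> {other gene -> co-expression score}."""
    seeds = set(seeds)
    scores = {}
    for gene, partners in cmap.items():
        if gene in seeds:
            continue
        vals = [partners[s] for s in seeds if s in partners]
        if vals:
            scores[gene] = sum(vals) / len(vals)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def enrichment_pvalue(population, annotated, drawn, hits):
    """One-sided hypergeometric p-value: the probability of seeing at least
    `hits` annotated genes among `drawn` genes sampled without replacement
    from `population` genes, of which `annotated` carry the annotation."""
    total = comb(population, drawn)
    upper = min(drawn, annotated)
    tail = sum(comb(annotated, i) * comb(population - annotated, drawn - i)
               for i in range(hits, upper + 1))
    return tail / total

# Toy co-expression scores (invented for illustration).
cmap = {
    "seed1": {"seed2": 0.9, "cand1": 0.8, "cand2": 0.1},
    "seed2": {"seed1": 0.9, "cand1": 0.7, "cand2": -0.2},
    "cand1": {"seed1": 0.8, "seed2": 0.7, "cand2": 0.0},
    "cand2": {"seed1": 0.1, "seed2": -0.2, "cand1": 0.0},
}
ranked = rank_candidates(cmap, ["seed1", "seed2"])

# All 3 co-expressed genes carry an annotation that only 3 of 10 genes have.
p = enrichment_pvalue(population=10, annotated=3, drawn=3, hits=3)
```

Here cand1 ranks first because it is strongly co-expressed with both seed genes. In practice the enrichment test would be repeated for every functional category, so the p-values would need correction for multiple testing.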

A paper describing GeneFriends is currently in press:

van Dam S, Craig T, de Magalhães JP (in press) "GeneFriends: A human RNAseq-based gene and transcript co-expression database". Nucleic Acids Research.
Exploitation Route: Beneficiaries in academia include researchers employing RNA-seq to study diseases or processes in mice and humans, researchers performing genome-wide association studies and researchers studying transcriptional regulation and non-coding RNAs.

Industry beneficiaries include companies focused on personalized medicine. Because GeneFriends can help identify new candidate transcripts involved in disease, it will also be relevant to pharmaceutical companies.

Beneficiaries in the clinic include clinicians performing genetic testing since identifying putative functions of poorly studied genes can help in diagnosis.
Sectors: Healthcare, Pharmaceuticals and Medical Biotechnology

URL: http://genefriends.org/
 
Description: This ongoing project has thus far not had any measurable impact.