Using GO to enhance the utility of Drosophila data to medical research

Lead Research Organisation: University of Cambridge
Department Name: Gurdon Institute

Abstract

Ten years ago the fruit fly Drosophila melanogaster became one of the first complex animals to have its genome sequenced. This may seem an odd choice - who cares about flies, and why didn't we start with the human genome? In fact, the fruit fly has been a favourite species for genetic research for a century. Not only is it easy to work with and free from many of the ethical concerns associated with using animals in medical research, but there are enough similarities between flies and humans to make it directly relevant to medical research. Around 75% of the genes known to be associated with human diseases are also found in the fly, and in many cases there is evidence that they work in similar ways in both species. Figuring out how something works in a fly can give scientists clues as to how to fix things when the same genes go wrong in human disease. The fly is proving to be a good model for various human diseases such as brain disorders. As a result of its popularity, scientists publish thousands of research papers on fly genes every year. For scientists working on a specific human disease, it can be a daunting task to find all of the published information that may be relevant to their work and to make sense of experiments performed in different species. The aim of this project is to summarise fly research data in a way that makes it easier for all biological researchers to find and make use of it. The work will involve database curators reading peer-reviewed research papers and summarising any data about gene function. To do this we will make use of a special dictionary of interconnected scientific terms, the Gene Ontology, that has been developed to succinctly describe what the products of genes do and where in a cell they carry out these functions. The connections between terms make it easy to search and display biological processes that are related in different species. We will also label genes in the fly database to indicate that they are related to human genes and/or human diseases - this will make it easier for scientists to find the information they need to inform future research. The work will be carried out by scientists at the University of Cambridge in collaboration with the Drosophila database FlyBase and the Gene Ontology Consortium.

Technical Summary

The aim of this proposal is to improve the accessibility and utility of Drosophila research data to scientists involved in medical research. We will achieve this by improving the breadth and depth of Drosophila Gene Ontology (GO) annotation, with emphasis on annotating orthologs of human genes. We will prioritise Drosophila genes relevant to human disease and genes for which data about the human gene is limited. Each gene target will be comprehensively annotated with GO terms based on a manual review of the published literature. In a complementary approach, we will improve the consistency of GO annotation by annotating sets of genes with shared functions. We will also improve annotation coverage of all Drosophila genes by incorporating more high-throughput data sets and introducing new annotation strategies such as proven text mining approaches and community annotation. The more comprehensive GO annotation will improve the accessibility of data on Drosophila gene function to the wider research community. In collaboration with FlyBase, we will seek to improve ontology-based links between Drosophila mutant phenotypes and the defects caused by hereditary diseases, enabling researchers to search for genes related to a particular disease.
