GO annotation: maximizing the potential of Drosophila research to benefit human health

Lead Research Organisation: University of Cambridge
Department Name: Physiology, Development and Neuroscience

Abstract

How can it be that the tiny, innocuous fruit fly, Drosophila melanogaster, has revolutionised our understanding of genetics, development, growth, aging and disease for the benefit of human health and well-being? The answer is that we are not that different. The fruit fly has much in common with us. It has comparable body parts: limbs, brain, heart, eyes, kidney; it behaves in similar ways: it sleeps, walks, remembers and forgets; and it perceives the world in a similar manner, through sight, smell, taste and touch.
At the level of DNA, many genes in Drosophila and humans are equivalent, and many basic physiological, cell biological and behavioural mechanisms are fundamentally the same. We can do many studies in flies that simply would not be possible in more complex organisms, especially humans, for both practical and ethical reasons. Importantly, human diseases can be 'modelled' in flies, for example by making the same mutations in the equivalent fly genes or by exposing flies to damaging environments. In this way we 'give' Drosophila human diseases, from Alzheimer's disease to kidney stones. We can then use these flies to discover what causes the disease symptoms and to help develop treatments. It seems remarkable, but fruit flies can provide real answers to individuals and families affected by complex diseases.
All the research using Drosophila generates a tremendous amount of information, more than any researcher can read (over 2,800 research articles per year). This is where biological databases come in: they have dedicated teams of people (curators) who read the published research papers and enter the information into a computer database in a standardized manner. This means that the information can be easily found, rapidly assimilated, compared with other data and integrated to generate new discoveries.
The aim of our project is to use a standardized scientific vocabulary, called the Gene Ontology (GO), to describe what genes do and where they do it. A standardized vocabulary matters because it unifies research findings across the different organisms used for basic and medical research, including flies, mice, zebrafish and yeast. These standardized descriptions make it possible to rapidly understand the function of many genes and to compare them across organisms. Despite intensive research, we still do not know what a surprisingly large number of genes do: approximately 22% of Drosophila genes and 20% of human genes lack a known function. Discovering what a Drosophila gene does and recording this with the GO allows us to infer that the equivalent human gene is likely to have a similar function, helping researchers plan experiments to understand the human gene. Thus, continued description of gene function with the GO is vital for research progress.
In addition to annotating newly characterized Drosophila genes, we will undertake focused GO curation on key areas of new discovery and selected areas of medical interest. These include: processes that are disrupted in neurological diseases, such as Parkinson's and motor neurone disease; processes that drive aggressive tumours; and viral infection, covering both the pathology of viral infection and the spread of viruses by insects such as mosquitoes. Such focused curation will be conducted in consultation with experts in the field, and will improve the GO vocabulary as well as the consistency and accuracy of the annotations.
In summary, this project aims to facilitate the transfer of knowledge gained from research on fruit flies to the medical community, ultimately helping the development of effective treatments for human diseases.

Technical Summary

The aim of this proposal is to use curation of gene function with the Gene Ontology (GO) to enhance the immediacy and utility of data from Drosophila research papers. GO provides a unifying, controlled vocabulary for describing gene product function across species and databases. We will take four parallel approaches. First, we will continue to prioritize research papers describing the functions of previously uncharacterized genes, as this information propagates to all other species with orthologs of those genes. Increased coverage of gene annotation benefits the interpretation of big data studies, such as Genome-Wide Association Studies and single-cell RNAseq. Second, we will further develop the use of GO to link the functions of molecules together. Ligand-receptor, enzyme-substrate and transcription factor-target relationships will be comprehensively curated, allowing computational derivation of cellular pathways, as exemplified by bioinformatics tools that identify signaling and receiving cells from the expression of ligands and receptors in single-cell RNAseq data. Third, we will apply focused curation and ontology revision to topical biological processes. These include processes often disrupted in neuropathologies, such as mitochondrial processes and localized mRNA translation, and those disrupted in metastatic cancer, such as cell adhesion, cell migration and planar cell polarity. We will also focus on research using Drosophila to study virus-host interactions, given the spread of Zika virus and SARS-CoV-2. Fourth, we will continue to develop new tools and pipelines to improve the capture of gene function data with GO and the use of these data in biomedical discovery. As members of the FlyBase, GO and Alliance of Genome Resources consortia, we are well positioned to present these data in intuitive and accessible ways to the wider research community, thus maximizing the impact of Drosophila research on biomedical advances.
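To make the ligand-receptor idea concrete, the sketch below shows, under stated assumptions, how curated ligand-receptor pairs can be combined with per-cluster single-cell expression to flag candidate signaling (sender) and receiving (receiver) cell clusters. It is a simplified illustration, not the method of any particular tool or of this project. The pairs dpp/tkv, hh/ptc and spi/Egfr are well-known Drosophila ligand-receptor pairs used only as examples; the cluster names, expression values, the function `infer_interactions` and the product-of-means score with a fixed threshold are all hypothetical choices made for illustration.

```python
from itertools import product

# Example ligand-receptor pairs (ligand, receptor), assumed to have been
# extracted beforehand from curated annotations. Illustrative only.
LR_PAIRS = [("dpp", "tkv"), ("hh", "ptc"), ("spi", "Egfr")]

# Hypothetical mean expression per cell cluster (invented toy values, not real data).
CLUSTER_EXPR = {
    "clusterA": {"dpp": 5.0, "hh": 0.1, "spi": 2.0, "tkv": 0.2, "ptc": 0.0, "Egfr": 0.5},
    "clusterB": {"dpp": 0.1, "hh": 3.0, "spi": 0.0, "tkv": 4.0, "ptc": 0.3, "Egfr": 3.0},
}


def infer_interactions(expr, pairs, min_score=2.0):
    """Hypothetical scoring: for every (sender, receiver) cluster pair, including
    a cluster signaling to itself, score each ligand-receptor pair as
    (ligand mean in sender) * (receptor mean in receiver) and keep scores >= min_score.
    Returns (sender, receiver, ligand, receptor, score) tuples, highest score first."""
    hits = []
    for sender, receiver in product(expr, repeat=2):
        for ligand, receptor in pairs:
            score = expr[sender].get(ligand, 0.0) * expr[receiver].get(receptor, 0.0)
            if score >= min_score:
                hits.append((sender, receiver, ligand, receptor, score))
    return sorted(hits, key=lambda hit: hit[4], reverse=True)


if __name__ == "__main__":
    for hit in infer_interactions(CLUSTER_EXPR, LR_PAIRS):
        print(hit)
```

With these toy values, clusterA is flagged as sending dpp and spi signals to clusterB. Real tools typically add statistical testing (for example, permuting cluster labels) rather than relying on a fixed threshold; the point here is only that structured ligand-receptor annotations are what make this kind of computation possible.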

Publications
