GO annotation: maximizing the potential of Drosophila research to benefit human health

Lead Research Organisation: University of Cambridge
Department Name: Physiology Development and Neuroscience

Abstract

How can it be that the tiny, innocuous fruit fly, Drosophila melanogaster, has revolutionised the understanding of genetics, development, growth, aging and disease for the benefit of human health and well-being? The answer is: we are not that different. The fruit fly has many commonalities with us - body parts: limbs, brain, heart, eyes, kidney; it behaves in similar ways: sleeps, walks, remembers, forgets; and perceives the world in a similar manner: sight, smell, taste and touch.
At the level of DNA we can see that many genes in Drosophila and humans are equivalent and many basic physiological, cell biological and behavioural mechanisms are fundamentally the same. We can do many studies in flies that simply would not be possible in more complex organisms - especially in humans, both for practical and ethical reasons. Importantly, human diseases can be 'modelled' in flies - for example by making the same mutations in the equivalent fly genes or by exposing flies to damaging environments. In this way we 'give' Drosophila human diseases - from Alzheimer's disease to kidney stones. Then we can use these flies to discover what causes the disease symptoms and help develop treatments. It seems remarkable, but fruit flies can provide real answers to individuals and families with complex diseases.
All the research using Drosophila generates a tremendous amount of information - more than any researcher can read (over 2,800 research articles/year). This is where biological databases come in: they have dedicated teams of people (curators) who read the published research papers and enter the information into a computer database in a standardized manner. This means that the information can be easily found, rapidly assimilated, compared with other data, and integrated to generate new discoveries.
The aim of our project is to use a standardized scientific vocabulary, called the Gene Ontology (GO), to describe what genes do and where they do it. The importance of having a standardized vocabulary is that it unifies research findings across different organisms that are used for basic and medical research, including flies, mice, zebrafish and yeast. These standardized descriptions make it possible to rapidly understand the function of many genes and compare across organisms. Despite intensive research, we still do not know what a surprisingly large number of genes do - approximately 22% of Drosophila genes and 20% of human genes lack a known function. Discovering what a Drosophila gene does and recording this with the GO allows us to infer that the equivalent human gene is likely to have a similar function, helping researchers plan experiments to understand the human gene. Thus, continued description of gene function with the GO is a vital activity for research progress.
In addition to annotating newly characterized Drosophila genes, we will undertake focused GO curation on key areas of new discovery and selected areas of medical interest. These include: processes that are disrupted in neurological diseases, such as Parkinson's and motor neurone disease; processes that drive aggressive tumours; and viral infection, with regard to both the pathology of viral infection and viral spread by insects such as mosquitoes. Such focused curation will be conducted in consultation with experts in the field, and will improve the vocabulary of the GO as well as the consistency and accuracy of the annotations.
In summary, this project aims to facilitate the transfer of knowledge gained from research on fruit flies to the medical community, ultimately helping the development of effective treatments of human diseases.

Technical Summary

The aim of this proposal is to employ curation of gene function with the Gene Ontology (GO) to enhance the immediacy and utility of data from Drosophila research papers. GO provides a unifying and controlled vocabulary for describing gene product function across species and databases. We will utilize four parallel approaches. First, we will continue to prioritize research papers describing the functions of previously uncharacterized genes, as this information propagates to all other species with orthologs of that gene. Increased coverage of gene annotation benefits the interpretation of big data studies, such as Genome-Wide Association Studies and single-cell RNAseq. Second, we will further develop the use of GO to link the function of molecules together. Comprehensive curation of ligand-receptor, enzyme-substrate, and transcription factor-target relationships will be captured, thus allowing computational derivation of cellular pathways, as exemplified by the bioinformatics tools that identify signaling and receiving cells from expression of ligands and receptors in single-cell RNAseq data. Third, we will apply focused curation and ontology revision to topical biological processes. These include processes often disrupted in neuropathologies, such as mitochondrial processes and localized mRNA translation, and those disrupted in metastatic cancer, such as cell adhesion, cell migration and planar cell polarity. We will also focus on research using Drosophila to study virus-host interactions, given the spread of Zika virus and SARS-CoV-2. Fourth, we will continue to develop new tools and pipelines to improve the capture of gene function data with GO and the use of these data in biomedical discovery. As members of the FlyBase, GO and Alliance of Genome Resources consortia we are well-positioned to present these data in intuitive and accessible ways to the wider research community, thus maximizing the impact of Drosophila research on biomedical advances.
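
As a rough illustration of how curated ligand-receptor pairs can be combined with single-cell expression data to suggest signaling and receiving cells, the minimal Python sketch below scores each ordered pair of cell clusters. It is not the method of any specific tool named in this record; the gene names, cluster names, expression values, threshold and the product-based score are all hypothetical assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical curated ligand-receptor pairs (placeholders, not real annotations).
LIGAND_RECEPTOR_PAIRS = [("ligand_A", "receptor_A"), ("ligand_B", "receptor_B")]

# Hypothetical mean expression per cell cluster (arbitrary units).
EXPRESSION = {
    "cluster_1": {"ligand_A": 5.0, "receptor_A": 0.0, "ligand_B": 0.0, "receptor_B": 2.0},
    "cluster_2": {"ligand_A": 0.0, "receptor_A": 4.0, "ligand_B": 3.0, "receptor_B": 0.0},
}


def interaction_scores(pairs, expression, threshold=1.0):
    """Score candidate sender -> receiver interactions.

    A pair is reported when the sender cluster expresses the ligand and the
    receiver cluster expresses the receptor, both at or above `threshold`.
    The score (ligand level times receptor level) is an assumed, simple choice.
    """
    results = []
    for sender, receiver in product(expression, repeat=2):
        for ligand, receptor in pairs:
            ligand_level = expression[sender].get(ligand, 0.0)
            receptor_level = expression[receiver].get(receptor, 0.0)
            if ligand_level >= threshold and receptor_level >= threshold:
                results.append(
                    (sender, receiver, ligand, receptor, ligand_level * receptor_level)
                )
    # Highest-scoring candidate interactions first.
    return sorted(results, key=lambda row: row[4], reverse=True)


if __name__ == "__main__":
    for row in interaction_scores(LIGAND_RECEPTOR_PAIRS, EXPRESSION):
        print(row)
```

The quality of such predictions depends directly on the completeness of the curated pair list, which is why comprehensive curation of these relationships matters.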

Publications

 
Title GO annotation data 2022-08-01 - 2024-03-01 
Description GO annotations made by MRC-funded curators in GOA database. These are available to users of QuickGO, UniProt, FlyBase (D.mel only) and GOC sites and are used by multiple other websites and resources. Models and annotations created in the Noctua annotation tool are also reported. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact 6201 annotations were created or updated. Of these, 257 were to human gene products. 534 annotations were to non-RNAs. 3484 annotations were removed to improve data quality. 4406 annotations were added manually to the FlyBase D.mel GO annotation set from 1006 research publications. 3872 were made by FlyBase curators (from release last load date 2024-01022, FB_2024_01). 116 annotations were created using the GO Consortium (GOC) Noctua annotation tool from the creation of 19 GO-Causal Activity Models.
 
Title PAthway, Network and Gene-set Enrichment Analysis (PANGEA) 
Description With the Drosophila RNAi Screening Center (DRSC) and the Drosophila Research & Screening Center-Biomedical Technology Research Resource (DRSC-BTRR), we have developed a new gene set enrichment analysis (GSEA) and classification tool for the statistical analysis of gene classes from experimental data. This tool incorporates various annotation data: GO, phenotypes, expression patterns, disease involvement, pathway membership and protein complexes. It includes data from FlyBase, the Alliance of Genome Resources and ComplexPortal, which are not available in other GSEA tools. Alongside Drosophila, data for human and other major model organisms are included.
Type Of Material Data analysis technique 
Year Produced 2023 
Provided To Others? Yes  
Impact The tool (by citation of the paper in PubMed) has been used to analyse data in 6 papers as of March 2024. 
URL https://www.flyrnai.org/tools/pangea/
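
For readers unfamiliar with gene set enrichment, one common statistic is the hypergeometric over-representation test, sketched below in Python. This is a generic illustration under stated assumptions, not a description of how PANGEA computes its results; the function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """Probability of seeing at least `overlap` annotated genes by chance.

    population: number of genes in the background set
    annotated:  background genes carrying the annotation (e.g. a GO term)
    sample:     number of genes in the experimental hit list
    overlap:    hit-list genes carrying the annotation
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities for k = overlap .. upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 10 background genes, 5 annotated, 5 hits, all 5 annotated.
    print(hypergeom_enrichment_p(10, 5, 5, 5))
```

In practice, a p-value is computed for every gene set tested and then corrected for multiple testing; that step is omitted here for brevity.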
 
Description Alliance of Genome Resources 
Organisation National Institutes of Health (NIH)
Department National Human Genome Research Institute (NHGRI)
Country United States 
Sector Public 
PI Contribution FlyBase is one of the founding members of the Alliance of Genome Resources (Alliance). Helen Attrill has contributed to the Expression working group as part of aligning FlyBase expression data and Alliance ribbon summarization displays and to the Biological Function working group, dealing with the display of GO data on Alliance gene pages and cross-species comparison of GO data. Giulia Antonazzo also contributes to the Biological Function working group, attending monthly calls. Helen Attrill and Giulia Antonazzo are currently working with the Alliance Pathway working group to harmonize and display pathway data on Alliance pages.
Collaborator Contribution The Alliance has contributions from the major model organism databases: FlyBase, Mouse Genome Database (MGD), Saccharomyces Genome Database (SGD), Rat Genome Database (RGD), WormBase, and the Zebrafish Information Network (ZFIN), together with the Gene Ontology Consortium (GOC). Each member database contributes to the work on supplying, harmonizing and displaying cross-species data to compare with human data and facilitate translational research.
Impact The principal output is the Alliance website: https://www.alliancegenome.org/ and cross-database collaboration on data harmonization and sharing of resources and infrastructure. PMID:31552413
Start Year 2017
 
Description Collaboration with Complex Portal 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution Curated protein complexes and supplied Complex Portal, EMBL-EBI with data to populate entries.
Collaborator Contribution QC/QA and addition of data to Complex Portal database.
Impact Fly complexes will be searchable at both flybase.org and https://www.ebi.ac.uk/complexportal/home
Start Year 2021
 
Description DRSC/TRiP Screening Center-Biomedical Technology Research Resource 
Organisation Harvard University
Department Harvard Medical School
Country United States 
Sector Academic/University 
PI Contribution We have collaborated with the Drosophila RNAi Screening Center (DRSC), Transgenic RNAi Project (TRiP) and Drosophila Research & Screening Center-Biomedical Technology Research Resource (DRSC-BTRR) to help them in the development of bioinformatics tools by providing data and feedback. This includes a gene set enrichment tool and a single cell RNA sequence data analysis tool.
Collaborator Contribution The Screening Center-Biomedical Technology Research Resource develop the tools and integrate data.
Impact DRSC/TRiP-FGR tools can be found at https://fgr.hms.harvard.edu/tools Publications: Gene2Function: An Integrated Online Resource for Gene Function Discovery. PMID:28663344 FlyPhoneDB: an integrated web-based resource for cell-cell communication prediction in Drosophila. PMID:35100387
Start Year 2018
 
Description FlyBase Consortium membership 
Organisation FlyBase Consortium
Country Global 
Sector Academic/University 
PI Contribution We make GO annotations to D.mel genes which are housed in FlyBase and ensure that the Gene Ontology is updated for each release of FlyBase and revise the GO annotations in line with ontology changes. We request new ontology terms from GOC as required by FlyBase curators. We train FlyBase curators to make functional annotations using the GO. As part of the GO consortium, we keep FlyBase updated on changes to GO annotation policy and work with FlyBase curators and developers to implement these changes. We attend FlyBase consortium meetings and lobby for changes to FlyBase that will make Drosophila research more accessible to researchers studying human genes or disease and assist in implementing these changes. We answer GO-related queries from FlyBase users. We have worked with FlyBase to develop a text-mining approach to triage papers for GO and disease model annotation. We have worked with FlyBase to develop visual summaries of data including expression, GO and signalling pathways data. We curate gene groups (gene families and macromolecular complexes) into FlyBase. We add and maintain author-submitted gene snapshots in FlyBase.
Collaborator Contribution They maintain the FlyBase database and associated website where our GO annotations, pathways and gene groups are stored and displayed. They provide developer support for implementing new data types associated with GO annotation, disease model curation and data visualization. FlyBase literature curators make GO annotations that supplement those made by the MRC-funded GO curator.
Impact PMID:22127867 PMID:22554788 PMID:23125371 PMID:23160412 PMID:24234449 PMID:24715220 PMID:25398896 PMID:26109356 PMID:26109357 PMID:26467478 PMID:26935103 PMID:27494710 PMID:27730573 PMID:27799470 PMID:27930807 PMID:28663344 PMID:29761468 PMID:30364959 PMID:31933406 PMID:31960022 PMID:33219682
Start Year 2006
 
Description Gene Ontology (GO) consortium membership 
Organisation Gene Ontology Consortium
Country Global 
Sector Charity/Non Profit 
PI Contribution We make GO annotations and are responsible for collating all Drosophila melanogaster annotations and submitting them to GOC. We contribute to the development of the Gene Ontology (request new terms, participate in specialist term development workshops, report errors). We attend GO consortium project meetings, workshops and regular conference calls where we contribute to discussion on all aspects of the Gene Ontology project, particularly decisions related to annotation policy and quality control.
Collaborator Contribution The GO consortium load our Drosophila GO annotation set into their database (along with annotations from other species) and make it available for searching and download via their website. They provide us with quality control reports and suggest additional Drosophila annotations (based on phylogenetic analysis and inferences from relationships between terms in the ontology). They provide editorial assistance to change the Gene Ontology in response to our requests for new terms or error fixes.
Impact PMID:22102568 PMID:23161678 PMID:25428369 PMID:27899567 PMID:30395331 PMID:30715275 PMID:33290552
Start Year 2006
 
Description HUGO gene nomenclature committee (HGNC) 
Organisation HUGO Gene Nomenclature Committee
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution A yearly meeting with HGNC to compare strategies for collating, presenting and aligning human and fly gene lists (Gene Groups). Where possible, we link FlyBase Gene Groups to the equivalent HGNC Gene Groups. FlyBase supplies a correspondence file for HGNC so that they can add reciprocal links to FlyBase from their pages. There are now 464 links between FlyBase Gene Groups and equivalent human sets at HGNC.
Collaborator Contribution A yearly meeting to compare strategies for collating, presenting and aligning human and fly gene groups. HGNC add reciprocal gene group links to FlyBase from their pages.
Impact This facilitates comparison between protein complexes and functional classes (such as glycoside hydrolases) between D.mel and human Gene Groups.
Start Year 2016
 
Description RNAcentral collaboration 
Organisation RNAcentral
Sector Public 
PI Contribution FlyBase became one of the Expert Databases that contribute data to RNAcentral. FlyBase has made its GO annotations to ncRNA for D.melanogaster available to RNAcentral via the GOA database.
Collaborator Contribution RNAcentral has worked on QC/QA and establishing a pipeline/links to FlyBase and has imported GO annotations to ncRNA made by FlyBase.
Impact GO annotations to D.melanogaster ncRNAs have been made available to users of RNAcentral and QuickGO. The Sequence Ontology in FlyBase has been updated to provide more descriptive labeling of ncRNA classes and better alignment with external resources, including RNAcentral. PMID:33106848 PMID:30395267
Start Year 2017
 
Description UniProtKB, Gene Ontology Annotation (GOA), InterPro collaboration 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Department Protein Sequences Resources
Country United Kingdom 
Sector Academic/University 
PI Contribution We provide GO annotations and mappings between FlyBase genes and UniProt proteins. We incorporate GO annotations for Drosophila made by UniProt into FlyBase and display them on our website. We provide feedback to the InterPro group about their mappings between protein domains and GO terms. UniProt display our GO annotations on their website. We add GO annotations for non-Drosophilid species, including human, directly to the GOA database. We have collaborated on a review of Drosophila RNA polymerases with curators at UniProtKB.
Collaborator Contribution UniProtKB display our GO annotations on their website and make Drosophila GO annotations that we display on our website. They assign InterPro domains to UniProtKB entries and maintain a mapping between InterPro domains and GO terms; this information is used to infer automatic GO annotations in FlyBase. We have used the EBI GOA database and curation interface, Protein2GO, for GO curation since 2017. We have collaborated on a review of Drosophila RNA polymerases with curators at UniProtKB.
Impact GO annotations based on InterPro domains were updated with each release of FlyBase. In Oct 2013 there were 12,227 such annotations for Drosophila melanogaster. PMID:31933406
Start Year 2006