GO annotation: maximizing the potential of Drosophila research to benefit human health
Lead Research Organisation:
University of Cambridge
Department Name: Physiology Development and Neuroscience
Abstract
How can it be that the tiny, innocuous fruit fly, Drosophila melanogaster, has revolutionised the understanding of genetics, development, growth, aging and disease for the benefit of human health and well-being? The answer is: we are not that different. The fruit fly has many commonalities with us - body parts: limbs, brain, heart, eyes, kidney; it behaves in similar ways: sleeps, walks, remembers, forgets; and perceives the world in a similar manner: sight, smell, taste and touch.
At the level of DNA we can see that many genes in Drosophila and humans are equivalent and many basic physiological, cell biological and behavioural mechanisms are fundamentally the same. We can do many studies in flies that simply would not be possible in more complex organisms - especially in humans, both for practical and ethical reasons. Importantly, human diseases can be 'modelled' in flies - for example by making the same mutations in the equivalent fly genes or by exposing flies to damaging environments. In this way we 'give' Drosophila human diseases - from Alzheimer's disease to kidney stones. Then we can use these flies to discover what causes the disease symptoms and help develop treatments. It seems remarkable, but fruit flies can provide real answers to individuals and families with complex diseases.
All the research using Drosophila generates a tremendous amount of information - more than any researcher can read (over 2,800 research articles/year). This is where biological databases come in: they have dedicated teams of people (curators) who read the published research papers and enter the information into a computer database in a standardized manner. This means that the information can be easily found, rapidly assimilated, compared with other data, and integrated to generate new discoveries.
The aim of our project is to use a standardized scientific vocabulary, called the Gene Ontology (GO), to describe what genes do and where they do it. The importance of having a standardized vocabulary is that it unifies research findings across different organisms that are used for basic and medical research, including flies, mice, zebrafish and yeast. These standardized descriptions make it possible to rapidly understand the function of many genes and compare across organisms. Despite intensive research, we still do not know what a surprisingly large number of genes do - approximately 22% of Drosophila genes and 20% of human genes lack a known function. Discovering what a Drosophila gene does and recording this with the GO allows us to infer that the equivalent human gene is likely to have a similar function, helping researchers plan experiments to understand the human gene. Thus, continued description of gene function with the GO is a vital activity for research progress.
In addition to annotating newly characterized Drosophila genes, we will undertake focused GO curation on key areas of new discovery and selected areas of medical interest. These include: processes that are disrupted in neurological diseases, such as Parkinson's and motor neurone disease; processes that drive aggressive tumours; and viral infection, with regard to both the pathology of viral infection and viral spread by insects such as mosquitoes. Such focused curation will be conducted with consultation of experts in the field, and will improve the vocabulary of the GO as well as the consistency and accuracy of the annotations.
In summary, this project aims to facilitate the transfer of knowledge gained from research on fruit flies to the medical community, ultimately helping the development of effective treatments of human diseases.
Technical Summary
The aim of this proposal is to employ curation of gene function with the Gene Ontology (GO) to enhance the immediacy and utility of data from Drosophila research papers. GO provides a unifying and controlled vocabulary for describing gene product function across species and databases. We will utilize four parallel approaches. First, we will continue to prioritize research papers describing the functions of previously uncharacterized genes, as this information propagates to all other species with orthologs of that gene. Increased coverage of gene annotation benefits the interpretation of big data studies, such as Genome-Wide Association Studies and single-cell RNAseq. Second, we will further develop the use of GO to link the function of molecules together. Comprehensive curation of ligand-receptor interactions, enzyme-substrate, and transcription factor-target relationships will be captured, thus allowing computational derivation of cellular pathways, as exemplified by the bioinformatics tools that identify signaling and receiving cells from expression of ligands and receptors in single-cell RNAseq data. Third, we will apply focused curation and ontology revision to topical biological processes. These include processes often disrupted in neuropathologies, such as mitochondrial processes and localized mRNA translation, and those disrupted in metastatic cancer, such as cell adhesion, cell migration and planar cell polarity. We will also focus on research using Drosophila to study virus-host interactions, given the spread of Zika virus and SARS-CoV-2. Fourth, we will continue to develop new tools and pipelines to improve the capture of gene function data with GO and the use of this data in biomedical discovery. As members of the FlyBase, GO and Alliance of Genome Resources consortia we are well-positioned to present these data in intuitive and accessible ways to the wider research community, thus maximizing the impact of Drosophila research on biomedical advances.
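The idea of identifying signaling and receiving cells from ligand and receptor expression can be illustrated with a minimal sketch. This is not the method of any specific tool named in this record: the scoring rule (product of mean expression values, with a cutoff), the function name `interaction_scores`, the cluster names and the expression values are all assumptions made for illustration. The two ligand-receptor pairs stand in for the kind of curated relationship described above.

```python
from itertools import product

# Illustrative curated ligand-receptor pairs (a tiny stand-in for a curated set).
LIGAND_RECEPTOR_PAIRS = [("upd1", "dome"), ("spi", "Egfr")]

# Hypothetical mean expression per cell cluster (arbitrary units, invented values).
EXPRESSION = {
    "cluster_A": {"upd1": 5.0, "dome": 0.1, "spi": 0.0, "Egfr": 2.0},
    "cluster_B": {"upd1": 0.2, "dome": 4.0, "spi": 3.0, "Egfr": 0.5},
}


def interaction_scores(pairs, expression, threshold=1.0):
    """Score candidate sender -> receiver interactions.

    Assumed rule: a pair is kept only if the ligand in the sender and the
    receptor in the receiver both reach `threshold`; the score is the
    product of the two expression values.
    """
    scores = []
    for (ligand, receptor), (sender, receiver) in product(
        pairs, product(expression, repeat=2)
    ):
        ligand_level = expression[sender].get(ligand, 0.0)
        receptor_level = expression[receiver].get(receptor, 0.0)
        if ligand_level >= threshold and receptor_level >= threshold:
            scores.append(
                (sender, receiver, ligand, receptor, ligand_level * receptor_level)
            )
    # Highest-scoring candidate interactions first.
    return sorted(scores, key=lambda row: row[-1], reverse=True)


if __name__ == "__main__":
    for row in interaction_scores(LIGAND_RECEPTOR_PAIRS, EXPRESSION):
        print(row)
```

With the invented values above, cluster_A is flagged as a candidate upd1 sender to cluster_B, and cluster_B as a candidate spi sender to cluster_A. Real tools typically add statistical testing and pathway context that this sketch omits.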
Organisations
- University of Cambridge (Lead Research Organisation)
- EMBL European Bioinformatics Institute (EMBL - EBI) (Collaboration)
- Harvard University (Collaboration)
- RNAcentral (Collaboration)
- HUGO Gene Nomenclature Committee (Collaboration)
- FlyBase Consortium (Collaboration)
- National Institutes of Health (NIH) (Collaboration)
- Gene Ontology Consortium (Collaboration)
Publications
Attrill H
(2023)
Comparing the history of signalling pathway research using the research publication record of representative genes.
in microPublication biology
Attrill H
(2023)
A new experimental evidence-weighted signaling pathway resource in FlyBase.
in bioRxiv : the preprint server for biology
Attrill H
(2024)
A new experimental evidence-weighted signaling pathway resource in FlyBase.
in Development (Cambridge, England)
Gene Ontology Consortium
(2023)
The Gene Ontology knowledgebase in 2023.
in Genetics
Hu Y
(2023)
PANGEA: a new gene set enrichment tool for Drosophila and common research organisms.
in Nucleic acids research
Marygold S
(2023)
Exploring FlyBase Data Using QuickSearch
in Current Protocols
Marygold S
(2023)
Exploring FlyBase Data Using QuickSearch.
Öztürk-Çolak A
(2024)
FlyBase: updates to the Drosophila genes and genomes database
in GENETICS
Title | GO annotation data 2022-08-01 - 2024-03-01 |
Description | GO annotations made by MRC-funded curators in GOA database. These are available to users of QuickGO, UniProt, FlyBase (D.mel only) and GOC sites and are used by multiple other websites and resources. Models and annotations created in the Noctua annotation tool are also reported. |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | 6201 annotations were created or updated. Of these, 257 were to human gene products. 534 annotations were to non-RNAs. 3484 annotations were removed to improve data quality. 4406 annotations were added manually to the FlyBase D.mel GO annotation set from 1006 research publications. 3872 were made by FlyBase curators (from release last load date 2024-01022, FB_2024_01). 116 annotations were created using the GO Consortium (GOC) Noctua annotation tool from the creation of 19 GO-Causal Activity Models. |
Title | PAthway, Network and Gene-set Enrichment Analysis (PANGEA) |
Description | With the Drosophila RNAi Screening Center (DRSC) and the Drosophila Research & Screening Center-Biomedical Technology Research Resource (DRSC-BTRR), we have developed a new gene set enrichment analysis (GSEA) and classification tool for the statistical analysis of gene classes from experimental data. This tool incorporates various annotation data: GO, phenotypes, expression patterns, disease involvement, pathway membership and protein complexes. It includes data from FlyBase, the Alliance of Genome Resources and ComplexPortal, which are not available in other GSEA tools. Alongside Drosophila, data for human and other major model organisms are included. |
Type Of Material | Data analysis technique |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | The tool (by citation of the paper in PubMed) has been used to analyse data in 6 papers as of March 2024. |
URL | https://www.flyrnai.org/tools/pangea/ |
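As background on what a gene set enrichment calculation involves, the following is a minimal sketch of one common approach, an over-representation test based on the hypergeometric distribution. It is an assumption that this is a useful illustration; the record does not state which statistics PANGEA uses, and the function name `hypergeom_enrichment_p` and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes carrying the annotation (e.g. a GO term)
    sample:     size of the experimental gene list
    hits:       genes in the list that carry the annotation
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    # Sum the probabilities of observing `hits` or more annotated genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 12 of 100 listed genes carry a term that
    # annotates 300 of 14000 background genes.
    print(hypergeom_enrichment_p(14000, 300, 100, 12))
```

A small p-value suggests the term is over-represented in the gene list. In practice, many terms are tested at once, so a multiple-testing correction would normally be applied to the resulting p-values.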
Description | Alliance of Genome Resources |
Organisation | National Institutes of Health (NIH) |
Department | National Human Genome Research Institute (NHGRI) |
Country | United States |
Sector | Public |
PI Contribution | FlyBase is one of the founding members of the Alliance of Genome Resources (Alliance). Helen Attrill has contributed to the Expression working group as part of aligning FlyBase expression data and Alliance ribbon summarization displays and to the Biological Function working group, dealing with the display of GO data on Alliance gene pages and cross-species comparison of GO data. Giulia Antonazzo also contributes to the Biological Function working group, attending monthly calls. Helen Attrill and Giulia Antonazzo are currently working with the Alliance Pathway working group to harmonize and display pathway data on Alliance pages. |
Collaborator Contribution | The Alliance has contributions from major model organism databases: FlyBase, Mouse Genome Database (MGD), Saccharomyces Genome Database (SGD), Rat Genome Database (RGD), WormBase and the Zebrafish Information Network (ZFIN), as well as the Gene Ontology Consortium (GOC). Each member database contributes to the work on supplying, harmonizing and displaying cross-species data to compare with human data and facilitate translational research. |
Impact | The principal output is the Alliance website: https://www.alliancegenome.org/ and cross-database collaboration on data harmonization and sharing of resources and infrastructure. PMID:31552413 |
Start Year | 2017 |
Description | Collaboration with Complex Portal |
Organisation | EMBL European Bioinformatics Institute (EMBL - EBI) |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Curated protein complexes and supplied Complex Portal, EMBL-EBI with data to populate entries. |
Collaborator Contribution | QC/QA and addition of data to Complex Portal database. |
Impact | Fly complexes will be searchable at both flybase.org and https://www.ebi.ac.uk/complexportal/home |
Start Year | 2021 |
Description | DRSC/TRiP Screening Center-Biomedical Technology Research Resource |
Organisation | Harvard University |
Department | Harvard Medical School |
Country | United States |
Sector | Academic/University |
PI Contribution | We have collaborated with the Drosophila RNAi Screening Center (DRSC), Transgenic RNAi Project (TRiP) and Drosophila Research & Screening Center-Biomedical Technology Research Resource (DRSC-BTRR) to help them in the development of bioinformatics tools by providing data and feedback. This includes a gene set enrichment tool and a single cell RNA sequence data analysis tool. |
Collaborator Contribution | The Screening Center-Biomedical Technology Research Resource develop the tools and integrate data. |
Impact | DRSC/TRiP-FGR tools can be found at https://fgr.hms.harvard.edu/tools Publications: Gene2Function: An Integrated Online Resource for Gene Function Discovery. PMID:28663344 FlyPhoneDB: an integrated web-based resource for cell-cell communication prediction in Drosophila. PMID:35100387 |
Start Year | 2018 |
Description | FlyBase Consortium membership |
Organisation | FlyBase Consortium |
Country | Global |
Sector | Academic/University |
PI Contribution | We make GO annotations to D.mel genes, which are housed in FlyBase, ensure that the Gene Ontology is updated for each release of FlyBase, and revise the GO annotations in line with ontology changes. We request new ontology terms from GOC as required by FlyBase curators. We train FlyBase curators to make functional annotations using the GO. As part of the GO consortium, we keep FlyBase updated on changes to GO annotation policy and work with FlyBase curators and developers to implement these changes. We attend FlyBase consortium meetings, lobby for changes to FlyBase that will make Drosophila research more accessible to researchers studying human genes or disease, and assist in implementing these changes. We answer GO-related queries from FlyBase users. We have worked with FlyBase to develop a text-mining approach to triage papers for GO and disease model annotation. We have worked with FlyBase to develop visual summaries of data including expression, GO and signalling pathways data. We curate gene groups (gene families and macromolecular complexes) into FlyBase. We add and maintain author-submitted gene snapshots in FlyBase. |
Collaborator Contribution | They maintain the FlyBase database and associated website where our GO annotations, pathways and gene groups are stored and displayed. They provide developer support for implementing new data types associated with GO annotation, disease model curation and data visualization. FlyBase literature curators make GO annotations that supplement those made by the MRC-funded GO curator. |
Impact | PMID:22127867 PMID:22554788 PMID:23125371 PMID:23160412 PMID:24234449 PMID:24715220 PMID:25398896 PMID:26109356 PMID:26109357 PMID:26467478 PMID:26935103 PMID:27494710 PMID:27730573 PMID:27799470 PMID:27930807 PMID:28663344 PMID:29761468 PMID:30364959 PMID:31933406 PMID:31960022 PMID:33219682 |
Start Year | 2006 |
Description | Gene Ontology (GO) consortium membership |
Organisation | Gene Ontology Consortium |
Country | Global |
Sector | Charity/Non Profit |
PI Contribution | We make GO annotations and are responsible for collating all Drosophila melanogaster annotations and submitting them to GOC. We contribute to the development of the Gene Ontology (request new terms, participate in specialist term development workshops, report errors). We attend GO consortium project meetings, workshops and regular conference calls where we contribute to discussion on all aspects of the Gene Ontology project, particularly decisions related to annotation policy and quality control. |
Collaborator Contribution | The GO consortium load our Drosophila GO annotation set into their database (along with annotations from other species) and make it available for searching and download via their website. They provide us with quality control reports and suggest additional Drosophila annotations (based on phylogenetic analysis and inferences from relationships between terms in the ontology). They provide editorial assistance to change the Gene Ontology in response to our requests for new terms or error fixes. |
Impact | PMID:22102568 PMID:23161678 PMID:25428369 PMID:27899567 PMID:30395331 PMID:30715275 PMID:33290552 |
Start Year | 2006 |
Description | HUGO gene nomenclature committee (HGNC) |
Organisation | HUGO Gene Nomenclature Committee |
Country | United Kingdom |
Sector | Charity/Non Profit |
PI Contribution | A yearly meeting with HGNC to compare strategies for collating, presenting and aligning human and fly gene lists (Gene Groups). Where possible, we link FlyBase Gene Groups to the equivalent HGNC Gene Groups. FlyBase supplies a correspondence file for HGNC so that they can add reciprocal links to FlyBase from their pages. There are now 464 links between FlyBase Gene Groups and equivalent human sets at HGNC. |
Collaborator Contribution | A yearly meeting to compare strategies for collating, presenting and aligning human and fly gene groups. HGNC add reciprocal gene group links to FlyBase from their pages. |
Impact | This facilitates comparison between protein complexes and functional classes (such as glycoside hydrolases) between D.mel and human Gene Groups. |
Start Year | 2016 |
Description | RNAcentral collaboration |
Organisation | RNAcentral |
Sector | Public |
PI Contribution | FlyBase became one of the Expert Databases that contribute data to RNAcentral. FlyBase has made its GO annotations to ncRNA for D.melanogaster available to RNAcentral via the GOA database. |
Collaborator Contribution | RNAcentral has worked on QC/QA and establishing a pipeline/links to FlyBase and has imported GO annotations to ncRNA made by FlyBase. |
Impact | GO annotations to D.melanogaster ncRNAs have been made available to users of RNAcentral and QuickGO. Update of Sequence Ontology in FlyBase to provide more descriptive labeling of ncRNA classes and better alignment with external resources, including RNAcentral. PMID:33106848 PMID:30395267 |
Start Year | 2017 |
Description | UniProtKB, Gene Ontology Annotation (GOA), InterPro collaboration |
Organisation | EMBL European Bioinformatics Institute (EMBL - EBI) |
Department | Protein Sequences Resources |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We provide GO annotations and mappings between FlyBase genes and UniProt proteins. We incorporate GO annotations for Drosophila made by UniProt into FlyBase and display them on our website. We provide feedback to the InterPro group about their mappings between protein domains and GO terms. UniProt display our GO annotations on their website. We add GO annotations for non-Drosophilid species, including human, directly to the GOA database. We have collaborated on a review of Drosophila RNA polymerases with curators at UniProtKB. |
Collaborator Contribution | UniProtKB display our GO annotations on their website and make Drosophila GO annotations that we display on our website. They assign InterPro domains to UniProtKB entries and maintain a mapping between InterPro domains and GO terms; this information is used to infer automatic GO annotations in FlyBase. We have used the EBI GOA database and curation interface, Protein2GO, for GO curation since 2017. We have collaborated on a review of Drosophila RNA polymerases with curators at UniProtKB. |
Impact | GO annotations based on InterPro domains were updated with each release of FlyBase. In Oct 2013 there were 12,227 such annotations for Drosophila melanogaster. PMID:31933406 |
Start Year | 2006 |