Making connections with GO: an integrative approach to highlighting medically relevant Drosophila data

Lead Research Organisation: University of Cambridge
Department Name: Physiology Development and Neuroscience

Abstract

At first glance fruit flies and humans seem very different. But if we look closer we see that they share certain features: limbs, a brain, a heart and much more besides. If we look even more closely we see that this is because we share genes in our DNA. We can use these similarities to study human disease because three quarters of genes associated with human disease are found in fruit flies too. Disrupting genes in fruit flies can tell us how they normally work in humans and how they cause diseases when they are not working well, and can give us clues about how we might be able to fix problems caused by the faulty genes. Fruit flies have many advantages for studying disease. For example, we can do many different experiments with flies very quickly, and they get old within weeks, so they are really useful for studying neurodegenerative conditions such as Alzheimer's disease.
Extensive research is carried out using fruit flies by scientists around the world, generating a huge amount of data. It is important that scientists are able to easily and quickly access these data so that they can build on existing knowledge to devise new experiments in their own research. This is where biological databases come in: they have dedicated teams of people (curators) who read the published research papers and enter the information into a computer database. Curators do this by linking scientific terms, taken from special dictionaries, to biological objects such as genes. The aim of our project is to use one such scientific dictionary, called the Gene Ontology, to attach labels to genes that describe what they do and where they do it. We will focus on those fruit fly genes that are equivalent to human genes implicated in diseases. This work will be done in the context of gene networks: knowing how genes function together gives a better understanding of how they contribute to our day-to-day life, and how their disruption leads to disease. Altogether, this project will facilitate the transfer of knowledge gained in fruit flies to the medical community, ultimately helping the development of effective treatments for human diseases.

Technical Summary

The aim of this research proposal is to enhance the utility and accessibility of data from Drosophila to the biomedical research community. Our approach will center on the Gene Ontology (GO), a unifying vocabulary for describing the attributes of gene products across species and databases. We will focus our efforts on improving the annotation of pathways and protein complexes relevant to human disease. In parallel, we shall undertake a comprehensive annotation of non-coding RNAs - an important emerging area in human health. Whilst achieving these aims, we will work with the GO consortium to develop the ontology itself and adopt their enhanced annotation framework. Furthermore, in collaboration with FlyBase, we shall present pathways and complex components as easily accessible lists that are highly integrated with all other data in FlyBase. Subsequently, we will enhance these pages by adding graphical models of pathways that will highlight intersections with human disease gene orthologs and those genes studied in Drosophila disease models. Overall, our approach will improve the functional annotation of Drosophila genes related to human disease, and present these data in intuitive and accessible ways to the wider research community to maximise the impact of fly research on biomedical advances.

Planned Impact

The proposed research will also have beneficiaries in non-academic sectors and the general public in the UK and abroad. There will be an economic benefit to the pharmaceutical industry. The number of new drugs coming on to the market has declined significantly and traditional in vitro high-throughput screening techniques are failing to yield new therapies. As we gain more understanding of human disease, it is clear that most diseases are multi-factorial; the hopes of the 'one-gene-one-drug' or 'magic bullet' model will not be realized in the majority of cases. The data we collate can be used to inform basic pathway biology in humans. As the data from Next-Generation Sequencing of human genomes enter the clinical arena, we will need good functional annotations to interpret the masses of data and to aid the hunt for new therapeutic targets. Drosophila can be used at all stages of the drug-discovery process, from modeling the disease (e.g. Alzheimer's disease, Parkinson's disease, cancers) to serving as an in vivo model for high-throughput screening. Therefore, linking gene function data to Drosophila disease models and human disease genes is of great value to such research strategies. Ultimately, this will help bring new drugs to the market and improve the health of the population. With an ageing population there is increased financial pressure on the health service, charities, local government and ultimately the UK taxpayer. The benefits of Drosophila as a model for neurodegenerative and cardiovascular disease will be particularly needed as we head towards an ageing population. We will facilitate this by disseminating the functional annotation of Drosophila gene products, providing our most up-to-date data every 2 months to the GO consortium, the FlyBase database, UniProtKB and RNAcentral. From these sites, the data will be freely available and searchable. All tools for the enhanced searching of Drosophila data are freely available from the FlyBase website.
Drosophila is also an important model for researchers studying insects that seriously impact human health. There are many disease-causing parasites and viruses borne by insect hosts (vector-borne diseases, e.g. sleeping sickness, malaria, dengue). They have a huge medical and economic impact in the developing world and, as climate patterns shift, the effects could reach further into Europe and the UK. Globally, insect crop pests have a serious economic impact on agriculture. In the UK it is estimated that the agri-food sector contributes 7% of the economic activity of the nation. With a rise in pesticide resistance and the negative environmental impact of using such technologies, the understanding of Drosophila biology underpins the development of many new strategies for pest control. Functional interpretation of the genomes of disease-carrying insects such as mosquitoes and the tsetse fly, and of crop pests such as the black fly and the Mediterranean fruit fly, relies heavily on the extensive experimental data available in Drosophila. Making these data freely available allows them to be used to annotate the genomes of these related species.

Publications

Attrill H (2019) Annotation of gene product function from high-throughput studies using the Gene Ontology. in Database: the journal of biological databases and curation

Rey AJ (2018) Using FlyBase to Find Functionally Related Drosophila Genes. in Methods in molecular biology (Clifton, N.J.)

The Gene Ontology Consortium (2019) The Gene Ontology Resource: 20 years and still GOing strong. in Nucleic acids research

Thurmond J (2019) FlyBase 2.0: the next generation. in Nucleic acids research

 
Description FlyMet Scientific Advisory Board Member
Geographic Reach Multiple continents/international 
Policy Influence Type Participation in an advisory committee
URL http://FlyMet.org
 
Description Introduction of an annotation framework and standards for high-throughput data using the Gene Ontology
Geographic Reach Multiple continents/international 
Policy Influence Type Membership of a guideline committee
 
Title Developing pipeline to produce lists of selected references in FlyBase 
Description An algorithm was developed based on curated data in FlyBase that selects research publications most likely to contain substantial information about a particular gene/gene product. This allows users of FlyBase to view a selected subset of the publications where a gene is studied. 
Type Of Material Computer model/algorithm 
Year Produced 2018 
Provided To Others? Yes  
Impact Previously, key papers on the function of a particular gene were lost in a list of all papers on the gene. These lists provide key papers for each gene, accelerating researchers' ability to perform their own research on that gene. 
URL http://flybase.org/reports/FBgn0000014.html#pubs
 
Title Gene Ontology and Evidence and Conclusion Ontology 
Description Helen Attrill led a working group on annotation rules and standards in high-throughput data annotation using the GO. The group introduced new evidence codes to make annotations derived from high-throughput experiments visible and wrote guidelines for their use. 
Type Of Material Data handling & control 
Year Produced 2017 
Provided To Others? Yes  
Impact Members of the GO consortium were asked to review lists of potential high-throughput annotations. Most groups have now reviewed these sets and retrofitted where required (either by removing the annotations, as they do not satisfy criteria for GO annotation, or by using a high-throughput evidence code). This has helped raise the quality of GO annotated data across many contributing groups. 
URL http://wiki.geneontology.org/index.php/Guide_to_GO_Evidence_Codes
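
As an illustration of how the high-throughput evidence codes make such annotations visible, the following is a minimal Python sketch, not the consortium's own tooling. It assumes annotations in the tab-delimited GAF 2.x format, where the evidence code is the seventh column, and uses the GO high-throughput codes HTP, HDA, HMP, HGI and HEP. The function names and the sample rows (gene identifiers, symbols, GO term assignments and references) are hypothetical placeholders, not real annotations.

```python
# GO high-throughput evidence codes: HTP and its four more specific codes.
HTP_CODES = {"HTP", "HDA", "HMP", "HGI", "HEP"}


def make_row(gene_id, symbol, go_id, evidence):
    """Build a 17-column GAF-style line (hypothetical placeholder values)."""
    cols = [
        "FB", gene_id, symbol, "involved_in", go_id, "PMID:0000000",
        evidence, "", "P", "", "", "protein", "taxon:7227",
        "20180101", "FlyBase", "", "",
    ]
    return "\t".join(cols)


def split_by_evidence(gaf_lines):
    """Split GAF lines into (high_throughput, other) lists of column lists."""
    high_throughput, other = [], []
    for line in gaf_lines:
        # Skip blank lines and header/comment lines, which start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue  # malformed line: no evidence column
        # Column 7 (index 6) holds the evidence code.
        if cols[6] in HTP_CODES:
            high_throughput.append(cols)
        else:
            other.append(cols)
    return high_throughput, other


# Hypothetical sample data for illustration only.
rows = [
    "!gaf-version: 2.2",
    make_row("FBgn9999901", "geneA", "GO:0000001", "IDA"),
    make_row("FBgn9999902", "geneB", "GO:0000002", "HDA"),
    make_row("FBgn9999903", "geneC", "GO:0000003", "HMP"),
    make_row("FBgn9999904", "geneD", "GO:0000004", "IEA"),
]

high, other = split_by_evidence(rows)
print(len(high), "high-throughput;", len(other), "other")
```

Separating annotations in this way is one route by which a user could include or exclude high-throughput-derived annotations in a downstream analysis.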
 
Title Pathway Pages in FlyBase - phase 1 
Description The annotation of genes using the Gene Ontology has been used to build pathway reports in FlyBase. Using an evidence-weighted model of curation, we have produced high-quality lists of pathway members and regulators for 12 major signaling pathways. This project is in its first iteration and will form the foundation for more advanced pathway modeling. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact This has allowed users to quickly access validated lists of pathway members/regulators and assess how much experimental evidence supports their inclusion. Concurrent improvements to GO annotation allow this benefit to spread beyond FlyBase to users of secondary bioinformatics tools. The data are being used to predict new pathway members and drive network modeling. 
URL http://flybase.org/lists/FBgg/pathways
 
Title Summarizing Gene Expression Data in FlyBase 
Description A graphical overview of expression data describing temporal and spatial gene expression patterns. 
Type Of Material Computer model/algorithm 
Year Produced 2019 
Provided To Others? Yes  
Impact Expression data has been made more accessible to users. 
 
Title Summarizing Gene Ontology data in FlyBase 
Description We helped generate Gene Ontology Summary Ribbons summarizing Gene Ontology data for each gene in FlyBase. These Ribbons are derived from the functional data associated with each gene captured using the Gene Ontology. They are graphical gene signatures, designed to give an immediate overview of a gene product's properties and depth of characterization. 
Type Of Material Computer model/algorithm 
Year Produced 2018 
Provided To Others? Yes  
Impact The feedback from the research community that uses FlyBase has been very positive. Researchers report that it provides a very useful overview of what can be an overwhelming amount of data. 
URL http://flybase.org/reports/FBgn0000014.html#go_summary
 
Description FlyBase Consortium membership 
Organisation FlyBase Consortium
Country Global 
Sector Academic/University 
PI Contribution We make GO annotations that are added to FlyBase. We keep FlyBase updated on changes to GO annotation policy set by the Gene Ontology Consortium (GOC) and work with FlyBase curators and developers to implement these changes. We ensure that the Gene Ontology is updated for each release of FlyBase and revise the GO annotations in line with ontology changes. We request new ontology terms from GOC as required by FlyBase curators. We attend FlyBase consortium meetings, lobby for changes to FlyBase that will make Drosophila research more accessible to researchers studying human genes or disease, and assist in implementing these changes. We answer GO-related queries from FlyBase users. We are working with FlyBase to develop a text-mining approach to triage papers for GO and disease model annotation.
Collaborator Contribution They maintain the FlyBase database and associated website where our GO annotations are stored and displayed. They provide developer support for implementing new data types associated with GO annotation and disease model curation. FlyBase literature curators make GO annotations that supplement those made by the MRC-funded GO curator.
Impact PMID:22554788 PMID:22127867 PMID:23125371 PMID:23160412 PMID:24715220
Start Year 2006
 
Description Gene Ontology (GO) consortium membership 
Organisation Gene Ontology Consortium
Country Global 
Sector Charity/Non Profit 
PI Contribution We make GO annotations and are responsible for collating all Drosophila melanogaster annotations and submitting them to GOC. We contribute to the development of the Gene Ontology (request new terms, participate in specialist term development workshops, report errors). We attend GO consortium project meetings and regular conference calls where we contribute to discussion on all aspects of the Gene Ontology project, particularly decisions related to annotation policy and quality control. We participate in the rota that responds to user queries directed to the GO consortium.
Collaborator Contribution The GO consortium load our Drosophila GO annotation set into their database (along with annotations from other species) and make it available for searching and download via their website. They provide us with quality control reports and suggest additional Drosophila annotations (based on phylogenetic analysis and inferences from relationships between terms in the ontology). They provide editorial assistance to change the Gene Ontology in response to our requests for new terms or error fixes.
Impact PMID:23161678 PMID:22102568
Start Year 2006
 
Description UniProt and InterPro collaboration 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Department Protein Sequences Resources
Country United Kingdom 
Sector Academic/University 
PI Contribution We provide GO annotations and mappings between FlyBase genes and UniProt proteins. We incorporate GO annotations for Drosophila made by UniProt into FlyBase and display them on our website. We provide feedback to the InterPro group about their mappings between protein domains and GO terms. UniProt display our GO annotations on their website. We occasionally add GO annotations for non-Drosophilid species directly to the UniProt database.
Collaborator Contribution UniProt display our GO annotations on their website and make Drosophila GO annotations that we display on our website. They assign InterPro domains to UniProt proteins and maintain a mapping between InterPro domains and GO terms; this information is used to infer automatic GO annotations in FlyBase. They provide access to their GO annotation tool for making annotations to non-Drosophilids.
Impact GO annotations based on InterPro domains were updated with each release of FlyBase. In Oct 2013 there were 12,227 such annotations for Drosophila melanogaster.
Start Year 2006
 
Description Biological Function Working Group 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Contribution (Helen Attrill) to the Biological Function working group of the Alliance (of Genome Resources) to help harmonise the way in which GO data is presented in a multi-organism platform.
Year(s) Of Engagement Activity 2017,2018,2019
 
Description Expression Working Group 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Contribution (Helen Attrill) to the Expression working group of the Alliance (of Genome Resources) to help harmonise the way in which expression data is presented in a multi-organism platform and to inform the development of Expression summary displays in FlyBase.
Year(s) Of Engagement Activity 2018,2019
 
Description Introduction to Bioinformatic Resources and Gene Ontology Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact Assisted at a two-day event organized and hosted by UCL aimed at helping researchers with accessing bioinformatics resources. The event was a hands-on workshop in which participants tried several databases and analysis tools via a step-by-step guide, and were then helped with analyzing their own data.
Year(s) Of Engagement Activity 2017
 
Description Lead working group on the annotation of gene product function from high throughput studies 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Helen Attrill led a working group to establish standards in the annotation of results from high-throughput studies using the GO. As a result, 13 annotation projects reviewed and retrofitted data annotated using the GO. A paper resulted from this work and, at the time of publishing, the number of high-throughput-evidenced annotations in the GO database was 34,533, representing 4.5% of the total number of experimentally evidenced annotations in the GO database (from AmiGO, 2018-12-02, 10.5281/zenodo.1899458).
Year(s) Of Engagement Activity 2018
 
Description Metamorphosis workshop for schools 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact School hands-on event for 22 Year 1/Foundation year children looking at the life cycles of frogs, dragonflies and solitary bees, followed by questions from the children about animals and work in science. To allow observation and recording of life cycles as part of the children's topic work, this activity was extended by raising frogs and butterflies in the classroom.
Year(s) Of Engagement Activity 2017
 
Description Organization and attendance at GO consortium meeting 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This annual meeting of the GO consortium is essential for improving GO annotation and developing priorities for the upcoming year
Year(s) Of Engagement Activity 2018
URL http://wiki.geneontology.org/index.php/2017_Cambridge_GOC_Meeting_Agenda
 
Description Pathway Advisor Interviews & Consultation 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact Four experts from different fields of pathway research were interviewed about what data they would find useful for their research and how FlyBase could provide this. This information was then used to generate options for phases 1 and 2 of the FlyBase pathway pages. These options were then reviewed by researchers and the FlyBase team to generate the specification for the first phase.
The information was used to inform the evidence-weighted model for pathway curation.
Year(s) Of Engagement Activity 2016,2017
 
Description Protein Complex Working Group 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Participation (Helen Attrill) in the protein complex working group on annotation standards and practices for protein-containing complexes. Representatives from different databases meet on a weekly to monthly basis.
Year(s) Of Engagement Activity 2018,2019
 
Description School visit (Cambridgeshire) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Primary school hands-on activity for children aged 5 to 6. Fun activity aimed at increasing interest in science and serving as a primer on evolution for younger audiences.
Year(s) Of Engagement Activity 2018
 
Description School visit - Form and Function 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Demonstration and hands-on activity within a school looking at the link between form and function for 30 children aged 5 to 7. At the end, children engaged in a craft activity encouraging them to think about an animal's environment and behaviour and how that relates to coloration.
Year(s) Of Engagement Activity 2019
 
Description Signaling Pathway Working Group 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Curator-led working group to review and revise signaling pathway annotations for Wnt, Ras and GPCR signaling. Subgroups identified annotations to review from other databases and supervised revision.
Year(s) Of Engagement Activity 2018
 
Description Signaling Pathway Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Organization of and participation in a Signaling Pathways Workshop aimed at improving GO annotation practices for signaling pathways. Approximately 20 people attended from different annotation groups, databases and bioinformatics projects to discuss harmonizing curation of signaling pathways.
Year(s) Of Engagement Activity 2018
 
Description Talk at Postgraduate Certificate in Biocuration course 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Presentation and discussion on the pros, cons and pitfalls of curating high-throughput data to a group of ~10 students taking a PG certificate in biocuration, plus ~5 leaders of other databases/curation projects. Students were from different backgrounds, including commercial settings, and many aspects were discussed, in particular the FAIR principles.
Year(s) Of Engagement Activity 2018