BBSRC-NSF/BIO: Integrative analysis and Visualisation of Fly Cell Atlas datasets to enable cross-species comparisons

Lead Research Organisation: University of Cambridge
Department Name: Physiology Development and Neuroscience


Technical Summary

This proposal has three main aims. The first is to develop computational analysis pipelines for scRNA-seq data in Drosophila melanogaster, covering batch correction, cell clustering, marker gene detection, trajectory and differential analysis, and cell type annotation. This will create standardised workflows that can be run across the different Fly Cell Atlas (FCA) datasets, with metadata curated using ontology terms and genetic feature identifiers. The annotation stage will encourage and capture curation by scientists in the fly community with expertise in various tissues and cell types.
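As an illustration of one stage named above, marker gene detection can be sketched as ranking genes by the log2 fold change between a cluster's mean expression and the mean in all other cells. This is a minimal sketch under stated assumptions, not the project's pipeline: the function name `rank_marker_genes`, the dictionary layout, the pseudocount, and the toy gene and cell names are all hypothetical.

```python
import math


def rank_marker_genes(expression, clusters, pseudocount=1.0):
    """Rank genes per cluster by log2 fold change versus all other cells.

    expression: dict mapping cell id -> dict of gene -> normalised expression
    clusters:   dict mapping cell id -> cluster label
    Returns a dict mapping cluster label -> list of (gene, log2fc),
    sorted with the strongest candidate marker first.
    """
    genes = sorted({g for counts in expression.values() for g in counts})
    markers = {}
    for label in sorted(set(clusters.values())):
        inside = [c for c in expression if clusters[c] == label]
        outside = [c for c in expression if clusters[c] != label]
        scored = []
        for gene in genes:
            mean_in = sum(expression[c].get(gene, 0.0) for c in inside) / len(inside)
            # With a single cluster there is no background; treat it as zero.
            mean_out = (
                sum(expression[c].get(gene, 0.0) for c in outside) / len(outside)
                if outside
                else 0.0
            )
            # The pseudocount avoids division by zero for unexpressed genes.
            log2fc = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
            scored.append((gene, log2fc))
        scored.sort(key=lambda item: item[1], reverse=True)
        markers[label] = scored
    return markers


# Toy data with placeholder names (hypothetical, for illustration only).
toy_expression = {
    "cell1": {"geneA": 10.0, "geneB": 0.0},
    "cell2": {"geneA": 8.0, "geneB": 1.0},
    "cell3": {"geneA": 0.0, "geneB": 9.0},
    "cell4": {"geneA": 1.0, "geneB": 11.0},
}
toy_clusters = {"cell1": "c0", "cell2": "c0", "cell3": "c1", "cell4": "c1"}

toy_markers = rank_marker_genes(toy_expression, toy_clusters)
print(toy_markers["c0"][0][0], toy_markers["c1"][0][0])
```

A production workflow would normally add a statistical test and multiple-testing correction; fold change alone is used here only to keep the sketch short.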

The second aim is to develop the fly-specific functionality of scExpression Atlas, allowing easy identification of both the raw and processed data, as well as functionality to visualise cell type expression data in FlyBase and the Drosophila resources at Harvard University. Key to this is the enhancement of FCA data visualisation: gene set enrichment analysis tools will be developed, and Anatomograms will be available as embeddable widgets so that specific experiments can be easily embedded by different websites.

Lastly, comparative analyses will be performed using datasets from FCA, the Mouse Cell Atlas and the Human Cell Atlas. Orthologous relationships will be used to map genes from one species to another to generate a combined, integrated dataset. Different methodologies will be explored for dataset comparison, and both mappings and ontologies will be extended and improved. The scExpression Atlas user interface will be further developed to enable users to interrogate the data across species, in addition to analysis by cell type and tissue. In collaboration with the FCA community we will extend the scExpression Atlas APIs, download formats and associated software to allow data re-use and re-analysis, and so promote Open Science.
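The ortholog-based mapping described above can be sketched as follows. This is one simple approach under stated assumptions, not the method the project commits to: it keeps only one-to-one ortholog pairs, renames fly genes to their human ortholog, and restricts both datasets to the shared genes. The function names, the dictionary layout, and all gene and cell identifiers are hypothetical placeholders.

```python
from collections import Counter


def one_to_one_orthologs(pairs):
    """Keep only ortholog pairs in which each gene appears exactly once."""
    fly_counts = Counter(fly for fly, _ in pairs)
    human_counts = Counter(human for _, human in pairs)
    return {
        fly: human
        for fly, human in pairs
        if fly_counts[fly] == 1 and human_counts[human] == 1
    }


def integrate(fly_expr, human_expr, fly_to_human):
    """Project both datasets onto the shared ortholog gene space.

    fly_expr / human_expr: dict mapping cell id -> dict of gene -> expression
    fly_to_human:          dict mapping fly gene -> human ortholog
    Returns a dict keyed by (species, cell id), with human gene names as
    the common feature identifiers.
    """
    fly_genes = {g for counts in fly_expr.values() for g in counts}
    human_genes = {g for counts in human_expr.values() for g in counts}
    # Only genes measured in both species can be compared.
    shared = {
        fly: human
        for fly, human in fly_to_human.items()
        if fly in fly_genes and human in human_genes
    }
    combined = {}
    for cell, counts in fly_expr.items():
        combined[("fly", cell)] = {h: counts.get(f, 0.0) for f, h in shared.items()}
    for cell, counts in human_expr.items():
        combined[("human", cell)] = {h: counts.get(h, 0.0) for h in shared.values()}
    return combined


# Placeholder ortholog table: fly_g2 and fly_g3 both map to HUMAN_G2,
# so neither pair survives the one-to-one filter.
toy_pairs = [
    ("fly_g1", "HUMAN_G1"),
    ("fly_g2", "HUMAN_G2"),
    ("fly_g3", "HUMAN_G2"),
    ("fly_g4", "HUMAN_G4"),
]
toy_fly = {"f1": {"fly_g1": 5.0, "fly_g2": 2.0}}
toy_human = {"h1": {"HUMAN_G1": 3.0, "HUMAN_G2": 7.0}}

toy_mapping = one_to_one_orthologs(toy_pairs)
toy_combined = integrate(toy_fly, toy_human, toy_mapping)
print(toy_combined)
```

Discarding one-to-many and many-to-many relationships is the simplest design choice but loses information; alternatives such as summing expression over paralogues are among the methodologies that could be compared.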

Planned Impact

The fruit fly, Drosophila melanogaster, has for the last century been fundamental to the study of genetics. It is used in many areas of research as the model organism of choice, as it provides the ability to study genetics in the laboratory and apply findings to human genetics. The vast majority of the fundamental biochemical mechanisms and pathways are conserved between fly and humans. Indeed, 75% of the genes that cause human disease are found in fly and, thus, fly data provide insights into the same processes within humans.

The emergence of a new technology, single cell RNA sequencing (scRNA-seq), has provided information as to which genes are switched on or most active within a single cell. These data are generating fundamental new insights into how cells differentiate into specific cell types, and what a cell type represents at the molecular level. The increasing number of scRNA-seq datasets from different species encouraged us to develop the Single Cell Expression Atlas (scEA), a web portal which enables users to access and interpret these data more easily. It is anticipated that the number of Drosophila single cell datasets will increase from 10 to ~100 in 2020, with a further two-fold increase in 2021. Key to the scientific exploitation of these data will be the effective analysis of the fly data and the ability to explore interconnections between fly data and human and mouse data.

In this project we will provide the means by which fly datasets can be easily interpreted and linked to mouse and human datasets via scEA. This project will enable analysis pipelines to be developed to combine the available and emerging datasets, alongside the necessary computational infrastructure to host the Fly Cell Atlas (FCA) datasets. scEA will provide users with an easy-to-navigate web service with exploratory querying capability, in addition to data download capabilities for further data analysis. The service will be fully integrated with the established fly resources, FlyBase and the Drosophila Resources at Harvard University.
Data sets and derived analysis results will be easily accessible in standard formats to be reused by: (1) wet-lab biologists investigating new experimental hypotheses, and comparing published datasets to their own new results; (2) computational biologists engaged in new development of analysis tools or machine learning applications where access to well curated and standardised data sets is essential.

The availability of the combined Fly Cell Atlas through user-friendly interfaces at Harvard and EMBL-EBI will contribute greatly to all projects investigating transcription at the single cell level. By providing molecular signatures of each cell type, the Fly Cell Atlas data will aid the identification of the cell types altered when genes are mutated, including models of human diseases. Mapping cell types across species will permit verification of the similarities in the underlying cellular defects caused by loss of similar gene function in human and fly.

Establishing a robust atlas of cell types in Drosophila will also aid projects aimed at controlling insects that are vectors of disease or agricultural pests, by providing basic knowledge of the cell types that can be used to target novel control strategies. With a rise in pesticide resistance and the negative environmental impact of pesticides, the understanding of Drosophila biology underpins development of new strategies. Functional interpretation of the genomes of disease-carrying insects and crop pests relies heavily on the extensive experimental data from Drosophila.

Methods developed within this project will be applicable to biological and bioinformatics communities beyond researchers working in fly, mouse or human. With the dissemination of analysis tools in containerised form and their availability in public registries, we expect their usage to extend to a wider range of computational biologists.

Publications

Gramates LS (2022) FlyBase: a guided tour of highlighted features. in Genetics

Matentzoglu N (2022) Ontology Development Kit: a toolkit for building, maintaining and standardizing biomedical ontologies. in Database : the journal of biological databases and curation