DanioPeaks: A Central Resource for Standardised Annotation and Re-annotation of Whole-Genome Data for the Model Vertebrate Zebrafish
Lead Research Organisation:
Imperial College London
Department Name: Institute of Clinical Sciences
Abstract
We address a scientific demand that stems from recent developments in high-throughput genomics, including the landmark ENCODE project and the new 100K genomes project (UK): the need for a suitable vertebrate model that enables high-throughput in vivo functional testing of hypotheses generated from genome-scale annotation projects. With its abundantly available, transparent and externally developing embryos and larvae, large biomass that is crucial for high-throughput methods, fast assays of gene loss of function, a reference genome sequence, and thousands of genetic mutants, zebrafish is one of the best models for studying the structure and function of genomes in vertebrate development and disease. However, zebrafish will not be able to fulfill its potential unless its genome is comprehensively annotated for functional coding and non-coding elements, as the human and mouse genomes are. DanioPeaks addresses this problem by developing a bioinformatic annotation and re-annotation pipeline and providing a resource for the wider genomics community. DanioPeaks aims to develop the processing pipeline for analysis of all published NGS datasets available for zebrafish (over ten thousand datasets) by using the established standardised protocols of ENCODE and modENCODE. It will provide the means to secure the computational power and analysis tools for remapping up to 16.2 TB of NGS experiment data to the most recent (final) version of the zebrafish genome sequence and reanalysing them, and to make these data comparable and available for meta-analysis to the wider scientific community. DanioPeaks will collect all zebrafish NGS raw data into a single database by upload through the zebrafish Data Coordination Centre. Raw data will be processed by the ENCODE processing pipeline and mapped to the GRCz10 genome assembly. Secondary analysis for feature/peak calling will be carried out and the results submitted to a ZFIN-based track hub for visualisation in genome browsers (e.g. Ensembl).
The outcome will be a community repository, a publicly accessible epigenome resource and a multicenter genome resource paper with new biology identified from the reanalyzed zebrafish data in a major genomics journal.
Technical Summary
The DanioPeaks project consists of two main components: first, it generates a computational pipeline for collecting and reprocessing all published zebrafish next-generation-sequencing-based epigenome datasets, totalling 16.2 TB; second, it consists of the coordinated activity of three UK laboratories to manage the pipeline and integrate it with two international consortium efforts, ZENCODE-ITN and DANIO-CODE, which aim to standardise zebrafish epigenomics efforts. DanioPeaks will retrieve all published zebrafish epigenome datasets into the DANIO-CODE data coordination centre. DanioPeaks will then outsource processing of the data to DNAnexus, which uses the ENCODE processing pipelines. The pipelines include remapping of all zebrafish data. Next, uniform presentation of processed data from dozens of laboratories will be carried out and hosted at Imperial College London as track hubs. Finally, the tracks will be made publicly available and mirrored at the DANIO-CODE track hub by ZFIN (the widely used Zebrafish Information Network) at the University of Oregon. Investigators will also initiate networking of PIs of the associated consortia, manage network meetings and publicise DanioPeaks activities to the zebrafish and broader genomics communities. As a result of the DanioPeaks activity, zebrafish epigenome datasets will be freely and conveniently available in various genome browsers, including Ensembl, UCSC and ZENBU, for over 1000 zebrafish laboratories worldwide, over 60 zebrafish labs in the UK, and for cross-species analysis. The outcomes of the project are the processed datasets and a bioinformatics pipeline for future data generation and submission, which represent an important resource for comparative genomics experts, for human geneticists seeking zebrafish models for disease, and for toxicologists and epigeneticists.
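The final step described above, presenting uniformly processed data as track hubs, can be illustrated with a minimal sketch. It assumes the general UCSC-style track hub layout (a hub.txt file, a genomes.txt file and a per-assembly trackDb.txt file) and bigWig signal files. The hub name, e-mail address, track names and URLs below are hypothetical placeholders, not the project's actual files, and the assembly identifier danRer10 is assumed here to be the UCSC name for GRCz10.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Track:
    """One processed signal track (hypothetical fields, for illustration only)."""
    name: str          # unique identifier without spaces
    short_label: str   # brief label shown next to the track
    long_label: str    # longer human-readable description
    big_data_url: str  # location of the bigWig file (placeholder URL)


def hub_txt(hub_name: str, email: str) -> str:
    """Build the top-level hub description file."""
    lines = [
        f"hub {hub_name}",
        f"shortLabel {hub_name}",
        f"longLabel {hub_name} uniformly reprocessed zebrafish tracks",
        "genomesFile genomes.txt",
        f"email {email}",
    ]
    return "\n".join(lines) + "\n"


def genomes_txt(assembly: str) -> str:
    """Point one assembly at its track database file."""
    return f"genome {assembly}\ntrackDb {assembly}/trackDb.txt\n"


def track_db_txt(tracks: List[Track]) -> str:
    """Build one stanza per track; stanzas are separated by a blank line."""
    stanzas = []
    for t in tracks:
        stanzas.append("\n".join([
            f"track {t.name}",
            f"bigDataUrl {t.big_data_url}",
            f"shortLabel {t.short_label}",
            f"longLabel {t.long_label}",
            "type bigWig",
        ]))
    return "\n\n".join(stanzas) + "\n"


if __name__ == "__main__":
    # Hypothetical example tracks; names and URLs are placeholders.
    example_tracks = [
        Track("h3k4me3_dome", "H3K4me3 dome", "H3K4me3 ChIP-seq, dome stage",
              "https://example.org/tracks/h3k4me3_dome.bw"),
        Track("h3k27ac_prim5", "H3K27ac prim-5", "H3K27ac ChIP-seq, prim-5 stage",
              "https://example.org/tracks/h3k27ac_prim5.bw"),
    ]
    print(hub_txt("ExampleZebrafishHub", "curator@example.org"))
    print(genomes_txt("danRer10"))  # assumed assembly identifier for GRCz10
    print(track_db_txt(example_tracks))
```

Generating the hub files from one list of track records keeps labels and file locations consistent across datasets from many laboratories, which is the point of the uniform presentation step.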
Planned Impact
In addition to the academic beneficiaries described in the previous section, the implementation of this project and research generated from it will also have a wider impact on society and patient groups in the longer term. These beneficiaries include:
1. Patient groups
The research in this proposal will be fed into research programmes that identify disease loci. Recent GWA and other studies suggest that the majority of disease-causing SNPs are located in genomic regulatory regions, but to date these have been understudied and very few have been verified functionally. The use of zebrafish could change this, as it is the only vertebrate model with high-throughput capabilities for screening the regulatory function of these regions. The identification of such functional elements will be of benefit since it will lead to better diagnostic tests and potentially therapies.
2. The wider public
The wider public, and in particular schoolchildren, will benefit from the work in this proposal and the activities of the staff employed on it. In collaboration with the Imperial Public Engagement team, the PIs will hold a workshop for secondary school children in West London and their teachers. Such events have the potential to inspire children to study science at A level and university and to apply this knowledge in a wide range of STEM careers that enhance the UK's knowledge economy and global competitiveness in the longer term.
Publications
Baranasic D
(2022)
Multiomic atlas with functional stratification and developmental dynamics of zebrafish cis-regulatory elements.
in Nature genetics
Castro-Mondragon JA
(2022)
JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles.
in Nucleic acids research
Fornes O
(2020)
JASPAR 2020: update of the open-access database of transcription factor binding profiles.
in Nucleic acids research
Khan A
(2018)
JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework.
in Nucleic acids research
Description | We have led the development of the DANIO-CODE Data Coordination Centre and developed uniform pipelines for reprocessing the published zebrafish high-throughput sequencing data. We have completed the functional annotation of regulatory elements in the zebrafish genome using the collected and reprocessed data. The data were released (fourth freeze, mid 2019). In the past year we have made substantial progress in the integrative analysis of the collected data, discovering some of the ground |
Exploitation Route | All the data are accessible in genome browser track hubs and the DCC for use by the scientific community (registration is required until the official release tied to the database publication). In addition to its role as a model organism for vertebrate development, the close relation of zebrafish to commercial fish species of the Cyprinidae family (carp and others) makes its regulatory annotation of interest for comparative genomics of commercial fish species. This project initiated the DANIO-CODE consortiu |
Sectors | Agriculture, Food and Drink; Healthcare; Pharmaceuticals and Medical Biotechnology |
Title | DANIO-CODE DCC (Data Coordination Centre) |
Description | DANIO-CODE is an international collaborative effort that aims to annotate the functional elements of the zebrafish genome. DanioPeaks is a key contributor to this effort. The DCC aims to collect, process and serve to users all available high-throughput sequencing experimental datasets for zebrafish, and to provide data standards and infrastructure for the upload and processing of future data. |
Type Of Material | Database/Collection of data |
Year Produced | 2017 |
Provided To Others? | Yes |
Impact | When released, DANIO-CODE DCC is envisioned to be the main repository of uniformly processed zebrafish transcriptomic, regulatory and epigenetic experimental data based on high-throughput sequencing. |
URL | https://danio-code.zfin.org |
Description | DanioCODE |
Organisation | Karolinska Institute |
Department | Department of Medicine, Huddinge |
Country | Sweden |
Sector | Academic/University |
PI Contribution | DanioCODE is a collaborative initiative to collect, reprocess and reanalyse all publicly available data from next generation sequencing (NGS) experiments for zebrafish (Danio rerio), a leading model organism for vertebrate developmental biology. The aim is to have a resource of similar scope, standardisation and quality as ENCODE (for human and mouse) and modENCODE (fruit fly D. melanogaster and the nematode C. elegans). |
Collaborator Contribution | The postdoc employed on this project (Dr Damir Baranasic) will build and manage the data coordination centre for zebrafish genomics, and with collaborators develop integrative approaches to analysing the data. Specific focus will be on integrative analysis of developmental time courses, a feature not present in ENCODE data or pipelines. |
Impact | No outputs yet; the project has just started. |
Start Year | 2016 |
Description | DanioCODE |
Organisation | King's College London |
Department | Department of Informatics |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | DanioCODE is a collaborative initiative to collect, reprocess and reanalyse all publicly available data from next generation sequencing (NGS) experiments for zebrafish (Danio rerio), a leading model organism for vertebrate developmental biology. The aim is to have a resource of similar scope, standardisation and quality as ENCODE (for human and mouse) and modENCODE (fruit fly D. melanogaster and the nematode C. elegans). |
Collaborator Contribution | The postdoc employed on this project (Dr Damir Baranasic) will build and manage the data coordination centre for zebrafish genomics, and with collaborators develop integrative approaches to analysing the data. Specific focus will be on integrative analysis of developmental time courses, a feature not present in ENCODE data or pipelines. |
Impact | No outputs yet; the project has just started. |
Start Year | 2016 |
Description | DanioCODE |
Organisation | University of Birmingham |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | DanioCODE is a collaborative initiative to collect, reprocess and reanalyse all publicly available data from next generation sequencing (NGS) experiments for zebrafish (Danio rerio), a leading model organism for vertebrate developmental biology. The aim is to have a resource of similar scope, standardisation and quality as ENCODE (for human and mouse) and modENCODE (fruit fly D. melanogaster and the nematode C. elegans). |
Collaborator Contribution | The postdoc employed on this project (Dr Damir Baranasic) will build and manage the data coordination centre for zebrafish genomics, and with collaborators develop integrative approaches to analysing the data. Specific focus will be on integrative analysis of developmental time courses, a feature not present in ENCODE data or pipelines. |
Impact | No outputs yet; the project has just started. |
Start Year | 2016 |
Description | ZENCODE-ITN |
Organisation | Karolinska Institute |
Department | Department of Biosciences and Nutrition |
Country | Sweden |
Sector | Academic/University |
PI Contribution | ZENCODE-ITN is a Marie Curie initial training programme funded by the European Union under the H2020 programme. The ZENCODE Initial Training Network aims to improve career perspectives of early-stage researchers (ESR) in both public and private sectors, thereby making research careers more attractive to young people. The scientific focus of the ZENCODE-ITN consortium is to understand genome regulation through combined experimental and computational approaches in a model vertebrate. The consortium recognises the urgent need for highly skilled young scientists trained in both computational biology and experimental wet-lab biology. This network provides multi-disciplinary skills for a solid foundation in computational biology and developmental genomics. |
Collaborator Contribution | ZENCODE-ITN as a whole aims to comprehensively annotate functional epigenetic and transcribed elements, decipher genomic codes of transcription, as well as coding and non-coding gene function during vertebrate development and enhance zebrafish as an attractive developmental, comparative genomic and disease model. The participants include major zebrafish genomics laboratories, eminent computational biologists and world-class genomics technology experts. The training program is designed for 15 ESRs, with more than 40 intersectoral and interdisciplinary secondments available, 7 training courses and 2 workshops/conferences. Through a trans-national network of public and private partners we aim to enhance the employability of the recruited ESRs through exposure to both academia and enterprise, thus extending the traditional academic research training setting and eliminating cultural and other barriers to mobility. The full list of partners is available at https://www.birmingham.ac.uk/generic/zencode-itn/partners/index.aspx |
Impact | As part of ZENCODE-ITN, we had direct collaborations with the laboratories of Ferenc Mueller (University of Birmingham), Juan M. Vaquerizas (Max Planck Institute, Münster, Germany), Carsten Daub (Karolinska Institutet, Stockholm, Sweden), Bernard Peers (University of Liege, Belgium) and Piero Carninci (RIKEN, Yokohama, Japan). It was a collaboration between computational biology research groups and experimental groups that use zebrafish as a model system. We have published jointly authored papers with all of them. |
Start Year | 2015 |
Description | ZENCODE-ITN |
Organisation | Max Planck Society |
Department | Max Planck Institute for Molecular Biomedicine |
Country | Germany |
Sector | Academic/University |
PI Contribution | ZENCODE-ITN is a Marie Curie initial training programme funded by the European Union under the H2020 programme. The ZENCODE Initial Training Network aims to improve career perspectives of early-stage researchers (ESR) in both public and private sectors, thereby making research careers more attractive to young people. The scientific focus of the ZENCODE-ITN consortium is to understand genome regulation through combined experimental and computational approaches in a model vertebrate. The consortium recognises the urgent need for highly skilled young scientists trained in both computational biology and experimental wet-lab biology. This network provides multi-disciplinary skills for a solid foundation in computational biology and developmental genomics. |
Collaborator Contribution | ZENCODE-ITN as a whole aims to comprehensively annotate functional epigenetic and transcribed elements, decipher genomic codes of transcription, as well as coding and non-coding gene function during vertebrate development and enhance zebrafish as an attractive developmental, comparative genomic and disease model. The participants include major zebrafish genomics laboratories, eminent computational biologists and world-class genomics technology experts. The training program is designed for 15 ESRs, with more than 40 intersectoral and interdisciplinary secondments available, 7 training courses and 2 workshops/conferences. Through a trans-national network of public and private partners we aim to enhance the employability of the recruited ESRs through exposure to both academia and enterprise, thus extending the traditional academic research training setting and eliminating cultural and other barriers to mobility. The full list of partners is available at https://www.birmingham.ac.uk/generic/zencode-itn/partners/index.aspx |
Impact | As part of ZENCODE-ITN, we had direct collaborations with the laboratories of Ferenc Mueller (University of Birmingham), Juan M. Vaquerizas (Max Planck Institute, Münster, Germany), Carsten Daub (Karolinska Institutet, Stockholm, Sweden), Bernard Peers (University of Liege, Belgium) and Piero Carninci (RIKEN, Yokohama, Japan). It was a collaboration between computational biology research groups and experimental groups that use zebrafish as a model system. We have published jointly authored papers with all of them. |
Start Year | 2015 |
Description | ZENCODE-ITN |
Organisation | RIKEN |
Department | Omics Science Center |
Country | Japan |
Sector | Public |
PI Contribution | ZENCODE-ITN is a Marie Curie initial training programme funded by the European Union under the H2020 programme. The ZENCODE Initial Training Network aims to improve career perspectives of early-stage researchers (ESR) in both public and private sectors, thereby making research careers more attractive to young people. The scientific focus of the ZENCODE-ITN consortium is to understand genome regulation through combined experimental and computational approaches in a model vertebrate. The consortium recognises the urgent need for highly skilled young scientists trained in both computational biology and experimental wet-lab biology. This network provides multi-disciplinary skills for a solid foundation in computational biology and developmental genomics. |
Collaborator Contribution | ZENCODE-ITN as a whole aims to comprehensively annotate functional epigenetic and transcribed elements, decipher genomic codes of transcription, as well as coding and non-coding gene function during vertebrate development and enhance zebrafish as an attractive developmental, comparative genomic and disease model. The participants include major zebrafish genomics laboratories, eminent computational biologists and world-class genomics technology experts. The training program is designed for 15 ESRs, with more than 40 intersectoral and interdisciplinary secondments available, 7 training courses and 2 workshops/conferences. Through a trans-national network of public and private partners we aim to enhance the employability of the recruited ESRs through exposure to both academia and enterprise, thus extending the traditional academic research training setting and eliminating cultural and other barriers to mobility. The full list of partners is available at https://www.birmingham.ac.uk/generic/zencode-itn/partners/index.aspx |
Impact | As part of ZENCODE-ITN, we had direct collaborations with the laboratories of Ferenc Mueller (University of Birmingham), Juan M. Vaquerizas (Max Planck Institute, Münster, Germany), Carsten Daub (Karolinska Institutet, Stockholm, Sweden), Bernard Peers (University of Liege, Belgium) and Piero Carninci (RIKEN, Yokohama, Japan). It was a collaboration between computational biology research groups and experimental groups that use zebrafish as a model system. We have published jointly authored papers with all of them. |
Start Year | 2015 |
Description | ZENCODE-ITN |
Organisation | University of Birmingham |
Department | College of Medical and Dental Sciences |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | ZENCODE-ITN is a Marie Curie initial training programme funded by the European Union under the H2020 programme. The ZENCODE Initial Training Network aims to improve career perspectives of early-stage researchers (ESR) in both public and private sectors, thereby making research careers more attractive to young people. The scientific focus of the ZENCODE-ITN consortium is to understand genome regulation through combined experimental and computational approaches in a model vertebrate. The consortium recognises the urgent need for highly skilled young scientists trained in both computational biology and experimental wet-lab biology. This network provides multi-disciplinary skills for a solid foundation in computational biology and developmental genomics. |
Collaborator Contribution | ZENCODE-ITN as a whole aims to comprehensively annotate functional epigenetic and transcribed elements, decipher genomic codes of transcription, as well as coding and non-coding gene function during vertebrate development and enhance zebrafish as an attractive developmental, comparative genomic and disease model. The participants include major zebrafish genomics laboratories, eminent computational biologists and world-class genomics technology experts. The training program is designed for 15 ESRs, with more than 40 intersectoral and interdisciplinary secondments available, 7 training courses and 2 workshops/conferences. Through a trans-national network of public and private partners we aim to enhance the employability of the recruited ESRs through exposure to both academia and enterprise, thus extending the traditional academic research training setting and eliminating cultural and other barriers to mobility. The full list of partners is available at https://www.birmingham.ac.uk/generic/zencode-itn/partners/index.aspx |
Impact | As part of ZENCODE-ITN, we had direct collaborations with the laboratories of Ferenc Mueller (University of Birmingham), Juan M. Vaquerizas (Max Planck Institute, Münster, Germany), Carsten Daub (Karolinska Institutet, Stockholm, Sweden), Bernard Peers (University of Liege, Belgium) and Piero Carninci (RIKEN, Yokohama, Japan). It was a collaboration between computational biology research groups and experimental groups that use zebrafish as a model system. We have published jointly authored papers with all of them. |
Start Year | 2015 |
Description | ZENCODE-ITN |
Organisation | University of Liege |
Department | Interdisciplinary Cluster for Applied Genoproteomics (GIGA) |
Country | Belgium |
Sector | Academic/University |
PI Contribution | ZENCODE-ITN is a Marie Curie initial training programme funded by the European Union under the H2020 programme. The ZENCODE Initial Training Network aims to improve career perspectives of early-stage researchers (ESR) in both public and private sectors, thereby making research careers more attractive to young people. The scientific focus of the ZENCODE-ITN consortium is to understand genome regulation through combined experimental and computational approaches in a model vertebrate. The consortium recognises the urgent need for highly skilled young scientists trained in both computational biology and experimental wet-lab biology. This network provides multi-disciplinary skills for a solid foundation in computational biology and developmental genomics. |
Collaborator Contribution | ZENCODE-ITN as a whole aims to comprehensively annotate functional epigenetic and transcribed elements, decipher genomic codes of transcription, as well as coding and non-coding gene function during vertebrate development and enhance zebrafish as an attractive developmental, comparative genomic and disease model. The participants include major zebrafish genomics laboratories, eminent computational biologists and world-class genomics technology experts. The training program is designed for 15 ESRs, with more than 40 intersectoral and interdisciplinary secondments available, 7 training courses and 2 workshops/conferences. Through a trans-national network of public and private partners we aim to enhance the employability of the recruited ESRs through exposure to both academia and enterprise, thus extending the traditional academic research training setting and eliminating cultural and other barriers to mobility. The full list of partners is available at https://www.birmingham.ac.uk/generic/zencode-itn/partners/index.aspx |
Impact | As part of ZENCODE-ITN, we had direct collaborations with the laboratories of Ferenc Mueller (University of Birmingham), Juan M. Vaquerizas (Max Planck Institute, Münster, Germany), Carsten Daub (Karolinska Institutet, Stockholm, Sweden), Bernard Peers (University of Liege, Belgium) and Piero Carninci (RIKEN, Yokohama, Japan). It was a collaboration between computational biology research groups and experimental groups that use zebrafish as a model system. We have published jointly authored papers with all of them. |
Start Year | 2015 |
Description | Participation as a guest speaker at the European Schools' Science Symposium 2018, European School/Ecole Europeene, Luxembourg |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Schools |
Results and Impact | I presented a talk for secondary school students at the science symposium of European schools, an international competition and conference for students of European schools (full list of schools at https://www.eursc.eu/en/European-Schools/locations ). The abstract of the talk was: "Fish-and-Chips Science: Studying the secrets of life using small fish and large computers" (Boris Lenhard, Imperial College London). Experimental biology studies a variety of model organisms. Many discoveries on model organisms apply to human biology, too. The choice of model is a compromise between its ease of use in the laboratory and its relatedness to humans. Some of the choices include bacteria like E. coli, yeast, frogs, mice, and nonhuman primates. Bacteria and humans share the same genetic code and basic metabolic pathways, so they are good models for studying both. But, if we want to study how an embryo develops into an adult, we need to study a multicellular animal - the closer to human the better. Zebrafish is a small fish found in the rivers and lakes of south Asia. It has become a favourite model to study vertebrate embryonic development. Its embryo is transparent and develops outside of the body, so it can be easily manipulated and studied under a microscope. It takes only a day to grow from a fertilised egg to an embryo with eyes, brain and muscles. The embryo then takes two months to grow into an adult that can produce the next generation of embryos. Comparison of human and zebrafish genomes can tell us which parts of the genome are the most important. These parts include both the genes and the bits that turn genes on and off, and they are under strong evolutionary pressure not to change. I will show how we compare genomes using computer algorithms and computer graphics. This is how we have discovered that thousands of parts of the genome are almost identical between human and fish. These parts control when and where genes switch on and off during embryo development.
It was a big surprise to us that these control parts were much more similar than the genes themselves. We still do not know how they manage to stay so similar for hundreds of millions of years: it is one of the unsolved mysteries of genome biology. ------- After the talk there were multiple questions from students, teachers, and their guests, followed by a tour of the host school and a meeting with its biology teachers. |
Year(s) Of Engagement Activity | 2018 |
URL | http://esss.wp.eursc.eu/wp-content/uploads/sites/2/2019/02/Booklet-ESSS2018.pdf |