DanioPeaks: A Central Resource for Standardised Annotation and Re-annotation of Whole-Genome Data for the Model Vertebrate Zebrafish

Lead Research Organisation: Imperial College London
Department Name: Institute of Clinical Sciences

Abstract

We address a scientific demand that stems from recent developments in high-throughput genomics, including the landmark ENCODE project and the new 100K Genomes Project (UK): the need for a suitable vertebrate model that enables high-throughput in vivo functional testing of hypotheses generated from genome-scale annotation projects. With its abundantly available, transparent and externally developing embryos and larvae, large biomass that is crucial for high-throughput methods, fast assays of gene loss of function, a reference genome sequence, and thousands of genetic mutants, zebrafish is one of the best models for studying the structure and function of genomes in vertebrate development and disease. However, zebrafish will not be able to fulfill its potential unless its genome is comprehensively annotated for functional coding and non-coding elements, as the human and mouse genomes are. DanioPeaks addresses this problem by developing a bioinformatic annotation and re-annotation pipeline and providing a resource for the wider genomics community. DanioPeaks aims to develop the processing pipeline for analysis of all published NGS datasets available for zebrafish (over ten thousand) using the established standardised protocols of ENCODE and modENCODE. It will secure the computational power and analysis tools for remapping and reanalysing up to 16.2 TB of NGS experiment data against the most recent (final) version of the zebrafish genome sequence, and will make these data comparable and available for meta-analysis to the wider scientific community. DanioPeaks will collect all zebrafish NGS raw data into a single database by upload through the zebrafish Data Coordination Centre. Raw data will be processed by the ENCODE processing pipeline and mapped to the GRCz10 genome assembly. Secondary analysis for feature/peak calling will be carried out and the results submitted to a ZFIN-based track hub for visualisation in genome browsers (e.g. Ensembl).
The outcome will be a community repository, a publicly accessible epigenome resource and a multicenter genome resource paper with new biology identified from the reanalyzed zebrafish data in a major genomics journal.
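
The processing steps described in this abstract (collection, mapping, feature/peak calling, submission to a track hub) can be pictured as a simple staged workflow. The following is a minimal sketch in Python, not the project's actual implementation: the class names, the stage names and the accession strings are all hypothetical and are used only to illustrate how each dataset might be tracked through the stages in order.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Hypothetical stage names mirroring the steps listed in the abstract."""
    COLLECTED = 1      # raw reads uploaded to the data coordination centre
    MAPPED = 2         # reads mapped to the reference genome assembly
    PEAKS_CALLED = 3   # secondary analysis (feature/peak calling) finished
    PUBLISHED = 4      # tracks submitted to the track hub


@dataclass
class Dataset:
    accession: str                 # hypothetical identifier, e.g. "DS0001"
    assay: str                     # e.g. "ChIP-seq"
    stage: Stage = Stage.COLLECTED
    history: list = field(default_factory=list)

    def advance(self) -> Stage:
        """Move the dataset to the next stage; refuse to go past the last one."""
        if self.stage is Stage.PUBLISHED:
            raise ValueError(f"{self.accession} is already published")
        self.history.append(self.stage)
        self.stage = Stage(self.stage.value + 1)
        return self.stage


def pending(datasets, stage):
    """Return accessions of all datasets currently waiting at the given stage."""
    return [d.accession for d in datasets if d.stage is stage]
```

Under these assumptions, calling `advance()` three times on a newly collected dataset takes it to `Stage.PUBLISHED`, and `pending(datasets, Stage.COLLECTED)` lists what still needs mapping.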

Technical Summary

The DanioPeaks project consists of two main components: first, it generates a computational pipeline for collecting and reprocessing all published zebrafish next-generation-sequencing-based epigenome datasets, totalling 16.2 TB; second, it consists of the coordinated activity of 3 UK laboratories to manage the pipeline and integrate it with the activities of two international consortia, ZENCODE-ITN and DANIO-CODE, which aim to standardise zebrafish epigenomics efforts. DanioPeaks will retrieve all published zebrafish epigenome datasets into the DANIO-CODE data coordination centre. DanioPeaks will then outsource processing of the data to DNAnexus, which uses the ENCODE processing pipelines. The pipelines include remapping of all zebrafish data. Next, processed data from dozens of laboratories will be presented uniformly and hosted at Imperial College London as track hubs. Finally, the tracks will be made publicly available and mirrored at the DANIO-CODE track hub by ZFIN (the widely used Zebrafish Information Network) at the University of Oregon. The investigators will also initiate networking of PIs of the associated consortia, manage network meetings and publicise DanioPeaks activities to the zebrafish and broader genomics communities. As a result of the DanioPeaks activity, zebrafish epigenome datasets will be freely and conveniently available in various genome browsers, including Ensembl, UCSC, Zenbu etc., for over 1000 zebrafish laboratories worldwide, over 60 zebrafish labs in the UK, and for cross-species analysis. The outcomes of the project are the processed datasets and a bioinformatics pipeline for future data generation and submission; they represent an important resource for comparative genomics experts, for human geneticists seeking zebrafish models for disease, and for toxicologists and epigeneticists.
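
Track hubs of the kind mentioned above are, in the UCSC convention, described by small plain-text files: a genomes.txt that points to a trackDb.txt per assembly, and one stanza per track in trackDb.txt. The sketch below shows how such stanzas might be generated for processed signal files. It is an illustration only: the track names, labels and example.org URLs are hypothetical, and the UCSC assembly name danRer10 (corresponding to GRCz10) is an assumption about which assembly a hub would target.

```python
def trackdb_stanza(name, url, short_label, long_label, track_type="bigWig"):
    """Build one trackDb.txt stanza using standard track hub settings."""
    if any(ch.isspace() for ch in name):
        # Track names are single tokens, so whitespace is rejected here.
        raise ValueError(f"track name must not contain whitespace: {name!r}")
    fields = [
        ("track", name),
        ("bigDataUrl", url),
        ("shortLabel", short_label),
        ("longLabel", long_label),
        ("type", track_type),
    ]
    return "\n".join(f"{key} {value}" for key, value in fields)


def build_trackdb(tracks):
    """Join stanzas with blank lines, as trackDb.txt expects."""
    return "\n\n".join(trackdb_stanza(**t) for t in tracks) + "\n"


def genomes_txt(assembly, trackdb_path):
    """Build the genomes.txt entry that links an assembly to its trackDb.txt."""
    return f"genome {assembly}\ntrackDb {trackdb_path}\n"


# Hypothetical example tracks; URLs are placeholders, not real data.
example_tracks = [
    {
        "name": "h3k4me3_dome",
        "url": "https://example.org/hub/h3k4me3_dome.bw",
        "short_label": "H3K4me3 dome",
        "long_label": "H3K4me3 ChIP-seq signal, dome stage",
    },
    {
        "name": "atac_prim5",
        "url": "https://example.org/hub/atac_prim5.bw",
        "short_label": "ATAC prim-5",
        "long_label": "ATAC-seq signal, prim-5 stage",
    },
]

if __name__ == "__main__":
    print(genomes_txt("danRer10", "danRer10/trackDb.txt"))
    print(build_trackdb(example_tracks))
```

Generating the stanzas from one list of track records, rather than editing the text files by hand, is one way to keep labels and file locations uniform across data from many laboratories.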

Planned Impact

In addition to the academic beneficiaries described in the previous section, the implementation of this project and research generated from it will also have a wider impact on society and patient groups in the longer term. These beneficiaries include:

1. Patient groups
The research in this proposal will feed into research programmes that identify disease loci. Recent genome-wide association (GWA) and other studies suggest that the majority of disease-causing SNPs are located in genomic regulatory regions, but to date these have been understudied and very few have been verified functionally. The use of zebrafish could change this, as it is the only vertebrate model with high-throughput capabilities for screening the regulatory function of these regions. The identification of such functional elements will be of benefit since it will lead to better diagnostic tests and potentially therapies.

2. The wider public
The wider public, and in particular schoolchildren, will benefit from the work in this proposal and the activities of the staff employed on it. In collaboration with the Imperial Public Engagement team, the PIs will hold a workshop for secondary school children in West London and their teachers. These events have the potential to inspire children to study science at A level and university and to apply this knowledge in a wide range of STEM careers that enhance the UK's knowledge economy and global competitiveness in the longer term.
 
Description We have led the development of the DANIO-CODE Data Coordination Centre and developed uniform pipelines for the reprocessing of the published zebrafish high-throughput sequencing data. We have completed the functional annotation of regulatory elements in the zebrafish genome using the collected and reprocessed data. The data were released (fourth freeze, mid 2019). In the past year we have made substantial progress in the integrative analysis of the collected data, discovering some of the ground
Exploitation Route All the data are accessible in genome browser track hubs and the DCC for use by the scientific community (registration required until the official release tied to database publication). In addition to its role as a model organism for vertebrate development, the close relation of zebrafish to commercial fish species of the Cyprinidae family (carp and others) makes its regulatory annotation of interest for comparative genomics of commercial fish species.

This project initiated the DANIO-CODE consortiu
Sectors Agriculture, Food and Drink; Healthcare; Pharmaceuticals and Medical Biotechnology

 
Title DANIO-CODE DCC (Data Coordination Centre) 
Description DANIO-CODE is an international collaborative effort that aims to annotate the functional elements of the zebrafish genome. DanioPeaks is a key contributor to this effort. The DCC aims to collect, process and serve to users all available high-throughput sequencing experimental datasets for zebrafish, and to provide data standards and infrastructure for the upload and processing of future data. 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Impact When released, DANIO-CODE DCC is envisioned to be the main repository of uniformly processed zebrafish transcriptomic, regulatory and epigenetic experimental data based on high-throughput sequencing. 
URL https://danio-code.zfin.org
 
Description DanioCODE 
Organisation Karolinska Institute
Department Department of Medicine, Huddinge
Country Sweden 
Sector Academic/University 
PI Contribution DanioCODE is a collaborative initiative to collect, reprocess and reanalyse all publicly available data from next generation sequencing (NGS) experiments for zebrafish (Danio rerio), a leading model organism for vertebrate developmental biology. The aim is to have a resource of similar scope, standardisation and quality as ENCODE (for human and mouse) and modENCODE (fruit fly D. melanogaster and the nematode C. elegans).
Collaborator Contribution The postdoc employed on this project (Dr Damir Baranasic) will build and manage the data coordination centre for zebrafish genomics, and with collaborators develop integrative approaches to analysing the data. Specific focus will be on integrative analysis of developmental time courses, a feature not present in ENCODE data or pipelines.
Impact No outputs yet; the project has just started.
Start Year 2016
 
Description DanioCODE 
Organisation King's College London
Department Department of Informatics
Country United Kingdom 
Sector Academic/University 
PI Contribution DanioCODE is a collaborative initiative to collect, reprocess and reanalyse all publicly available data from next generation sequencing (NGS) experiments for zebrafish (Danio rerio), a leading model organism for vertebrate developmental biology. The aim is to have a resource of similar scope, standardisation and quality as ENCODE (for human and mouse) and modENCODE (fruit fly D. melanogaster and the nematode C. elegans).
Collaborator Contribution The postdoc employed on this project (Dr Damir Baranasic) will build and manage the data coordination centre for zebrafish genomics, and with collaborators develop integrative approaches to analysing the data. Specific focus will be on integrative analysis of developmental time courses, a feature not present in ENCODE data or pipelines.
Impact No outputs yet; the project has just started.
Start Year 2016
 
Description DanioCODE 
Organisation University of Birmingham
Country United Kingdom 
Sector Academic/University 
PI Contribution DanioCODE is a collaborative initiative to collect, reprocess and reanalyse all publicly available data from next generation sequencing (NGS) experiments for zebrafish (Danio rerio), a leading model organism for vertebrate developmental biology. The aim is to have a resource of similar scope, standardisation and quality as ENCODE (for human and mouse) and modENCODE (fruit fly D. melanogaster and the nematode C. elegans).
Collaborator Contribution The postdoc employed on this project (Dr Damir Baranasic) will build and manage the data coordination centre for zebrafish genomics, and with collaborators develop integrative approaches to analysing the data. Specific focus will be on integrative analysis of developmental time courses, a feature not present in ENCODE data or pipelines.
Impact No outputs yet; the project has just started.
Start Year 2016
 
Description ZENCODE-ITN 
Organisation Karolinska Institute
Department Department of Biosciences and Nutrition
Country Sweden 
Sector Academic/University 
PI Contribution ZENCODE-ITN is a Marie Curie initial training programme funded by the European Union under the H2020 programme. The ZENCODE Initial Training Network aims to improve career perspectives of early-stage researchers (ESR) in both public and private sectors, thereby making research careers more attractive to young people. The scientific focus of the ZENCODE-ITN consortium is to understand genome regulation through combined experimental and computational approaches in a model vertebrate. The consortium recognises the urgent need for highly skilled young scientists trained in both computational biology and experimental wet lab biology. This network provides multi-disciplinary skills for a solid foundation in computational biology and developmental genomics.
Collaborator Contribution ZENCODE-ITN as a whole aims to comprehensively annotate functional epigenetic and transcribed elements, decipher genomic codes of transcription, as well as coding and non-coding gene function during vertebrate development and enhance zebrafish as an attractive developmental, comparative genomic and disease model. The participants include major zebrafish genomics laboratories, eminent computational biologists and world-class genomics technology experts. The training program is designed for 15 ESRs, with more than 40 intersectoral and interdisciplinary secondments available, 7 training courses and 2 workshops/conferences. Through a trans-national network of public and private partners we aim to enhance the employability of the recruited ESRs through exposure to both academia and enterprise, thus extending the traditional academic research training setting and eliminating cultural and other barriers to mobility. The full list of partners is available at https://www.birmingham.ac.uk/generic/zencode-itn/partners/index.aspx
Impact As part of ZENCODE-ITN, we had direct collaborations with the laboratories of Ferenc Mueller (University of Birmingham), Juan M. Vaquerizas (Max Planck Institute, Münster, Germany), Carsten Daub (Karolinska Institutet, Stockholm, Sweden), Bernard Peers (University of Liege, Belgium) and Piero Carninci (RIKEN, Yokohama, Japan). It was a collaboration of computational biology research groups and experimental groups that use zebrafish as a model system. We have published jointly authored papers with all of them.
Start Year 2015
 
Description ZENCODE-ITN 
Organisation Max Planck Society
Department Max Planck Institute for Molecular Biomedicine
Country Germany 
Sector Academic/University 
PI Contribution ZENCODE-ITN is a Marie Curie initial training programme funded by the European Union under the H2020 programme. The ZENCODE Initial Training Network aims to improve career perspectives of early-stage researchers (ESR) in both public and private sectors, thereby making research careers more attractive to young people. The scientific focus of the ZENCODE-ITN consortium is to understand genome regulation through combined experimental and computational approaches in a model vertebrate. The consortium recognises the urgent need for highly skilled young scientists trained in both computational biology and experimental wet lab biology. This network provides multi-disciplinary skills for a solid foundation in computational biology and developmental genomics.
Collaborator Contribution ZENCODE-ITN as a whole aims to comprehensively annotate functional epigenetic and transcribed elements, decipher genomic codes of transcription, as well as coding and non-coding gene function during vertebrate development and enhance zebrafish as an attractive developmental, comparative genomic and disease model. The participants include major zebrafish genomics laboratories, eminent computational biologists and world-class genomics technology experts. The training program is designed for 15 ESRs, with more than 40 intersectoral and interdisciplinary secondments available, 7 training courses and 2 workshops/conferences. Through a trans-national network of public and private partners we aim to enhance the employability of the recruited ESRs through exposure to both academia and enterprise, thus extending the traditional academic research training setting and eliminating cultural and other barriers to mobility. The full list of partners is available at https://www.birmingham.ac.uk/generic/zencode-itn/partners/index.aspx
Impact As part of ZENCODE-ITN, we had direct collaborations with the laboratories of Ferenc Mueller (University of Birmingham), Juan M. Vaquerizas (Max Planck Institute, Münster, Germany), Carsten Daub (Karolinska Institutet, Stockholm, Sweden), Bernard Peers (University of Liege, Belgium) and Piero Carninci (RIKEN, Yokohama, Japan). It was a collaboration of computational biology research groups and experimental groups that use zebrafish as a model system. We have published jointly authored papers with all of them.
Start Year 2015
 
Description ZENCODE-ITN 
Organisation RIKEN
Department Omics Science Center
Country Japan 
Sector Public 
PI Contribution ZENCODE-ITN is a Marie Curie initial training programme funded by the European Union under the H2020 programme. The ZENCODE Initial Training Network aims to improve career perspectives of early-stage researchers (ESR) in both public and private sectors, thereby making research careers more attractive to young people. The scientific focus of the ZENCODE-ITN consortium is to understand genome regulation through combined experimental and computational approaches in a model vertebrate. The consortium recognises the urgent need for highly skilled young scientists trained in both computational biology and experimental wet lab biology. This network provides multi-disciplinary skills for a solid foundation in computational biology and developmental genomics.
Collaborator Contribution ZENCODE-ITN as a whole aims to comprehensively annotate functional epigenetic and transcribed elements, decipher genomic codes of transcription, as well as coding and non-coding gene function during vertebrate development and enhance zebrafish as an attractive developmental, comparative genomic and disease model. The participants include major zebrafish genomics laboratories, eminent computational biologists and world-class genomics technology experts. The training program is designed for 15 ESRs, with more than 40 intersectoral and interdisciplinary secondments available, 7 training courses and 2 workshops/conferences. Through a trans-national network of public and private partners we aim to enhance the employability of the recruited ESRs through exposure to both academia and enterprise, thus extending the traditional academic research training setting and eliminating cultural and other barriers to mobility. The full list of partners is available at https://www.birmingham.ac.uk/generic/zencode-itn/partners/index.aspx
Impact As part of ZENCODE-ITN, we had direct collaborations with the laboratories of Ferenc Mueller (University of Birmingham), Juan M. Vaquerizas (Max Planck Institute, Münster, Germany), Carsten Daub (Karolinska Institutet, Stockholm, Sweden), Bernard Peers (University of Liege, Belgium) and Piero Carninci (RIKEN, Yokohama, Japan). It was a collaboration of computational biology research groups and experimental groups that use zebrafish as a model system. We have published jointly authored papers with all of them.
Start Year 2015
 
Description ZENCODE-ITN 
Organisation University of Birmingham
Department College of Medical and Dental Sciences
Country United Kingdom 
Sector Academic/University 
PI Contribution ZENCODE-ITN is a Marie Curie initial training programme funded by the European Union under the H2020 programme. The ZENCODE Initial Training Network aims to improve career perspectives of early-stage researchers (ESR) in both public and private sectors, thereby making research careers more attractive to young people. The scientific focus of the ZENCODE-ITN consortium is to understand genome regulation through combined experimental and computational approaches in a model vertebrate. The consortium recognises the urgent need for highly skilled young scientists trained in both computational biology and experimental wet lab biology. This network provides multi-disciplinary skills for a solid foundation in computational biology and developmental genomics.
Collaborator Contribution ZENCODE-ITN as a whole aims to comprehensively annotate functional epigenetic and transcribed elements, decipher genomic codes of transcription, as well as coding and non-coding gene function during vertebrate development and enhance zebrafish as an attractive developmental, comparative genomic and disease model. The participants include major zebrafish genomics laboratories, eminent computational biologists and world-class genomics technology experts. The training program is designed for 15 ESRs, with more than 40 intersectoral and interdisciplinary secondments available, 7 training courses and 2 workshops/conferences. Through a trans-national network of public and private partners we aim to enhance the employability of the recruited ESRs through exposure to both academia and enterprise, thus extending the traditional academic research training setting and eliminating cultural and other barriers to mobility. The full list of partners is available at https://www.birmingham.ac.uk/generic/zencode-itn/partners/index.aspx
Impact As part of ZENCODE-ITN, we had direct collaborations with the laboratories of Ferenc Mueller (University of Birmingham), Juan M. Vaquerizas (Max Planck Institute, Münster, Germany), Carsten Daub (Karolinska Institutet, Stockholm, Sweden), Bernard Peers (University of Liege, Belgium) and Piero Carninci (RIKEN, Yokohama, Japan). It was a collaboration of computational biology research groups and experimental groups that use zebrafish as a model system. We have published jointly authored papers with all of them.
Start Year 2015
 
Description ZENCODE-ITN 
Organisation University of Liege
Department Interdisciplinary Cluster for Applied Genoproteomics (GIGA)
Country Belgium 
Sector Academic/University 
PI Contribution ZENCODE-ITN is a Marie Curie initial training programme funded by the European Union under the H2020 programme. The ZENCODE Initial Training Network aims to improve career perspectives of early-stage researchers (ESR) in both public and private sectors, thereby making research careers more attractive to young people. The scientific focus of the ZENCODE-ITN consortium is to understand genome regulation through combined experimental and computational approaches in a model vertebrate. The consortium recognises the urgent need for highly skilled young scientists trained in both computational biology and experimental wet lab biology. This network provides multi-disciplinary skills for a solid foundation in computational biology and developmental genomics.
Collaborator Contribution ZENCODE-ITN as a whole aims to comprehensively annotate functional epigenetic and transcribed elements, decipher genomic codes of transcription, as well as coding and non-coding gene function during vertebrate development and enhance zebrafish as an attractive developmental, comparative genomic and disease model. The participants include major zebrafish genomics laboratories, eminent computational biologists and world-class genomics technology experts. The training program is designed for 15 ESRs, with more than 40 intersectoral and interdisciplinary secondments available, 7 training courses and 2 workshops/conferences. Through a trans-national network of public and private partners we aim to enhance the employability of the recruited ESRs through exposure to both academia and enterprise, thus extending the traditional academic research training setting and eliminating cultural and other barriers to mobility. The full list of partners is available at https://www.birmingham.ac.uk/generic/zencode-itn/partners/index.aspx
Impact As part of ZENCODE-ITN, we had direct collaborations with the laboratories of Ferenc Mueller (University of Birmingham), Juan M. Vaquerizas (Max Planck Institute, Münster, Germany), Carsten Daub (Karolinska Institutet, Stockholm, Sweden), Bernard Peers (University of Liege, Belgium) and Piero Carninci (RIKEN, Yokohama, Japan). It was a collaboration of computational biology research groups and experimental groups that use zebrafish as a model system. We have published jointly authored papers with all of them.
Start Year 2015
 
Description Participation as a guest speaker at the European Schools' Science Symposium 2018, European School/Ecole Europeene, Luxembourg 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Schools
Results and Impact I presented a talk for secondary school students at the science symposium of European schools, an international competition and conference for students of European schools (full list of schools at https://www.eursc.eu/en/European-Schools/locations). The abstract of the talk was:

Fish-and-Chips Science: Studying the secrets of life using small fish and large computers
Boris Lenhard
Imperial College London

Experimental biology studies a variety of model organisms. Many discoveries on model organisms apply to human biology, too. The choice of model is a compromise between its ease of use in the laboratory and its relatedness to humans. Some of the choices include bacteria like E. coli, yeast, frogs, mice, and nonhuman primates. Bacteria and humans share the same genetic code and basic metabolic pathways, so they are good models for studying both. But, if we want to study how an embryo develops into an adult, we need to study a multicellular animal - the closer to human the better.
Zebrafish is a small fish found in the rivers and lakes of south Asia. It has become a favourite model to study vertebrate embryonic development. Its embryo is transparent and develops outside of the body, so it can be easily manipulated and studied under a microscope. It takes only a day to grow from a fertilised egg to an embryo with eyes, brain and muscles. The embryo then takes two months to grow into an adult that can produce the next generation of embryos.
Comparison of human and zebrafish genomes can tell us which parts of the genome are the most important. These parts include both the genes and the bits that turn genes on and off, and they are under strong evolutionary pressure not to change. I will show how we compare genomes using computer algorithms and computer graphics. This is how we have discovered that thousands of parts of the genome are almost identical between human and fish. These parts control when and where genes switch on and off during embryo development. It was a big surprise to us that these control parts were much more similar than the genes themselves. We still do not know how they manage to stay so similar for hundreds of millions of years: it is one of the unsolved mysteries of genome biology.

-------

After the talk there were multiple questions from students, teachers, and their guests, followed by a tour of the host school and a meeting with its biology teachers.
Year(s) Of Engagement Activity 2018
URL http://esss.wp.eursc.eu/wp-content/uploads/sites/2/2019/02/Booklet-ESSS2018.pdf