Computational Regulatory Genomics

Lead Research Organisation: MRC London Institute of Medical Sciences

Abstract

Development of complex organisms critically depends on the regulation of gene activity across dozens, hundreds or millions of cells. This regulation ensures the fidelity of the development of complex structures and the overall function of the organism. Its disruption often leads to disease. We investigate how regulatory information is encoded in the DNA, and how it is communicated across cells in multicellular organisms. The main objects of our research are:
• The structure and function of gene promoters - parts of DNA at the beginning of each gene that bind the molecular machinery that transcribes DNA into RNA;
• The function and genomic distribution of gene regulatory elements - parts of DNA at varying distances from promoters that affect the rate/level of transcription;
• The function of transcription factors - proteins that bind to specific motifs in the regulatory elements and in this way affect the activity of genes;
• Transcriptional regulatory networks (TRNs) - transcription factors are themselves products of gene transcription and translation, and the transcription of the genes that encode them is regulated by other transcription factors and often by the factor itself. The complex regulatory interrelationships of this kind are known as TRNs;
• The association of different modes of regulation with epigenetic marks - chemical modifications of DNA or of the protein complexes it is packaged with in chromosomes - and their inheritance.

Technical Summary

The central interests of the Computational Regulatory Genomics group are genomics and epigenomics of gene regulation in higher eukaryotes. We discovered several fundamental features of genes under developmental regulation and established the link between the type of core promoter and its responsiveness to long-range regulation, and developed a number of widely used computational methods and data resources for studying the regulatory content of genomes. Our current aim is to understand the molecular and mechanistic basis for different modes of gene regulation associated with the control of cell cycle, multicellular processes and regulation in terminally differentiated cells. In addition to our models of long-range regulation, the newly available epigenomic data contains crucial information for distinguishing these modes in development, differentiation and disease.
Our core research program is to identify the features of genes under different modes and regimes of gene regulation. Different genes have regulatory territories that vastly differ in size, are often nested, and exhibit distinct patterns of epigenetic modifications, or a succession of such patterns. By investigating genomic regulatory territories, we want to establish the sequence features and epigenomic patterns associated with different modes of gene regulation. We want to find out which properties of genes are responsible for the correct sorting and assignment of inputs in complex, nonlinear gene loci. Among the most exciting hypotheses we are currently investigating are the fundamentally different modes of regulation of genes in dividing vs non-dividing, terminally differentiated cells, and time-sharing of core promoters. We plan to investigate how complex input from the cell environment is interpreted by megabase-sized cis-regulatory arrays of these genes to result in tightly coordinated development of complex multicellular structures.
Our primary focus is biological insight. Nonetheless, we develop reusable methods for our analyses and share them with the community. This has proven to be a highly productive approach for subsequent analyses. In the forthcoming period our method development will focus on: i) Transcription factor binding and transcription factor complexes: we are developing a computational toolbox for the analysis and visualisation of large sets of transcription factor binding sites and other regulatory elements, and for their integration with other types of genome-wide data; ii) Computational methodology for comparative epigenomics: the next several years will see a growing need to analyse hundreds or thousands of epigenomes simultaneously - across tissues, time courses, epidemiological cohorts or different species. Computational capacity, expertise and tools for analysis at this scale are currently largely unavailable. Building on and inspired by methods for the segmentation of epigenomic regions, we plan to build methods for the characterisation of individual regulatory elements and their ensembles around genes under complex regulation.

Publications

Cvetesic N (2017) Core promoters across the genome. in Nature biotechnology

FANTOM Consortium And The RIKEN PMI And CLST (DGT) (2014) A promoter-level mammalian expression atlas. in Nature

Haberle V (2012) Dissecting genomic regulatory elements in vivo. in Nature biotechnology

Harmston N (2013) Chromatin and epigenetic features of long-range gene regulation. in Nucleic acids research

Harmston N (2013) The mystery of extreme non-coding conservation. in Philosophical transactions of the Royal Society of London. Series B, Biological sciences

Newton MD (2019) DNA stretching induces Cas9 off-target activity. in Nature structural & molecular biology

Stadhouders R (2014) HBS1L-MYB intergenic variants modulate fetal hemoglobin via long-range MYB enhancers. in The Journal of clinical investigation

Tan G (2019) CNEr: A toolkit for exploring extreme noncoding conservation. in PLOS Computational Biology

Van Riel B (2012) A novel complex, RUNX1-MYEF2, represses hematopoietic genes in erythroid cells. in Molecular and cellular biology

 
Title JASPAR 
Description A leading open-access database of transcription factor binding site profiles. 
Type Of Material Database/Collection of data 
Provided To Others? Yes  
Impact JASPAR is the most popular open-access database of transcription factor matrix profiles. It was first released in 2004, and the new update was released in January 2014. 
URL http://jaspar.genereg.net
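
As a rough illustration of what a transcription factor matrix profile encodes, the following minimal Python sketch converts a count matrix into log-odds weights and scans a sequence with it. The matrix values, the pseudocount and the uniform background are invented for illustration; they are not a JASPAR profile, and the code does not use the JASPAR interface.

```python
import math

# Hypothetical position frequency matrix: counts of each base at each of
# four motif positions. The numbers are made up, not taken from JASPAR.
PFM = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 10, 0],
    "T": [1, 1, 0, 8],
}


def pfm_to_pwm(pfm, pseudocount=0.8, background=0.25):
    """Convert counts to log2-odds weights against a uniform background.

    The pseudocount and background values are assumptions of this sketch.
    """
    width = len(pfm["A"])
    pwm = []
    for i in range(width):
        total = sum(pfm[base][i] for base in "ACGT") + pseudocount
        pwm.append({
            base: math.log2(((pfm[base][i] + pseudocount / 4) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of motif width; expects an uppercase ACGT string."""
    width = len(pwm)
    return [
        (start, sum(pwm[i][base] for i, base in enumerate(sequence[start:start + width])))
        for start in range(len(sequence) - width + 1)
    ]


def best_hit(sequence, pwm):
    """Return the (offset, score) pair with the highest score."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])
```

Calling `best_hit("TTACGTGG", pfm_to_pwm(PFM))` returns the offset of the window that best matches the toy motif together with its score; in practice a score threshold would be chosen to decide which windows count as putative binding sites.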
 
Description FANTOM5 
Organisation RIKEN
Department Omics Science Center
Country Japan 
Sector Public 
PI Contribution Computational analysis of mammalian promoteromes, participation in experimental design, hypothesis generation.
Collaborator Contribution The partners produced the experimental data to analyse.
Impact Several papers have been published.
Start Year 2008
 
Description FANTOM5 
Organisation University of British Columbia
Department Centre for Molecular Medicine and Therapeutics
Country Canada 
Sector Academic/University 
PI Contribution Computational analysis of mammalian promoteromes, participation in experimental design, hypothesis generation.
Collaborator Contribution The partners produced the experimental data to analyse.
Impact Several papers have been published.
Start Year 2008
 
Description FANTOM5 
Organisation University of Copenhagen
Department Department of Biology
Country Denmark 
Sector Academic/University 
PI Contribution Computational analysis of mammalian promoteromes, participation in experimental design, hypothesis generation.
Collaborator Contribution The partners produced the experimental data to analyse.
Impact Several papers have been published.
Start Year 2008
 
Description FANTOM5 
Organisation University of Edinburgh
Department The Roslin Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution Computational analysis of mammalian promoteromes, participation in experimental design, hypothesis generation.
Collaborator Contribution The partners produced the experimental data to analyse.
Impact Several papers have been published.
Start Year 2008
 
Description FANTOM6 
Organisation RIKEN
Department Division of Genomic Technologies
Country Japan 
Sector Charity/Non Profit 
PI Contribution We have been members of FANTOM consortia since FANTOM2. Our task has been to analyse the data produced by RIKEN and experimental collaborators of the consortium.
Collaborator Contribution The focus of FANTOM6 is massively parallel functional analysis of long noncoding RNAs. In the pilot phase, the RIKEN Division of Genomic Technologies has produced expression data (CAGE and RNA-seq) for the knockdown of a large number of lncRNAs in one human cell type. The main part of the project will expand to multiple cell types and include CRISPR/Cas9 knockouts of lncRNAs in addition to shRNA knockdowns.
Impact There are no outcomes yet - the collaboration is still at an early stage. It is an interdisciplinary collaboration between genomic technology development (RIKEN), experimental molecular biology and computational biology.
Start Year 2015
 
Description ZEPROME 
Organisation Karlsruhe Institute of Technology
Department Institute of Toxicology and Genetics
Country Germany 
Sector Academic/University 
PI Contribution My group has provided scientific leadership and expertise in computational genomics.
Collaborator Contribution Ferenc Müller lab, University of Birmingham - scientific leadership, zebrafish transgenics, sample preparation, ChIP-seq of zebrafish embryos; Uwe Strähle lab, KIT - RNA-seq of zebrafish embryos; RIKEN - CAGE of zebrafish embryos
Impact Several papers under review or revision
Start Year 2008
 
Description ZEPROME 
Organisation RIKEN
Department Omics Science Center
Country Japan 
Sector Public 
PI Contribution My group has provided scientific leadership and expertise in computational genomics.
Collaborator Contribution Ferenc Müller lab, University of Birmingham - scientific leadership, zebrafish transgenics, sample preparation, ChIP-seq of zebrafish embryos; Uwe Strähle lab, KIT - RNA-seq of zebrafish embryos; RIKEN - CAGE of zebrafish embryos
Impact Several papers under review or revision
Start Year 2008
 
Description ZEPROME 
Organisation University of Birmingham
Department College of Medical and Dental Sciences
Country United Kingdom 
Sector Academic/University 
PI Contribution My group has provided scientific leadership and expertise in computational genomics.
Collaborator Contribution Ferenc Müller lab, University of Birmingham - scientific leadership, zebrafish transgenics, sample preparation, ChIP-seq of zebrafish embryos; Uwe Strähle lab, KIT - RNA-seq of zebrafish embryos; RIKEN - CAGE of zebrafish embryos
Impact Several papers under review or revision
Start Year 2008
 
Title CAGEr 
Description A Bioconductor package for the analysis of CAGE (Cap Analysis of Gene Expression) sequencing data for precise mapping of transcription start sites and promoterome mining. Originally released in 2013, it has recently been expanded to serve the goals of the BBSRC project. 
Type Of Technology Software 
Year Produced 2013 
Open Source License? Yes  
Impact The software enables rapid and efficient retrieval of promoter CAGE data from existing collections, as well as the processing, analysis and visualisation of new CAGE data. It facilitates easy integration of CAGE with other genome-wide data. 
URL http://www.bioconductor.org/packages/release/bioc/html/CAGEr.html
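
To make the idea of mapping transcription start sites from CAGE data concrete, here is a minimal Python sketch of simple distance-based clustering of CAGE-supported positions into promoter-level clusters. It is an illustration under stated assumptions, not the CAGEr implementation or its interface: the function name, the input format of (position, tag count) pairs on a single strand, and the 20 bp gap are all hypothetical choices.

```python
def cluster_tss(tss_counts, max_gap=20):
    """Group (position, tag_count) pairs from one strand into clusters.

    A position joins the current cluster when it lies at most `max_gap`
    bases after the cluster's last position; otherwise it starts a new one.
    Each cluster records its span, total tag count and dominant position
    (the position with the most tags; the first one wins on ties).
    """
    clusters = []
    for pos, count in sorted(tss_counts):
        if clusters and pos - clusters[-1]["end"] <= max_gap:
            cluster = clusters[-1]
            cluster["end"] = pos
            cluster["total"] += count
            if count > cluster["dominant_count"]:
                cluster["dominant"] = pos
                cluster["dominant_count"] = count
        else:
            clusters.append({
                "start": pos,
                "end": pos,
                "total": count,
                "dominant": pos,
                "dominant_count": count,
            })
    return clusters
```

For example, `cluster_tss([(100, 2), (105, 10), (110, 3), (500, 1), (510, 7)])` groups the first three positions into one cluster and the last two into another. The width of each cluster and the position of its dominant start site are the kind of per-promoter features that downstream promoterome analyses work with.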
 
Title CNEr 
Description CNEr is an R/Bioconductor package for the large-scale identification and advanced visualization of sets of conserved noncoding elements. 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact The package is used extensively within our research group to produce sets of extremely conserved genomic elements for further analysis. A publication describing the package is planned for 2016. 
URL http://bioconductor.org/packages/release/bioc/html/CNEr.html
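
One simple way to think about identifying conserved elements is as a scan over a pairwise alignment for windows with a high fraction of identical bases. The Python sketch below shows that idea only; it is not the CNEr algorithm or interface, and the function name, the window size and the identity threshold are assumptions chosen for illustration. A real analysis would also filter out coding sequence and repeats.

```python
def conserved_windows(aligned_a, aligned_b, window=50, min_identical=45):
    """Return merged half-open intervals (in alignment columns) in which
    some window of `window` columns has at least `min_identical`
    identical, non-gap bases.

    Both inputs are rows of one pairwise alignment, with '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    # 1 where both rows carry the same base; gap columns never count.
    matches = [int(a == b and a != "-") for a, b in zip(aligned_a, aligned_b)]
    n = len(matches)
    if n < window:
        return []

    intervals = []
    current = sum(matches[:window])
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window one column to the right.
            current += matches[start + window - 1] - matches[start - 1]
        if current >= min_identical:
            end = start + window
            if intervals and start <= intervals[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                intervals[-1] = (intervals[-1][0], end)
            else:
                intervals.append((start, end))
    return intervals
```

Overlapping qualifying windows are merged, so a long conserved stretch is reported as a single interval rather than as many shifted windows.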
 
Title GenomicInteractions 
Description An R/Bioconductor package for handling genomic interaction data, such as ChIA-PET and Hi-C, annotating genomic features with interaction information and producing various plots and statistics. 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact This is a newly released tool. We have developed it to analyse ChIA-PET and Hi-C data for the needs of our own research (two papers submitted). 
URL http://www.bioconductor.org/packages/release/bioc/html/GenomicInteractions.html
 
Title TFBSTools 
Description An R/Bioconductor package for the analysis and manipulation of transcription factor binding sites and their matrix profiles. 
Type Of Technology Software 
Year Produced 2013 
Open Source License? Yes  
Impact This is an extended R/Bioconductor port of our popular Perl software toolkit, which brings TFBS manipulation and detection software to the genome-wide level. 
URL http://www.bioconductor.org/packages/release/bioc/html/TFBSTools.html
 
Title heatmaps 
Description heatmaps is an R/Bioconductor package that provides functions for plotting heatmaps of genome-wide data across genomic intervals, such as ChIP-seq signals at peaks or across promoters. Many functions are also provided for investigating sequence features. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact heatmaps has been used to produce visualisations for several of our recent publications. 
URL https://bioconductor.org/packages/release/bioc/html/heatmaps.html
 
Title r3Cseq 
Description An R/Bioconductor package for the analysis of Chromosome Conformation Capture and Next-generation Sequencing (3C-seq). 
Type Of Technology Software 
Year Produced 2012 
Open Source License? Yes  
Impact The software enables easy analysis and visualisation of chromosomal 3D interactions from 3C-seq and 4C data. 
URL http://www.bioconductor.org/packages/release/bioc/html/r3Cseq.html
 
Title seqPattern 
Description seqPattern is an R/Bioconductor package for visualising oligonucleotide patterns and sequence motif occurrences across a large set of sequences centred at a common reference point and sorted by a user-defined feature. 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact The software was originally developed for the visualisation of data in this publication: Haberle, V., Li, N., Hadzhiev, Y., Plessy, C., Previti, C., Nepal, C., et al. (2014). Two independent transcription initiation codes overlap on vertebrate core promoters. Nature, 507(7492), 381-385. http://doi.org/10.1038/nature12974 Since then it has been used for several publications that are currently accepted for publication or under review. 
URL http://bioconductor.org/packages/release/bioc/html/seqPattern.html