Computational Regulatory Genomics
Lead Research Organisation:
MRC London Institute of Medical Sciences
Abstract
Development of complex organisms critically depends on the regulation of gene activity across dozens, hundreds or millions of cells. This regulation ensures the fidelity of the development of complex structures and the overall function of the organism, and its disruption often leads to disease. We investigate how regulatory information is encoded in DNA and how it is communicated across cells in multicellular organisms. The main objects of our research are:
• The structure and function of gene promoters - parts of DNA at the beginning of each gene that bind the molecular machinery that transcribes DNA into RNA;
• The function and genomic distribution of gene regulatory elements - parts of DNA at varying distances from promoters that affect the rate/level of transcription;
• The function of transcription factors - proteins that bind to specific motifs in regulatory elements and thereby affect the activity of genes;
• Transcription factor proteins are themselves products of gene transcription and translation, and the transcription of genes that encode transcription factors is regulated by other transcription factors and often by that factor itself. The complex regulatory interrelationships of this kind are known as transcriptional regulatory networks (TRNs);
• The association of different modes of regulation with epigenetic marks - chemical modifications of DNA or of the protein complexes it is packaged with in chromosomes - and their inheritance.
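A small sketch can make the network bullet above concrete. It assumes a TRN stored as a directed graph that maps each transcription factor to the genes it regulates. The factor and gene names (TF_A, gene1 and so on) and all edges are hypothetical placeholders, not findings of this project. The two functions report autoregulated factors (self-loops, i.e. a factor regulating its own gene) and pairs of factors that regulate each other.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy TRN: each transcription factor maps to the set of genes it regulates.
# Edges are illustrative placeholders, not experimental findings.
TRN: Dict[str, Set[str]] = {
    "TF_A": {"TF_A", "TF_B", "gene1"},  # TF_A regulates its own gene (autoregulation)
    "TF_B": {"TF_A", "gene2"},          # TF_B feeds back onto TF_A
    "TF_C": {"gene1", "gene3"},
}

def autoregulated(trn: Dict[str, Set[str]]) -> List[str]:
    """Return factors whose target set includes their own gene (self-loops)."""
    return sorted(tf for tf, targets in trn.items() if tf in targets)

def mutual_pairs(trn: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return unordered pairs of distinct factors that regulate each other."""
    pairs = set()
    for tf, targets in trn.items():
        for target in targets:
            # target must itself be a factor in the network that regulates tf back
            if target != tf and tf in trn.get(target, set()):
                pairs.add(tuple(sorted((tf, target))))
    return sorted(pairs)

if __name__ == "__main__":
    print(autoregulated(TRN))  # ['TF_A']
    print(mutual_pairs(TRN))   # [('TF_A', 'TF_B')]
```

Real TRNs are inferred from binding and expression data and are far larger. A plain dictionary of sets keeps the illustration dependency-free.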
Technical Summary
The central interests of the Computational Regulatory Genomics group are the genomics and epigenomics of gene regulation in higher eukaryotes. We discovered several fundamental features of genes under developmental regulation, established the link between the type of core promoter and its responsiveness to long-range regulation, and developed a number of widely used computational methods and data resources for studying the regulatory content of genomes. Our current aim is to understand the molecular and mechanistic basis of the different modes of gene regulation associated with the control of the cell cycle, multicellular processes, and regulation in terminally differentiated cells. Alongside our models of long-range regulation, newly available epigenomic data contain crucial information for distinguishing these modes in development, differentiation and disease.
Our core research program is to identify the features of genes under different modes and regimes of gene regulation. Different genes have regulatory territories that vary vastly in size, are often nested, and exhibit distinct patterns of epigenetic modifications, or successions of such patterns. By investigating genomic regulatory territories, we want to establish the sequence features and epigenomic patterns associated with different modes of gene regulation. We want to find out which properties of genes are responsible for the correct sorting and assignment of inputs in complex, nonlinear gene loci. Among the most exciting hypotheses we are currently investigating are fundamentally different modes of gene regulation in dividing versus non-dividing, terminally differentiated cells, and the time-sharing of core promoters. We plan to investigate how complex input from the cellular environment is interpreted by the megabase-sized cis-regulatory arrays of these genes to produce tightly coordinated development of complex multicellular structures.
Our primary focus is biological insight. Nonetheless, we develop reusable methods for our analyses and share them with the community, and this has proven a highly productive approach for subsequent analyses. In the forthcoming period our method development will focus on: i) Transcription factor binding and transcription factor complexes: we are developing a computational toolbox for the analysis and visualisation of large sets of transcription factor binding sites and other regulatory elements, and for their integration with other types of genome-wide data; ii) Computational methodology for comparative epigenomics: over the next several years there will be a growing need to analyse hundreds or thousands of epigenomes simultaneously, across tissues, time courses, epidemiological cohorts or different species. The computational capacity, expertise and tools for analysis at this scale are currently largely unavailable. Building on, and inspired by, methods for the segmentation of epigenomic regions, we plan to build methods for characterising individual regulatory elements and their ensembles around genes under complex regulation.
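Segmentation methods of the kind mentioned above typically start by binarising each epigenomic mark along the genome. The sketch below assumes read counts already tallied into fixed-size bins. A bin is called "marked" when its count is improbable under a Poisson background whose rate is the mean bin count. This resembles the binarisation step of tools such as ChromHMM, but it is a simplified illustration and not the group's planned method. Adjacent marked bins are then merged into candidate regions. The 1e-4 threshold, the 200 bp bin size and the toy counts are all assumptions.

```python
import math
from typing import List, Optional, Tuple

def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).
    Adequate for the small per-bin means assumed here; exp(-lam) underflows for very large lam."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) obtained from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def binarise(counts: List[int], threshold: float = 1e-4) -> List[bool]:
    """Mark bins whose count is improbable under a Poisson background with the mean bin count as rate."""
    lam = sum(counts) / len(counts)
    return [poisson_upper_tail(c, lam) < threshold for c in counts]

def merge_marked(marked: List[bool], bin_size: int) -> List[Tuple[int, int]]:
    """Merge runs of consecutive marked bins into half-open (start, end) intervals in bp."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, m in enumerate(marked):
        if m and start is None:
            start = i
        elif not m and start is not None:
            regions.append((start * bin_size, i * bin_size))
            start = None
    if start is not None:
        regions.append((start * bin_size, len(marked) * bin_size))
    return regions

if __name__ == "__main__":
    # Hypothetical toy counts: background of 1 read per bin with a two-bin enriched block.
    counts = [1] * 50 + [40, 40] + [1] * 48
    print(merge_marked(binarise(counts), bin_size=200))  # [(10000, 10400)]
```

A full segmentation would then combine the binarised tracks of many marks and learn recurring combinations (chromatin states) across them. That step is beyond this sketch.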
Organisations
- MRC London Institute of Medical Sciences, United Kingdom (Lead Research Organisation)
- University of Copenhagen, Denmark (Collaboration)
- University of Edinburgh, United Kingdom (Collaboration)
- Karlsruhe Institute of Technology, Germany (Collaboration)
- University of British Columbia, Canada (Collaboration)
- University of Birmingham, United Kingdom (Collaboration)
- RIKEN, Japan (Collaboration)
People
Boris Lenhard (Principal Investigator)
Publications

Mathelier A
(2016)
JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles.
in Nucleic acids research

Nash AJ
(2018)
A Novel Measure of Non-coding Genome Conservation Identifies Genomic Regulatory Blocks Within Primates.
in Bioinformatics (Oxford, England)

Naville M
(2015)
Long-range evolutionary constraints reveal cis-regulatory interactions on the human X chromosome.
in Nature communications

Nepal C
(2013)
Dynamic regulation of the transcription initiation landscape at single nucleotide resolution during vertebrate embryogenesis.
in Genome research

Nepal C
(2016)
Transcriptional, post-transcriptional and chromatin-associated regulation of pri-miRNAs, pre-miRNAs and moRNAs.
in Nucleic acids research

Newton MD
(2019)
DNA stretching induces Cas9 off-target activity.
in Nature structural & molecular biology

Polychronopoulos D
(2017)
Conserved non-coding elements: developmental gene regulation meets genome organization.
in Nucleic acids research

Seitan VC
(2013)
Cohesin-based chromatin interactions enable regulated gene expression within preexisting architectural compartments.
in Genome research

Sharma Y
(2014)
Computational characterization of modes of transcriptional regulation of nuclear receptor genes.
in PloS one

Sheffield NC
(2013)
Patterns of regulatory activity across diverse human cell types predict tissue identity, transcription factor binding, and long-range interactions.
in Genome research
Title | JASPAR
Description | A leading open-access database of transcription factor binding site profiles.
Type Of Material | Database/Collection of data
Provided To Others? | Yes
Impact | JASPAR is the most popular open-access database of transcription factor matrix profiles. It was first released in 2004, and the new update was released in January 2014.
URL | http://jaspar.genereg.net
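JASPAR distributes each profile as a position frequency matrix. As a hedged illustration, the sketch below parses one matrix in the bracketed JASPAR text layout: a ">" header with matrix ID and name, then one A/C/G/T row of counts per line. It then computes per-column information content against a uniform background. The ID MA0000.1, the name EXAMPLE and its counts are hypothetical placeholders, not a real JASPAR entry. For production work, the R/Bioconductor TFBSTools package provides matrix classes for JASPAR profiles.

```python
import math
import re
from typing import Dict, List, Tuple

BASES = "ACGT"

# Hypothetical matrix in JASPAR-style bracketed text layout (not a real JASPAR profile).
EXAMPLE = """
>MA0000.1 EXAMPLE
A [ 20  5  0 ]
C [  0  5  0 ]
G [  0  5 10 ]
T [  0  5 10 ]
"""

def parse_jaspar(text: str) -> Tuple[str, str, Dict[str, List[float]]]:
    """Parse a single matrix: '>ID name' header followed by A/C/G/T count rows."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].lstrip(">").split()
    matrix_id, name = header[0], " ".join(header[1:])
    counts: Dict[str, List[float]] = {}
    for ln in lines[1:]:
        m = re.match(r"^([ACGT])\s*\[\s*(.*?)\s*\]$", ln)
        if m is None:
            raise ValueError(f"unrecognised matrix row: {ln!r}")
        counts[m.group(1)] = [float(x) for x in m.group(2).split()]
    if set(counts) != set(BASES) or len({len(v) for v in counts.values()}) != 1:
        raise ValueError("expected four rows (A, C, G, T) of equal length")
    return matrix_id, name, counts

def information_content(counts: Dict[str, List[float]], pseudocount: float = 0.0) -> List[float]:
    """Per-column information content in bits: 2 + sum_b p_b * log2(p_b) (uniform background)."""
    ic = []
    for col in zip(*(counts[b] for b in BASES)):
        total = sum(col) + 4 * pseudocount
        probs = [(c + pseudocount) / total for c in col]
        ic.append(2.0 + sum(p * math.log2(p) for p in probs if p > 0))
    return ic

if __name__ == "__main__":
    matrix_id, name, counts = parse_jaspar(EXAMPLE)
    print(matrix_id, name, information_content(counts))  # MA0000.1 EXAMPLE [2.0, 0.0, 1.0]
```

Information content ranges from 0 bits for an uninformative column to 2 bits for a fully conserved one. This is why it is the usual height scale for sequence logos.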
Description | FANTOM5 |
Organisation | RIKEN |
Department | Omics Science Center |
Country | Japan |
Sector | Public |
PI Contribution | Computational analysis of mammalian promoteromes, participation in experimental design, hypothesis generation. |
Collaborator Contribution | The partners produced the experimental data to analyse. |
Impact | Several papers have been published. |
Start Year | 2008 |
Description | FANTOM5 |
Organisation | University of British Columbia |
Department | Centre for Molecular Medicine and Therapeutics |
Country | Canada |
Sector | Academic/University |
PI Contribution | Computational analysis of mammalian promoteromes, participation in experimental design, hypothesis generation. |
Collaborator Contribution | The partners produced the experimental data to analyse. |
Impact | Several papers have been published. |
Start Year | 2008 |
Description | FANTOM5 |
Organisation | University of Copenhagen |
Department | Department of Biology |
Country | Denmark |
Sector | Academic/University |
PI Contribution | Computational analysis of mammalian promoteromes, participation in experimental design, hypothesis generation. |
Collaborator Contribution | The partners produced the experimental data to analyse. |
Impact | Several papers have been published. |
Start Year | 2008 |
Description | FANTOM5 |
Organisation | University of Edinburgh |
Department | The Roslin Institute |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Computational analysis of mammalian promoteromes, participation in experimental design, hypothesis generation. |
Collaborator Contribution | The partners produced the experimental data to analyse. |
Impact | Several papers have been published. |
Start Year | 2008 |
Description | FANTOM6 |
Organisation | RIKEN |
Department | Division of Genomic Technologies |
Country | Japan |
Sector | Charity/Non Profit |
PI Contribution | We have been members of FANTOM consortia since FANTOM2. Our task has been to analyse the data produced by RIKEN and experimental collaborators of the consortium. |
Collaborator Contribution | The focus of FANTOM6 is massively parallel functional analysis of long noncoding RNAs. In the pilot phase, the RIKEN Division of Genomic Technologies has produced expression data (CAGE and RNA-seq) for the knockdown of a large number of lncRNAs in one human cell type. The main part of the project will expand to multiple cell types and include CRISPR/Cas9 knockouts of lncRNAs in addition to shRNA knockdowns.
Impact | There are no outcomes yet; the collaboration is still at an early stage. It is an interdisciplinary collaboration spanning genomic technology development (RIKEN), experimental molecular biology and computational biology.
Start Year | 2015 |
Description | ZEPROME |
Organisation | Karlsruhe Institute of Technology |
Department | Institute of Toxicology and Genetics |
Country | Germany |
Sector | Academic/University |
PI Contribution | My group has provided scientific leadership and expertise in computational genomics. |
Collaborator Contribution | Ferenc Müller lab, University of Birmingham: scientific leadership, zebrafish transgenics, sample preparation and ChIP-seq of zebrafish embryos; Uwe Strähle lab, KIT: RNA-seq of zebrafish embryos; RIKEN: CAGE of zebrafish embryos.
Impact | Several papers under review or revision |
Start Year | 2008 |
Description | ZEPROME |
Organisation | RIKEN |
Department | Omics Science Center |
Country | Japan |
Sector | Public |
PI Contribution | My group has provided scientific leadership and expertise in computational genomics. |
Collaborator Contribution | Ferenc Müller lab, University of Birmingham: scientific leadership, zebrafish transgenics, sample preparation and ChIP-seq of zebrafish embryos; Uwe Strähle lab, KIT: RNA-seq of zebrafish embryos; RIKEN: CAGE of zebrafish embryos.
Impact | Several papers under review or revision |
Start Year | 2008 |
Description | ZEPROME |
Organisation | University of Birmingham |
Department | College of Medical and Dental Sciences |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | My group has provided scientific leadership and expertise in computational genomics. |
Collaborator Contribution | Ferenc Müller lab, University of Birmingham: scientific leadership, zebrafish transgenics, sample preparation and ChIP-seq of zebrafish embryos; Uwe Strähle lab, KIT: RNA-seq of zebrafish embryos; RIKEN: CAGE of zebrafish embryos.
Impact | Several papers under review or revision |
Start Year | 2008 |
Title | CAGEr |
Description | A Bioconductor package for the analysis of CAGE (Cap Analysis of Gene Expression) sequencing data for precise mapping of transcription start sites and promoterome mining. Originally released in 2013, it has recently been expanded to serve the goals of the BBSRC project. |
Type Of Technology | Software |
Year Produced | 2013 |
Open Source License? | Yes |
Impact | The software enables rapid and efficient retrieval of promoter CAGE data from existing collections, as well as the processing, analysis and visualisation of new CAGE data. It facilitates easy integration of CAGE with other genome-wide data. |
URL | http://www.bioconductor.org/packages/release/bioc/html/CAGEr.html |
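Promoterome mining from CAGE data usually groups individual CAGE-detected TSSs (CTSSs) into tag clusters. It then summarises each cluster's shape, for example to distinguish sharp from broad promoters. The sketch below follows the spirit of the distance-based clustering and interquantile width used in CAGEr, but it is not CAGEr's implementation or API. It assumes CTSSs from one strand of one chromosome. The 20 bp merge distance, the 0.1/0.9 quantiles and the toy data are assumptions.

```python
from typing import List, Tuple

CTSS = Tuple[int, int]  # (position, tag count) on a single chromosome strand

def cluster_ctss(ctss: List[CTSS], max_dist: int = 20) -> List[List[CTSS]]:
    """Group CTSSs into tag clusters: a CTSS within max_dist bp of the previous one joins its cluster."""
    clusters: List[List[CTSS]] = []
    for pos, count in sorted(ctss):
        if clusters and pos - clusters[-1][-1][0] <= max_dist:
            clusters[-1].append((pos, count))
        else:
            clusters.append([(pos, count)])
    return clusters

def quantile_position(cluster: List[CTSS], q: float) -> int:
    """First position at which the cumulative tag count reaches fraction q of the cluster total."""
    total = sum(c for _, c in cluster)
    cum = 0
    for pos, count in cluster:
        cum += count
        if cum >= q * total:
            return pos
    return cluster[-1][0]

def interquantile_width(cluster: List[CTSS], lower: float = 0.1, upper: float = 0.9) -> int:
    """Width in bp spanning the lower..upper quantiles of the cluster's tag signal."""
    return quantile_position(cluster, upper) - quantile_position(cluster, lower) + 1

if __name__ == "__main__":
    # Hypothetical CTSSs: one sharp cluster near 100 and one broad cluster from 300 to 360.
    ctss = [(100, 50), (102, 30), (105, 20), (300, 5), (310, 5), (330, 5), (345, 5), (360, 5)]
    for cl in cluster_ctss(ctss):
        print(cl[0][0], interquantile_width(cl))  # prints "100 6" then "300 61"
```

Using quantiles of the tag signal, rather than the cluster's full span, keeps a few stray low-count CTSSs from inflating the width estimate.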
Title | CNEr |
Description | CNEr is an R/Bioconductor package for the large-scale identification and advanced visualisation of sets of conserved noncoding elements.
Type Of Technology | Software |
Year Produced | 2014 |
Open Source License? | Yes |
Impact | The package is used extensively within our research group to produce sets of extremely conserved genomic elements for further analysis. A publication describing the package is planned for 2016. |
URL | http://bioconductor.org/packages/release/bioc/html/CNEr.html |
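CNEr works on whole-genome pairwise alignments. The sketch below illustrates only the sliding-window identity criterion behind this kind of CNE detection, applied to a single pair of aligned strings. The window size, identity threshold and example strings are assumptions, and coordinates are alignment columns rather than genome positions. Windows that reach the threshold are merged when they overlap or abut.

```python
from typing import List, Tuple

def scan_conserved(ref: str, qry: str, window: int = 50, min_identity: int = 48) -> List[Tuple[int, int]]:
    """Return merged half-open alignment-column intervals covered by windows of `window` columns
    that contain at least `min_identity` identical A/C/G/T columns (gaps and Ns never count)."""
    if len(ref) != len(qry):
        raise ValueError("aligned sequences must have equal length")
    same = [1 if a == b and a in "ACGT" else 0 for a, b in zip(ref.upper(), qry.upper())]
    hits: List[Tuple[int, int]] = []
    if len(same) < window:
        return hits
    score = sum(same[:window])  # identities in the first window
    for start in range(len(same) - window + 1):
        if start > 0:
            # slide one column: drop the column leaving on the left, add the one entering on the right
            score += same[start + window - 1] - same[start - 1]
        if score >= min_identity:
            end = start + window
            if hits and start <= hits[-1][1]:
                hits[-1] = (hits[-1][0], end)  # overlaps (or abuts) the previous hit: extend it
            else:
                hits.append((start, end))
    return hits

if __name__ == "__main__":
    # Hypothetical toy alignment: identical first half, divergent second half.
    print(scan_conserved("AAAAATTTTT", "AAAAAGGGGG", window=5, min_identity=4))  # [(0, 6)]
```

Updating the window score incrementally makes the scan linear in alignment length, which matters at whole-genome scale.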
Title | GenomicInteractions |
Description | An R/Bioconductor package for handling genomic interaction data, such as ChIA-PET and Hi-C, annotating genomic features with interaction information, and producing various plots and statistics.
Type Of Technology | Software |
Year Produced | 2014 |
Open Source License? | Yes |
Impact | This is a newly released tool. We developed it to analyse ChIA-PET and Hi-C data for the needs of our own research (two papers submitted).
URL | http://www.bioconductor.org/packages/release/bioc/html/GenomicInteractions.html |
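Annotating interactions with genomic features often comes down to labelling each interaction anchor and then categorising anchor pairs. The sketch below loosely mirrors that idea but is not the GenomicInteractions API. It labels an anchor "promoter" if it overlaps any promoter interval, otherwise "distal", and counts promoter-promoter, distal-promoter and distal-distal interactions. The coordinates are hypothetical, and the linear overlap search is a simplification; real data would need an interval index.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def anchor_class(anchor: Interval, promoters: List[Interval]) -> str:
    """Label an anchor by whether it overlaps any promoter (linear scan, fine for small inputs)."""
    return "promoter" if any(overlaps(anchor, p) for p in promoters) else "distal"

def categorise(interactions: List[Tuple[Interval, Interval]], promoters: List[Interval]) -> Dict[str, int]:
    """Count interactions by the order-independent pair of anchor classes."""
    counts: Dict[str, int] = {}
    for a1, a2 in interactions:
        label = "-".join(sorted((anchor_class(a1, promoters), anchor_class(a2, promoters))))
        counts[label] = counts.get(label, 0) + 1
    return counts

if __name__ == "__main__":
    promoters = [("chr1", 1000, 1500), ("chr1", 50000, 50500)]
    interactions = [
        (("chr1", 1200, 1300), ("chr1", 50100, 50200)),   # both anchors on promoters
        (("chr1", 1400, 1600), ("chr1", 20000, 20100)),   # promoter to distal element
        (("chr1", 30000, 30100), ("chr2", 1000, 1100)),   # neither anchor on a promoter
    ]
    print(categorise(interactions, promoters))
```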
Title | TFBSTools |
Description | An R/Bioconductor package for the analysis and manipulation of transcription factor binding sites and their matrix profiles.
Type Of Technology | Software |
Year Produced | 2013 |
Open Source License? | Yes |
Impact | This is an extended R/Bioconductor port of our popular Perl software toolkit, bringing TFBS manipulation and detection to the genome-wide level.
URL | http://www.bioconductor.org/packages/release/bioc/html/TFBSTools.html |
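Binding-site detection with matrix profiles typically converts counts to a log-odds position weight matrix and scores every window of a sequence. The sketch below uses a relative score (0 at the worst possible site, 1 at the best) with a cutoff, a common convention in TFBS scanning tools. It is written in Python for illustration and is not TFBSTools' code or API. It assumes a uniform background, a pseudocount of 1, a 0.8 cutoff and forward-strand scanning only. The count matrix and sequence are hypothetical.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"

def to_pwm(counts: Dict[str, List[float]], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Convert a count matrix to log2-odds scores against a uniform background (0.25 per base)."""
    pwm = []
    for j in range(len(counts["A"])):
        total = sum(counts[b][j] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2(((counts[b][j] + pseudocount) / total) / 0.25) for b in BASES})
    return pwm

def scan(seq: str, pwm: List[Dict[str, float]], min_rel: float = 0.8) -> List[Tuple[int, str, float]]:
    """Return (0-based start, site, relative score) for forward-strand windows scoring >= min_rel."""
    w = len(pwm)
    lo = sum(min(col.values()) for col in pwm)  # worst achievable score
    hi = sum(max(col.values()) for col in pwm)  # best achievable score
    if hi == lo:
        raise ValueError("matrix carries no information")
    hits = []
    seq = seq.upper()
    for i in range(len(seq) - w + 1):
        site = seq[i:i + w]
        if any(base not in BASES for base in site):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][base] for j, base in enumerate(site))
        rel = (score - lo) / (hi - lo)
        if rel >= min_rel:
            hits.append((i, site, rel))
    return hits

if __name__ == "__main__":
    # Hypothetical 4-column count matrix with consensus CAGT.
    counts = {"A": [0, 10, 0, 0], "C": [10, 0, 0, 0], "G": [0, 0, 10, 0], "T": [0, 0, 0, 10]}
    print(scan("TTCAGTAACAGAT", to_pwm(counts)))  # only the exact CAGT site at position 2
```

A relative score makes one threshold meaningful across matrices of different lengths and information content. A full scan would also score the reverse complement.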
Title | heatmaps |
Description | heatmaps is an R/Bioconductor package that provides functions for plotting heatmaps of genome-wide data across genomic intervals, such as ChIP-seq signals at peaks or across promoters. Many functions are also provided for investigating sequence features.
Type Of Technology | Software |
Year Produced | 2017 |
Open Source License? | Yes |
Impact | heatmaps has been used to produce visualisations for several of our recent publications. |
URL | https://bioconductor.org/packages/release/bioc/html/heatmaps.html |
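Underneath such a heatmap is a matrix with one row per genomic interval (for example, a window around each peak or promoter) and one column per position or bin. The sketch below builds that matrix from a per-base coverage list for a single chromosome. It is an assumed simplification and not the heatmaps package API. Positions outside the chromosome are treated as zero, and rows can optionally be ordered by total signal. The toy coverage values are hypothetical.

```python
from typing import List

def signal_matrix(coverage: List[float], centres: List[int], flank: int,
                  bin_size: int = 1, order_by_total: bool = False) -> List[List[float]]:
    """Stack per-base coverage in windows [c - flank, c + flank) around each centre,
    averaging into bins of bin_size bases; positions outside the chromosome count as 0."""
    width = 2 * flank
    if width % bin_size:
        raise ValueError("2 * flank must be a multiple of bin_size")
    rows = []
    for c in centres:
        window = [coverage[p] if 0 <= p < len(coverage) else 0.0 for p in range(c - flank, c + flank)]
        rows.append([sum(window[i:i + bin_size]) / bin_size for i in range(0, width, bin_size)])
    if order_by_total:
        rows.sort(key=sum, reverse=True)  # strongest regions first, as heatmaps are often ordered
    return rows

if __name__ == "__main__":
    # Hypothetical coverage: a 4 bp block of signal at positions 10-13.
    cov = [0] * 10 + [5, 5, 5, 5] + [0] * 10
    print(signal_matrix(cov, [2, 12], flank=4, bin_size=2, order_by_total=True))
```

Each row of the result can then be rendered as one line of a heatmap, with the colour scale mapped to the binned signal.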
Title | r3Cseq |
Description | An R/Bioconductor package for the analysis of chromosome conformation capture combined with next-generation sequencing (3C-seq) data.
Type Of Technology | Software |
Year Produced | 2012 |
Open Source License? | Yes |
Impact | The software enables easy analysis and visualisation of chromosomal 3D interactions from 3C-seq and 4C data. |
URL | http://www.bioconductor.org/packages/release/bioc/html/r3Cseq.html |
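In 3C-seq and 4C experiments, interaction frequency with the viewpoint is read out per restriction fragment. A typical first step is therefore to assign mapped reads to fragments and normalise for library size. The sketch below does this with a binary search over sorted cut-site positions, followed by reads-per-million scaling. It is an assumed simplification for one chromosome and is not r3Cseq's pipeline or API. The cut sites and read positions are hypothetical.

```python
import bisect
from collections import Counter
from typing import Dict, List

def fragment_counts(cut_sites: List[int], read_positions: List[int]) -> Dict[int, int]:
    """Assign each mapped read (by position) to the restriction fragment containing it.
    Fragment 0 spans [0, cut_sites[0]); fragment i spans [cut_sites[i-1], cut_sites[i]).
    cut_sites must be sorted in ascending order."""
    return dict(Counter(bisect.bisect_right(cut_sites, pos) for pos in read_positions))

def rpm(counts: Dict[int, int]) -> Dict[int, float]:
    """Reads per million: scale counts by library size so samples are comparable."""
    total = sum(counts.values())
    return {frag: n * 1e6 / total for frag, n in counts.items()}

if __name__ == "__main__":
    cut_sites = [100, 250, 400]                 # hypothetical restriction sites
    reads = [10, 120, 130, 260, 399, 400, 500]  # hypothetical read positions
    counts = fragment_counts(cut_sites, reads)
    print(counts, rpm(counts))
```

Fragments immediately adjacent to the viewpoint usually dominate the counts, so analyses commonly treat them separately before calling distal interactions.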
Title | seqPattern |
Description | seqPattern is an R/Bioconductor package for visualising oligonucleotide patterns and sequence motif occurrences across a large set of sequences centred at a common reference point and sorted by a user-defined feature.
Type Of Technology | Software |
Year Produced | 2015 |
Open Source License? | Yes |
Impact | The software was originally developed for the visualisation of data in this publication: Haberle, V., Li, N., Hadzhiev, Y., Plessy, C., Previti, C., Nepal, C., et al. (2014). Two independent transcription initiation codes overlap on vertebrate core promoters. Nature, 507(7492), 381-385. http://doi.org/10.1038/nature12974 Since then it has been used in several publications that are accepted or under review.
URL | http://bioconductor.org/packages/release/bioc/html/seqPattern.html |
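The kind of plot described above rests on a matrix with one row per sequence, all aligned on a common reference point (for example, promoter sequences centred on the dominant TSS). Rows are sorted by a chosen feature such as promoter width, and columns mark where a pattern occurs. The sketch below builds that binary occurrence matrix. It is an assumed illustration, not seqPattern's API. A regular-expression lookahead reports overlapping occurrences, so a pattern such as "[AT][AT]" (WW dinucleotides) is counted at every position where it starts. The sequences and sort values are hypothetical.

```python
import re
from typing import List

def pattern_matrix(seqs: List[str], pattern: str, sort_values: List[float]) -> List[List[int]]:
    """Binary matrix: one row per sequence (sorted by sort_values, ascending), one column per position;
    1 where an occurrence of `pattern` (a regex, overlapping matches allowed) starts."""
    if len({len(s) for s in seqs}) != 1 or len(seqs) != len(sort_values):
        raise ValueError("need equal-length sequences and one sort value per sequence")
    order = sorted(range(len(seqs)), key=lambda i: sort_values[i])
    rx = re.compile(f"(?=({pattern}))")  # zero-width lookahead so overlapping hits are all reported
    matrix = []
    for i in order:
        row = [0] * len(seqs[i])
        for m in rx.finditer(seqs[i].upper()):
            row[m.start()] = 1
        matrix.append(row)
    return matrix

if __name__ == "__main__":
    # Hypothetical sequences centred on a reference point, with a hypothetical sort feature.
    for row in pattern_matrix(["TATAAA", "GGGGGG"], "[AT][AT]", [2.0, 1.0]):
        print(row)
```

Plotting this matrix as an image, or smoothing it into a density map, shows at a glance whether a pattern is enriched at fixed offsets from the reference point and how that changes along the sorting feature.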