Gene regulation by distal regulatory elements in Erythropoiesis and the effect of natural sequence variants.

Lead Research Organisation: University of Oxford

Abstract

Biology has undergone a revolution over the last decade, as we can now sequence all of an organism's DNA and reconstruct its complete blueprint. A fundamental question in biology is how different parts of this DNA blueprint are used in different situations (a nerve cell compared to a blood cell) when both contain the same DNA sequence. The most basic expression of a genome's activity is which genes are turned on to produce the proteins of a cell and so determine which type of cell it is. Technical advances in sequencing technologies now allow us to investigate how and in what situations particular parts of that blueprint are used. This reveals a very complex control mechanism in which small sections of DNA are scattered around the DNA of the genes they control. This distribution makes it difficult to know which elements control which gene and so to decode the instructions in the blueprint. We have developed an approach that allows us to see which control regions interact with which genes. We intend to use this to work out how these regions control genes, and also what happens when these regions are damaged, leaving us vulnerable to certain diseases.

Technical Summary

It has become clear over the last 10 years that control of gene expression is very often not determined by the promoter of the gene, but rather by variable numbers of regulatory elements, which are unpredictably distributed in and around the genes themselves. These so-called distal regulatory elements are situated away from the genes they control, often over large genomic distances, and may be separated from them by intervening unrelated genes or even embedded in the introns of other genes. The inability to link any given gene with its regulatory elements is a fundamental problem which has greatly hampered our understanding of gene regulation. Beyond their biological importance, the role of these elements in health and disease is becoming increasingly clear as the effects of mutations in these regions come to light. In particular, the vast majority of variants in the normal population that genome-wide association studies (GWAS) have associated with disease susceptibility are not found in coding regions and are thought to act through changes in distal regulatory elements. To address this problem we have developed a high-resolution and high-throughput method to robustly map the physical interactions between genes and their regulatory elements. Using this Capture-C method we have mapped en masse the regulatory interactions of over 450 genes in the mouse erythroid system. The Capture-C method massively increases the number of genes with linked regulatory elements, and we intend to use this ability to study how the molecular events at regulatory elements coordinate expression from linked gene promoters. Our experimental model is the extremely well characterised murine erythroid differentiation system. This model provides large numbers of cells from well-defined stages of differentiation, which will allow us to study the transcriptional regulation of these genes in a dynamic fashion, and includes two extremely well studied paradigms of gene regulation, the α and β globin loci.
To study changes in protein binding we use high-resolution versions of the DNase-seq and ChIP-seq assays, and to determine the expression status of the genes under study we have developed highly quantitative RNA assays using metabolic labelling of nascent RNA. Additionally, we will use our human erythroid model to investigate the effect of previously characterised GWAS SNPs responsible for normal variation of erythroid parameters, to determine the underlying principles of how regulatory variants can affect human health and disease susceptibility.

 
Description Member of the American Society of Hematology committee for scientific affairs
Geographic Reach Multiple continents/international 
Policy Influence Type Membership of a guideline committee
URL https://www.hematology.org/about/governance/standing-committees/scientific-affairs
 
Description DPhil Studentship
Amount £161,673 (GBP)
Funding ID 109110/Z/15/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 10/2016 
End 09/2020
 
Description Dynamics in the regulatory genome.
Amount £162,000 (GBP)
Funding ID 220046/Z/19/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 10/2019 
End 09/2022
 
Description Quinquennial Review
Amount £29,000,000 (GBP)
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 04/2017 
End 03/2022
 
Title CSynth 
Description Pilot project for the dynamic visualization of 3D nuclear structure 
Type Of Material Technology assay or reagent 
Year Produced 2017 
Provided To Others? Yes  
Impact Developed in collaboration with Stephen Taylor and Goldsmiths University, this tool allows users to interact with and interrogate 3-dimensional chromatin structure. The aim is to provide an intuitive way for humans to interact with complex 3D structure in the nucleus, to further our understanding of gene regulation. 
URL http://www.csynth.org/
 
Title Next Generation Capture-C 
Description A vastly more sensitive version of the original Capture-C assay. 
Type Of Material Technology assay or reagent 
Year Produced 2016 
Provided To Others? Yes  
Impact This approach is now being used worldwide to interrogate questions of gene regulation and the effect of sequence changes associated with common disease. 
 
Title RASER-FISH 
Description RASER-FISH stands for resolution after single-strand exonuclease resection; it maximally retains nuclear structure while performing FISH experiments. Published in Nature Communications. 
Type Of Material Technology assay or reagent 
Year Produced 2018 
Provided To Others? Yes  
Impact We show that this region forms an erythroid-specific, decompacted, self-interacting domain, delimited by frequently apposed CTCF/cohesin binding sites early in terminal erythroid differentiation, and does not require transcriptional elongation for maintenance of the domain structure. Formation of this domain does not rely on interactions between the α-globin genes and their major enhancers, suggesting a transcription-independent mechanism for establishment of the domain. However, absence of the major enhancers does alter internal domain interactions. Formation of a loop domain therefore appears to be a mechanistic process that occurs irrespective of the specific interactions within. 
 
Title Sasquatch 
Description An online version of our computational tools for high throughput prediction of the effect of SNPs on transcription factor binding in non-coding regulatory regions using DNase footprint meta-analysis 
Type Of Material Technology assay or reagent 
Year Produced 2017 
Provided To Others? Yes  
Impact This tool allows genomic sequence variants to be prioritized based on their impact on the formation of DNase footprints. 
URL http://apps.molbiol.ox.ac.uk/sasquatch/cgi-bin/foot.cgi
 
Title TRI-C 
Description TRI-C is a multiplexed, multi-way 3C assay, published in Nature Genetics, that maps the coincident and simultaneous interactions between related regulatory elements in the mammalian genome. 
Type Of Material Technology assay or reagent 
Year Produced 2019 
Provided To Others? Yes  
Impact TRI-C showed for the first time that regulatory elements cluster in 3D space, which has profound implications for our understanding of mammalian gene regulation. In follow-up work it was used to produce a revised model for promoter competition, published in Nature Communications. 
 
Title Tiled-C 
Description Tiled-C is an adaptation of the Capture-C technologies that generates ultra-deep, Hi-C-like data and is applicable to very small cell numbers. At present under consideration at Nature Communications. 
Type Of Material Technology assay or reagent 
Year Produced 2020 
Provided To Others? Yes  
Impact Tiled-C has revised our current understanding of the link between the regulatory structure of the genome, epigenetic activity and gene expression. 
 
Title scaRNA-seq 
Description A method to detect and quantify the amount of promoter-proximal pausing found at the level of individual genes. 
Type Of Material Technology assay or reagent 
Year Produced 2021 
Provided To Others? Yes  
Impact Gene transcription occurs via a cycle of linked events, including initiation, promoter-proximal pausing, and elongation of RNA polymerase II (Pol II). A key question is how transcriptional enhancers influence these events to control gene expression. Here, we present an approach that evaluates the level and change in promoter-proximal transcription (initiation and pausing) in the context of differential gene expression, genome-wide. This combinatorial approach shows that in primary cells, control of gene expression during differentiation is achieved predominantly via changes in transcription initiation rather than via release of Pol II pausing. Using genetically engineered mouse models, deleted for functionally validated enhancers of the α- and β-globin loci, we confirm that these elements regulate Pol II recruitment and/or initiation to modulate gene expression. Together, our data show that gene expression during differentiation is regulated predominantly at the level of initiation and that enhancers are key effectors of this process. 
URL https://www.sciencedirect.com/science/article/pii/S1097276521000022?via%3Dihub
 
Description Dark Matter Project 
Organisation New York University
Country United States 
Sector Academic/University 
PI Contribution I have been made a member of the Dark Matter project based on our work to predict 3D genome structure using deep neural network approaches and build genome regulatory domains from scratch to understand the principles of gene regulation in the mammalian genome.
Collaborator Contribution The Dark Matter Project are experts in the use of large-scale synthetic biology approaches and have agreed to build from scratch 10 complete regulatory domains guided by our machine learning approaches. These will be integrated into the mouse genome to test basic principles of mammalian gene regulation.
Impact The partnership has only just been initiated.
Start Year 2021
 
Description MRC Human Genetics Unit, Edinburgh 
Organisation Medical Research Council (MRC)
Department MRC Human Genetics Unit
Country United Kingdom 
Sector Academic/University 
PI Contribution This is a scientific collaboration with Prof Nick Gilbert and Davide Marenduzzo, based around an application for a Wellcome Trust Investigator award to work on the molecular basis of the 3D genome interactions that direct gene expression.
Collaborator Contribution All three groups are expert in different aspects of the 3D genome and have formed a collaboration to combine their expertise and to seek funding from the Wellcome Trust to support this work.
Impact The collaboration has just been initiated.
Start Year 2021
 
Title DeepC: predicting 3D genome folding using megabase-scale transfer learning. 
Description Predicting the impact of noncoding genetic variation requires interpreting it in the context of three-dimensional genome architecture. We have developed deepC, a transfer-learning-based deep neural network that accurately predicts genome folding from megabase-scale DNA sequence. DeepC predicts domain boundaries at high resolution, learns the sequence determinants of genome folding and predicts the impact of both large-scale structural and single base-pair variations. 
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact DeepC allows for the prediction of the effect of both large-scale and single base-pair changes on the regulatory structure of the genome and so provides a platform for the identification of pathogenic mutations in genome sequence. 
URL https://www.nature.com/articles/s41592-020-0960-3
 
Title Lanceotron 
Description Genomics technologies, such as ATAC-seq, ChIP-seq, and DNase-seq, have revolutionised molecular biology, generating a complete genome's worth of signal in a single assay. The challenge is no longer data generation; it is effectively and reproducibly extracting biological meaning from such massively complex datasets. While other tools approach this problem with simple statistical tests, our novel machine learning model uses a convolutional neural network, local genomic enrichment measurements, and Poisson-based significance testing from multiple viewpoints, all integrated using a multilayer perceptron to give a probability of being a true biological signal. We hand-labelled 499Mb of genomic data, built 5,000 models, and tested with over 100 unique users from labs around the world. And because it's built on the powerful MLV visualisation software, results can easily be visualised and shared with collaborators or reviewers. The culmination of these efforts is a peak caller that can extract more from your data, create interactive charts, and improve interpretability, all while simplifying the analysis process. 
Type Of Technology Webtool/Application 
Year Produced 2020 
Impact This software allows biologists to extract meaningful data from complex genomics datasets and aims to address the reproducibility crisis in the biological sciences. 
URL https://lanceotron.molbiol.ox.ac.uk/
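 
The Poisson-based significance testing mentioned in the Lanceotron description can be illustrated with a minimal sketch. This is not Lanceotron's code: the function names, the reading of "multiple viewpoints" as background windows of different sizes, and the example counts are all assumptions made for illustration. For a candidate region, the expected read count under each background is scaled to the region's width, and the upper-tail Poisson probability of the observed count is reported.

```python
import math


def poisson_upper_tail(k: int, lam: float, max_terms: int = 10_000) -> float:
    """Return P(X >= k) for X ~ Poisson(lam) by summing the tail directly.

    Hypothetical helper, adequate for moderate lam; not taken from Lanceotron.
    """
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # First tail term P(X = k), computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = term
    for i in range(k + 1, k + 1 + max_terms):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        total += term
        # Stop once terms are decreasing and no longer change the sum.
        if i > lam and term <= total * 1e-17:
            break
    return min(1.0, total)


def enrichment_pvalues(count, width, backgrounds):
    """Test one candidate region against several background windows.

    count       -- reads observed in the candidate region
    width       -- width of the candidate region in base pairs
    backgrounds -- dict mapping a label to (reads, width_bp) of a background
                   window (assumed interpretation of "multiple viewpoints")
    """
    pvals = {}
    for name, (bg_count, bg_width) in backgrounds.items():
        # Expected reads in the candidate if it matched this background.
        lam = bg_count * width / bg_width
        pvals[name] = poisson_upper_tail(count, lam)
    return pvals


if __name__ == "__main__":
    # Illustrative numbers only.
    result = enrichment_pvalues(
        count=50, width=500,
        backgrounds={"10kb": (200, 10_000), "100kb": (1_000, 100_000)},
    )
    for name, p in result.items():
        print(f"{name}: p = {p:.3g}")
```

Where SciPy is available, `scipy.stats.poisson.sf(k - 1, lam)` computes the same upper-tail probability and is the more robust choice for large expected counts.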
 
Title Multi Locus Viewer 
Description Tracking and understanding data quality, analysis and reproducibility are critical concerns in the biological sciences. This is especially true in genomics, where Next Generation Sequencing (NGS) based technologies such as ChIP-seq, RNA-seq and ATAC-seq are generating a flood of genome-scale data. These data types are extremely high-level and complex, with single experiments capable of mapping tens to hundreds of thousands of biologically meaningful events across the genome. However, such data are usually processed with automated pipelines resulting in tabular outputs, which are difficult to verify and interpret without looking at the underlying data and combining it with data from other experiments. Conventional genome browsers are limited to single locations and do not allow for interactions with the dataset as a whole. MLV has been developed to allow users to fluidly interact with genomics datasets at multiple scales, from complete metadata-labelled and clustered populations to detailed representations of individual elements. It has inbuilt tools to integrate signals across multiple datasets and to perform dimensionality reduction and clustering analysis. 
Type Of Technology Webtool/Application 
Year Produced 2020 
Impact NA 
URL https://mlv.molbiol.ox.ac.uk/
 
Company Name Nucleome Therapeutics 
Description Nucleome Therapeutics is using its dramatically enhanced understanding of how genetic variation drives disease to identify disease-affected pathways and discover new drug targets for autoimmune diseases, a market worth in excess of $35bn. Its proprietary platform, combining world-leading expertise from the University of Oxford in bioinformatics, genomics and the 3D structure of chromosomes, enables it to leverage the billions of dollars that have been invested in human genome research. Previously, mining this valuable data to find new drug targets was hindered by the location of the majority (95%) of disease-associated genetic variants in DNA that does not code for proteins and is therefore difficult to study. Using machine learning and 3D chromosomal analysis, Nucleome Therapeutics is able to link these variants to the gene targets they influence, followed by validation of the targets at scale. This unique combination of computational and experimental approaches, which is applicable across all cell types and tissues, is expected to identify de-risked, high-quality targets across multiple diseases. Having validated the platform, the company is focused on the identification in lymphocytes of new drug targets for autoimmune disease, against which it will develop small-molecule drugs. Potential in other cell types and applications such as drug repurposing will be investigated via partnership. 
Year Established 2019 
Impact Nucleome Therapeutics was spun out from the University of Oxford in 2019 after a decade of foundational research, with seed investment of £5.2m from Oxford Sciences Innovation.
Website https://www.nucleome.com/home
 
Description Royal Society Summer Exhibition 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact XXX
Year(s) Of Engagement Activity 2017
 
Description Visits to Schools 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact Based on the demonstration and science we developed for the Royal Society and New Scientist Live events, we have restructured this into a more mobile experience, which a rolling series of volunteers have been taking to various schools in and around the Oxford area.
Year(s) Of Engagement Activity 2017,2018