Using Mathematical Modelling to Deconstruct Transcription

Lead Research Organisation: University of Oxford
Department Name: Biochemistry

Abstract

Transcription is the first step in the complex process of gene expression that brings the characteristics of a cell into being, by transcribing the DNA code of genes into RNA copies, transcripts, which will subsequently be translated into proteins, the workhorses of a cell. As such, transcription is a highly regulated process that when disrupted can cause disease. Thus, there is a real need to understand all aspects of transcriptional regulation and exactly where and how it can go wrong.

The best-known regulators of transcription are the DNA-binding transcription factors, which were thought to act as on/off switches for transcription by RNA polymerase. However, recent data suggest that many transcription factors do not simply act at the decision point for switching a gene on or off, but instead act at various stages following the start of transcription, during transcription elongation. Indeed, we have recently shown that the amount of the accessory factors associated with RNA polymerase, known as transcription elongation factors, is determined by the DNA-bound transcription factors, and that this differs on individual genes and with environmental conditions.

In order to explore the various stages of transcription, and the impact of transcription factors on these events, we have developed a mathematical model that is trained on experimental data and can determine which of the many stages of transcription changes with mutation or environmental variation. We have begun the process of extending the model by including further details of the transcriptional process and training it on experimental data from humans and yeast, so that it can be applied generally. The purpose of this work is to develop the model and exploit its predictive power so that we can describe the key steps at which the transcription of a gene is regulated and how this is likely to change when conditions change. The results of our modelling will enable us to move to a better understanding of how transcription is disrupted when environmental conditions change, when organisms are stressed and in disease, where it may well identify potential therapeutic targets. We intend that the model becomes widely used in the academic community.

Technical Summary

Mathematical modelling of complex processes such as transcription enables us to refine our understanding of genes and their regulatory mechanisms by describing the underlying processes that give rise to the shape of their transcriptional profiles. The data from techniques for assessing nascent transcription (NET-seq, GRO-seq, PRO-seq and TT-seq) produce a profile which is the sum of the various events during transcription: initiation, pausing, backtracking, early termination, elongation and termination. Each technique assesses different aspects of transcription, giving distinct profiles for the same genes under the same conditions. An ideal situation would be to amalgamate all data types to give a more holistic view. A mathematical model is ideal for this approach as it can be constructed to account for each technique's strengths and weaknesses and find the underlying mechanisms that best fit all of the data, so giving each of the events of transcription a relative importance at different genes and under different conditions. Using our own data, we have shown that the shape of these profiles changes when transcriptional regulatory factors are ablated or environmental conditions change, supporting our hypothesis that the dynamics of transcription giving rise to different profiles are an important component of gene regulation. In this work we will extend and refine our basic mathematical model of transcription, so that using the shape of a transcriptional profile we can interpret how individual genes are regulated, predict how this changes over time and with environmental change, and define more precisely the steps in transcription that are influenced by transcription factors, transcription elongation factors, termination factors and chromatin. This work aims to provide a step change in the understanding of how the regulation of gene expression goes far beyond the simple activation or repression of genes and can be facilitated during the process of transcription itself.

Planned Impact

Who will benefit and how will they benefit?
Other Scientists

We aim to release the model as analysis software for other groups to use, so that they are better able to analyse their transcription data. We will advertise the resource at international transcription meetings over the three-year period of funding, including the CSH Mechanisms of Eukaryotic Transcription Meeting in 2019 and 2021 and the EMBL Transcription and Chromatin Meetings in 2020 and 2022. Both PIs will attend one or both of these meetings. In addition, our TT-seq and NET-seq data sets will be available to collaborators pre-publication and to the community at publication.

A major aim is to increase the number of trained interdisciplinary scientists comfortable with modelling complex data sets, who will then be able to encourage further adoption of the approach and themselves supervise interdisciplinary science. This is something that will become ever more important as experimental techniques become more sophisticated and large volumes of genome-wide data are generated. To facilitate this, both PIs are actively involved in recruiting and training graduate students and post-docs to use bioinformatics and mathematical modelling in their day to day research and this will continue into the life of this grant.

Societal Impact

In addition to the fundamental direct scientific benefits of the proposed research, there are a number of indirect ways in which it could impact society. The most prominent of these is the potential for increased understanding of diseases such as Alzheimer's, Parkinson's, ALS, asthma and cardiovascular disease, whose causes are thought to include impaired regulation of the transcriptional process. The proposed research could further benefit the diagnosis and treatment of such diseases. In terms of diagnosis, the improved ability to interpret transcriptional profiles will pave the way for personalised medicine in the form of the analysis of the transcriptional profiles from individual patients, which could form a broad-spectrum diagnosis or identify specific subtypes of a disease. This will require technical advances with transcription mapping, but recent reports suggest such approaches are feasible for laboratory cultured patient-derived cancer cells and will be developed for other cell types in the future. This sort of approach could be used to (i) detect transcriptional changes related to disease, (ii) see how transcription changes in response to disease or chemical agents in viable cell culture lines, with possible applications as a safety test for drugs that are based on transcription factors (TFs), for example the TFs themselves and small molecules that target them, and (iii) safety test drugs that target transcription elongation factors (TEFs) in addition to TFs, as both affect different aspects of transcription dynamics. These sorts of approaches could give new insights into the mechanisms of action of new classes of compounds designed to target TEFs and TFs and also facilitate identification of the most suitable TFs or TEFs to target with small-molecule inhibitors.

Mathematical modelling approaches are highly suitable for public engagement. Complex systems must be reduced to a minimal set of components and interactions for an effective model to be constructed, and this leads to the ability to present highly intricate molecular systems on a conceptual level. Additionally, simple stochastic models, of the kind to be employed in the proposed research, translate rather well into animations which can be highly engaging for both non-specialist and specialist audiences. Some animations will be produced in-house and we will utilise the local Oxford Sparks service to produce a professional animation from our research for use in public lectures and displays at a cost of £9000.
 
Title How can we use mathematical modelling to understand transcription? 
Description We have commissioned a short video which is currently being produced. It is delayed due to staff shortages but we have 
Type Of Art Film/Video/Animation 
Year Produced 2023 
Impact This video is destined to be accessible from the laboratory websites and on social media. We will use it in talks and public engagement and at open days. It explains in simple language, supported by accessible 2D and 3D animations, how mathematical modelling can give us insights into the processes behind diseases. 
 
Description We have successfully applied and refined our model, training it on our high-resolution yeast data. We have described how genetic interventions (we will have up to 20 different factors) involving loss of transcription factors implicated in the initiation or elongation steps of transcription affect the key steps. We are working to obtain reliable data on transcription elongation rates to enable us to anchor our data to this parameter. We have developed a new high-resolution technique in mammalian cells known as SNU-seq (a metabolic labelling technique) and are currently applying this to yeast for this purpose. The first manuscript, describing the yeast transcription process, is in preparation. We have also applied the model to mammalian cell data (NET-seq and SNU-seq) and shown that the same model fits both the yeast and mammalian cell data, suggesting that the process is fundamentally the same in both organisms, despite their evolutionary distance. We are currently using our elongation data obtained from SNU-seq to refine our mammalian model and are fitting different models to genes with a predominant promoter-proximal pause and those without.

In addition, we have recently been involved in a major collaboration with the Robert laboratory in Montreal, modelling their data on how the transcription elongation complex and chaperone FACT influences the movement of RNA polymerase through chromatin. This work was published in Molecular Cell in 2021. We applied our simulations to understand how Spt4 and Spt5 in the DSIF transcription elongation complex influence transcription through chromatin, allowing us to focus on the +2 nucleosome as the site of activity. This work was published in Cell Reports in 2021. Finally, we are applying the models and simulations to understand more about how stress and metabolism impact transcription. This work suggests that early termination of transcription is the major mechanism by which the cells respond to stress. By contrast, changes in RNA turnover rates are the primary mechanism by which cycling transcript levels are managed. Preliminary manuscripts are submitted to bioRxiv and all manuscripts are currently undergoing revision after the first review. The URLs for all four outputs that are not yet finally published are listed here, as it seems only one URL can be entered in the box below.
https://doi.org/10.1101/2021.07.16.452657
https://doi.org/10.1101/2021.07.14.452379
https://doi.org/10.1101/2021.03.03.433772
https://doi.org/10.1101/833921
Exploitation Route This is not yet clear but we have preliminary data on development and the cell cycle, suggesting that a promoter-proximal pause coordinates waves of transcription elongation required for synchronised responses and that these are disrupted in tumours. This might make a useful tool to predict levels of disruption in disease. We have also used our yeast model to show how mutants in the elongation machinery impact on transcription. As these events impact all aspects of gene expression, this is likely to have a major impact on basic research on gene expression. Much of the experimental data has been obtained in the very last few months of the grant (due to staff problems and delays in getting key equipment) and we are still analysing this data. This will provide fruitful projects for research students and is informing our understanding of the impact of the project. This will be updated in an ongoing way.
Sectors Pharmaceuticals and Medical Biotechnology

URL https://doi.org/10.1101/2021.03.03.433772
 
Description We have been in conversation with IBM to discuss using AI to model shapes in biology to define underlying parameters. We are particularly interested in how shapes change in disease states and are currently using profiles from mutants associated with diseases to begin this process.
First Year Of Impact 2021
Sector Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Title A mathematical model that describes the mechanism of transcription used by both yeast and human RNA polymerase II 
Description We have developed a mathematical model that is able to describe the parameters contributing to the profile of nascent transcription at any gene in yeast or humans (despite the perceived view that the mechanisms in the two organisms are very different). We used profiles of nascent transcription generated by native elongating transcript sequencing (NET-seq) as this captures all states of RNA polymerase II (elongating, paused, terminating, backtracked, etc.). We used 52 models made up of various different mechanisms involved in transcription and sampled 150,000 parameter sets uniformly for each model via a Latin hypercube to estimate the unknown parameters in each model. For each set of parameters, the mechanisms involved in transcription were simulated for 10,000 genes for 40 minutes, with increments of 0.005 minutes per time step, to allow the system to reach steady state. We used the lowest Kolmogorov-Smirnov statistic to identify the parameter set that best fits the raw NET-seq distribution.
Type Of Material Computer model/algorithm 
Year Produced 2021 
Provided To Others? Yes  
Impact The results obtained with the model have been widely cited. We continue to develop and refine the model and have two further manuscripts in preparation.
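
The fitting procedure described in the model entry above (Latin hypercube sampling of parameter sets, simulation, and selection by the lowest Kolmogorov-Smirnov statistic) can be illustrated with a minimal sketch. This is not the published model: the simulator below is a hypothetical two-parameter toy (a promoter-proximal pause fraction and an elongation decay length), the function and parameter names are assumptions made for illustration, and the sample sizes are scaled far below the 150,000 parameter sets and 10,000 genes reported above.

```python
import random


def latin_hypercube(n_samples, bounds, rng):
    """Draw n_samples points so that each dimension has exactly one point per stratum."""
    columns = []
    for low, high in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        columns.append([low + (k + rng.random()) / n_samples * (high - low)
                        for k in strata])
    return list(zip(*columns))


def simulate_positions(pause_fraction, decay_length, n_polymerases, rng,
                       gene_length=1000.0, pause_window=100.0):
    """Toy stand-in simulator (assumption): returns polymerase positions on one gene.

    A polymerase is either paused near the promoter or has travelled an
    exponentially distributed distance before early termination.
    """
    positions = []
    for _ in range(n_polymerases):
        if rng.random() < pause_fraction:
            positions.append(rng.uniform(0.0, pause_window))
        else:
            positions.append(min(rng.expovariate(1.0 / decay_length), gene_length))
    return positions


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap


def fit(observed, n_sets, bounds, rng, n_polymerases=500):
    """Return (lowest KS statistic, parameter set) over a Latin hypercube sample."""
    best = None
    for params in latin_hypercube(n_sets, bounds, rng):
        simulated = simulate_positions(*params, n_polymerases=n_polymerases, rng=rng)
        stat = ks_statistic(observed, simulated)
        if best is None or stat < best[0]:
            best = (stat, params)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "observed" profile generated with known toy parameters.
    observed = simulate_positions(0.3, 400.0, 2000, rng)
    stat, (pause_fraction, decay_length) = fit(
        observed, 200, [(0.0, 1.0), (100.0, 1000.0)], rng)
    print(f"KS={stat:.3f} pause_fraction={pause_fraction:.2f} "
          f"decay_length={decay_length:.0f}")
```

Comparing full distributions with the Kolmogorov-Smirnov statistic, rather than summary values such as the mean, makes the fit sensitive to the shape of the profile, which is the quantity the work above sets out to interpret.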