Using Mathematical Modelling to Deconstruct Transcription
Lead Research Organisation:
University of Oxford
Department Name: Biochemistry
Abstract
Transcription is the first step in the complex process of gene expression that brings the characteristics of a cell into being, by transcribing the DNA code of genes into RNA copies, transcripts, which will subsequently be translated into proteins, the workhorses of a cell. As such, transcription is a highly regulated process that when disrupted can cause disease. Thus, there is a real need to understand all aspects of transcriptional regulation and exactly where and how it can go wrong.
The best-known regulators of transcription are the DNA-binding transcription factors, which were thought to act as on/off switches for transcription by RNA polymerase. However, recent data suggest that many transcription factors do not simply act as the decision point for switching a gene on or off, but instead act at various stages following the start of transcription, during transcription elongation. Indeed, we have recently shown that the amount of the accessory factors associated with RNA polymerase, known as transcription elongation factors, is determined by the DNA-bound transcription factors, and that this differs on individual genes and with environmental conditions.
In order to explore the various stages of transcription, and the impact of transcription factors on these events, we have developed a mathematical model that is trained on experimental data and can determine which of the many stages of transcription changes with mutation or environmental variation. We have begun the process of extending the model by including further details of the transcriptional process and training it on experimental data from humans and yeast, so that it can be applied generally. The purpose of this work is to develop the model and exploit its predictive power so that we can describe the key steps at which the transcription of a gene is regulated and how this is likely to change when conditions change. The results of our modelling will enable us to move to a better understanding of how transcription is disrupted when environmental conditions change, when organisms are stressed, and in disease, where it may well identify potential therapeutic targets. We intend that the model becomes widely disseminated in the academic community.
Technical Summary
Mathematical modelling of complex processes such as transcription enables us to refine our understanding of genes and their regulatory mechanisms by describing the underlying processes that give rise to the shape of their transcriptional profiles. The data from techniques for assessing nascent transcription (NET-seq, GRO-seq, PRO-seq and TT-seq) produce a profile which is the sum of the various events during transcription: initiation, pausing, backtracking, early termination, elongation and termination. Each technique assesses different aspects of transcription, giving distinct profiles for the same genes under the same conditions. An ideal situation would be to amalgamate all data types to give a more holistic view. A mathematical model is ideal for this approach, as it can be constructed to account for each technique's strengths and weaknesses and find the underlying mechanisms that best fit all of the data, thereby assigning each of the events of transcription a relative importance at different genes and under different conditions. Using our own data, we have shown that the shape of these profiles changes when transcriptional regulatory factors are ablated or environmental conditions change, supporting our hypothesis that the dynamics of transcription giving rise to different profiles are an important component of gene regulation. In this work we will extend and refine our basic mathematical model of transcription, so that using the shape of a transcriptional profile we can interpret how individual genes are regulated, predict how this changes over time and with environmental change, and define more precisely the steps in transcription that are influenced by transcription factors, transcription elongation factors, termination factors and chromatin. This work aims to provide a step change in the understanding of how the regulation of gene expression goes far beyond the simple activation or repression of genes and can be facilitated during the process of transcription itself.
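To illustrate how a profile can arise as the sum of individual transcription events, the following is a minimal toy sketch and not the project's model. It assumes a one-dimensional lattice gene, a single pause site, and per-step probabilities for initiation, movement, pause release and early termination. The function name `simulate_profile` and all parameter names and values are hypothetical, and polymerases are allowed to share a position for simplicity.

```python
import random


def simulate_profile(length=50, steps=20000, p_init=0.2, p_move=0.8,
                     pause_site=10, p_release=0.1, p_drop=0.002, seed=1):
    """Toy stochastic simulation of polymerase occupancy along one gene.

    All rates are illustrative assumptions, not fitted values.
    Returns the normalised occupancy at each position.
    """
    rng = random.Random(seed)
    polymerases = []            # positions of polymerases currently on the gene
    occupancy = [0] * length    # accumulated counts per position

    for t in range(steps):
        survivors = []
        for pos in polymerases:
            if rng.random() < p_drop:
                continue        # early termination: polymerase leaves the gene
            # At the pause site, forward movement is slower (pause release).
            p_forward = p_release if pos == pause_site else p_move
            if rng.random() < p_forward:
                pos += 1
            if pos < length:    # reaching `length` means normal termination
                survivors.append(pos)
        polymerases = survivors

        if rng.random() < p_init:
            polymerases.append(0)   # initiation at the start of the gene

        # Discard the first half of the run as burn-in, then record occupancy.
        if t >= steps // 2:
            for pos in polymerases:
                occupancy[pos] += 1

    total = sum(occupancy) or 1
    return [count / total for count in occupancy]


if __name__ == "__main__":
    profile = simulate_profile()
    peak = max(range(len(profile)), key=lambda i: profile[i])
    print("position with highest occupancy:", peak)
```

In this sketch, changing a single rate (for example `p_release`) changes the shape of the resulting profile, which is the kind of relationship a fitted model would aim to invert.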
Planned Impact
Who will benefit and how will they benefit?
Other Scientists
We aim to release the model as analysis software for other groups to use, so that they are better able to analyse their transcription data. We will advertise the resource at international transcription meetings over the three-year period of funding, including the CSH Mechanisms of Eukaryotic Transcription Meeting in 2019 and 2021 and the EMBL Transcription and Chromatin Meetings in 2020 and 2022. Both PIs will attend one or both of these meetings. In addition, our TT-seq and NET-seq data sets will be available to collaborators pre-publication and to the community at publication.
A major aim is to increase the number of trained interdisciplinary scientists comfortable with modelling complex data sets, who will then be able to encourage further adoption of the approach and themselves supervise interdisciplinary science. This will become ever more important as experimental techniques become more sophisticated and large volumes of genome-wide data are generated. To facilitate this, both PIs are actively involved in recruiting and training graduate students and post-docs to use bioinformatics and mathematical modelling in their day-to-day research, and this will continue into the life of this grant.
Societal Impact
In addition to the fundamental direct scientific benefits of the proposed research, there are a number of indirect ways in which it could impact society. The most prominent of these is the potential for increased understanding of diseases such as Alzheimer's, Parkinson's, ALS, asthma and cardiovascular disease, whose causes are thought to include impaired regulation of the transcriptional process. The proposed research could further benefit the diagnosis and treatment of such diseases. In terms of diagnosis, the improved ability to interpret transcriptional profiles will pave the way for personalised medicine in the form of the analysis of the transcriptional profiles from individual patients, which could form a broad-spectrum diagnosis or identify specific subtypes of a disease. This will require technical advances with transcription mapping, but recent reports suggest such approaches are feasible for laboratory-cultured patient-derived cancer cells and will be developed for other cell types in the future. This sort of approach could be used to (i) detect transcriptional changes related to disease, (ii) see how transcription changes in response to disease or chemical agents in viable cell culture lines, with possible applications as a safety test for drugs that are based on transcription factors (TFs), for example the TFs themselves and small molecules that target them, and (iii) safety test drugs that target transcription elongation factors (TEFs) in addition to TFs, as both affect different aspects of transcription dynamics. These sorts of approach could give new insights into the mechanisms of action of new classes of compounds designed to target TEFs and TFs and also facilitate identification of the most suitable TFs or TEFs to target with small-molecule inhibitors.
Mathematical modelling approaches are highly suitable for public engagement. Complex systems must be reduced to a minimal set of components and interactions for an effective model to be constructed, and this leads to the ability to present highly intricate molecular systems on a conceptual level. Additionally, simple stochastic models, of the kind to be employed in the proposed research, translate rather well into animations which can be highly engaging for both non-specialist and specialist audiences. Some animations will be produced in-house and we will utilise the local Oxford Sparks service to produce a professional animation from our research for use in public lectures and displays at a cost of £9000.
Organisations
Publications
Fischl H (2020) Cold-induced chromatin compaction and nuclear retention of clock mRNAs resets the circadian rhythm. in The EMBO Journal
Jeronimo C (2021) FACT is recruited to the +1 nucleosome of transcribed genes and spreads in a Chd1-dependent manner. in Molecular Cell
Uzun Ü (2021) Spt4 facilitates the movement of RNA polymerase II through the +2 nucleosomal barrier. in Cell Reports
Xi S (2024) Size fractionated NET-Seq reveals a conserved architecture of transcription units around yeast genes. in Yeast (Chichester, England)
| Title | How can we use mathematical modelling to understand transcription? |
| Description | We have commissioned a short video which is currently being produced. It is delayed due to staff shortages but we have |
| Type Of Art | Film/Video/Animation |
| Year Produced | 2023 |
| Impact | This video is destined to be accessible from the laboratory websites and on social media. We will use it in talks and public engagement and at open days. It explains in simple language, supported by accessible 2D and 3D animations, how mathematical modelling can give us insights into the processes behind diseases. |
| Description | We have used machine learning to train an algorithm to describe the pattern RNA polymerase makes as it transcribes a gene in yeast and mammalian cells. When combined with drugs or mutations, the specific effect on RNA polymerase can be assigned at base-pair resolution, offering a new way of . In addition, we have adapted our methods to examine short, poorly transcribed transcription units in yeast and linked these to regions that are involved in higher-order structures in the genome. |
| Exploitation Route | We have presented this methodology at many meetings and conferences. The methods developed here and in other work are being applied by other groups. We are stressing that any pattern, shape or distribution of data can be subject to this type of analysis. In addition, clustering into subgroups can stratify genes into different types in an unbiased way. |
| Sectors | Pharmaceuticals and Medical Biotechnology |
| URL | http://doi.org/10.1002/yea.3931 |
| Description | We have been in conversation with IBM to discuss using AI to model shapes in biology to define underlying parameters. We are particularly interested in how shapes change in disease states and are currently using profiles from mutants associated with diseases to begin this process. We have also linked newly identified low-abundance transcription units to potential sites at which anchor sites for higher-order structures are formed, and we are exploring this with a local spin-out, Oxford BioDynamics plc, who use higher-order structures as prognostic and diagnostic biomarkers. |
| First Year Of Impact | 2023 |
| Sector | Pharmaceuticals and Medical Biotechnology |
| Impact Types | Economic |
| Title | A mathematical model that describes the mechanism of transcription used by both yeast and human RNA polymerase II |
| Description | We have developed a mathematical model that is able to describe the parameters contributing to the profile of nascent transcription at any gene in yeast or humans (despite the perceived view that the mechanisms in the two organisms are very different). We used profiles of nascent transcription generated by native elongating transcript sequencing (NET-seq), as this captures all states of RNA polymerase II (elongating, paused, terminating, backtracked, etc.). We used 52 models made up of various different mechanisms involved in transcription and sampled 150,000 parameters uniformly for each model via a Latin hypercube to obtain the unknown parameters in each model. For each set of parameters, the mechanisms involved in transcription were simulated for 10,000 genes for 40 minutes, with increments of 0.005 minutes per time step, to allow the system to reach steady state. We used the lowest Kolmogorov-Smirnov statistic to identify the parameter set that best fits the raw NET-seq distribution. |
| Type Of Material | Computer model/algorithm |
| Year Produced | 2021 |
| Provided To Others? | Yes |
| Impact | The results obtained with the model have been widely cited. We continue to develop and refine the model and have two further manuscripts in preparation. |
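The fitting procedure described in the record above (Latin hypercube sampling of parameters, simulation, then selection by the lowest Kolmogorov-Smirnov statistic) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's software: `toy_positions` is a hypothetical stand-in for the transcription simulation, with two made-up parameters (a paused fraction and a pause position), and the sample counts are far smaller than those reported.

```python
import random


def latin_hypercube(n_samples, bounds, rng):
    """Sample n_samples points so each dimension has one point per stratum."""
    columns = []
    for low, high in bounds:
        # One uniform draw inside each of n_samples equal-width strata.
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(strata)  # decouple the strata across dimensions
        columns.append([low + u * (high - low) for u in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def toy_positions(pause_fraction, pause_position, rng, n=1000,
                  gene_length=1000.0):
    """Hypothetical stand-in simulator: a pause peak plus uniform gene body."""
    positions = []
    for _ in range(n):
        if rng.random() < pause_fraction:
            positions.append(rng.gauss(pause_position, 10.0))
        else:
            positions.append(rng.uniform(0.0, gene_length))
    return positions


def fit(observed, bounds, n_samples, rng):
    """Return (ks, params) for the sampled parameter set with the lowest KS."""
    best = None
    for params in latin_hypercube(n_samples, bounds, rng):
        d = ks_statistic(observed, toy_positions(*params, rng))
        if best is None or d < best[0]:
            best = (d, params)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    observed = toy_positions(0.3, 100.0, rng, n=2000)  # synthetic "data"
    ks, params = fit(observed, [(0.0, 1.0), (0.0, 500.0)], 200, rng)
    print("lowest KS statistic:", ks, "at parameters:", params)
```

Because the Latin hypercube places exactly one sample in each stratum of every parameter, it covers the parameter ranges more evenly than the same number of independent uniform draws, which matters when each sample requires a full simulation.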
| Description | Oxford Science Festival October 2019 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | Regional |
| Primary Audience | Public/other audiences |
| Results and Impact | We ran a stand in the local shopping centre as part of the Oxford Science Festival demonstrating how structural biology relates to drug discovery and the real world. We had lots of children and their parents who modelled crystal structures using Jelly babies and cocktail sticks, made their own crystals by mixing reagents and examined crystals under the microscope. Great fun and educational. Many of the parents were particularly interested. |
| Year(s) Of Engagement Activity | 2019 |
| URL | http://sciencefestivals.uk/festivals/oxfordshire-science-festival |
