Mathematical and Computational Modelling of Transcription Factor Target Finding in Eukaryotic Organisms

Lead Research Organisation: University of Cambridge
Department Name: School of Biological Sciences

Abstract

The DNA in all cells of an organism is the same. What produces the great variety of tissues, or the difference between healthy and diseased tissue, is the specific set of genes that are active. This activity state is regulated by molecules called transcription factors, proteins that bind in the vicinity of genes and determine whether a gene is on or off. Although they play important roles in many biological processes, transcription factors are amongst the least abundant and most enigmatic proteins in the cell. How do they home in on their binding sites if they are so sparse? What biophysical laws govern their movements in the nucleus? The answers to these questions have important medical implications. Previous mathematical models have provided estimates for both the mode and the rate at which transcription factors find their target sites. However, these models operate at a level of mathematical abstraction in which the genome is treated as a one-dimensional string. Modern experimental methods generate large amounts of data on the accessibility state of genomic regions. These data have a spatial component and thus allow us to build models of binding that incorporate spatial effects. This work will inform ongoing studies that try to explain how molecular interactions at the DNA translate into the observed binding behaviours.

Technical Summary

Site-specific transcription factors (TFs) are proteins that bind to specific sites on the genome and regulate the activity of one or more genes. TFs control gene activity by either increasing or reducing the transcription rate of their target genes. Thus, TFs play a crucial role in the fine-tuned cellular response to developmental, physiological or environmental signals. Eukaryotic gene regulatory programmes usually require the concerted binding of a defined combination of TFs to cis-regulatory modules, short genomic regions that may be many kilobases away from the actual transcription start sites around the genes' promoters. Much of transcriptional regulation is already understood. However, while the in vitro biochemistry of TF-DNA interactions is increasingly well established, we still have no clear picture of how TFs target the appropriate binding sites in a large genome.
Therefore, the central question addressed in this fellowship is: how do TFs find their genomic binding sites? Approximate answers to this question can be obtained from statistical physics, in particular the law of mass action, and from mathematical and computational methodologies that I worked on as a PhD student.
Current biophysical models often assume that all TF molecules diffuse in a perfectly homogeneous reservoir around a one-dimensional genome, from where they can bind only to more-or-less specific sites on the DNA. Spatial aspects are frequently ignored in these models, so any deviation from the assumptions of homogeneity and one-dimensionality introduces error into these analyses. To better understand the target finding process, it is essential to consider the spatial dimension, since nuclear processes occur non-homogeneously in three dimensions.
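As a minimal illustration of the well-mixed, one-dimensional picture described above, the sketch below computes equilibrium site occupancy from the law of mass action. It is not the model proposed in this fellowship. The function names, the dissociation constants and the assumption that the free TF concentration is fixed (no depletion by binding) are all hypothetical choices made for illustration.

```python
def site_occupancy(tf_conc_nM, kd_nM):
    """Equilibrium probability that a single site is bound (law of mass action).

    Assumes a well-mixed reservoir with a fixed free TF concentration,
    i.e. binding does not deplete the pool of free molecules.
    """
    if tf_conc_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return tf_conc_nM / (tf_conc_nM + kd_nM)


def occupancy_profile(tf_conc_nM, kds_nM):
    """Occupancy of each site along a one-dimensional genome.

    Every site sees the same TF concentration, so no spatial or
    chromatin effects are represented.
    """
    return [site_occupancy(tf_conc_nM, kd) for kd in kds_nM]


# Hypothetical dissociation constants (nM): strong, intermediate and weak site.
example_kds_nM = [1.0, 10.0, 1000.0]
profile = occupancy_profile(10.0, example_kds_nM)
```

In this picture a site's occupancy depends only on its affinity and the global TF concentration. A spatially resolved model would instead let the effective concentration or accessibility vary from site to site.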
In the proposed research, I aim to develop a mathematical model that describes how TF molecules target their genomic binding sites, taking into account the local chromatin environment. Model building will be informed by genome-wide data on nucleosome occupancy, experimentally determined transcription factor binding sites and chromatin modification data. Using the model, I will address questions such as: How do chromatin modifications influence target site finding? What role does nucleosome occupancy of the DNA play? My aim is to establish under which specific conditions these spatial aspects make the strongest contribution to gene regulation, and whether there are any situations where the local chromatin environment can be neglected.
A reliable mathematical/computational model of transcription factor target site identification that takes into account the local chromatin environment will advance the theoretical understanding of the biophysical laws governing the gene regulation process in eukaryotic organisms.
