Genome wide identification of priming sites for Okazaki fragments

Lead Research Organisation: University of Warwick
Department Name: Warwick Medical School

Abstract

Our genetic blueprint is encoded by chromosomal molecules that consist of DNA as well as chromatin. The chromatin determines expression states of individual genes. As we go through life, cells get damaged and die. Thus, tissues are constantly regenerating through cell divisions. Every time a cell divides, the DNA and the associated chromatin have to be duplicated. This happens by the process of DNA replication. DNA replication is initiated at specific sites in the genome called origins, and progresses in a bidirectional manner from these. Since DNA consists of two anti-parallel strands, and each strand can only be copied in one direction, one of the strands, the leading, is synthesized in a continuous manner, while the other, the lagging, is synthesized as fragments that are subsequently joined together to form the final product. Each time DNA replication is initiated on the lagging strand, a priming event takes place, but virtually nothing is known about what determines where these primers are put down. We have developed a novel method for determining the position of the priming sites on the genomic level using Next Generation Sequencing. Using this technical breakthrough, we will answer fundamental questions about the replication process that would not have been possible to answer before. We will be able to identify positions in the genome where the replication process slows down or terminates, as at these positions priming is site-specific. We will also be able to identify regions where there are "problems" during the replication process due to infrequent or too frequent priming events. Finally, we will investigate whether chromatin affects where primers are put down. The project will answer some very fundamental questions about the replication process underlying all cellular life, and will increase our knowledge about what can go wrong during this essential process, thus giving us insight into the genetic instability that underlies cancer and ageing.

Technical Summary

DNA replication is an asymmetric process where one strand, the leading, is replicated in a continuous manner while the other, the lagging, is synthesized as fragments that are then combined to form the intact strand. Polymerase alpha and primase act together to initiate synthesis of each fragment on the lagging strand: first primase puts down short RNA primers, which are then extended by polymerase alpha. Subsequently, the polymerase alpha/primase complex is replaced by polymerase delta, which completes the synthesis. Importantly, virtually nothing is known about what determines where primers are put down during the replication process in vivo. We have developed a method for determining the positions of priming sites on the genomic level. The method utilizes size fractionation of replication intermediates on alkali sucrose gradients, in a manner similar to the original work of Okazaki. The purified fragments are then made double stranded and sequenced using the Illumina Next Generation Sequencing platform. We will use this method to address the following questions: 1) what are the sequence requirements for primase/polymerase alpha? 2) are there sequences with higher or lower priming frequency and are such sequences genetically unstable? 3) are there site-specific priming events at all replication barriers and can we use these to identify all Swi1-dependent replication barriers? 4) does chromatin affect fork progression as well as where primers are put down? 5) is there priming on the leading strand when replication restart occurs? We are in a unique position to address these fundamental biological questions using this new technology.
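As a rough illustration of how sequenced fragments could be turned into candidate priming positions, the sketch below tallies the 5' ends of aligned fragments, since the primer sits at the 5' end of each Okazaki fragment. It is not the project's analysis pipeline, which is not described here. The input format (chromosome, 0-based half-open start and end, strand) and the function names are assumptions made for the example, and whether a read's strand matches the fragment's strand depends on the library preparation.

```python
from collections import Counter


def five_prime_end(start, end, strand):
    """Return the 0-based position of a fragment's 5' end.

    Assumes a 0-based, half-open interval [start, end) on the reference
    (an assumption of this sketch, not a detail taken from the project).
    """
    if strand == "+":
        return start
    if strand == "-":
        return end - 1
    raise ValueError(f"unknown strand: {strand!r}")


def count_priming_sites(alignments):
    """Count fragment 5' ends per (chromosome, strand, position).

    `alignments` is an iterable of (chrom, start, end, strand) tuples.
    Positions with high counts are candidate site-specific priming events.
    """
    counts = Counter()
    for chrom, start, end, strand in alignments:
        counts[(chrom, strand, five_prime_end(start, end, strand))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical alignments, for illustration only.
    example = [
        ("chrI", 100, 250, "+"),
        ("chrI", 100, 300, "+"),
        ("chrI", 400, 550, "-"),
    ]
    for site, n in sorted(count_priming_sites(example).items()):
        print(site, n)
```

Keeping the strand in the key matters because lagging-strand synthesis switches strand on either side of an origin, so pooling both strands would blur the signal.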

Planned Impact

The proposed research will increase the understanding of the fundamental process of DNA replication that underlies all cellular life. Factors involved in DNA replication are major drug targets for the treatment of cancer, so in the long term this knowledge will help underpin the development of better treatments for cancer and potentially for the prevention of ageing. The pharmaceutical industry will therefore benefit from this project. In addition, the biotech industry will potentially benefit. One key biotechnological goal is to create synthetic life, but this requires an understanding of the fundamental processes, one of which is DNA replication. Our project will give a unique insight into how DNA replication occurs and what problems arise during an unperturbed replication process. This knowledge would not be possible to obtain by other means. The UK employment sector will also benefit, as the project will be a unique training opportunity for the two PDRAs involved. They will acquire a set of skills that is in high demand in industry. Finally, the general public will benefit. We will regularly disseminate our findings to the general public as well as to the scientific community. This will occur both in publications and in presentations as part of public events here at the University of Warwick. We will also actively visit schools to talk about our research.
 
Description Sequencing technologies are extremely powerful, allowing biological processes at the cell level to be scrutinised in detail, but the resulting data are challenging to analyse. We developed a statistical (Bayesian) algorithm to fit stochastic mechanistic models to sequencing data, thereby determining model parameters and quantifying sources of stochasticity (noise). Cellular processes are inherently stochastic because they involve recruitment, and activation, of small numbers of proteins. We analysed genome (DNA) replication in yeast cells. The genome is replicated by generation of bidirectional replication forks from privileged sites along the genome called origins. This is stochastic, with different origins being used and those origins firing at different times in different cells. We fitted a model of origin firing and DNA replication to sequencing data for yeast, demonstrating that both origin choice and origin firing contribute significant stochasticity, and quantified, for the first time, the frequency with which origins are used in a population of cells. Our model and algorithm can be applied to other organisms, whilst the ideas are transferable to other biological questions and data sets. This approach represents a paradigm shift in the analysis of sequencing data using mechanistic models.
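To make the two sources of stochasticity concrete, the following is a toy simulation of origin choice and origin firing time, not the model fitted in the project or implemented in DNAorigins. It assumes that each origin is licensed independently with a fixed probability, that firing times are normally distributed (clipped at zero), and that forks move at a constant speed; all names and parameter values are hypothetical. An origin counts as "used" in a cell if it fires before a fork from another origin reaches it.

```python
import random


def simulate_cell(origins, fork_speed, rng):
    """Simulate one cell; return a list of booleans, one per origin.

    `origins` is a list of (position, licensing_prob, mean_fire_time,
    sd_fire_time) tuples. An entry is True if that origin fired actively
    rather than being replicated passively by a neighbouring fork.
    """
    licensed = []
    for idx, (pos, p_lic, mean_t, sd_t) in enumerate(origins):
        if rng.random() < p_lic:  # origin choice (licensing)
            fire_time = max(0.0, rng.gauss(mean_t, sd_t))  # firing time
            licensed.append((idx, pos, fire_time))

    used = [False] * len(origins)
    for idx, pos, fire_time in licensed:
        # Earliest arrival of a fork started at any other licensed origin.
        # With constant fork speed, the earliest arrival always comes from
        # an origin that really fired, so no extra bookkeeping is needed.
        arrival = min(
            (t + abs(pos - p) / fork_speed
             for j, p, t in licensed if j != idx),
            default=float("inf"),
        )
        used[idx] = fire_time <= arrival
    return used


def usage_frequency(origins, fork_speed, n_cells, seed=0):
    """Fraction of simulated cells in which each origin fires actively."""
    rng = random.Random(seed)
    totals = [0] * len(origins)
    for _ in range(n_cells):
        for i, fired in enumerate(simulate_cell(origins, fork_speed, rng)):
            totals[i] += fired
    return [t / n_cells for t in totals]


if __name__ == "__main__":
    # Hypothetical origins: (position in kb, licensing prob, mean, sd in min).
    toy_origins = [(0.0, 0.9, 10.0, 3.0), (40.0, 0.6, 20.0, 5.0)]
    print(usage_frequency(toy_origins, fork_speed=2.0, n_cells=10000))
```

Setting the firing-time standard deviations to zero leaves licensing as the only source of variation between cells, which is a quick way to see how much each source contributes in this toy setting.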
Exploitation Route We overcame a significant analysis challenge with our algorithm, in particular by using an approximate likelihood in place of an intractable one and by using sophisticated Markov chain Monte Carlo methods to handle the complexity in the data. The ideas are highly transferable to other analysis challenges in next generation sequencing.
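For readers unfamiliar with Markov chain Monte Carlo, the sketch below shows the basic idea in its simplest form: a random-walk Metropolis sampler that only needs a function returning a (possibly approximate) log-likelihood plus log-prior. It is far simpler than the methods used in the project and is not taken from DNAorigins. The binomial toy target, the function names and the tuning values are assumptions chosen for illustration.

```python
import math
import random


def log_posterior(p, k, n):
    """Toy target: binomial log-likelihood for k successes in n trials,
    with a uniform prior on p in (0, 1). Stands in for an approximate
    log-likelihood; it is not the likelihood used in the project."""
    if not 0.0 < p < 1.0:
        return float("-inf")
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def metropolis(log_target, init, step, n_samples, seed=0):
    """Random-walk Metropolis sampler for a single real parameter.

    `init` must have finite log_target. Proposals are Gaussian steps of
    standard deviation `step` around the current value.
    """
    rng = random.Random(seed)
    current = init
    current_lp = log_target(current)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


if __name__ == "__main__":
    # Hypothetical data: an event observed in 70 of 100 trials.
    draws = metropolis(lambda p: log_posterior(p, 70, 100),
                       init=0.5, step=0.05, n_samples=20000, seed=1)
    kept = draws[2000:]  # discard burn-in
    print(sum(kept) / len(kept))
```

Because the sampler only ever evaluates differences of log-target values, an approximate likelihood can be swapped in without changing the sampling code, which is what makes this style of analysis portable across data sets.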
Sectors Healthcare, Pharmaceuticals and Medical Biotechnology

URL https://github.com/albazarova/DNAorigins
 
Title DNAorigins 
Description A Bayesian analysis algorithm for NGS data that fits the generative model of origin firing as described in "Bayesian inference of origin firing time distributions, origin interference and licensing probabilities from NGS data", Bazarova et al, NAR 2019. 
Type Of Technology Software 
Year Produced 2019 
Impact Just released. Used in our paper to analyse NGS data. 
URL https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkz094/5319135