Computational Studies of AEGIS Molecular Components

Lead Research Organisation: Cardiff University
Department Name: Chemistry


One of the most important outcomes of the last quarter-century of synthetic biology is the recognition that the biopolymers that have been delivered to us by 4 billion years of biological evolution are not the only molecules that might support genetics, inheritance, evolution, and catalysis. Developing new biopolymers and characterizing their properties in living organisms not only has significance for testing our ideas about the functional optimization of the existing "molecules of life" but also opens new opportunities in biotechnology. Such work also permits insights into questions of the uniqueness of terran biology and whether life in other parts of the universe could be constructed using alternate chemistries. Considering just DNA and RNA (collectively xNA), it is now clear that the four distinct "standard" building blocks (A, G, C, T and its equivalent U) for xNA do not exhaust the possibilities allowed by the two rules guiding Watson-Crick pairing in natural nucleic acids (size complementarity and hydrogen-bonding complementarity). For example, the number of nucleobase pairs can be increased from two to six by merely rearranging hydrogen bond donor and acceptor groups. Efforts to implement this observation in practice using chemical synthesis have (so far) led to two "generations" of novel heterocycles that can be incorporated into precursors suitable for use in automated xNA synthesis, thereby yielding artificially expanded genetic information systems (AEGIS).
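The count of six pairs can be illustrated with a short enumeration. This is a minimal sketch, not taken from the project itself: it assumes three hydrogen-bonding positions per base pair, writes each position on the pyrimidine-like partner as donor (D) or acceptor (A), and sets aside the all-donor and all-acceptor patterns, an assumption made here only so that the result matches the figure of six quoted above. The function names and the "py"/"pu" labels are illustrative.

```python
from itertools import product

# Each hydrogen-bond donor must face an acceptor on the partner base, and vice versa.
COMPLEMENT = {"D": "A", "A": "D"}


def mixed_patterns(n_sites=3):
    """Donor/acceptor patterns over n_sites, excluding all-D and all-A (assumption)."""
    patterns = ("".join(p) for p in product("DA", repeat=n_sites))
    return [p for p in patterns if len(set(p)) > 1]


def complementary(pattern):
    """Pattern required on the partner base for full hydrogen-bond complementarity."""
    return "".join(COMPLEMENT[site] for site in pattern)


def enumerate_pairs(n_sites=3):
    """Pair each small (py) pattern with the complementary large (pu) pattern."""
    return [("py" + p, "pu" + complementary(p)) for p in mixed_patterns(n_sites)]


pairs = enumerate_pairs()
for small, large in pairs:
    print(small, ":", large)
print(len(pairs), "pairs,", 2 * len(pairs), "nucleobases")
```

Under these assumptions the enumeration gives six pairs built from twelve nucleobases, two of which correspond to the standard pairs.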

Although exploiting altered patterns of hydrogen bonding to obtain novel nucleobase pairs that are (in principle) "orthogonal" to A:T and G:C appears straightforward, the practical realization of these ideas has proven surprisingly problematic. For example, some potential heterocycles have highly populated tautomeric forms with altered hydrogen bonding patterns; these can base pair with standard nucleobases in either duplex DNA or within the active sites of polymerases, thereby giving rise to unanticipated mutations or the loss of the AEGIS nucleobases during replication. Being able to predict tautomer populations in solution or within enzyme active sites prior to chemical synthesis would be a significant step in improving the efficiency with which new nucleobase pairs can be discovered. Even were these "design" problems to be resolved, little is known about how the incorporation of these non-natural nucleobases into xNA affects the conformational preferences and dynamical properties of these complex molecules, which are fundamental to the interaction of "standard" xNA with proteins such as polymerases and transcription factors. We note that there has been a dearth of studies aimed at understanding how AEGIS nucleobases, which have altered electrostatic properties (dipole moments, charge distribution), might perturb xNA structure both in free solution and when bound within polymerase active sites. Finally, the validation of the force field parameters needed to model AEGIS nucleobases by, for example, comparing calculated free energies of interaction between xNA and proteins with experimental measurements has not yet been reported.
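To make the link between computed free energies and tautomer populations concrete, the following is a minimal sketch of a Boltzmann-weighting step. It assumes that relative free energies of the tautomers in a given environment are already available from some calculation; the function name, the tautomer labels and the numerical values are hypothetical and are not results from this project.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)


def tautomer_populations(rel_free_energies, temperature=298.15):
    """Convert relative free energies (kcal/mol) into fractional populations."""
    rt = R_KCAL * temperature
    g_min = min(rel_free_energies.values())  # shift to the lowest value for numerical stability
    weights = {
        name: math.exp(-(g - g_min) / rt)
        for name, g in rel_free_energies.items()
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical input: a minor tautomer lying 2.0 kcal/mol above the major form.
example = {"major": 0.0, "minor": 2.0}
pops = tautomer_populations(example)
for name, fraction in pops.items():
    print(f"{name}: {fraction:.3f}")
```

With these illustrative numbers the minor tautomer is populated at the level of a few percent, which shows why errors of even 1 kcal/mol in the computed free energies matter when judging whether a tautomer is likely to cause mispairing.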
Work in this project will therefore seek to address the problems outlined above by (i) developing and validating new computational methods for determining the populations of tautomeric forms of AEGIS nucleobases in water and in protein environments, and (ii) using advanced MD-based methods to understand how the incorporation of AEGIS nucleobase pairs affects the conformational and dynamical properties of duplex DNA and its interactions with DNA-binding proteins and polymerases. In particular, free energy perturbation methods will be used to study how replacing "standard" Watson-Crick bases by an AEGIS nucleobase pair changes the affinity of the DNA-binding domain of the human SETMAR transcription factor. The successful accomplishment of this aim will lay a foundation for obtaining novel endonucleases capable of cleaving AEGIS-containing duplex DNA.
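Relative binding affinities of this kind are commonly obtained from a thermodynamic cycle, in which the standard base pair is alchemically transformed into the AEGIS pair once in the protein-bound duplex and once in the free duplex. The sketch below shows only the bookkeeping of that cycle; it is not necessarily the protocol used in this project, and the function names and input values are hypothetical.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)


def relative_binding_free_energy(dg_mut_complex, dg_mut_free):
    """ddG_bind = dG(mutation in protein-DNA complex) - dG(mutation in free DNA)."""
    return dg_mut_complex - dg_mut_free


def kd_fold_change(ddg_bind, temperature=298.15):
    """Ratio Kd(modified) / Kd(standard) implied by ddG_bind in kcal/mol."""
    return math.exp(ddg_bind / (R_KCAL * temperature))


# Hypothetical alchemical free energies (kcal/mol) for the two legs of the cycle.
ddg = relative_binding_free_energy(dg_mut_complex=-41.3, dg_mut_free=-42.5)
fold = kd_fold_change(ddg)
print(f"ddG_bind = {ddg:+.2f} kcal/mol, Kd changes by a factor of {fold:.1f}")
```

A positive value means that the protein binds the AEGIS-containing duplex more weakly than the standard duplex; the large free energies of the individual legs cancel, leaving only the difference that can be compared with experiment.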

Technical Summary

DNA and RNA (collectively xNA) are fascinatingly diverse molecules that display a multitude of detailed conformations with biological significance. In recent years, it has been recognized that, by merely rearranging hydrogen bond donor and acceptor groups, the number of nucleobase pairs can be increased from two to six, thereby yielding an Artificially Expanded Genetic Information System (AEGIS). Efforts to reduce this observation to practice using chemical synthesis have (so far) led to two "generations" of novel heterocycles that can be incorporated into precursors suitable for use in automated xNA synthesis. In addition, protein engineering using a variety of strategies has created polymerases that can copy and PCR amplify oligonucleotides containing a variety of these "non-natural" nucleobases, opening up a substantial number of technological applications. The long-term development of AEGIS-based applications is constrained, however, by a lack of theoretical "infrastructure" that permits us to understand fundamental chemical aspects of expanded genetic systems. In this project, we will carry out theoretical studies at multiple levels to improve our understanding of xNA molecules containing novel nucleobases and nucleobase pairs. The project has two specific aims, which can be summarized as (1) to develop and validate new computational methods for determining the populations of tautomeric forms of AEGIS nucleobases in water and in protein environments, and (2) to understand how the incorporation of AEGIS nucleobase pairs affects the conformational and dynamical properties of duplex DNA and its interactions with DNA-binding proteins. Not only will these studies facilitate the discovery of novel non-natural nucleobases but they will also drive the development of reagents for the creation of organisms in which AEGIS plays a functional role.

Planned Impact

The creation of artificial genetic systems capable of Darwinian evolution is a central theme in the emerging field of synthetic biology. Work in this project will drive the development and evaluation of computational strategies that can be used (i) to investigate the molecular properties, including tautomer preferences, of heterocycles that are potential non-natural nucleobases, and (ii) to obtain novel restriction endonucleases and polymerases for manipulating DNA that is built from expanded genetic alphabets. Access to such enzymes will be an essential element in permitting synthetic biologists to construct plasmids containing an expanded number of nucleobases, which will, in turn, accelerate their ability to develop modified bacteria with the capability of producing proteins that contain novel amino acids, with significant implications for the sustainable production of fine chemicals by engineered microorganisms. The topic also lends itself to articles in the popular press, given the general interest in the nature of life that may exist elsewhere in the Universe. We anticipate that the dynamical simulations of duplex DNA can form the basis of podcasts that can be used by school teachers to enhance science teaching.

This project involves a large number of collaborations, primarily with USA-based researchers, and so our studies will positively impact the international reputation of UK-based research in chemical biology. We anticipate that the PDRA appointed on this grant will interact closely with our collaborators, both using Skype-based conference calls and by travel to Indiana and Florida. The expertise of our collaborators in the design, chemical synthesis and characterization of novel heterocycles as nucleobases, and in the structure and biophysical properties of nucleic acids, is also very strong, and so the project PDRA will gain unique cross-disciplinary training and insight into this emerging sub-discipline of synthetic biology. This will enhance their skill set and have a positive impact on their competitiveness for a career in either academia or the industrial/biotechnology sector.

The Benner laboratory at the Foundation for Applied Molecular Evolution has a strong track record of working with industrial partners to exploit the properties of novel nucleobases in diagnostic reagents. Any intellectual property developed in this study will be identified through periodic discussions with, and disclosures to, Research Innovation Services at Cardiff University and the other institutions involved in this collaborative research project.


Radadiya A (2020) Characterizing human odorant signals: insights from insect semiochemistry and modelling. in Philosophical transactions of the Royal Society of London. Series B, Biological sciences

Zhu W (2020) Whole-Genome Sequence of Shomura and Niida SF-557. in Microbiology resource announcements

Description We have used computational methods to understand the molecular features of DNA polymerases that must be "re-engineered" if they are to be capable of replicating DNA built from expanded genetic alphabets. Our findings set the scene for obtaining aptamers that can be used in disease diagnosis and for the creation of living cells capable of making proteins with a larger number of amino acids. We have also used calculations to understand the alternate molecular structures (tautomers) that can be taken by novel nucleobases in expanded genetic alphabets. This is important in identifying which of these structures might introduce mutations in DNA built from an expanded genetic alphabet when it undergoes replication. This work also shows that existing experimental methods for measuring the tautomer populations of novel nucleobases are inadequate.
Exploitation Route Our work can be used as a basis for introducing modified DNA polymerases into cells as part of developing new tools for making enzymes that can catalyse reactions that are unknown in biology. In addition, access to the methods developed to compute free energy differences between molecular tautomers will have benefits in drug discovery and the synthesis of aptamers with improved binding specificity.
Sectors Pharmaceuticals and Medical Biotechnology

Description Mapping C-C nucleoside bond formation in real time
Amount £660,731 (GBP)
Funding ID BB/T006188/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 01/2020 
End 12/2023
Description Indiana University School of Medicine 
Organisation Indiana University
Department School of Medicine
Country United States 
Sector Academic/University 
PI Contribution Our role in the collaboration is to use advanced computational methods to determine the molecular properties of DNA or protein/DNA complexes. We have recently completed a detailed study of how site-specific residue replacements permit DNA polymerases to use non-natural (AEGIS) nucleobases, which cannot be used efficiently by wild-type enzymes, in DNA replication. These modified DNA polymerases are an essential component of expanded genetic alphabets. We have also studied how the incorporation of AEGIS nucleobases affects the structure and dynamical properties of duplex DNA.
Collaborator Contribution Our partner, Prof. Millie Georgiadis, is a biophysicist who is an expert in obtaining high-resolution X-ray crystal structures of nucleic acids and protein/nucleic acid complexes. She provides the structures needed for use in our computational studies and works closely with us to develop testable hypotheses and to interpret simulation results. Prof. Georgiadis also plays a major role in manuscript development.
Impact We have obtained prior access to unpublished X-ray crystal structures that are needed for the computational studies relevant to this award. Prof. Georgiadis has also played a major role in developing a manuscript that is presently under review for publication.
Start Year 2017
Description Technische Universität Dortmund 
Organisation Technical University of Dortmund
Country Germany 
Sector Academic/University 
PI Contribution We have performed free energy calculations to evaluate the tautomeric equilibria of a series of non-natural nucleobases used in expanded genetic alphabets.
Collaborator Contribution Our partners have calibrated our calculations using the EC-RISM method developed by Prof. Dr. Stefan Kast and his co-workers.
Impact Eberlein, L., Beierlein, F., van Eikema-Hommes, N. J. R., Radadiya, A., Heil, J., Benner, S. A., Clark, T., Kast, S. M. and Richards, N. G. J. "Tautomeric equilibria of nucleobases in the hachimoji expanded genetic alphabet", (2020) J. Chem. Theory Comput., accepted for publication.
Start Year 2018