Ab initio protein modelling for automated X-ray crystal structure solution

Lead Research Organisation: University of Liverpool
Department Name: Sch of Biological Sciences

Abstract

Proteins make up the functional machinery of all living beings. Their particular roles depend on their 3-dimensional structures, which allow given proteins to interact specifically with other molecules in their environment. Some proteins - enzymes - go further and can transform certain compounds into others. To understand better how proteins work and be able to use them in industry and medicine, scientists are greatly interested in figuring out their 3-dimensional structures. There are various ways to do this, but the dominant technique is X-ray crystallography. In this, an intense beam of X-rays is fired at a protein crystal. The X-rays are diffracted when passing through the crystal, producing a pattern of rays that is characteristic of the protein under study. In order to elucidate the structure of the protein, information derived from multiple diffraction patterns obtained from the same protein but under different conditions must be drawn together. The acquisition of such extra diffraction patterns can be time-consuming and expensive, and commonly involves hazardous chemicals. A technique exists, however, where computers substitute for the additional experiments by estimating equivalent information from available structures of proteins similar to that under study. In this way, protein structures can be solved from one single diffraction pattern. This technique - called Molecular Replacement (MR) - is fast, economical, clean and often uncomplicated. However, since MR relies on pre-existing structures, it is not applicable to many proteins of interest, for which similar structures are simply not available. For many years, scientists have tried to develop computer methods to predict the structure of proteins, purely based on their sequences. These methods are generally called ab initio modelling methods. Over the past decade, these efforts have started to bear fruit.
These predicted models are unlikely to substitute for crystal structures any time soon since they typically contain errors, but recent work has shown that they are sometimes close enough to the real structure for them to be used in the MR process. This is the main idea behind this proposal - to adapt current ab initio modelling procedures to the specific needs of MR. With ab initio modelling, it is generally the case that the more detailed (i.e. the longer) the computer calculation, the better the model you can make. Unfortunately, achieving the best models is so demanding that it often requires extensive calculation times or access to supercomputers or other vast computer resources. Few crystallographers have access to these facilities, making the modelling method impractical. We therefore propose a different approach, making efficient use of simpler models that can be easily obtained on typical computers. In our preliminary work, we have already proven that this approach can work successfully for MR. What we want to do now is find the best way to produce optimal models and to do this automatically. This effectively means adapting the method to meet the demands of modern X-ray crystallography, making it fast so that it can be used as a routine approach and accessible to other crystallographers without specialist knowledge of ab initio modelling. We then want to include the method in the MrBUMP program, which is a well-established package allowing for easy, automated MR. MrBUMP can be added to a software package called CCP4i that is widely used by crystallographers. By incorporating our processing method in a familiar program, we expect it to become widely used across the world. We expect that by extending the MR computational approach we will enable protein structures to be determined more quickly and cheaply. In this way, research in all sorts of areas that depend on protein structure information, like drug design, will proceed faster.

Technical Summary

The overall objective of this work is to facilitate access by crystallographers to a novel source of effective search models for Molecular Replacement (MR), namely ab initio protein modeling. Search models will be obtained by automated clustering and processing methods that we will develop. To allow for maximum access by crystallographers, we will use polyalanine models that can be cheaply obtained on typical PCs. To this end, we will draw on our experience of successful MR with hand-picked ab initio models (Acta Cryst, 2008, D64:1288-1291) and discover novel, effective and automatic means of clustering and processing models. We will exploit the synergy that exists between ab initio modeling, which produces clusters of structurally similar models, often capturing correctly different elements of the true structure, and the demonstrably effective structure ensemble approach to MR. Further, we will explore means for automatic deletion of inaccurately modeled termini and benchmark alternative treatments of side chains. After identifying the most successful modes of model superposition and processing, we will incorporate a pipeline for search model production into the program MrBUMP. MrBUMP is an automated program of the popular CCP4 package, which creates and feeds search models to the successful MR programs Phaser and Molrep, and hence is the ideal vehicle for dissemination of our new protocol. Importantly, the changes to MrBUMP will also enable simple, automated local running of the ab initio modeling itself. We will focus on ROSETTA as the only presently widely available ab initio program but we will observe other developing packages and compare their performance as and when they reach maturity. Easy access to ab initio-derived search models will provide the crystallographer with a new tool applicable to both entire proteins and individual domains of up to 100-120 residues, the most common domain size.
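As an illustration of the polyalanine treatment mentioned above, the following is a minimal sketch and not the project's own code. It assumes the models are available as fixed-column PDB-format text, and the function name to_polyalanine is hypothetical. It keeps only backbone and C-beta atoms and renames every residue to ALA.

```python
# Atoms retained in a polyalanine model: the backbone plus the C-beta.
BACKBONE_AND_CB = {"N", "CA", "C", "O", "CB"}


def to_polyalanine(pdb_lines):
    """Reduce PDB ATOM records to a polyalanine model.

    Assumptions of this sketch: input lines follow the fixed-column PDB
    layout (atom name in columns 13-16, residue name in columns 18-20),
    and every non-ATOM record is simply dropped.
    """
    trimmed = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, header and terminator records
        atom_name = line[12:16].strip()
        if atom_name not in BACKBONE_AND_CB:
            continue  # discard side-chain atoms beyond the C-beta
        # Overwrite the residue name field with ALA. Glycine has no C-beta,
        # so it is renamed without gaining one (a simplification).
        trimmed.append(line[:17] + "ALA" + line[20:])
    return trimmed
```

Other side-chain treatments, which the summary proposes to benchmark, would change only the set of retained atom names and the renaming step.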

Planned Impact

This proposal addresses the key phase problem that lies at the heart of protein crystallography. Exploitation of ab initio protein models in the computational solution of phases by MR will constitute another tool in the crystallographer's armory. In applicable cases, this will make it possible to avoid the crystal- and reagent-expensive experimental approaches to structure solution. The ab initio/MR approach will eventually form part of fully automated, high-throughput elucidation pipelines like those of Structural Genomics projects. The methodology will also be capable of accelerating structure refinement in cases where part of the structure can readily be placed leaving a 'missing domain'. The direct beneficiaries of the research will be crystallographers, both in the academic and commercial sectors (e.g. the pharmaceutical industry) but, significantly, the fundamental importance of structural information in all areas of biology leads to two further layers of beneficiaries. Beyond the crystallographer beneficiaries lie the collaborators who will obtain their structures of interest more rapidly and cheaply. Broader still, given the importance of protein structures to drug and vaccine design, pesticide development, biotechnological enzymes etc., the general public may genuinely be said to be the ultimate beneficiary. Society in general will benefit from better medicines and bioproducts. For the above benefits to accrue, crystallographers must first be aware of the methodology and, secondly, find it accessible and easy to use. To address the first aspect we will publicise our work at national and international conferences. Of particular relevance, since we propose to incorporate our method into the MrBUMP program, is the CCP4 user community. As part of the CCP4 collaboration, Winn and Keegan run exhibition stands at major crystallographic conferences and act as tutors at crystallographic workshops.
Such activities will be used to disseminate our findings, as part of the general training of crystallographers in advanced methods. The broader scientific community will learn of progress through publications. For maximum publicity, we will choose open access journals. Engagement of the broadest group of beneficiaries - the general public - will take advantage of successful programs at Liverpool, initiatives illustrating the importance that the School of Biological Sciences places on increasing public awareness of the impact of biological research on British society. The School actively participates in the BBSRC 'Excellence with Impact' programme. Recent School activities are the co-founding of the University's Centre for Poetry and Science and the participation in public lectures and workshops at Liverpool's 2008 British Association Festival of Science. The School also encourages young biologists with visits to local colleges and School Open Days. Nuffield Bursaries allow for summer studentship placements that further stimulate scientific thinking among the young. We will take full advantage of the University's highly experienced Corporate Communications Team, to make science accessible to the general public via local media coverage and other forms of engagement. We will also access the University's Widening Participation Group, which in turn cooperates with the Educational Opportunities Group to provide an Education Liaison Service. This service arranges a wide spectrum of educational events, e.g. 4-day residential summer schools, 1-day taster days, talks for schools and colleges, a Christmas lecture programme for sixth-form students from local schools, and a newsletter distributed to 130 Heads of Biology in schools and colleges. STFC also has an active public engagement programme, and indeed this is a major part of the Council's remit, as laid out in the Royal Charter.
Previous work by Winn and Keegan has been included in the STFC Annual Report 2007-2008, in the annual CSED Frontiers magazine, and in many issues of the CCP4 newsletter.

Publications

Bibby J (2013) Application of the AMPLE cluster-and-truncate approach to NMR structures for molecular replacement. in Acta crystallographica. Section D, Biological crystallography

Biterova EI (2019) The crystal structure of human microsomal triglyceride transfer protein. in Proceedings of the National Academy of Sciences of the United States of America

Hotta K (2014) Conversion of a disulfide bond into a thioacetal group during echinomycin biosynthesis. in Angewandte Chemie (International ed. in English)

Keegan RM (2015) Exploring the speed and performance of molecular replacement with AMPLE using QUARK ab initio protein models. in Acta crystallographica. Section D, Biological crystallography

Zhang J (2017) Crystal structure of the type IV secretion system component CagX from Helicobacter pylori. in Acta Crystallographica Section F Structural Biology Communications

 
Description Rapidly obtained ab initio models can be processed for successful solution of protein crystal structures by Molecular Replacement. The success rate for a large, non-redundant test set of proteins of 40-120 residues was 43%, but greater than 80% for alpha-helical targets.

Preliminary data suggest potential for much broader application of the core methodology, for example for NMR structures, multi-domain proteins, membrane proteins etc.
Exploitation Route The method is relevant to any crystallography lab, including those in industry, e.g. pharma. Incorporated into the CCP4 crystallographic software package.
Sectors Pharmaceuticals and Medical Biotechnology

URL http://www.ccp4.ac.uk/ample/
 
Description The program AMPLE has been taken up by the CCP4 crystallography software package. It is now distributed worldwide.
First Year Of Impact 2010
 
Title Distribution of AMPLE 
Description AMPLE in beta form distributed as a component of release 6.3.0 of CCP4 
IP Reference  
Protection Copyrighted (e.g. software)
Year Protection Granted
Licensed Yes
 
Title AMPLE 
Description A pipeline for unconventional Molecular Replacement using, for example, ab initio protein structure predictions 
Type Of Technology Software 
Year Produced 2012 
Open Source License? Yes  
Impact It has allowed solution of protein crystal structures by MR when conventional approaches failed 
URL https://amplemr.wordpress.com/
 
Title AMPLE, 2019 
Description AMPLE is a pipeline for Molecular Replacement. Since its original conception it has been extensively improved to work with search models derived from, for example, NMR ensembles (with or without remodelling), ensembles derived from single structures by computational means, contact-assisted ab initio models, single structures processed according to arbitrary scores provided, ab initio models from databases and so on. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact Makes the process of doing molecular replacement in macromolecular structure solution easier for users, consequently enabling new insights into macromolecules. 
URL https://ample.readthedocs.io
 
Title MrBUMP molecular replacement pipeline for X-ray Crystallography 
Description Significant updates have been made to the MrBUMP software, including the use of the molecular graphics application CCP4mg as a graphical front end to the model search and preparation steps of the program. This enables users to better visualize and manipulate the search models that they are using for their structure solution. The new version was released to coincide with a new publication on the software, part of the CCP4 2017 Study Weekend proceedings in Acta Cryst. D. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact Makes the process of doing molecular replacement in macromolecular structure solution easier for users, consequently enabling new insights into macromolecules. 
URL http://www.ccp4.ac.uk
 
Title SIMBAD - Sequence independent molecular replacement based on available database 
Description Software for solving the phase problem in macromolecular X-ray crystallography. Designed to be independent of sequence and to use the PDB database directly to search for potential matches to a target crystal. Released as part of the CCP4 suite in late 2017, including a CCP4i2 interface. Will also be available through CCP4 cloud facilities. A publication is due in 2018. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact Allows for solving cases of contaminants or crystal structures with no obvious sequence-based homologue available. Several contaminant structures have been solved using the software, which has helped to prevent misdirected research effort when trying to deal with such cases. 
URL http://simbad.readthedocs.io/en/latest/