Ab initio protein modelling for automated X-ray crystal structure solution

Lead Research Organisation: Science and Technology Facilities Council
Department Name: Computational Science & Engineering

Abstract

Proteins make up the functional machinery of all living beings. Their particular roles depend on their 3-dimensional structures, which allow given proteins to interact specifically with other molecules in their environment. Some proteins - enzymes - go further and can transform certain compounds into others. To better understand how proteins work, and to be able to use them in industry and medicine, scientists are greatly interested in determining their 3-dimensional structures. There are various ways to do this, but the dominant technique is X-ray crystallography. In this technique, an intense beam of X-rays is fired at a protein crystal. The X-rays are diffracted when passing through the crystal, producing a pattern of rays that is characteristic of the protein under study. In order to elucidate the structure of the protein, information derived from multiple diffraction patterns obtained from the same protein but under different conditions must be drawn together. The acquisition of such extra diffraction patterns can be time-consuming and expensive, and commonly involves hazardous chemicals. A technique exists, however, in which computers replace the additional experiments by estimating equivalent information from available structures of proteins similar to the one under study. In this way, protein structures can be solved from a single diffraction pattern. This technique - called Molecular Replacement (MR) - is fast, economical, clean and often uncomplicated. However, since MR relies on pre-existing structures, it cannot be applied to many proteins of interest, for which similar structures are simply not available. For many years, scientists have tried to develop computer methods to predict the structure of proteins purely from their sequences. These methods are generally called ab initio modelling methods. Over the past decade, these efforts have started to bear fruit. 
These predicted models are unlikely to substitute for crystal structures any time soon, since they typically contain errors, but recent work has shown that they are sometimes close enough to the real structure to be used in the MR process. This is the main idea behind this proposal: to adapt current ab initio modelling procedures to the specific needs of MR. With ab initio modelling, it is generally the case that the more detailed (i.e. the longer) the computer calculation, the better the model. Unfortunately, achieving the best models is so demanding that it often requires extensive calculation times or access to supercomputers or other large computing resources. Few crystallographers have access to these facilities, making the modelling method impractical. We therefore propose a different approach, making efficient use of simpler models that can easily be obtained on typical computers. In our preliminary work, we have already shown that this approach can work for MR. What we want to do now is find the best way to produce optimal models, and to do this automatically. This effectively means adapting the method to meet the demands of modern X-ray crystallography: making it fast enough to be used routinely, and accessible to crystallographers without specialist knowledge of ab initio modelling. We then want to include the method in the MrBUMP program, a well-established package for easy, automated MR. MrBUMP can be added to CCP4i, a software package that is widely used by crystallographers. By incorporating our processing method into a familiar program, we expect it to become widely used across the world. We expect that by extending the MR computational approach we will enable protein structures to be determined more quickly and cheaply. In this way, research in all sorts of areas that depend on protein structure information, such as drug design, will proceed faster.

Technical Summary

The overall objective of this work is to give crystallographers access to a novel source of effective search models for Molecular Replacement (MR), namely ab initio protein modelling. Search models will be obtained by automated clustering and processing methods that we will develop. To allow maximum access by crystallographers, we will use polyalanine models that can be cheaply obtained on typical PCs. To this end, we will draw on our experience of successful MR with hand-picked ab initio models (Acta Cryst, 2008, D64:1288-1291) and develop novel, effective and automatic means of clustering and processing models. We will exploit the synergy between ab initio modelling, which produces clusters of structurally similar models that often correctly capture different elements of the true structure, and the demonstrably effective structure-ensemble approach to MR. Further, we will explore means for the automatic deletion of inaccurately modelled termini and benchmark alternative treatments of side chains. After identifying the most successful modes of model superposition and processing, we will incorporate a pipeline for search model production into the program MrBUMP. MrBUMP, an automated program in the popular CCP4 package, creates search models and feeds them to the successful MR programs Phaser and Molrep; it is hence the ideal vehicle for dissemination of our new protocol. Importantly, the changes to MrBUMP will also enable simple, automated local running of the ab initio modelling itself. We will focus on ROSETTA as the only ab initio program presently widely available, but we will monitor other developing packages and compare their performance as and when they reach maturity. Easy access to ab initio-derived search models will provide the crystallographer with a new tool applicable to both entire proteins and individual domains of up to 100-120 residues, the most common domain size.
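The clustering of structurally similar ab initio models described above can be illustrated with a minimal sketch. It assumes a pairwise RMSD matrix between models has already been computed by a separate superposition tool, and it uses a simple greedy neighbour-count scheme; the function names and the cutoff value are hypothetical, and this is not necessarily the clustering method used in this project.

```python
from typing import List, Sequence, Set


def _neighbours(i: int, rmsd: Sequence[Sequence[float]], pool: Set[int], cutoff: float) -> List[int]:
    """Models in `pool` within `cutoff` RMSD of model i (including i itself)."""
    return sorted(j for j in pool if rmsd[i][j] <= cutoff)


def greedy_cluster(rmsd: Sequence[Sequence[float]], cutoff: float) -> List[List[int]]:
    """Partition models into clusters from a precomputed pairwise RMSD matrix.

    Assumes rmsd is symmetric with a zero diagonal. At each step the
    unassigned model with the most unassigned neighbours becomes a cluster
    centre; it and its neighbours form one cluster (centre listed first).
    """
    pool = set(range(len(rmsd)))
    clusters: List[List[int]] = []
    while pool:
        # max() returns the first maximum, so ties go to the lowest index.
        centre = max(sorted(pool), key=lambda i: len(_neighbours(i, rmsd, pool, cutoff)))
        members = _neighbours(centre, rmsd, pool, cutoff)
        clusters.append([centre] + [j for j in members if j != centre])
        # Remove the centre explicitly too, in case the diagonal is not zero.
        pool -= set(members) | {centre}
    return clusters


if __name__ == "__main__":
    # Hypothetical RMSDs (Angstrom) between four models: 0-2 similar, 3 an outlier.
    rmsd = [
        [0.0, 1.0, 1.5, 8.0],
        [1.0, 0.0, 1.2, 7.5],
        [1.5, 1.2, 0.0, 7.0],
        [8.0, 7.5, 7.0, 0.0],
    ]
    print(greedy_cluster(rmsd, cutoff=2.0))
```

Because each later cluster is drawn from a smaller pool, the first cluster returned is the most populated one, which in such a scheme is the natural candidate for building an ensemble search model.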

Planned Impact

This proposal addresses the key phase problem that lies at the heart of protein crystallography. Exploitation of ab initio protein models in the computational solution of phases by MR will constitute another tool in the crystallographer's armoury. In applicable cases, this will make it possible to avoid the experimental approaches to structure solution, which are expensive in both crystals and reagents. The ab initio/MR approach will eventually form part of fully automated, high-throughput structure elucidation pipelines such as those of Structural Genomics projects. The methodology will also be capable of accelerating structure refinement in cases where part of the structure can readily be placed, leaving a 'missing domain'. The direct beneficiaries of the research will be crystallographers in both the academic and commercial sectors (e.g. the pharmaceutical industry) but, significantly, the fundamental importance of structural information in all areas of biology leads to two further layers of beneficiaries. Beyond the crystallographers lie their collaborators, who will obtain their structures of interest more rapidly and cheaply. Broader still, given the importance of protein structures to drug and vaccine design, pesticide development, biotechnological enzymes, etc., the general public may genuinely be said to be the ultimate beneficiary. Society in general will benefit from better medicines and bioproducts. For the above benefits to accrue, crystallographers must first be aware of the methodology and, secondly, find it accessible and easy to use. To address the first aspect we will publicise our work at national and international conferences. Of particular relevance, since we propose to incorporate our method into the MrBUMP program, is the CCP4 user community. As part of the CCP4 collaboration, Winn and Keegan run exhibition stands at major crystallographic conferences and act as tutors at crystallographic workshops. 
Such activities will be used to disseminate our findings as part of the general training of crystallographers in advanced methods. The broader scientific community will learn of progress through publications; for maximum publicity, we will choose open access journals. Engagement of the broadest group of beneficiaries, the general public, will take advantage of successful programmes at Liverpool, which illustrate the importance that the School of Biological Sciences places on increasing public awareness of the impact of biological research on British society. The School actively participates in the BBSRC 'Excellence with Impact' programme. Recent School activities include co-founding the University's Centre for Poetry and Science and participating in public lectures and workshops at Liverpool's 2008 British Association Festival of Science. The School also encourages young biologists through visits to local colleges and School Open Days. Nuffield Bursaries allow for summer studentship placements that further stimulate scientific thinking among the young. We will take full advantage of the University's highly experienced Corporate Communications Team to make science accessible to the general public via local media coverage and other forms of engagement. We will also work with the University's Widening Participation Group, which in turn cooperates with the Educational Opportunities Group to provide an Education Liaison Service. This service arranges a wide spectrum of educational events, e.g. 4-day residential summer schools, 1-day taster days, talks for schools and colleges, a Christmas lecture programme for sixth-form students from local schools, and a newsletter distributed to 130 Heads of Biology in schools and colleges. STFC also has an active public engagement programme; indeed, this is a major part of the Council's remit, as laid out in the Royal Charter. 
Previous work by Winn and Keegan has been included in the STFC Annual Report 2007-2008, in the annual CSED Frontiers magazine, and in many issues of the CCP4 newsletter.

Publications

Bibby J (2013) Application of the AMPLE cluster-and-truncate approach to NMR structures for molecular replacement. in Acta crystallographica. Section D, Biological crystallography

Hotta K (2014) Conversion of a disulfide bond into a thioacetal group during echinomycin biosynthesis. in Angewandte Chemie (International ed. in English)

Keegan RM (2015) Exploring the speed and performance of molecular replacement with AMPLE using QUARK ab initio protein models. in Acta crystallographica. Section D, Biological crystallography

Thomas JMH (2017) Approaches to ab initio molecular replacement of α-helical transmembrane proteins. in Acta crystallographica. Section D, Structural biology

Zhang J (2017) Crystal structure of the type IV secretion system component CagX from Helicobacter pylori. in Acta crystallographica. Section F, Structural biology communications

 
Description We have developed an algorithm for determining the atomic structures of macromolecules such as proteins. Knowledge of these structures underpins several sectors, such as the pharmaceutical, biotechnology and agrochemical industries. The algorithm was encoded in a computer program, AMPLE, which has been released to the research community.
Exploitation Route The software can be used by research groups for their specific scientific projects. The algorithms have been developed further as part of CCP4, and improved software is part of the CCP4 software suite.
Sectors Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software), Healthcare, Manufacturing, including Industrial Biotechnology, Pharmaceuticals and Medical Biotechnology

URL https://ample.readthedocs.io/en/latest/index.html
 
Description The protocol that we developed has been encoded in the program AMPLE. This has been included in the CCP4 suite for macromolecular crystallography, and has been used by a number of research groups to solve novel protein structures. AMPLE has been further developed since this award ended (funded partly by the CCP4 grant 2014-2019) to cover several new modalities (see https://amplemr.wordpress.com/). By enabling structure solution in difficult cases, AMPLE extends the reach of structural biology, and thus contributes to downstream activities such as rational drug design.
First Year Of Impact 2014
Sector Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software), Healthcare, Manufacturing, including Industrial Biotechnology, Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Description CCP4 Advanced integrated approaches to macromolecular structure determination
Amount £275,180 (GBP)
Funding ID BB/S007040/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 04/2019 
End 03/2024
 
Description CCP4 Grant Renewal 2014-2019: Question-driven crystallographic data collection and advanced structure solution
Amount £11,705 (GBP)
Funding ID BB/L008777/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 03/2014 
End 03/2019
 
Title Cluster and truncate algorithm for structural search models 
Description We have developed a protocol for clustering structural models, truncating divergent regions, and re-clustering to produce ensemble models suitable for crystallographic structure solution via Molecular Replacement. The protocol was originally developed for decoys output from structure prediction software, but has also been applied to models from NMR structure solution of homologous proteins. 
Type Of Material Computer model/algorithm 
Year Produced 2012 
Provided To Others? Yes  
Impact The algorithm has been implemented in the software AMPLE, which has been used to solve two novel structures to date: Bruhn et al., J. Virol., 88:758-762 (2014); Hotta et al., Angew. Chem. Int. Ed., 53:824-828 (2014).
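The truncation step of the cluster-and-truncate protocol described in this record can be sketched as follows. The sketch assumes the models in a cluster have already been superposed and reduced to one Cα coordinate per residue; residues are ranked by the spread of their Cα positions across models, and the most divergent ones are removed first, giving a series of progressively trimmed search models. The function names and the 25% step are illustrative assumptions, not AMPLE's actual interface or settings.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def residue_variances(models: Sequence[Sequence[Coord]]) -> List[float]:
    """Per-residue spread of CA positions across pre-superposed models.

    models[m][r] is the CA coordinate of residue r in model m. The value for
    residue r is the mean squared distance of its CA atoms from their mean.
    """
    n_models = len(models)
    variances = []
    for r in range(len(models[0])):
        points = [model[r] for model in models]
        mean = [sum(p[k] for p in points) / n_models for k in range(3)]
        msd = sum(sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in points) / n_models
        variances.append(msd)
    return variances


def truncation_levels(variances: Sequence[float], step_percent: int = 25) -> Dict[int, List[int]]:
    """Residue indices kept at each truncation level, keyed by percent kept.

    Residues are ranked from least to most variable; at level p the
    lowest-variance p% (at least one residue) are kept, so the most
    divergent regions are removed first.
    """
    n = len(variances)
    ranked = sorted(range(n), key=lambda r: (variances[r], r))
    levels: Dict[int, List[int]] = {}
    for percent in range(100, 0, -step_percent):
        keep = max(1, n * percent // 100)
        levels[percent] = sorted(ranked[:keep])
    return levels


if __name__ == "__main__":
    # Two hypothetical superposed models of a 4-residue fragment;
    # the last residue diverges most between the models.
    models = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 2.0), (3.0, 4.0, 0.0)],
    ]
    print(truncation_levels(residue_variances(models)))
```

In a full pipeline, each truncation level would then be re-clustered and written out as an ensemble for the MR program; those steps are omitted here.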
 
Description AMPLE collaboration 
Organisation University of Liverpool
Department Institute of Integrative Biology
Country United Kingdom 
Sector Academic/University 
PI Contribution Development of AMPLE code, and inclusion in CCP4 suite. This work was started by the BBSRC grant 2010-2012, and has continued since, supported by CCP4 staff.
Collaborator Contribution Development of AMPLE code. This work was started by the BBSRC grant 2010-2012, and has continued since, supported by Daniel Rigden and PhD students at Liverpool.
Impact Collaboration develops the AMPLE software, which is now released as part of the CCP4 software suite. The collaboration combines structural bioinformatics expertise at Liverpool with computational structural biology expertise at CCP4/STFC.
Start Year 2008
 
Title AMPLE 
Description Software for solving macromolecular crystal structures, using Molecular Replacement and ab initio search models. A beta version of AMPLE was included in CCP4 6.3.0 
Type Of Technology Software 
Year Produced 2012 
Impact No actual impacts realised to date 
URL http://www.ccp4.ac.uk/ample/
 
Description CCP4 Study Weekends 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Type Of Presentation workshop facilitator
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The CCP4 Study Weekend is recognised as the leading conference for computational methods in macromolecular crystallography (as opposed to conferences focussed on scientific results). As such, it attracts the leading international developers and an audience of over 400. Each year it provides a snapshot of the state of the art.
The 'Lunchtime Bytes' session provides an opportunity for software developers to demonstrate their programs to the delegates at the meeting. Software from both CCP4 and CCP-EM is demonstrated each year.

The proceedings of each year's conference are published in a special issue of Acta Crystallographica D. Articles in these issues are usually highly cited, as they describe methods used by many crystallographers.
Year(s) Of Engagement Activity Pre-2006,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
URL http://www.ccp4.ac.uk/ccp4course.php
 
Description Talk at BioNMR meeting March 2012 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? Yes
Geographic Reach International
Primary Audience Participants in your research or patient groups
Results and Impact Talk given at the BioNMR meeting, Barcelona, March 2012, to an audience of European structural biologists. About a third of the talk was devoted to AMPLE, the software being developed under this grant. A PowerPoint presentation was produced and is available on request.

No actual impacts realised to date.
Year(s) Of Engagement Activity 2012
 
Description Tutoring at CEITEC Winter School on Structural Cell Biology 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact I gave talks on "De novo protein modelling and its use in interpreting experimental structural data" and "Volume data from cryoEM and crystallography: fitting and building atomic models and matching against other volume data". These sparked questions from the students and discussions with other tutors.

I invited my host at CEITEC to give a talk at the CCP-EM Spring Symposium later that year.
Year(s) Of Engagement Activity 2015