Validation of NMR protein structures using FIRST and RCI

Lead Research Organisation: University of Sheffield
Department Name: Molecular Biology and Biotechnology


Protein structures are essential for understanding protein function, and for drug design. In order to make use of structures, it is vital for users to know how good the structures are. The structures are generated mainly from X-ray crystallography and NMR. For crystal structures, there are reliable ways of knowing how good the structure is. These are based on the fact that a structure can be used to calculate exactly what the input data should look like: a comparison with the actual diffraction data therefore gives a reliable quantitative measure of quality. For NMR, there is no such measure, meaning that so far it is very difficult to know how good an NMR structure is. This is a problem not only for users of structural information, but also for the scientists who calculate the structures, since they also have no way to judge how good their structures are.
In this proposal we describe a method for calculating the quality of NMR structures (i.e. validation), based on comparing two measures of local rigidity, one derived from the structures and one from the original input data. The first measure is calculated using an established method for identifying rigid clusters based on graph theory, called FIRST, and developed by our collaborator Dr Sljoka. The second measure uses the Random Coil Index (RCI), which is a program based on the simple idea that the NMR frequencies ('chemical shifts') of protein backbone atoms have very characteristic 'random coil' shifts when the protein is locally disordered, and therefore that the experimentally measured shifts in a protein can be used to quantify to what extent a given amino acid residue is disordered. A comparison of these two measures of local rigidity therefore provides a residue-by-residue test of how well the rigidity of the structures compares to the experimentally determined 'true' rigidity. Although this is not a direct comparison between structure and input, it is likely to be as close as one can get for NMR structures, and is a major improvement in the NMR structure determination process. The proposal describes how we will go about implementing the comparison and checking that it works as expected, and then how we will make it available to the community and use it to examine NMR structures, for example by reporting on the quality of all existing protein NMR structures (objective 1).

Having developed the validation tool, we then propose to apply it to some useful ends. The first of these (objective 2) is to identify sets of 'good' and 'bad' NMR structures. So far there has been no good way to know how good structures are: by identifying such structures we expect to generate an important resource for the structural biology community by marking out quality criteria and therefore stimulating further research into structure quality.

Whereas crystal structures are typically represented by a single set of coordinates at the average position (together with 'B factors' that represent the uncertainty in each coordinate), NMR structures are always represented as an ensemble of (typically 20) structures. There is a valid reason for this: NMR structures are inherently less well defined than crystal structures. Nevertheless, it is confusing and unnecessary. We aim to apply our method to define more closely how many structures in an ensemble are really necessary, and whether some are simply wrong. In order to assist the process, we will improve current methods for calculating chemical shifts from structures, by modifying them to work on ensembles.

Finally, we shall use our methods to look at an important class of protein structures called Intrinsically Disordered Proteins, to test whether current methods provide a correct representation of the true conformational ensemble. These represent roughly one third of human proteins (including many responsible for signalling), so are an important topic.

Technical Summary

Currently there is no good way to validate NMR structures, because there is no direct connection between the input data (NMR spectra) and the structures, as there is for crystal structures. We present preliminary results for a method that comes as close to this as possible, namely a comparison between chemical shifts (the Random Coil Index, RCI) and the program FIRST, which calculates the local rigidity of a structure or an ensemble of structures using mathematically rigorous methods. Both calculate the local rigidity of a protein. Crucially, the shifts are not used as part of the structure calculation, and represent only a small abstraction from the original NMR spectra. The comparison therefore comes as close as possible to a crystallographic R-free. We will test and refine the method, with the aim of rapidly and automatically generating a per-residue quality index for every NMR structure in the PDB, thereby for the first time allowing PDB users to know how good (accurate) any NMR structure is. Equally importantly, the method can be used by NMR groups to measure the accuracy of their structures at any stage in the structure calculation, and should therefore be a useful tool to improve structure calculations by identifying problems during the calculation.
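As a rough illustration of the comparison described above, the following Python sketch correlates two per-residue flexibility profiles and reports a per-residue discrepancy. It is a minimal sketch under stated assumptions, not the proposal's implementation: both profiles are assumed to have already been reduced to one number per residue (how FIRST output is converted to such a number is not specified here), and the function names, the min-max rescaling and the use of a Pearson correlation are all illustrative choices.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def rescale(values):
    """Min-max rescale to [0, 1] so the two profiles share a common scale."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def per_residue_discrepancy(structure_flex, shift_flex):
    """Hypothetical per-residue index: rescaled structure-derived flexibility
    minus rescaled shift-derived flexibility. Positive values mean the
    structure looks more flexible than the shifts suggest; negative values
    mean it looks more rigid."""
    if len(structure_flex) != len(shift_flex):
        raise ValueError("profiles must cover the same residues")
    s = rescale(structure_flex)
    e = rescale(shift_flex)
    return [a - b for a, b in zip(s, e)]
```

In this sketch, a high overall correlation would indicate that the structure reproduces the pattern of rigidity seen in the shifts, while residues with a large absolute discrepancy would be candidates for closer inspection. Any thresholds would need to be calibrated against real data.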

All NMR structures in the PDB are deposited as ensembles. The relevance of the individual members of an ensemble is far from clear, and the selection process is opaque. Our method will throw light on this, and hopefully stimulate a change in behaviour, or at least debate. It will show how well individual members match, and thus identify outliers. We propose to update our (1993) program for calculating protein chemical shifts to operate on ensembles rather than single (crystal) structures. This will allow us to identify outliers using a second independent method, and thus work towards re-defining NMR ensembles. Finally, we shall better characterise residual structure in Intrinsically Disordered Proteins.
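One simple way to picture the identification of outlying ensemble members is sketched below in Python. It assumes that each model in the ensemble has already been given a single agreement score against the experimental data (for example, a correlation between its rigidity profile and the shift-derived profile); the score, the function name and the median-absolute-deviation cutoff are hypothetical choices for illustration and are not taken from the proposal.

```python
from statistics import median


def flag_outlier_models(scores, k=3.0):
    """Return indices of models whose agreement score falls more than
    k median absolute deviations (MADs) below the ensemble median.

    scores: one agreement score per ensemble member (higher is better).
    k: cutoff in MAD units; an assumed value that would need calibration.
    """
    if not scores:
        return []
    med = median(scores)
    mad = median([abs(s - med) for s in scores])
    # Only low scores are flagged: a model that agrees unusually well
    # with the data is not treated as a problem.
    return [i for i, s in enumerate(scores) if med - s > k * mad]
```

For example, with scores of 0.80, 0.82, 0.79, 0.81 and 0.30 for a five-member ensemble, only the last model is flagged. A median-based cutoff is used here because a single poor model would distort a mean and standard deviation.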

Planned Impact

This work will only have Impact if it is taken up and used by the scientific community, in particular structural biologists. Hence a key aim of the proposal is to make sure that the programs are adopted and used widely once the methodology has been tested and checked. We have therefore built this aim firmly into the proposal:

1. The work will be disseminated as widely as possible, for example by publication in international scientific journals, and presentations at relevant conferences. MPW is on the organising committee for the next ICMRBS meeting in Dublin in August 2018, and if the project is suitably advanced by that point, he will propose a workshop satellite meeting at ICMRBS to cover validation.

2. Part of Objective 1 is to carry out calculations of the quality of all NMR structures in the PDB, publish them on a website and make them available in an easily accessible archive. This will make it easy for anyone to check the accuracy of an NMR structure, and should increase the usage and therefore the impact.

3. The PDB has a task force on protein validation, which has published its preliminary findings (reference 8 in the proposal). It proposed three phases for developing validation, of which the third recognises the need to develop new tools, specifically based around chemical shifts. We propose to put a lot of effort into integrating our methodology with the validation software made available on the PDB website, with the aim of getting PDB to include our method as one of the standard measures for validating NMR structures. In the UK, the two key people who would be involved in this dialogue are Aleksandras Gutmanas, who works at PDBe in Hinxton, Cambridgeshire, and Geerten Vuister, Professor in the Department of Molecular and Cell Biology at the University of Leicester. Gutmanas has a specific responsibility for NMR structures and NMR validation in PDB, while Vuister has worked in NMR validation for many years, is a member of the PDB validation task force (as is Gutmanas' boss at PDBe, Gerard Kleywegt) and is also chair of CCPN. CCPN is the Collaborative Computational Project for NMR, funded by the BBSRC from 2000 to 2012 and by MRC from 2013. It develops programs for NMR analysis, and aims to 'determine and spread best practice in NMR'. We have initiated discussions with both of them. MPW is also in occasional touch with Guy Montelione, chair of the PDB NMR validation task force, and with John Markley, director of the BioMagResBank, which is the repository for all NMR protein chemical data. He has also held detailed discussions with Naohiro Kobayashi, who is the NMR expert in PDBj (the Japanese wing of PDB) and who has a major interest in validation: MPW shared an office with him for a year while on sabbatical in Osaka a few years ago. We therefore feel that we are well placed to get our software incorporated into the PDB NMR validation suite. We also hope to get it linked into the CcpNMR Analysis website.

4. Training. One output from this research is that the PDRA involved (who we expect to be a computer scientist) will be trained in protein structure and NMR. Such cross-disciplinary training is increasingly important.

5. Engagement with the public. We shall use virtual reality displays to explain and delight, focussing on 'wrong' structures. We shall also explore using an app for mobile phones that enables users to see objects in 3D on their mobiles using cheap and readily available cardboard glasses, which is an engaging introduction to protein structure in general, and 'mistakes' in particular. The concept that scientists sometimes make mistakes is worth conveying clearly, in a way that is entertaining without being sensationalist.

