Enhancements to ProFit

Lead Research Organisation: University College London
Department Name: Structural Molecular Biology

Abstract

This project will make a number of enhancements to ProFit, software previously written by the applicant for comparing sets of protein or RNA coordinates. Most of these enhancements have been requested by current users and will enable additional applications and research. The optimal 'fitting' of two or more sets of protein or RNA coordinates is a fundamental requirement for examining structural similarity and variability. Fitting is used when predicting the three-dimensional structure of proteins from other known structures and when evaluating the performance of modelling methods. It is also used to study the variability and flexibility of protein and RNA structures, for example when a single molecule is flexible or when one wishes to examine the effects of mutations or different bound ligands on protein structure. Fitting is also used to examine the variability of related protein structures in order to study the evolution of protein structures. ProFit is a very widely used program for fitting two or more protein or RNA structures. A set of equivalent atoms in the structures must be specified, although ProFit can also optimize the equivalent atoms automatically. Once the structures have been fitted, a simple statistic known as the 'root mean square deviation' (RMSd) is calculated to indicate their similarity. The work proposed in this project will provide a number of enhancements to the program. Most importantly, several additions requested by users will be implemented, including the removal of restrictions when dealing with structures that have more than one polypeptide chain. This is becoming particularly important as crystallographers are now able to solve enormously complex structures. We will also perform some internal cleanup of the code to make it more maintainable, and will provide a Windows version and a web-based graphical interface to improve the usability of the software for biologists.
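For a set of N equivalent atom pairs, the RMSd is the square root of the mean squared distance between paired atoms. The minimal Python sketch below assumes the structures have already been fitted and that the equivalent atoms are supplied in matching order; the function `rmsd` is illustrative and is not part of ProFit.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMS deviation between two equally long lists of paired, already-fitted atoms."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared distances between each pair of equivalent atoms.
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(total / len(a))


# Two equivalent atoms, each displaced by 1.0 Angstrom along x:
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]))  # 1.0
```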

Technical Summary

Least squares fitting of two or more protein or RNA structures is a fundamental technique for examining structural similarity and variability. ProFit implements McLachlan least squares fitting, which overlaps the centres of geometry and then performs a modified Conjugate Gradients minimisation of the RMSd through rotational optimization. The code consists of ~16,000 lines of C (of which ~400 lines are the actual fitting code) written by the applicant, and the program has >4000 users. The program began life in the late 1980s as FIT and was redesigned in the early 1990s. Over the last 15 years, the program has evolved significantly, gaining new facilities. These include two methods that create atom equivalences for the user. First, it provides a Needleman and Wunsch sequence alignment, with fitting zones derived from equivalent residues in the sequence alignment. Second, it provides an iterative updating procedure in which dynamic programming is used iteratively to update an optimal set of equivalent C-alpha atoms. Another recently added facility is the fitting of multiple structures. This is an iterative procedure in which an averaged structure is generated in each iteration. The primary aim of the work proposed here is to implement a large number of enhancements requested by users. These include removing restrictions relating to multi-chain proteins (currently, sequence alignment and iterative updating of fitting zones can only be used with single-chain proteins). Currently the software is distributed as source code and as a Linux binary. We will provide an official Windows binary and will implement a web-based interface to the program using AJAX. These enhancements will enable the program to be used by biologists without the ability or desire to install ProFit under Linux or to compile it themselves for Windows. In addition, we will provide a Subversion repository and bug tracking using Trac to make the program fully open source and to encourage community involvement.
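The two fitting steps described above (overlapping the centres of geometry, then finding the rotation that minimises the RMSd) and the averaged-structure iteration used for multiple structures can be sketched as follows. This is a minimal illustration, not ProFit's code: it assumes NumPy is available and substitutes the SVD-based Kabsch algorithm for McLachlan's method, since both address the same least-squares rotation problem. The function names `kabsch_fit` and `fit_multiple`, and the parameters `max_iter` and `tol`, are hypothetical.

```python
import numpy as np


def kabsch_fit(mobile, reference):
    """Superpose `mobile` onto `reference` (both N x 3, equivalent atoms in order).

    Returns (fitted_coords, rmsd). Uses the SVD-based Kabsch algorithm as a
    stand-in for McLachlan's fitting method.
    """
    mobile = np.asarray(mobile, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Step 1: overlap the centres of geometry.
    mob_centre = mobile.mean(axis=0)
    ref_centre = reference.mean(axis=0)
    p = mobile - mob_centre
    q = reference - ref_centre
    # Step 2: optimal rotation from the SVD of the 3 x 3 covariance matrix.
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # -1 would mean a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    fitted = p @ rot.T + ref_centre
    rmsd = float(np.sqrt(((fitted - reference) ** 2).sum(axis=1).mean()))
    return fitted, rmsd


def fit_multiple(structures, max_iter=20, tol=1e-6):
    """Iteratively fit several equivalent-atom sets to their running average."""
    coords = [np.asarray(s, dtype=float) for s in structures]
    average = coords[0].copy()  # hypothetical choice: start from the first structure
    for _ in range(max_iter):
        coords = [kabsch_fit(c, average)[0] for c in coords]
        new_average = np.mean(coords, axis=0)
        # Stop when the averaged structure no longer moves appreciably.
        shift = np.sqrt(((new_average - average) ** 2).sum(axis=1).mean())
        average = new_average
        if shift < tol:
            break
    return coords, average
```

Starting the average from the first structure is an arbitrary choice made for this sketch; the determinant check ensures the SVD yields a proper rotation rather than a mirror image.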

Description ProFit is a computer program for fitting and comparing protein and nucleotide structures. The main objectives of the project were (1) To fulfill requests from users for enhancements, making the program more useful to the Biologist and enabling new science to be performed; (2) To provide web-accessible and Windows versions of the software; (3) To re-organize some of the code, improve documentation and implement version control.



All features requested by users have been implemented and new versions of the ProFit software have been released. The most significant enhancements have been in dealing with structures containing multiple chains and in fitting multiple structures. Usability was improved in various ways and the documentation was enhanced. The code has been cleaned up and revision control has been implemented. Web-based and Windows versions of the program are now available. Some problems with the fitting algorithm when applied to identical structures were investigated, and a work-around was developed.
Exploitation Route Structural biology is key to many drug discovery routes; consequently, ProFit has been downloaded and is used by numerous companies. A small sample of larger companies includes Abbott, Astex, AstraZeneca, Aventis, Biogen-Idec, Boehringer-Ingelheim, Chiron, Esbatech, Fujitsu, Helix Genomics, Merck, Morphotex, Novartis, Novonordisk, Pfizer, Roche, Sanofi-Synthelabo, Serono, Strand Genomics, Stromix and UCB. The software's main application is in structural biology. It is used regularly in our lab, for example in the comparison and analysis of antibody structures. In total, ProFit has been downloaded over 9000 times by over 6800 unique users. It averages approximately 60 downloads per month, but this rose to ~250 downloads per month when the new version was released.
Sectors Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology; Other

URL http://www.bioinf.org.uk/software/profit/
 
Title ProFit 
Description ProFit is a least squares fitting program for the comparison of (primarily) protein structures, although it can also be applied to nucleic acids and to small molecules stored in PDB files. It has many features, including flexible specification of fitting zones and atoms, calculation of RMS deviation over different zones or atoms, and RMS-by-residue calculation. It also supports iterative fitting to optimize the equivalent atoms, handling of multiple chains, and fitting based on sequence alignment. ProFit is designed to be easy to use, provides on-line help and comes with extensive documentation. The program is available at http://www.bioinf.org.uk/software/profit/
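Fitting based on sequence alignment derives fitting zones from residues that the alignment places in equivalent positions. The sketch below illustrates the idea with a plain Needleman and Wunsch global alignment followed by merging runs of consecutively aligned residues into zones. It is a hypothetical illustration, not ProFit's implementation: the scoring (identity match +1, mismatch -1, linear gap -2) is an assumption, and zones are written as `(start1, end1, start2, end2)` using sequential 1-based residue positions rather than PDB residue numbers.

```python
from typing import List, Tuple


def needleman_wunsch(s1: str, s2: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> List[Tuple[int, int]]:
    """Global alignment of two one-letter sequences (illustrative scoring).

    Returns the aligned (non-gap) residue pairs as 0-based (i, j) indices.
    """
    n, m = len(s1), len(s2)

    def sub(i: int, j: int) -> int:
        return match if s1[i] == s2[j] else mismatch

    # score[i][j] = best score for aligning s1[:i] with s2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i - 1, j - 1),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1  # residue of s1 aligned to a gap
        else:
            j -= 1  # residue of s2 aligned to a gap
    pairs.reverse()
    return pairs


def zones_from_alignment(pairs: List[Tuple[int, int]]) -> List[Tuple[int, int, int, int]]:
    """Merge consecutive aligned pairs into 1-based, inclusive fitting zones."""
    zones: List[List[int]] = []
    for i, j in pairs:
        if zones and zones[-1][1] == i and zones[-1][3] == j:
            zones[-1][1], zones[-1][3] = i + 1, j + 1  # extend the current zone
        else:
            zones.append([i + 1, i + 1, j + 1, j + 1])  # a gap starts a new zone
    return [tuple(z) for z in zones]


print(zones_from_alignment(needleman_wunsch("ACDEFGHIK", "ACDEGHIK")))
# [(1, 4, 1, 4), (6, 9, 5, 8)]
```

Aligned but mismatched residues stay inside a zone in this sketch; only gaps split zones.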
Type Of Technology Software 
Year Produced 2009 
Open Source License? Yes  
Impact As of November 2014, the software had been downloaded more than 7350 times. Over 75 commercial companies use the software, including Amgen, AstraZeneca, Boehringer Ingelheim, Bristol Myers Squibb, Fujitsu, GSK, Lilly, Merck, Nestle, Novartis, Roche, Sanofi-Aventis, Takeda, UCB Group, Abbott and Pfizer.
URL http://www.bioinf.org.uk/software/profit