Prediction and analysis of membrane protein structures and their interactions from genome data

Lead Research Organisation: University College London
Department Name: Computer Science

Abstract

About 30% of genes in the human genome code for proteins which are found in cell membranes. These proteins are responsible for maintaining many important processes in cells. Understanding the structure and function of these proteins and studying their properties and biochemical mechanisms are therefore among the most important goals in biological and pharmaceutical research. To understand membrane proteins, it is important to understand how their sequences and structures, and consequently their interactions, have adapted to the chemical and physical properties of biological membranes. With the advent of the structural genomics era, as membrane protein 3D structures are determined, bioinformatics studies are required to analyse and exploit this knowledge at the genome scale. Within this scenario, the main goals of our research project are to develop a pipeline (a closely linked set of computer programs) for building membrane protein 3D models at genome scale and to develop new methods for the analysis and prediction of membrane protein interactions within the biological membrane. We will test our methods on membrane proteins for which experimental data are available (e.g. where the number and orientation of transmembrane elements are known) and, where possible, on those whose atomic-level 3D structure is known. Currently there are a few available resources fully dedicated to membrane proteins, but these do not provide all the information we need (they either contain only structural data or are limited to only a few membrane protein classes). We will develop a structure-centric resource including all classes of known membrane protein structures and linking them to the corresponding available genome data. The next step will consist of analysing and classifying all of the transmembrane proteins for which 3D structures have been determined.
Unlike existing classification schemes, we will derive a novel method specific to membrane protein classification that will integrate and exploit the existing schemes while adding a set of specific descriptions which will discriminate diverse membrane protein structures in biologically meaningful ways. We also plan to improve current transmembrane topology prediction by developing a new suite of tools that predicts transmembrane topology with increased accuracy by combining multiple topological features, such as amino acid topogenic propensities, topogenic motifs, sub-cellular location prediction of domains and prediction of signal peptides. We also plan to build a pipeline for building 3D models of membrane proteins across whole genomes. We will develop two different methods to assign membrane proteins to structural families (and build the corresponding 3D models). For sequences with clear similarity to a structural family, sequence profiles will be used for the assignment, while a 'fold recognition' method will be derived for sequences with weak sequence similarity. The interactions of membrane proteins with other molecules, i.e. peptides, proteins, lipids and small molecules, will be studied to identify interactions important for the functioning of specific families. We will analyse binding sites in transmembrane proteins using methods similar to those previously developed in the Thornton group to characterize binding sites in water-soluble proteins. These methods will need to be adapted, since we believe the membrane constraint strongly influences the way cognate partners interact, making the interaction patterns distinctive. We will therefore explore the interactions between membrane proteins and lipids and their importance for different biological functions and compartmentalisations, exploiting the protein structural knowledge developed in our previous structural studies and all currently available resources.

Technical Summary

The main goals of our research project are to develop a pipeline for building membrane protein 3D models at genome scale and new methods for the analysis and prediction of membrane protein interactions. We will develop a structure-centric resource of membrane proteins with experimentally determined 2D and 3D information by exploiting and integrating available data. The resource will be web-based and incorporated into PDBsum. Collected structures will also be used to obtain a benchmark set of non-redundant known 3D structures representing all classes of membrane proteins. We will then move to the analysis and classification of the available 3D structures and derive a novel method, specific to membrane proteins, that will exploit and integrate White's classification scheme with a set of specific structure-related parameters discriminating diverse membrane protein structures. The 3D-modelling pipeline will include a comparative modelling technique for sequences with clear sequence similarity to a protein of known structure (detectable by HMM profiles) and a fold recognition method for low-similarity sequences. We will also develop a new suite of tools (MEMSAT4) for predicting transmembrane topology with increased accuracy by combining multiple topological features. Moreover, we will perform a knowledge-based study on membrane protein-protein, membrane protein-small molecule and membrane protein-lipid interactions, exploiting the wide expertise of the Thornton group in analysing globular protein interactions, the structural knowledge derived from our previous studies and all currently available data.
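
The two-route design of the modelling pipeline described above can be illustrated with a minimal Python sketch. It only shows the routing decision, not the modelling itself. The names `ProfileHit` and `choose_modelling_route`, and the E-value cut-off of 1e-3, are hypothetical and chosen for illustration; the summary does not state which threshold or software would be used.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-off: the summary gives no numerical threshold for
# "clear sequence similarity", so this value is an assumption.
EVALUE_CUTOFF = 1e-3


@dataclass
class ProfileHit:
    """Best match of a query sequence against profiles of known structures."""
    family: str     # name of the matched structural family (illustrative)
    evalue: float   # expectation value reported by the profile search


def choose_modelling_route(best_hit: Optional[ProfileHit],
                           cutoff: float = EVALUE_CUTOFF) -> str:
    """Pick a modelling route for one query sequence.

    A significant profile hit sends the sequence to comparative modelling;
    a weak hit or no hit at all sends it to fold recognition.
    """
    if best_hit is not None and best_hit.evalue <= cutoff:
        return "comparative_modelling"
    return "fold_recognition"
```

For example, `choose_modelling_route(ProfileHit("example_family", 1e-20))` selects comparative modelling, while `choose_modelling_route(None)` falls back to fold recognition.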
 
Description The three main software products developed as a result of this grant were the MEMSAT-SVM topology prediction package, the MEMPACK server and the BioSerf-TM modelling pipeline. MEMSAT-SVM is currently the most accurate predictor of transmembrane topology when evaluated over known transmembrane 3-D structures. The latest version of MEMSAT has also been incorporated into our FFPRED server, which allows prediction of transmembrane protein function based on predicted topology and other sequence features. The MEMPACK server implements novel functionality: it predicts not only the topology of the target TM protein but also the likely packing arrangement of its helices. These results are displayed in a user-friendly graphical form. Finally, a specific pipeline for modelling TM proteins has been implemented using our existing BioSerf framework. This provides full 3-D homology models of transmembrane proteins.
Exploitation Route The software developed in this project is freely available to other academic researchers, and is also available via web servers that we maintain.

Also, a high-quality benchmark data set on transmembrane topology has been compiled based on available 3-D structures of transmembrane proteins. This data is available from our file server (details given in the MEMSAT-SVM and MEMPACK papers). A further result is the collaborative work with the Thornton and Orengo groups to produce a hand-curated classification of transmembrane superfamilies in the CATH protein structure classification database.
Sectors Healthcare,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology

URL http://bioinf.cs.ucl.ac.uk/psipred
 
Description A major impact from this project has been the further training of skilled research staff, along with transferable skills to equip them for the commercial workplace. Sean Ward, an RA on the project, received a UCL Entrepreneur Fellowship to construct a business plan for a new UCL spinout called Synthace (http://www.synthace.com/). Synthace is the UK's first synthetic biology company; it was recently awarded a £500k TSB Synthetic Biology Grant and has also completed a £1.3m seed funding round. Synthace has been used as an exemplar of British expertise in the new area of synthetic biology and was visited in 2013 by a government delegation, headed by David Willetts MP, to inform future policy.
First Year Of Impact 2012
Sector Agriculture, Food and Drink,Healthcare,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology
Impact Types Economic,Policy & public services

 
Title MEMSAT 
Description Alpha-helical transmembrane (TM) proteins are involved in a wide range of important biological processes such as cell signaling, transport of membrane-impermeable molecules, cell-cell communication, cell recognition and cell adhesion. Many are also prime drug targets, and it has been estimated that more than half of all drugs currently on the market target membrane proteins. However, due to the experimental difficulties involved in obtaining high quality crystals, this class of protein is severely under-represented in structural databases. In the absence of structural data, sequence-based prediction methods allow TM protein topology to be investigated. Results: MEMSAT is a TM protein topology predictor that integrates both signal peptide and re-entrant helix prediction, benchmarked with full cross-validation on a novel data set of 131 sequences with known crystal structures. The method achieves topology prediction accuracy of 89%, while signal peptides and re-entrant helices are predicted with 93% and 44% accuracy respectively. An additional SVM trained to discriminate between globular and TM proteins detected zero false positives, with a low false negative rate of 0.4%. We present the results of applying these tools to a number of complete genomes.
Type Of Technology Software 
Year Produced 2009 
Open Source License? Yes  
Impact Has been licensed to 3 companies so far. 
URL http://bioinf.cs.ucl.ac.uk/web_servers
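
As a simple illustration of what a sequence-based transmembrane prediction looks like, the sketch below uses the classic Kyte-Doolittle hydropathy sliding window. This is not the MEMSAT method, which is SVM-based and also predicts signal peptides and re-entrant helices; it is a much older and cruder technique substituted here purely for illustration. The function names are hypothetical, and the window of 19 residues and threshold of 1.6 are the conventional Kyte-Doolittle settings rather than values taken from this record.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence.

    Raises KeyError if the sequence contains a non-standard residue.
    """
    scores = [KD[aa] for aa in seq.upper()]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        # Slide the window one residue to the right.
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Return merged (start, end) spans, 0-based and end-exclusive,
    covered by windows whose mean hydropathy reaches the threshold."""
    segments = []
    for i, mean in enumerate(hydropathy_profile(seq, window)):
        if mean >= threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                # Overlaps or touches the previous span: extend it.
                segments[-1] = (segments[-1][0], max(segments[-1][1], end))
            else:
                segments.append((start, end))
    return segments
```

Calling `candidate_tm_segments(sequence)` on a protein sequence returns a list of candidate hydrophobic stretches. It says nothing about helix orientation, signal peptides or re-entrant helices, which is the gap that dedicated topology predictors address.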