Prediction and analysis of membrane protein structures and their interactions from genome data

Lead Research Organisation: European Bioinformatics Institute
Department Name: Thornton Group

Abstract

About 30% of genes in the human genome code for proteins that are found in cell membranes. These proteins are responsible for maintaining many important processes in cells. Understanding their structure and function, and studying their properties and biochemical mechanisms, are therefore among the most important goals in biological and pharmaceutical research. To understand membrane proteins, it is important to understand how their sequences and structures, and consequently their interactions, have adapted to the chemical and physical properties of biological membranes. With the advent of the structural genomics era, as membrane protein 3D structures are determined, bioinformatics studies are required to analyse and exploit this knowledge at the genome scale. Within this scenario, the main goals of our research project are to develop a pipeline (a closely linked set of computer programs) for building membrane protein 3D models at genome scale and to develop new methods for the analysis and prediction of membrane protein interactions within the biological membrane. We will test our methods on membrane proteins for which experimental data are available (e.g. where the number and orientation of transmembrane elements are known) and, where possible, for which the atomic-level 3D structure is known. Currently there are a few available resources fully dedicated to membrane proteins, but these do not provide all the information we need (they either contain only structural data or are limited to only a few membrane protein classes). We will develop a structure-centric resource including all classes of known membrane protein structures and linking them to the corresponding available genome data. The next step will consist of analysing and classifying all of the transmembrane proteins for which 3D structures have been determined.
In contrast to existing classification schemes, we will derive a novel method specific to membrane protein classification, which will integrate and exploit the existing schemes but add a set of specific descriptions that discriminate diverse membrane protein structures in biologically meaningful ways. We also plan to improve current transmembrane topology prediction by developing a new suite of tools that predicts transmembrane topology with increased accuracy by combining multiple topological features, such as amino acid topogenic propensities, topogenic motifs, sub-cellular location prediction of domains and prediction of signal peptides. We also plan to build a pipeline for building 3D models of membrane proteins across whole genomes. We will develop two different methods to assign membrane proteins to structural families (and build the corresponding 3D models). For sequences with clear similarity to a structural family, sequence profiles will be used for the assignment, while a 'fold recognition' method will be derived for sequences with weak sequence similarity. The interactions of membrane proteins with other molecules, i.e. peptides, proteins, lipids and small molecules, will be studied to identify interactions important for the functioning of specific families. We will analyse binding sites in transmembrane proteins using methods similar to those previously developed in the Thornton group to characterise binding sites in water-soluble proteins. These methods will need to be adapted, since we believe the membrane constraint strongly influences the way cognate partners interact, making the interaction patterns distinctive. We will also explore the interactions between membrane proteins and lipids and their importance for different biological functions and compartmentalisations, exploiting the protein structural knowledge developed in our previous structural studies and all currently available resources.
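
As an illustration of the simplest kind of sequence-derived feature that a topology predictor can draw on, the sketch below flags candidate transmembrane segments using a classic Kyte-Doolittle hydropathy sliding window. This is a textbook baseline and not the method proposed in this project; the function names, the 19-residue window and the 1.6 threshold are conventional illustrative choices and are assumptions here.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=19):
    """Mean hydropathy of every full-length window along the sequence.

    Non-standard residue codes (e.g. 'X') raise KeyError in this sketch.
    """
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Return (start, end) pairs (0-based, end exclusive) covered by windows
    whose mean hydropathy exceeds the threshold; overlapping or adjacent
    windows are merged into a single segment."""
    segments = []
    for i, mean in enumerate(window_hydropathy(seq, window)):
        if mean <= threshold:
            continue
        start, end = i, i + window
        if segments and start <= segments[-1][1]:
            # Extend the previous segment instead of starting a new one.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments
```

A hydropathy scan of this kind only proposes hydrophobic stretches. It says nothing about orientation, signal peptides or sub-cellular location, which is why the abstract talks about combining several topological features.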

Technical Summary

The main goals of our research project are to develop a pipeline for building membrane protein 3D models at genome scale and new methods for the analysis and prediction of membrane protein interactions. We will develop a structure-centric resource of membrane proteins with experimentally determined 2D and 3D information by exploiting and integrating available data. The resource will be web-based and incorporated into PDBsum. Collected structures will also be used to obtain a benchmark set of non-redundant known 3D structures representing all classes of membrane proteins. We will then move to the analysis and classification of the available 3D structures and derive a novel, membrane-protein-specific method that will exploit and integrate White's classification scheme with a set of specific structure-related parameters discriminating diverse membrane protein structures. The 3D-modelling pipeline will include a comparative modelling technique for sequences with clear sequence similarity to a protein of known structure (detectable by HMM profiles) and a fold recognition method for low-similarity sequences. We will also develop a new suite of tools (MEMSAT4) for predicting transmembrane topology with increased accuracy by combining multiple topological features. Moreover, we will perform a knowledge-based study of membrane protein-protein, membrane protein-small molecule and membrane protein-lipid interactions, exploiting the wide expertise of the Thornton group in analysing globular protein interactions, the structural knowledge derived from our previous studies and all currently available data.
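
The two-route design of the modelling pipeline can be sketched as a simple dispatch on the strength of the best profile match. This is a minimal sketch under stated assumptions: the `ProfileHit` record, the `choose_modelling_route` function and the E-value cut-off of 1e-3 are hypothetical and are not taken from the project itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProfileHit:
    """Best HMM-profile match for a query sequence (hypothetical record)."""
    family: str     # identifier of the matched structural family
    e_value: float  # E-value reported for the match


def choose_modelling_route(hit: Optional[ProfileHit],
                           max_e_value: float = 1e-3) -> str:
    """Pick the modelling route for one query sequence.

    A confident profile match (E-value at or below the assumed cut-off)
    goes to comparative modelling; no match, or a weak one, falls back
    to fold recognition.
    """
    if hit is not None and hit.e_value <= max_e_value:
        return "comparative_modelling"
    return "fold_recognition"
```

For example, `choose_modelling_route(ProfileHit("family_A", 1e-20))` selects comparative modelling, while a query with no profile hit (`None`) is routed to fold recognition. In practice the cut-off would need to be calibrated against a benchmark set of known structures.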