Towards better predictions, designs and engineering of coiled-coil protein-protein interactions

Lead Research Organisation: University of Bristol
Department Name: Chemistry

Abstract

Biology has rapidly become a molecular science: biology is blueprinted by, built from and run by molecules; and we now have the means to examine and understand biology at the molecular level. Biological molecules come in all shapes and sizes, ranging from water molecules that measure less than one billionth of a metre across, to molecules of DNA that, when stretched out, can span tens of centimetres. The larger molecules are called biological macromolecules, of which there are four types: carbohydrates, lipids, nucleic acids and proteins. Most of these perform tasks in biology dictated by their chemistry. Proteins, which are the subject of our research, are unusual in that they perform a wide variety of functions in biology. For example, collagen provides scaffolding and reinforcement in most mammalian tissues; myoglobin stores oxygen in muscle, whereas its relative, haemoglobin, transports oxygen from the lungs to active organs and tissues; and hexokinase is the first in a cascade of enzymes--proteins that catalyse chemical reactions--that breaks down glucose-containing foodstuffs to make ATP, the universal currency of energy in biology. The functions of all of the above proteins depend on them adopting specific three-dimensional shapes. To appreciate and to understand this, which is one long-term objective of our research, some protein chemistry is required. Proteins are polymers; that is, they are chain-like molecules made from similar building blocks held together by strong links called peptide bonds. In general, polymers do not adopt specific three-dimensional structures. Proteins are unusual in that they do, which is the key to their roles and importance in biology. The reason that proteins adopt, or fold up to form, specific structures comes down to the polypeptide chain and the building blocks used. Proteins predominantly use a set of just twenty amino-acid building blocks.
The amino acids have different chemistries; for instance, some are soluble in water, whereas others are not. Ultimately, these chemistries determine the functions of the intact proteins. The three-dimensional structures of proteins are determined by the order of amino acids along the polypeptide chain, which is known as the protein sequence. Even after more than fifty years of research, scientists do not understand how a protein's structure, and hence its function, is related to its sequence. This riddle is known as the protein-folding problem and is the subject of this grant proposal. To put this into perspective, for a protein of 100 amino acids, which is on the small side for proteins, there are 20 to the power 100 possible sequences. Whilst a great resource for biology--it is a huge pool of different potential proteins to explore--this is a very daunting number for scientists to consider. So if the protein-folding problem has remained unsolved for so long, how will we contribute? We will focus on one type of protein, the members of which have similar sequences but nonetheless adopt a variety of different structures. These are called coiled coils. They occur in all biological systems, and carry out many different functions, including helping to switch genes on, and providing scaffolding material within and outside cells. Predominantly, they are responsible for bringing protein molecules together, which is an essential process in biology as proteins do not work alone but in concert. We will gather together many examples of coiled-coil sequences and structures and marry them up. By comparing and contrasting the sequences associated with the different structures, we aim to learn rules that link sequence to structure. Armed with these rules, we will be able to predict new examples of coiled coils in the emerging genomes, and possibly even create our own coiled-coil proteins for medical applications such as scaffolds for tissue engineering.
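
To give a feel for the size of the number quoted above, the arithmetic can be checked directly. The short Python sketch below is an illustration only; the variable names are ours and are not part of the proposal. It shows that 20 to the power 100 is a number with 131 digits, that is, somewhat more than 10 to the power 130.

```python
# Size of sequence space for a 100-residue protein built from 20 amino acids.
# Illustrative arithmetic only; names here are hypothetical.
ALPHABET_SIZE = 20    # standard amino-acid building blocks
CHAIN_LENGTH = 100    # a protein on the small side

n_sequences = ALPHABET_SIZE ** CHAIN_LENGTH   # exact integer in Python
n_digits = len(str(n_sequences))              # number of decimal digits

print(f"possible sequences: about {n_sequences:.2e} ({n_digits} digits)")
```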

Technical Summary

The informational aspect of the protein-folding problem--that is, how protein sequence determines protein structure--remains unsolved. In this proposal, we aim to tackle the problem for one protein-folding motif, namely the alpha-helical coiled coil. The coiled coil is a ubiquitous protein-folding motif found in all biological systems, and in a wide variety of proteins with many different functions. It directs helix-to-helix interactions, usually at protein-protein interfaces. Most, but not all, coiled-coil sequences are characterised by tandem repeats of seven amino acids (heptads) in which the first and fourth positions are hydrophobic. The regularity of this sequence repeat reflects the underlying structural motif that cements coiled-coil interactions, known as knobs-into-holes packing of the hydrophobic side chains. Despite the apparent similarity of coiled-coil sequences and the underlying interactions, they can adopt a variety of architectures and topologies. In other words, despite sequence similarity, coiled coils are structurally heterogeneous; and, whilst it is relatively straightforward to spot coiled-coil sequences, it is not yet possible to predict with confidence what architecture or topology will be adopted, or, in the case of heterotypic coiled coils, the preferred protein partners. Herein lies the problem that this proposal addresses. The overall aim of this proposal is to better understand coiled-coil interactions with a view to improving coiled-coil structure predictions, engineering and design. The structural relatedness of coiled coils provides a key advantage for, and a route into, the proposed study: coiled-coil structures will be identified in the Brookhaven Protein Databank using our own software, SOCKET, which recognises the runs of knobs-into-holes interactions common to all coiled coils. This will allow us to create a database, CC++, of structurally verified coiled coils.
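
The heptad pattern described above (hydrophobic residues at the first and fourth positions, conventionally labelled a and d within abcdefg) can be illustrated with a minimal Python sketch. This is not SOCKET, which works on three-dimensional structures, nor COILS; it is a toy sequence scan written for illustration. The function name `best_register`, the choice of which residues count as hydrophobic, and the example sequence are all assumptions made here.

```python
# Toy heptad-register scan: try all seven registers and report the one in
# which the 'a' and 'd' positions are most often hydrophobic.
# Assumption: this residue set is an illustrative choice, not the proposal's.
HYDROPHOBIC = set("AFILMVWY")
HEPTAD = "abcdefg"

def best_register(seq):
    """Return (offset, score).

    offset is the index into 'abcdefg' assigned to seq[0]; score is the
    fraction of a/d positions occupied by a hydrophobic residue.
    """
    best = (0, -1.0)
    for offset in range(7):
        # Residues that fall on 'a' or 'd' positions under this register
        core = [res for i, res in enumerate(seq)
                if HEPTAD[(i + offset) % 7] in "ad"]
        if not core:
            continue
        score = sum(res in HYDROPHOBIC for res in core) / len(core)
        if score > best[1]:
            best = (offset, score)
    return best

# Made-up idealised repeat (a = I, d = L), truncated so it starts at 'e'
example = ("IEKLKQE" * 4)[4:]
offset, score = best_register(example)
print(HEPTAD[offset], score)
```

A scan of this kind only spots candidate heptad repeats; as the summary notes, it says nothing about which architecture, topology or partner a coiled coil will adopt.
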
We will classify the structures in terms of their architecture and topology, and relate these structural data to sequence data. This will provide a rich source of information from which we aim to glean sequence-to-structure relationships. Specifically, we will create a general amino-acid profile for the coiled coil, and specific profiles for each coiled-coil structural class. In turn, these will be used to improve current coiled-coil prediction methods such as COILS, and to develop new methods for coiled-coil oligomer-state and partner prediction. The amino-acid profiles will also be examined statistically to highlight certain specifying interactions that will be tested experimentally in peptide model systems. Why is such work interesting and important? First, coiled coils play pivotal structural roles in many proteins involved in key cell-biology processes, including transcription, translation, cytoskeleton formation, cell division, membrane curvature and fusion. Second, it is estimated that coiled-coil motifs account for 5-10 per cent of all protein-encoding DNA sequence. Thus, improved coiled-coil predictions would have an impact in bioinformatics and post-genomics efforts. Third, because coiled-coil motifs specify both homo- and heterotypic protein-protein interactions, analyses of coiled-coil sequences and structures carry additional challenges, but also additional rewards in terms of predicting potential protein-protein interactions and partners from sequence data.
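
As an illustration of what an amino-acid profile could look like, the Python sketch below tallies residue frequencies at each heptad position from sequences whose registers are already assigned. It is a minimal sketch under stated assumptions, not the method used in the project: the function name `heptad_profile`, the input format (pairs of sequence and register strings) and the two toy sequences are hypothetical and do not come from the database described above.

```python
from collections import Counter, defaultdict

def heptad_profile(annotated):
    """Build per-position residue frequencies.

    annotated: iterable of (sequence, register) pairs, where register is a
    string of letters a-g of the same length as the sequence.
    Returns {heptad position: {residue: frequency}}.
    """
    counts = defaultdict(Counter)
    for seq, register in annotated:
        if len(seq) != len(register):
            raise ValueError("sequence and register must have equal length")
        for res, pos in zip(seq, register):
            counts[pos][res] += 1
    profile = {}
    for pos, counter in counts.items():
        total = sum(counter.values())
        profile[pos] = {res: n / total for res, n in counter.items()}
    return profile

# Made-up toy data for one hypothetical structural class
toy_class = [("IEKLKQE", "abcdefg"), ("LEKLKQE", "abcdefg")]
profile = heptad_profile(toy_class)
print(profile["a"], profile["d"])
```

Building one such profile per structural class, and comparing the frequencies at each position between classes, is one simple way in which class-specific sequence preferences might be highlighted for further statistical and experimental testing.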

Publications

Hartmann MD (2009) A coiled-coil motif that sequesters ions to the hydrophobic core. in Proceedings of the National Academy of Sciences of the United States of America

Moutevelis E (2009) A periodic table of coiled-coil protein structures. in Journal of molecular biology

Testa OD (2009) CC+: a relational database of coiled-coil structures. in Nucleic acids research