Parametric de novo design of new protein folds

Lead Research Organisation: University of Bristol
Department Name: Chemistry

Abstract

The amino acid sequence of a protein dictates its fold. One of the main challenges in modern biochemistry is to understand the protein sequence-structure relationship, which would permit the accurate prediction of a protein's structure from its sequence. Comprehension of this relationship will also inform the inverse problem of designing a sequence to adopt a desired fold, that is, de novo protein design. There are several different approaches to de novo protein design. In the 1980s and 1990s, multiple groups performed bioinformatics analyses of the protein folds for which structures had been deposited in the Protein Data Bank (PDB), leading to the identification of patterns relating protein sequences to secondary, tertiary and quaternary structures. This bioinformatics, or knowledge-based, approach proved reasonably effective in delivering de novo proteins that mimicked natural protein folds. Some of the most successful studies of this type were performed with coiled coils, resulting in straightforward yet powerful rules that relate coiled-coil sequence to oligomer state. These rules are so advanced that in many cases we can now simply write down a sequence that will fold into a coiled coil with a desired oligomerisation state. However, this approach has not been equally successful for other protein folds. For example, different groups often extracted different, and sometimes conflicting, parameters describing the same fold, which has been attributed to differences in the datasets analysed, the methods used, and so on. Moreover, limitations in both computational power and gene or protein preparation methods have prevented many of these parameters from being systematically tested either in silico or in vitro. More recently, increased computational power has enabled the wide-scale application of fragment-based approaches to de novo protein design.
In this approach, a sequence is broken down into short, overlapping segments, and the structures of these segments are then modelled on matching fragments drawn from libraries extracted from structures deposited in the PDB. Fragment-based approaches have achieved great success in de novo protein design. As one example, David Baker's lab in Seattle recently used their Rosetta software to design, with high accuracy, a series of curved β-sheets capped with α-helices. β secondary structure has traditionally proven much more difficult to design than α secondary structure, owing primarily to the shallower free energy landscape in which β structures lie and to experimental difficulties with alternative folding states such as amyloid. However, a key limitation of fragment-based approaches is that the reliance on fragment libraries may restrict the ability to explore the wide variety of theoretically possible protein folds not observed in nature, although fragment-based methods have previously been used successfully to predict and design new folds. A further disadvantage is that their use does not necessarily improve our understanding of the sequence-to-structure relationship. What is lacking are parametric, i.e. mathematical, descriptions of protein folds, better sequence-to-structure relationships, and, where there are no examples to guide the latter, methods to fit sequences to completely new protein frameworks. With these in hand, it will in principle be possible to design any protein fold, natural or hitherto unseen. The aim of this project is to describe difficult protein design targets parametrically, with a particular emphasis on all-β structures.
This will involve an iterative cycle of (i) bioinformatics analysis of previously solved structures of a selected protein fold, followed by (ii) computational design, using parameters extracted in the previous step, of new structures with this fold, and then (iii) biophysical and structural characterisation of the synthesised designs for comparison with their in silico models.
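
As a minimal illustration of what a parametric (mathematical) description of structure means, the sketch below generates Cα coordinates for an idealised, straight α-helix from three parameters. It is not the project's method: the function name is hypothetical, and the default values are approximate textbook figures for an α-helix (Cα radius of about 2.3 Å, rise of about 1.5 Å per residue, 3.6 residues per turn). A parametric description of a β-rich fold would need more parameters, but the principle of mapping a few numbers to backbone coordinates is the same.

```python
import math


def ideal_helix_ca(n_residues, radius=2.3, rise=1.5, residues_per_turn=3.6):
    """Return approximate C-alpha coordinates (in angstroms) for a straight helix.

    Hypothetical helper for illustration only. The helix axis is the z-axis,
    and the defaults are approximate textbook values for an alpha-helix.
    """
    # Angle (radians) swept around the helix axis per residue.
    step = 2.0 * math.pi / residues_per_turn
    return [
        (radius * math.cos(i * step), radius * math.sin(i * step), i * rise)
        for i in range(n_residues)
    ]


coords = ideal_helix_ca(8)
# Distance between the first two consecutive C-alpha positions.
ca_ca_distance = math.dist(coords[0], coords[1])
```

With these defaults, the distance between consecutive Cα positions comes out close to the roughly 3.8 Å spacing seen in real proteins, which is a quick sanity check on the chosen parameters.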
