Rational computational protein design in ISAMBARD: new approaches, folds and functions

Lead Research Organisation: University of Bristol
Department Name: Chemistry

Abstract

It is said that DNA and RNA provide the blueprint to make cells and organisms, while proteins do everything else to make them work. The question is: can proteins perform any function that we define? The answer is: probably, but not just yet.

The point is that natural proteins appear to be limited in some ways. This is certainly the case for the 3D shapes (structures) of proteins. For example, mammalian antibodies, some viral coat proteins, and certain proteins that trap light energy in plants share a common 3D structure (called a beta-sandwich fold), but they have different chemistries and functions. Thus, nature appears to use a small number of protein structures over and over again, altering functions by changing the chemistry appended onto them.

Current questions in protein science include: what lies beyond these natural protein structures? Are non-natural protein structures and functions possible? The answer to the latter is almost certainly yes, but the problem is how to access, make and confirm them. This is the task of protein design, which addresses these questions and also potentially provides routes to new protein catalysts, diagnostics and pharmaceuticals.

Protein structures are complicated, with thousands of atoms arranged precisely in space. They are linear chains of amino-acid building blocks, often many hundreds of units long. In proteins, there are 20 amino-acid types, which have different sizes, shapes and chemistries. The order of the amino-acid blocks along the chain, called the sequence, determines the protein's structure and function: change the types and order of the amino acids, and the overall shape and function change. Moreover, the 20 different amino acids can be arranged in an effectively infinite number of protein chains of any length, which means that there is an effectively infinite number of possible shapes.
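To give a sense of scale: for a chain of fixed length n built from 20 amino-acid types, the number of distinct sequences is 20^n. The short Python sketch below works this out for a few lengths. The function names are illustrative only and are not part of ISAMBARD.

```python
def sequence_space_size(length: int, alphabet: int = 20) -> int:
    """Number of distinct chains of the given length (alphabet ** length)."""
    return alphabet ** length


def order_of_magnitude(length: int, alphabet: int = 20) -> int:
    """Power of ten of the sequence count, i.e. its decimal digit count minus one."""
    return len(str(sequence_space_size(length, alphabet))) - 1


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(f"length {n}: about 10^{order_of_magnitude(n)} sequences")
```

Even a modest 100-residue chain has on the order of 10^130 possible sequences, which is why exhaustive enumeration is not an option and guided search is needed.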

On this basis, designing proteins sounds difficult, but there is hope: rather than predicting how each of the possible protein chains folds up, protein designers "simply" have to find a sequence of amino acids that best fits, and therefore best defines, their targeted protein structure. Computers provide the means to do this.

Our approach to designing proteins is different from many others in the field. Much of the excellent work done by others to date learns from and mimics natural proteins. However, we propose to build proteins from scratch using equations that define the shapes that the protein designer wants them to adopt. This is called parametric protein design. It offers a route to entirely new protein structures and eventually to new functions. We will develop computer programs to do this.
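As a rough illustration of what "building from equations" can mean, the sketch below generates the alpha-carbon trace of an ideal alpha-helix from three parameters. This is not ISAMBARD's code or API: the function name is hypothetical, and the default values (radius about 2.3 Å, rise 1.5 Å per residue, 3.6 residues per turn) are textbook approximations for an alpha-helix.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def helix_ca_trace(
    n_residues: int,
    radius: float = 2.3,             # Angstrom, approximate C-alpha radius (assumed)
    rise: float = 1.5,               # Angstrom per residue along the axis (assumed)
    residues_per_turn: float = 3.6,  # ideal alpha-helix value (assumed)
    phase_deg: float = 0.0,
) -> List[Point]:
    """Place C-alpha atoms on a helix around the z-axis from a few parameters."""
    step = 2.0 * math.pi / residues_per_turn  # rotation per residue in radians
    phase = math.radians(phase_deg)
    coords = []
    for i in range(n_residues):
        angle = phase + i * step
        coords.append((radius * math.cos(angle), radius * math.sin(angle), rise * i))
    return coords


if __name__ == "__main__":
    trace = helix_ca_trace(10)
    spacing = math.dist(trace[0], trace[1])
    print(f"{len(trace)} residues, consecutive C-alpha spacing {spacing:.2f} Angstrom")
```

Changing a single parameter (for example the radius or the residues per turn) changes the whole backbone in a controlled way, which is the appeal of a parametric description over assembling fragments.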

Our software, ISAMBARD, will use traditional ways of inputting instructions for designing proteins into the computer, i.e. using the keyboard. We will make ISAMBARD freely available to all academic and not-for-profit users. We will test ISAMBARD experimentally by making proteins that have never been seen before.

Working with an industrial partner expert in virtual reality software design, we will also develop ISAMBARD-VR, which will allow users literally to "step into" the process of building a protein in the computer. This will make protein design accessible to anyone regardless of their familiarity with computers. Through ISAMBARD-VR, users will be able to define the shape of the protein they are designing interactively and intuitively. This shape will then be passed to ISAMBARD, which will optimise the design and find sequences that best fit and define it. The optimisation will use a combination of rules learned from how natural proteins fit together, user intuition, and computer algorithms to search efficiently through the many possible combinations of amino acids defined by these various design constraints.

Finally, using ISAMBARD-VR, we will "watch and learn" from the designers at work using machine-learning methods. In this way, we aim to improve not only ISAMBARD and ISAMBARD-VR, but also protein design in general.

Technical Summary

Computational protein design is becoming a reality, with several groups (including ours) delivering de novo protein sequences that fold to architectures entirely different from those presented by nature. However, considerable challenges remain, including: expanding the accessible protein structures into the dark matter of protein-fold space; increasing the success rate of translating in silico models to experimentally validated systems; and improving the accessibility of protein-design methods to non-specialists.

We will address these issues by developing two tools for computational design, ISAMBARD and ISAMBARD-VR.

ISAMBARD is an open-source protein-design platform for the parametric design of protein backbones. This is done mathematically, and largely free from bias introduced by using fragments of natural proteins as in other approaches. Protein design in ISAMBARD will be optimised using integrated tools to find sequence and structure solutions for target protein folds, including: (1) the implementation of metaheuristic algorithms; (2) feature sets learnt from analyses of the PDB; and (3) user-specified design principles.
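The summary does not say which metaheuristics are used, so the following is only a generic sketch of the idea, using simulated annealing as a stand-in. It searches over two backbone parameters to minimise a toy score. The score function, its target values and all names here are assumptions made for illustration, not ISAMBARD's scoring or API.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def simulated_annealing(
    score: Callable[[Sequence[float]], float],
    start: Sequence[float],
    step_size: float = 0.2,
    n_iter: int = 2000,
    t_start: float = 1.0,
    cooling: float = 0.995,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimise `score` over real-valued parameters; returns the best point seen."""
    rng = random.Random(seed)
    current = list(start)
    current_score = score(current)
    best, best_score = list(current), current_score
    temperature = t_start
    for _ in range(n_iter):
        # Propose a small random perturbation of every parameter.
        candidate = [x + rng.gauss(0.0, step_size) for x in current]
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = list(current), current_score
        temperature *= cooling  # geometric cooling schedule
    return best, best_score


def toy_score(params: Sequence[float]) -> float:
    """Hypothetical score: squared distance from assumed ideal radius and rise."""
    radius, rise = params
    return (radius - 2.3) ** 2 + (rise - 1.5) ** 2


if __name__ == "__main__":
    best, best_score = simulated_annealing(toy_score, start=[5.0, 5.0])
    print("best parameters:", [round(x, 2) for x in best], "score:", round(best_score, 4))
```

In a real design run the toy score would be replaced by a physical or knowledge-based energy evaluated on the model built from the parameters; the search loop itself stays the same.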

We will test ISAMBARD experimentally by targeting a series of all-beta-structured proteins, which are currently considered difficult protein-design targets. These targets have been selected to include progressively more challenging aspects of all-beta structures, which must be mastered to deliver successful designs for these proteins generally.

ISAMBARD-VR will be developed in collaboration with a local software company expert in virtual reality. It will provide an interface to ISAMBARD allowing expert and non-expert designers to build models and develop parameterizations intuitively within a virtual environment. It will allow us to watch and learn from users during the design process, using machine learning to determine strategies for design. In this way, users and future protein designs will benefit from the experience of a large community of designers.

Planned Impact

We foresee the following four areas of impact from the proposed research.

NEW WAYS OF WORKING AND DEVELOPING A WIDER ACADEMIC AND INDUSTRIAL COMMUNITY: We will combine state-of-the-art protein science, human-computer interaction, and machine learning to develop reliable and user-friendly computational strategies to advance the rational design and engineering of proteins. We will distribute the tools developed freely to UK and international users across both academia and industry. We anticipate that these tools will be taken up more widely to advance understanding, and allow manipulation and design of biomolecular systems. To help accomplish this, we will work with CCP-BioSim to deliver Annual Workshops. These will give visibility to the tools developed, and ensure that users are fully engaged in the process of developing the software both for their own needs and to benefit the wider community.

TRANSLATION: The tools delivered through this project could provide new routes to biomolecular-design applications in areas beyond basic science, including biotechnology, pharmaceuticals and synthetic biology. DNW, BrisSynBio and UoB will use their close collaborations and links with these sectors to ensure further visibility of the software tools that we develop, to help workers in these areas exploit them, and, where appropriate, to translate them through licensing deals.

Beyond biological applications, we anticipate that this work will help exploit new commodity computational technology (virtual reality and machine learning) in applications of scientific simulation and visualisation more generally. This insight should benefit a number of scientific groups, e.g. computer scientists and IT companies. Working closely with the Bristol-based software company, iSci, we will ensure that the software produced has sustainability in mind from the outset; i.e. it will comply with Silicon-Valley standards in interactive graphics, human-computer interaction, client-server architectures, and version control. Working with iSci will also help us to realise commercial opportunities should they arise, and assist with translation of ideas to commercial products.

TRAINING: This project will train scientists to gain advanced skills in two of the "Eight Great Technologies": i.e., Big Data and Synthetic Biology. The two PDRAs employed will develop specialist skills in software design and maintenance, computational biomolecular design, experimental protein design, human-computer interaction, and machine learning. The opportunity for these PDRAs to work closely with industrial-grade software designers will furnish them with a range of highly sought-after skills that are difficult to acquire in regular academic research environments. They will emerge with transferable communication, presentation and problem-solving skills ideally suited to contribute to the growth or creation of high-technology companies, enhancing UK innovative capacity. In addition, through our proposed collaborations, we will engage and train many other early career scientists in the UK and internationally in rational computational biomolecular design. These trained scientists will add to UK economic competitiveness.

OUTREACH: We will use DRG's unique international profile in cultural engagement, which has led to sci-art projects experienced by over 200,000 people worldwide. Through engagement with iSci, we will access their network of over 300 UK schools, and engage science teachers and pupils with our VR-enabled protein-design tools. This will contribute to the public understanding of science, and feed into our research programme, as it will allow us to investigate the extent to which "crowd-sourced" design strategies might help solve protein-design tasks. This unique opportunity will exploit recent work that iSci has carried out in collaboration with Oracle to make all of their computational tools and VR frameworks available anywhere in the world via the Oracle Cloud.

Publications

Description We have developed new software (SOCKET2) and designed a completely new protein structure.
Exploitation Route The SOCKET2 program will be used by many to analyse protein structures; the new protein structure could start a new field in peptide design and engineering.
Sectors Education

 
Title SOCKET2 
Description SOCKET2 analyses 3D coordinates of protein structures to identify and categorise coiled-coil domains. 
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact The predecessor of this software, SOCKET, has been widely used worldwide over the past 20 years. The new version of the software and the associated web application will make the software even more widely available, usable, and therefore used. 
URL http://coiledcoils.chm.bris.ac.uk/socket2/home.html
 
Description We the Curious, Bristol, UK, September 2019, Futures 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Woolfson gave an interactive talk on protein design and synthetic biology to a general audience as part of Bristol Futures 2019 at We the Curious, Bristol, UK, in September 2019.
Year(s) Of Engagement Activity 2019
URL https://www.futures2019.co.uk/events/we-the-curious/