Transparent Deep Learning for Directed Protein Evolution

Lead Research Organisation: University of Edinburgh

Department Name: Sch of Biological Sciences

Abstract

Protein engineering requires finding an amino acid sequence that encodes a desired function. Because the design space grows exponentially with the number of residues, de novo design is currently intractable. To work around this complexity, scientists routinely rely on Directed Evolution (DE) [1], an iterative process of random mutagenesis and selection of protein variants. While this process has led to remarkable results, it is slow, low-throughput and expensive, because the probability of generating a functional protein at each step is low. For the last 30 years, scientists have therefore developed biophysical models and optimisation methods to predict protein structure and function in silico; however, these methods usually do not scale to large proteins and are limited by the accuracy of the underlying biophysical models.

Recently, Machine Learning (ML) and, in particular, Deep Learning (DL) have largely overcome these problems by learning the relationships that govern protein folding and function directly from data [2]. However, how a DL model arrives at its structural and functional predictions remains opaque [3], which limits the utility of these models for understanding the biological design principles behind functional proteins.

AIMS AND OBJECTIVES: In collaboration with ZenithAI (OT/ZAI), we propose to design and build transparent and explainable deep learning models for protein design. The protein design space increases exponentially with the number of amino acid positions considered, but functional proteins are extremely rare. Transparent models can therefore provide a principled method for selecting proteins by focusing only on important and uncertain amino acid positions, ultimately reducing the burden of experimental screening of protein variants.
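The scale of the reduction described above can be illustrated with a short calculation. This is a minimal sketch that assumes the 20 canonical amino acids are allowed at every varied position; the sequence length of 100 and the choice of 5 varied positions are hypothetical numbers chosen for illustration, not figures from the project.

```python
# Assumption: each varied position can hold any of the 20 canonical amino acids.
NUM_AMINO_ACIDS = 20


def design_space_size(num_positions: int) -> int:
    """Count the distinct sequences obtainable by varying num_positions positions."""
    return NUM_AMINO_ACIDS ** num_positions


# Hypothetical example: a 100-residue protein versus 5 selected positions.
full_space = design_space_size(100)
focused_space = design_space_size(5)

print(f"All 100 positions varied: {full_space:.3e} sequences")
print(f"Only 5 positions varied:  {focused_space} sequences")
```

Restricting the search to a handful of positions turns an astronomically large space into one that is closer to the capacity of an experimental screen, which is the motivation for identifying important and uncertain positions first.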

WORKPLAN. The project is structured in 3 work packages.
- WP1 - The student will develop a deep learning framework for protein engineering, using state-of-the-art variational and adversarial models coupled with sequence-to-sequence models, which will be trained using curated protein sequence information stratified by species and function.
- WP2 - The student will then develop probabilistic models to quantify uncertainty in designs by exploiting the gradient and weight information learned by the model, ultimately defining a score to prioritise proteins for experimental testing.
- WP3 - The student will use the model to design variants of the human S1PL enzyme, which will then be tested in the lab. S1PL is a central enzyme in the sphingolipid pathway, which is essential for proper cell function, and has a causal role in many diseases, including cancer and neurodegenerative disorders.
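
One simple way to picture the kind of prioritisation score mentioned in WP2 is sketched below. It is not the project's method: WP2 refers to gradient and weight information, whereas this sketch swaps in the Shannon entropy of a per-position amino acid distribution as a stand-in uncertainty measure. The function names and the toy probabilities (over a 4-letter alphabet, for brevity) are hypothetical.

```python
import math


def position_entropy(probs):
    """Shannon entropy (in bits) of one position's predicted residue distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def rank_uncertain_positions(per_position_probs, top_k):
    """Return the indices of the top_k positions with the highest entropy."""
    entropies = [position_entropy(p) for p in per_position_probs]
    order = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return order[:top_k]


# Hypothetical model output: one probability vector per sequence position.
toy_predictions = [
    [1.0, 0.0, 0.0, 0.0],      # model is certain about this position
    [0.25, 0.25, 0.25, 0.25],  # model is maximally uncertain
    [0.5, 0.5, 0.0, 0.0],      # model hesitates between two residues
]

print(rank_uncertain_positions(toy_predictions, top_k=2))
```

Positions ranked highest by such a score would be natural candidates for variation in the experimental screen, while low-entropy positions could be left fixed.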

TRAINING PROGRAM. The student will receive training in machine learning, statistical learning and deep learning, and will build a competitive profile in biological sequence modelling and design. The student will also be introduced to the emerging field of synthetic biology and will learn modern DNA cloning and assembly techniques and the use of protein expression systems at scale. We also place strong emphasis on reproducible research: the student will receive training in advanced research software engineering and in reproducible workflows for data analysis.

Oct 22 - Sep 26

Funder: BBSRC

Project Status: Active

Project Category: Studentship

Project Reference: 2745409

Research Topic: Unclassified

Organisations

People	ORCID iD
Giovanni Stracquadanio (Primary Supervisor)	http://orcid.org/0000-0001-9819-3645


Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
BB/T00875X/1			01/10/2020	30/09/2028
2745409	Studentship	BB/T00875X/1	01/10/2022	30/09/2026
