Machine-learning and directed evolution for enzymes and therapeutic proteins

Lead Research Organisation: University College London

Department Name: Biochemical Engineering

Abstract

Biocatalytic enzymes and antibody-based therapeutics are prime examples of breakthrough applications of protein products. Manufacturing processes for protein products need to address several challenges such as poor stability of the protein product, tendency to aggregate and sub-optimal pharmacokinetic properties in the final product. Directed evolution has proven to be a very successful protein engineering route for iterative improvements in both function and stability, but is experimentally intensive. The combination of directed evolution approaches with machine-learning algorithms is a promising new area of active research that could alleviate some of the challenges.

Currently, machine-learning algorithms have been developed primarily on sequence-based features. Unfortunately, such models have so far yielded few successes and applications. The aim of this project is to develop a robust computational framework, based on machine-learning algorithms, able to predict the effect of additive mutations on selected physicochemical properties of a given protein. Initially, a pre-existing experimental data set covering a wide set of protein features and properties, including biophysical and molecular data, will be curated for use with machine-learning algorithms and split into a training and a validation subset. A thorough review and evaluation of various machine-learning algorithms will be conducted based on their ability to accurately predict the properties of the validation subset after calibration on the training subset. The objective is to identify algorithms capable of predicting, with reasonable accuracy, how mutations will affect protein properties such as propensity to aggregate, stability and biophysical activity. Ultimately, the developed methodology will be applied to different protein systems with the final objective of predicting novel sequences with improved fitness relative to the native protein.
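The evaluation procedure described above (split the data, calibrate each candidate on the training subset, score it on the validation subset) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the mutation names and effect sizes are hypothetical, and the two candidate models (a mean baseline and a nearest-neighbour predictor) are simple stand-ins rather than the algorithms reviewed in the project.

```python
import random
from statistics import mean

# Hypothetical per-mutation effects on a stability score (synthetic, not project data).
TRUE_EFFECTS = {"A12V": 1.5, "K45E": -2.0, "S78T": 0.5, "D101N": -0.8, "L130F": 1.0}


def make_synthetic_dataset(n_variants=200, seed=0):
    """Build (variant, label) pairs; a variant is a frozenset of mutation names."""
    rng = random.Random(seed)
    mutations = sorted(TRUE_EFFECTS)
    records = []
    for _ in range(n_variants):
        variant = frozenset(m for m in mutations if rng.random() < 0.4)
        # Label assumes purely additive effects plus Gaussian measurement noise.
        label = sum(TRUE_EFFECTS[m] for m in variant) + rng.gauss(0.0, 0.2)
        records.append((variant, label))
    return records


def split_dataset(records, validation_fraction=0.2, seed=0):
    """Shuffle and split the records into (training, validation) subsets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed values."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return (sum(squared) / len(squared)) ** 0.5


class MeanBaseline:
    """Predicts the training-set mean for every variant."""

    def fit(self, variants, labels):
        self.mean_ = mean(labels)
        return self

    def predict(self, variants):
        return [self.mean_] * len(variants)


class NearestNeighbour:
    """Predicts the label of the closest training variant (Hamming distance)."""

    def fit(self, variants, labels):
        self.variants_ = list(variants)
        self.labels_ = list(labels)
        return self

    def predict(self, variants):
        predictions = []
        for v in variants:
            # Size of the symmetric difference = number of differing mutations.
            best = min(range(len(self.variants_)),
                       key=lambda i: len(v ^ self.variants_[i]))
            predictions.append(self.labels_[best])
        return predictions


def evaluate(models, train, validation):
    """Fit each model on the training subset; return its validation RMSE."""
    train_x, train_y = zip(*train)
    val_x, val_y = zip(*validation)
    return {name: rmse(model.fit(train_x, train_y).predict(val_x), val_y)
            for name, model in models.items()}


records = make_synthetic_dataset()
train, validation = split_dataset(records)
scores = evaluate({"mean_baseline": MeanBaseline(),
                   "nearest_neighbour": NearestNeighbour()}, train, validation)
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: validation RMSE = {score:.3f}")
```

Ranking candidates by their error on held-out variants, rather than on the data used for calibration, is what allows over-fitted models to be identified before they are applied to new protein systems.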

By elevating the quality and quantity of labelled features, we envision the potential for the development of a novel machine-learning model which will streamline protein engineering and improve biomanufacturing yields. The results of this research will diminish the costs and time required for the optimisation of potential new protein products, expedite the identification of the 'best mutants' and facilitate the extraction of precious knowledge from protein data to better understand protein systems. Therefore, this project aligns with EPSRC's strategic priorities in 'Digital Manufacturing' and 'Future Manufacturing technologies'.

Student:

Niccolo Alberto Elia Venanzi

Period of Study:

Oct 19 - Sep 23

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

2248068

Research Topic:

Unclassified

Organisations

University College London (Lead Research Organisation)

People	ORCID iD
Paul Dalby (Primary Supervisor)	http://orcid.org/0000-0002-0980-8167
Niccolo Alberto Elia Venanzi (Student)

Publications



Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/R513143/1			01/10/2018	30/09/2023
2248068	Studentship	EP/R513143/1	01/10/2019	22/09/2023	Niccolo Alberto Elia Venanzi
