Machine-learning and directed evolution for enzymes and therapeutic proteins

Lead Research Organisation: University College London
Department Name: Biochemical Engineering


Biocatalytic enzymes and antibody-based therapeutics are prime examples of breakthrough applications of protein products. Manufacturing processes for protein products need to address several challenges, such as poor stability of the protein product, a tendency to aggregate and sub-optimal pharmacokinetic properties in the final product. Directed evolution has proven to be a very successful protein engineering route for iterative improvements in both function and stability, but it is experimentally intensive. Combining directed evolution with machine-learning algorithms is a promising area of active research that could alleviate some of these challenges.

To date, machine-learning algorithms in this area have been developed primarily on sequence-based features. Unfortunately, such models have yielded few successes and applications. The aim of this project is to develop a robust computational framework, based on machine-learning algorithms, that can predict the effect of additive mutations on selected physicochemical properties of a given protein. Initially, a pre-existing experimental data set covering a wide range of protein features and properties, including biophysical and molecular data, will be curated for use with machine-learning algorithms and split into a training subset and a validation subset. Various machine-learning algorithms will then be thoroughly reviewed and evaluated on their ability to accurately predict the properties of the validation subset after calibration on the training subset. The objective is to identify algorithms capable of predicting, with reasonable accuracy, how mutations will affect protein properties such as propensity to aggregate, stability and biophysical activity. Ultimately, the developed methodology will be applied to different protein systems, with the final objective of predicting novel sequences with higher fitness than the native sequence.
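The evaluation workflow described above (curate a data set, split it into training and validation subsets, calibrate several algorithms on the training subset, and compare them on the validation subset) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the project's actual method: the data are synthetic, the reduced residue alphabet, the function names and the three candidate models (a mean baseline, an additive per-position model and a nearest-neighbour model) are hypothetical stand-ins, and validation RMSE is just one possible accuracy measure.

```python
import math
import random
from collections import defaultdict
from statistics import mean

ALPHABET = "AVLK"  # hypothetical reduced residue alphabet, keeps the toy data dense
SEQ_LEN = 6        # hypothetical sequence length


def make_synthetic_dataset(n_variants=300, seed=0):
    """Synthetic stand-in for a curated data set: each variant's property is a
    sum of hidden per-position residue effects plus a little noise."""
    rng = random.Random(seed)
    effects = [{aa: rng.gauss(0.0, 1.0) for aa in ALPHABET} for _ in range(SEQ_LEN)]
    data = []
    for _ in range(n_variants):
        seq = "".join(rng.choice(ALPHABET) for _ in range(SEQ_LEN))
        value = sum(effects[i][aa] for i, aa in enumerate(seq)) + rng.gauss(0.0, 0.1)
        data.append((seq, value))
    return data


def split(data, valid_fraction=0.2, seed=1):
    """Shuffle a copy of the data and return (training, validation) subsets."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]


def fit_mean_baseline(train):
    """Always predict the mean property value of the training subset."""
    mu = mean(v for _, v in train)
    return lambda seq: mu


def fit_additive(train):
    """Predict the training mean plus one averaged offset per (position, residue)."""
    mu = mean(v for _, v in train)
    buckets = defaultdict(list)
    for seq, v in train:
        for i, aa in enumerate(seq):
            buckets[(i, aa)].append(v - mu)
    offsets = {key: mean(vals) for key, vals in buckets.items()}
    return lambda seq: mu + sum(offsets.get((i, aa), 0.0) for i, aa in enumerate(seq))


def fit_nearest_neighbour(train):
    """Predict the value of the training sequence closest in Hamming distance."""
    def predict(seq):
        def hamming(other):
            return sum(a != b for a, b in zip(seq, other))
        _, value = min(train, key=lambda item: hamming(item[0]))
        return value
    return predict


def rmse(model, dataset):
    """Root-mean-square error of a fitted model on a data set."""
    return math.sqrt(mean((model(seq) - v) ** 2 for seq, v in dataset))


data = make_synthetic_dataset()
train, valid = split(data)
candidates = {
    "mean baseline": fit_mean_baseline,
    "additive": fit_additive,
    "nearest neighbour": fit_nearest_neighbour,
}
# Calibrate each candidate on the training subset, score it on the validation subset.
scores = {name: rmse(fit(train), valid) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: validation RMSE = {score:.3f}")
print("lowest validation error:", best)
```

In a real study the synthetic generator would be replaced by the curated experimental data, and a single random split would typically be supplemented by repeated or cross-validated splits so that the ranking of algorithms does not depend on one partition.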

By increasing the quality and quantity of labelled features, we aim to develop a novel machine-learning model that streamlines protein engineering and improves biomanufacturing yields. The results of this research are expected to reduce the cost and time required to optimise potential new protein products, expedite the identification of the 'best mutants' and facilitate the extraction of valuable knowledge from protein data to better understand protein systems. This project therefore aligns with EPSRC's strategic priorities in 'Digital Manufacturing' and 'Future Manufacturing technologies'.



Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/R513143/1                                   01/10/2018  30/09/2023
2248068            Studentship   EP/R513143/1  23/09/2019  22/09/2023  Niccolo Alberto Elia Venanzi