Machine Learning Approaches for In silico Optimisation and Design of Therapeutic Antibodies

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

Antibody therapeutics are fundamental to healthcare, acting as best-in-class therapies for diseases ranging from cancers to viral infections. However, the antibody development pipeline is plagued by high failure rates (nearly 80%), long timescales (averaging close to a decade) and high costs (at least hundreds of millions of dollars). Machine learning (ML) tools hold promise not only for reducing the timescales and costs of therapeutic antibody development, but also for improving clinical trial success rates, safety and therefore patient outcomes.

Antibodies are proteins produced by our immune systems to defend against foreign pathogens. Their ability to bind strongly and specifically to a target, such as a tumour or viral surface protein, also makes them ideal candidates for therapeutics. Target binding affinity is the guiding property in therapeutic antibody development, but additional properties affecting safety and developability must also be considered. For example, certain antibody sequences may carry a higher risk of inducing an immune response when administered to patients (immunogenicity), while others may have reduced stability and therefore a shorter shelf-life.

In my DPhil research, I have developed ML tools for antibody property prediction, optimisation and design. A fundamental challenge in applying ML to biological questions is data availability. Using experimental and synthetic data with an equivariant graph neural network architecture, I quantified the amount and type of data that will be required to achieve accurate and generalisable affinity prediction. Furthermore, I am exploring the interpretability of affinity predictions, with the aim of identifying the key contributions to affinity in order to guide design. As antibody development involves solving a complex, multi-objective optimisation problem that extends beyond affinity, I have also used ML to investigate additional properties. I worked on a Random Forest-based method, trained on millions of sequences, which can distinguish human from non-human sequences with near-perfect accuracy. I am extending my research into further properties using antibody inverse folding, i.e. predicting sequence given structure, which can generate meaningful embeddings of antibody structure. I will compare the performance of the resulting model with that of other sequence- and structure-based language models on various antibody property prediction and design tasks, to identify ways to leverage the orthogonal information learned by different approaches.

In my research, I have evaluated the applications and limitations of ML for accelerating multiple steps in the antibody design pipeline. These contributions lay the foundation for simultaneous multi-objective optimisation, as well as for biasing antibody design towards favourable properties.

The project falls within the MRC research priority of Discovery Science - Precision medicine.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
MR/N013468/1                                   01/10/2016  30/09/2025
2445727            Studentship   MR/N013468/1  01/10/2020  31/03/2024  Alissa Hummer