Developing machine learning methods using antibody structural and sequence data to accelerate vaccine design

Lead Research Organisation: University of Oxford
Department Name: Sustain Approach to Biomedical Sci CDT

Abstract

This DPhil project aims to use machine learning (ML) techniques to increase the speed and reduce the cost of vaccine design. The project is a collaboration between the Oxford Protein Informatics Group (OPIG) and GSK Vaccines. The goals of this work fall within the EPSRC Analytical Science and Mathematical Biology research areas. Currently, vaccines and antibody therapeutics typically take 5-10 years and approximately £1bn to bring to market. Pre-clinical trials account for a large proportion of this development time, as researchers aim to minimise risks and maximise benefits before assessing vaccines in human volunteers. This pre-clinical stage can be sped up significantly by using computational methods to better select the vaccine candidates taken into the lab for testing. Furthermore, improved candidate selection at this early stage can result in higher-efficacy products at the end of development.
The development of high-throughput sequencing techniques and accurate protein structure modelling tools has given access to large amounts of data in which to search for promising antibody leads. However, searching this space is still a challenge, as it is not yet possible to model antibody-antigen interactions exactly because of the great computational complexity involved. Machine learning techniques also struggle to search this space accurately, as only a limited amount of labelled training data is currently available. This labelled data consists largely of antibody-antigen complexes imaged using X-ray crystallography, an expensive and time-consuming technique. These costs mean that structural data exists for only a few thousand complexes, compared to the billions of antibody sequences that are now available. This project aims to maximise the utility of the available structural and sequence data by using it to train deep neural networks that improve our predictions of how antibodies and antigens interact.
These predictive methods will then be developed into robust, open-source software tools that will form part of SAbPred, OPIG's antibody prediction toolbox. This work will differentiate itself from, and improve upon, existing techniques by using physically important characteristics to label data, combined with descriptive feature embeddings obtained from state-of-the-art transformer models.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
BB/V509681/1                                   01/10/2020  30/09/2024
2451872            Studentship   BB/V509681/1  01/10/2020  30/09/2024