Multi-objective de novo protein design

Lead Research Organisation: University of Oxford

Abstract

De novo protein design involves generating protein sequences that fold to a desired structure. This endeavour has numerous applications across the healthcare and fine chemistry industries: formulating and optimising protein therapeutics that bind drug targets, generating artificial proteins that elicit an immune response against a specific antigen, developing enzyme catalysts for industrial applications, or constructing protein-based nanostructures that can deliver drugs more efficiently. While the field of protein design has made significant progress in the past decade, one salient challenge is the limited control over the physicochemical properties of the generated designs, which invariably leads to multiple iterations of the design-make-test experimental cycle. The objective of this doctoral project is to fill this gap by developing deep learning algorithms to design proteins with specific physicochemical properties. This project falls within the EPSRC digital healthcare research area, and more generally under the artificial intelligence research theme.
The project comprises three independent work packages.

- Protein property optimization algorithms. The first work package will expand the property prediction models developed in the summer rotation into a property engineering platform. The property prediction algorithms will be extended from thermodynamic stability to other properties such as solubility and post-translational modifications. The candidate will explore multi-objective optimization algorithms to enable targeted improvement of the physicochemical properties of a protein. This package includes experimentation across different deep learning architectures, starting from the initial combination of graph convolutional neural networks and moving to other architectures incorporating neural attention.

- De novo protein design. The next stage of the project will address the problem of simultaneously generating a new, physically plausible three-dimensional structure for a protein and an amino acid sequence that encodes it. The candidate will develop generative models, starting from the existing literature on diffusion models and hallucination networks, and condition them on predicted physicochemical properties such as stability and solubility to ensure that the generated designs are suitable for development at a protein production facility. The candidate and her supervisors are currently in conversations with the Woolfson Lab at the University of Bristol, who are keen to provide data and validate generated designs experimentally. However, the project's success is not dependent on this collaboration.

- Multi-objective protein design. The last part of the project will focus on multi-objective protein design, i.e. designing protein sequences that fulfil a variety of conditions. The candidate will combine her understanding of protein property prediction and de novo protein design to generate artificial proteins, such as vaccines and antibodies, that meet a performance profile across multiple properties such as immunogenicity, thermostability, toxicity and half-life. This part of the project will integrate the machine learning architectures developed by the candidate in previous years, as well as other models generated within the ecosystem of the Oxford Protein Informatics Group.
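
The abstract does not say which multi-objective method will be used. As a purely illustrative sketch, one common formulation is Pareto dominance: a design is kept if no other design is at least as good on every property and strictly better on at least one. The design names, property names and scores below are invented for the example, and treating higher scores as better is an assumption.

```python
from typing import Dict, List

# Hypothetical predicted scores per property; higher is assumed to be better.
Scores = Dict[str, float]


def dominates(a: Scores, b: Scores) -> bool:
    """Return True if `a` is at least as good as `b` on every property
    and strictly better on at least one."""
    at_least_as_good = all(a[k] >= b[k] for k in a)
    strictly_better = any(a[k] > b[k] for k in a)
    return at_least_as_good and strictly_better


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, scores in candidates.items():
        dominated = any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Invented example values, not project data.
candidates = {
    "design_a": {"stability": 0.9, "solubility": 0.4},
    "design_b": {"stability": 0.6, "solubility": 0.8},
    "design_c": {"stability": 0.5, "solubility": 0.3},
    "design_d": {"stability": 0.9, "solubility": 0.5},
}

front = pareto_front(candidates)
print(front)
```

In this toy set, design_d dominates design_a and design_c is dominated on both properties, so only the designs representing a genuine trade-off between stability and solubility remain on the front.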

Planned Impact

In the same way that bioinformatics has transformed genomic research and clinical practice, health data science will have a dramatic and lasting impact upon the broader fields of medical research, population health, and healthcare delivery. The beneficiaries of the proposed training programme, and of the research that it delivers and enables, will include academia, industry, healthcare, and the broader UK economy.

Academia: Graduates of the training programme will be well placed to start their post-doctoral careers in leading academic institutions, engaging in high-impact multi-disciplinary research, helping to build training and research capacity, and sharing their experience within the wider academic community.

Industry: Partner organisations will benefit from close collaboration with leading researchers, from the joint exploration of research priorities, and from the commercialisation of arising intellectual property. Other organisations will benefit from the availability of highly-qualified graduates with skills in big health data analytics.

Healthcare: Healthcare organisations and patients will benefit from the results of enabled and accelerated health research, leading to new treatments and technologies, and an improved ability to identify and evaluate potential improvements in practice through the analysis of real-world health data.

Economy: The life sciences sector is a key component of the UK economy. The programme will provide partner companies with direct access to leading-edge research. Graduates of the programme will be well-qualified to contribute to economic growth - supporting health research and the development of new products and services - and will be able to inform policy and decision making at organisational, regional, and national levels.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S02428X/1                                   01/04/2019  30/09/2027
2721830            Studentship   EP/S02428X/1  01/10/2022  30/09/2026  Annie Qurat Ul Ain