Generalizable Diffusion-based Protein-Ligand Binding Mode and Affinity Prediction

Lead Research Organisation: University of Oxford

Department Name: Statistics

Abstract

Deep learning has demonstrated dramatic success in a number of domains, but our recent work on PoseBusters revealed significant shortcomings in predicting physically valid, three-dimensional protein-ligand binding modes (Buttenschoen et al., 2024). We propose to build on our recent work on predicting binding affinity (Meli et al., 2021; Dablander et al., 2023), estimating uncertainty (Zaidi et al., 2021), and using function-based methods to address covariate shift in drug discovery (Klarner et al., 2023). The aim is to develop generalizable, multi-task models that simultaneously predict binding modes and binding affinity, together with an estimate of uncertainty.
We will investigate the incorporation of geometric and physical constraints using constrained flow-matching. Generative models are trained to match a distribution of samples, but in protein-ligand binding mode prediction the available data are typically single structures obtained from co-crystallized complexes. We propose to develop more robust and generalizable methods by expanding these distributions with synthetic data, and will investigate samples drawn from (i) molecular dynamics trajectories and (ii) multiple binding modes produced by classical protein-ligand docking. We will also build on our recent results showing that data-guided regularization produces more accurate deep neural network models than standard regularization methods.
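As an illustration of the flow-matching idea mentioned above, the following is a minimal sketch of an unconstrained conditional flow-matching loss with straight-line paths. It is not the project's method: the abstract does not specify the flow-matching variant or how constraints are imposed, and the names `interpolate`, `target_velocity`, `flow_matching_loss`, and `zero_model`, as well as the toy coordinates, are hypothetical.

```python
import random


def interpolate(x0, x1, t):
    """Point at time t in [0, 1] on the straight path from noise x0 to data x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path from x0 to x1; it does not depend on t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(model, x1_batch, rng):
    """Monte Carlo estimate of the mean squared error between the model's
    predicted velocity and the straight-path target velocity."""
    total = 0.0
    for x1 in x1_batch:
        x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
        t = rng.random()                        # uniform time in [0, 1)
        xt = interpolate(x0, x1, t)
        v_pred = model(xt, t)
        v_true = target_velocity(x0, x1)
        total += sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(x1)
    return total / len(x1_batch)


def zero_model(xt, t):
    """Hypothetical stand-in for a neural network: always predicts zero velocity."""
    return [0.0 for _ in xt]


rng = random.Random(0)
# Toy "poses": made-up flattened coordinates, not real structural data.
poses = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
loss = flow_matching_loss(zero_model, poses, rng)
```

In this sketch each training example contributes a single target structure. The synthetic samples proposed above (molecular dynamics frames, multiple docked poses) would, under this assumption, simply add further entries to `x1_batch` for the same complex.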
Supervised deep learning requires large amounts of expensive labelled data. By exploiting contrastive learning on 3D representations of protein-ligand complexes, we will develop fine-tuned self-supervised models that build on pre-trained models created using large unlabelled databases.
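For readers unfamiliar with contrastive learning, a minimal sketch of one widely used contrastive objective (InfoNCE) follows. The abstract does not state which contrastive loss will be used, so this choice is an assumption, and the function names `cosine` and `info_nce`, the temperature value, and the toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: positives[i] is the matching view of anchors[i];
    all other positives[j] in the batch act as negatives for anchors[i]."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / n


# Toy embeddings (made up): two complexes, two views each.
views_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(views_a, [[1.0, 0.0], [0.0, 1.0]])  # matching pairs agree
swapped = info_nce(views_a, [[0.0, 1.0], [1.0, 0.0]])  # matching pairs disagree
```

The loss is small when each embedding is most similar to its own second view and large otherwise, which is the property that lets such an objective be trained on unlabelled structures.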
This interdisciplinary bioinformatics project cements links between industry and academia, spans multiple research areas within the EPSRC remit, and falls within the EPSRC's "Artificial intelligence technologies", "Biological informatics", "Chemical biology and biological chemistry", and "Computational and theoretical chemistry" research areas.

Student:

Alvaro Prat

Period of Study:

Sep 24 - Mar 28

Funder:

EPSRC

Project Status:

Active

Project Category:

Studentship

Project Reference:

2928912

Research Topic:

Unclassified

Organisations

People	ORCID iD
Alvaro Prat (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/W524311/1			30/09/2022	29/09/2028
2928912	Studentship	EP/W524311/1	30/09/2024	30/03/2028	Alvaro Prat
