Generalizable Diffusion-based Protein-Ligand Binding Mode and Affinity Prediction

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

Deep learning has achieved dramatic success in a number of domains, but our recent work on PoseBusters revealed significant shortcomings in its ability to predict physically valid three-dimensional protein-ligand binding modes (Buttenschoen et al., 2024). We propose to build on our recent work on predicting binding affinity (Meli et al., 2021; Dablander et al., 2023), estimating uncertainty (Zaidi et al., 2021), and using function-space methods to address covariate shift in drug discovery (Klarner et al., 2023). Together, these will be used to develop generalizable, multi-task models that simultaneously predict binding modes and binding affinity, each accompanied by an estimate of uncertainty.
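One common way to attach an uncertainty estimate to an affinity prediction, in the spirit of the ensemble-based uncertainty work cited above, is to train several models and combine their outputs. The sketch below assumes each ensemble member predicts a Gaussian (mean and variance) over affinity; the function name and the numbers are hypothetical, and this is not presented as the project's own method.

```python
from statistics import fmean
from typing import List, Tuple


def ensemble_predict(members: List[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Summarise M ensemble members, each predicting a Gaussian (mean, variance)
    over binding affinity, as one mean plus a variance split into two parts.

    The ensemble is treated as a uniform mixture of the members' Gaussians;
    its total variance is the average predicted variance (aleatoric part)
    plus the variance of the members' means (epistemic part).
    """
    means = [m for m, _ in members]
    mu = fmean(means)
    aleatoric = fmean(v for _, v in members)          # average predicted noise
    epistemic = fmean((m - mu) ** 2 for m in means)   # disagreement between members
    return mu, aleatoric + epistemic, aleatoric, epistemic


# Hypothetical predictions (pK units) from three independently trained members.
mu, total, aleatoric, epistemic = ensemble_predict([(7.0, 0.2), (7.4, 0.3), (6.9, 0.1)])
print(f"affinity = {mu:.2f} +/- {total ** 0.5:.2f} "
      f"(aleatoric {aleatoric:.3f}, epistemic {epistemic:.3f})")
```

Separating the two terms is useful under covariate shift: epistemic variance tends to grow on inputs unlike the training data, whereas aleatoric variance reflects noise the models expect in the labels themselves.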
We will investigate incorporating geometric and physical constraints using constrained flow matching. Generative models are trained to match, and then draw samples from, a data distribution, but in protein-ligand binding mode prediction that distribution is typically represented by a single point: the pose observed in a co-crystallized complex. We propose to develop more robust and generalizable methods by expanding these distributions with synthetic data, investigating samples drawn from (i) molecular dynamics trajectories and (ii) multiple binding modes produced by classical protein-ligand docking. We will also build on our recent results showing that data-guided regularization produces more accurate deep neural network models than standard regularization methods.
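To make the idea of an expanded target distribution concrete, here is a minimal sketch of how training examples could be built for a common conditional flow-matching setup with a straight-line path from Gaussian noise to a ligand pose. It assumes each complex comes with a list of poses (crystal pose plus, hypothetically, MD frames or docked poses) and samples the target pose from that list rather than always using the single crystal pose. All names are illustrative; the model itself, the protein conditioning, and the geometric and physical constraints are not shown.

```python
import random
from typing import List, Tuple

Pose = List[Tuple[float, float, float]]  # ligand atom coordinates, one (x, y, z) per atom


def interpolate(x0: Pose, x1: Pose, t: float) -> Pose:
    """Point at time t on the straight-line path from noise x0 (t=0) to pose x1 (t=1)."""
    return [tuple((1 - t) * a + t * b for a, b in zip(p0, p1)) for p0, p1 in zip(x0, x1)]


def velocity_target(x0: Pose, x1: Pose) -> Pose:
    """Regression target for a linear path: the constant velocity x1 - x0."""
    return [tuple(b - a for a, b in zip(p0, p1)) for p0, p1 in zip(x0, x1)]


def sample_training_example(poses: List[Pose], rng: random.Random) -> Tuple[Pose, float, Pose]:
    """Draw one (x_t, t, target) training example for a single complex.

    `poses` holds every binding mode available for the complex, so the
    target is a small empirical distribution rather than a single point.
    """
    x1 = rng.choice(poses)                                           # target pose
    x0 = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in x1]  # noise, same shape
    t = rng.random()                                                 # time in [0, 1)
    return interpolate(x0, x1, t), t, velocity_target(x0, x1)


def flow_matching_loss(predicted: Pose, target: Pose) -> float:
    """Mean squared error between a model's predicted velocity and the target."""
    errors = [(p - q) ** 2 for pa, qa in zip(predicted, target) for p, q in zip(pa, qa)]
    return sum(errors) / len(errors)


# Toy usage: two hypothetical binding modes of a two-atom ligand.
rng = random.Random(0)
poses = [[(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
         [(0.1, 0.2, 0.0), (1.4, 0.3, 0.1)]]
x_t, t, target = sample_training_example(poses, rng)
# A trained network v_theta(x_t, t, protein) would be fit to `target`;
# here a zero prediction only shows the loss being computed.
print(flow_matching_loss([(0.0, 0.0, 0.0)] * 2, target))
```

With a single crystal pose, every example pulls the flow toward one point; sampling `x1` from several plausible poses instead teaches the model a distribution over binding modes.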
Supervised deep learning requires large amounts of expensive labelled data. By exploiting contrastive learning on 3D representations of protein-ligand complexes, we will instead pre-train self-supervised models on large unlabelled databases and then fine-tune them on labelled data.
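Contrastive pre-training typically embeds two views of the same complex (for example, two perturbed conformations or two augmented copies of its 3D structure) close together and views of different complexes far apart. Below is a minimal sketch of the widely used InfoNCE objective, assuming an encoder has already mapped each view to a fixed-length vector; the encoder, the augmentations, and the temperature value are assumptions, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors: List[Sequence[float]], positives: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """InfoNCE loss over a batch.

    anchors[i] and positives[i] embed two views of the same complex; every
    positives[j] with j != i acts as a negative for anchors[i].
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine_similarity(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with the matching view as the label
    return total / n


# Hypothetical 2-D embeddings of two views each of two complexes.
view_a = [(1.0, 0.0), (0.0, 1.0)]
view_b = [(0.9, 0.1), (0.1, 0.9)]
print(info_nce(view_a, view_b))        # small: matching views agree
print(info_nce(view_a, view_b[::-1]))  # large: pairs are mismatched
```

No labels appear anywhere in the loss, which is why it can be applied to large unlabelled structure databases before fine-tuning the encoder on labelled affinity or pose data.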
This interdisciplinary bioinformatics project strengthens links between industry and academia, spans multiple research areas within the EPSRC remit, and falls within the EPSRC's "Artificial intelligence technologies", "Biological informatics", "Chemical biology and biological chemistry" and "Computational and Theoretical Chemistry" research areas.

People

Alvaro Prat (Student)

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/W524311/1      |              |              | 30/09/2022 | 29/09/2028 |
2928912           | Studentship  | EP/W524311/1 | 30/09/2024 | 30/03/2028 | Alvaro Prat