GOLD and physics-based docking with machine learning enhancements

Lead Research Organisation: UNIVERSITY OF OXFORD
Department Name: Sustain Approach to Biomedical Sci CDT

Abstract

Molecular docking is a computational technique for predicting whether and how small molecules bind to target proteins, which helps to explain their interaction mechanisms. A key application of docking is to narrow a vast pool of candidate molecules down to those predicted to bind the target protein, and to estimate their binding affinity. This allows promising hit molecules and drug candidates to be identified in silico, accelerating drug development by prioritising the most likely candidates for experimental testing at an early stage and saving both time and resources.
Traditional physics-based docking tools use search algorithms, such as genetic algorithms, to explore the conformational space of the ligand, and scoring functions based on physical principles to predict binding modes and estimate the strength of binding. For example, the docking software GOLD uses a genetic algorithm together with empirical or statistical physics-based scoring functions. Recently, machine learning-based docking tools have emerged, which are trained on large datasets of protein-ligand complexes to predict binding modes. For example, DiffDock uses a generative diffusion model: a forward diffusion process progressively adds noise to the ligand pose, and the model learns to reverse this process, iteratively refining a noisy pose into a predicted binding conformation. While these methods often appear to perform well in situations similar to their training domain, further investigation has shown that they do not generalise well beyond their training data and that they generate physically implausible poses significantly more often than their physics-based counterparts.
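As a rough illustration of the forward (noising) half of a diffusion model, the sketch below applies a standard variance-preserving noising step to a toy ligand. It is a simplification under stated assumptions and not DiffDock's implementation: noise is added directly to Cartesian coordinates, and the names forward_noise, rmsd and alpha_bar, as well as the coordinates themselves, are hypothetical.

```python
import math
import random


def forward_noise(coords, alpha_bar, rng):
    """Variance-preserving noising: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps.

    alpha_bar close to 1 keeps the pose nearly intact; alpha_bar close to 0
    replaces it almost entirely with Gaussian noise.
    """
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [
        tuple(signal * c + noise * rng.gauss(0.0, 1.0) for c in atom)
        for atom in coords
    ]


def rmsd(a, b):
    """Root-mean-square deviation between two poses with matching atom order."""
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))


# Hypothetical three-atom "ligand" (coordinates in angstroms, illustration only).
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
rng = random.Random(0)

unchanged = forward_noise(ligand, alpha_bar=1.0, rng=rng)        # no noise at all
slightly_noisy = forward_noise(ligand, alpha_bar=0.99, rng=rng)  # early diffusion step
heavily_noisy = forward_noise(ligand, alpha_bar=0.01, rng=rng)   # late diffusion step

print(rmsd(ligand, slightly_noisy), rmsd(ligand, heavily_noisy))
```

A trained model would learn the reverse direction, mapping noisy poses back towards plausible binding conformations; that learned part is deliberately omitted here.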
This project aims to integrate physics-based principles into machine learning models to enhance molecular docking predictions, thus creating generalisable tools which generate physically plausible poses. For example, one approach could be to use conditional diffusion to incorporate physics-based constraints, such as interaction energies, existing terms from molecular force fields, or even known binding site features. These constraints could help ensure poses are physically plausible and the model generalises beyond the training data.
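One way to picture a physics-based constraint acting on a predicted pose is a gradient step on a simple steric-clash penalty, sketched below. This is only an illustration under assumptions: it is gradient-based guidance on a toy energy term, not conditional diffusion training and not the project's method, and the function names, the 3.0 angstrom cutoff and the coordinates are all hypothetical.

```python
import math


def clash_energy(ligand, protein, min_dist=3.0):
    """Sum of (min_dist - d)^2 over ligand-protein atom pairs closer than min_dist."""
    energy = 0.0
    for la in ligand:
        for pa in protein:
            d = math.dist(la, pa)
            if d < min_dist:
                energy += (min_dist - d) ** 2
    return energy


def clash_gradient(ligand, protein, min_dist=3.0):
    """Gradient of clash_energy with respect to each ligand atom position."""
    grads = []
    for la in ligand:
        g = [0.0, 0.0, 0.0]
        for pa in protein:
            d = math.dist(la, pa)
            if 0.0 < d < min_dist:
                # d/dx (min_dist - d)^2 = -2 * (min_dist - d) * (x - p) / d
                coeff = -2.0 * (min_dist - d) / d
                for k in range(3):
                    g[k] += coeff * (la[k] - pa[k])
        grads.append(tuple(g))
    return grads


def guided_step(ligand, protein, step_size=0.1, min_dist=3.0):
    """Move each ligand atom a small step down the clash-energy gradient."""
    grads = clash_gradient(ligand, protein, min_dist)
    return [
        tuple(c - step_size * g for c, g in zip(atom, grad))
        for atom, grad in zip(ligand, grads)
    ]


# Hypothetical toy system: one protein atom, one clashing and one distant ligand atom.
protein = [(0.0, 0.0, 0.0)]
pose = [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
initial_energy = clash_energy(pose, protein)

for _ in range(20):
    pose = guided_step(pose, protein)

final_energy = clash_energy(pose, protein)
print(initial_energy, final_energy)
```

In a hybrid model, a term of this kind could in principle be combined with the learned denoising update at each reverse step, so that poses are pushed away from clashes while still following the learned distribution.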
The project will involve benchmarking current docking tools, developing novel machine learning architectures, and optimising hybrid physics-machine learning models to improve binding mode prediction, contributing to faster, more accurate drug discovery pipelines. This research falls under the EPSRC research areas of Artificial Intelligence Technologies, Biological Informatics, Chemical Biology and Biological Chemistry, Mathematical Biology, and Computational and Theoretical Chemistry.

Publications

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S024093/1                                   30/09/2019  30/03/2028
2882308            Studentship   EP/S024093/1  30/09/2023  29/09/2027  James Broster