Deep Learning Approaches to Improve the Efficiency of Drug Discovery

Lead Research Organisation: University of Oxford
Department Name: Statistics


In this project, we will explore the drug discovery problem using modern statistical techniques and deep learning approaches. Focus will be placed on both improving the decision making of chemists in initial hit identification and hit-to-lead optimisation, and on developing the capabilities of automated decision making in drug design.

There is currently limited literature applying machine learning to drug design, and research to date has largely focused on the analysis of 2D ligand structures and string representations of molecules. While these approaches have shown some success, they omit crucial structural and 3D information that is essential to protein-ligand interactions. Where machine learning has been applied to structure-based drug discovery, it has typically relied on pre-determined descriptors rather than learning features in an end-to-end fashion.

So far in this project, we have developed a convolutional neural network (CNN) approach that achieved substantial improvements on popular virtual screening benchmarks. Our method treated virtual screening as a computer vision problem and used a minimally featurised input format, so the model was forced to learn the features relevant for binding itself. Our analysis highlighted that more data is required to fully exploit the power of CNNs in this setting, and we are therefore curating an expanded dataset from publicly available databases.
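To make "treating virtual screening as a computer vision problem" more concrete: grid-based 3D CNN methods commonly voxelise a protein-ligand complex into a multi-channel 3D grid, much as an image is a 2D grid with colour channels, and leave it to the network to learn which spatial patterns matter for binding. The sketch below is illustrative only and is not the project's published featurisation. The element set, box size, resolution, binary occupancy and the function name `voxelise` are all assumptions chosen for readability.

```python
import math

# Hypothetical channel layout (an assumption, not the project's published scheme):
# one occupancy channel per element type, kept separate for protein and ligand atoms.
ELEMENTS = ("C", "N", "O", "S")


def voxelise(atoms, center, box_size=24.0, resolution=1.0):
    """Map atoms onto a dense grid indexed as grid[channel][x][y][z].

    `atoms` is a list of (x, y, z, element, is_ligand) tuples. Each atom sets the
    voxel containing it to 1.0 in its channel; atoms whose element is not in
    ELEMENTS, or which fall outside the box, are skipped.
    """
    n = int(round(box_size / resolution))  # voxels per side
    n_channels = 2 * len(ELEMENTS)  # protein channels first, then ligand channels
    grid = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n_channels)]
    origin = [c - box_size / 2.0 for c in center]  # corner of the box
    for x, y, z, element, is_ligand in atoms:
        if element not in ELEMENTS:
            continue
        idx = [math.floor((p - o) / resolution) for p, o in zip((x, y, z), origin)]
        if any(i < 0 or i >= n for i in idx):
            continue  # atom lies outside the box
        channel = ELEMENTS.index(element) + (len(ELEMENTS) if is_ligand else 0)
        i, j, k = idx
        grid[channel][i][j][k] = 1.0
    return grid


# Usage: a ligand carbon, a protein oxygen, and a protein nitrogen far outside the box.
atoms = [
    (0.2, 0.3, 0.4, "C", True),
    (1.5, -0.5, 0.0, "O", False),
    (50.0, 0.0, 0.0, "N", False),
]
grid = voxelise(atoms, center=(0.0, 0.0, 0.0))
```

A 3D CNN would then consume this 8-channel, 24×24×24 tensor in the same way an image model consumes colour channels. In practice one would likely use an array library and smoother per-atom densities (for example Gaussians) rather than binary occupancy; those choices are likewise assumptions here.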

We hope that further analysis and experiments will allow us to glean insights into key fundamental properties of protein interactions, such as binding modes and interaction types, while also validating the suitability of machine learning approaches in areas beyond the current literature. In addition, we expect the project to highlight unusual, and possibly novel, features of protein-ligand interactions that could then be studied on a fundamental basis by other research groups.

We also plan to evaluate our methods prospectively, using them to predict how untested molecules will interact with a given protein and then experimentally validating the predicted hits. One area we plan to explore is the use of machine learning to guide fragment-based approaches; in particular, we are currently designing a system to suggest elaborations of fragment hits in a principled way.

There is considerable publicly available data with which to train predictive algorithms and generative models. However, one challenge in applying machine learning is that, while the datasets are large overall, the data available for any given protein is much more limited. Successful methods will therefore need either to train efficiently on small datasets or to utilise data from interactions involving proteins other than the target. This appears feasible but is not without complications, and we aim to develop novel machine learning techniques to address these challenges.



Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/N509711/1      |              |              | 30/09/2016 | 29/09/2021 |
2105209           | Studentship  | EP/N509711/1 | 30/09/2017 | 13/07/2021 | Fergus Imrie
EP/R513295/1      |              |              | 30/09/2018 | 29/09/2023 |
2105209           | Studentship  | EP/R513295/1 | 30/09/2017 | 13/07/2021 | Fergus Imrie
Description: Exscientia ICASE award
Organisation: Exscientia Ltd
Country: United Kingdom
Sector: Private
PI Contribution: Intellectual input and conducting research.
Collaborator Contribution: Intellectual input on research.
Impact: Publication (preprint)
Start Year: 2018