Deep Learning Approaches to Improve the Efficiency of Drug Discovery

Lead Research Organisation: University of Oxford

Department Name: Statistics

Abstract

In this project, we will explore the drug discovery problem using modern statistical techniques and deep learning approaches. We will focus both on improving the decision-making of chemists during initial hit identification and hit-to-lead optimisation, and on developing the capabilities of automated decision-making in drug design.

There is currently limited literature applying machine learning to drug design, and research to date has largely focused on the analysis of 2D ligand structures and string representations of molecules. While these approaches have shown some success, they omit the structural and 3D information that is essential to protein-ligand interactions. Where machine learning has been applied to structure-based drug discovery, it has typically relied on pre-determined descriptors rather than learning features in an end-to-end fashion.

So far in this project, we have developed an approach based on convolutional neural networks (CNNs) that substantially improved performance on popular virtual screening benchmarks. Our method treated virtual screening as a computer vision problem and used a minimally featurised input format, forcing the model to learn the features relevant to binding. Our analysis also highlighted that more data is required to fully exploit the power of CNNs in this setting, so we are currently curating an expanded dataset from publicly available databases.
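
To illustrate what treating virtual screening as a computer vision problem with a minimally featurised input can look like, the sketch below maps protein and ligand atoms onto a 3D grid with one channel per element, analogous to the colour channels of an image, which a 3D CNN could then take as input. It is a pure-Python illustration under stated assumptions: the element list, the 24-voxel box, the 1 Å resolution and the binary occupancy are all hypothetical choices, not the input format used in this project. Smoothed atomic densities are a common alternative to binary occupancy.

```python
import math
from typing import List, Tuple

# Hypothetical channel layout: one channel per element, with protein and ligand
# atoms kept in separate channel blocks (so 2 * len(ELEMENTS) channels in total).
ELEMENTS = ["C", "N", "O", "S"]

Atom = Tuple[str, float, float, float]  # (element, x, y, z), coordinates in angstroms
Grid = List[List[List[List[float]]]]    # indexed as grid[channel][i][j][k]


def voxelise(protein: List[Atom], ligand: List[Atom],
             centre: Tuple[float, float, float],
             size: int = 24, resolution: float = 1.0) -> Grid:
    """Binary occupancy grid of a cubic box of side size * resolution around `centre`."""
    n_channels = 2 * len(ELEMENTS)
    grid = [[[[0.0] * size for _ in range(size)] for _ in range(size)]
            for _ in range(n_channels)]
    half = size * resolution / 2.0
    for offset, atoms in ((0, protein), (len(ELEMENTS), ligand)):
        for element, x, y, z in atoms:
            if element not in ELEMENTS:
                continue  # other elements (e.g. hydrogens, halogens) are ignored in this sketch
            # Shift so the box corner sits at the origin, then bin each coordinate.
            i, j, k = (math.floor((c - c0 + half) / resolution)
                       for c, c0 in zip((x, y, z), centre))
            if 0 <= i < size and 0 <= j < size and 0 <= k < size:
                grid[offset + ELEMENTS.index(element)][i][j][k] = 1.0
    return grid


# Toy usage with made-up coordinates around the origin.
protein = [("C", 1.0, 0.0, 0.0), ("O", -2.0, 1.5, 0.3)]
ligand = [("N", 0.2, 0.2, 0.2)]
grid = voxelise(protein, ligand, centre=(0.0, 0.0, 0.0))
```

Each of the 2 × 4 channels is a 24 × 24 × 24 volume, so the result has the layout of a multi-channel 3D image; no chemistry beyond element identity and position is encoded, which is what leaves the model to learn binding-relevant features itself.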

We hope that further analysis and experiments will give us insight into key fundamental properties of protein interactions, such as binding modes and interaction types, while also validating the suitability of machine learning for areas beyond the current literature. In addition, we expect the project to highlight unusual, and possibly novel, features of protein-ligand interactions that other research groups could then study on a fundamental basis.

We also plan to evaluate our methods prospectively, using them to predict how untested molecules will interact with a given protein and then experimentally validating the predicted hits. One area we plan to explore is the use of machine learning to guide fragment-based approaches; in particular, we are currently designing a system to suggest elaborations of fragment hits in a principled way.

There is considerable publicly available data with which to train prediction algorithms and generative models. However, one challenge in applying machine learning is that, while these datasets are large overall, the data available for any single protein is much more limited. Successful methods will therefore need either to train efficiently on small datasets or to exploit data from interactions that do not involve the target protein. This appears feasible but is not without complications, so we aim to develop novel machine learning techniques to address these challenges.
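
One generic way to use data from interactions that do not involve the target protein is pretraining followed by fine-tuning: fit a model on pooled data from many proteins, then continue training from those weights on the few examples available for the target. The minimal pure-Python sketch below shows the two stages with a logistic-regression classifier. The data, the two toy features (hypothetical stand-ins for a learned representation) and the hyperparameters are all invented for illustration; this is not the model developed in the project.

```python
import math
from typing import List, Sequence, Tuple

# (feature vector, label) with label 1 = active, 0 = inactive.
Example = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(w: List[float], x: List[float]) -> float:
    # The last weight is the bias; zip stops at len(x), so it is excluded from the dot product.
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def log_loss(data: Sequence[Example], w: List[float]) -> float:
    """Mean logistic loss, computed with a numerically stable softplus."""
    total = 0.0
    for x, y in data:
        t = -logit(w, x) if y == 1 else logit(w, x)
        total += max(t, 0.0) + math.log1p(math.exp(-abs(t)))
    return total / len(data)


def train(data: Sequence[Example], init: List[float], lr: float, epochs: int) -> List[float]:
    """Full-batch gradient descent on the mean logistic loss, starting from `init`."""
    w = list(init)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sigmoid(logit(w, x)) - y  # derivative of the loss with respect to the logit
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, grad)]
    return w


# Hypothetical pooled data from many "other" proteins: a 5 x 5 grid labelled by a simple rule.
pooled = [([a / 4, b / 4], int(a + b > 4)) for a in range(5) for b in range(5)]
# A handful of hypothetical measurements for the target protein, following a shifted rule.
target = [([0.9, 0.8], 1), ([1.0, 0.7], 1), ([0.6, 0.6], 0), ([0.1, 0.2], 0)]

pretrained = train(pooled, [0.0, 0.0, 0.0], lr=0.5, epochs=500)  # stage 1: pretrain on pooled data
finetuned = train(target, pretrained, lr=0.5, epochs=50)         # stage 2: fine-tune on the target
```

With only four target examples, the fine-tuning stage starts from the decision boundary learned on the pooled data and adjusts it, rather than estimating one from scratch; with a deep network in place of the linear model, the same two-stage pattern applies.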

Student: Fergus Imrie

Period of Study: Oct 2017 - Jul 2021

Funder: EPSRC

Project Status: Closed

Project Category: Studentship

Project Reference: 2105209

Research Topic: Unclassified

Organisations

People	ORCID iD
Charlotte Deane (Primary Supervisor)	http://orcid.org/0000-0003-1388-2252
Fergus Imrie (Student)

Publications

Imrie F (2018) Protein Family-Specific Models Using Deep Neural Networks and Transfer Learning Improve Virtual Screening and Highlight the Need for More Data. in Journal of Chemical Information and Modeling

Imrie F (2019) Deep Generative Models for 3D Compound Design

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/N509711/1			01/10/2016	30/09/2021
2105209	Studentship	EP/N509711/1	01/10/2017	14/07/2021	Fergus Imrie
EP/R513295/1			01/10/2018	30/09/2023
2105209	Studentship	EP/R513295/1	01/10/2017	14/07/2021	Fergus Imrie

Collaboration


Description	Exscientia ICASE award
Organisation	Exscientia Ltd
Country	United Kingdom
Sector	Private
PI Contribution	Intellectual input and conducting research.
Collaborator Contribution	Intellectual input on research.
Impact	Publication (preprint) - http://dx.doi.org/10.1101/830497
Start Year	2018
