Deep Learning Approaches to Improve the Efficiency of Drug Discovery
Lead Research Organisation: University of Oxford
Department Name: Statistics
Abstract
In this project, we will explore the drug discovery problem using modern statistical techniques and deep learning approaches. Focus will be placed both on improving the decision-making of chemists during initial hit identification and hit-to-lead optimisation, and on developing the capabilities of automated decision-making in drug design.
There is currently limited literature applying machine learning to drug design, and research to date has largely focused on 2D ligand representations and string representations of molecules (such as SMILES). While these approaches have shown some success, they omit the structural and 3D information that is crucial to protein-ligand interactions. Where machine learning has been applied to structure-based drug discovery, it has typically relied on pre-determined descriptors rather than learning features in an end-to-end fashion.
So far in this project, we have developed an approach based on convolutional neural networks (CNNs) that achieved a substantial improvement on popular virtual screening benchmarks. Our method treated virtual screening as a computer vision problem and used a minimally featurised input format, which forced the model to learn the features relevant for binding. Our analysis highlighted that more data is required to fully exploit the power of CNNs in this setting, so we are currently curating an expanded dataset from publicly available databases.
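To make the "computer vision" framing concrete, the sketch below turns a protein-ligand complex into a 3D grid of the kind a 3D CNN could consume. It is a minimal sketch under stated assumptions: one channel per element (C, N, O, S) for protein atoms and a separate set for ligand atoms, a simple atom count per voxel, and a cubic box centred on the binding site. The element list, box size, resolution and the `voxelise` helper are hypothetical choices for illustration, not the project's actual featurisation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical minimal featurisation: element identity only, with separate
# channels for protein and ligand atoms. Elements outside this list are ignored.
ELEMENTS = ("C", "N", "O", "S")

Atom = Tuple[str, float, float, float]  # (element, x, y, z) in angstroms
Grid = List[List[List[List[float]]]]    # indexed as grid[channel][i][j][k]


def voxelise(protein: Sequence[Atom], ligand: Sequence[Atom],
             centre: Tuple[float, float, float],
             box_size: float = 24.0, resolution: float = 1.0) -> Grid:
    """Count atoms of each (molecule, element) channel in each voxel of a cubic box."""
    n = int(round(box_size / resolution))  # voxels per side
    n_channels = 2 * len(ELEMENTS)         # protein channels, then ligand channels
    grid = [[[[0.0] * n for _ in range(n)] for _ in range(n)]
            for _ in range(n_channels)]
    half = box_size / 2.0
    for offset, atoms in ((0, protein), (len(ELEMENTS), ligand)):
        for element, x, y, z in atoms:
            if element not in ELEMENTS:
                continue
            # Shift coordinates so the box corner is at the origin, then bin.
            i, j, k = (math.floor((coord - c + half) / resolution)
                       for coord, c in zip((x, y, z), centre))
            if 0 <= i < n and 0 <= j < n and 0 <= k < n:  # drop atoms outside the box
                grid[offset + ELEMENTS.index(element)][i][j][k] += 1.0
    return grid
```

Keeping only element identity and protein/ligand membership mirrors the idea of a minimally featurised input: any higher-level interaction features have to be learnt by the network from the spatial arrangement of atoms.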
We hope that further analysis and experiments will yield insights into fundamental properties of protein interactions, such as binding modes and interaction types, while also validating the suitability of machine learning for areas beyond the current literature. In addition, we expect the project to highlight unusual, and possibly novel, features of protein-ligand interactions that other research groups could then study on a fundamental basis.
We also plan to evaluate our methods prospectively, using them to predict how untested molecules will interact with a given protein and then experimentally validating the predicted hits. One area we plan to explore is the use of machine learning to guide fragment-based approaches; in particular, we are currently designing a system to suggest elaborations of fragment hits in a principled way.
A considerable amount of publicly available data exists with which to train prediction algorithms and generative models. However, while these datasets are large overall, far less data is available for any given protein. Successful methods will therefore need either to train efficiently on small datasets or to exploit data from interactions that do not involve the target protein. This appears feasible but is not without complications, and we aim to develop novel machine learning techniques to address these challenges.
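One common way to exploit data from other proteins is warm-start fine-tuning, a simple form of transfer learning: pre-train a model on the pooled data, then continue training briefly on the small target-specific set. The sketch below illustrates that idea with a plain logistic-regression model in pure Python, standing in for a neural network to keep it short and dependency-free. The model, the toy feature vectors and the hyperparameters are illustrative assumptions, not the project's method.

```python
import math
from typing import List, Optional, Sequence, Tuple

# (feature vector, label) where label 1 = active, 0 = inactive
Example = Tuple[Sequence[float], int]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability of activity; the last weight is the bias term."""
    # zip stops at len(x), so the bias w[-1] is excluded from the dot product.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def train_logreg(data: Sequence[Example],
                 weights: Optional[Sequence[float]] = None,
                 epochs: int = 200, lr: float = 0.1) -> List[float]:
    """Full-batch gradient descent on the logistic loss.

    If `weights` is given, training starts from them (warm start);
    otherwise it starts from zeros.
    """
    dim = len(data[0][0])
    w = list(weights) if weights is not None else [0.0] * (dim + 1)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in data:
            err = predict(w, x) - y  # gradient of the log loss w.r.t. the logit
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, grad)]
    return w


# Hypothetical usage: pre-train on pooled data from many proteins,
# then fine-tune briefly on the small set for the target protein.
pooled = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 10
target = [([1.0, 0.2], 1), ([0.1, 1.0], 0)]
pretrained = train_logreg(pooled)
finetuned = train_logreg(target, weights=pretrained, epochs=20)
```

The short fine-tuning run lets the few target examples adjust a model that already encodes trends from the larger pooled dataset, rather than learning from the target data alone.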
Publications
Imrie F (2018). Protein Family-Specific Models Using Deep Neural Networks and Transfer Learning Improve Virtual Screening and Highlight the Need for More Data. Journal of Chemical Information and Modeling.
Imrie F (2019). Deep Generative Models for 3D Compound Design.
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/N509711/1 | | | 30/09/2016 | 29/09/2021 | |
2105209 | Studentship | EP/N509711/1 | 30/09/2017 | 13/07/2021 | Fergus Imrie |
EP/R513295/1 | | | 30/09/2018 | 29/09/2023 | |
2105209 | Studentship | EP/R513295/1 | 30/09/2017 | 13/07/2021 | Fergus Imrie |
Description | Exscientia ICASE award |
Organisation | Exscientia Ltd |
Country | United Kingdom |
Sector | Private |
PI Contribution | Intellectual input and conducting research. |
Collaborator Contribution | Intellectual input on research. |
Impact | Publication (preprint) - http://dx.doi.org/10.1101/830497 |
Start Year | 2018 |