Novel Data Science and Mathematical Approaches to Drug Discovery

Lead Research Organisation: Newcastle University
Department Name: Sch of Natural & Environmental Sciences

Abstract

The process of bringing a new drug to market is lengthy and expensive, with estimated costs of around $2.6 billion and an average timescale of 10-15 years. The development of novel drugs is currently limited by high failure rates during clinical trials, unknown target structures and the use of resource-intensive high-throughput screening (HTS) to identify active molecules. Computer-aided drug discovery (CADD) provides a "virtual shortcut" for identifying drug molecules and for predicting their potential side effects and biological activity, reducing the time and financial resources required compared to HTS. One approach used in CADD is to take a set of ligands known to interact with a target as a reference, and to compare their shape to that of other molecules. This method relies on the Similar Property Principle: structurally similar molecules will display similar properties, and are therefore likely to interact with the same protein target. To enable this comparison, the shape of the ligand can be described in many ways, including descriptions based on the distances between atoms, on the volume overlap of Gaussian spheres and on the molecular surface. However, existing methods in this field present challenges in selecting the correct query molecule, in visualising the structure and, in some cases, in the speed at which structures can be compared.

This project aims to address these issues by developing a novel description of ligand shape based on the molecular surface. This will be achieved through a mathematical approach to describing the geometry of a surface, in which the quantised Kähler potential (a concept drawn from the fields of differential geometry and superstring theory) is computed for a particular molecule. The potential can be tailored to balance accuracy with speed. Computing it produces a set of coefficients for each molecule, which can be used to compare two structures through a similarity metric (for example the Euclidean distance, the Manhattan distance or the Tanimoto coefficient, all of which are commonly used in shape similarity approaches). A Python implementation of this shape descriptor will be produced and then tested on protein targets with well-defined sets of active and inactive molecules (e.g. G-protein-coupled receptors and EGFR kinase inhibitors). The new method will then be applied to the virtual screening of databases and used as input to machine learning models. Benchmarking studies will also be completed to compare the new descriptor with the existing methods outlined above. If successful, the ultimate aim of the project is to apply the new descriptor to drug discovery efforts at Newcastle University.
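
To make the comparison step concrete, the following is a minimal sketch of how two molecules might be compared once each has been reduced to a set of coefficients. It is an illustration only and not the project's implementation: it assumes the coefficients can be flattened into real-valued NumPy vectors of equal length, it uses the continuous (real-valued) form of the Tanimoto coefficient, and the function names and the `rank_by_similarity` helper are hypothetical.

```python
import numpy as np


def euclidean_distance(a, b):
    """Straight-line distance between two coefficient vectors (0 = identical)."""
    return float(np.linalg.norm(a - b))


def manhattan_distance(a, b):
    """Sum of absolute coefficient differences (0 = identical)."""
    return float(np.sum(np.abs(a - b)))


def tanimoto_coefficient(a, b):
    """Continuous Tanimoto similarity: a.b / (|a|^2 + |b|^2 - a.b).

    Equals 1.0 for identical vectors. Assumes real-valued inputs.
    """
    dot = float(np.dot(a, b))
    denom = float(np.dot(a, a) + np.dot(b, b) - dot)
    # The denominator is zero only when both vectors are all zeros;
    # treat that degenerate case as identical.
    return dot / denom if denom != 0.0 else 1.0


def rank_by_similarity(query, library):
    """Hypothetical helper: sort library molecule names from most to
    least similar to the query, using Euclidean distance."""
    return sorted(library, key=lambda name: euclidean_distance(query, library[name]))


if __name__ == "__main__":
    # Made-up coefficient vectors, for illustration only.
    query = np.array([1.0, 2.0, 3.0])
    library = {
        "mol_a": np.array([1.0, 2.0, 3.0]),
        "mol_b": np.array([1.0, 2.0, 5.0]),
        "mol_c": np.array([4.0, 0.0, 0.0]),
    }
    print(rank_by_similarity(query, library))
```

In a virtual screening setting, the query would be the descriptor of a known active ligand and the library would hold the descriptors of the database molecules. Note that the two distances decrease as molecules become more alike, whereas the Tanimoto coefficient increases, so rankings must be sorted accordingly.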

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/R51309X/1                                   01/10/2018  30/09/2023
2281156            Studentship   EP/R51309X/1  01/10/2019  31/03/2023  Rachael Pirie