Data-Driven Lead Optimisation for Drug Discovery
Lead Research Organisation: University of Sheffield
Department Name: Information School
Abstract
Research questions:
This research provides a lead optimisation tool that allows chemists to explore the chemical space that has already been worked in for a given target. The tool should then be able to suggest molecules, or areas of chemical space, that should be explored, together with the reasoning behind each suggestion. This should give the chemist a better understanding of why a particular molecule has been selected for exploration against a particular target. The main molecular representation used by the tool is the reduced graph, a graph representation in which a chemical structure has been reduced to its key interacting nodes. The visualisation tool should be an interactive interface that gives chemists an overall view of the chemical space whilst still allowing them to view the complex details of each molecule.
Original methodology:
So far, the chemical structures have been reduced to reduced graphs. The reduced graphs are customisable: the user can set different parameters and different definitions depending on what they are looking for and which features they consider key. The next step was to find the maximum common substructure (MCS), in both its connected and disconnected forms, using the Python module RDKit. The MCSs are needed for clustering, which is based on similarity or dissimilarity scores calculated by using the MCS in the Tanimoto coefficient. Several clustering techniques are then applied, and cluster validity techniques are used to establish the best clusters for the dataset. A core reduced graph is found for each cluster to aid the visualisation. A visualisation is then produced that summarises this information, that is, the chemical space that has been worked in. From here, a method needs to be established that enables a reduced graph to be converted back into a chemical graph. This chemical graph will be run through an activity model to predict how well the molecule should perform; if the predicted activity is adequate, the molecule will be put forward as a suggestion. The visualisation will then be used to back up this suggestion.
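The similarity and clustering steps described above can be illustrated with a minimal Python sketch. It is not the project's implementation. It assumes that graph size is measured as a node count, that the MCS sizes have already been computed elsewhere (for example by an MCS search in RDKit), and it uses single-linkage grouping at a fixed threshold as a simple stand-in for the unnamed clustering techniques. The function names, the node counts and the 0.6 threshold are all hypothetical.

```python
from itertools import combinations


def mcs_tanimoto(size_a: int, size_b: int, size_mcs: int) -> float:
    """Tanimoto coefficient from graph sizes: |MCS| / (|A| + |B| - |MCS|)."""
    if size_mcs > min(size_a, size_b):
        raise ValueError("MCS cannot be larger than either graph")
    denominator = size_a + size_b - size_mcs
    return size_mcs / denominator if denominator else 0.0


def single_linkage_clusters(n_items, similarity, threshold):
    """Group items linked, directly or through a chain, by similarity >= threshold."""
    parent = list(range(n_items))  # union-find forest, one tree per cluster

    def find(i):
        # Walk to the root of i's tree, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n_items), 2):
        if similarity[(i, j)] >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(n_items):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical node counts for four reduced graphs and their pairwise MCS sizes.
graph_sizes = [5, 6, 5, 4]
mcs_sizes = {(0, 1): 5, (0, 2): 2, (0, 3): 1, (1, 2): 2, (1, 3): 1, (2, 3): 4}

similarity = {
    (i, j): mcs_tanimoto(graph_sizes[i], graph_sizes[j], size)
    for (i, j), size in mcs_sizes.items()
}
clusters = single_linkage_clusters(len(graph_sizes), similarity, threshold=0.6)
print(clusters)
```

A dissimilarity score, where a clustering method needs one, can be taken as one minus the similarity. The choice of threshold and of clustering method would in practice be guided by the cluster validity techniques mentioned above.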
People | ORCID iD |
Valerie Gillet (Primary Supervisor) | |
Jessica Stacey (Student) | |
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/N509735/1 | | | 01/10/2016 | 30/09/2021 | |
1960258 | Studentship | EP/N509735/1 | 01/10/2017 | 30/09/2021 | Jessica Stacey |
Description | Lecture at Spring 2021 American Chemical Society |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Gave a lecture on 'Using Reduced Graphs to Cultivate Lead Optimisation Series' to an audience from industry and academia. The new visualisation created during the PhD was presented alongside two new scores, which were developed to measure the extent of exploration and exploitation of chemical space within a lead optimisation series and the impact that a new molecule could have. There were enquiries about publication of the method and about potential access to the source code. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.acscinf.org/meetingsevents/meeting-archive/acs-spring-meeting-2021 |
Description | Lecture at the 16th German Conference on Chemoinformatics |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Gave a lecture on 'Using Reduced Graphs to Aid Lead Optimisation Projects' to an audience from industry and academia. A new visualisation was presented, which prompted questions and discussion about the method, how it could be helpful for chemists, and potential improvements. There were enquiries about obtaining the source code, as the audience could see the potential of the method. The hope is therefore to publish the method soon. |
Year(s) Of Engagement Activity | 2020 |
URL | https://veranstaltungen.gdch.de/tms/frontend/index.cfm?l=10731&sp_id=1&selSiteID=sciprog_v2 |
Description | Poster Presentation at Women in Chemistry Conference |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | Presented a poster on 'Enhancing the Lead Optimisation Process Using Data Mining Approaches and Reduced Graphs' to an audience of chemists, not all of whom work in the same area as I do. This extended the reach of the work undertaken in this PhD to a wider audience. |
Year(s) Of Engagement Activity | 2021 |