Next generation atomistic modelling for medicinal chemistry and biology
Lead Research Organisation:
Newcastle University
Department Name: Sch of Natural & Environmental Sciences
Abstract
Nobel Laureate Richard Feynman in his Lectures on Physics famously remarked that "...everything that living things do can be understood in terms of the jigglings and wigglings of atoms". This deceptively simple statement highlights the difficulty that structural biologists, medicinal chemists and computational scientists are faced with when attempting to understand human health and disease. We are used to thinking about a static, isolated picture of objects at the atomic scale, but often it is the dynamics (the "jigglings and wigglings") of the system and its environmental interactions that determine the underlying science, such as the role of intrinsically disordered proteins in neurodegenerative diseases or the possible link between quantum entanglement and molecular vibrations in biological photosynthesis.
Twentieth-century science not only set the challenge of studying life at the level of the structure and dynamics of atoms, but also provided (in theory) the solution, through the laws of quantum mechanics and the famous Schrödinger equation. Quantum mechanics explains the fundamental behaviour of matter at the atomic scale, and smaller. It enables scientists to make predictions about materials that are inaccessible to experiment, such as the structure of solid hydrogen in a star's core. At a more everyday level, quantum mechanics is routinely used by researchers in the microelectronics and renewable energy industries to rapidly scan multitudes of hypothetical material compositions. In this way, the costly manufacturing of new materials need only begin once the desired properties have been predicted.
However, quantum mechanics does not directly enable scientists to understand the biomolecular origins of disease, or to design new medicines to combat it. The reason for this comes down to Feynman's statement. It is infeasible to solve the equations of quantum mechanics, even approximately, over the length and time scales needed to model all of the atomistic movements that must take place, for example, for a drug molecule to find its target. Instead, computational chemists use a much simplified computational model, known as a force field, to estimate the dynamics of atoms. The force field models the atoms as bonded together in a molecule by springs, and interacting with other atoms through electrostatic and van der Waals forces, which are much stronger than gravity at the atomic scale. The strengths of these interactions are modelled by thousands of adjustable parameters, which have been manually tuned to reproduce experimental data over a period of many decades. We are reaching a stagnation point: greater accuracy is urgently needed for computer-aided design of new medicines, but further parameter tuning delivers only small improvements.
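As a rough illustration of the force field picture described above, the sketch below evaluates the three kinds of term mentioned: a spring between bonded atoms, an electrostatic interaction between partial charges, and a van der Waals interaction. The harmonic, Coulomb and 12-6 Lennard-Jones forms are common textbook choices and are assumed here; the function names and example parameter values are hypothetical and are not taken from any force field developed in this Fellowship.

```python
# Minimal sketch of typical force field energy terms (assumed textbook forms).
# Units: distances in angstrom, charges in elementary charges, energies in kcal/mol.

COULOMB_CONSTANT = 332.0637  # kcal * angstrom / (mol * e^2)


def bond_energy(r, r0, k):
    """Harmonic 'spring' between two bonded atoms: 0.5 * k * (r - r0)^2."""
    return 0.5 * k * (r - r0) ** 2


def coulomb_energy(r, q_i, q_j):
    """Electrostatic interaction between two point charges separated by r."""
    return COULOMB_CONSTANT * q_i * q_j / r


def lennard_jones_energy(r, epsilon, sigma):
    """12-6 Lennard-Jones model of the van der Waals interaction."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)


def nonbonded_energy(r, q_i, q_j, epsilon, sigma):
    """Total non-bonded energy of one atom pair (electrostatics + van der Waals)."""
    return coulomb_energy(r, q_i, q_j) + lennard_jones_energy(r, epsilon, sigma)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to exercise the functions.
    print(bond_energy(r=1.10, r0=1.09, k=680.0))
    print(nonbonded_energy(r=3.5, q_i=-0.4, q_j=0.4, epsilon=0.1, sigma=3.4))
```

The adjustable parameters referred to in the text correspond to quantities such as `k`, `r0`, the partial charges, `epsilon` and `sigma`; a real biomolecular force field sums many such terms (plus angle and torsion terms) over every relevant set of atoms.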
My vision for this UKRI Future Leaders Fellowship is to build a multi-disciplinary team that will work together to close the accuracy gap between quantum mechanics and the approximate force fields used in biology and medicine. By working with international coding efforts, I will build the theory and software infrastructure required to dispense with these adjustable force field parameters, and instead derive them directly for the system under study, such as a protein implicated in disease. This will enable me to build more accurate computational models of the electrostatic and van der Waals interactions that determine the strength of binding of potential drugs to their targets. By crossing disciplinary boundaries to train in data science and machine learning, I will deploy the expertise that has been made famous by its applications in face and speech recognition to create a spectrum of tools for speeding up the assignment of parameters and improving the accuracy of force field design. Finally, by undertaking secondments in the pharmaceutical industry, I will ensure that the developed methods will be used for the cost-efficient design of the next generation of medicines.
Planned Impact
This Fellowship will close the accuracy gap between quantum mechanical modelling and the approximate classical force fields used at the forefront of biomolecular modelling. It provides a key underpinning technology for understanding the roles of biological molecules in human health and disease, and for the rational structure-based design of new medicines. As such, it is important to me that my Fellowship takes an open science, open data approach to knowledge generation. By effectively helping to democratise force field parameterisation, design and dissemination, I will improve the culture of inclusivity in the field of computational modelling.
1) Economic Impact. Analysis of research and development costs in the pharmaceutical industry places the price of each new drug that reaches the consumer at $1.8bn, of which $0.4bn is spent on improving binding affinity between the molecule and its therapeutic target whilst minimising off-target effects (hit-to-lead optimisation). This Fellowship will have the following commercial benefits in the pharmaceutical industry:
- Within 4 years, the technology developed in this Fellowship will allow medicinal chemistry researchers to accurately screen hundreds of potential drug candidates on the computer and only synthesise in the lab those predicted to strongly bind to their target, thus increasing the efficiency of hit-to-lead optimisation and reducing experimental workload. This will drastically reduce costs in the pharmaceutical industry.
- I will further develop computational methods to begin to address therapeutic targets that are traditionally considered "undruggable". Within 4 years, I will develop a general approach incorporating atomistic modelling with deep learning for knowledge-based design of peptide libraries against protein targets, which may have hidden allosteric binding pockets. My project partners have many such targets in their discovery pipeline in oncology and neurodegenerative disease areas.
- Many next-generation medicinal technologies are not currently amenable to computational study. Examples include the design of drugs against proteins with metals in their binding sites, and of molecules for light-induced drug delivery, photodynamic therapy and targeted nanomedicines. To address this issue, within 7 years, I will develop automated and accurate computational methods to model metals in biology and molecules in electronically excited states in complex environments, thus providing underpinning technologies for new industrial sectors.
2) Skills Generation
- This Fellowship will provide a source of post-doctoral researchers and research software engineers with combined expertise in data science, medicinal chemistry and scientific computing. These skills will be highly sought after as the UK pharmaceutical industry enhances the roles of molecular modelling and AI in drug discovery.
3) Wider Society
- In the longer term (7-10 years), new medicines will be designed with substantial computational input, ultimately resulting in the improved health and wellbeing of UK citizens. Wider society will benefit from an accelerated and larger pipeline of effective drugs. This will be particularly important as focus shifts to personalised medicines that will require accurate predictive molecular simulation to feed into multi-scale models that are able to determine links between single amino acid substitutions and disease.
- This Fellowship will inspire the next generation of researchers. School students in the North East will benefit from hands-on virtual reality demonstrations of the drug discovery process, and will gain an appreciation of local success stories through examples of medicines developed in Newcastle.
- Young people who may already be considering a career in research will benefit through our collaborations with the Journal of Sketching Science (with readership numbers of 100K+), which will produce visually appealing articles describing our research.
Publications
Bieniek M (2022) An open-source molecular builder and free energy preparation workflow. In Communications Chemistry
Boothroyd S (2023) Development and Benchmarking of Open Force Field 2.0.0: The Sage Small Molecule Force Field. In Journal of Chemical Theory and Computation
Clark F (2023) Comparison of Receptor-Ligand Restraint Schemes for Alchemical Absolute Binding Free Energy Calculations. In Journal of Chemical Theory and Computation
Hall S (2024) Riemannian geometry and molecular similarity I: spectrum of the Laplacian. In Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences
Horton JT (2023) A transferable double exponential potential for condensed phase simulations of small molecules. In Digital Discovery
Horton JT (2022) Open Force Field BespokeFit: Automating Bespoke Torsion Parametrization at Scale. In Journal of Chemical Information and Modeling
Jorge M (2023) What is the Optimal Dipole Moment for Nonpolarizable Models of Liquids? In Journal of Chemical Theory and Computation
Kovács DP (2021) Linear Atomic Cluster Expansion Force Fields for Organic Molecules: Beyond RMSE. In Journal of Chemical Theory and Computation
Nelson L (2021) Implementation of the QUBE Force Field in SOMD for High-Throughput Alchemical Free-Energy Calculations. In Journal of Chemical Information and Modeling
Description | In the first three years of my fellowship, my team has developed and released open software for modelling accurate bonded (e.g. OpenFF-BespokeFit) and non-bonded (e.g. QUBE, DE-FF) interactions in and between molecules. We have contributed to modern, robust, automated workflows for collecting and curating quantum mechanical data for model training (QCSubmit), and to a state-of-the-art molecular mechanics force field (OpenFF-Sage), which is already widely used both in academia and the pharmaceutical industry. All scientific advances have been rigorously benchmarked, not only on simple liquid properties, but also directly in protein-ligand binding affinity calculations. Encouraging accuracy gains have stemmed, in particular, from bespoke parameterisation of force field terms (BespokeFit) and advances in functional form (QUBE and DE-FF). |
Exploitation Route | The main software outcomes are available open source (https://github.com/qubekit/QUBEKit , https://github.com/cole-group/FEgrow , https://github.com/openforcefield/openff-bespokefit ), with accompanying tutorials and publications, and have seen uptake by industry. Follow-on collaborations with Kuano and ASAP Discovery will apply the methods to computer-aided design. Some online workshops demonstrating use are available, and an in-person workshop was run as part of the CCPBioSim Training Week. |
Sectors | Chemicals; Energy; Pharmaceuticals and Medical Biotechnology |
Description | The Open Force Field Initiative (https://openforcefield.org) is a network of researchers working together to advance the science and software infrastructure required to build the next generation of molecular mechanics force fields. I have joined OpenFF as a co-investigator, which has enabled me to use and contribute to high-quality, well-documented tools that accelerate community force field science, and has given me access to a wide range of industry partners. For example, our co-developed OpenFF-BespokeFit software package allows custom refits of torsion parameters to provide especially high accuracy on specific chemistries of interest. This tool is already being used by several pharmaceutical companies and has been implemented within the Cresset Flare software. It has also been disseminated via the CCPBioSim Training Week to users in academia and industry. Additionally, we have been awarded an Innovate UK KTP grant to deploy charge models for electrostatic potential comparisons as part of Kuano's computer-aided design workflows. |
First Year Of Impact | 2022 |
Sector | Pharmaceuticals and Medical Biotechnology |
Impact Types | Economic |
Description | Establishing the Accessible Computational Regimes for Biomolecular Simulations at Exascale |
Amount | £471,209 (GBP) |
Funding ID | EP/Y008693/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 05/2023 |
End | 11/2024 |
Description | International Partnership seed funding (to team members Chris Ringrose and Josh Horton) |
Amount | £3,821 (GBP) |
Organisation | Newcastle University |
Sector | Academic/University |
Country | United Kingdom |
Start | 06/2021 |
End | 09/2021 |
Description | MoSMed PhD studentships |
Amount | £60,000 (GBP) |
Organisation | Newcastle University |
Sector | Academic/University |
Country | United Kingdom |
Start | 01/2023 |
End | 12/2027 |
Description | North East Ultrafast Transient Absorption Spectroscopy Facility |
Amount | £902,433 (GBP) |
Funding ID | EP/W006340/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 12/2021 |
End | 11/2023 |
Description | Supporting the OpenMM Community-led Development of Next-Generation Condensed Matter Modelling Software |
Amount | £464,870 (GBP) |
Funding ID | EP/W030276/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 11/2022 |
End | 10/2025 |
Description | University of Newcastle upon Tyne (The) and Kuano Limited KTP 22_23 R4 |
Amount | £91,457 (GBP) |
Funding ID | 10054822 |
Organisation | Innovate UK |
Sector | Public |
Country | United Kingdom |
Start | 07/2023 |
End | 01/2025 |
Title | OpenFF benchmark ligand fragments v2.0 |
Description | Torsion scans for 490 small, drug-like molecules for community use in fitting and evaluating force field accuracy. Created and submitted by project researcher, Dr Josh Horton. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | None yet. |
URL | https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2021-08-10-OpenFF-J... |
Description | Bespoke torsion parameterisation |
Organisation | Cresset Biomolecular Discovery Ltd |
Country | United Kingdom |
Sector | Private |
PI Contribution | Advice on implementation and use of OpenFF-BespokeFit in Cresset's Flare software, including industry visit. |
Collaborator Contribution | Testing and implementation of OpenFF-BespokeFit in Cresset's Flare software, including researcher time. |
Impact | A subset of the BespokeFit workflow focusing on bespoke torsions has been implemented within the Cresset Flare software (https://www.cresset-group.com/software/flare/), as described in the publication: https://doi.org/10.1021/acs.jcim.2c01153 |
Start Year | 2022 |
Title | DE-forcefields |
Description | Transferable Double Exponential non-bonded potential for condensed phase simulations of small molecules. |
Type Of Technology | Software |
Year Produced | 2023 |
Open Source License? | Yes |
Impact | Highlights flexibility of open force field software stack for derivation of force fields with novel functional forms. |
URL | https://doi.org/10.1039/D3DD00070B |
Title | FEgrow: An Open-Source Molecular Builder and Free Energy Preparation Workflow |
Description | FEgrow code snapshot. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | FEgrow is an interactive workflow for building user-defined congeneric series of ligands in protein binding pockets for input to free energy calculations. The code has one associated publication and 63 GitHub stars. It is too early to tell what it is being used for. |
URL | https://zenodo.org/record/7105598 |
Title | OpenFF-BespokeFit |
Description | BespokeFit is an automated solution for creating bespoke force field parameters for small molecules of interest in the SMIRNOFF-format that can be used seamlessly with more general force fields (such as Parsley and Sage) that are based on SMIRNOFF. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | One publication: https://pubs.acs.org/doi/10.1021/acs.jcim.2c01153?ref=PDF. The software has 34 stars, and a subset of the BespokeFit workflow focusing on bespoke torsions has also been implemented within the Cresset Flare software. |
URL | https://docs.openforcefield.org/projects/bespokefit/en/latest/index.html |
Title | QUBEKit - version 2 |
Description | A significant rewrite of our QUBEKit software for force field parameterisation has improved ease of use, extensibility and accuracy of the resulting force fields. The software package is now fully open source, including all dependencies, and is available on conda-forge for download under a permissive license. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | There have been 1k downloads from conda-forge since release. It is too early to tell what the software has been used for. |
URL | https://github.com/qubekit/QUBEKit |
Title | RGMolSA |
Description | Ligand-based virtual screening aims to reduce the cost and duration of drug discovery campaigns. Shape similarity can be used to screen large databases, with the goal of predicting potential new hits by comparing to molecules with known favourable properties. RGMolSA is a new alignment-free and mesh-free surface-based molecular shape descriptor derived from the mathematical theory of Riemannian geometry. The treatment of a molecule as a series of intersecting spheres allows the description of its surface geometry using the Riemannian metric, obtained by considering the spectrum of the Laplacian. This gives a simple vector descriptor constructed of the weighted surface area and eight non-zero eigenvalues, which capture the surface shape. |
Type Of Technology | Software |
Year Produced | 2023 |
Open Source License? | Yes |
Impact | Multiple conference presentations and a journal paper. |
URL | https://doi.org/10.1098/rspa.2023.0343 |
Description | 4th Conference on Multiscale Modelling of Condensed Phase and Biological Systems |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | I joined the organising committee for the 4th Conference on Multiscale Modelling of Condensed Phase and Biological Systems organised by the CCP-BioSim and CCP5 computational research communities. The goal was to bring together practitioners of molecular modelling from biological and materials modelling communities to share best practices. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.ccpbiosim.ac.uk/events/past-conferences/eventdetail/128/-/rescheduled-4th-conference-on-... |
Description | Blog post on bespoke-fit |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Researcher on the grant, Dr Josh Horton, wrote a blog post demonstrating the methods and use cases behind his software package, bespoke-fit, with our project partners at Open Force Field. The blog post was promoted to academic and industry members of Open Force Field on Twitter. |
Year(s) Of Engagement Activity | 2021 |
URL | https://openforcefield.org/community/news/science-updates/bespokefit-update-2021-10-20/ |
Description | Industry visit to Cresset |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Industry/Business |
Results and Impact | Industry visit to Cresset to promote and teach the use of our software, and discuss future collaboration opportunities. |
Year(s) Of Engagement Activity | 2022 |
Description | Participation in CCPBioSim Training Week |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | Around 30-50 participants from PhD programmes and industry attended a 2-day workshop run by me, my team and the Open Force Field Initiative on the use of force fields in (bio)molecular simulation. The workshop was part of the wider CCPBioSim Training Week. |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.ccpbiosim.ac.uk/events/past-events/eventdetail/104/-/training-week |
Description | Recorded workshop and YouTube video demonstrating use of bespoke-fit |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | A recorded workshop and talk were presented to industry and academic affiliates of the Open Force Field Initiative by researcher Dr Josh Horton. The recording covers a demonstration of the bespoke-fit package, co-developed with Open Force Field. It has been released on YouTube and viewed >100 times. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.youtube.com/watch?v=xQ8pnYcmWSU |