Next generation atomistic modelling for medicinal chemistry and biology

Lead Research Organisation: Newcastle University
Department Name: Sch of Natural & Environmental Sciences

Abstract

Nobel Laureate Richard Feynman in his Lectures on Physics famously remarked that "...everything that living things do can be understood in terms of the jigglings and wigglings of atoms". This deceptively simple statement highlights the difficulty that structural biologists, medicinal chemists and computational scientists are faced with when attempting to understand human health and disease. We are used to thinking about a static, isolated picture of objects at the atomic scale, but often it is the dynamics (the "jigglings and wigglings") of the system and its environmental interactions that determine the underlying science, such as the role of intrinsically disordered proteins in neurodegenerative diseases or the possible link between quantum entanglement and molecular vibrations in biological photosynthesis.

Twentieth-century science not only set the challenge of studying life at the level of the structure and dynamics of atoms, but also provided (in theory) the solution, through the laws of quantum mechanics and the famous Schrödinger equation. Quantum mechanics explains the fundamental behaviour of matter at the atomic scale, and smaller. It enables scientists to make predictions about materials that are inaccessible to experiment, such as the structure of solid hydrogen in a star's core. At a more everyday level, quantum mechanics is routinely used by researchers in the microelectronics and renewable energy industries to rapidly scan multitudes of hypothetical material compositions. In this way, the costly manufacturing process of the new materials need only begin once the desired properties have been predicted.

However, quantum mechanics does not directly enable scientists to understand the biomolecular origins of disease, or to design new medicines to combat it. The reason for this comes down to Feynman's statement. It is infeasible to solve the equations of quantum mechanics, even approximately, over the length and time scales needed to model all of the atomistic movements that must take place, for example, for a drug molecule to find its target. Instead, computational chemists use a much simplified computational model, known as a force field, to estimate the dynamics of atoms. The force field models the atoms as bonded together in a molecule by springs, and interacting with other atoms through electrostatic and van der Waals forces, which are much stronger than gravity at the atomic scale. The strengths of these interactions are set by thousands of adjustable parameters, which have been manually tuned over many decades to reproduce experimental data. We are reaching a stagnation point: greater accuracy is urgently needed for computer-aided design of new medicines, but further parameter tuning delivers only small improvements.
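
As a rough illustration of the force field picture described above, the sketch below evaluates the three kinds of energy term for a single pair of atoms: a harmonic "spring" for a bond, a Coulomb term for electrostatics, and a Lennard-Jones term as one common model of van der Waals interactions. The function names, the unit system (kcal/mol, Å, elementary charges) and the choice of the Lennard-Jones form are assumptions made for this example; they are not taken from the Fellowship's software.

```python
# Illustrative force field energy terms (assumed units: kcal/mol, Angstrom, e).
# These helper names are hypothetical and are not part of any named package.

# Coulomb constant in kcal mol^-1 Angstrom e^-2, a value commonly used in
# molecular mechanics codes (assumption for this sketch).
COULOMB_CONSTANT = 332.0637


def bond_energy(r, r0, k):
    """Harmonic 'spring' energy of a bond of length r with rest length r0."""
    return 0.5 * k * (r - r0) ** 2


def coulomb_energy(r, q_i, q_j):
    """Electrostatic energy between point charges q_i and q_j at distance r."""
    return COULOMB_CONSTANT * q_i * q_j / r


def lennard_jones_energy(r, epsilon, sigma):
    """Lennard-Jones 12-6 energy: short-range repulsion plus dispersion."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)


if __name__ == "__main__":
    # The parameter values below are placeholders, chosen only to run the code.
    print(bond_energy(1.10, 1.09, 680.0))
    print(coulomb_energy(3.0, 0.4, -0.4))
    print(lennard_jones_energy(3.5, 0.1, 3.4))
```

The adjustable parameters mentioned in the text correspond to quantities such as `k`, `r0`, the charges, `epsilon` and `sigma`; a real force field sums terms like these over every bond and every interacting pair of atoms, alongside further terms (for example angles and torsions) that are omitted here.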

My vision for this UKRI Future Leaders Fellowship is to build a multi-disciplinary team that will work together to close the accuracy gap between quantum mechanics and the approximate force fields used in biology and medicine. By working with international coding efforts, I will build the theory and software infrastructure required to dispense with these adjustable force field parameters, and instead derive them directly for the system under study, such as a protein implicated in disease. This will enable me to build more accurate computational models of the electrostatic and van der Waals interactions that determine the strength of binding of potential drugs to their targets. By crossing disciplinary boundaries to train in data science and machine learning, I will deploy the expertise that has been made famous by its applications in face and speech recognition to create a spectrum of tools for speeding up the assignment of parameters and improving the accuracy of force field design. Finally, by undertaking secondments in the pharmaceutical industry, I will ensure that the developed methods will be used for the cost-efficient design of the next generation of medicines.

Planned Impact

This Fellowship will close the accuracy gap between quantum mechanical modelling and the approximate classical force fields used at the forefront of biomolecular modelling. It provides a key underpinning technology for understanding the roles of biological molecules in human health and disease, and for the rational structure-based design of new medicines. As such, it is important to me that my Fellowship takes an open science, open data approach to knowledge generation. By effectively helping to democratise force field parameterisation, design and dissemination, I will improve the culture of inclusivity in the field of computational modelling.

1) Economic Impact. Analysis of research and development costs in the pharmaceutical industry places the price of each new drug that reaches the consumer at $1.8bn, of which $0.4bn is spent on improving binding affinity between the molecule and its therapeutic target whilst minimising off-target effects (hit-to-lead optimisation). This Fellowship will have the following commercial benefits in the pharmaceutical industry:

- Within 4 years, the technology developed in this Fellowship will allow medicinal chemistry researchers to accurately screen hundreds of potential drug candidates on the computer and only synthesise in the lab those predicted to strongly bind to their target, thus increasing the efficiency of hit-to-lead optimisation and reducing experimental workload. This will drastically reduce costs in the pharmaceutical industry.

- I will further develop computational methods to begin to address therapeutic targets that are traditionally considered "undruggable". Within 4 years, I will develop a general approach incorporating atomistic modelling with deep learning for knowledge-based design of peptide libraries against protein targets, which may have hidden allosteric binding pockets. My project partners have many such targets in their discovery pipeline in oncology and neurodegenerative disease areas.

- Many next-generation medicinal technologies are not currently amenable to computational study. Examples include the design of drugs against proteins with metals in their binding sites, and of molecules for light-induced drug delivery, photodynamic therapy and targeted nanomedicines. To address this issue, within 7 years, I will develop automated and accurate computational methods to model metals in biology and molecules in electronically excited states in complex environments, thus providing underpinning technologies for new industrial sectors.

2) Skills Generation

- This Fellowship will provide a source of post-doctoral researchers and research software engineers with combined expertise in data science, medicinal chemistry and scientific computing. These skills will be highly sought after as the UK pharmaceutical industry enhances the roles of molecular modelling and AI in drug discovery.

3) Wider Society

- In the longer term (7-10 years), new medicines will be designed with substantial computational input, ultimately resulting in the improved health and wellbeing of UK citizens. Wider society will benefit from an accelerated and larger pipeline of effective drugs. This will be particularly important as focus shifts to personalised medicines that will require accurate predictive molecular simulation to feed into multi-scale models that are able to determine links between single amino acid substitutions and disease.

- This Fellowship will inspire the next generation of researchers. School students in the North East will benefit from hands-on virtual reality demonstrations of the drug discovery process, and will gain an appreciation of local success stories through examples of medicines developed in Newcastle.

- Young people who may already be considering a career in research will benefit through our collaborations with the Journal of Sketching Science (with readership numbers of 100K+), which will produce visually appealing articles describing our research.

Publications

Bieniek M (2022) An open-source molecular builder and free energy preparation workflow in Communications Chemistry

Boothroyd S (2023) Development and Benchmarking of Open Force Field 2.0.0: The Sage Small Molecule Force Field. in Journal of chemical theory and computation

Hall S (2024) Riemannian geometry and molecular similarity I: spectrum of the Laplacian in Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences

Horton JT (2022) Open Force Field BespokeFit: Automating Bespoke Torsion Parametrization at Scale. in Journal of chemical information and modeling

Jorge M (2023) What is the Optimal Dipole Moment for Nonpolarizable Models of Liquids? in Journal of chemical theory and computation

Kovács DP (2021) Linear Atomic Cluster Expansion Force Fields for Organic Molecules: Beyond RMSE. in Journal of chemical theory and computation

 
Description In the first three years of my fellowship, my team has developed and released open software for modelling accurate bonded (e.g. OpenFF-BespokeFit) and non-bonded (e.g. QUBE, DE-FF) interactions in and between molecules. We have contributed to modern, robust, automated workflows for collecting and curating quantum mechanical data for model training (QCSubmit), and to a state-of-the-art molecular mechanics force field (OpenFF-Sage), which is already widely used both in academia and the pharmaceutical industry. All scientific advances have been rigorously benchmarked, not only on simple liquid properties, but also directly in protein-ligand binding affinity calculations. Encouraging accuracy gains have stemmed, in particular, from bespoke parameterisation of force field terms (BespokeFit) and advances in functional form (QUBE and DE-FF).
Exploitation Route The main software outcomes are available open source:
https://github.com/qubekit/QUBEKit
https://github.com/cole-group/FEgrow
https://github.com/openforcefield/openff-bespokefit
with accompanying tutorials and publications, and uptake by industry. Follow-on collaborations with Kuano and ASAP Discovery will apply the methods to computer-aided design. Some online workshops demonstrating use are available, and an in-person workshop was run as part of the CCPBioSim Training Week.
Sectors Chemicals; Energy; Pharmaceuticals and Medical Biotechnology

 
Description The Open Force Field Initiative (https://openforcefield.org) is a network of researchers working together to advance the science and software infrastructure required to build the next generation of molecular mechanics force fields. I have joined OpenFF as a co-investigator, which has enabled me to use and contribute to quality, well-documented tools to accelerate community force field science, and gives access to a wide range of industry partners. For example, our co-developed OpenFF-BespokeFit software package allows custom refits of torsion parameters to provide especially high accuracy on specific chemistries of interest. This tool is already being used by several pharmaceutical companies and has also been implemented within the Cresset Flare software, and has been disseminated via the CCPBioSim Training Week to users in academia and industry. Additionally, we have been awarded an Innovate UK KTP grant to deploy charge models for electrostatic potential comparisons as part of Kuano's computer-aided design workflows.
First Year Of Impact 2022
Sector Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Description Establishing the Accessible Computational Regimes for Biomolecular Simulations at Exascale
Amount £471,209 (GBP)
Funding ID EP/Y008693/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 05/2023 
End 11/2024
 
Description International Partnership seed funding (to team members Chris Ringrose and Josh Horton)
Amount £3,821 (GBP)
Organisation Newcastle University 
Sector Academic/University
Country United Kingdom
Start 06/2021 
End 09/2021
 
Description MoSMed PhD studentships
Amount £60,000 (GBP)
Organisation Newcastle University 
Sector Academic/University
Country United Kingdom
Start 01/2023 
End 12/2027
 
Description North East Ultrafast Transient Absorption Spectroscopy Facility
Amount £902,433 (GBP)
Funding ID EP/W006340/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 12/2021 
End 11/2023
 
Description Supporting the OpenMM Community-led Development of Next-Generation Condensed Matter Modelling Software
Amount £464,870 (GBP)
Funding ID EP/W030276/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 11/2022 
End 10/2025
 
Description University of Newcastle upon Tyne (The) and Kuano Limited KTP 22_23 R4
Amount £91,457 (GBP)
Funding ID 10054822 
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 07/2023 
End 01/2025
 
Title OpenFF benchmark ligand fragments v2.0 
Description Torsion scans for 490 small, drug-like molecules for community use in fitting and evaluating force field accuracy. Created and submitted by project researcher, Dr Josh Horton. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact None yet. 
URL https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2021-08-10-OpenFF-J...
 
Description Bespoke torsion parameterisation 
Organisation Cresset Biomolecular Discovery Ltd
Country United Kingdom 
Sector Private 
PI Contribution Advice on implementation and use of OpenFF-BespokeFit in Cresset's Flare software, including industry visit.
Collaborator Contribution Testing and implementation of OpenFF-BespokeFit in Cresset's Flare software, including researcher time.
Impact A subset of the BespokeFit workflow focusing on bespoke torsions has been implemented within the Cresset Flare software (https://www.cresset-group.com/software/flare/), as described in the publication: https://doi.org/10.1021/acs.jcim.2c01153
Start Year 2022
 
Title DE-forcefields 
Description Transferable Double Exponential non-bonded potential for condensed phase simulations of small molecules. 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact Highlights flexibility of open force field software stack for derivation of force fields with novel functional forms. 
URL https://doi.org/10.1039/D3DD00070B
 
Title FEgrow: An Open-Source Molecular Builder and Free Energy Preparation Workflow 
Description FEgrow code snapshot. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact FEgrow is an interactive workflow for building user-defined congeneric series of ligands in protein binding pockets for input to free energy calculations. The code has one associated publication and 63 GitHub stars. It is too early to tell what it is being used for.
URL https://zenodo.org/record/7105598
 
Title OpenFF-BespokeFit 
Description BespokeFit is an automated solution for creating bespoke force field parameters for small molecules of interest in the SMIRNOFF-format that can be used seamlessly with more general force fields (such as Parsley and Sage) that are based on SMIRNOFF. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact One publication (https://pubs.acs.org/doi/10.1021/acs.jcim.2c01153?ref=PDF). The software has 34 stars, and a subset of the BespokeFit workflow focusing on bespoke torsions has also been implemented within the Cresset Flare software.
URL https://docs.openforcefield.org/projects/bespokefit/en/latest/index.html
 
Title QUBEKit - version 2 
Description A significant rewrite of our QUBEKit software for force field parameterisation has improved the ease of use, extensibility and accuracy of the resulting force fields. The software package is now fully open source, including all dependencies, and is available on conda-forge for download under a permissive license.
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact There have been 1k downloads from conda-forge since release. It is too early to tell what the software has been used for. 
URL https://github.com/qubekit/QUBEKit
 
Title RGMolSA 
Description Ligand-based virtual screening aims to reduce the cost and duration of drug discovery campaigns. Shape similarity can be used to screen large databases, with the goal of predicting potential new hits by comparing to molecules with known favourable properties. RGMolSA is a new alignment-free and mesh-free surface-based molecular shape descriptor derived from the mathematical theory of Riemannian geometry. The treatment of a molecule as a series of intersecting spheres allows the description of its surface geometry using the Riemannian metric, obtained by considering the spectrum of the Laplacian. This gives a simple vector descriptor constructed of the weighted surface area and eight non-zero eigenvalues, which capture the surface shape. 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact Multiple conference presentations and a journal paper. 
URL https://doi.org/10.1098/rspa.2023.0343
 
Description 4th Conference on Multiscale Modelling of Condensed Phase and Biological Systems 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact I joined the organising committee for the 4th Conference on Multiscale Modelling of Condensed Phase and Biological Systems organised by the CCP-BioSim and CCP5 computational research communities. The goal was to bring together practitioners of molecular modelling from biological and materials modelling communities to share best practices.
Year(s) Of Engagement Activity 2021
URL https://www.ccpbiosim.ac.uk/events/past-conferences/eventdetail/128/-/rescheduled-4th-conference-on-...
 
Description Blog post on bespoke-fit 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Researcher on the grant, Dr Josh Horton, wrote a blog post demonstrating the methods and use cases behind his software package, bespoke-fit, with our project partners at Open Force Field. The post was promoted to academic and industry members of Open Force Field on Twitter.
Year(s) Of Engagement Activity 2021
URL https://openforcefield.org/community/news/science-updates/bespokefit-update-2021-10-20/
 
Description Industry visit to Cresset 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Industry/Business
Results and Impact Industry visit to Cresset to promote and teach the use of our software, and discuss future collaboration opportunities.
Year(s) Of Engagement Activity 2022
 
Description Participation in CCPBioSim Training Week 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Around 30-50 participants from PhD programmes and industry attended a 2-day workshop run by me, my team and the Open Force Field Initiative on the use of force fields in (bio)molecular simulation. The workshop was part of the wider CCPBioSim Training Week.
Year(s) Of Engagement Activity 2023
URL https://www.ccpbiosim.ac.uk/events/past-events/eventdetail/104/-/training-week
 
Description Recorded workshop and YouTube video demonstrating use of bespoke-fit 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact A recorded workshop and talk were presented to industry and academic affiliates of the Open Force Field Initiative by researcher Dr Josh Horton. The recording covers a demonstration of the bespoke-fit package, co-developed with Open Force Field. It has been released on YouTube and viewed >100 times.
Year(s) Of Engagement Activity 2021
URL https://www.youtube.com/watch?v=xQ8pnYcmWSU