Data-driven approaches for fragment merging

Lead Research Organisation: University of Oxford
Department Name: Sustain Approach to Biomedical Sci CDT

Abstract

This project focuses on fragment-based drug discovery (FBDD), involving the screening of low-molecular-weight compounds against a target of interest to find chemical starting points that can be optimized to become lead-like molecules. The proposed work is situated within the lead optimization stage of the FBDD pipeline, which looks at how crystallographic fragment hits and the structural information they yield can be exploited to propose larger, drug-like compounds that bind to a target with increased affinity. There are several approaches to fragment elaboration, including fragment growing, linking and merging. Fragment merging is a relatively unexplored technique compared with its counterparts yet has the potential to find potent molecules. Existing strategies for merge design include manual design by a medicinal chemist, which is slow and not scalable to large datasets, and de novo design, which often results in molecules that lack synthetic accessibility and are therefore difficult and costly to pursue. Thus, the aim of this project is to use knowledge-based approaches to propose fragment merges that are synthetically feasible, allowing rapid and cheap progression from fragment hits to lead-like compounds. Various data-driven approaches will be explored, including database exploitation and AI-based techniques, which may be used synergistically to propose new compounds. Use of the former has already been demonstrated during the rotation project, which involved use of the Fragment Network, a graph database containing catalogue compounds, to create a pipeline able to find and filter fragment merges that can be prioritized for further screening. Several avenues to extend this work by enhancing the efficiency of this tool and increasing the diversity in the molecules found have already been identified. Compounds proposed by this project will also have the opportunity for experimental validation. 
Automated synthesis planning and execution on a robotic platform are currently being explored at XChem, and a key component of this work will be identifying compounds that can be made given the available synthetic repertoire. Integrating this entire pipeline has the potential to substantially increase the speed at which potent ligands are progressed in drug development, thereby reducing the number of design-make-test cycle iterations required. The proposed work (which will run in conjunction with another DPhil project focused on the robotics aspect of the pipeline) will involve industrial collaboration with Vernalis and LifeArc; the industrial supervisors are to be confirmed. This project falls within the EPSRC's 'Computational and theoretical chemistry' research area and ties strongly to the council's outlined strategies within this field. This research is highly interdisciplinary and will involve collaboration with beamline scientists, medicinal chemists and automation experts. As described above, the software developed during this project will have direct, actionable consequences for the drug discovery community, producing molecules that can be purchased or easily synthesized for further screening. Making source code and data from this project available will allow others to apply this technique to new targets and to compare it with their own algorithms.

Planned Impact

The UK's world-leading position in biomedical research is critically dependent upon training scientists with the cutting-edge research skills and technological know-how needed to drive future scientific advances. Since 2009, the EPSRC and MRC CDT in Systems Approaches to Biomedical Science (SABS) has been working with its consortium of 22 industrial and institutional partners to meet this training need.

Over this period, our partners have identified a growing training need caused by the increasing reliance on computational approaches and research software. The new EPSRC CDT in Sustainable Approaches to Biomedical Science: Responsible and Reproducible Research - SABS:R^3 will address this need. By embedding a sustainable approach to software and computational model development into all aspects of the existing SABS training programme, we aim to foster a culture change in how the computational tools and research software that now underpin much of biomedical research are developed, and hence how quantitative and predictive translational biomedical research is undertaken.

As with all CDT Programmes, the future impact of SABS:R^3 will be through its alumni, and by the culture change that its training engenders. By these measures, our existing SABS CDT is already proving remarkably successful. Our alumni have gone on to a wide range of successful careers: 21 in academic research, 19 in industry (including 5 in SABS partner companies) and the other 10 in organisations ranging from the Office for National Statistics to the EPSRC. SABS' unique Open Innovation framework has facilitated new company connections and a high level of operational freedom, enabling 14 multi-company, pre-competitive, collaborative doctoral research projects between 11 companies, each focused on a SABS student.

The impact of sustainable and open computational approaches on biomedical research is clear from existing SABS student projects. Examples include SAbDab, which resulted from the first-ever co-sponsored doctorate in SABS, by UCB and Roche. It was released as open source software, is embedded in the pipelines of several pharmaceutical companies (including UCB, MedImmune, GSK, and Lonza) and has resulted in 13 papers. The SABS student who developed SAbDab was initially seconded to MedImmune, sponsored by EPSRC IAA funding; he went on to work at Roche, and is now at BenevolentAI. Similarly, PanDDA, multi-dataset X-ray crystallographic software to detect ligand-bound states in protein complexes, is distributed in CCP4 and is an integral part of Diamond Light Source's XChem Pipeline. The SABS student who developed PanDDA was awarded an EMBO Fellowship.

Future SABS:R^3 students will undertake research supported by both our industrial partners and academic supervisors. These supervisors have a strong track record of high impact research through the release of open source software, computational tools, and databases, and through commercialisation and licensing of their research. All of this research has been undertaken in collaboration with industrial partners, with many examples of these tools now in routine use within partner companies.

The newly focused SABS:R^3 will permit new industrial collaborations. Six new partners have joined the consortium to support this new bid, ranging from major multinationals (e.g. Unilever) to SMEs (e.g. Lhasa). SABS:R^3 will continue to make all of its research and teaching resources publicly available and will continue to help to create other centres with similar aims. To promote a wider cultural change, SABS:R^3 will also engage with the academic publishing industry (Elsevier, OUP, and Taylor & Francis). We will explore novel ways of disseminating the outputs of computational biomedical research, to engender trust in the released tools and software and to facilitate greater uptake and re-use.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S024093/1       -             -             01/10/2019  31/03/2028  -
2445537            Studentship   EP/S024093/1  01/10/2020  30/09/2024  Stephanie Wills