Continual and Meta-Learning in Drug Discovery

Lead Research Organisation: University of Oxford
Department Name: Sustainable Approaches to Biomedical Science CDT

Abstract

Identifying and validating novel therapeutic targets, screening and optimising potential drug candidates, and investigating their safety and efficacy profiles in pre-clinical and clinical trials is a notoriously difficult process, with attrition rates estimated to be as high as 95%. A single therapeutic often takes more than a decade to develop and requires hundreds of millions to billions of US dollars in R&D spending before it can be brought to market. Additionally, addressing unresolved clinical needs is becoming progressively more demanding, with estimates indicating that inflation-adjusted R&D spending per approved drug is doubling every 9 years.

Coinciding with the advent and maturation of many high-throughput screening and multi-omics techniques, there has been renewed interest in employing machine learning, and specifically deep learning, algorithms to address this decline in productivity. While numerous highly publicised studies have illustrated the undeniable potential of applying deep learning techniques to biomedical research questions, many difficulties remain to be addressed before they can be pervasively employed throughout the drug discovery pipeline. Perhaps the most pertinent obstacle is obtaining adequate data sets for training deep learning algorithms. Setting aside more fundamental concerns about the reliability, reproducibility, errors and biases of biomedical data in the scientific literature, the predominant practical issues that researchers face are data scarcity and data heterogeneity.

A promising approach that could address these shortcomings is optimisation-based meta-learning. Instead of using only the limited data that exists for a given task, a model is trained on a collection of related tasks with the objective of finding an initialisation that can be easily adapted to a new setting with as few additional data points as possible. The potential applications in drug discovery are numerous. Whether for predicting the binding affinities of drug candidates to different macromolecules of interest or for identifying essential genes in different cell lines and under different conditions, leveraging shared structure across distinct tasks to improve robustness and predictive power promises to broaden the range of settings to which machine learning techniques can be successfully applied.

The goal of the project is to closely examine the numerous meta-learning techniques that are continually being developed and to adapt them to the field of drug discovery. Very few publications have employed meta-learning algorithms in the context of biomedical science, and many diverse and impactful applications remain to be explored. The project falls within the EPSRC Artificial Intelligence Technologies, Biological Informatics, and Computational and Theoretical Chemistry research areas. The project is co-supervised by scientists from F. Hoffmann-La Roche AG.
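
As an illustration of the optimisation-based meta-learning idea described above, the following minimal sketch uses a Reptile-style first-order update. This is one of several such algorithms and is not necessarily the one used in this project. The toy linear-regression tasks, the function names (make_task, adapt, meta_train) and all hyperparameters are assumptions chosen purely for illustration; real drug discovery tasks such as binding-affinity prediction would involve far richer models and data.

```python
import random


def make_task(rng):
    """Hypothetical toy task: y = a*x + b, with task-specific (a, b) near (2, -1)."""
    a = 2.0 + rng.uniform(-0.3, 0.3)
    b = -1.0 + rng.uniform(-0.3, 0.3)

    def sample(n):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        return [(x, a * x + b) for x in xs]

    return sample


def mse(params, data):
    """Mean squared error of the linear model y = w*x + c on a data set."""
    w, c = params
    return sum((w * x + c - y) ** 2 for x, y in data) / len(data)


def adapt(params, data, lr=0.1, steps=5):
    """Inner loop: a few gradient steps on one task's small data set."""
    w, c = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + c - y) * x for x, y in data) / n
        grad_c = sum(2 * (w * x + c - y) for x, y in data) / n
        w, c = w - lr * grad_w, c - lr * grad_c
    return (w, c)


def meta_train(rng, meta_steps=500, meta_lr=0.1, shots=5):
    """Outer loop (Reptile-style): nudge the shared initialisation
    towards the parameters obtained after adapting to each sampled task."""
    theta = (0.0, 0.0)
    for _ in range(meta_steps):
        task = make_task(rng)
        phi = adapt(theta, task(shots))
        theta = tuple(t + meta_lr * (p - t) for t, p in zip(theta, phi))
    return theta


rng = random.Random(0)  # fixed seed so the sketch is reproducible
theta = meta_train(rng)

# Evaluate on a previously unseen task using only five labelled points.
new_task = make_task(rng)
support, query = new_task(5), new_task(50)
loss_meta = mse(adapt(theta, support), query)
loss_scratch = mse(adapt((0.0, 0.0), support), query)
print("meta-learned initialisation:", theta)
print("query loss from meta-learned init:", loss_meta)
print("query loss from zero init:", loss_scratch)
```

Comparing loss_meta with loss_scratch shows the intended effect: starting from the meta-learned initialisation, a handful of gradient steps on five data points should fit a new, related task better than the same steps from an uninformed initialisation.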

Planned Impact

The UK's world-leading position in biomedical research is critically dependent upon training scientists with the cutting-edge research skills and technological know-how needed to drive future scientific advances. Since 2009, the EPSRC and MRC CDT in Systems Approaches to Biomedical Science (SABS) has been working with its consortium of 22 industrial and institutional partners to meet this training need.

Over this period, our partners have identified a growing training need caused by the increasing reliance on computational approaches and research software. The new EPSRC CDT in Sustainable Approaches to Biomedical Science: Responsible and Reproducible Research (SABS:R^3) will address this need. By embedding a sustainable approach to software and computational model development into all aspects of the existing SABS training programme, we aim to foster a culture change in how the computational tools and research software that now underpin much of biomedical research are developed, and hence in how quantitative and predictive translational biomedical research is undertaken.

As with all CDT programmes, the future impact of SABS:R^3 will be through its alumni and the culture change that its training engenders. By these measures, our existing SABS CDT is already proving remarkably successful. Our alumni have gone on to a wide range of successful careers: 21 in academic research, 19 in industry (including 5 in SABS partner companies) and the other 10 in organisations ranging from the Office for National Statistics to the EPSRC. SABS' unique Open Innovation framework has facilitated new company connections and a high level of operational freedom, enabling 14 multi-company, pre-competitive, collaborative doctoral research projects between 11 companies, each focused on a SABS student.

The impact of sustainable and open computational approaches on biomedical research is clear from existing SABS student projects. Examples include SAbDab, which resulted from the first-ever co-sponsored doctorate in SABS, by UCB and Roche. It was released as open source software, is embedded in the pipelines of several pharmaceutical companies (including UCB, MedImmune, GSK, and Lonza) and has resulted in 13 papers. The SABS student who developed SAbDab was initially seconded to MedImmune, sponsored by EPSRC IAA funding; he went on to work at Roche, and is now at BenevolentAI. Similarly, PanDDA, multi-dataset X-ray crystallographic software for detecting ligand-bound states in protein complexes, is included in CCP4 and is an integral part of Diamond Light Source's XChem Pipeline. The SABS student who developed PanDDA was awarded an EMBO Fellowship.

Future SABS:R^3 students will undertake research supported by both our industrial partners and academic supervisors. These supervisors have a strong track record of high impact research through the release of open source software, computational tools, and databases, and through commercialisation and licensing of their research. All of this research has been undertaken in collaboration with industrial partners, with many examples of these tools now in routine use within partner companies.

The newly focused SABS:R^3 will permit new industrial collaborations. Six new partners have joined the consortium to support this new bid, ranging from major multinationals (e.g. Unilever) to SMEs (e.g. Lhasa). SABS:R^3 will continue to make all of its research and teaching resources publicly available and will continue to help create other centres with similar aims. To promote a wider cultural change, SABS:R^3 will also engage with the academic publishing industry (Elsevier, OUP, and Taylor & Francis). We will explore novel ways of disseminating the outputs of computational biomedical research, to engender trust in the released tools and software and to facilitate greater uptake and re-use.

Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/S024093/1 | | | 01/10/2019 | 31/03/2028 |
2451635 | Studentship | EP/S024093/1 | 01/10/2020 | 30/09/2024 | Leo Klarner