Transfer learning of pharmacogenomic information across disease types and preclinical models for drug sensitivity prediction.

Lead Research Organisation: University of Sheffield
Department Name: Neurosciences

Abstract

The failure rate for new drugs entering clinical trials is in excess of 90%, and more than a quarter of drugs fail because they lack efficacy. Earlier treatment decisions for complex diseases such as lung cancer considered only a small number of patient factors and prescribed a fixed treatment regimen for all patients, resulting in severe side effects for some patients and highly variable outcomes. Personalised treatments have recently become more common through the discovery and use of genetic markers that can explain a patient's response to a drug. If the goal of personalised medicine is to give the right drug to the right patient, we may be able to combine pharmacogenomics with machine learning to make better treatment decisions.

Because testing ineffective drugs on patient cells and animal models in the laboratory wastes resources, we aim to use machine learning to predict drug response from a limited number of experiments. We and many others in drug development have used computational methods to learn from drug responses measured in vitro and to provide evidence for clinical trials; however, existing machine learning methods perform poorly at predicting drug response in disease types with only a small number of samples. This happens often for rare cancers and for other diseases such as motor neurone disease (also known as ALS), because there are few patients or their samples are difficult to collect. Extending machine learning to learn across different disease contexts would overcome this limitation, reducing the time-consuming step of gathering biological resources and accelerating drug development.

In this project, we will develop machine learning algorithms that take into account all of the dose-response data available for each drug, even when it has been tested in only a few samples. To overcome the problem of having few training cases in a disease, we will develop a transfer learning framework that uses knowledge from other diseases with more drug response data to make predictions in the disease with less data. The algorithms will be developed and tested in five stages: 1) develop a learning model that maps genomic information to drug response in both the disease with more data and the disease with limited data; 2) develop an inference model for predicting drug response in the disease with limited data; 3) apply the learning and inference models, using the relationships between genomics and drug sensitivity in lung cancer to predict drug response in bladder cancer; 4) learn from drug responses in cell lines and predict response in mouse tumour models; 5) learn and predict biomarkers that describe a particular drug's sensitivity in both lung cancer and motor neurone disease. Genomic information will be used as the input to the prediction algorithms because it can be reliably measured both in the laboratory and in the clinic. The prediction test cases increase in difficulty, and successes in transferring pharmacogenomic information between diseases will highlight opportunities for scientists to reuse existing datasets when testing a drug in a new disease.
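The abstract does not specify the form of the transfer learning model. As a minimal sketch of one common form of transfer, assuming a linear model from genomic features to a single drug-response value and purely synthetic data, the weights for the data-poor disease can be shrunk towards the weights learned on the data-rich disease instead of towards zero. The function names (`ridge`, `simulate`, `mse`), the feature count and the noise levels are all hypothetical choices for illustration, not the project's method.

```python
import numpy as np

def ridge(X, y, lam, prior=None):
    """Ridge regression that shrinks the weights towards `prior` (zeros if None).

    Closed-form solution of  min_w ||y - X w||^2 + lam * ||w - prior||^2.
    """
    d = X.shape[1]
    prior = np.zeros(d) if prior is None else prior
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * prior)

rng = np.random.default_rng(0)
d = 20                                            # hypothetical number of genomic features
w_source = rng.normal(size=d)
w_target = w_source + 0.1 * rng.normal(size=d)    # related but not identical disease

def simulate(w, n):
    """Synthetic genomic features and noisy drug responses (hypothetical)."""
    X = rng.normal(size=(n, d))
    return X, X @ w + 0.5 * rng.normal(size=n)

Xs, ys = simulate(w_source, 200)      # data-rich disease (e.g. lung cancer)
Xt, yt = simulate(w_target, 10)       # data-poor disease (e.g. bladder cancer)
Xtest, ytest = simulate(w_target, 500)

w_alone = ridge(Xt, yt, lam=1.0)                               # target data only
w_transfer = ridge(Xt, yt, lam=1.0, prior=ridge(Xs, ys, 1.0))  # shrink towards source

def mse(w):
    """Mean squared prediction error on held-out target-disease samples."""
    return float(np.mean((Xtest @ w - ytest) ** 2))

print(f"target only: {mse(w_alone):.2f}  transfer: {mse(w_transfer):.2f}")
```

With 10 target samples and 20 features the target-only fit is underdetermined, so the weights it cannot estimate default to zero; the transfer fit instead defaults them to the source-disease estimate, which helps only to the extent that the two diseases share genomic drivers of response.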

We are conducting this interdisciplinary study as a team of computer scientists, clinicians and cell biologists with expertise in machine learning, cancer and neuroscience. The end goal is a suite of software tools that the drug development community can use readily and flexibly to apply transfer learning to many different problems.

Related Projects

Project Reference   Relationship   Related To     Start        End          Award Value
EP/V029045/1                                      30/09/2021   29/09/2022   £461,757
EP/V029045/2        Transfer       EP/V029045/1   01/11/2022   31/10/2025   £346,843
 
Description Drug response prediction is hampered by uncertainty in how response is measured and in the choice of doses. In this study, we propose a novel approach that uses probabilistic multi-output models to predict drug response at all doses and to uncover the corresponding biomarkers. By leveraging genomic features and the chemical properties of drugs, our multi-output Gaussian process (MOGP) models provide a comprehensive view of drug efficacy across different dose metrics. The approach was tested on two drug screening studies and five cancer types. It captured underlying response trends and enabled the identification of the EZH2 gene as a novel biomarker of BRAF inhibitor response. We show that our MOGP models accurately predict dose-responses in different cancer types, including when only a limited number of drug screening experiments is available for training. Our findings highlight the potential of MOGP models to enhance drug development pipelines by reducing data requirements and improving the precision of dose-response predictions.
Exploitation Route Machine learning models can be used by drug developers to assess the effect of investigational treatments in different common cancer types.
Sectors Pharmaceuticals and Medical Biotechnology
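The project's own MOGP implementation is not reproduced here. As a hedged, minimal illustration of how a multi-output GP can predict a whole dose-response curve at once, the sketch below assumes an intrinsic coregionalisation model (ICM), in which responses at neighbouring doses are correlated through a fixed matrix `B`, together with synthetic genomic features and sigmoid viability curves. Hyperparameters are fixed rather than learned, and all names and values are hypothetical.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def icm_predict(X, Y, Xnew, B, noise=1e-3, lengthscale=1.5):
    """Posterior mean and variance of an ICM multi-output GP.

    X: (n, p) features; Y: (n, D) responses at D doses; Xnew: (m, p);
    B: (D, D) coregionalisation matrix coupling the doses (fixed here).
    Outputs are stacked dose-major, so the joint prior covariance is kron(B, Kx).
    """
    n, D = Y.shape
    m = len(Xnew)
    K = np.kron(B, rbf(X, X, lengthscale)) + noise * np.eye(n * D)
    Ks = np.kron(B, rbf(Xnew, X, lengthscale))           # (m*D, n*D)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y.T.reshape(-1)))
    V = np.linalg.solve(L, Ks.T)
    mean = Ks @ alpha
    var = np.repeat(np.diag(B), m) - (V ** 2).sum(0)     # rbf(x, x) = 1
    return mean.reshape(D, m).T, var.reshape(D, m).T

rng = np.random.default_rng(1)
doses = np.linspace(-2.0, 2.0, 5)                # synthetic log-concentrations
X = rng.normal(size=(40, 3))                     # 3 hypothetical genomic features
ic50 = 0.8 * X[:, 0]                             # feature 0 shifts sensitivity
Y = 1.0 / (1.0 + np.exp(doses[None, :] - ic50[:, None]))   # cell viability
Y += 0.02 * rng.normal(size=Y.shape)

# Nearby doses are assumed to be strongly correlated.
B = 0.1 * np.exp(-0.5 * (doses[:, None] - doses[None, :]) ** 2) + 1e-6 * np.eye(5)
mu = Y[:30].mean(0)                              # per-dose mean of the training set
mean, var = icm_predict(X[:30], Y[:30] - mu, X[30:], B)
pred = mean + mu
rmse = np.sqrt(np.mean((pred - Y[30:]) ** 2))
baseline = np.sqrt(np.mean((mu - Y[30:]) ** 2))
print(f"MOGP RMSE {rmse:.3f} vs per-dose mean baseline {baseline:.3f}")
```

Because every dose shares one kernel over cell lines, an observation at one dose informs predictions at the others; the returned variances give a per-dose uncertainty, which is the kind of information a probabilistic model adds over point predictions of a summary metric.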

 
Description AI for Drug Discovery initiatives in Singapore
Geographic Reach Asia 
Policy Influence Type Participation in a guidance/advisory committee
URL https://ai4science.sg/
 
Description Grant review panel for clinical starter grants
Geographic Reach National 
Policy Influence Type Influenced training of practitioners or researchers
 
Description Scientific Committee for Multiple Long Term Conditions (NIHR and MRC)
Geographic Reach National 
Policy Influence Type Participation in a guidance/advisory committee
Impact Launched a unit within the Alan Turing Institute to support data aggregation and training initiatives that encourage researchers to share data on treating multiple diseases. Research groups highlighted difficulties with drug coding, which my group and the Turing helped to address.
 
Description Longitudinal machine learning of molecular and phenotypic trajectories of pulmonary hypertension.
Amount £934,385 (GBP)
Funding ID MR/Z505468/1 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 01/2025 
End 01/2027
 
Description 100,000 Genomes Project 
Organisation Genomics England
Country United Kingdom 
Sector Public 
PI Contribution I have been a research consultant and a member of the Genomics England Clinical Interpretation Partnership (GeCIP) for the neurology, cancer and bioinformatics domains. My team has helped to assess the quality of variant identification by Genomics England in the 100,000 Genomes Project.
Collaborator Contribution Genomics England has provided more than 100,000 clinical-grade whole genomes across various diseases, together with a computing platform (the Research Environment) on which we conduct our analyses.
Impact We have analysed systematic sequencing biases in clinical whole genomes that can affect the variants used for genetic diagnosis. This work has been published in Freeman et al. https://genome.cshlp.org/content/early/2020/03/10/gr.255349.119
Start Year 2019
 
Description Combination therapy development with AstraZeneca 
Organisation AstraZeneca
Country United Kingdom 
Sector Private 
PI Contribution Develop a machine learning model to predict synergistic drug responses. Test model predictions using published combination screens and internal AZ datasets. Analyze and interpret model predictions for breast, lung, and ovarian cancer drug combinations. Write and publish a manuscript detailing the methodology and results. Improve drug development processes by integrating predictive modeling into R&D.
Collaborator Contribution Define objectives and overall goals of the research plan. Collate, curate, and format data required for model development. Guide model development and optimization as part of the collaborative team. Establish an internal AZ working group and stakeholders for regular project feedback. Lead biological and pharmacological interpretation of results. Co-lead the development of a manuscript describing the project. Provide overall co-supervision of the project.
Impact Multi-disciplinary project involving computer science, pharmacology, molecular biology and chemistry.
1. Machine Learning Model
- A predictive machine learning model for identifying synergistic drug responses at different concentrations.
- Model performance assessment against preclinical datasets and published combination screens.
- Benchmarking against other published models.
2. Data Insights & Predictions
- Identification of drug combinations with dose-specific synergies in breast, lung, and ovarian cancers.
- Analysis of drug response predictions using preclinical datasets from AstraZeneca.
- Insights into the biological and pharmacological mechanisms underlying drug synergy.
3. Research Manuscript & Publications
- A manuscript describing the methodology, model performance, and research findings.
- Potential journal publication(s) in bioinformatics, computational biology, or oncology research.
4. Contribution to Drug Development
- Improved understanding of how dose-specific combinations enhance treatment efficacy.
- Application of dose-response prediction models in pharmaceutical R&D.
- Potential for optimising drug development strategies by integrating predictive analytics.
Start Year 2024
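The record does not say which synergy reference model the collaboration uses. As a hedged illustration of what "dose-specific synergy" means, the sketch below assumes the Bliss independence model, a widely used reference in which two independently acting drugs with fractional inhibitions a and b are expected to give a + b - ab together; the excess of the observed combination effect over this expectation is computed for every concentration pair. All numbers are hypothetical.

```python
import numpy as np

def bliss_excess(ea, eb, eab):
    """Observed minus Bliss-expected fractional inhibition at every dose pair.

    ea[i]     : inhibition by drug A alone at its i-th concentration (0..1)
    eb[j]     : inhibition by drug B alone at its j-th concentration (0..1)
    eab[i, j] : observed inhibition of the combination at (i, j)
    Positive values suggest synergy at that dose pair, negative antagonism.
    """
    expected = ea[:, None] + eb[None, :] - ea[:, None] * eb[None, :]
    return eab - expected

ea = np.array([0.1, 0.3, 0.5])           # hypothetical single-agent data
eb = np.array([0.2, 0.4])
eab = np.array([[0.30, 0.46],
                [0.44, 0.70],
                [0.60, 0.75]])           # hypothetical combination data
excess = bliss_excess(ea, eb, eab)
i, j = np.unravel_index(np.argmax(excess), excess.shape)
print(np.round(excess, 2))
print("most synergistic dose pair:", (int(i), int(j)))
```

In this toy matrix the same drug pair is roughly additive at most concentrations and synergistic only at one dose pair, which is why a model that predicts responses at each concentration, rather than a single summary score, can be useful.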
 
Description Genotype of Urothelial cancer: Stratified Treatment and Oncological outcomes (GUSTO): Phase II study. 
Organisation Leeds Teaching Hospitals NHS Trust
Country United Kingdom 
Sector Public 
PI Contribution Bioinformatics lead for a Phase II clinical trial using genomic characterisation of bladder cancer to determine treatment. I advised on the trial design and worked with the commercial partner providing the algorithmic diagnosis.
Collaborator Contribution My partners launched the clinical trial from Sheffield and Leeds Teaching Hospitals. They handled participant recruitment, governance, ethics approval, etc.
Impact Contracts and collaboration agreements signed with AstraZeneca and Veracyte for drugs and diagnostics.
Start Year 2021
 
Description Genotype of Urothelial cancer: Stratified Treatment and Oncological outcomes (GUSTO): Phase II study. 
Organisation Sheffield Teaching Hospital
Country United Kingdom 
Sector Hospitals 
PI Contribution Bioinformatics lead for a Phase II clinical trial using genomic characterisation of bladder cancer to determine treatment. I advised on the trial design and worked with the commercial partner providing the algorithmic diagnosis.
Collaborator Contribution My partners launched the clinical trial from Sheffield and Leeds Teaching Hospitals. They handled participant recruitment, governance, ethics approval, etc.
Impact Contracts and collaboration agreements signed with AstraZeneca and Veracyte for drugs and diagnostics.
Start Year 2021
 
Title Machine learning tools for automated transcriptome clustering analysis 
Description Symptomatic heterogeneity in complex diseases reflects differences in molecular states that need to be investigated. However, choosing the many parameters of an exploratory clustering analysis in RNA profiling studies requires a deep understanding of machine learning and extensive computational experimentation. Tools that assist with these decisions without requiring prior field knowledge are lacking, and downstream gene association analyses have to be performed separately. We have developed a suite of tools that automates these steps and makes robust unsupervised clustering of transcriptomic data more accessible through automated machine learning functions. The performance of each tool was tested on four datasets with different expression signal strengths. In datasets where the subgroups are discernible, the toolkit's choices matched the real number of stable partitions. Even in datasets with less clear biological distinctions, it found stable subgroups with different expression profiles and clinical associations. 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact Published in the R Bioconductor repository, which is used by thousands of researchers around the world 
URL https://bioconductor.org/packages/release/bioc/html/omada.html
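The omada functions themselves are documented on the Bioconductor page above and are not reproduced here. As a hedged sketch of one idea behind stability-based clustering, the example below assumes k-means as the clustering method, the adjusted Rand index (ARI) as the agreement measure and synthetic "expression" data with three subgroups: each candidate number of clusters is scored by how consistently repeated runs on random subsamples partition the samples they share. The helper names `kmeans`, `ari` and `stability` are hypothetical.

```python
import numpy as np

def kmeans(X, k, rng, n_init=10, iters=50):
    """Lloyd's k-means with k-means++ seeding; returns labels of the best run."""
    best, best_inertia = None, np.inf
    for _ in range(n_init):
        centres = [X[rng.integers(len(X))]]
        for _ in range(k - 1):                # k-means++: favour distant points
            d2 = ((X[:, None, :] - np.array(centres)[None]) ** 2).sum(-1).min(1)
            centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centres = np.array(centres)
        for _ in range(iters):
            labels = ((X[:, None, :] - centres[None]) ** 2).sum(-1).argmin(1)
            for c in range(k):
                if np.any(labels == c):
                    centres[c] = X[labels == c].mean(0)
        inertia = ((X - centres[labels]) ** 2).sum()
        if inertia < best_inertia:
            best, best_inertia = labels, inertia
    return best

def ari(a, b):
    """Adjusted Rand index between two labelings of the same samples."""
    def pairs(x):
        return x * (x - 1) / 2
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(table, (ai, bi), 1)             # contingency table
    index = pairs(table).sum()
    rows, cols = pairs(table.sum(1)).sum(), pairs(table.sum(0)).sum()
    expected = rows * cols / pairs(len(a))
    return (index - expected) / ((rows + cols) / 2 - expected)

def stability(X, k, rng, n_runs=10, frac=0.8):
    """Mean pairwise ARI of k-means runs on random subsamples (shared samples only)."""
    n = len(X)
    runs = []
    for _ in range(n_runs):
        idx = rng.choice(n, int(frac * n), replace=False)
        labels = np.full(n, -1)               # -1 marks samples left out of this run
        labels[idx] = kmeans(X[idx], k, rng)
        runs.append(labels)
    scores = []
    for i in range(n_runs):
        for j in range(i + 1, n_runs):
            shared = (runs[i] >= 0) & (runs[j] >= 0)
            scores.append(ari(runs[i][shared], runs[j][shared]))
    return float(np.mean(scores))

rng = np.random.default_rng(2)
centres = 6.0 * np.eye(3, 5)                  # 3 synthetic subgroups over 5 genes
X = np.vstack([c + rng.normal(size=(30, 5)) for c in centres])
scores = {k: stability(X, k, rng) for k in range(2, 6)}
print({k: round(s, 3) for k, s in scores.items()})
```

Splitting a genuine subgroup in two tends to happen differently on each subsample, so over-clustering lowers the score; coarser partitions (for example merging two subgroups) can still be stable, which is why stability is usually weighed alongside other internal clustering metrics rather than used alone.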
 
Title Multi-output Gaussian Process Models for Dose-Response Predictions 
Description This repository contains an implementation of a multi-output Gaussian process (MOGP) model to predict dose-response curves, and an implementation of a feature relevance determination method based on the Kullback-Leibler divergence. 
Type Of Technology Software 
Year Produced 2024 
Open Source License? Yes  
Impact Interest from AstraZeneca to further extend this software for drug combinations, and led to a research collaboration agreement between Imperial College London and AstraZeneca 
URL https://github.com/juanjogg1987/MOGPs_for_Dose_Response_Predictions
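The repository's own relevance method is not reproduced here. As a hedged sketch of the general idea behind KL-divergence-based relevance, the example below assumes a single-output GP with fixed hyperparameters (not an MOGP) and synthetic data: each feature is nudged by a small `delta`, and the KL divergence between the Gaussian predictive distributions before and after the nudge measures how strongly the prediction depends on that feature. The names `GP`, `kl_gauss` and `kl_relevance`, and the `sqrt(2 * KL) / delta` scaling, are illustrative assumptions.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / ls**2)

class GP:
    """Single-output GP regression with fixed hyperparameters (illustrative only)."""
    def __init__(self, X, y, ls=1.0, noise=0.01):
        self.X, self.ls = X, ls
        self.L = np.linalg.cholesky(rbf(X, X, ls) + noise * np.eye(len(X)))
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))

    def predict(self, Xnew):
        """Latent predictive mean and variance at the rows of Xnew."""
        Ks = rbf(Xnew, self.X, self.ls)
        v = np.linalg.solve(self.L, Ks.T)
        return Ks @ self.alpha, 1.0 - (v ** 2).sum(0)

def kl_gauss(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for univariate Gaussians, elementwise."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def kl_relevance(gp, X, delta=1e-2):
    """Average sqrt(2 * KL) / delta when each feature in turn is nudged by delta."""
    m0, v0 = gp.predict(X)
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] += delta
        m1, v1 = gp.predict(Xp)
        kl = np.maximum(kl_gauss(m0, v0, m1, v1), 0.0)   # guard against round-off
        scores.append(float(np.mean(np.sqrt(2.0 * kl)) / delta))
    return np.array(scores)

rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(80, 3))            # feature 2 is deliberately irrelevant
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=80)
relevance = kl_relevance(GP(X, y), X)
print(np.round(relevance, 2))
```

Unlike reading off fitted lengthscales, this score accounts for both the predictive mean and the predictive uncertainty, so a feature only ranks highly where the model is confident that changing it changes the prediction.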
 
Description ELLIS Summer School on Machine Learning for Healthcare and Biology 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Manchester's European Laboratory for Learning and Intelligent Systems (ELLIS) unit hosted its second Summer School on 11-13 June 2024, bringing participants up to speed on the latest machine learning methods and technologies, with a focus on healthcare and biology. The school included lectures by renowned researchers working at the intersection of ML, healthcare and biology, with an excellent track record of delivering educational content.
Year(s) Of Engagement Activity 2023
URL https://www.idsai.manchester.ac.uk/connect/events/ellis-summer-school-2024/