Transfer learning of pharmacogenomic information across disease types and preclinical models for drug sensitivity prediction.

Lead Research Organisation: University of Sheffield

Department Name: Neurosciences

Abstract

The failure rate for new drugs entering the clinic is in excess of 90%, with more than a quarter of drugs failing due to lack of efficacy. Historically, treatment decisions for complex diseases such as lung cancer considered a small number of patient factors and prescribed a fixed treatment regimen for all patients, resulting in severe side effects for some and highly variable outcomes. More recently, personalised treatments have become common through the discovery and use of genetic markers that can explain a patient's response to a drug. If the goal of personalised medicine is to give the right drug to the right patient, we may be able to combine pharmacogenomics with machine learning to help make better treatment decisions.

Because testing ineffective drugs on patient cells and animal models in the laboratory wastes resources, we are motivated to leverage machine learning to predict drug response from a limited number of experiments. We and many others in drug development have used computational methods to learn from drug responses measured in vitro and to provide evidence for clinical trials; however, existing machine learning methods perform poorly at predicting drug response in disease types for which only a limited number of samples are available. Unfortunately, this situation is common for rare cancers and for other diseases such as motor neurone disease (also known as ALS), because there are few patients or their samples are difficult to collect. Overcoming this limitation by extending machine learning to learn across disease contexts would reduce the time-consuming step of gathering biological resources and thereby accelerate drug development.

In this project, we will develop machine learning algorithms that take into account all of the dose-response data we have for each drug, including drugs tested in only a few samples. To overcome the issue of few training cases in a disease, we will develop a transfer learning framework that uses knowledge from other diseases with more drug response data to address the problem in the disease with less data. The algorithms will be developed and tested in five stages: 1) develop a learning model that maps genomic information to drug response in both the disease with more data and the disease with limited data; 2) develop an inference model for predicting drug response in the disease with limited data; 3) apply the learning and inference models to transfer the relationships between genomic features and drug sensitivity learned in lung cancer to predict drug response in bladder cancer; 4) learn from drug responses in cell lines and predict responses in mouse tumour models; 5) learn and predict biomarkers of a particular drug's sensitivity in both lung cancer and motor neurone disease. Genomic information will be used as input to the prediction algorithms because it can be reliably measured both in the laboratory and in the clinic. We use prediction test cases of increasing difficulty, and successes in transferring pharmacogenomic information between diseases will highlight opportunities for scientists to leverage existing data sets when testing a drug in a new disease.
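
The abstract does not specify the transfer mechanism. As a minimal sketch of the general idea, assuming a simple linear model rather than the project's actual algorithms, one can fit ridge regression on a data-rich source disease and then fit the data-poor target disease with a penalty that shrinks its weights towards the source weights instead of towards zero. The synthetic data, the feature count and the function names `ridge` and `transfer_ridge` are hypothetical.

```python
import numpy as np

def ridge(X, y, lam):
    """Standard ridge regression: argmin_w ||y - Xw||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def transfer_ridge(X_t, y_t, w_source, lam):
    """Ridge regression shrunk towards source-disease weights instead of zero:
    argmin_w ||y_t - X_t w||^2 + lam * ||w - w_source||^2 (closed form)."""
    d = X_t.shape[1]
    A = X_t.T @ X_t + lam * np.eye(d)
    b = X_t.T @ y_t + lam * w_source
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
d = 20  # hypothetical number of genomic features

# Hypothetical assumption: the target disease's genomic-response relationship
# is a small perturbation of the source disease's relationship.
w_src_true = rng.normal(size=d)
w_tgt_true = w_src_true + 0.1 * rng.normal(size=d)

# Many samples in the data-rich source disease, only 10 in the target disease.
X_s = rng.normal(size=(500, d))
y_s = X_s @ w_src_true + 0.1 * rng.normal(size=500)
X_t = rng.normal(size=(10, d))
y_t = X_t @ w_tgt_true + 0.1 * rng.normal(size=10)

w_source = ridge(X_s, y_s, lam=1.0)
w_target_only = ridge(X_t, y_t, lam=1.0)
w_transfer = transfer_ridge(X_t, y_t, w_source, lam=1.0)

# Evaluate both target-disease models on held-out target samples.
X_test = rng.normal(size=(200, d))
y_test = X_test @ w_tgt_true
mse_target_only = np.mean((X_test @ w_target_only - y_test) ** 2)
mse_transfer = np.mean((X_test @ w_transfer - y_test) ** 2)
print(f"target only: {mse_target_only:.3f}, transfer: {mse_transfer:.3f}")
```

With 10 target samples and 20 features, the target-only fit is underdetermined, while the transferred fit starts from the source weights and only corrects them where the target data disagree. The penalty weight `lam` controls how strongly the target model is pulled towards the source.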

We are conducting this interdisciplinary study as a team of computer scientists, clinicians and cell biologists with expertise in machine learning, cancer and neuroscience. The end goal is to develop a suite of software tools that the drug development community can use readily and flexibly to apply transfer learning to many different problems.

Funded Value:

£461,756

Funded Period:

Sep 2021 - Sep 2022

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/V029045/1

Principal Investigator:

Dennis Wang

Research Subject:

Info. & commun. Technol. (50%)

Tools, technologies & methods (50%)

Research Topic:

Artificial Intelligence (50%)

Med. Instrument. Device & Equip. (50%)

Organisations

People	ORCID iD
Dennis Wang (Principal Investigator)	http://orcid.org/0000-0003-0068-1005
Laura Ferraiuolo (Co-Investigator)
Mauricio Alvarez Lopez (Co-Investigator)	http://orcid.org/0000-0002-8980-4472
Mark Dunning (Researcher)	http://orcid.org/0000-0002-8853-9435

Publications

Kariotis S (2024) Omada: robust clustering of transcriptomes through multiple testing in GigaScience

Leroy A (2024) Prospective prediction of childhood body mass index trajectories using multi-task Gaussian processes in International Journal of Obesity

Ma C (2023) Large scale multi-output multi-class classification using Gaussian processes in Machine Learning

McDonald T (2022) Shallow and Deep Nonparametric Convolutions for Gaussian Processes

McDonald TM (2021) Compositional Modeling of Nonlinear Dynamical Systems with ODE-based Random Features in Advances in Neural Information Processing Systems

Pham T (2024) DeepARV: ensemble deep learning to predict drug-drug interaction of clinical relevance with antiretroviral therapy in npj Systems Biology and Applications

Rajab M (2022) Assessment of Alzheimer-related Pathologies of Dementia Using Machine Learning Feature Selection

Rajab MD (2023) Assessment of Alzheimer-related pathologies of dementia using machine learning feature selection in Alzheimer's Research & Therapy

Rajab MD (2024) Ranking and filtering of neuropathology features in the machine learning evaluation of dementia studies in Brain Pathology

Related Projects

Project Reference	Relationship	Related To	Start	End	Award Value
EP/V029045/1			30/09/2021	29/09/2022	£461,757
EP/V029045/2	Transfer	EP/V029045/1	01/11/2022	31/10/2025	£346,843

Key Findings
Policy Influence
Further Funding
Collaboration
Software and Technical Products
Engagement Activities


Description	Drug response prediction is hampered by uncertainty in how response is measured and in the selection of doses. In this study, we propose a novel approach that uses probabilistic multi-output models to predict drug response at all doses and to uncover its biomarkers. By leveraging genomic features and the chemical properties of drugs, our multi-output Gaussian process (MOGP) models provide a comprehensive understanding of drug efficacy across different dose metrics. This approach was tested across two drug screening studies and five cancer types. It captured underlying response trends and enabled the identification of the EZH2 gene as a novel biomarker of BRAF inhibitor response. We demonstrate that our MOGP models accurately predict dose responses in different cancer types, including when only a limited number of drug screening experiments are available for training. Our findings highlight the potential of MOGP models to enhance drug development pipelines by reducing data requirements and improving the precision of dose-response predictions.
Exploitation Route	Machine learning models can be used by drug developers to assess the effect of investigational treatments in different common cancer types.
Sectors	Pharmaceuticals and Medical Biotechnology
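
The findings above describe MOGP models that predict the response at every dose jointly. One standard way to build such a model, assumed here purely for illustration, is the intrinsic coregionalisation model (ICM): the covariance between the response of sample i at dose d and sample j at dose e is k(x_i, x_j) * B[d, e], where k compares input features and B encodes how strongly responses at different doses co-vary. The sketch below computes the exact ICM posterior with NumPy. The kernel choice, the dose-based form of B, the function names and the synthetic sigmoid responses are all hypothetical, and this is not the project's implementation.

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel between rows of X1 and X2."""
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def icm_predict(X, Y, X_new, B, noise=1e-2, lengthscale=1.0):
    """Exact posterior of an ICM multi-output GP.
    X: (n, p) input features, Y: (n, D) responses at D doses,
    X_new: (m, p) new inputs, B: (D, D) positive-definite dose covariance.
    Returns posterior mean and latent variance, each of shape (m, D)."""
    n, D = Y.shape
    m = X_new.shape[0]
    # Outputs are stacked sample-major (index i*D + d), matching np.kron(Kx, B).
    K = np.kron(rbf(X, X, lengthscale), B) + noise * np.eye(n * D)
    K_star = np.kron(rbf(X_new, X, lengthscale), B)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y.reshape(-1)))
    mean = K_star @ alpha
    V = np.linalg.solve(L, K_star.T)
    # Prior variance of each new output is k(x, x) * B[d, d] = B[d, d] for the RBF kernel.
    var = np.tile(np.diag(B), m) - np.sum(V**2, axis=0)
    return mean.reshape(m, D), var.reshape(m, D)

rng = np.random.default_rng(1)
doses = np.arange(5, dtype=float)            # hypothetical 5-point dose grid
X = rng.normal(size=(8, 3))                  # hypothetical features for 8 cell lines
midpoint = 2.0 + X @ np.array([0.5, -0.3, 0.2])  # hypothetical feature-dependent midpoint
Y = 1.0 / (1.0 + np.exp(doses[None, :] - midpoint[:, None]))  # viability falls with dose

# Neighbouring doses co-vary strongly; the small diagonal term keeps B well conditioned.
B = np.exp(-0.5 * (doses[:, None] - doses[None, :]) ** 2 / 2.0**2) + 0.01 * np.eye(5)

X_new = rng.normal(size=(2, 3))
mean, var = icm_predict(X, Y, X_new, B)
print(np.round(mean, 3))
```

Because every dose shares the same feature kernel, a sample observed at only some doses still informs predictions at the others through B. Exact inference scales cubically in n * D, so larger screens would need approximations not shown here.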


Description	AI for Drug Discovery initiatives in Singapore
Geographic Reach	Asia
Policy Influence Type	Participation in a guidance/advisory committee
URL	https://ai4science.sg/


Description	Grant review panel for clinical starter grants
Geographic Reach	National
Policy Influence Type	Influenced training of practitioners or researchers


Description	Scientific Committee for Multiple Long Term Conditions (NIHR and MRC)
Geographic Reach	National
Policy Influence Type	Participation in a guidance/advisory committee
Impact	Launched a unit within the Alan Turing Institute to support data aggregation and training initiatives that encourage researchers to share data on treating multiple diseases. Research groups highlighted difficulties with drug coding, which my group and the Turing Institute helped to address.


Description	Longitudinal machine learning of molecular and phenotypic trajectories of pulmonary hypertension.
Amount	£934,385 (GBP)
Funding ID	MR/Z505468/1
Organisation	Medical Research Council (MRC)
Sector	Public
Country	United Kingdom
Start	01/2025
End	01/2027


Description	100,000 Genomes Project
Organisation	Genomics England
Country	United Kingdom
Sector	Public
PI Contribution	I have been a research consultant and a member of the Genomics England Clinical Interpretation Partnership (GeCIP) for the neurology, cancer and bioinformatics domains. My team has helped to assess the quality of variant identification by Genomics England in the 100,000 Genomes Project.
Collaborator Contribution	Genomics England has provided >100,000 clinical grade whole genomes across various diseases and a computing platform (Research Environment) to enable us to conduct analysis.
Impact	We have analysed systematic sequencing biases in clinical whole genomes that would affect variants used for genetic diagnosis. This has been published in Freeman et al. https://genome.cshlp.org/content/early/2020/03/10/gr.255349.119
Start Year	2019


Description	Combination therapy development with AstraZeneca
Organisation	AstraZeneca
Country	United Kingdom
Sector	Private
PI Contribution	Develop a machine learning model to predict synergistic drug responses. Test model predictions using published combination screens and internal AZ datasets. Analyze and interpret model predictions for breast, lung, and ovarian cancer drug combinations. Write and publish a manuscript detailing the methodology and results. Improve drug development processes by integrating predictive modeling into R&D.
Collaborator Contribution	Define objectives and overall goals of the research plan. Collate, curate, and format data required for model development. Guide model development and optimization as part of the collaborative team. Establish an internal AZ working group and stakeholders for regular project feedback. Lead biological and pharmacological interpretation of results. Co-lead the development of a manuscript describing the project. Provide overall co-supervision of the project.
Impact	Multi-disciplinary project involving computer science, pharmacology, molecular biology and chemistry.
1. Machine Learning Model
   - A predictive machine learning model for identifying synergistic drug responses at different concentrations.
   - Model performance assessment against preclinical datasets and published combination screens.
   - Benchmarking against other published models.
2. Data Insights & Predictions
   - Identification of drug combinations with dose-specific synergies in breast, lung, and ovarian cancers.
   - Analysis of drug response predictions using preclinical datasets from AstraZeneca.
   - Insights into the biological and pharmacological mechanisms underlying drug synergy.
3. Research Manuscript & Publications
   - A manuscript describing the methodology, model performance, and research findings.
   - Potential journal publication(s) in bioinformatics, computational biology, or oncology research.
4. Contribution to Drug Development
   - Improved understanding of how dose-specific combinations enhance treatment efficacy.
   - Application of dose-response prediction models in pharmaceutical R&D.
   - Potential for optimizing drug development strategies by integrating predictive analytics.
Start Year	2024
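
The collaboration aims to predict synergistic responses at different concentrations, but the record does not say which synergy reference model is used. Bliss independence is one widely used choice, and the sketch below assumes it: the expected combined inhibition of two independently acting drugs is e_A + e_B - e_A * e_B, and synergy is scored as the observed combination effect minus that expectation at every cell of a dose matrix. The function name `bliss_excess` and the 2 x 2 dose matrix are hypothetical.

```python
import numpy as np

def bliss_excess(e_a, e_b, e_ab):
    """Bliss independence excess over a dose matrix.
    e_a: (m,) fractional inhibition of drug A alone at m concentrations,
    e_b: (n,) fractional inhibition of drug B alone at n concentrations,
    e_ab: (m, n) observed inhibition of the combination.
    Positive entries suggest synergy, negative entries antagonism."""
    e_a, e_b, e_ab = (np.asarray(v, dtype=float) for v in (e_a, e_b, e_ab))
    if not all(np.all((v >= 0) & (v <= 1)) for v in (e_a, e_b, e_ab)):
        raise ValueError("inhibition values must lie in [0, 1]")
    # Expected combined effect if the two drugs act independently.
    expected = e_a[:, None] + e_b[None, :] - e_a[:, None] * e_b[None, :]
    return e_ab - expected

# Hypothetical 2 x 2 dose matrix: rows are doses of A, columns are doses of B.
excess = bliss_excess([0.2, 0.5], [0.1, 0.5], [[0.30, 0.60], [0.55, 0.90]])
print(np.round(excess, 3))
```

Here only the highest-dose pair exceeds its Bliss expectation (0.90 observed against 0.75 expected), which is the kind of dose-specific synergy a predictive model would aim to anticipate before running the full combination screen.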


Description	Genotype of Urothelial cancer: Stratified Treatment and Oncological outcomes (GUSTO): Phase II study.
Organisation	Leeds Teaching Hospitals NHS Trust
Country	United Kingdom
Sector	Public
PI Contribution	Bioinformatics lead for a Phase II clinical trial using genomic characterisation of bladder cancer to determine treatment. I advised on the trial design and worked with the commercial company providing the algorithmic diagnosis.
Collaborator Contribution	My partners launched the clinical trial from Sheffield and Leeds Teaching Hospitals. They carried out the recruitment of study participants, governance, ethics, etc.
Impact	Contracts and collaboration agreements signed with AstraZeneca and Veracyte for drugs and diagnostics.
Start Year	2021


Description	Genotype of Urothelial cancer: Stratified Treatment and Oncological outcomes (GUSTO): Phase II study.
Organisation	Sheffield Teaching Hospital
Country	United Kingdom
Sector	Hospitals
PI Contribution	Bioinformatics lead for a Phase II clinical trial using genomic characterisation of bladder cancer to determine treatment. I advised on the trial design and worked with the commercial company providing the algorithmic diagnosis.
Collaborator Contribution	My partners launched the clinical trial from Sheffield and Leeds Teaching Hospitals. They carried out the recruitment of study participants, governance, ethics, etc.
Impact	Contracts and collaboration agreements signed with AstraZeneca and Veracyte for drugs and diagnostics.
Start Year	2021


Title	Machine learning tools for automated transcriptome clustering analysis
Description	Symptomatic heterogeneity in complex diseases reveals differences in molecular states that need to be investigated. However, selecting the numerous parameters of an exploratory clustering analysis in RNA profiling studies requires a deep understanding of machine learning and extensive computational experimentation. Tools that assist with such decisions for users without prior knowledge of the field did not exist, and follow-up gene association analyses had to be performed separately. We have developed a suite of tools that automates these processes and makes robust unsupervised clustering of transcriptomic data more accessible through automated machine learning functions. The effectiveness of each tool was tested on four datasets with different expression signal strengths. Our toolkit's decisions reflected the real number of stable partitions in datasets where the subgroups are discernible. Even in datasets with less clear biological distinctions, it found stable subgroups with different expression profiles and clinical associations.
Type Of Technology	Software
Year Produced	2023
Open Source License?	Yes
Impact	Released through the R Bioconductor project, which is used by thousands of researchers around the world
URL	https://bioconductor.org/packages/release/bioc/html/omada.html
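
The description above concerns automating clustering decisions such as the number of subgroups. omada's own procedure is not reproduced here. As a minimal sketch of the idea, the code below chooses the number of clusters that maximises the mean silhouette width of a k-means partition, a common internal validity criterion assumed here for illustration. All function names and the synthetic two-dimensional data are hypothetical; a real transcriptomic analysis would first normalise and reduce the expression matrix.

```python
import numpy as np

def farthest_first_init(X, k):
    """Deterministic seeding: start at the first sample, then repeatedly add
    the sample that is farthest from every centre chosen so far."""
    idx = [0]
    dist = np.linalg.norm(X - X[0], axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(dist))
        idx.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return X[idx].copy()

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm; returns one cluster label per row of X."""
    centres = farthest_first_init(X, k)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels

def mean_silhouette(X, labels):
    """Mean silhouette width; singleton clusters score 0 by convention."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return 0.0
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # D[i, i] = 0, so divide by n - 1
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

def choose_k(X, k_values=range(2, 7)):
    """Return the k with the highest mean silhouette, plus all scores."""
    scores = {k: mean_silhouette(X, kmeans(X, k)) for k in k_values}
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(3)
# Hypothetical 2-D embedding of 60 samples drawn from three well-separated groups.
X = np.vstack([rng.normal(c, 0.3, size=(20, 2)) for c in [(0, 0), (10, 0), (5, 8.66)]])
best_k, scores = choose_k(X)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

Too few clusters merge distinct groups and inflate within-cluster distances, while too many split a group so that its points sit close to a neighbouring cluster; both lower the silhouette. Real expression data rarely separate this cleanly, which is why combining several criteria and stability checks is advisable.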


Title	Multi-output Gaussian Process Models for Dose-Response Predictions
Description	This repository contains an implementation of a multi-output Gaussian process (MOGP) model for predicting dose-response curves, together with an implementation of a feature relevance determination method based on the Kullback-Leibler divergence.
Type Of Technology	Software
Year Produced	2024
Open Source License?	Yes
Impact	Interest from AstraZeneca to further extend this software for drug combinations, and led to a research collaboration agreement between Imperial College London and AstraZeneca
URL	https://github.com/juanjogg1987/MOGPs_for_Dose_Response_Predictions
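
The repository description mentions feature relevance determination based on the Kullback-Leibler divergence but does not give its formulation. As a minimal sketch of one such measure, assumed here for illustration: perturb a single input feature by a small step delta, compute the KL divergence between the Gaussian predictive distributions before and after the perturbation, and average sqrt(2 * KL) / delta over the data. For brevity the sketch uses a single-output GP with fixed hyperparameters; the class and function names and the synthetic data are hypothetical and do not reproduce the repository's code.

```python
import numpy as np

def rbf(X1, X2, lengthscale):
    """Squared-exponential kernel with unit prior variance."""
    diff = X1[:, None, :] - X2[None, :, :]
    return np.exp(-0.5 * np.sum((diff / lengthscale) ** 2, axis=2))

class SimpleGP:
    """Exact single-output GP regression with fixed (not learned) hyperparameters."""

    def __init__(self, X, y, lengthscale=0.5, noise=1e-2):
        self.X, self.lengthscale = X, lengthscale
        K = rbf(X, X, lengthscale) + noise * np.eye(len(X))
        self.L = np.linalg.cholesky(K)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))

    def predict(self, Xs):
        """Latent posterior mean and variance at the rows of Xs."""
        Ks = rbf(Xs, self.X, self.lengthscale)
        v = np.linalg.solve(self.L, Ks.T)
        mean = Ks @ self.alpha
        var = np.maximum(1.0 - np.sum(v**2, axis=0), 1e-12)  # prior variance is 1
        return mean, var

def gaussian_kl(m1, v1, m2, v2):
    """Elementwise KL(N(m1, v1) || N(m2, v2)) for univariate Gaussians."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def kl_relevance(gp, X, delta=1e-3):
    """Average sensitivity of the predictive distribution to each input feature:
    perturb feature j by delta and score sqrt(2 * KL) / delta at every row of X."""
    m0, v0 = gp.predict(X)
    relevance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        X_shift = X.copy()
        X_shift[:, j] += delta
        m1, v1 = gp.predict(X_shift)
        kl = np.maximum(gaussian_kl(m0, v0, m1, v1), 0.0)  # clip rounding noise
        relevance[j] = np.mean(np.sqrt(2.0 * kl) / delta)
    return relevance

rng = np.random.default_rng(2)
X = rng.uniform(size=(40, 2))     # hypothetical features: only column 0 matters
y = 2.0 * X[:, 0]
gp = SimpleGP(X, y)
rel = kl_relevance(gp, X)
print(np.round(rel, 2))
```

Unlike reading off ARD lengthscales, this score reflects how much the whole predictive distribution (mean and uncertainty) moves when a feature changes, so it can be applied to any model that returns Gaussian predictions.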


Description	ELLIS Summer School on Machine Learning for Healthcare and Biology
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Manchester's European Laboratory for Learning and Intelligent Systems (ELLIS) unit is hosting its second Summer School on 11-13 June 2024, which will bring participants up to speed on the latest methods and technologies in machine learning, with a focus on healthcare and biology. The school includes a set of lectures by renowned researchers working at the intersection of machine learning, healthcare and biology, who have excellent track records of delivering educational content.
Year(s) Of Engagement Activity	2023
URL	https://www.idsai.manchester.ac.uk/connect/events/ellis-summer-school-2024/
