Transfer learning of pharmacogenomic information across disease types and preclinical models for drug sensitivity prediction.

Lead Research Organisation: University of Sheffield
Department Name: Neurosciences

Abstract

The failure rate for new drugs entering clinics is in excess of 90%, with more than a quarter of drugs failing due to lack of efficacy. Earlier treatment decisions for complex diseases like lung cancer considered a small number of patient factors and prescribed a fixed treatment regimen for all patients, resulting in severe drug side effects for some and highly-varying outcomes. Recently, personalised treatments have become popular through the discovery and use of genetic markers that can explain a patient's response to a drug. If the goal of personalised medicine is to give the right drug to the right patient, we may be able to combine pharmacogenomics with machine learning to help make better treatment decisions.

Due to the potential waste of testing ineffective drugs on patient cells and animal models in the laboratory, we are motivated to leverage the power of machine learning to predict drug response from a limited number of experiments. We and many others in drug development have used computational methods to learn from drug responses measured in vitro and provide evidence for clinical trials. However, existing machine learning methods perform poorly at predicting drug response in disease types where only a limited number of samples are available. This situation unfortunately arises quite often for rare cancers and other diseases like motor neurone disease (also known as ALS), because there are few patients or their samples are difficult to collect. Overcoming this limitation by extending machine learning to learn from different disease contexts would reduce the time-consuming step of gathering biological resources and so accelerate drug development.

In this project, we will develop machine learning algorithms that take into account all of the dose-response data we have for each drug, even when it has been tested in only a few samples. To overcome the issue of few training cases in a disease, we will develop a transfer learning framework that uses knowledge from other diseases with more drug response data to address the problem in the disease with less data. The algorithms will be developed and tested in five stages: 1) develop a learning model that maps genomic information to drug response in both the disease with more data and the disease with limited data; 2) develop an inference model for predicting drug response in the disease with limited data; 3) apply the learning and inference models, using the relationships between genomic features and drug sensitivity in lung cancer, to predict drug response in bladder cancer; 4) learn from drug responses in cell lines and predict response in mouse tumour models; 5) learn and predict biomarkers that describe a particular drug's sensitivity in both lung cancer and motor neurone disease. Genomic information will be used as the input to the prediction algorithms because it can be reliably measured in the laboratory and in the clinic. The prediction test cases increase in difficulty, but successes in transferring pharmacogenomic information between diseases will highlight opportunities for scientists to leverage existing data sets to solve the challenges of testing a drug in a new disease.
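The transfer idea described above can be illustrated with a deliberately small sketch. It is not the project's framework: it assumes a single hypothetical genomic feature, a linear model without intercept, and a penalty that pulls the weight fitted in the data-poor disease towards the weight learned in the data-rich disease. All data values, the function name `fit_weight` and the parameters `lam` and `prior` are assumptions made for illustration.

```python
def fit_weight(xs, ys, lam=0.0, prior=0.0):
    """Slope w minimising sum((y - w*x)**2) + lam * (w - prior)**2.

    Setting the derivative to zero gives the closed form used below.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return (sxy + lam * prior) / (sxx + lam)


# Hypothetical source disease (many samples): genomic feature -> drug response
source_x = [1.0, 2.0, 3.0, 4.0]
source_y = [2.0, 4.0, 6.0, 8.0]

# Hypothetical target disease (a single sample)
target_x = [1.0]
target_y = [3.0]

# Step 1: learn the feature-response relationship in the data-rich disease
w_source = fit_weight(source_x, source_y)

# Baseline: fit the data-poor disease on its own
w_target_only = fit_weight(target_x, target_y)

# Step 2: fit the data-poor disease while shrinking towards the source weight
w_transfer = fit_weight(target_x, target_y, lam=1.0, prior=w_source)

print(w_source, w_target_only, w_transfer)
```

With these toy numbers the transferred weight (2.5) lies between the source estimate (2.0) and the target-only estimate (3.0). The strength of the penalty, `lam`, controls how much the target model trusts the source disease; in practice it would need to be tuned, for example by cross-validation.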

We are conducting this interdisciplinary study as a team of computer scientists, clinicians and cell biologists with expertise in machine learning, cancer and neuroscience. The end goal is to develop a suite of flexible software tools that the drug development community can readily use to apply transfer learning to many different problems.

Related Projects

Project Reference  Relationship  Related To    Start       End         Award Value
EP/V029045/1                                   30/09/2021  29/09/2022  £461,757
EP/V029045/2       Transfer      EP/V029045/1  01/11/2022  29/04/2025  £346,843
 
Description Drug response prediction is hampered by uncertainty in the measures of response and selection of doses. In this study, we propose a novel approach using probabilistic multi-output models to predict drug response at all doses and uncover their biomarkers. By leveraging genomic features and chemical properties of drugs, our multi-output Gaussian Process (MOGP) models provide a comprehensive understanding of drug efficacy across different dose metrics. This approach was tested across two drug screening studies and five cancer types. It captured underlying response trends and enabled the identification of the EZH2 gene as a novel biomarker of BRAF inhibitor response. We demonstrate the effectiveness of our MOGP models in accurately predicting dose-responses in different cancer types and when there is a limited number of drug screening experiments for training. Our findings highlight the potential of MOGP models in enhancing drug development pipelines by reducing data requirements and improving precision in dose-response predictions.
Exploitation Route Machine learning models can be used by drug developers to assess the effect of investigational treatments in different common cancer types.
Sectors Pharmaceuticals and Medical Biotechnology
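As a rough illustration of predicting response across a range of doses, the sketch below fits a single-output Gaussian process to one hypothetical dose-response curve. It is a simplification, not the multi-output (MOGP) models described above: it uses one drug, one sample, a squared-exponential kernel and no genomic or chemical features. The data values, function names, kernel length scale and noise variance are all assumptions.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_posterior_mean(train_x, train_y, test_x, noise=1e-4, length_scale=1.0):
    """Posterior mean of a zero-mean GP; `noise` is the observation variance."""
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(train_x)]
         for i, a in enumerate(train_x)]
    alpha = solve(K, train_y)  # (K + noise * I)^-1 y
    return [sum(rbf(t, a, length_scale) * w for a, w in zip(train_x, alpha))
            for t in test_x]


# Hypothetical measurements: log10 dose -> cell viability (fraction of control)
train_log_dose = [-2.0, -1.0, 0.0, 1.0]
train_viability = [0.95, 0.85, 0.40, 0.10]

# Centre the responses so that the zero-mean GP assumption is reasonable
mean_viability = sum(train_viability) / len(train_viability)
centred = [v - mean_viability for v in train_viability]

# Predict at the measured doses and at unmeasured doses in between
test_log_dose = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0]
predicted = [mean_viability + m
             for m in gp_posterior_mean(train_log_dose, centred, test_log_dose)]

for dose, value in zip(test_log_dose, predicted):
    print(f"log10 dose {dose:+.1f}: predicted viability {value:.3f}")
```

A multi-output model would additionally share information between doses, drugs or samples through a joint kernel; established Gaussian process libraries provide such models and would normally be preferred over a hand-written solver like this one.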

 
Description Scientific Committee for Multiple Long Term Conditions (NIHR and MRC)
Geographic Reach National 
Policy Influence Type Participation in a guidance/advisory committee
Impact Launched a unit within the Turing Institute to support data aggregation and training initiatives to encourage researchers to share data on treating multiple diseases. Research groups highlighted difficulties with drug coding, which my group and Turing supported.
 
Description 100,000 Genomes Project
Organisation Genomics England
Country United Kingdom 
Sector Public 
PI Contribution I have been a research consultant and a member of the Genomics England Clinical Interpretation Partnership (GeCIP) for the neurology, cancer and bioinformatics domains. My team has helped to assess the quality of variant identification by Genomics England in the 100,000 Genomes Project.
Collaborator Contribution Genomics England has provided >100,000 clinical grade whole genomes across various diseases and a computing platform (Research Environment) to enable us to conduct analysis.
Impact We have analysed systematic sequencing biases in clinical whole genomes that would affect variants used for genetic diagnosis. This has been published in Freeman et al. https://genome.cshlp.org/content/early/2020/03/10/gr.255349.119
Start Year 2019
 
Description Genotype of Urothelial cancer: Stratified Treatment and Oncological outcomes (GUSTO): Phase II study. 
Organisation Leeds Teaching Hospitals NHS Trust
Country United Kingdom 
Sector Public 
PI Contribution Bioinformatics lead for a Phase II clinical trial using genomic characterisation of bladder cancer to determine treatment. I advised on the trial design and worked with the commercial company doing algorithmic diagnosis.
Collaborator Contribution My partners launched the clinical trial from Sheffield and Leeds Teaching Hospitals. They carried out the recruitment of study participants, governance, ethics, etc.
Impact Contracts and collaboration agreements signed with AstraZeneca and Veracyte for drugs and diagnostics.
Start Year 2021
 
Description Genotype of Urothelial cancer: Stratified Treatment and Oncological outcomes (GUSTO): Phase II study. 
Organisation Sheffield Teaching Hospital
Country United Kingdom 
Sector Hospitals 
PI Contribution Bioinformatics lead for a Phase II clinical trial using genomic characterisation of bladder cancer to determine treatment. I advised on the trial design and worked with the commercial company doing algorithmic diagnosis.
Collaborator Contribution My partners launched the clinical trial from Sheffield and Leeds Teaching Hospitals. They carried out the recruitment of study participants, governance, ethics, etc.
Impact Contracts and collaboration agreements signed with AstraZeneca and Veracyte for drugs and diagnostics.
Start Year 2021
 
Title Machine learning tools for automated transcriptome clustering analysis 
Description Symptomatic heterogeneity in complex diseases reveals differences in molecular states that need to be investigated. However, selecting the numerous parameters of an exploratory clustering analysis in RNA profiling studies requires deep understanding of machine learning and extensive computational experimentation. Tools that assist with such decisions without prior field knowledge are nonexistent and further gene association analyses need to be performed independently. We have developed a suite of tools to automate these processes and make robust unsupervised clustering of transcriptomic data more accessible through automated machine learning based functions. The efficiency of each tool was tested with four datasets characterised by different expression signal strengths. Our toolkit's decisions reflected the real number of stable partitions in datasets where the subgroups are discernible. Even in datasets with less clear biological distinctions, stable subgroups with different expression profiles and clinical associations were found. 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact Published in the R Bioconductor library that is used by thousands of researchers around the world 
URL https://bioconductor.org/packages/release/bioc/html/omada.html
 
Description ELLIS Summer School on Machine Learning for Healthcare and Biology 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Manchester's European Laboratory for Learning and Intelligent Systems (ELLIS) unit are hosting their second Summer School on 11-13 June 2024, which will bring participants up to speed on the latest methods and technologies in machine learning with a focus on healthcare and biology. The school includes a set of lectures by renowned researchers who work at the intersection of machine learning, healthcare and biology and who have an excellent track record of delivering educational content.
Year(s) Of Engagement Activity 2023
URL https://www.idsai.manchester.ac.uk/connect/events/ellis-summer-school-2024/