Accelerating medicine development timelines through new approaches in knowledge extraction from diverse biological data sets

Lead Research Organisation: University of Edinburgh
Department Name: Edinburgh Cancer Research Centre

Abstract

The impact of the time taken to develop a new medicine has never been better understood than in the current COVID situation in the UK. Reducing the time from identification of an interesting compound or vaccine to a marketed medicine is, and has always been, a major driver for pharmaceutical companies. The main focus of this project is to develop new Artificial Intelligence and Machine Learning (AI/ML) analytics tools to speed up the identification of immunological medicines, including small molecules, biologics and vaccines.

One of the biggest challenges in drug development, and one that delays our ability to bring medicines to market quickly, is the high rate of candidate molecule termination due to poor clinical efficacy and poor pre-clinical safety. To reduce this compound attrition, we are implementing disease-relevant, physiological human cellular models in early-stage discovery. We are using these cellular models to identify new therapeutic targets and to screen our pharmacological agents, so that we can prioritise those with a better chance of success and stop projects with efficacy or safety liabilities earlier. These assays often rely on cellular imaging. In recent years, automated microscopy has made it possible to characterise the state and phenotypes of cells at single-cell and even subcellular resolution. Thus, biological diversity can be visualised and the effects of perturbations on cells can be quantified more richly than by almost any other means. Relating this back to the challenge of compound failure, we can use this imaging data to study compound effects in complex human in vitro systems, identifying potential efficacy and safety risks much earlier in drug discovery and prioritising those medicines with a higher chance of success. As well as automated imaging, advances have also been made in studying other high-content data sets such as transcriptomic and proteomic information. These endpoints used to be measured in a screening cascade of assays, all run independently, to optimise our pharmacology. We are now able to multiplex them in the same system, enabling us to pull together a complete fingerprint of cellular activity much faster.
With these models in place, the current challenge in drug discovery is that the complexity of both the cellular models and the data derived from them is often beyond the data analytics technologies we have available. There is an interesting relationship between data volume and understanding. Initially, as you increase the amount of information you have on a system, your understanding increases. Beyond a certain point, however, understanding rapidly decreases and additional data actually causes confusion, as it becomes impossible to interpret.
This project is focused on resolving this bottleneck in two phases. In recent years, AI methods, specifically convolutional deep learning methods, have revolutionised the field of computer vision by achieving performance that is often better than that of humans in image interpretation. In phase one of this project, we intend to use AI methods to build a toolbox of image analytics solutions that will be used across our cellular imaging studies to gather more information from our high-resolution images and transform millions of pixels into parameters. Similar deep learning networks have also demonstrated the ability to explore vast data sets and deliver biological understanding. Thus, in the second phase of this project we will apply AI tools to transform the millions of parameters obtained from all our high-content technologies into features and mechanistic information that can be used to support project decisions.
By utilising the immense power of AI approaches in image analysis and big data analytics we hope to enable a better understanding of both chemical and genetic perturbations, ultimately improving our success rate in the clinic.

Technical Summary

Early-stage drug discovery involves conducting panels of well-validated assays to generate datasets on collections of candidate chemical, biological or genetic entities. Such datasets include metabolomic, proteomic, transcriptomic and image-based cellular profiles, and provide decision-making data on the suitability of a candidate to be moved to clinical phases of development. Historically, these types of datasets have been used in isolation, but growing evidence shows that considering the same datasets in relation to one another through improved machine learning strategies can reduce rates of attrition in drug discovery. This effect is mediated through identification of disease-associated phenotypes and mechanisms in early screening and through greater fidelity in early prediction of efficacy and toxicity. To this end, the secondee will use immunological datasets derived from primary T cell and iPSC-derived macrophage assays to develop ontological frameworks within the GSK R&D information platform, in order to integrate imaging and omics datasets and simultaneously provide a structure suitable for future reuse. The framework will inter-relate previously disparate databases, enabling the secondee to deploy deep learning and machine learning tools to map responses to fingerprints from compounds or disease profiles annotated by in vivo or clinical data. Critically, as the effects of drug candidates are highly dose-dependent, the analytics developed will trace dose across multidimensional space to inform on pathway preferentiality, off-target effects, and action of multiple targets in each therapeutic mechanism. This will provide new ranking metrics for medicinal chemists to optimise a drug series and inform on opportunities for combination screening. The work will provide the secondee with valuable networking opportunities and insights across the pharmaceutical industry through collaboration with units including AI/machine learning and medicinal chemistry.
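
As a rough illustration of the idea of tracing dose across a multidimensional space, the following minimal sketch combines two data views per well, projects them into a shared low-dimensional space, and measures how far each dose moves from vehicle controls. It is not the project's method and does not use the Phenonaut API. All function names, the synthetic data, the choice of PCA, and the distance-from-vehicle metric are assumptions made for illustration only.

```python
import numpy as np

def zscore_to_controls(features, is_control):
    """Standardise each feature column using the control wells' mean and std."""
    mu = features[is_control].mean(axis=0)
    sd = features[is_control].std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)  # guard against constant features
    return (features - mu) / sd

def integrate_views(views, is_control):
    """Standardise each data view separately, then concatenate them per well."""
    return np.hstack([zscore_to_controls(v, is_control) for v in views])

def pca_project(x, n_components=2):
    """Project the rows of x onto the top principal components via SVD."""
    centred = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

def dose_trajectory(embedding, doses):
    """Return the sorted unique doses and the centroid of the wells at each dose."""
    levels = np.unique(doses)
    centroids = np.array([embedding[doses == d].mean(axis=0) for d in levels])
    return levels, centroids

def distance_from_vehicle(centroids):
    """Euclidean distance of each dose centroid from the lowest-dose centroid."""
    return np.linalg.norm(centroids - centroids[0], axis=1)

# Synthetic example (assumption): 4 dose levels, 8 wells each, dose 0 = vehicle.
rng = np.random.default_rng(0)
doses = np.repeat([0.0, 0.1, 1.0, 10.0], 8)
is_control = doses == 0.0
shift = np.log10(1.0 + doses)[:, None]  # simple monotonic dose effect

# Two hypothetical views of the same wells: image-based and transcriptomic features.
imaging = rng.normal(size=(doses.size, 5)) + 3.0 * shift
transcriptomic = rng.normal(size=(doses.size, 4)) + 2.0 * shift

combined = integrate_views([imaging, transcriptomic], is_control)
embedding = pca_project(combined, n_components=2)
levels, centroids = dose_trajectory(embedding, doses)
distances = distance_from_vehicle(centroids)

for dose, dist in zip(levels, distances):
    print(f"dose={dose:g}  distance from vehicle={dist:.2f}")
```

In a sketch like this, the per-dose distance could serve as one simple ranking signal, while the direction of the trajectory in the embedded space might hint at which features, and hence which pathways, a compound engages as dose increases. Real data would need plate normalisation, replicate handling and a more careful choice of embedding.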

Publications

Shave S (2023) Phenonaut: multiomics data integration for phenotypic space exploration. in Bioinformatics (Oxford, England)

Way GP (2023) Evolution and impact of high content imaging. in SLAS discovery : advancing life sciences R & D

 
Title Phenonaut 
Description This is a novel software platform for analysis of multiomic and single-omics datasets 
Type Of Material Improvements to research infrastructure 
Year Produced 2022 
Provided To Others? No  
Impact Phenonaut fills an unmet need in multiomics-enabled phenotypic drug discovery, allowing integration of multiomic data and application of machine learning and data science techniques for phenotypic space exploration, hit calling and prediction. We anticipate Phenonaut will be used by a broad user base to enable complex multiomic data analysis that is robust and audited, supporting both basic and translational research applications and thus filling a significant gap in currently available tools. The software has been presented at conferences and is currently under review in Oxford Bioinformatics prior to full public release of methods and source code. Source code is publicly available on GitHub. 
URL https://github.com/CarragherLab/phenonaut
 
Description GSK staff secondment 
Organisation GlaxoSmithKline (GSK)
Department Research and Development GSK
Country United Kingdom 
Sector Private 
PI Contribution Supervision, datasets and intellectual guidance
Collaborator Contribution Senior staff supervision, computational resources and datasets.
Impact A talk was presented by Dr Steven Shave at the ELRIG Drug Discovery 2022 meeting (4th/5th October, ExCeL London). A manuscript titled "Phenonaut: multiomics data integration for phenotypic space exploration" is currently under review in the journal Oxford Bioinformatics. Source code is publicly available on GitHub: https://github.com/CarragherLab/phenonaut This collaboration is multidisciplinary: cell biology; drug discovery; computational biology; Artificial Intelligence/Machine Learning; software development.
Start Year 2021