Learning to learn how to design drugs

Lead Research Organisation: Brunel University London

Department Name: Computer Science

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Funded Value:

£173,170

Funded Period:

Oct 13 - Oct 15

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/K030582/1

Principal Investigator:

Larisa Soldatova

Research Subject:

Info. & commun. Technol. (70%)

Tools, technologies & methods (30%)

Research Topic:

Artificial Intelligence (40%)

Bioinformatics (30%)

Information & Knowledge Mgmt (30%)

Organisations

People	ORCID iD
Larisa Soldatova (Principal Investigator)	http://orcid.org/0000-0001-6489-3029

Publications


Wang K (2021) NERO: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding. in npj Systems Biology and Applications

Sadawi N (2019) Multi-task learning with a natural metric for quantitative structure activity relationship learning. in Journal of Cheminformatics

Panov P (2014) Ontology of core data mining entities. in Data Mining and Knowledge Discovery

Panov P (2016) Generic ontology of datatypes. in Information Sciences

Orhobor O (2020) Predicting rice phenotypes with meta and multi-target learning. in Machine Learning

Olier I (2018) Meta-QSAR: a large-scale application of meta-learning to drug design and discovery. in Machine Learning

Olier I (2021) Transformational machine learning: Learning how to learn from many related scientific problems. in Proceedings of the National Academy of Sciences of the United States of America

Bandrowski A (2016) The Ontology for Biomedical Investigations. in PLoS ONE

Key Findings
Impact Summary
Research Databases and Models
Research Tools and Methods
Collaboration
Engagement Activities


Description	We worked in close collaboration with our project partners from the University of Manchester and the University of Dundee. We also established close collaborations with the OpenML team at Eindhoven University of Technology and, within Brunel University, with Dr Crina Grosan, an expert in machine learning. Below is a summary of the key outputs.
1. Annotation scheme. We have developed a scheme for annotating machine learning experiments of the kind typically used to predict the biological activities of chemical compounds. The annotation scheme consists of descriptors for datasets, machine learners, their predictions, and drug targets. While several formalisms for describing datasets and machine learning algorithms already exist, they are generic and not tuned for the prediction of biological activities needed for drug discovery. For example, the ontology DMOP (Data Mining Optimization), developed within the European e-LICO project (http://www.e-lico.eu/DMOP.html), is sufficient to describe such properties of a dataset as the number of data items, feature correlation, etc. However, DMOP and other formalisms do not provide descriptors that capture, for example, the diversity of a chemical space. There is also a classification of drug targets (see ChEMBL: https://www.ebi.ac.uk/chembl/target/browser), but it does not capture the functionality of targets or their similarity. Our annotation scheme captures all the essential properties of datasets, predictions, and drug targets. The results of preliminary work on an annotation scheme for drug discovery have been published as a use case in: Panov, P., Soldatova, L.N., Dzeroski, S. (2014) Ontology of Core Data Mining Entities. J. of Data Mining and Knowledge Discovery. 28/5-6: 1222-1265.
2. Software development. We have developed and successfully tested the software infrastructure to run QSAR and meta-QSAR experiments.
3. Software integration. We have integrated the workflow of our project into the OpenML platform (http://openml.org/). OpenML is a popular platform where datasets and the predictions made by various machine learning algorithms are stored, shared and compared. This open approach to sharing machine learning experiments is designed to save time and effort, as there is no need to repeat computational experiments. We have agreed with the OpenML team that the platform will have a section dedicated to drug discovery. In this way the drug discovery research community will benefit from this already established platform and its functionality (i.e. the comparison of different predictions across different datasets). We have adjusted our software to enable the deposition of our datasets and the results of machine learning runs directly to OpenML. We have worked together with the developers of the OpenML platform, who implemented secure access to the stored datasets. Dr Sadawi and Prof. King (the University of Manchester) participated in the OpenML Workshop at Eindhoven University of Technology in October 2014. By the end of the project all data and machine learning predictions will be available at OpenML.
4. QSAR learning. We have carried out initial trials of QSAR and meta-QSAR learning. We are currently running the main QSAR learning experiments.
5. Meta-QSAR learning. Our meta-QSAR approach has proven to be more successful than the typical approaches used in drug activity prediction. The meta-QSAR predictions have outperformed random forests, support vector machines, and other popular algorithms in the majority of cases. We extracted 2,750 targets from ChEMBL, with widely varying numbers of chemical compounds per target. For the meta-learning stage we formulated a classification problem whose output indicates which QSAR method should be used for a particular QSAR problem. The dataset for the meta-learning stage is formed from meta-features extracted from the datasets of the base learning level; these are based on target properties (hydrophobicity, molecular weight, aliphatic index, etc.) and on information theory (mean, mutual information, entropy, etc.). The hypothesis that there is no single best way to learn QSARs has been confirmed. We have obtained sufficient experimental evidence that meta-QSAR learning correctly suggests, for almost all targets, which QSAR method should be used.
6. Transfer learning. In addition to the planned work, we have worked on transfer learning. We developed a novel transfer learning approach that uses the evolutionary distance between targets to improve standard QSAR learning through the use of related targets.
Dissemination:
1) Intermediate project results were presented by Dr Soldatova at the 20th Euro-QSAR conference in St Petersburg, Russia, in September 2014 as an oral communication, "Meta QSAR" (http://www.ldorganisation.com/v2/produits.php?langue=english&cle_menus=1238915734). Slides of the presentation are available at the project website: http://www.meta-qsar.org/pubs.html
2) Initial project results were presented by Prof. King at the OpenML Workshop at Eindhoven University of Technology in October 2014. His talk is available on YouTube: https://www.youtube.com/watch?v=llTppH2zLuE?dex=1&list=PLBZBIkixHEicUl5fE2BQwHTc0GjytF_tH
3) The work on meta-QSAR learning was presented at the ECML PKDD (European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases) conference (http://www.ecmlpkdd2015.org/), at the MetaSel - Meta-learning & Algorithm Selection workshop (http://metasel2015.inesctec.pt/), in Porto in September 2015. The extended abstract "Meta-QSAR: learning how to learn QSARs" by Iván Olier, Crina Grosan, Noureddin Sadawi, Larisa Soldatova and Ross King is available in the Proceedings at: http://ceur-ws.org/Vol-1455/paper-11.pdf A video recording of the presentation is available at: https://www.youtube.com/watch?v=wb6aOmpp8mQ
4) The work on transfer learning was presented at the ECML PKDD conference (http://www.ecmlpkdd2015.org/), at the BigTargets: Big Multi-Target Prediction workshop (http://www.kermit.ugent.be/big-multi-target-prediction/index.php), in Porto in September 2015. The extended abstract "Multiple Task Learning for Quantitative Structure Activity Relationship Learning: Use of a Natural Metric" by Iván Olier, Crina Grosan, Noureddin Sadawi, Larisa Soldatova and Ross King is available in the Proceedings at: http://www.kermit.ugent.be/big-multi-target-prediction/files/abstracts/Sadawi.pdf
5) The paper "Auditing Redundant Import in Reuse of a Top Level Ontology for the Drug Discovery Investigations Ontology" by Zhe He, Christopher Ochs, Larisa N Soldatova, Yehoshua Perl, Sivaram Arabandi and James Geller was presented at ICBO (International Conference on Biomedical Ontology) / VDOS (Vaccine and Drug Ontology Studies) in 2013. The presentation is available at: http://www.columbia.edu/~zh2132/VDOS2013-Zhe-Slides.pdf The paper is available at: http://www2.unb.ca/csas/data/ws/semantic-trilogy-workshops/papers/vdos/vdos2013_submission_4.pdf (cited by 5).
6) A paper about meta-QSAR learning is due to be submitted in April 2016 to a Special Issue on Meta-Learning and Algorithm Selection in the Machine Learning journal.
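The information-theoretic meta-features named in the description (entropy, mutual information) can be illustrated with a short sketch. It is only a minimal illustration, assuming discretised descriptor and activity-class columns. The function names `entropy` and `mutual_information`, the meta-feature names and the toy data are hypothetical and are not taken from the project's software.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    # H(X) = sum over x of p(x) * log2(1 / p(x))
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have equal length")
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Toy, made-up data: one binary fingerprint bit and an activity class
# for eight hypothetical compounds of a single hypothetical target.
fingerprint_bit = [1, 1, 1, 1, 0, 0, 0, 0]
activity_class = ["active", "active", "active", "inactive",
                  "inactive", "inactive", "inactive", "active"]

# Hypothetical meta-feature vector for this one dataset.
meta_features = {
    "class_entropy": entropy(activity_class),
    "bit_entropy": entropy(fingerprint_bit),
    "bit_class_mutual_information": mutual_information(
        fingerprint_bit, activity_class
    ),
}
```

In a meta-learning setting, one such vector would be computed per base-level dataset and combined with target properties; how the project aggregated these values over many descriptors is not stated in this record.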
Exploitation Route	Our results will enable better drug design by academic and commercial laboratories. The problem of how best to learn QSARs is of great industrial and medical importance. Drug development is arguably the most important application of science in the UK. The average cost of bringing a new drug to market is ~£500 million. A successful drug can earn billions of pounds a year and, because patent protection is time-limited, even an extra week of protection can be of great financial significance. The UK (both academia and industry) is a leader in QSAR research and in chemoinformatics in general, as its publication record shows. This project aims to help maintain this lead.
Sectors	Chemicals,Healthcare,Pharmaceuticals and Medical Biotechnology
URL	http://www.meta-qsar.org/index.html
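The meta-learning stage described in the key findings above treats the choice of QSAR method as a classification problem over dataset meta-features. A minimal sketch of that idea follows. It assumes a 1-nearest-neighbour meta-classifier purely for illustration, since the record does not say which meta-learner the project used. The meta-feature columns, their values, the method labels and the names `META_DATASET` and `recommend_method` are all invented for this example.

```python
import math

# Each row: meta-features of one hypothetical QSAR dataset (one target), plus
# the base learner assumed to have performed best on it. All values are invented.
# Columns: (n_compounds, class_entropy, target_molecular_weight_kDa)
META_DATASET = [
    ((120.0, 0.95, 45.0), "random_forest"),
    ((150.0, 0.90, 50.0), "random_forest"),
    ((2400.0, 0.60, 110.0), "svm"),
    ((2600.0, 0.55, 120.0), "svm"),
    ((40.0, 0.99, 30.0), "knn"),
]


def _column_ranges(rows):
    """Return (min, max) per meta-feature so features can be scaled to [0, 1]."""
    return [(min(col), max(col)) for col in zip(*rows)]


def _scale(row, ranges):
    """Min-max scale one meta-feature vector using precomputed ranges."""
    return tuple(
        0.0 if hi == lo else (value - lo) / (hi - lo)
        for value, (lo, hi) in zip(row, ranges)
    )


def recommend_method(meta_features, meta_dataset=META_DATASET):
    """Recommend a QSAR method via 1-nearest-neighbour in meta-feature space."""
    ranges = _column_ranges([features for features, _ in meta_dataset])
    query = _scale(meta_features, ranges)
    best_label, best_distance = None, math.inf
    for features, label in meta_dataset:
        distance = math.dist(query, _scale(features, ranges))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


# A new, hypothetical target with many compounds and a heavy protein.
suggestion = recommend_method((2500.0, 0.58, 115.0))
```

On this toy table the call above returns `"svm"`, because the query lies closest to the two large datasets. A realistic meta-learner would be trained on thousands of targets and validated against simply always choosing the single best base method.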


Description	The findings of the project contributed to the methods used by Exscientia (https://www.exscientia.ai/), one of the most prominent AI companies in drug design.
First Year Of Impact	2015
Sector	Chemicals,Pharmaceuticals and Medical Biotechnology
Impact Types	Economic


Title	Closed-loop AI experimentation
Description	Our work is now causing a revolution in materials science.
Type Of Material	Improvements to research infrastructure
Year Produced	2009
Provided To Others?	Yes
Impact	Closed-loop AI experimentation


Title	QSAR models in OpenML platform
Description	A collection of QSAR predictive models. All models will be made publicly available at OpenML once the accompanying paper has been published.
Type Of Material	Computer model/algorithm
Year Produced	2017
Provided To Others?	No
Impact	QSAR models will enable the better design of drugs by academic and commercial laboratories
URL	http://www.openml.org/


Description	OpenML
Organisation	Eindhoven University of Technology
Country	Netherlands
Sector	Academic/University
PI Contribution	We worked together on the development of software for depositing datasets and the results of machine learning experiments, modifying the OpenML platform to suit the needs of our project. Our project will benefit from the use of this well-established and popular platform.
Collaborator Contribution	Our project will contribute datasets and models to the OpenML platform.
Impact	The collaboration is multidisciplinary: it involves researchers from biochemistry, software engineering and machine learning. Expected output: a QSAR-specific version of the OpenML platform.
Start Year	2013


Description	Horizons article
Form Of Engagement Activity	A magazine, newsletter or online publication
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Ross King was interviewed for the article: Cartlidge, E. "Let the Robots do the tedious work", Horizons (Swiss magazine for Scientific Research), Vol. 113, pp. 10-11.
Year(s) Of Engagement Activity	2017
URL	http://www.snf.ch/SiteCollectionDocuments/horizonte/Horizonte_gesamt/SNSF_horizons_113_en.pdf


Description	Science Museum Antenna Live Event
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	Yes
Geographic Reach	National
Primary Audience	Public/other audiences
Results and Impact	We presented the Robot Scientist and discussed the types of experiments it was capable of undertaking, within the framework of 'how to think like a scientist'. We also had an interactive (computer simulation) demonstration of drug design in which visitors could ascertain what features of a compound rendered it as a 'good' or a 'bad' drug. Both activities provoked significant interest and enthusiasm from members of the public of all ages, from 8 to 80! The robot itself sparked more general discussion about the potential uses of a robot scientist, as well as the technicalities of how it operates, whereas the computer simulated demonstration enabled those who took part to think about the characteristics looked for in drug design, which also generated much discussion. Visitor records to this specific exhibit recorded more than 3500 visitors either 'spectating' or actually 'engaging' with the scientists presenting the robot. We anticipate that impact from this event will be long-term and on-going. For example, the event will certainly have increased public awareness as to what a Robot Scientist is capable of (i.e. it is more than just a technical operator, but is also capable of thinking like a human scientist), and we saw evidence of increased discussion around this subject between friends and families. We also anticipate increased interest in STEM subjects at secondary and higher education levels, given the curios
Year(s) Of Engagement Activity	2015
URL	https://www.youtube.com/watch?v=wMIcMrzDgNc
