Learning to learn how to design drugs
Lead Research Organisation:
Brunel University London
Department Name: Computer Science
Abstract
Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
Publications
Wang K (2021). NERO: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding. npj Systems Biology and Applications.
Sadawi N (2019). Multi-task learning with a natural metric for quantitative structure activity relationship learning. Journal of Cheminformatics.
Panov P (2014). Ontology of core data mining entities. Data Mining and Knowledge Discovery.
Panov P (2016). Generic ontology of datatypes. Information Sciences.
Orhobor O (2020). Predicting rice phenotypes with meta and multi-target learning. Machine Learning.
Olier I (2018). Meta-QSAR: a large-scale application of meta-learning to drug design and discovery. Machine Learning.
Olier I (2021). Transformational machine learning: Learning how to learn from many related scientific problems. Proceedings of the National Academy of Sciences of the United States of America.
Bandrowski A (2016). The Ontology for Biomedical Investigations. PLoS ONE.
Description | We worked in close collaboration with our project partners at the University of Manchester and the University of Dundee. We also established a close collaboration with the OpenML Team at Eindhoven University of Technology and, within Brunel University, with Dr Crina Grosan, an expert in machine learning. The key outputs are summarised below.
1. Annotation scheme. We have developed a scheme for annotating the machine learning experiments typically used to predict the biological activity of chemical compounds. The scheme consists of descriptors for datasets, machine learners, their predictions, and drug targets. Several formalisms for describing datasets and machine learning algorithms already exist, but they are generic and not tuned to the prediction of biological activity required for drug discovery. For example, the ontology DMOP (Data Mining Optimization), developed within the European e-LICO project (http://www.e-lico.eu/DMOP.html), can describe dataset properties such as the number of data items and feature correlation. However, neither DMOP nor other formalisms provide descriptors that capture, for example, the diversity of a chemical space. A classification of drug targets also exists (see ChEMBL: https://www.ebi.ac.uk/chembl/target/browser), but it captures neither the functionality of targets nor their similarity. Our annotation scheme captures all the essential properties of datasets, predictions, and drug targets. Preliminary work on an annotation scheme for drug discovery has been published as a use case in: Panov, P., Soldatova, L.N., Dzeroski, S. (2014) Ontology of Core Data Mining Entities. Data Mining and Knowledge Discovery 28(5-6): 1222-1265.
2. Software development. We have developed and successfully tested the software infrastructure for running QSAR and meta-QSAR experiments.
3. Software integration. We have integrated the workflow of our project into the OpenML platform (http://openml.org/). OpenML is a popular platform where datasets and the predictions made by various machine learning algorithms are stored, shared, and compared. This open approach to sharing machine learning experiments is designed to save time and effort, as computational experiments need not be repeated. We have agreed with the OpenML Team that the platform will have a section dedicated to drug discovery. In this way the drug discovery research community will benefit from an already established platform and its functionality, such as comparing different predictions across different datasets. We have adapted our software so that our datasets and the results of machine learning runs can be deposited directly to OpenML. We worked together with the developers of the OpenML platform, and they implemented secure access to the stored datasets. Dr Sadawi and Prof. King (the University of Manchester) participated in the OpenML Workshop at Eindhoven University of Technology in October 2014. By the end of the project all data and machine learning predictions will be available at OpenML.
4. QSAR learning. We have carried out initial trials of QSAR and meta-QSAR learning, and we are currently running the main QSAR learning experiments.
5. Meta-QSAR learning. Our meta-QSAR approach has proven more successful than the approaches typically used to predict drug activity: in the majority of cases, meta-QSAR predictions outperformed random forests, support vector machines, and other popular algorithms. We extracted 2,750 targets from ChEMBL, with widely varying numbers of chemical compounds per target. For the meta-learning stage we formulated a classification problem that indicates which QSAR method should be used for a particular QSAR problem.
The meta-level training dataset is formed by meta-features extracted from the datasets at the base learning level; these are based on target properties (hydrophobicity, molecular weight, aliphatic index, etc.) and on information theory (mean, mutual information, entropy, etc.). The hypothesis that there is no single best way to learn QSARs has been confirmed. We have obtained sufficient experimental evidence that meta-QSAR learning correctly suggests, for almost all targets, which QSAR method should be used.
6. Transfer learning. In addition to the planned work, we have worked on transfer learning. We developed a novel transfer learning approach that uses the evolutionary distance between targets to improve standard QSAR learning by drawing on related targets.
Dissemination:
1) Intermediate project results were presented by Dr Soldatova as an oral communication, "Meta QSAR", at the 20th Euro-QSAR conference in St Petersburg, Russia, in September 2014 (http://www.ldorganisation.com/v2/produits.php?langue=english&cle_menus=1238915734). Slides of the presentation are available at the project website: http://www.meta-qsar.org/pubs.html
2) Initial project results were presented by Prof. King at the OpenML Workshop, Eindhoven University of Technology, in October 2014. His talk is available on YouTube: https://www.youtube.com/watch?v=llTppH2zLuE?dex=1&list=PLBZBIkixHEicUl5fE2BQwHTc0GjytF_tH
3) The work on meta-QSAR learning was presented at the MetaSel (Meta-learning & Algorithm Selection) workshop (http://metasel2015.inesctec.pt/) of the ECML PKDD (European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases) conference (http://www.ecmlpkdd2015.org/) in Porto in September 2015.
The extended abstract "Meta-QSAR: learning how to learn QSARs" by Iván Olier, Crina Grosan, Noureddin Sadawi, Larisa Soldatova and Ross King is available in the proceedings (http://ceur-ws.org/Vol-1455/paper-11.pdf), and a video recording of the presentation is available at https://www.youtube.com/watch?v=wb6aOmpp8mQ
4) The work on transfer learning was presented at the BigTargets: Big Multi-Target Prediction workshop (http://www.kermit.ugent.be/big-multi-target-prediction/index.php) of the ECML PKDD conference (http://www.ecmlpkdd2015.org/) in Porto in September 2015. The extended abstract "Multiple Task Learning for Quantitative Structure Activity Relationship Learning: Use of a Natural Metric" by Iván Olier, Crina Grosan, Noureddin Sadawi, Larisa Soldatova and Ross King is available in the proceedings at: http://www.kermit.ugent.be/big-multi-target-prediction/files/abstracts/Sadawi.pdf
5) The paper "Auditing Redundant Import in Reuse of a Top Level Ontology for the Drug Discovery Investigations Ontology" by Zhe He, Christopher Ochs, Larisa N Soldatova, Yehoshua Perl, Sivaram Arabandi and James Geller was presented at ICBO (International Conference on Biomedical Ontology)/VDOS (Vaccine and Drug Ontology Studies) in 2013. The presentation is available at: http://www.columbia.edu/~zh2132/VDOS2013-Zhe-Slides.pdf The paper is available at: http://www2.unb.ca/csas/data/ws/semantic-trilogy-workshops/papers/vdos/vdos2013_submission_4.pdf (cited by 5).
6) A paper on meta-QSAR learning is due to be submitted in April 2016 to a Special Issue on Meta-Learning and Algorithm Selection in the Machine Learning Journal. |
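The meta-QSAR step described above (meta-features computed from each base-level QSAR dataset, then a meta-level classifier that recommends a QSAR method) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names, the binary-fingerprint input, the activity threshold and the method labels are hypothetical; only information-theoretic meta-features are computed (target properties such as hydrophobicity are omitted); and a 1-nearest-neighbour rule stands in for whatever classifier the project actually used at the meta level.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in Counter(zip(xs, ys)).items():
        pxy = c / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def meta_features(fingerprints, activities, threshold):
    """Hypothetical meta-feature vector for one base-level QSAR dataset.

    fingerprints: list of equal-length 0/1 lists (one per compound)
    activities:   measured activity per compound (e.g. pIC50)
    threshold:    assumed cut-off that binarises activity into active/inactive
    """
    active = [int(a >= threshold) for a in activities]
    n_bits = len(fingerprints[0])
    # Average, over fingerprint bits, of the MI between a bit and activity.
    mean_mi = sum(
        mutual_information([fp[j] for fp in fingerprints], active)
        for j in range(n_bits)
    ) / n_bits
    return [
        math.log10(len(activities)),        # dataset size on a log scale
        sum(activities) / len(activities),  # mean activity
        entropy(active),                    # entropy of binarised activity
        mean_mi,                            # mean bit/activity mutual information
    ]


def recommend_method(query, history):
    """1-nearest-neighbour meta-learner (a stand-in for the meta-level classifier).

    history: list of (meta_feature_vector, best_method) pairs from previously
    benchmarked targets; returns the best method of the closest dataset.
    """
    def distance(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    _, best = min(history, key=lambda item: distance(item[0], query))
    return best


# Toy example with made-up numbers; the method labels are hypothetical.
feats = meta_features(
    [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]],
    [7.2, 6.8, 4.1, 5.0],
    threshold=6.0,
)
history = [
    ([0.5, 5.5, 1.0, 0.3], "random_forest"),
    ([2.0, 7.0, 0.2, 0.05], "svm"),
]
print(recommend_method(feats, history))  # random_forest
```

In a realistic setting the meta-features would be normalised before computing distances (here the mean activity dominates the Euclidean distance), and a stronger classifier would be trained on the meta-features of many targets; the 1-NN rule is used only because it needs no external libraries.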
Exploitation Route | Our results will enable academic and commercial laboratories to design better drugs. The problem of how best to learn QSARs is of great industrial and medical importance. Drug development is arguably the most important application of science in the UK. The average cost of bringing a new drug to market is ~£500 million. A successful drug can earn billions of pounds a year, and as patent protection is time-limited, even an extra week of protection can be of great financial significance. The UK (both academia and industry) is a leader in QSAR research and in chemoinformatics generally, as its publication record shows. This project aims to help maintain this lead. |
Sectors | Chemicals,Healthcare,Pharmaceuticals and Medical Biotechnology |
URL | http://www.meta-qsar.org/index.html |
Description | The findings of the project contributed to the methods used by Exscientia (https://www.exscientia.ai/), one of the most prominent AI companies in drug design. |
First Year Of Impact | 2015 |
Sector | Chemicals,Pharmaceuticals and Medical Biotechnology |
Impact Types | Economic |
Title | Closed-loop AI experimentation |
Description | Our work is now causing a revolution in materials science. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2009 |
Provided To Others? | Yes |
Impact | Closed-loop AI experimentation |
Title | QSAR models in OpenML platform |
Description | A collection of QSAR predictive models. All models will be made publicly available at OpenML once the associated paper is published. |
Type Of Material | Computer model/algorithm |
Year Produced | 2017 |
Provided To Others? | No |
Impact | The QSAR models will enable academic and commercial laboratories to design better drugs. |
URL | http://www.openml.org/ |
Description | OpenML |
Organisation | Eindhoven University of Technology |
Country | Netherlands |
Sector | Academic/University |
PI Contribution | We worked together to develop software for depositing datasets and the results of machine learning experiments, and to modify the OpenML platform to suit the needs of our project. Our project will benefit from using this well-established and popular platform. |
Collaborator Contribution | Our project will contribute datasets and models to the OpenML platform. |
Impact | The collaboration is multidisciplinary, involving researchers from biochemistry, software engineering and machine learning. Expected output: a QSAR-specific version of the OpenML platform. |
Start Year | 2013 |
Description | Horizons article |
Form Of Engagement Activity | A magazine, newsletter or online publication |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Ross King was interviewed for the article: Cartlidge, E. "Let the Robots do the tedious work", Horizons (Swiss magazine for scientific research), Vol. 113, pp. 10-11. |
Year(s) Of Engagement Activity | 2017 |
URL | http://www.snf.ch/SiteCollectionDocuments/horizonte/Horizonte_gesamt/SNSF_horizons_113_en.pdf |
Description | Science Museum Antenna Live Event |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | Yes |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | We presented the Robot Scientist and discussed the types of experiments it was capable of undertaking, within the framework of 'how to think like a scientist'. We also ran an interactive (computer simulation) demonstration of drug design in which visitors could work out which features of a compound made it a 'good' or a 'bad' drug. Both activities provoked significant interest and enthusiasm from members of the public of all ages, from 8 to 80! The robot itself sparked broader discussion about the potential uses of a robot scientist, as well as the technicalities of how it operates, whereas the computer-simulated demonstration prompted participants to think about the characteristics sought in drug design, which also generated much discussion. Visitor records for this exhibit show more than 3,500 visitors either 'spectating' or actively 'engaging' with the scientists presenting the robot. We anticipate that the impact of this event will be long-term and ongoing. For example, the event will certainly have increased public awareness of what a Robot Scientist is capable of (i.e. it is more than just a technical operator; it is also capable of thinking like a human scientist), and we saw evidence of increased discussion of this subject among friends and families. We also anticipate increased interest in STEM subjects at secondary and higher education levels, given the curios |
Year(s) Of Engagement Activity | 2015 |
URL | https://www.youtube.com/watch?v=wMIcMrzDgNc |