Symbolic Support for Scientific Discovery in Systems Biology

Lead Research Organisation: University of Bristol
Department Name: Computer Science

Abstract

The aim of this proposal is to advance the UK's world-leading research on the automation of science by developing novel Artificial Intelligence (AI) support for an existing laboratory robot called Eve (whose predecessor Adam was popularised by Time Magazine and Science in 2009). The purpose of this project is to develop a new logic-based reasoning tool that will allow robots to correct errors in their knowledge. Unlike prior work aimed at extending knowledge that is incomplete, we argue such machines also need the ability to revise knowledge that is incorrect. Indeed, we suggest the capacity to make (and learn from) mistakes is an indispensable part of scientific reasoning. Thus our goals are to realise this ability in a software system for automating intelligent inference about scientific theories and experiments and to demonstrate its benefit in a genuine application of Eve. We believe this will pave the way to a new era in which Robot Scientists will be more productive, more cost-effective, and better able to assist humans in all parts of the scientific method.

This project is based on the hypothesis that ground-breaking advances in a field of AI known as Answer Set Programming (ASP) can be used to develop a novel form of (multi-semantic meta-logical) reasoning that will give Robot Scientists the ability to continuously revise and extend their knowledge. Evidence to support this claim is provided by two preliminary studies which link the applicant's previous work on the integration of abductive and inductive inference with the robot Adam and a leading ASP system called Clasp. The first study showed how a combination of non-monotonic and non-deductive logic can be used to revise a state-of-the-art model of yeast metabolism in order to fit data seen by Adam; but it also showed that a further combination of meta-logical and multi-semantic logic was needed to design new experiments for testing the proposed revisions. The second study suggests how a combination of features recently included in Clasp can be used to do this. Hence this proposal affords a timely opportunity to draw together and build upon these complementary strands of research in a way that will open the door to exciting new avenues for scientific discovery in systems biology.

The most direct beneficiaries of this work will be our collaborators in the Robot Scientist group (now at the University of Manchester) as our software will enable their robot to correct mistakes in its knowledge and thereby allow the continual evolution of scientific models through many cycles of analysis and experiment. This will represent a major step towards Robot Scientists that participate more effectively in science. By making our tools portable we hope to facilitate their application in other tasks that will benefit from their enhanced reasoning abilities. These tasks include planned follow-on work in the modelling of social insect behaviour (previously studied by our research group) and the automation of some aspects of legal reasoning (recently formalised in argumentation theory). We also plan to study probabilistic extensions of this research that can be built on the logical foundations we will lay. Once our system has been deployed on the Robot Scientist, we also hope to use data generated by planned applications of Eve in high-throughput drug screens to improve our understanding of living organisms.

Planned Impact

Although the main goals of this project and its most immediate extensions are primarily of an academic nature, their successful realisation could provide the basis for longer-term contributions to both industry and society. Since our work involves the automation of the abstract mechanisms of scientific inference (for automatically correcting errors in formalised knowledge and for designing experiments to test those corrections) it has the potential to be incorporated into a wide range of different scientific fields. We chose to focus initially on the domain of symbolic systems biology to exploit the expertise of our collaborators and facilitate future integration into pharmaceutical applications. This is because the Robot Scientist Eve has been specifically designed to support high-throughput drug screens that are likely to be of both medical and scientific interest; and while they are mainly intended to identify leads for new candidate drugs, such screens will also generate large amounts of data that our methods could potentially use to identify and correct errors in current biological models of the underlying organisms. We believe the possibility of interleaving medical and scientific investigations in this way could represent a major potential benefit of Robot Scientists that our work will help to realise.

In general, since industrial progress often depends upon scientific progress, it follows that any means of improving the efficiency of the scientific method (such as Robot Scientists with the ability to automate the intellectual aspects of science as well as the experimental ones) could have significant benefits. At the University of Bristol we have plans to exploit our reasoning methods for the automation of certain types of legal contractual reasoning (which could have subsequent applications in e-commerce) and the study of social insect behaviour (which could have long-term implications in agriculture). We see our work as part of a growing trend in which formal methods from computer science are increasingly adopted into the practice of science. This trend has been publicised by the "2020 Science Group", commissioned by Microsoft Research in 2006 to explore key scientific challenges over the next decade. Given the recent uptake of experimental automation in both science and industry, now is the time to invest in intelligent methods for scientific reasoning so that the experimental and analytical aspects can be synergistically integrated in a way that truly addresses the challenges of 21st century science.

Many of these themes have been explored in a series of workshops organised by Dr Ray on the role of abduction and induction in artificial intelligence and their application to science (AIAI'05/06/07/09). The proceedings of the most recent edition (http://www.cs.bris.ac.uk/~oray/AIAI09/), held at the International Joint Conference on AI in 2009, reveal a range of related issues and applications that the proposed research could benefit. These include scientific policy (e.g. Bradley's invited talk on open notebook science), publication models (e.g. Poole's paper on semantic science), and a spectrum of uses ranging from basic science (e.g. Brodaric's paper on geological knowledge evolution) to national security (e.g. Josephson's invited talk on military situation awareness). Last year, by extending the scope of these earlier workshops, the author organised the 1st International Symposium on Symbolic Systems Biology - a new field at the intersection of formal methods and systems biology - which reveals more potential applications in synthetic biology (e.g. Phillips's work on cell programming) and agriculture (e.g. Muggleton's work on food webs). All of these formal approaches can benefit from the type of logical inference this proposal seeks to develop.
 
Description This research developed an open-source reasoning approach called CATHEXIS (Computational Assistance for THeories and EXperiments In Science) that integrates the main types of logical inference used for scientific reasoning within a system called XHAIL (eXtended Hybrid Abductive Inductive Learning). Our methods were applied to the automatic revision of metabolic networks using data acquired by a Robot Scientist developed by our project collaborators. Our tool was able to correct an error in a state-of-the-art logical model of yeast metabolism. It was also incorporated into an active learning system called HUGINN, which integrates experiment design with hypothesis generation in a closed scientific loop. HUGINN extends previous work both in the type of hypotheses it can generate and in the range of experiments it can reason about.
Exploitation Route Although we have succeeded in showing the effectiveness of our methods in simulated scientific experiments, we would like to incorporate them on board the actual Robot Scientist as it physically performs experiments in real time. This was not possible during the project because the robot hardware was unavailable and because we first wanted to obtain some guarantees that the scientific model would converge while the system kept a lid on experimental costs. Although we have succeeded in integrating our tools into a continuous cycle of scientific investigation, we did so using a different method of reasoning about collections of competing hypotheses than originally proposed. We still believe it is worth investigating the meta-logical reasoning approaches to this problem that we originally envisaged. In the course of this work we discovered that many existing methods for modelling metabolic networks with inhibitory or competitive interactions produce counter-intuitive results when those networks contain cycles. We are still working on better ways of modelling such networks.
Sectors Pharmaceuticals and Medical Biotechnology, Security and Diplomacy

URL https://sites.google.com/site/cathexisxhail/home
 
Title Prototype implementation of XHAIL in Java 
Description Java implementation of XHAIL (eXtended Hybrid Abductive Inductive Learning), a nonmonotonic ILP (Inductive Logic Programming) system that combines deductive (consequence-based), abductive (assumption-based) and inductive (generalisation-based) inference within a common logical framework.
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact Our XHAIL prototype implementation has been extended by other groups as the basis of several subsequent projects, such as:
* Improving Scalability of Inductive Logic Programming via Pruning and Best-Effort Optimisation, by Mishal Kazmi, Peter Schüller, Yücel Saygin, in Expert Systems with Applications 87 (2017) 291-303. The authors forked our code (see https://github.com/knowlp/XHAIL) after stating "we also therefore opted to base our research on XHAIL due to it being the most robust tool for our task" (p.294).
* Incremental and Iterative Learning of Answer Set Programs from Mutually Distinct Examples, by Arindam Mitra, Chitta Baral, in Theory and Practice of Logic Programming 18 (3-4) 623-637, 2018. The authors forked our code (https://github.com/ari9dam/XHAIL) after stating "XHAIL...plays a crucial role in the algorithm we present here" (p.627).
URL https://github.com/stefano-bragaglia/XHAIL
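
The three inference types named in the XHAIL description above (deductive, abductive and inductive) can be illustrated with a minimal sketch. This is not XHAIL's algorithm and does not use its API: XHAIL is a nonmonotonic system built on Answer Set Programming, whereas the toy below uses only monotonic propositional rules and brute-force search. All names (deduce, abduce, induce, and the atoms substrate, enzyme, uptake, product, growth) are hypothetical and chosen purely for illustration.

```python
from itertools import combinations

# A rule is (head, body): the head holds when every atom in the body holds.
# Toy illustration only; not XHAIL's algorithm or API.

def deduce(facts, rules):
    """Deduction: forward-chain the rules from the facts to a fixpoint."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in known and set(body) <= known:
                known.add(head)
                changed = True
    return known

def abduce(facts, rules, abducibles, observation):
    """Abduction: smallest sets of assumable atoms that explain the observation."""
    for size in range(len(abducibles) + 1):
        found = [set(combo) for combo in combinations(sorted(abducibles), size)
                 if observation in deduce(set(facts) | set(combo), rules)]
        if found:
            return found
    return []

def induce(scenarios, rules, candidates):
    """Induction: candidate rules that agree with every labelled scenario."""
    return [rule for rule in candidates
            if all((target in deduce(facts, rules + [rule])) == label
                   for facts, target, label in scenarios)]

# Hypothetical toy pathway: product is made enzymatically or taken up directly.
RULES = [("product", ("substrate", "enzyme")),
         ("product", ("uptake",)),
         ("growth", ("product",))]

# Deduction: what follows when substrate and enzyme are both present?
print(deduce({"substrate", "enzyme"}, RULES))

# Abduction: growth was observed with only substrate known; what could explain it?
print(abduce({"substrate"}, RULES, {"enzyme", "uptake"}, "growth"))

# Induction: which candidate rule for product fits the labelled observations?
BASE = [("growth", ("product",))]
CANDIDATES = [("product", ("substrate", "enzyme")),
              ("product", ("substrate",)),
              ("product", ("enzyme",))]
SCENARIOS = [({"substrate", "enzyme"}, "growth", True),
             ({"substrate"}, "growth", False),
             ({"enzyme"}, "growth", False)]
print(induce(SCENARIOS, BASE, CANDIDATES))
```

In this sketch abduction proposes missing facts for a single observation, while induction selects a general rule that holds across several observations; a system of the kind described above combines both within one logical framework rather than running them as separate searches.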