Data-based optimal control of synthetic biology gene circuits

Lead Research Organisation: Imperial College London
Department Name: Bioengineering

Abstract

Synthetic Biology aims at the engineering of biological systems. Its most prominent application is the rational modification or (re-)design of living organisms, ideally in a way akin to the engineering of man-made devices, for their efficient use in sectors such as energy, biomedicine, drug production and food technology. The availability of control mechanisms that can ensure robust and optimal operation of engineered systems is one of the key factors behind the tremendous advances in engineering fields such as transportation, industrial production and energy. However, accurately controlling engineered biosystems typically requires overcoming two important hurdles: uncertainty and noise. Uncertainty arises from the large number of components that interact in a nonlinear (and often unknown) manner, which often makes it extremely hard to build accurate mathematical models of their behaviour. Noise, on the other hand, is ubiquitous in cellular systems, since the environmental conditions in which they operate typically vary unpredictably and gene expression is an inherently stochastic process.

In this research, we investigate the possibility of automatically learning to optimally control synthetic biology gene networks from input-output data collected from these gene networks, i.e., without using a mathematical model built a priori. In particular, we will develop algorithms that allow computer-based systems to autonomously learn how to vary the inputs of a given system so as to optimise its performance, defined in terms of the time evolution of its measured outputs. The control strategies learned by our methods will take into account noise and uncertainties in the data and will be designed to be robust with respect to them. Such data-based strategies are analogous to, for example, the way we drive our cars: without any a priori mathematical model of the car's behaviour on the road, we can effectively learn how and when to steer, accelerate and brake (inputs) based on our observations of the car's position and velocity on the road (outputs) so as to, for example, minimise our lap time around an unknown track using appropriate input-scheduling strategies.
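One common way to make "performance defined in terms of the time evolution of the measured outputs" precise is shown below. This is an illustrative assumption, since the abstract does not commit to a particular criterion. The measured output y_t is treated as the state (assuming it carries enough information to act on), u_t is the applied input, r is a user-chosen reward (for example r(y_{t+1}, u_t) = -(y_{t+1} - y*)^2 to track a hypothetical target level y*), and γ is a discount factor. Data-based methods such as fitted Q iteration approximate the action-value function Q* directly from sampled transitions (y_t, u_t, r_t, y_{t+1}), without a model of the dynamics:

```latex
\begin{aligned}
J(\pi) &= \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(y_{t+1}, u_t)\Big],
  \qquad u_t = \pi(y_t), \quad 0 \le \gamma < 1,\\
Q^{\ast}(y, u) &= \mathbb{E}\Big[\, r(y', u) + \gamma \max_{u'} Q^{\ast}(y', u') \;\Big|\; y, u \Big],
  \qquad \pi^{\ast}(y) = \arg\max_{u} Q^{\ast}(y, u).
\end{aligned}
```

Here y' denotes the output measured one step after applying input u in output state y. The optimal input-scheduling strategy is the feedback law π*, which picks, at each measured output, the input with the largest estimated Q value.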

The algorithms we will develop will allow users to define the desired behaviour and performance objectives and will compute input-scheduling strategies that allow these objectives to be satisfied. The project will build on methods that I have developed and successfully applied to the optimal control of nonlinear systems in noisy environments, e.g., my work on data-based optimal drug scheduling for HIV-infected patients. The use of such purely data-based optimal control methods is particularly important in synthetic biology applications, where the system to be controlled is typically poorly characterised and model uncertainties prevail, yet large amounts of high-throughput input-output data are available or can be extracted. To showcase the potential of these computational techniques, we will develop data-based methods to optimally control two landmark synthetic biomodules: the light-inducible genetic toggle switch and the light-inducible generalised repressilator, both of which are currently being implemented in my host Department.

Planned Impact

* Impact on Society: One of the main goals of Synthetic Biology is the engineering of biology. Its considerable anticipated impact in biotechnology, energy production, food technology, biomedicine and society in general has been acknowledged by UK, EU and US policymakers.

The availability of control mechanisms that can ensure robust and optimal operation of controlled systems is one of the key factors behind the tremendous advances in engineering fields like transportation, industrial production and energy. However, unlike with these traditional engineering technologies, we are still unable to externally control synthetic biology circuits so as to optimise their behaviour in the presence of uncertainty. This inability is a major bottleneck in the development of Synthetic Biology as a true engineering discipline.

This project will narrow this gap and bring Synthetic Biology technologies one step closer to real-world applications. It will work at a foundational level by delivering a computational framework that can be adopted in a broad range of areas. The project will have an important impact on the development of Synthetic Biology and will lay the groundwork for future applications that require optimal control of gene networks. This will open up avenues for new research and development for the years to come.

* Impact on Knowledge: There is a recent trend in the Control Engineering community to focus on problems from the biosciences. This trend is having an enormous impact on Control Theory and Biology, providing a new class of theoretical problems and practical solutions motivated by the inherent complexity of intracellular systems. This project is in line with this trend and lies at the interface between Control Theory and biological systems. It will stimulate the development of new data-based optimal control methods and will place major emphasis on the role of Control Engineering in biological applications.

* Impact on Economy: The biotechnology industry constitutes the main target for economic impact. With large amounts of money spent on developing bacteria capable of efficiently producing biofuels, commodity chemicals, pharmaceuticals, etc., the economic impact of efficient and robust optimal control solutions can be very large. The increasing availability of large quantities of input-output data opens avenues for the systematic investigation of methods capable of inferring input-scheduling strategies that achieve optimal dynamical behaviour at the observed outputs of the considered systems. The multidisciplinary knowledge developed through this project will thus allow us to contribute to the excellence of control engineering and synthetic biology research in the UK and worldwide.

* Impact on research communities: All participants will benefit greatly from the project. The RA will further develop his or her expertise in several areas of optimal control, machine learning and cellular biology and will be directly exposed to leading research in Synthetic Biology. The project will also give the RA the opportunity to deepen his or her understanding of the experimental techniques used to measure the input-output data used in the project, and of the details of the synthetic biology gene networks chosen as benchmark systems. My colleagues at the Department of Bioengineering and the Centre for Synthetic Biology and Innovation at Imperial College and, more broadly, the Synthetic Biology and Control Engineering communities will also benefit from the results of this project through the engineering methods it will provide. In particular, this work will serve as a pioneering step to showcase the potential and usefulness of directly using input-output data to infer optimal control strategies for Synthetic Biology systems.

Publications

Sootla Aivar (2013) Toggling a Genetic Switch Using Reinforcement Learning in arXiv e-prints

Wei Pan (2014) Inference of Switched Biochemical Reaction Networks Using Sparse Bayesian Learning in Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2014)

 
Description Through this research we have investigated the theoretical development of data-based algorithms for the automatic learning and control of the dynamics of biosystems (gene regulatory networks in particular) in engineered cells. We have focused on inducible genetic systems, for which the input triggering gene expression can be provided either as chemical induction (e.g., through bacterial inducible promoters) or as light-pulse induction (e.g., through bacterial optogenetic modules). The algorithm accumulates information from applied inputs and the corresponding outputs (the responses of the cellular systems to these inputs). From these input-output data, the algorithm infers a general input-output mapping, which it then uses to propose an optimal input-scheduling control strategy yielding a desired output behaviour for the controlled cells. The novelty lies in the fact that the algorithm uses only input-output data (and an optimality criterion defined a priori in terms of the inputs and outputs) to perform the control: no mathematical model of the system to be controlled is provided a priori. The dynamics of the system are learned automatically through inference of the input-output mapping and used directly to find optimal feedback control strategies. Furthermore, because the inferred control strategy is a feedback control law, this approach intrinsically deals with uncertainty and perturbations. Finally, when used in online mode (where the algorithm continuously learns from new input-output pairs), the inferred feedback control law can evolve and adapt to (slow) changes in the dynamics of the system over time, which is very important for the control of biological systems.
Exploitation Route The research we have performed might serve as the basis for the future development of fully automatic feedback control systems, in which optimal strategies for controlling the dynamic behaviour of cells are inferred automatically through the cells' input-output interface to the external world (where, for example, input = light genetic induction and output = expressed fluorescent protein), building on the initial results and approach obtained in this research. Ideally, this could in the future lead to automated lab experiments in which the algorithm decides on its own how to apply external inductions to cells until their measured output behaves in an a priori defined way, leading to "robot scientists" and automated optimal design of experiments.
Sectors Agriculture, Food and Drink,Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Healthcare,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology,Other
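The learning loop described above (accumulate input-output transitions, infer a mapping, derive a feedback input-scheduling law) can be illustrated with a minimal Python sketch of fitted Q iteration, the algorithm implemented in PyFQI. This is not PyFQI's API, and everything in it is a hypothetical stand-in: the one-dimensional "reporter level" dynamics, the binary light input, the target level 1.5 and the reward were chosen only for the example. The toy simulator is used solely to generate a batch of transitions; the learner only ever sees those samples. The fitted Q iteration work of Ernst and colleagues relies mainly on tree-based regressors. Here a simple binned-average regressor is swapped in so the sketch needs nothing beyond NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
ACTIONS = (0, 1)  # hypothetical binary input: light off / light on
TARGET = 1.5      # hypothetical desired reporter level


def step(x, u):
    """Hypothetical toy circuit, used ONLY to generate data (never seen by FQI)."""
    x_next = 0.8 * x + 0.5 * u + 0.05 * rng.standard_normal(np.shape(x))
    return np.clip(x_next, 0.0, 4.0)


def reward(x_next):
    """Penalise the squared distance of the measured output from the target."""
    return -(x_next - TARGET) ** 2


# 1) Collect a batch of one-step transitions (x, u, r, x_next) under random inputs.
n = 2000
x = rng.uniform(0.0, 4.0, n)
u = rng.integers(0, 2, n)
x_next = step(x, u)
r = reward(x_next)

# 2) Regressor: piecewise-constant average of the targets in each (state bin, action).
edges = np.linspace(0.0, 4.0, 21)  # 20 equal-width bins over the output range


def bin_of(s):
    return np.clip(np.digitize(s, edges) - 1, 0, len(edges) - 2)


def fit(states, actions, targets):
    table = np.zeros((len(edges) - 1, len(ACTIONS)))
    b = bin_of(states)
    for a in ACTIONS:
        for k in range(table.shape[0]):
            mask = (b == k) & (actions == a)
            if mask.any():
                table[k, a] = targets[mask].mean()
    return table


def q_values(table, s):
    return table[bin_of(s)]  # shape (len(s), number of actions)


# 3) Fitted Q iteration: regress r + gamma * max_a' Q(x_next, a') onto (x, u), repeatedly.
gamma = 0.9
table = np.zeros((len(edges) - 1, len(ACTIONS)))
for _ in range(50):
    targets = r + gamma * q_values(table, x_next).max(axis=1)
    table = fit(x, u, targets)


def policy(s):
    """Greedy feedback law: pick the input with the largest estimated Q value."""
    return q_values(table, np.atleast_1d(s)).argmax(axis=1)
```

Because the binned regressor only averages observed targets, each iteration is a γ-contraction, so the loop settles. With these toy numbers one would expect the learned policy to switch the light on at low reporter levels and off at high ones. In an online variant, the batch would simply be refreshed with new transitions and the iteration re-run.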

 
Description The findings of this research have helped raise awareness of a new paradigm for the control of biological systems using data-based (as opposed to model-based) feedback control methods executed on a computer. This new paradigm is very important in biology (and synthetic biology in particular), as the biosystems to be controlled are typically poorly characterised and model uncertainties prevail, yet large amounts of high-throughput input-output data are available or can be extracted. A publicly available software package implemented in Python has been created to make data-based reinforcement learning methods adapted to the input-output control of cellular systems easier to use. This package can be found at: https://sites.google.com/site/aivarsootla/pyFQI_v0.01.zip The corresponding user guide is available at: https://docs.google.com/viewer?a=vpd=sites&srcid=ZGVmYXVsdGRvbWFpbnxhaXZhcnNvb3RsYXxneDozODlhODdkMzJlNjAyMDVl
First Year Of Impact 2013
Sector Digital/Communication/Information Technologies (including Software),Education,Other
Impact Types Cultural,Societal

 
Description Collaboration with Microsoft Research Cambridge 
Organisation Microsoft Research
Department Microsoft Research Cambridge
Country United Kingdom 
Sector Private 
PI Contribution The ideas initially developed in the EPSRC proposal EP/J014214/1 led to a Microsoft Research Project PhD studentship and collaboration on the theme of "Sparse Inference of Nonlinear Dynamical Systems from Time Series Data". This studentship was supported by the Microsoft PhD Scholarship programme, the EPSRC Dorothy Hodgkin Postgraduate Award programme, and the Department of Bioengineering Industrial PhD studentship programme.
Collaborator Contribution Microsoft Research Cambridge provided guidance and supervision through various in-person and Skype meetings during the student's PhD work.
Impact Various publications and conference presentations have resulted from this collaboration. This collaboration is multi-disciplinary, with research at the intersection of systems theory, machine learning, systems biology and synthetic biology.
Start Year 2012
 
Description Collaboration with the University of Liege, Belgium 
Organisation University of Liege
Country Belgium 
Sector Academic/University 
PI Contribution Design of a new Python package for the implementation of the Fitted-Q iteration reinforcement learning algorithm.
Collaborator Contribution Intellectual and advisory contribution to the development of the method and associated algorithm by Prof Damien Ernst, an authority in reinforcement learning for the control of systems.
Impact Multi-disciplinary collaboration: control engineering, machine learning, synthetic biology
Start Year 2012
 
Title BSID 
Description Matlab toolbox for nonlinear systems identification from time-series data 
Type Of Technology Software 
Year Produced 2015 
Impact Prediction and control of behaviour and abnormalities in any complex dynamical network, and in particular in those encountered in biology, require the development of multivariate predictive models that integrate large datasets from different sources. Although large amounts of data are collected on a daily basis, very few methods allow nonlinear Ordinary Differential Equation models to be created automatically from these data for understanding and (re-)design/control, and an inordinate amount of time is still spent on manually aggregating information and on expert development of models that explain these data. In this context, the reconstruction or identification of biological systems from experimental time-series data is of fundamental importance. Yet the development of general reconstruction techniques remains challenging, especially for nonlinear system identification. We are currently developing new methods to identify both the parametric structure and the parameter values of nonlinear Ordinary Differential Equation models from heterogeneous datasets. Applications of such nonlinear system identification methods cover fundamental questions in systems biology, synthetic biology (debugging and design of cellular systems) and the modelling of complex dynamical networks.
URL https://github.com/panweihit/BSID
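As a rough illustration of what identifying both the parametric structure and the parameter values of a nonlinear ODE model can mean in practice, here is a minimal Python sketch of sparse dictionary-based identification. It is not BSID's code or method. The related Wei Pan (2014) ECML PKDD paper listed under Publications uses sparse Bayesian learning, whereas this sketch swaps in sequentially thresholded least squares, a simpler sparse-regression technique. The one-gene model dx/dt = 2.0*u - 0.5*x, the random light input, the candidate-term dictionary and the 0.1 threshold are all hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
dt, n_steps, hold = 0.01, 2000, 100

# Hypothetical input: light switched on (1) or off (0) at random, each setting held for `hold` steps.
u = np.repeat(rng.integers(0, 2, n_steps // hold), hold).astype(float)

# Hypothetical "true" system, unknown to the identification step:
# dx/dt = 2.0*u - 0.5*x (production driven by the input, linear degradation), forward Euler.
x = np.zeros(n_steps + 1)
for t in range(n_steps):
    x[t + 1] = x[t] + dt * (2.0 * u[t] - 0.5 * x[t])

# Time-series data only: finite-difference derivatives paired with sampled states and inputs.
dxdt = np.diff(x) / dt
xs = x[:-1]

# Dictionary of candidate terms; structure identification = choosing which ones are nonzero.
names = ["1", "x", "x^2", "x^3", "u", "x*u"]
theta = np.column_stack([np.ones_like(xs), xs, xs**2, xs**3, u, xs * u])


def stlsq(theta, y, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small coefficients, refit the rest."""
    xi = np.linalg.lstsq(theta, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if big.any():
            xi[big] = np.linalg.lstsq(theta[:, big], y, rcond=None)[0]
    return xi


xi = stlsq(theta, dxdt)
print({name: float(c) for name, c in zip(names, xi) if c != 0.0})
```

The threshold decides which candidate terms survive. With noisy experimental data it would need tuning (or replacing by a probabilistic sparsity prior, as in sparse Bayesian learning), and finite-difference derivatives would typically need smoothing first.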
 
Title PyFQI 
Description Implementation in python of the Fitted Q Iteration Algorithm, which is used for the data-based optimal control of gene networks without the need for a priori identifying dynamical models for these networks. A manual for the use of this Python software is available here: https://workspace.imperial.ac.uk/people/Public/sootla/main.pdf 
Type Of Technology Software 
Year Produced 2014 
Impact An outstanding feature of our approach is that the control techniques we are developing are based solely on input-output data readouts from the network, with the goal of providing optimal and robust operation of potentially poorly characterised or unknown networks. We call such an approach data-based optimal control, as opposed to more classical model-based approaches. Data-based methods aim to infer/learn an optimal feedback control law from input-output data collected from the system, i.e., from the way the output of the system reacts to various perturbations at its input.
URL https://workspace.imperial.ac.uk/people/Public/sootla/pyFQI_v0.01.zip
 
Description Control Theory in Synthetic Biology, Proposer and organiser of the invited session "Control Theory in Synthetic Biology", 51st IEEE Conference on Decision and Control (IEEE-CDC 2012), Maui, Hawaii, USA, 10-13 December, 2012. 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Type Of Presentation Paper Presentation
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Proposer and organiser of the invited session "Control Theory in Synthetic Biology", 51st IEEE Conference on Decision and Control (IEEE-CDC 2012), Maui, Hawaii, USA, 10-13 December, 2012.

Greater awareness of the new control paradigm for engineered cellular systems and the gene regulatory networks within them.
Year(s) Of Engagement Activity 2012
 
Description Data-based optimal control of gene regulatory networks, Uppsala University and Royal Institute of Technology (Stockholm), December 2013 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Dr Aivar Sootla was invited to give talks at Uppsala University and the Royal Institute of Technology (Stockholm) in December 2013. In total, around 30 people attended the talks, which stimulated good discussion of the proposed methods.

Dr Sootla established good contacts with research groups in Stockholm and Uppsala.
Year(s) Of Engagement Activity 2013
 
Description Invited talk at Information, probability and inference in systems biology, Edinburgh, UK, July 15-17, 2013. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Type Of Presentation poster presentation
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Stimulated discussions and novel research questions.

Increased awareness of the novel methods proposed by our research group
Year(s) Of Engagement Activity 2013
 
Description Scientific Committee for Work Package 3 ("Community Building") of the European Research Area Network on Synthetic Biology (ERASynBio Twinning programme), EU FP7, 2012-2013 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? Yes
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Member of the Scientific Committee for Work Package 3 ("Community Building") of the European Research Area Network on Synthetic Biology (ERASynBio Twinning programme), EU FP7, 2012-2013.

Around 15-20 twinning projects involving a minimum of 3 research institutions in different countries have been awarded through this activity
Year(s) Of Engagement Activity 2012
 
Description Taking a Forward-Engineering Approach to the Design of Synthetic Biology Systems, GARNet Synthetic Biology Workshop, 21-22 May, 2013. 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Type Of Presentation Keynote/Invited Speaker
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact GARNet Synthetic Biology Workshop, 21-22 May, 2013.

Invited by Ruth Bastow and Charis Cook.

Invited Speaker by the organisers
Year(s) Of Engagement Activity 2013
 
Description Taking a Systems Control Approach in Biology : exogenous and endogenous control of biological systems, Department of Mathematics and Statistics, University of Reading, March 20th, 2013 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Department of Mathematics and Statistics, University of Reading, March 20th, 2013. Invited by Marcus Tindall.

Invited Speaker for Departmental Seminar
Year(s) Of Engagement Activity 2013
 
Description Taking a Systems Control Approach in Biology, Dept. d'Enginyeria de Sistemes i Automatica, Universitat Politecnica de Valencia, Spain, April 11th, 2013. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Type Of Presentation Keynote/Invited Speaker
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dept. d'Enginyeria de Sistemes i Automatica, Universitat Politecnica de Valencia, Spain, April 11th, 2013.

Invited by Prof Jesús Andrés Picó Marco.

Invited Plenary Speaker by the organisers
Year(s) Of Engagement Activity 2013
 
Description Taking a Systems Control Approach in Biology, Invited Plenary Speaker, "Design, optimization and control in systems and synthetic biology" international workshop, École Normale Supérieure, Paris, June 11-12, 2012. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Type Of Presentation Keynote/Invited Speaker
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact Invited Plenary Speaker, "Design, optimization and control in systems and synthetic biology" international workshop, École Normale Supérieure, Paris, June 11-12, 2012.

Invited by Dr Grégory Batt.

Invited Plenary Speaker by the organising committee. This was the very first international workshop for world-leading experts working at the intersection of systems design, control engineering and systems and synthetic biology.
Year(s) Of Engagement Activity 2012
URL http://contraintes.inria.fr/~batt/workshop_2012/guy-bart_stan.html
 
Description Workshop on Decision Making in Nature. Imperial College London, UK, May 2-4, 2013. 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Type Of Presentation poster presentation
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Stimulated discussions and novel research ideas

Increased awareness of the novel methods proposed by our research group
Year(s) Of Engagement Activity 2013