The Parameter Optimisation Problem: Addressing a Key Challenge in Computational Systems Biology

Lead Research Organisation: University of Exeter
Department Name: Computer Science

Abstract

In recent years, advances in both the physical and life sciences have increasingly come from the collaborations of researchers across disciplines, and the development and use of tools from a range of areas. A prototypical example of this interdisciplinary approach to science is systems biology, the field concerned with quantifying how the interaction of individual system components controls biological function and behaviour. Systems biology has become increasingly quantitative, with a shift from diagrammatic representations of interaction networks to sets of mathematical equations that model (i.e. simulate) how the concentrations of molecular species vary with time. A key advantage of such models is that they can be used to predict how the networks they represent will respond to specific perturbations, such as changes in environmental conditions (e.g. temperature) or the addition of pharmacological agents. The ability to easily generate such predictions reduces the need for large numbers of expensive and time-consuming experiments.

However, the more complex a biological network is, the more complex the corresponding model needs to be, and the greater the range of possible biological behaviours that can be exhibited. This means that extensive computer simulations are needed to adjust the parameters controlling the model so as to accurately reproduce (i.e. fit) the experimental behaviour observed. For biologically realistic models which can involve hundreds of different molecular species, the number of simulations required to adjust the parameters of a given model to achieve the optimal fit to data can be prohibitively large, far exceeding that which is possible on practical timescales. Thus, for the predictive power of mathematical models to be fully realised in the systems biology domain, methods are required that allow this parameter optimisation procedure to be carried out in a computationally efficient manner.

The proposed project will address this need by bringing to bear on the problem state-of-the-art methods from computer science that have previously been applied successfully to highly parametrised problems such as aircraft conflict alert systems, design optimisation of lightweight materials and routing of mesh sensor networks (amongst others). In addition, we propose to develop new methods specifically engineered for the systems biology domain that can provide insight into model behaviour beyond simply returning a single best-fit parametrisation (e.g. methods for identifying parameter sets that yield equally good fits to data, and parameter sets that simultaneously fit the model to data generated under diverse experimental conditions). As part of this, we will develop a package of open source software tools, embedded within a software infrastructure designed for systems biologists, enabling the methods developed in this work to be readily applied to problems in the field that are currently computationally intractable.

To test and refine the algorithms developed, they will be applied to the gene network that generates circadian oscillations (the circadian clock) in the key plant species Arabidopsis thaliana, for which high-quality experimental data recorded in a range of genetic and environmental backgrounds is available, together with a suite of mathematical models of varying complexity. As part of this work, biochemically detailed models of the clock will be directly fitted to multiple experimental datasets for the first time, yielding models with greater predictive power. Many processes critical for plant growth and reproduction are regulated by the clock (e.g. photosynthesis and flowering time). In the long term, the ability to optimise plant models of increasing complexity with the class of methods we will develop here may thus help predict how the viability of economically important crop species will be affected by future temperature shifts resulting from climate change.

Planned Impact

Economic Impact

Systems biology is one of the UK government's core research themes, with significant potential in economically critical areas such as medicine, biofuels and crop breeding. The Royal Academy of Engineering and the Academy of Medical Sciences have described systems biology as a vehicle for "advancing knowledge and building the nation's wealth", highlighting the construction of predictive mathematical models of complex dynamical networks as a fundamental objective. For the field to be successful in exploiting mathematical modelling approaches to understand and design biological systems, it is critical to develop model-fitting techniques that can: (i) cope with highly parametrised systems (i.e. large numbers of biochemical species); and (ii) combine the information provided by multiple experimental datasets. This project addresses this fundamental need, and although the empirical evaluation of the project outputs focuses on circadian clock networks, the algorithms developed and software released can be broadly applied across the systems biology domain. The workshops delivered as part of the project will explicitly encourage the uptake of the developed tools by industrial and academic researchers from diverse application areas.

The outcomes of the proposed work will therefore impact on the ability of systems biology researchers in academia and industry to produce technologies that deliver a more sustainable and healthy future. As a specific example of this potential impact, recent results show that plant breeders have unwittingly modified several clock genes in major crops, including wheat and barley, over hundreds to thousands of years. This process progressively adapted flowering and harvest times to more northerly latitudes, allowing the cereal crops to spread across Europe from the Near East, and by extension facilitated the population expansion in these areas. The clock genes may well be needed again, as climate change brings new combinations of temperature and photoperiod, affecting phenological timing in both agricultural and natural ecosystems. It is anticipated that the optimisation methods developed during the project will be subsequently employed to construct large-scale temperature-dependent models of the plant clock that will be able to simulate such genetic and environmental changes. These models may thus help predict how future climate shifts will affect the ability of crops to survive, grow and reproduce, and the corresponding optimal genetic modifications for sustaining viable yields.

There are therefore many potential economic beneficiaries of this research. Companies (and universities) working in the medical, energy and agricultural sectors can exploit the outputs of this project to fit and interrogate their models, as part of their product development. Governments and the public will benefit in areas such as food security and energy security (through e.g. improvements in crop yield/resistance and biofuels). Consumers will benefit in a range of areas, from cheaper food and fuel to advances in health.

Social Impact

To engage groups outside academia and industry with the research, public engagement activities will be organised to highlight the societal and economic implications of the project outputs, and, more broadly, the key role played by mathematics and computer science in addressing 21st century challenges. In addition, project results will be incorporated into taught undergraduate and postgraduate courses at Exeter and Edinburgh, highlighting the extent to which computational techniques are crucial to cutting-edge research in systems biology. Students on these modules will also benefit from being exposed to the various job opportunities that exist in the biomedical and biotechnology sectors for graduates from the physical sciences. Outreach activities will also be organised for local schoolchildren in Exeter, showcasing how mathematics is used to model natural phenomena.

Publications

 
Description The field of systems biology is concerned with quantifying how the interaction of individual system components controls biological function and behaviour. Systems biology has become increasingly quantitative, with a shift from diagrammatic representations of interaction networks to sets of mathematical equations that model (i.e. simulate) how the concentrations of molecular species vary with time. A key advantage of such models is that they can be used to predict how the networks they represent will respond to specific perturbations, such as changes in environmental conditions (e.g. temperature) or the addition of pharmacological agents. The ability to easily generate such predictions reduces the need for large numbers of expensive and time-consuming experiments. However, the more complex a biological network is, the more complex the corresponding model needs to be, and the greater the range of possible biological behaviours that can be exhibited. This means that extensive computer simulations are needed to adjust the parameters controlling the model so as to accurately reproduce (i.e. fit) the experimental behaviour observed.

This project focused on identifying and developing effective parameter optimisation tools for this type of problem, and on characterising their search landscapes. So far, over 20 peer-reviewed publications have been published or accepted as a result of the project. Key findings covered by these works include:


- For the Gene Regulatory Network (GRN) models considered, we have found that their parameterisation landscapes are highly multi-modal, and that basin sizes are not well correlated with fitness.

- Evolutionary computing methods have been shown to be well-suited to the challenging problem of fitting GRN models to experimental data, compared with traditional approaches.

- Posing GRN optimisation as a multi-objective problem can make the task easier to tackle.

- Local Optima Networks (LONs), a graph representation of an optimisation problem, provide a compact, readily interpretable summary of a search problem, and are therefore a very useful tool in model fitting. As part of the project we have developed and presented a novel extension of the LON framework to the multi-objective problem domain, and have shown what GRN landscapes look like through the LON representation.

- We have developed a new approach to robust optimisation that is competitive in performance with state-of-the-art robust multi-objective optimisation methods, while being substantially less computationally expensive when selecting the next design(s) to evaluate. This makes it particularly well suited to GRN optimisation, where the cost of a function evaluation does not swamp the algorithmic overhead.
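The LON ideas in the findings above (basins of attraction, multi-modality, a graph whose nodes are local optima) can be made concrete with a small, exhaustively enumerable example. The sketch below is not the project's Java LON generator and uses no GRN model: the landscape is a hypothetical 10-bit problem whose fitness values are drawn at random (to be maximised), giving a deliberately rugged, multi-modal surface. Every solution is hill-climbed to its local optimum to obtain basin sizes, and "escape edges" are added from each optimum to the optima reached by flipping every pair of its bits and climbing again. Names such as `build_lon` are illustrative only.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations

N_BITS = 10  # hypothetical problem size: 2**10 = 1024 bit strings, encoded as ints


def make_random_landscape(n_bits, seed=0):
    """Hypothetical toy landscape: an independent random fitness per bit string (maximised)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(2 ** n_bits)]


def neighbours(x, n_bits):
    """All solutions at Hamming distance 1 from x."""
    return [x ^ (1 << i) for i in range(n_bits)]


def hill_climb(x, fitness, n_bits):
    """Best-improvement hill climbing; stops when no neighbour is strictly better."""
    while True:
        best = max(neighbours(x, n_bits), key=lambda y: fitness[y])
        if fitness[best] <= fitness[x]:
            return x
        x = best


def build_lon(fitness, n_bits, perturbation_bits=2):
    # Map every solution to the local optimum it climbs to; basin size = how many map there.
    optimum_of = [hill_climb(x, fitness, n_bits) for x in range(2 ** n_bits)]
    basin_size = Counter(optimum_of)
    # Escape edges: flip every combination of `perturbation_bits` bits of each optimum,
    # look up the optimum that the perturbed solution climbs to, and count transitions
    # to a *different* optimum as weighted directed edges.
    edges = defaultdict(int)
    for opt in basin_size:
        for bits in combinations(range(n_bits), perturbation_bits):
            y = opt
            for b in bits:
                y ^= 1 << b
            target = optimum_of[y]
            if target != opt:
                edges[(opt, target)] += 1
    return basin_size, dict(edges)


fitness = make_random_landscape(N_BITS, seed=1)
basins, edges = build_lon(fitness, N_BITS)
print(len(basins), "local optima,", len(edges), "escape edges")
# Largest basins alongside the fitness of their optima.
for opt, size in basins.most_common(3):
    print(f"optimum {opt:0{N_BITS}b}: basin size {size}, fitness {fitness[opt]:.3f}")
```

Comparing basin sizes against optimum fitness in the printed output is one simple way to probe whether the two are correlated on a given landscape.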


In addition, a number of open source software tools and packages have been released, allowing other research groups and industry to exploit these and other results. The two postdoctoral researchers hired for the project have both built on the skills development it afforded: one has been appointed to a research position in industry in the machine learning and optimisation field, and the other as a university lecturer in an optimisation group within a computer science department at a research-led institution in the UK.
Exploitation Route The various open source software tools developed can be used by researchers and industry practitioners working on multimodal, robust and multi-objective optimisation, in particular computational biologists interested in GRN modelling. More generally, we anticipate that the findings of the project will (i) stimulate interest within the computational biology community in the use of evolutionary optimisation and landscape analysis as analytical tools; and (ii) highlight to the optimisation community the utility of computational biology models, as both benchmark functions and real-world exemplars of highly multi-modal and complex optimisation tasks.

The research findings and frameworks (landscape analysis, experimental results, etc.) will help computational biology researchers identify appropriate optimisation and visualisation techniques for their problems.
Sectors Manufacturing, including Industrial Biotechnology, Pharmaceuticals and Medical Biotechnology

URL http://pop-project.ex.ac.uk
 
Description Three workshops involving academia and industry on the topic of evolutionary algorithms for problems with uncertainty were organised as part of this project (the later ones were run in hybrid form owing to the pandemic). Outputs from this project have also been presented in journals, conferences and workshops, and have amassed nearly 200 citations so far. Six separate software tools (with version-controlled code repositories) have also been released, providing open source implementations of various outputs and facilitating their use in industry and academia. A number of the outputs from the project cut across the sciences and have gained traction in a range of areas. For example, work on efficient LON generation has been used in the machine learning domain for feature selection (Mostert, W., Malan, K.M., Ochoa, G., Engelbrecht, A.P. (2019). Insights into the Feature Selection Problem Using Local Optima Networks. In: Liefooghe, A., Paquete, L. (eds) Evolutionary Computation in Combinatorial Optimization. EvoCOP 2019. Lecture Notes in Computer Science, vol 11452. Springer, Cham.), and our work identifying computational inefficiencies in some multi-objective data structures, which released a package implementing a range of data structures for the task, is already being leveraged by other teams studying optimisation problems with large numbers of competing objectives (Allmendinger, R., Jaszkiewicz, A., Liefooghe, A., Tammer, C. (2022). What if we increase the number of objectives? Theoretical and empirical implications for many-objective combinatorial optimization. Computers & Operations Research, Volume 145). Our outputs on population-based local optima networks and multi-objective local optima networks are opening up new avenues of research on network visualisation of optimisation landscapes.
Again, this is cross-cutting research that benefits optimisation independent of the application domain, and we are looking to develop it further for use in industry tools. This includes ongoing work with our collaborator Hydro International to aid understanding of the optimisation landscapes encountered in their computational fluid dynamics-based design pipeline for stormwater treatment and flow control systems. A range of other groups have built on the Boolean clock model papers released as part of this project, and the recent release of the BDEtools software package should further support uptake of the project's outputs in the computational biology community and related industry.
First Year Of Impact 2023
Sector Construction,Environment
Impact Types Economic

 
Title BDETools 
Description Systems of Boolean Delay Equations (BDEs), in which time is continuous but state is binary, are capable of generating surprisingly complex behaviour despite their apparent simplicity. In addition to simulating convergence to steady states, BDEs can also generate periodic and quasiperiodic oscillations, frequency locking and even chaotic dynamics. Furthermore, the enumerability of the Boolean update functions and the compact parametrisation resulting from discretisation mean that BDE systems can be readily leveraged to generate low-level descriptions of physical systems, from which more quantitative model formulations (e.g. differential equations) can be constructed. The utility of BDE modelling in this regard has been demonstrated in several fields, including computational biology and climate science, but the use of BDEs is still primarily restricted to a few research laboratories. In order to facilitate the wider adoption of the BDE formalism by the computational modelling community, BDEtools has been developed to enable researchers to construct, solve and analyse BDE systems in a straightforward fashion.
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact Just released, no impacts to report yet 
URL https://www.liebertpub.com/doi/abs/10.1089/cmb.2021.0658
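The BDE formalism described above, in which each binary variable obeys x_i(t) = f_i(x_j(t - tau_ij) for its inputs j), can be illustrated with a minimal fixed-step simulator. This is a sketch under stated assumptions and does not reproduce the BDEtools interface: the function `simulate_bde` and its arguments are hypothetical, delays are rounded to whole multiples of a time step `dt` (an exact solver would instead track the switching times), and the state is assumed to be held constant at `history` for t < 0.

```python
from typing import Callable, List, Sequence


def simulate_bde(
    funcs: Sequence[Callable[[tuple], bool]],
    inputs: Sequence[Sequence[int]],
    delays: Sequence[Sequence[float]],
    history: Sequence[bool],
    t_end: float,
    dt: float,
) -> List[List[bool]]:
    """Fixed-step approximation of x_i(t) = f_i(x_j(t - tau_ij) for j in inputs[i]).

    Delays are rounded to whole multiples of dt, so dt should be much smaller
    than the shortest delay. Returns the state at t = 0, dt, 2*dt, ... (t_end excluded).
    """
    n = len(funcs)
    lags = [[round(tau / dt) for tau in delays[i]] for i in range(n)]
    if any(lag < 1 for row in lags for lag in row):
        raise ValueError("every delay must be at least one time step")
    max_lag = max((lag for row in lags for lag in row), default=1)
    # states[k] is the state at time (k - max_lag) * dt; the first max_lag rows are history.
    states = [list(history) for _ in range(max_lag)]
    for k in range(max_lag, max_lag + round(t_end / dt)):
        # Each variable reads its inputs as they were `lag` steps ago.
        states.append([
            bool(funcs[i](tuple(states[k - lag][j] for j, lag in zip(inputs[i], lags[i]))))
            for i in range(n)
        ])
    return states[max_lag:]


# A single self-repressing variable: x(t) = NOT x(t - 1), with x = True for t < 0.
traj = simulate_bde(
    funcs=[lambda v: not v[0]],
    inputs=[[0]],
    delays=[[1.0]],
    history=[True],
    t_end=4.0,
    dt=0.25,
)
print([int(s[0]) for s in traj])
# -> [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
```

In this example the variable toggles once per delay, producing a periodic oscillation with a period of two time units; richer behaviour (quasiperiodicity, frequency locking) needs several coupled variables with incommensurate delays.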
 
Title Data structures for non-dominated sets 
Description A package containing various specialised multi-objective data structures in Java. The package includes implementations of eight of the ten distinct data structures described in the literature for this task over the last two decades.
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact Currently picking up references and use in community -- will keep track and list at next return. 
URL https://doi.org/10.1145/3377930.3390150
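A non-dominated set (archive) keeps only those points that no other stored point Pareto-dominates. The simplest way to maintain one is an unsorted linear list with O(n) dominance checks per insertion, sketched below in Python assuming minimisation of every objective. It is shown only to make the task concrete; it is not taken from the Java package and makes no claim about which structures that package implements. The names `dominates` and `LinearListArchive` are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b under minimisation: no worse in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


class LinearListArchive:
    """Simplest non-dominated set: an unordered list scanned on every insertion."""

    def __init__(self) -> None:
        self.points: List[tuple] = []

    def add(self, p: Sequence[float]) -> bool:
        p = tuple(p)
        # Reject p if an archived point dominates it or duplicates it.
        if any(dominates(q, p) or q == p for q in self.points):
            return False
        # Otherwise keep p and drop every archived point that p dominates.
        self.points = [q for q in self.points if not dominates(p, q)]
        self.points.append(p)
        return True


archive = LinearListArchive()
for p in [(3, 3), (1, 4), (2, 2), (4, 1), (2, 5)]:
    archive.add(p)
print(sorted(archive.points))  # -> [(1, 4), (2, 2), (4, 1)]
```

Here (3, 3) is evicted when (2, 2) arrives, and (2, 5) is rejected because (2, 2) dominates it. Specialised structures aim to avoid scanning the whole list, which becomes costly as archives and objective counts grow.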
 
Title Distance-based visualisable many-objective problem instance generator (Matlab) 
Description A generator for many-objective problem instances that are visualisable and comprehensible. It incorporates the ability to embed a range of (scalable) problem features in the generated instance, and underpins robust algorithm testing and assessment.
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact Currently picking up references and use in community -- will keep track and list at next return. 
URL https://ore.exeter.ac.uk/repository/handle/10871/36824
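The core idea behind distance-based test problems can be sketched briefly: designs live in a 2-D plane, so they can be plotted directly, and each objective is the distance from a design to one of several fixed "objective vector" points. With Euclidean distance, any design outside the convex hull of those points is dominated by its projection onto the hull, and the Pareto set is the hull itself. This is a minimal sketch of that general construction only; it does not reproduce the Exeter Matlab generator or the additional problem features it can embed, and the names used are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def distance_objectives(x: Point, vectors: Sequence[Point]) -> List[float]:
    """Objective i is the Euclidean distance from the 2-D design x to the
    i-th objective vector point; all objectives are minimised."""
    return [math.dist(x, v) for v in vectors]


# Three objective vectors: the Pareto set is the triangle they span.
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
inside = distance_objectives((0.5, 0.5), vectors)   # on the hull
outside = distance_objectives((1.0, 1.0), vectors)  # outside the hull
# Every objective is smaller at (0.5, 0.5), so the outside design is dominated.
print(all(a < b for a, b in zip(inside, outside)))  # -> True
```

Adding more vector points raises the number of objectives while the decision space stays two-dimensional, which is what keeps such instances visualisable.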
 
Title Efficient Real-Time Hypervolume Estimation Codes (Java) 
Description The hypervolume indicator is a popular quality indicator in multi- and many-objective optimisation. Its popularity stems from its Pareto-compliant properties. The software enables real-time estimation of this indicator, orders of magnitude faster than the previous state of the art.
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact Only recently released --- likely to be integrated into main multi-objective optimisation platforms, but not yet integrated. 
URL https://ore.exeter.ac.uk/repository/bitstream/handle/10871/36825/fieldsend_gecco_2019_hypervolume.pd...
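To illustrate what the hypervolume measures, here is a plain Monte Carlo estimator for a minimisation problem. It is a baseline sketch, not the real-time estimation method released by the project (which is not described in this record); the function name and its parameters are hypothetical, and every front member is assumed to be strictly better than the reference point in every objective. The estimate's standard error shrinks in proportion to 1/sqrt(n_samples).

```python
import random
from typing import Sequence


def mc_hypervolume(front: Sequence[Sequence[float]], ref: Sequence[float],
                   n_samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the volume dominated by `front` (minimisation)
    and bounded above by the reference point `ref`."""
    rng = random.Random(seed)
    m = len(ref)
    # Sample uniformly in the box [component-wise minimum of the front, ref].
    lower = [min(p[i] for p in front) for i in range(m)]
    box_volume = 1.0
    for lo, hi in zip(lower, ref):
        box_volume *= hi - lo
    hits = 0
    for _ in range(n_samples):
        z = [rng.uniform(lo, hi) for lo, hi in zip(lower, ref)]
        # z is counted if some front member is no worse than z in every objective.
        if any(all(p[i] <= z[i] for i in range(m)) for p in front):
            hits += 1
    return box_volume * hits / n_samples


front = [(0.25, 0.75), (0.75, 0.25)]
print(mc_hypervolume(front, ref=(1.0, 1.0)))  # close to the exact value 0.3125
```

The exact value for this two-point front is the union of two 0.1875-area rectangles minus their 0.0625 overlap, i.e. 0.3125; plain sampling like this needs many points for tight estimates, which is the cost that faster estimators aim to reduce.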
 
Title Local Optima Network Generator (Java package) 
Description Software package provides core tool and interfaces for generating Local Optima Networks for fitness landscape analysis, using either exact or approximate methods. (Generic) implementation in the Java language. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact I have been made aware that a team from the Department of Decision Sciences, University of South Africa (led by Dr Malan) is using this library to investigate feature selection problems, and is also pipelining it with visualisation code from the University of Stirling (Prof. Ochoa).
URL https://github.com/fieldsend/local_optima_networks
 
Title Simple multi-objective local optima network generator (Matlab) 
Description Code to reproduce example multi-objective local optima networks, corresponding to a publication in this area.
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact None as yet. 
URL http://hdl.handle.net/10871/36891
 
Description Organised the Workshop on Evolutionary Algorithms for Problems with Uncertainty (at GECCO 2020 in Cancun [virtual conference]) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Project team (plus Prof Juergen Branke from the University of Warwick) organised the third Workshop on Evolutionary Algorithms for Problems with Uncertainty, disseminating new work on coping with uncertainty and noise in optimisation. Workshop was a success with 60-70 academics and practitioners participating, and will be running again in 2021.
Year(s) Of Engagement Activity 2020
URL http://eapwu.ex.ac.uk/
 
Description Organised the Workshop on Evolutionary Algorithms for Problems with Uncertainty (at GECCO 2018 in Kyoto) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Project team (plus Prof Juergen Branke from the University of Warwick) organised the first Workshop on Evolutionary Algorithms for Problems with Uncertainty, disseminating new work on coping with uncertainty and noise in optimisation. Workshop was a success with roughly 40 academics and practitioners participating, and will be running again in 2019.
Year(s) Of Engagement Activity 2018,2019
URL http://eapwu.ex.ac.uk/
 
Description Organised the Workshop on Evolutionary Algorithms for Problems with Uncertainty (at GECCO 2019 in Prague) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Project team (plus Prof Juergen Branke from the University of Warwick) organised the second Workshop on Evolutionary Algorithms for Problems with Uncertainty, disseminating new work on coping with uncertainty and noise in optimisation. Workshop was a success with 60-70 academics and practitioners participating, and will be running again in 2020.
Year(s) Of Engagement Activity 2019
URL http://eapwu.ex.ac.uk/