Biological Network Reconstruction and Equation Inference with Hidden Nodes

Lead Research Organisation: Imperial College London
Department Name: Mathematics

Abstract

Biological systems are highly complex and usually involve components that cannot be measured directly. These systems can be modelled as networks, with nodes representing variables and edges representing their interactions. This project builds on temporal multivariate information-based inductive causation (tMIIC), a method developed by the Isambert group at Institut Curie, Paris, which takes time series data and constructs a causal network. A key feature of tMIIC is its ability to identify latent causal factors, which we will call 'hidden nodes'.
This project has three main parts. First, tMIIC will be benchmarked on its ability to reconstruct hidden nodes. Toy models with known network structures will be used to generate trajectories, which will be given to tMIIC to reconstruct the network. A trajectory corresponding to one variable will then be omitted, and tMIIC will be expected to identify the presence of the hidden node. Its performance on these tests will serve as a benchmark.
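The benchmarking step can be illustrated with a minimal sketch. It assumes that both the known network and the reconstruction are available as binary adjacency matrices, and that performance is summarised by edge precision and recall; the project does not specify its metrics. The helper names `hide_node` and `edge_precision_recall` are hypothetical, and the "inferred" matrix is made up for illustration rather than produced by tMIIC.

```python
import numpy as np

def hide_node(trajectories, index):
    """Drop one variable (column) from a (time, variable) trajectory array."""
    return np.delete(trajectories, index, axis=1)

def edge_precision_recall(true_adj, inferred_adj):
    """Compare two binary adjacency matrices edge by edge."""
    true_edges = np.asarray(true_adj, dtype=bool)
    inferred_edges = np.asarray(inferred_adj, dtype=bool)
    true_positives = np.sum(true_edges & inferred_edges)
    n_inferred = inferred_edges.sum()
    n_true = true_edges.sum()
    precision = true_positives / n_inferred if n_inferred else 0.0
    recall = true_positives / n_true if n_true else 0.0
    return float(precision), float(recall)

# Toy chain 0 -> 1 -> 2; entry [i, j] = 1 means an edge from node i to node j.
true_adj = np.array([[0, 1, 0],
                     [0, 0, 1],
                     [0, 0, 0]])

# Made-up reconstruction (an assumption, not tMIIC output):
# it recovers 0 -> 1, misses 1 -> 2 and adds a spurious 0 -> 2.
inferred_adj = np.array([[0, 1, 1],
                         [0, 0, 0],
                         [0, 0, 0]])

precision, recall = edge_precision_recall(true_adj, inferred_adj)

# Placeholder trajectories with 4 time points and 3 variables; hiding node 1
# leaves only the columns for nodes 0 and 2.
trajectories = np.arange(12.0).reshape(4, 3)
observed = hide_node(trajectories, 1)
```

Scoring the hidden-node case itself would need a convention for representing a latent common cause in the ground truth, which this sketch leaves open.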
Second, once tMIIC has inferred a network that uncovers the interactions of the system, we plan to use a generative modelling approach to infer the equations that describe them. This involves two machine learning models: a generator and a discriminator. The objective of the generator is to produce model trajectories that are indistinguishable from the true data. The discriminator is a classifier trained simultaneously, with the objective of distinguishing true trajectories from those produced by the generator. Initially, the equations will be assumed to take the form of Langevin-type stochastic equations that couple the observed and hidden dynamics. A starting point would be linear equations in which the interaction matrix is determined by the adjacency matrix of the learned network structure. Depending on the system being modelled, different functional forms for terms in the equations (such as mass-action and Michaelis-Menten type kinetics) can be adapted and used. An extension to this part of the project is to use neural ODEs to learn appropriate functions for the equations. Neural ODEs are models that use neural networks trained on observational data to specify the dynamics of the system of interest.
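As a rough illustration of the linear starting point, the sketch below simulates dx = J x dt + sigma dW with the Euler-Maruyama scheme, where the off-diagonal entries of J are nonzero only where the network has an edge. The function name, the self-decay term (added here to keep the toy system stable), the noise level and the step size are all assumptions made for the example, not choices stated in the project.

```python
import numpy as np

def simulate_linear_langevin(adjacency, weights, decay=1.0, noise=0.1,
                             dt=0.01, n_steps=1000, x0=None, seed=0):
    """Euler-Maruyama simulation of a linear Langevin system on a network."""
    rng = np.random.default_rng(seed)
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]
    # adjacency[i, j] = 1 means an edge i -> j, so the drift of node j
    # depends on x_i; the transpose puts that weight in row j, column i.
    # The -decay * I term is an assumed self-degradation for stability.
    interaction = (adjacency * weights).T - decay * np.eye(n)
    x = np.zeros((n_steps + 1, n))
    if x0 is not None:
        x[0] = x0
    for t in range(n_steps):
        drift = interaction @ x[t]
        x[t + 1] = x[t] + drift * dt + noise * np.sqrt(dt) * rng.standard_normal(n)
    return x

# Toy chain 0 -> 1 -> 2 with unit interaction strengths (illustrative values).
adjacency = np.array([[0, 1, 0],
                      [0, 0, 1],
                      [0, 0, 0]])
weights = np.ones((3, 3))

trajectories = simulate_linear_langevin(adjacency, weights)

# Treat node 1 as hidden: only nodes 0 and 2 would be passed to inference.
observed = trajectories[:, [0, 2]]
```

In an adversarial setup of the kind described above, a simulator like this could play the role of the generator, with `weights` as the parameters being learned; how the discriminator is built and trained is left open here.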
Third, once this methodology has been developed, we plan to apply the techniques to a dataset of live cell imaging microscopy from an ex-vivo tumour ecosystem. This technology allows cancer cells to grow in the presence of components found where they normally grow (known as the tumour microenvironment). Specifically, the data that we will analyse considers the effect of immune cells and cancer-associated fibroblasts (CAFs). This data comes from experimental collaborators in the Parrini group at Institut Curie, Paris. A preliminary literature search found a gap in published models for this system, so novel model design will be required here.
This project falls within the EPSRC biological informatics and mathematical biology research themes.

Planned Impact

Probabilistic modelling permeates the financial services, healthcare, technology and other service industries crucial to the UK's continuing social and economic prosperity, which are major users of stochastic algorithms for data analysis, simulation, systems design and optimisation. There is a major and growing skills shortage of experts in this area, and the success of the UK in addressing this shortage in cross-disciplinary research and industry expertise in computing, analytics and finance will directly impact the international competitiveness of UK companies and the quality of services delivered by government institutions.
By training highly skilled experts equipped to build, analyse and deploy probabilistic models, the CDT in Mathematics of Random Systems will contribute to:
- sharpening the UK's research lead in this area; and
- meeting the needs of industry across the technology, finance, government and healthcare sectors.

MATHEMATICS, THEORETICAL PHYSICS and MATHEMATICAL BIOLOGY

The explosion of novel research areas in stochastic analysis requires the training of young researchers capable of facing the new scientific challenges and maintaining the UK's lead in this area. The partners are at the forefront of many recent developments and ideally positioned to successfully train the next generation of UK scientists for tackling these exciting challenges.
The theory of regularity structures, pioneered by Hairer (Imperial), has generated a ground-breaking approach to singular stochastic partial differential equations (SPDEs) and opened the way to solving longstanding problems in the physics of random interface growth and quantum field theory, work spearheaded by Hairer's group at Imperial. The theory of rough paths, initiated by TJ Lyons (Oxford), is undergoing a renewal spurred by applications in data science and systems control, led by the Oxford group in conjunction with Cass (Imperial). Pathwise methods and infinite-dimensional methods in stochastic analysis, with applications to robust modelling in finance and control, have been developed by both groups.
Applications of probabilistic modelling in population genetics, mathematical ecology and precision healthcare are active areas in which our groups have recognised expertise.

FINANCIAL SERVICES and GOVERNMENT

The large-scale computerisation of financial markets and retail finance and the advent of massive financial data sets are radically changing the landscape of financial services, requiring new profiles of experts with strong analytical and computing skills as well as familiarity with Big Data analysis and data-driven modelling, not matched by current MSc and PhD programmes. Financial regulators (Bank of England, FCA, ECB) are investing in analytics and modelling to face this challenge. We will develop a novel training and research agenda adapted to these needs by leveraging the considerable expertise of our teams in quantitative modelling in finance and our extensive experience in partnerships with financial institutions and regulators.

DATA SCIENCE

Probabilistic algorithms, such as stochastic gradient descent and Monte Carlo Tree Search, underlie the impressive achievements of Deep Learning methods. Stochastic control provides the theoretical framework for understanding and designing Reinforcement Learning algorithms. A deeper understanding of these algorithms can pave the way to designing improved algorithms with higher predictability and 'explainable' results, which are crucial for applications.
We will train experts who can blend a deeper understanding of algorithms with knowledge of the application at hand to go beyond pure data analysis and develop data-driven models and decision aid tools.
There is high demand for such expertise in the technology, healthcare and finance sectors and great enthusiasm from our industry partners. Knowledge transfer will be enhanced through internships, co-funded studentships and paths to entrepreneurship.

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/S023925/1      |              |              | 01/04/2019 | 30/09/2027 |
2748008           | Studentship  | EP/S023925/1 | 01/10/2022 | 30/09/2026 | Holly Chambers