CoSInES (COmputational Statistical INference for Engineering and Security)

Lead Research Organisation: University of Warwick
Department Name: Statistics

Abstract

There are tremendous demands for advanced statistical methodology to make scientific sense of the deluge of data emerging from the data revolution of the 21st Century. Huge challenges in modelling, computation, and statistical algorithms have been created by diverse and important questions in virtually every area of human activity. CoSInES will create a step change in the use of principled statistical methodology, motivated by and feeding into these challenges.

Much of our research will develop and study generic methods with applicability in a wide range of applications. We will study statistical algorithms whose performance scales well to high dimensions and to big data sets. We will develop statistical theory to understand new complex models stimulated by applications. We will produce methodology tailored to specific computational hardware. We will study the statistical and algorithmic effects of mismatch between data and models. We shall also build methodology for statistical inference where privacy constraints mean that the data cannot be directly accessed.

CoSInES will also focus on two major application domains, which will provide stimulating and challenging motivation for our research: Data-centric Engineering, and Defence and Security. To maximise the impact and speed of translation of our research in these areas, we will partner closely with the Alan Turing Institute, which is running large programmes in these areas funded respectively by the Lloyd's Register Foundation and GCHQ.

Data is providing a disruptive transformation that is revolutionising the engineering professions, with previously unimagined ways of designing, manufacturing, operating and maintaining engineering assets all the way through to their decommissioning. The Data-centric Engineering programme (DCE) at the Alan Turing Institute is leading the design and operation of the world's first completely 3-D printed pedestrian bridge to be opened and operated in a major international city. Fibre-optic sensors embedded in the structure will provide continuous streams of data measuring the main structural properties of the bridge. DCE is developing unique opportunities to monitor and control the bridge via "digital twins". This presents enormous challenges to existing applied mathematical and statistical modelling of these complex structures, where even the bulk material properties are unknown and certainly stochastic in their values. A new generation of numerical inferential methods is being demanded to support this progress.

Within the Defence and Security domain, there are many statistical challenges emerging from the need to process and communicate big and complex data sets, for example within the area of cyber-security. The virtual world has emerged as a dominant global marketplace within which the majority of organisations operate. This has motivated nefarious actors - from "bedroom hackers" to state-sponsored terrorists - to operate in this environment to further their economic or political ambitions. To counter this threat, it is necessary to produce a complete statistical representation of the environment, in the presence of missing data, significant temporal change, and an adversary willing to manipulate social and virtual systems in order to achieve their goals.

As a second example, to counter the threat of global terrorism, it is necessary for law-enforcement agencies within the UK to share data, whilst rigorously applying data protection laws to maintain individuals' privacy. It is therefore necessary to have mathematical guarantees over such data sharing arrangements, and to formulate statistical methodologies for the "penetration testing" of anonymised data.

Planned Impact

Academic impact of the project will be achieved by standard mechanisms: publication, software development, conference presentations, and highlighting activities on the project website. Academic beneficiaries of this reach will include statisticians working on theory and methodology as well as a wide range of application areas. Academics outside statistics will also benefit from the methodology and software created within the project.

Engineers will benefit from the research in Objective 7, which will create a principled statistical framework for Data-centric Engineering. In turn, the government, commercial companies and the public will benefit from improved reliability of engineering structures, and from the economies and improved productivity created by the improved scientific understanding accessed through our research. Research in this area will be rapidly disseminated to the Engineering community through the Turing Data-centric Engineering programme, through translational activities organised by CoSInES (such as our Impact and Innovation Showcase days), and through the bespoke software.

Through the research in Objective 8, government, commercial companies and the public will benefit from improved cyber-security and the extra security afforded through improved data-sharing efficiency of law-enforcement agencies. Through the Alan Turing Institute's Defence & Security Programme, the output of this research will directly impact the operational sectors of the UK's defence and security function, through the deployment of bespoke software, and the furthering of the statistical knowledge of the UK Government's intelligence analysts. We will also organise Impact and Innovation Showcase days focused in this area.

Publications

Andrieu Christophe (2018) Hypocoercivity of Piecewise Deterministic Markov Process-Monte Carlo in arXiv e-prints

Rendell Lewis J. (2018) Global consensus Monte Carlo in arXiv e-prints

Dunlop Matthew M. (2018) How Deep Are Deep Gaussian Processes? in Journal of Machine Learning Research

Chimisov Cyril (2018) Adapting The Gibbs Sampler in arXiv e-prints

Chimisov Cyril (2018) Air Markov Chain Monte Carlo in arXiv e-prints

Zanella Giacomo (2018) Scalable Importance Tempering and Bayesian Variable Selection in arXiv e-prints

Livingstone S (2019) Kinetic energy choice in Hamiltonian/hybrid Monte Carlo in Biometrika

Laurini Fabrizio (2019) Evaluation of extremal properties of GARCH(p,q) processes in arXiv e-prints

Wang A (2019) Theoretical properties of quasi-stationary Monte Carlo methods in The Annals of Applied Probability

Mider Marcin (2019) Simulating bridges using confluent diffusions in arXiv e-prints

Schmon Sebastian M (2019) Bernoulli Race Particle Filters in arXiv e-prints

Angeli Letizia (2019) Limit theorems for cloning algorithms in arXiv e-prints

Middleton Lawrence (2019) Unbiased Smoothing using Particle Independent Metropolis-Hastings in arXiv e-prints

 
Description 20th century computational statistical methodologies (e.g. those based around MCMC, SMC and variants) bring complex problems within the grasp of classical and Bayesian model-based paradigms. However, today's complex problems pose immense challenges for principled statistical methods. CoSInES is working to bridge the gap between Statistical Science and the most challenging inferential problems posed by Data Science. There is a particular focus on problems emanating from Engineering and Security, and a number of key application challenges are under investigation, including methodology for "digital twins" within data-centric engineering and anomaly detection in internet traffic signals.

Breakthroughs have been made in a number of areas. Here we highlight some of these:
1. The introduction of non-reversible Markov chain Monte Carlo algorithms and related methods for Bayesian and related inference in a wide range of scientific problems.
2. The development of theory to underpin the use of Markov chain Monte Carlo and related algorithms in Statistics.
3. Computationally efficient methods for sampling from high-dimensional multimodal distributions.
4. The use of diffusion-based algorithms for statistics, for inference problems under privacy constraints, for optimisation by stochastic gradient descent methods in complex problems with expensive-to-compute objective functions, and for generative modelling in machine learning.
5. The introduction of a principled framework (called probabilistic numerics) for quantifying uncertainty in solutions to ordinary and partial differential equations, and its use in engineering, especially via digital twins.
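As a rough illustration of what a non-reversible sampler (item 1) looks like in practice, the sketch below simulates a one-dimensional Zig-Zag process, a well-known piecewise deterministic Markov process, targeting a standard normal distribution. The choice of the Zig-Zag process, the function name zigzag_gaussian and all parameters are assumptions made for illustration only; this record does not state which algorithms the project implemented or how.

```python
import math
import random


def zigzag_gaussian(n_events, seed=0):
    """Simulate a 1-D Zig-Zag process targeting N(0, 1).

    Hypothetical illustrative helper. Returns time-averaged estimates
    of E[x] and E[x^2] along the continuous trajectory.
    """
    rng = random.Random(seed)
    x, theta = 0.0, 1.0          # position and velocity (+1 or -1)
    total_time = 0.0
    int_x = 0.0                  # integral of x(t) dt
    int_x2 = 0.0                 # integral of x(t)^2 dt
    for _ in range(n_events):
        # Switching rate along the path is max(0, theta * x(t)), which for
        # N(0, 1) equals max(0, a + t) with a = theta * x. Inverting the
        # integrated rate against an Exp(1) draw gives the event time exactly.
        a = theta * x
        e = rng.expovariate(1.0)
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        x_new = x + theta * tau
        # Closed-form integrals over the linear segment x(t) = x + theta * t.
        int_x += x * tau + 0.5 * theta * tau * tau
        int_x2 += (x_new ** 3 - x ** 3) / (3.0 * theta)
        total_time += tau
        # The velocity flips at each event; moves are never proposed and rejected.
        x, theta = x_new, -theta
    return int_x / total_time, int_x2 / total_time


mean, second_moment = zigzag_gaussian(50_000)
print(f"estimated mean: {mean:.3f}, estimated second moment: {second_moment:.3f}")
```

The process keeps moving in one direction until a random event reverses its velocity, so it does not satisfy detailed balance; expectations are estimated as time averages along the path rather than from discrete samples.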
Exploitation Route Most of the work in this project provides generic methodology and underpinning theory for computational statistics, which can be applied to virtually all areas where rigorous statistical methodology is used. The more applied work focusing on data-centric engineering and security should benefit researchers in these application areas and have implications for a wide range of engineering projects.
Sectors Construction, Digital/Communication/Information Technologies (including Software), Security and Diplomacy

URL https://www.cosines.org
 
Description Impacts of the work of CoSInES have been broad. Since most of the work is theoretical in nature, most of the impact is currently academic. However, we highlight some areas of non-academic interest here. The paper V. De Bortoli, E. Mathieu, M. Hutchinson, J. Thornton, Y.W. Teh & A. Doucet, "Riemannian Score-Based Generative Modeling", NeurIPS 2022, was awarded an Outstanding Paper Award (one of 13 from 8000 submissions). The paper makes a major breakthrough in the computational efficiency of generative modelling techniques in machine learning. This work has had a substantial and rapid impact, for example in being used to discover new synthetic proteins. This work has been reported in the New York Times https://www.nytimes.com/2023/01/09/science/artificial-intelligence-proteins.html?searchResultPosition=1 . Two major successes of the grant have been in the area of non-reversible dynamics for MCMC, and in MCMC for multi-modal distributions. Many contributions have emerged from these areas. Saifuddin Syed and Arnaud Doucet published "Non-reversible Parallel Tempering: A Scalable Highly Parallel MCMC Scheme", Journal of the Royal Statistical Society Series B, vol. 84, no. 2, pp. 321--350, 2022, a landmark paper on a new sampling algorithm for multi-modal distributions using non-reversible MCMC techniques. The methodology developed in this paper has been used to generate high-resolution images of the M87 black hole (https://iopscience.iop.org/article/10.3847/2041-8213/abe71d/pdf) and the first picture of the Milky Way monster (https://iopscience.iop.org/article/10.3847/2041-8213/ac6429).
First Year Of Impact 2022
Sector Manufacturing, including Industrial Biotechnology
Impact Types Societal

 
Description Intractable Likelihood: New challenges from modern applications (i-like)
Amount £2,369,503 (GBP)
Funding ID EP/K014463/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2013 
End 12/2017
 
Description CoSInES with the Turing Data-centric Engineering programme
Organisation Alan Turing Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution A postdoc is funded from the Turing DCE programme to work on the interface between CoSInES methodology and engineering problems. This part of the programme is managed by Mark Girolami.
Collaborator Contribution This work has developed principled and scalable Bayesian methods for inference for differential equation models decomposed as finite element methods. It has also developed sequential Monte Carlo methods for these models. The work is being applied to a variety of problems in materials science, including investigating the properties of 3-D printed metal.
Impact Akyildiz D, Duffin Connor, Sabanis Sotirios, Girolami Mark (2021). Statistical Finite Elements via Langevin Dynamics. arXiv e-prints, arXiv:2110.11131; Duffin Connor, Cripps Edward, Stemler Thomas, Girolami Mark (2021). Low-rank statistical finite elements for scalable model-data synthesis. arXiv e-prints, arXiv:2109.04757; Boustati Ayman, Akyildiz D, Damoulas Theodoros, Johansen Adam M. (2020). Generalized Bayesian Filtering via Sequential Monte Carlo. arXiv e-prints, arXiv:2002.09998
Start Year 2020
 
Title ccpdmp 
Description R package that implements adaptive concave-convex sampling for PDMP samplers. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact None 
URL https://github.com/matt-sutton/ccpdmp