Discovering rare, extreme behaviour in large-scale computational models

Lead Research Organisation: IBM (United Kingdom)
Department Name: Research UK

Abstract

The construction of high-fidelity digital models of complex physical phenomena, and more importantly their deployment as investigation tools for science and engineering, are among the most critical undertakings of scientific computing today. Without computational models, the study of spatially irregular, multi-scale, or highly coupled nonlinear physical systems would simply not be tractable.

Even when computational models are available, however, tuning their physical and geometrical parameters (sometimes referred to as control variables) for optimal exploration and discovery is a colossal endeavour. The technological challenges of massively parallel computation are only part of the difficulty. Large-scale systems are also scientifically complex: many degrees of freedom can combine to generate emergent, anomalous, resonant features that become increasingly pronounced as the model's fidelity is increased (e.g., in turbulent scenarios). These features may correspond to highly interesting system configurations, but they are often too short-lived or too isolated in the control space to be found by brute-force computation alone. Yet most computational surveys today are guided by random, if instinct-informed, guesses.

The potential for missed phenomenology is simply unquantifiable. In many domains, anomalous solutions could describe life-threatening events such as extreme weather. A digital model of an industrial system may reveal, under special conditions, an anomalous response to the surrounding environment, which could lead to decreased efficiency, material fatigue, and structural failure. Precisely because of their singular and catastrophic nature, as well as their infrequency and short duration, these configurations are also the hardest to predict. Any improvement in our capacity to locate where anomalous dynamics may unfold could therefore greatly strengthen our ability to protect against extreme events. More fundamentally, establishing whether the set of equations implemented in a computational model can reproduce specific, exotic solutions (such as rare astronomical transients [1]) for certain configuration parameters can expose (or exclude) the manifestation of new physics, and shed light on the laws that govern our Universe.

Recently, the long-standing but sparse attempts [2] to instrument simulations with optimisation algorithms have grown into a mainstream effort. Current trends in intelligent-simulation orchestration stress the need for computational surveys to learn from previous runs, but they do not address the question of which information would be most valuable to extract. A theoretical formalism to classify the information processed by large computational models is simply absent. The main objective of this project is to develop a roadmap for the definition of such a formalism.

The key question is how one can optimally learn from large computational models. This is a deep, overarching issue affecting experimental as well as computational science, and has recently been proven to be NP-hard [3]. Correspondingly, the common approach to simulation data reduction is often pragmatic rather than formal. If solutions with specific properties (such as a certain aerodynamic drag coefficient) are sought, those properties are turned directly into objective functions that take the control variables as input arguments. This is reasonable when these properties depend only mildly on the input. For anomalous solutions, however, this is often not the case. This raises the question of whether more powerful predictors of a simulation's behaviour could be extracted from other, apparently unrelated information contained in the digital model. If so, exposing this information to machine-learning algorithms could arguably lead to more efficient and exhaustive searches. Investigating this possibility is the core task of this project.

Planned Impact

Our technological society is predicated on the ability to use the scientific method to predict the future. The computational techniques to study and experiment with physical systems in a virtual, controlled setting have augmented our research toolbox with the game-changing ability to bring to life, in principle, any system satisfying any type of evolution equation. In this sense, computational science is the ultimate Gedankenexperiment machine.

Advances in computational science therefore have the potential to benefit the modelling of physical systems across the spectrum. Numerical models are routinely employed to model new particles and chemical compounds, discover new drugs, and predict traffic patterns and the weather. Digital twinning, or the process of building a computational model that captures all the relevant features of a given physical system, is widely adopted in industry to avoid direct experimentation that would be too expensive or hazardous. All these applications share common tools and practices, which the present research proposes to augment and streamline.

The core innovation outlined in this programme is based on the observation that our potential to find elusive phenomena, predict natural disasters, and gain insight into other extreme events through computation is limited only by our ability to discover these regimes within a computational model. This project addresses the searchability question directly by investigating the key aspect of this process: how can AI tools better understand physical simulations, and how can they optimally drive them? With these general answers available, the acceleration of any computational model becomes possible. Fields as diverse as chemical engineering and aerodynamic design, astrophysics and energy generation can then benefit from unprecedented insight at a fraction of the cost.
 
Description The first two years of this fellowship have been spent on fundamental investigations of application-agnostic search strategies for extreme solutions. We have now formulated two key proposals that respond to the criteria set out in the project proposal, and testing is under way.
Exploitation Route The proposed search methods, if successful, can be applied to any phenomenon that can be modelled as a dynamical system. This spans countless sectors including weather, energy, and healthcare.
Sectors Aerospace, Defence and Marine; Agriculture, Food and Drink; Chemicals; Digital/Communication/Information Technologies (including Software); Education; Electronics; Energy; Environment; Financial Services, and Management Consultancy; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology; Security and Diplomacy; Transport

 
Description Participation in the "Digital Twins of the Earth" roundtable, organized by BEIS
Geographic Reach National 
Policy Influence Type Participation in a guidance/advisory committee
 
Title Dataset of simulations of wave dynamics near a coastal region with a parametrized levee
Description Dataset of 12,006 simulations of coastal wave dynamics around a levee of sinusoidal shape, parametrized by its amplitude AMP and wave number N. The dataset was used in the article E. Bentivegna, "Towards an Operationally Meaningful, Explainable Emulator for the Boussinesq Equation" (accepted as a workshop paper at the ICLR 2021 SimDL Workshop).
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact The dataset represents a class of simplified coastal-wave dynamics with different levee geometries. It can be used as a testbed for different model-reduction techniques, both to understand the principal modes of wave dynamics in this scenario and to speed up levee design. This task serves as a prototype for understanding how large-scale simulations can be better represented and sampled. The dataset was used in the publication "Towards an Operationally Meaningful, Explainable Emulator for the Boussinesq Equation" (E. Bentivegna, SimDL @ ICLR 2021, https://simdl.github.io/files/8.pdf).
URL https://zenodo.org/record/4728023
 
Title Datasets of solutions of the 1D Korteweg-de Vries equation
Description Datasets of 50,000 solutions of the 1D Korteweg-de Vries equation, for different amplitudes, phases and frequencies of the initial data. The data were obtained via direct numerical integration.
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact This dataset has enabled the first tests of the extreme-event search strategy developed under this grant.
URL https://zenodo.org/record/7457629
 
Description Chairing the IBM Professional Interest Community in Dynamical Systems 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact I have co-founded the IBM Professional Interest Community in Dynamical Systems and been appointed as its Chair. I lead discussions and initiatives within this group, and advocate for the role of Dynamical Systems within the IBM Research agenda.
Year(s) Of Engagement Activity 2022, 2023
 
Description Journal Club in Geometric Deep Learning 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact We organized a journal club series on Geometric Deep Learning, bringing together researchers from several IBM research labs around the world to discuss publications and listen to external speakers on this emerging topic.
Year(s) Of Engagement Activity 2022
 
Description Leading the IBM Open Source Science (OSSci) Interest Group in Climate & Sustainability 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact I lead a thematic Interest Group within IBM's Open Source Science initiative. This activity aims to build networks of practitioners across disciplines who are interested in using open-source software and data to advance science. More at https://opensource.science.
Year(s) Of Engagement Activity 2022, 2023
URL https://opensource.science