Discovering rare, extreme behaviour in large-scale computational models

Lead Research Organisation: IBM (United Kingdom)

Department Name: Research UK

Abstract

The construction of high-fidelity digital models of complex physical phenomena, and more importantly their deployment as investigation tools for science and engineering, are some of the most critical undertakings of scientific computing today. Without computational models, the study of spatially-irregular, multi-scale, or highly coupled, nonlinear physical systems would simply not be tractable.

Even when computational models are available, however, tuning their physical and geometrical parameters (sometimes referred to as control variables) for optimal exploration and discovery is a colossal endeavour. In addition to the technological challenges inherent to massively parallel computation, the task is complicated by the scientific complexity of large-scale systems, where many degrees of freedom can combine to generate emergent, anomalous, resonant features which become more and more pronounced as the model's fidelity is increased (e.g., in turbulent scenarios). These features may correspond to highly interesting system configurations, but they are often too short-lived or isolated in the control space to be found using brute-force computation alone. Yet most computational surveys today are guided by random guesses, albeit somewhat educated by instinct.

The potential for missed phenomenology is simply unquantifiable. In many domains, anomalous solutions could describe life-threatening events such as extreme weather. A digital model of an industrial system may reveal, under special conditions, an anomalous response to the surrounding environment, which could lead to decreased efficiency, material fatigue, and structural failure. Precisely because of their singular and catastrophic nature, as well as infrequency and short life, these configurations are also the hardest to predict. Any improvement in our capacity to locate where anomalous dynamics may unfold could therefore tremendously impact our ability to protect against extreme events. More fundamentally, establishing whether the set of equations implemented in a computational model is at all able to reproduce specific, exotic solutions (such as rare astronomical transients [1]) for certain configuration parameters can expose (or exclude) the manifestation of new physics, and shed light on the laws that govern our Universe.

Recently, the long-lived but sparse attempts [2] to instrument simulations with optimisation algorithms have grown into a mainstream effort. Current trends in Intelligent-Simulation orchestration stress the need to instruct the computational surveys to learn from previous runs, but they do not address the question of which information it would be most valuable to extract. A theoretical formalism to classify the information processed by large computational models is simply absent. The main objective of this project is to develop a roadmap for the definition of such a formalism.

The key question is how one can optimally learn from large computational models. This is a deep, overarching issue affecting experimental as well as computational science, and has recently been proven to be an NP-hard problem [3]. Correspondingly, the common approach to simulation data reduction is often pragmatic rather than formal: if solutions with specific properties (such as a certain aerodynamic drag coefficient) are sought, those properties are directly turned into objective functions, taking the control variables as input arguments. This is reasonable when these properties depend only mildly on the input; for anomalous solutions, however, this often does not hold, so one wonders whether more powerful predictors of a simulation's behaviour could be extracted from other, apparently unrelated information contained in the digital model. If so, exposing this information to the machine-learning algorithms could arguably lead to more efficient and exhaustive searches. The investigation of this possibility is the core task that this project aims to undertake.
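
As a purely illustrative sketch of the pragmatic approach described above, the Python fragment below turns a property of interest directly into an objective function of a single control variable and surveys it with random guesses. Everything in it is an assumption made for illustration: `toy_simulation` is a hypothetical stand-in for an expensive model run, and the location and width of its narrow anomalous feature are arbitrary choices, not results from this project.

```python
import numpy as np
from typing import Tuple


def toy_simulation(control: float) -> float:
    """Hypothetical stand-in for an expensive model run.

    Returns a scalar property: a smooth background plus a narrow spike
    that plays the role of an anomalous, isolated feature.
    """
    background = 0.1 * np.sin(2.0 * np.pi * control)
    spike = np.exp(-(((control - 0.7312) / 1e-3) ** 2))  # arbitrary location/width
    return float(background + spike)


def objective(control: float) -> float:
    """The property of interest used directly as the objective to maximise."""
    return toy_simulation(control)


def random_survey(n_runs: int, seed: int = 0) -> Tuple[float, float]:
    """Sample the control space uniformly in [0, 1] and keep the best run."""
    rng = np.random.default_rng(seed)
    controls = rng.uniform(0.0, 1.0, size=n_runs)
    values = np.array([objective(c) for c in controls])
    best = int(np.argmax(values))
    return float(controls[best]), float(values[best])


if __name__ == "__main__":
    best_control, best_value = random_survey(n_runs=50)
    print(f"best control: {best_control:.4f}, objective: {best_value:.4f}")
```

In this toy setting the spike occupies well under 1% of the control interval, so each random run is unlikely to land on it, and the objective carries no gradient information pointing towards it from the smooth background. That is the situation in which additional predictors, beyond the target property itself, might help guide the search.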

Planned Impact

Our technological society is predicated on the ability to use the scientific method to predict the future. The computational techniques to study and experiment with physical systems in a virtual, controlled setting have augmented our research toolbox with the game-changing ability to bring to life, in principle, any system satisfying any type of evolution equation. In this sense, computational science is the ultimate Gedankenexperiment machine.

Advances in computational science therefore have the potential to benefit the modelling of physical systems across the spectrum. Numerical models are routinely employed to model new particles and chemical compounds, discover new drugs, and predict traffic patterns and the weather. Digital twinning, or the process of building a computational model that captures all the relevant features of a given physical system, is widely adopted in industry to avoid direct experimentation which would be too expensive or hazardous. All these applications share common tools and practices, which the present research proposes to augment and streamline.

The core innovation outlined in this program is based on the observation that our potential to find elusive phenomena, predict natural disasters, and gain insight into other extreme events through computation is limited only by our ability to discover these regimes within a computational model. This project addresses the searchability question directly, by carrying out an investigation of the key aspect of this process: how can AI tools better understand physical simulations, and how can they optimally drive them? With these general answers available, the acceleration of any computational model becomes possible, and fields as diverse as chemical engineering and aerodynamic design, astrophysics and energy generation can benefit from unprecedented insight at a fraction of the cost.

Funded Value:

£1,106,091

Funded Period:

Feb 21 - Nov 25

Funder:

UKRI FLF

Project Status:

Closed

Project Category:

Fellowship

Project Reference:

MR/T041862/1

Principal Investigator:

Eloisa Bentivegna

Research Topic:

Unclassified

Organisations

People	ORCID iD
Eloisa Bentivegna (Principal Investigator / Fellow)	http://orcid.org/0000-0003-1229-1653

Publications

Bentivegna E (2022) Identifying Extreme Regimes in Climate-Scale Digital Twins: a Roadmap

Bentivegna E (2025) From architecture to atmospheric sensitivity: studying forecast uncertainty with Prithvi-WxC

Bentivegna E (2024) Quantifying Uncertainties for Extreme Weather Events with Prithvi WxC Foundation Model

Bentivegna E (2023) A Variational Condition for Minimal-Residual Latent Representations

Bentivegna E (2021) Towards an Operationally Meaningful, Explainable Emulator for the Boussinesq equation

Bertram L (2021) Fast or efficient? Strategy selection in the game Entropy Mastermind

Fathi A (2023) Towards operational methane emission detection from oil and gas facilities through multi-modal sensing and advanced dispersion and atmospheric modeling

Fathi A (2024) Site-scale methane plume simulation and validation from oil and gas facilities through advanced dispersion, atmospheric modeling, and scientific machine learning

Mukkavilli SK (2023) High-Resolution WRF Wind Field Dataset for GHG Applications: Navigating Computational Challenges in Energy Research

Nasim I (2024) Learning Reduced Order Dynamics via Geometric Representations

Key Findings
Impact Summary
Policy Influence
Further Funding
Research Databases and Models
Collaboration
Intellectual Property
Engagement Activities


Description	The first two years of this fellowship were spent on fundamental investigations of application-agnostic search strategies for extreme solutions. We then formulated two key proposals which respond to the criteria set out in the project proposal. During 2024, extensive testing of this formalism was carried out. The formalism appears to work well in a broad range of applications. We are now increasing the complexity of these applications to test both the formalism and its computability further.
Exploitation Route	The proposed search methods can be applied to any phenomenon that can be modelled as a dynamical system. This spans countless sectors including weather, energy, and healthcare.
Sectors	Aerospace, Defence and Marine; Agriculture, Food and Drink; Chemicals; Digital/Communication/Information Technologies (including Software); Education; Electronics; Energy; Environment; Financial Services, and Management Consultancy; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology; Security and Diplomacy; Transport


Description	This project has highlighted the possibility of developing a common approach to the definition and identification of anomalies in many areas of science. Within the FLF network, it has established itself as a focal point for activities related to anomalies, whether they are strictly technical or related to public engagement and broader cultural impact. The absence of a scientific community around this theme is one of the issues that the project aims to address; creating a vibrant network of domain specialists who study anomalies in their own field is a substantial part of our effort.
First Year Of Impact	2023
Sector	Energy, Environment, Healthcare
Impact Types	Cultural, Societal


Description	Participation in the "Digital Twins of the Earth" roundtable, organized by BEIS
Geographic Reach	National
Policy Influence Type	Participation in a guidance/advisory committee


Description	Future Leaders Fellows Development Network -- Flexible Creative Fund
Amount	£5,938 (GBP)
Funding ID	FCF0002
Organisation	United Kingdom Research and Innovation
Sector	Public
Country	United Kingdom
Start	04/2024
End	07/2024


Title	Dataset of simulations of wave dynamics near coastal region with parametrized levee
Description	Dataset of 12006 simulations used in the article E. Bentivegna, "Towards an Operationally Meaningful, Explainable Emulator for the Boussinesq Equation" (accepted as a workshop paper at ICLR 2021 SimDL Workshop). The dataset contains 12006 simulations of coastal wave dynamics around a levee, of sinusoidal shape parametrized by its amplitude AMP and wave number N.
Type Of Material	Database/Collection of data
Year Produced	2021
Provided To Others?	Yes
Impact	The dataset represents a class of simplified coastal-wave dynamics with different levee geometries. It can be used as a testbed for different model-reduction techniques in order to understand the principal modes of wave dynamics in this scenario and speed up levee design. This task is a prototype to understand how large-scale simulations can be better represented and sampled. The dataset was used in the publication "Towards an Operationally Meaningful, Explainable Emulator for the Boussinesq equation" (E. Bentivegna, SimDL @ ICLR 2021, https://simdl.github.io/files/8.pdf).
URL	https://zenodo.org/record/4728023


Title	Datasets of solutions of the 1D Korteweg-de Vries equation
Description	Datasets of 50,000 solutions of the 1D Korteweg-de Vries equation, for different amplitudes, phases and frequencies of the initial data. The data has been obtained via direct numerical integration.
Type Of Material	Database/Collection of data
Year Produced	2022
Provided To Others?	Yes
Impact	This dataset has enabled the first tests of the extreme-event strategy pioneered by this grant.
URL	https://zenodo.org/record/7457629
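
For readers unfamiliar with what "direct numerical integration" of the Korteweg-de Vries equation can look like, the following is a minimal sketch and not the code used to produce the dataset. It assumes the convention u_t + 6 u u_x + u_xxx = 0 on a periodic domain [0, 2π), a sinusoidal initial condition parametrized by amplitude, frequency and phase, and a Fourier pseudo-spectral discretisation with an integrating factor and fourth-order Runge-Kutta time stepping. The function name, grid size and time step are all hypothetical choices; the scheme and parameters behind the published data may differ.

```python
import numpy as np


def integrate_kdv(amplitude, frequency, phase, n_grid=128, dt=1e-4, n_steps=1000):
    """Integrate u_t + 6 u u_x + u_xxx = 0 on a periodic domain (illustrative).

    Returns the grid, the initial data and the solution after n_steps steps.
    """
    x = 2.0 * np.pi * np.arange(n_grid) / n_grid
    k = np.fft.fftfreq(n_grid, d=1.0 / n_grid)  # integer wavenumbers
    u0 = amplitude * np.sin(frequency * x + phase)  # assumed form of initial data

    # Work with v = exp(-i k^3 t) * u_hat, so the stiff dispersive term
    # is handled exactly and only the nonlinear term is time-stepped.
    v = np.fft.fft(u0)

    def rhs(t, v):
        factor = np.exp(1j * k**3 * t)
        u = np.real(np.fft.ifft(factor * v))
        # Nonlinear term: -6 u u_x = -3 (u^2)_x, i.e. -3 i k FFT(u^2) in Fourier space.
        return -3j * k * np.conj(factor) * np.fft.fft(u**2)

    t = 0.0
    for _ in range(n_steps):
        k1 = rhs(t, v)
        k2 = rhs(t + dt / 2, v + dt / 2 * k1)
        k3 = rhs(t + dt / 2, v + dt / 2 * k2)
        k4 = rhs(t + dt, v + dt * k3)
        v = v + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt

    u = np.real(np.fft.ifft(np.exp(1j * k**3 * t) * v))
    return x, u0, u


if __name__ == "__main__":
    x, u0, u = integrate_kdv(amplitude=1.0, frequency=1, phase=0.0)
    print("max |u| at final time:", np.abs(u).max())
```

A dataset of many solutions could then be assembled, under the same assumptions, by looping this function over sampled amplitudes, frequencies and phases.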


Title	Prithvi-WxC weather ensemble
Description	An ensemble of 100 Foundation Models, based on the Prithvi-WxC architecture (https://huggingface.co/ibm-nasa-geospatial/Prithvi-WxC-1.0-2300M), representing different architectural variations.
Type Of Material	Computer model/algorithm
Year Produced	2024
Provided To Others?	No
Impact	The ensemble clarified the effect of architectural changes on the model's forecasts.


Description	Collaboration with IBM Research Brazil
Organisation	IBM Research - Brazil
Country	Brazil
Sector	Private
PI Contribution	My team collaborates with the Physics-Informed AI team in IBM Brazil, sharing expertise and work cycles to improve our internal understanding of key technologies, as well as their extension and implementation into software.
Collaborator Contribution	My collaborators have developed a key software solution (simulAI, https://github.com/IBM/simulai) that simplifies the adoption of common PIAI algorithms.
Impact	Two publications, and the simulAI package.
Start Year	2021


Description	Collaboration with NASA
Organisation	National Aeronautics and Space Administration (NASA)
Country	United States
Sector	Public
PI Contribution	Contribution of novel theoretical framework for extreme-event modelling.
Collaborator Contribution	Collaboration network to aggregate expertise in weather extremes, from theoretical to operational, from domain specific (environmental, atmospheric) to HPC and AI.
Impact	The collaboration has completed the first model ensemble study for Prithvi-WxC, publishing the details at the 2024 AGU and the 2025 EGU meetings.
Start Year	2024


Description	Collaboration with Oak Ridge Leadership Computing Facility
Organisation	Oak Ridge National Laboratory
Country	United States
Sector	Public
PI Contribution	User feedback on use of supercomputing facilities to develop large-scale models.
Collaborator Contribution	Compute time and support towards the development of a weather and climate AI model. Inclusion in relevant hackathons and related activities.
Impact	The computing support enabled the ensemble study published at the AGU/EGU meetings.
Start Year	2024


Description	Collaboration with Watson Research Center
Organisation	IBM
Department	IBM T. J. Watson Research Center, Yorktown Heights
Country	United States
Sector	Private
PI Contribution	The collaboration unites all IBM researchers interested in Physics-Informed Artificial Intelligence (PIAI). My team has contributed the expertise gained from the literature reviews and experiments carried out under the project, where PIAI is a key ingredient for the prediction of extreme events.
Collaborator Contribution	The other members of the collaboration have contributed time to work on common PIAI development (both theoretical and computational).
Impact	Funding for a meshing software license required by one of the project applications.
Start Year	2021


Title	ESTIMATING EMISSION SOURCE LOCATION FROM SATELLITE IMAGERY
Description	In an approach for estimating emission source location from satellite plume data, a processor creates a dataset of plume concentration data. A processor down samples the dataset to an array at satellite resolution. A processor partitions the array into two separate datasets according to a preset proportion. A processor trains two machine learning models on at least one of the two separate datasets, wherein a first machine learning model of the two machine learning models is for identifying a presence of a plume and a second machine learning model of the two machine learning models is for identifying a source position and magnitude of the plume. A processor applies the two machine learning models to new concentration data.
IP Reference	US2023418999
Protection	Patent / Patent application
Year Protection Granted	2023
Licensed	No


Title	SUPER RESOLVED SATELLITE IMAGES VIA PHYSICS CONSTRAINED NEURAL NETWORK
Description	A method, computer program product and system to generate higher resolution geospatial images is provided. A processor receives time sequenced spatial data images at a first resolution. A processor determines from the plurality of spatial data images physics laws applicable to the spatial data images. A processor subdivides each of the plurality of spatial data images into a plurality of small spatial region images. A processor solves each of the physics laws in each of the small spatial region images. A processor trains a neural network to apply each of the physics laws to each small spatial region image by applying a regional physics law loss function. A processor determines the most applicable regional physics law based on the difference between the small spatial region image and the image predicted for that region by the physics law. A processor generates a second higher-resolution image than the first resolution.
IP Reference	US2024005065
Protection	Patent / Patent application
Year Protection Granted	2024
Licensed	No


Description	Anomaly Day 2023
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	I organized a workshop focused on anomalies, hosting around 30 other Future Leaders Fellows at the IBM offices in York Rd, London. This has generated an ongoing discussion in the network, around anomalies and how we can co-develop interdisciplinary solutions to discover and forecast them.
Year(s) Of Engagement Activity	2023


Description	Anomaly Day 2024 @ Hursley
Form Of Engagement Activity	Participation in an open day or visit at my research institution
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	The day involved a lecture on anomalies from a philosophy-of-science perspective, as well as a presentation on IBM Research and open discussions around these topics. A tour of the site and local museum of computing was also included. This day enabled us to understand each other's perspectives and interests around anomalies. The audience was composed of Future Leaders Fellows and some of their teams.
Year(s) Of Engagement Activity	2024


Description	Cambridge Unsteady Flow Symposium
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The symposium brought together internationally leading experts in Computational Fluid Dynamics (CFD) and promoted discussions on numerical methods for unsteady flows. This is a class of physics problems underpinning extreme-event phenomenology in certain domains, such as weather and climate.
Year(s) Of Engagement Activity	2024
URL	https://www.cufs.co.uk


Description	Chairing the IBM Professional Interest Community in Dynamical Systems
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	I co-founded the IBM Professional Interest Community in Dynamical Systems and have been appointed as its Chair. I lead discussions and initiatives within this group, and advocate for the role of Dynamical Systems within the IBM Research agenda.
Year(s) Of Engagement Activity	2022, 2023


Description	Illustration/narration workshop around anomalies
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	This activity was an interactive workshop with a professional illustrator, who helped four participating FLFs create visuals of their research and use them to understand key concepts better. This also supported communicating these concepts better to wider audiences.
Year(s) Of Engagement Activity	2024


Description	Journal Club in Geometric Deep Learning
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	We organized a journal club series on Geometric Deep Learning, bringing together researchers from several IBM research labs around the world to discuss publications and listen to external speakers on this emerging topic.
Year(s) Of Engagement Activity	2022


Description	Leading the IBM Open Source Science (OSSci) Interest Group in Climate & Sustainability
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	I lead a thematic Interest Group within IBM's Open Source Science initiative. This activity aims to build networks of practitioners in different disciplines, interested in using open-source software and data to advance science. More at https://opensource.science.
Year(s) Of Engagement Activity	2022, 2023
URL	https://opensource.science
