Discovering rare, extreme behaviour in large-scale computational models

Lead Research Organisation: IBM (United Kingdom)
Department Name: Research UK

Abstract

The construction of high-fidelity digital models of complex physical phenomena, and more importantly their deployment as investigation tools for science and engineering, are some of the most critical undertakings of scientific computing today. Without computational models, the study of spatially-irregular, multi-scale, or highly coupled, nonlinear physical systems would simply not be tractable.

Even when computational models are available, however, tuning their physical and geometrical parameters (sometimes referred to as control variables) for optimal exploration and discovery is a colossal endeavour. In addition to the technological challenges inherent to massively parallel computation, the task is complicated by the scientific complexity of large-scale systems, where many degrees of freedom can team up and generate emergent, anomalous, resonant features which get more and more pronounced as the model's fidelity is increased (e.g., in turbulent scenarios). These features may correspond to highly interesting system configurations, but they are often too short-lived or isolated in the control space to be found using brute-force computation alone. Yet, most computational surveys today are guided by random (albeit somewhat educated by instinct) guesses.

The potential for missed phenomenology is simply unquantifiable. In many domains, anomalous solutions could describe life-threatening events such as extreme weather. A digital model of an industrial system may reveal, under special conditions, an anomalous response to the surrounding environment, which could lead to decreased efficiency, material fatigue, and structural failure. Precisely because of their singular and catastrophic nature, as well as infrequency and short life, these configurations are also the hardest to predict. Any improvement in our capacity to locate where anomalous dynamics may unfold could therefore tremendously impact our ability to protect against extreme events. More fundamentally, establishing whether the set of equations implemented in a computational model is at all able to reproduce specific, exotic solutions (such as rare astronomical transients [1]) for certain configuration parameters can expose (or exclude) the manifestation of new physics, and shed light on the laws that govern our Universe.

Recently, the long-lived but sparse attempts [2] to instrument simulations with optimisation algorithms have grown into a mainstream effort. Current trends in Intelligent-Simulation orchestration stress the need to instruct the computational surveys to learn from previous runs, but they do not address the question of which information it would be most valuable to extract. A theoretical formalism to classify the information processed by large computational models is simply absent. The main objective of this project is to develop a roadmap for the definition of such a formalism.

The key question is how one can optimally learn from large computational models. This is a deep, overarching issue affecting experimental as well as computational science, and has recently been proven to be an NP-hard problem [3]. Correspondingly, the common approach to simulation data reduction is often pragmatic rather than formal: if solutions with specific properties (such as a certain aerodynamic drag coefficient) are sought, those properties are directly turned into objective functions, taking the control variables as input arguments. This is reasonable when these properties depend only mildly on the input. For anomalous solutions, however, the dependence is often far from mild, so one wonders whether more powerful predictors of a simulation's behaviour could be extracted from other, apparently unrelated information contained in the digital model. If so, exposing this information to the machine-learning algorithms could arguably lead to more efficient and exhaustive searches. The investigation of this possibility is the core task that this project aims to undertake.
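As an illustration of the pragmatic approach described above, the following minimal Python sketch turns a property of a simulation directly into an objective function of the control variables and surveys the control space at random. It is not the project's method: the toy model `run_simulation`, the property `peak_response`, the target value, and the parameter ranges are all hypothetical stand-ins chosen for brevity.

```python
import math
import random


def run_simulation(amplitude, frequency, steps=200, dt=0.05):
    """Hypothetical stand-in for an expensive model: a damped sinusoidal signal."""
    return [
        amplitude * math.exp(-0.1 * k * dt) * math.sin(frequency * k * dt)
        for k in range(steps)
    ]


def peak_response(trajectory):
    """Property of interest extracted from the simulation output."""
    return max(abs(x) for x in trajectory)


def objective(controls, target=1.5):
    """Mismatch between the simulated property and the sought value (assumed target)."""
    amplitude, frequency = controls
    return abs(peak_response(run_simulation(amplitude, frequency)) - target)


def random_survey(n_runs, seed=0):
    """Sample the control space at random and keep the best configuration found."""
    rng = random.Random(seed)
    best_controls, best_score = None, float("inf")
    for _ in range(n_runs):
        # Assumed ranges for the two control variables.
        controls = (rng.uniform(0.0, 3.0), rng.uniform(0.5, 5.0))
        score = objective(controls)
        if score < best_score:
            best_controls, best_score = controls, score
    return best_controls, best_score


best_controls, best_score = random_survey(500)
```

In this toy case the property varies smoothly with the controls, so a random survey succeeds quickly. The concern raised in the abstract is the opposite regime, where the sought behaviour occupies a small or short-lived region of the control space and such a direct objective gives the search little to learn from.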

Planned Impact

Our technological society is predicated on the ability to use the scientific method to predict the future. The computational techniques to study and experiment with physical systems in a virtual, controlled setting have augmented our research toolbox with the game-changing ability to bring to life, in principle, any system satisfying any type of evolution equation. In this sense, computational science is the ultimate Gedankenexperiment machine.

Advances in computational science have therefore the potential to benefit the modelling of physical systems across the spectrum. Numerical models are routinely employed to model new particles and chemical compounds, discover new drugs, and predict traffic patterns and the weather. Digital twinning, or the process of building a computational model that captures all the relevant features of a given physical system, is widely adopted in industry to avoid direct experimentation which would be too expensive or hazardous. All these applications share common tools and practices, which the present research proposes to augment and streamline.

The core innovation outlined in this program is based on the observation that our potential to find elusive phenomena, predict natural disasters, and gain insight into other extreme events through computation is limited only by our ability to discover these regimes within a computational model. This project addresses the searchability question directly, by carrying out an investigation of the key aspect of this process: how can AI tools better understand physical simulations, and how can they optimally drive them? With these general answers available, the acceleration of any computational model becomes possible, and fields as diverse as chemical engineering and aerodynamic design, astrophysics and energy generation can benefit from unprecedented insight at a fraction of the cost.
 
Description The first two years of this fellowship have been spent on fundamental investigations of application-agnostic search strategies for extreme solutions. We have now formulated two key proposals which respond to the criteria set out in the project proposals, and testing is under way.
During 2024, extensive testing of this formalism was carried out. The formalism appears to work well in a broad range of applications. We are now increasing the complexity of these applications to test both the formalism and its computability further.
Exploitation Route The proposed search methods can be applied to any phenomenon that can be modelled as a dynamical system. This spans countless sectors including weather, energy, and healthcare.
Sectors Aerospace, Defence and Marine; Agriculture, Food and Drink; Chemicals; Digital/Communication/Information Technologies (including Software); Education; Electronics; Energy; Environment; Financial Services, and Management Consultancy; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology; Security and Diplomacy; Transport

 
Description This project has highlighted the possibility of developing a common approach to the definition and identification of anomalies in many areas of science. Within the FLF network, it has established itself as a focal point for activities related to anomalies, whether they are strictly technical or related to public engagement and broader cultural impact. The absence of a scientific community around this theme is one of the issues that the project aims to address; creating a vibrant network of domain specialists who study anomalies in their own field is a substantial part of our effort.
First Year Of Impact 2023
Sector Energy, Environment, Healthcare
Impact Types Cultural, Societal

 
Description Participation in the "Digital Twins of the Earth" roundtable, organized by BEIS
Geographic Reach National 
Policy Influence Type Participation in a guidance/advisory committee
 
Description Future Leaders Fellows Development Network -- Flexible Creative Fund
Amount £5,938 (GBP)
Funding ID FCF0002 
Organisation United Kingdom Research and Innovation 
Sector Public
Country United Kingdom
Start 04/2024 
End 07/2024
 
Title Dataset of simulations of wave dynamics near coastal region with parametrized levee 
Description Dataset of 12006 simulations used in the article E. Bentivegna, "Towards an Operationally Meaningful, Explainable Emulator for the Boussinesq Equation" (accepted as a workshop paper at ICLR 2021 SimDL Workshop). The dataset contains 12006 simulations of coastal wave dynamics around a levee, of sinusoidal shape parametrized by its amplitude AMP and wave number N. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact The dataset represents a class of simplified coastal-wave dynamics with different levee geometries. It can be used as a testbed for different model-reduction techniques in order to understand the principal modes of wave dynamics in this scenario and speed up levee design. This task is a prototype to understand how large-scale simulations can be better represented and sampled. The dataset was used in the publication "Towards an Operationally Meaningful, Explainable Emulator for the Boussinesq equation" (E. Bentivegna, simDL @ ICLR 2021, https://simdl.github.io/files/8.pdf).
URL https://zenodo.org/record/4728023
 
Title Datasets of solutions of the 1D Korteweg-de Vries equation 
Description Datasets of 50,000 solutions of the 1D Korteweg-de Vries equation, for different amplitudes, phases and frequencies of the initial data. The data has been obtained via direct numerical integration. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact This dataset has enabled the first tests of the extreme-event strategy pioneered by this grant. 
URL https://zenodo.org/record/7457629
 
Title Prithvi-WxC weather ensemble 
Description An ensemble of 100 Foundation Models, based on the Prithvi-WxC architecture (https://huggingface.co/ibm-nasa-geospatial/Prithvi-WxC-1.0-2300M), representing different architectural variations. 
Type Of Material Computer model/algorithm 
Year Produced 2024 
Provided To Others? No  
Impact The ensemble clarified the effect of architectural changes on the model's forecasts. 
 
Description Collaboration with IBM Research Brazil 
Organisation IBM Research - Brazil
Country Brazil 
Sector Private 
PI Contribution My team collaborates with the Physics-Informed AI team in IBM Brazil, sharing expertise and work cycles to improve our internal understanding of key technologies, as well as their extension and implementation into software.
Collaborator Contribution My collaborators have developed a key software solution (simulAI, https://github.com/IBM/simulai) that simplifies the adoption of common PIAI algorithms.
Impact Two publications, and the simulAI package.
Start Year 2021
 
Description Collaboration with NASA 
Organisation National Aeronautics and Space Administration (NASA)
Country United States 
Sector Public 
PI Contribution Contribution of novel theoretical framework for extreme-event modelling.
Collaborator Contribution Collaboration network to aggregate expertise in weather extremes, from theoretical to operational, from domain specific (environmental, atmospheric) to HPC and AI.
Impact The collaboration has completed the first model ensemble study for Prithvi-WxC, publishing the details at the 2024 AGU and the 2025 EGU meetings.
Start Year 2024
 
Description Collaboration with Oak Ridge Leadership Computing Facility 
Organisation Oak Ridge National Laboratory
Country United States 
Sector Public 
PI Contribution User feedback on use of supercomputing facilities to develop large-scale models.
Collaborator Contribution Compute time and support towards the development of a weather and climate AI model. Inclusion in relevant hackathons and related activities.
Impact The computing support enabled the ensemble study published at the AGU/EGU meetings.
Start Year 2024
 
Description Collaboration with Watson Research Center 
Organisation IBM
Department IBM T. J. Watson Research Center, Yorktown Heights
Country United States 
Sector Private 
PI Contribution The collaboration unites all IBM researchers interested in Physics-Informed Artificial Intelligence. My team has contributed the expertise coming from all the literature review and experiments carried out under the project, where PIAI is a key ingredient for the prediction of extreme events.
Collaborator Contribution The other members of the collaboration have contributed time to work on common PIAI development (both theoretical and computational).
Impact Funding for a meshing software license required by one of the project applications.
Start Year 2021
 
Title ESTIMATING EMISSION SOURCE LOCATION FROM SATELLITE IMAGERY 
Description In an approach for estimating emission source location from satellite plume data, a processor creates a dataset of plume concentration data. A processor down samples the dataset to an array at satellite resolution. A processor partitions the array into two separate datasets according to a preset proportion. A processor trains two machine learning models on at least one of the two separate datasets, wherein a first machine learning model of the two machine learning models is for identifying a presence of a plume and a second machine learning model of the two machine learning models is for identifying a source position and magnitude of the plume. A processor applies the two machine learning models to new concentration data. 
IP Reference US2023418999 
Protection Patent / Patent application
Year Protection Granted 2023
Licensed No
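The data-preparation steps named in the description above (down-sampling to satellite resolution, then partitioning by a preset proportion) can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not the patented implementation: block averaging as the down-sampling rule, a shuffled split of samples as the partition, and the names `downsample` and `partition` are all hypothetical choices made for illustration.

```python
import random


def downsample(grid, factor):
    """Block-average a 2D grid (list of rows) by an integer factor (assumed rule)."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    coarse = []
    for i in range(0, rows, factor):
        coarse_row = []
        for j in range(0, cols, factor):
            block = [
                grid[i + di][j + dj]
                for di in range(factor)
                for dj in range(factor)
            ]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


def partition(samples, proportion, seed=0):
    """Shuffle samples and split them into two sets by a preset proportion."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(proportion * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Toy 4x4 "concentration" field reduced to 2x2, and ten sample ids split 80/20.
fine = [[float(4 * r + c) for c in range(4)] for r in range(4)]
coarse = downsample(fine, 2)
first, second = partition(range(10), 0.8)
```

The two resulting sets would then feed the two models mentioned in the description (plume presence, and source position and magnitude); the training step is omitted here because the source does not specify the model type.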
 
Title SUPER RESOLVED SATELLITE IMAGES VIA PHYSICS CONSTRAINED NEURAL NETWORK 
Description A method, computer program product and system to generate higher resolution geospatial images is provided. A processor receives time sequenced spatial data images at a first resolution. A processor determines from the plurality of spatial data images physics laws applicable to the spatial data images. A processor subdivides each of the plurality of spatial data images into a plurality of small spatial region images. A processor solves each of the physics laws in each of the small spatial region images. A processor trains a neural network to apply each of the physics laws to each small spatial region image by applying a regional physics law loss function. A processor determines the most applicable regional physics law based on the difference between the small spatial region image and the image predicted for that region by the physics law. A processor generates a second higher-resolution image than the first resolution. 
IP Reference US2024005065 
Protection Patent / Patent application
Year Protection Granted 2024
Licensed No
 
Description Anomaly Day 2023 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact I organized a workshop focused on anomalies, hosting around 30 other Future Leaders Fellows at the IBM offices in York Rd, London. This has generated an ongoing discussion in the network, around anomalies and how we can co-develop interdisciplinary solutions to discover and forecast them.
Year(s) Of Engagement Activity 2023
 
Description Anomaly Day 2024 @ Hursley 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The day involved a lecture on anomalies from a philosophy-of-science perspective, as well as a presentation on IBM Research and open discussions around these topics. A tour of the site and local museum of computing was also included. This day enabled us to understand each other's perspectives and interests around anomalies. The audience was composed of Future Leaders Fellows and some of their teams.
Year(s) Of Engagement Activity 2024
 
Description Cambridge Unsteady Flow Symposium 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The symposium brought together internationally leading experts in Computational Fluid Dynamics (CFD) and promoted discussions on numerical methods for unsteady flows. This is a class of physics problems underpinning extreme-event phenomenology in certain domains, such as weather and climate.
Year(s) Of Engagement Activity 2024
URL https://www.cufs.co.uk
 
Description Chairing the IBM Professional Interest Community in Dynamical Systems 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact I have co-founded the IBM Professional Interest Community in Dynamical Systems, and been appointed as its Chair. I conduct discussions and initiatives within this group, and advocate for the role of Dynamical Systems within the IBM Research agenda.
Year(s) Of Engagement Activity 2022, 2023
 
Description Illustration/narration workshop around anomalies 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact This activity was an interactive workshop with a professional illustrator, who helped four participating FLFs create visuals of their research and use them to understand key concepts better. This also supported communicating these concepts better to wider audiences.
Year(s) Of Engagement Activity 2024
 
Description Journal Club in Geometric Deep Learning 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact We organized a journal club series on Geometric Deep Learning, bringing together researchers from several IBM research labs around the world to discuss publications and listen to external speakers on this emerging topic.
Year(s) Of Engagement Activity 2022
 
Description Leading the IBM Open Source Science (OSSci) Interest Group in Climate & Sustainability 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact I lead a thematic Interest Group within IBM's Open Source Science initiative. This activity aims to build networks of practitioners in different disciplines, interested in using open-source software and data to advance science. More at https://opensource.science.
Year(s) Of Engagement Activity 2022, 2023
URL https://opensource.science