Blueprinting AI for Science at Exascale - Phase II (BASE-II)
Lead Research Organisation:
Science and Technology Facilities Council
Department Name: Scientific Computing Department
Abstract
Advances in Artificial Intelligence (AI) are transforming the world we live in today. These innovations drive two interconnected aspects: they augment our knowledge, so that, for example, we understand the behaviour of a virus better and faster than we did a decade ago; and this improved understanding fuels innovations that improve our quality of life, such as better vaccines or better batteries for our mobile phones and electric vehicles. The role of AI, and thus of computing, is crucial for such advancements.
The desire to improve our fundamental knowledge, and thus the quality of our lives, has become central to our existence. Better and faster understanding leads to better and faster innovations. This desire, in turn, demands that computations be performed faster than ever before - not only to understand very large datasets better, but also to run very complex simulations - at rates at least 50 times faster than the most powerful computers on the planet today. This is the era of exascale computing: exascale computers will be able to perform a billion billion calculations per second.
The general challenge, and it is a significant one for the international community, is to have the relevant software technologies ready when such exascale computing becomes a reality.
This proposal aims to develop a software suite and relevant software designs to serve as blueprints for using AI for scientific discoveries at exascale - Blueprinting AI for Science at Exascale (BASE-II). This project is a continuation of our previous work, carried out as Phase I, namely Benchmarking for AI for Science at Exascale (BASE-I). In Phase I, we gathered an essential set of requirements from various scientific communities, which underpins our work in this phase.
The resulting software and designs will cover the following:
a) Facilitating a better understanding of the interplay between different AI algorithms and AI hardware systems across a range of scientific problems. We will achieve this through a set of AI benchmarks, against which different AI software can be verified.
b) Facilitating incredibly complex simulations using AI: although exascale systems will facilitate complex simulations (which are essential for mimicking realistic cases), we will accelerate them further using AI. This can result in remarkable speedups (e.g., from days to seconds). Such a transformation can provide a massive leap in scientific discovery.
c) Harmonising the efforts of scientific communities and vendors through better partnerships: exascale systems will have complex hardware capabilities that may be difficult for scientists to understand. Equally, hardware manufacturers working on the design of exascale systems do not always understand the underpinning science. This lack of harmonisation and synchronisation has, to date, led to sub-optimal outcomes. We intend to build better software and hardware through better partnerships, an approach we refer to as hardware-software co-design.
d) The success of AI is primarily due to a technology called deep learning, which inherently relies on very large volumes of data. With continuing technological advances, we foresee that in the exascale era data volumes will not only be huge but also multi-modal. Understanding these extremely large-scale datasets will remain key to ensuring that AI can be conducted at exascale.
e) Finally, the community, whether scientific, academic or industrial, will need additional software technologies, or more specifically an ecosystem of software tools, to help with exascale computing. To this end, we will produce a software toolbox.
We will also conduct various knowledge exchange activities, such as workshops, training events and in-field placements, to ensure a multi-directional flow of information and knowledge across relevant stakeholders and communities.
Organisations
- Science and Technology Facilities Council (Lead Research Organisation, Project Partner)
- Rosalind Franklin Institute (Collaboration, Project Partner)
- Culham Centre for Fusion Energy (Collaboration)
- Oak Ridge National Laboratory (Collaboration, Project Partner)
- Diamond Light Source (Collaboration, Project Partner)
- Argonne National Laboratory (Collaboration, Project Partner)
- University of California, San Diego (UCSD) (Collaboration, Project Partner)
- EDF Energy R&D UK Centre Limited (Project Partner)
- Graphcore (Project Partner)
- DDN (DataDirect Networks) (International) (Project Partner)
- Cerebras Systems (Project Partner)
- University of Cambridge (Project Partner)
- The Alan Turing Institute (Project Partner)
- United Kingdom Atomic Energy Authority (Project Partner)
- DiRAC (Distributed Research utilising Advanced Computing) (Project Partner)
- StackHPC Limited (Project Partner)
- Boston Ltd (Project Partner)
- IBM (United Kingdom) (Project Partner)
- British Antarctic Survey (Project Partner)
- nVIDIA (Project Partner)
Publications
Leng K
(2024)
Deep Learning Evidence for Global Optimality of Gerver's Sofa
in Symmetry
Cha J
(2025)
Discovering fully semantic representations via centroid- and orientation-aware feature learning
in Nature Machine Intelligence
Leng K
(2024)
Zero coordinate shift: Whetted automatic differentiation for physics-informed operator learning
in Journal of Computational Physics
| Description | Since the programme has been funded, it has helped establish: * The notion of AI being helpful in science. * The notion of AI solutions targeted towards families of similar problems (patterns or blueprints). * The utility of benchmarking for quantifying the ability of AI or ML to complete a range of scientific tasks. * The funding has enabled the establishment of the Scientific Machine Learning group within STFC to address various important scientific problems and overcome challenges across scientific domains. * We were able to demonstrate the significance of AI for Science across various facilities within STFC, such as the ISIS Neutron and Muon Source, Diamond Light Source, Central Laser Facility, RAL Space, UK Astronomy Technology Centre and Extreme Photonics Application Centre. * The work resulting from this award has enabled scientists to obtain key findings across various domains, from predicting damage to optical components in a laser facility, to understanding how neutrons scatter from specific materials, to enabling improved imaging in modern microscopes. * The funding has also given the group, and STFC, international standing in AI for Science, which in turn has enabled a number of international collaborations, particularly with the DOE laboratories in the US (AIRS, AIRS-II) and India (BioImaging). * The award has been instrumental in securing additional funding (e.g., NPRAISE), which has not only helped to develop more techniques but has also enabled better networking and collaborations for a better outcome for the community. |
| Exploitation Route | The outcomes of this funding include scientific publications, software for scientists, AI benchmarks and some fundamental research outcomes in machine learning. As such: * The outcomes are useful across the respective scientific domains for furthering or advancing their studies. * Scientists and researchers can understand the role of AI in science and build on these examples. * Scientists and researchers can use the various software packages developed under this funding (which are progressively being made public) as blueprints for developing additional techniques. * Academics and students can make use of the training materials around ML for science for further understanding or for developing teaching materials. * Computer scientists and ML practitioners can develop improved techniques based on the outcomes here. * Hardware manufacturers and ML software developers can use the benchmarks to better understand their systems and software. |
| Sectors | Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Education; Environment; Healthcare; Pharmaceuticals and Medical Biotechnology |
| Description | The overarching idea of finding patterns or blueprints through our benchmarking work has been tremendously useful beyond its academic impact. The impacts include (but are not limited to): * The understanding that AI can serve different types of problems across the experimental sciences (among the STFC and EPSRC community). * The notion of benchmarking as a measure of the ability of AI to perform tasks that are non-scientific (e.g., LLMs). * The upskilling of scientists in AI, ML and benchmarking. * How AI can be used to drive science across STFC facilities. * The influence of the work internationally, especially on the collaboration front. |
| First Year Of Impact | 2024 |
| Sector | Digital/Communication/Information Technologies (including Software), Education |
| Impact Types | Cultural, Societal, Economic |
| Description | National Platform for RTPs on AI for Science and Engineering (NPRAISE) |
| Amount | £1,701,275 (GBP) |
| Funding ID | EP/Y530633/1 |
| Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
| Sector | Public |
| Country | United Kingdom |
| Start | 01/2024 |
| End | 12/2027 |
| Title | PINN-PDE-compatibility Checking Tool |
| Description | We shed light on a pitfall and an opportunity in physics-informed neural networks (PINNs). We prove that a multilayer perceptron (MLP) with only ReLU (Rectified Linear Unit) or ReLU-like Lipschitz activation functions will always lead to a vanished Hessian. Such a network-imposed constraint contradicts any second- or higher-order partial differential equations (PDEs). Therefore, a ReLU-based MLP cannot form a permissible function space for the approximation of their solutions. Inspired by this pitfall, we prove that a linear PDE up to the n-th order can be strictly satisfied by an MLP with suitable activation functions when the weights of its output layer lie on a certain hyperplane, which we call the out-layer-hyperplane. An MLP equipped with the out-layer-hyperplane becomes "physics-enforced", no longer requiring a loss function for the PDE itself (but only those for the initial and boundary conditions). Such a hyperplane exists not only for MLPs but for any network architecture that ends with a fully-connected hidden layer. To our knowledge, this should be the first PINN architecture that enforces point-wise correctness of PDEs. We show a closed-form expression of the out-layer-hyperplane for second-order linear PDEs, which can be generalised to higher-order nonlinear PDEs. A minimal illustrative sketch of the vanished-Hessian pitfall is given after this record. |
| Type Of Material | Improvements to research infrastructure |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | This technique has paved the way for further research into PINNs and PDEs, and for the development of additional techniques (see others, such as Zero Coordinate Shift). |
| URL | https://github.com/stfc-sciml/pinn-pde-compatibility |
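The following is a minimal, illustrative sketch of the vanished-Hessian pitfall described above, written in PyTorch; it is not the released pinn-pde-compatibility tool, and the network sizes, input dimension and variable names are illustrative assumptions. It shows that a ReLU MLP has an identically zero Hessian with respect to its input coordinates, whereas a smooth activation such as Tanh does not.

```python
# Illustrative sketch only (assumes PyTorch); not the released
# pinn-pde-compatibility tool. A ReLU MLP is piecewise linear in its
# inputs, so all second derivatives w.r.t. the input coordinates vanish,
# which rules it out for second- or higher-order PDEs.
import torch

torch.manual_seed(0)

def make_mlp(activation):
    # small MLP mapping a 2-D coordinate, e.g. (x, t), to a scalar field u
    return torch.nn.Sequential(
        torch.nn.Linear(2, 32), activation,
        torch.nn.Linear(32, 32), activation,
        torch.nn.Linear(32, 1),
    )

x0 = torch.rand(2)  # a single collocation point

for name, act in [("ReLU", torch.nn.ReLU()), ("Tanh", torch.nn.Tanh())]:
    mlp = make_mlp(act)
    # Hessian of the scalar network output w.r.t. the input coordinates
    hess = torch.autograd.functional.hessian(lambda p: mlp(p).squeeze(), x0)
    print(name, "max |Hessian entry| =", hess.abs().max().item())

# Expected outcome: the ReLU Hessian is exactly zero, so terms such as
# u_xx or u_tt can never be represented; the Tanh Hessian is non-zero.
```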
| Title | Padding-free Convolution based on Preservation of Differential Characteristics of Kernels |
| Description | Convolution is a fundamental operation in image processing and machine learning. Aimed primarily at maintaining image size, padding is a key ingredient of convolution, which, however, can introduce undesirable boundary effects. We present a non-padding-based method for size-keeping convolution based on the preservation of the differential characteristics of kernels. The main idea is to make convolution over an incomplete sliding window "collapse" to a linear differential operator evaluated locally at its central pixel, which no longer requires information from the neighbouring missing pixels. While the underlying theory is rigorous, our final formula turns out to be simple: the convolution over an incomplete window is achieved by convolving its nearest complete window with a transformed kernel. This formula is computationally lightweight, involving neither interpolation nor extrapolation, nor restrictions on image and kernel sizes. Our method favours data with smooth boundaries, such as high-resolution images and fields from physics. Our experiments include: i) filtering analytical and non-analytical fields from computational physics, and ii) training convolutional neural networks (CNNs) for the tasks of image classification, semantic segmentation and super-resolution reconstruction. In all these experiments, our method has exhibited visible superiority over the compared methods. A brief illustration of the boundary artefact this method avoids is given after this record. |
| Type Of Material | Improvements to research infrastructure |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | This has resulted in the technique being used in various benchmarks within the BASE-II space, and in the resulting publications (see the publications section). |
| URL | https://github.com/stfc-sciml/DifferentialConv2d |
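As a hedged illustration of the problem this method addresses (this is not the released padding-free method itself; the field size and kernel are arbitrary choices), the PyTorch snippet below shows the boundary artefact introduced by zero padding: a mean filter applied to a perfectly constant field should leave it unchanged, yet the padded zeros drag the border values down.

```python
# Illustration of the zero-padding boundary artefact that the
# padding-free convolution above avoids; this is NOT the released method.
import torch
import torch.nn.functional as F

field = torch.ones(1, 1, 6, 6)                # a perfectly smooth (constant) field
kernel = torch.full((1, 1, 3, 3), 1.0 / 9.0)  # 3x3 mean filter

same_size = F.conv2d(field, kernel, padding=1)  # "same"-size output via zero padding
print(same_size[0, 0])
# Interior pixels remain 1.0, but edge pixels drop to about 0.667 and
# corner pixels to about 0.444 - an artefact of the padded zeros, not of
# the data. The padding-free approach instead evaluates an equivalent
# differential operator at boundary pixels using only available samples.
```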
| Title | Zero Coordinate Shift (ZCS) for Automatic Differentiation |
| Description | Automatic differentiation (AD) is a critical step in physics-informed machine learning, required for computing the high-order derivatives of the network output w.r.t. the coordinates of collocation points. In this paper, we present a novel and lightweight algorithm to conduct AD for physics-informed operator learning, which we call the trick of Zero Coordinate Shift (ZCS). Instead of treating all sampled coordinates as leaf variables, ZCS introduces only one scalar-valued leaf variable for each spatial or temporal dimension, simplifying the required derivatives from "many-roots-many-leaves" to "one-root-many-leaves", whereby reverse-mode AD becomes directly utilisable. It has led to an outstanding performance leap by avoiding the duplication of the computational graph along the dimension of functions (physical parameters). ZCS is easy to implement with current deep learning libraries; our own implementation is achieved by extending the DeepXDE package. We carry out a comprehensive benchmark analysis and several case studies, training physics-informed DeepONets to solve partial differential equations (PDEs) without data. The results show that ZCS has persistently reduced GPU memory consumption and wall time for training by an order of magnitude, and this reduction factor scales with the number of functions. As a low-level optimisation technique, ZCS imposes no restrictions on data, physics (PDE) or network architecture and does not compromise training results in any aspect. A minimal sketch of the ZCS trick is given after this record. |
| Type Of Material | Improvements to research infrastructure |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | The technique has been integrated into DeepXDE, a widely used research tool, for wider consumption. |
| URL | https://github.com/stfc-sciml/ZeroCoordinateShift |
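Below is a minimal one-dimensional sketch of the ZCS trick, written directly in PyTorch rather than by extending DeepXDE as the released implementation does; the network, point count and variable names are illustrative assumptions rather than the official code.

```python
# Minimal 1-D sketch of Zero Coordinate Shift (ZCS); an illustrative,
# assumption-laden example, not the released DeepXDE-based implementation.
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)

x = torch.linspace(0.0, 1.0, 1000).unsqueeze(-1)  # collocation points: NOT leaves
z = torch.zeros((), requires_grad=True)           # one scalar leaf per dimension
a = torch.ones(x.shape[0], requires_grad=True)    # dummy weights ("one root")

u = net(x + z).squeeze(-1)   # shifting by zero leaves the values unchanged
omega = (a * u).sum()        # single scalar root

# du/dx at every collocation point via two cheap reverse-mode passes
(domega_dz,) = torch.autograd.grad(omega, z, create_graph=True)
(du_dx,) = torch.autograd.grad(domega_dz, a, create_graph=True)

# d2u/dx2 by repeating the pattern once more
(d2omega_dz2,) = torch.autograd.grad(domega_dz, z, create_graph=True)
(d2u_dx2,) = torch.autograd.grad(d2omega_dz2, a)

# cross-check du/dx against the conventional "many leaves" approach
x_leaf = x.clone().requires_grad_(True)
(du_ref,) = torch.autograd.grad(net(x_leaf).sum(), x_leaf)
print(torch.allclose(du_dx, du_ref.squeeze(-1), atol=1e-5))  # True
```

In the batched operator-learning setting, the same pattern avoids duplicating the computational graph across the function (physical-parameter) dimension, which is where the memory and wall-time savings described above come from.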
| Description | ANL & BASE2 |
| Organisation | Argonne National Laboratory |
| Country | United States |
| Sector | Public |
| PI Contribution | Development of benchmarks and case studies, demonstration of the real-world utility of those cases, code development, identification of open datasets, and staff time on our side that benefits both sides of the collaboration. |
| Collaborator Contribution | Investigator time, staff time from the collaborator for discussions, training sessions, data sets, case studies, participation at BASE-II specific events. |
| Impact | The outcomes have been multi-disciplinary, resulting in publications (see those sections), collaboration and support in further grants, namely, AIRS and NPRAISE. |
| Start Year | 2022 |
| Description | Diamond-BASEII |
| Organisation | Diamond Light Source |
| Country | United Kingdom |
| Sector | Private |
| PI Contribution | Development of benchmarks and case studies, demonstration of the real-world utility of those cases, code development, identification of open datasets, and staff time on our side that benefits both sides of the collaboration. |
| Collaborator Contribution | Investigator time, staff time from the collaborator for discussions, data sets, case studies, participation at BASE-II specific events, and hosting of BASE-II staff in Diamond. |
| Impact | The outcomes have been multi-disciplinary, resulting in publications (see those sections), collaboration and support in further grants, namely, AIRS and NPRAISE. |
| Start Year | 2022 |
| Description | ORNL-BASEII |
| Organisation | Oak Ridge National Laboratory |
| Country | United States |
| Sector | Public |
| PI Contribution | Development of benchmarks and case studies, demonstration of the real-world utility of those cases, code development, identification of open datasets, and staff time on our side that benefits both sides of the collaboration. |
| Collaborator Contribution | Investigator time, staff time from the collaborator for discussions, training sessions, data sets, case studies, participation at BASE-II specific events. |
| Impact | The outcomes have been multi-disciplinary, resulting in publications (see those sections), software, collaboration and support in further grants, namely, AIRS and NPRAISE. |
| Start Year | 2022 |
| Description | RFI & BASEII |
| Organisation | Rosalind Franklin Institute |
| Country | United Kingdom |
| Sector | Charity/Non Profit |
| PI Contribution | Development of benchmarks and case studies, demonstration of the real-world utility of those cases, code development, identification of open datasets, and staff time on our side that benefits both sides of the collaboration. |
| Collaborator Contribution | Staff time from the collaborator for discussions, training sessions, data sets, case studies, participation at BASE-II specific events. |
| Impact | The outcomes so far have been discussions of further collaboration, but none has materialised yet. |
| Start Year | 2022 |
| Description | SDSC-BASEII |
| Organisation | University of California, San Diego (UCSD) |
| Country | United States |
| Sector | Academic/University |
| PI Contribution | Development of benchmarks and case studies, demonstration of the real-world utility of those cases, code development, identification of open datasets, and staff time on our side that benefits both sides of the collaboration. |
| Collaborator Contribution | Investigator time, compute time, and access to object storage system space. |
| Impact | The outcomes have been multi-disciplinary, resulting in further collaborations and publications that we are still working on to date. |
| Start Year | 2022 |
| Description | UKAEA & BASEII |
| Organisation | Culham Centre for Fusion Energy |
| Country | United Kingdom |
| Sector | Academic/University |
| PI Contribution | Development of benchmarks and case studies, demonstration of the real-world utility of those cases, code development, identification of open datasets, and staff time on our side that benefits both sides of the collaboration. |
| Collaborator Contribution | Staff time from the collaborator for discussions, training sessions, data sets, case studies, participation at BASE-II specific events. |
| Impact | The outcomes have been multi-disciplinary, resulting in discussions that led to collaborations with direct financial contributions from UKAEA (for example, the FAIRMAST project), and in publications that are not directly attributed to this grant but arose as a result of this collaboration. The collaboration has also led to us being invited into broader discussions of, and contributions to, UKAEA's mission on fusion energy. |
| Start Year | 2022 |
| Description | Kick Off Workshop |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Professional Practitioners |
| Results and Impact | This launch event was organised as a working group of experts on the 11th and 12th of May 2023 at the University of Leicester; outputs from the event are available on YouTube. Thirty invited representatives attended from UKAEA, Graphcore, DDN, The Alan Turing Institute, and Warwick Economics and Development. The workshop helped to develop an action plan for the project. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://www.youtube.com/channel/UCxezLkXCAxqJBIQjT0OceVQ |
| Description | Public Seminar: Benchmarking LLMs on large-scale systems |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | This was a talk for a broader audience by Ana Gainaru (from ORNL), which took place on the 7th Feb 2024, 4-5pm GMT. The talk was focussed on how various aspects of LLMs can be benchmarked, quantified and used for scientific purposes. |
| Year(s) Of Engagement Activity | 2024 |
| Description | Public Seminar: Benchmarking storage performance of LLMs |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | This seminar on benchmarking the storage performance of LLMs was offered by Jean-Thomas Acquaviva (DDN) on the 24th of January 2024, 4-5pm GMT. The talk focussed on the benchmarking aspects of the storage side. |
| Year(s) Of Engagement Activity | 2024 |
| Description | Public Seminar: Data-parallelism based deep learning approach for the model order reduction of parametrized partial differential equations |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | The talk focussed on data-parallelism-based deep learning for the model order reduction of parametrized partial differential equations. The event took place in September 2023 and was offered by Dr. Nirav Shah (University of Cambridge). It was a very useful seminar for debating whether reduced-order models are potential benchmark candidates (or not). |
| Year(s) Of Engagement Activity | 2023 |
| Description | Public Seminar: On foundation models and autonomous discovery for biological systems engineering: Peptides, Proteins, Pathways, and Beyond |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | The talk was delivered by Arvind Ramanathan (Argonne), in May 2023, and focussed on AI for accelerating the development of autonomous labs. This seminar was key to setting up further collaborations with the DOE labs. |
| Year(s) Of Engagement Activity | 2023 |
| Description | Public Seminar: Surrogate Physical Models |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | This was a seminar offered by Frederik De Ceuster (Institute of Astronomy, KU Leuven) on the 5th of June 2024, 4pm to 5pm. The talk covered how ML surrogates can be utilised to complement or replace HPC-driven simulations, especially in the context of large-scale cosmological simulations. The talk was attended by more than 50 people, and there were detailed discussions of potential use cases. |
| Year(s) Of Engagement Activity | 2024 |
