Peta-5: A National Facility for Petascale Data Intensive Computation and Analytics
Lead Research Organisation:
University of Cambridge
Department Name: Physics
Abstract
The Peta-5 proposal from the University of Cambridge brings together 15 world-leading HPC system and application experts from 10 institutions to lead the creation of a breakthrough HPC and data analytics capability that will deliver significant national impact across the UK research, industry and health sectors.
Peta-5 aims to make a significant contribution to the establishment and sustainability of a new EPSRC Tier 2 HPC network. Working in collaboration with other Tier 1, Tier 2 and Tier 3 stakeholders, the Cambridge Tier 2 Centre aims to form a coherent, coordinated and productive National e-Infrastructure (Ne-I) ecosystem. This greatly strengthened computational research support capability will enable a significant increase in computational and data-centric research outputs, driving growth in both academic research discovery and the wider UK knowledge economy.
The Peta-5 system will be one of the largest heterogeneous data intensive HPC systems available to EPSRC research in the UK. To create the critical mass in system capability and capacity needed to make an impact at national level, Cambridge have pooled funding and equipment resources from the University, STFC DiRAC and this EPSRC Tier 2 proposal, giving a total capital equipment value of £11.5M; the request to EPSRC is £5M. The University will guarantee to cover all operational costs of the system for 4 years from the service start date, with the option of a fifth year to be discussed. Cambridge will ensure that 80% of the EPSRC-funded element of Peta-5 is deployed on EPSRC research projects: 65% will be made available free of charge to any UK EPSRC-funded project through a lightweight resource allocation committee (RAC), and 15% will go to Cambridge EPSRC research. The remaining 20% will be sold to UK industry to drive the UK knowledge economy.
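The allocation arithmetic in this paragraph can be checked with a short sketch. It only encodes the percentages and capital figures stated above; the `Share` class and the `check_split` function are illustrative names, not part of any Peta-5 or CSD3 software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Share:
    label: str
    percent: int
    epsrc_research: bool  # True if this share is deployed on EPSRC research


# Split of the EPSRC-funded element of Peta-5, as stated in the abstract.
SHARES = [
    Share("Any UK EPSRC-funded project, free of charge via the RAC", 65, True),
    Share("Cambridge EPSRC research", 15, True),
    Share("Sold to UK industry", 20, False),
]

TOTAL_CAPITAL_GBP_M = 11.5  # University + STFC DiRAC + EPSRC Tier 2 combined
EPSRC_REQUEST_GBP_M = 5.0


def check_split(shares):
    """Return (total percent, percent deployed on EPSRC research)."""
    total = sum(s.percent for s in shares)
    research = sum(s.percent for s in shares if s.epsrc_research)
    return total, research


if __name__ == "__main__":
    total, research = check_split(SHARES)
    print(f"total {total}%, EPSRC research {research}%")
    epsrc_fraction = EPSRC_REQUEST_GBP_M / TOTAL_CAPITAL_GBP_M
    print(f"EPSRC contributes {100 * epsrc_fraction:.1f}% of the capital")
```

The 65% and 15% shares together make up the 80% committed to EPSRC research, and the EPSRC request is a little under half (about 43.5%) of the pooled capital.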
The Peta-5 system will be the most capable HPC system in operation in the UK when it enters service in May 2017. In total, Peta-5 will provide 3 petaflops (PF) of sustained performance from three heterogeneous compute elements (Peta-1): 1 PF Intel x86, 1 PF Intel KNL and 1 PF NVIDIA Pascal GPU. These are connected via a Pb/s HPC fabric (Peta-2) to an extreme I/O solid-state storage pool (Peta-3), a petascale data analytics (machine learning + Hadoop) pool (Peta-4) and a large 15 PB tiered storage solution (Peta-5), all under a single execution environment. This creates a new HPC capability in the UK specifically designed to meet the requirements of both affordable petascale simulation and data intensive workloads combined with complex data analytics. It is this combination of features that unlocks a new generation of computational science research.
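The architecture described above can be summarised as a simple inventory. This is a hypothetical data model for illustration only (the `Subsystem` class and its fields are assumptions, not a real configuration format); it records the five Peta-n elements and confirms that the three compute elements add up to the stated 3 PF of sustained performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subsystem:
    tag: str
    role: str
    sustained_pf: float = 0.0  # sustained petaflops; 0.0 for non-compute elements


# Hypothetical inventory built from the figures in the abstract.
PETA5 = [
    Subsystem("Peta-1", "compute: Intel x86", 1.0),
    Subsystem("Peta-1", "compute: Intel KNL", 1.0),
    Subsystem("Peta-1", "compute: NVIDIA Pascal GPU", 1.0),
    Subsystem("Peta-2", "Pb/s HPC fabric"),
    Subsystem("Peta-3", "extreme I/O solid-state storage pool"),
    Subsystem("Peta-4", "data analytics pool (machine learning + Hadoop)"),
    Subsystem("Peta-5", "15 PB tiered storage"),
]


def total_sustained_pf(system):
    """Sum the sustained performance of all compute elements."""
    return sum(s.sustained_pf for s in system)


if __name__ == "__main__":
    print(f"{total_sustained_pf(PETA5):.1f} PF sustained")
    for s in PETA5:
        print(f"{s.tag}: {s.role}")
```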
The core science justification for the Peta-5 service rests on three broad science themes: Materials Science and Computational Chemistry; Computational Engineering and Smart Cities; and Health Informatics. These themes were chosen because they represent significant EPSRC research areas that stand to benefit greatly from the data intensive HPC capability of Peta-5. The service will also be valuable for many other areas of heterogeneous computing and data intensive science, so a fourth, horizontal theme of "Heterogeneous - Data Intensive Science" is included. The initial theme allocation in the resource allocation committee (RAC) will be: Materials 30%, Engineering 30%, Health 20%, Heterogeneous - Data Intensive 20%.
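If, as an assumption not stated outright in the abstract, these theme percentages apply to the 65% of the EPSRC-funded element that is offered free of charge through the resource allocation committee, each theme's share of the whole EPSRC-funded element follows directly. The sketch below is illustrative; `theme_shares` is a hypothetical helper.

```python
# Initial theme allocation in the RAC, in percent, as stated in the abstract.
THEMES = {
    "Materials": 30,
    "Engineering": 30,
    "Health": 20,
    "Heterogeneous - Data Intensive": 20,
}

# ASSUMPTION: the RAC distributes the 65% of the EPSRC-funded element that is
# offered free of charge to UK EPSRC-funded projects.
RAC_SHARE_PERCENT = 65


def theme_shares(themes, rac_share_percent):
    """Return each theme's share of the whole EPSRC-funded element, in percent."""
    if sum(themes.values()) != 100:
        raise ValueError("theme allocations must sum to 100%")
    return {name: rac_share_percent * pct / 100 for name, pct in themes.items()}


if __name__ == "__main__":
    for name, share in theme_shares(THEMES, RAC_SHARE_PERCENT).items():
        print(f"{name}: {share:.1f}% of the EPSRC-funded element")
```

Under this assumption, Materials and Engineering would each receive 19.5% of the EPSRC-funded element, and Health and Heterogeneous - Data Intensive 13% each.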
The Peta-5 facility will drive research discovery and impact at national level, creating the largest and most cost-effective petascale HPC resource in the UK and bringing petascale simulation within reach of a wide range of research projects and UK companies. Peta-5 is also the first UK HPC system specifically designed for large-scale machine learning and data analytics, combining HPC and Big Data and promising to unlock both knowledge and economic benefit from the Big Data revolution.
Planned Impact
As an innovative HPC service for data intensive science, Peta-5 will have a significant impact on the research communities that use its resources. In addition to the expected science outcomes (e.g. papers in high-impact, peer-reviewed journals and keynote presentations at international conferences), Peta-5 will deliver impact in a number of other key areas:
1) Peta-5 will create one of the most powerful academic supercomputer facilities in the UK.
2) Peta-5 will provide the most cost-effective petascale simulation capability in the UK, offering unrivalled price-performance. This makes HPC sustainable for academia and industry by demonstrating affordable petascale simulation. It is a game-changing capability, widening access and opening up possibilities previously out of reach for many research projects and company budgets.
3) Peta-5 is currently the only HPC system in the UK aimed at data intensive computing, combining state-of-the-art extreme I/O solid-state storage technologies with emerging machine learning and data analytics frameworks. This provides a new capability for tackling the largest "Big Data" problems in UK research and industry.
In particular, Peta-5 will:
1) Enable new petascale academic research projects
Cambridge will proactively seek UK academic usage of the Peta-5 system by opening it up to UK EPSRC researchers free of charge, with strong user support and lightweight application processes, placing particular emphasis on new users. Cambridge is well connected to all levels of the Ne-I and, through its involvement in many existing HPC academic networks, will promote uptake of the Peta-5 system.
2) Enable industrial use of petascale HPC capability
Cambridge have a long-established and successful industry engagement activity called CORE, which will proactively seek industry HPC use cases, promoting the use of HPC and advanced data analytics to drive industrial R&D.
3) Enable new extreme I/O and high performance data analytics capability
The Peta-5 architecture provides new extreme I/O capability combined with emerging machine learning and data analytics capability at a scale not available anywhere else in the UK. This will enable UK research projects and industry to develop new approaches to solving the largest "Big Data" problems addressed to date.
4) Cambridge have a specific partnership with the Alan Turing Institute (ATI) to develop novel big data analytics methods and solutions to implement on the Peta-5 system. The ATI will then help disseminate the capability and train both academic and industrial beneficiaries.
5) Enable new advances in health informatics
Peta-5 will provide advanced data analytics technologies and data safe havens for interdisciplinary research in health informatics, linking leading EPSRC research projects in this domain with the ATI, Addenbrooke's and Genomics England (GEL). This combination of linkage and capability will result in groundbreaking health informatics capability with potential use in the clinical setting. Partners such as Addenbrooke's and GEL provide a direct route from the methods developed in interdisciplinary research on Peta-5 to patient health outcomes, which can then be adopted nationally.
Organisations
- University of Cambridge (Lead Research Organisation)
- NVIDIA Limited (UK) (Project Partner)
- Science and Technology Facilities Council (Project Partner)
- The Alan Turing Institute (Project Partner)
- University College London (Project Partner)
- University of Edinburgh (Project Partner)
- Genomics England (Project Partner)
- The University of Texas at Austin (Project Partner)
- Dell Corporation Ltd (Project Partner)
Publications
Constantinou S
(2022)
Characterising Atmospheres of Cloudy Temperate Mini-Neptunes with JWST
Cosgrove P
(2020)
Neutron clustering as a driver of Monte Carlo burn-up instability
in Annals of Nuclear Energy
Craske J
(2019)
The entrainment and energetics of turbulent plumes in a confined space
in Journal of Fluid Mechanics
Crispin-Ortuzar M
(2023)
Integrated radiogenomics models predict response to neoadjuvant chemotherapy in high grade serous ovarian cancer
in Nature Communications
Cruz CHB
(2021)
Virus-inspired designs of antimicrobial nanocapsules
in Faraday Discussions
Dalladay-Simpson P
(2024)
Distinct vibrational signatures and complex phase behavior in metallic oxygen
in Matter and Radiation at Extremes
Darby J
(2020)
Ab Initio Prediction of Metal-Organic Framework Structures
in Chemistry of Materials
Deakin T
(2021)
Analyzing Reduction Abstraction Capabilities
Della Pia F
(2022)
B1-B2 phase transition of ferropericlase at planetary interior conditions
in Physical Review B
Della Pia F
(2022)
DMC-ICE13: Ambient and high pressure polymorphs of ice from diffusion Monte Carlo and density functional theory
in The Journal of Chemical Physics
Dobrisan A
(2023)
Analysis of the behaviour of retaining structures through a novel data interpretation approach
in Soils and Foundations
Dong J
(2021)
Influences of microparticle radius and microchannel height on SSAW-based acoustophoretic aggregation
in Ultrasonics
Dong Z
(2021)
GPU-Accelerated Discontinuous Galerkin Methods on Polytopic Meshes
in SIAM Journal on Scientific Computing
Duan J
(2023)
An unsteady RANS study of thermal striping in a T-junction with sodium streams mixing at different temperatures
in Frontiers in Energy Research
Eghdami A
(2022)
Branching structure of genealogies in spatially growing populations and its implications for population genetics inference
in Journal of Physics: Condensed Matter
Emond S
(2020)
Accessing unexplored regions of sequence space in directed enzyme evolution via insertion/deletion mutagenesis
in Nature Communications
Engel M
(2018)
Force-Induced Unravelling of DNA Origami
in ACS Nano
Engel MC
(2020)
Measuring Internal Forces in Single-Stranded DNA: Application to a DNA Force Clamp
in Journal of Chemical Theory and Computation
Ermanis K
(2023)
Interrogating the Crucial Interactions at Play in the Chiral Cation-Directed Enantioselective Borylation of Arenes
in ACS Catalysis
Ermanis K
(2020)
A Computational and Experimental Investigation of the Origin of Selectivity in the Chiral Phosphoric Acid Catalyzed Enantioselective Minisci Reaction
in Journal of the American Chemical Society
Espinosa JR
(2020)
Liquid network connectivity regulates the stability and composition of biomolecular condensates with many components
in Proceedings of the National Academy of Sciences of the United States of America
Espinosa JR
(2019)
Breakdown of the law of rectilinear diameter and related surprises in the liquid-vapor coexistence in systems of patchy particles
in The Journal of Chemical Physics
Espinosa JR
(2023)
On the possible locus of the liquid-liquid critical point in real water from studies of supercooled water using the TIP4P/Ice model
in The Journal of Chemical Physics
Farmakis P
(2020)
WENO schemes on unstructured meshes using a relaxed a posteriori MOOD limiting approach
in Computer Methods in Applied Mechanics and Engineering
Farr SE
(2021)
Nucleosome plasticity is a critical element of chromatin liquid-liquid phase separation and multivalent nucleosome interactions
in Nature Communications
Farrar EHE
(2020)
Computational Studies of Chiral Hydroxyl Carboxylic Acids: The Allylboration of Aldehydes
in The Journal of Organic Chemistry
Feldmann S
(2021)
Charge Carrier Localization in Doped Perovskite Nanocrystals Enhances Radiative Recombination
in Journal of the American Chemical Society
Ferdinand JR
(2021)
Cytokine absorption during human kidney perfusion reduces delayed graft function-associated inflammatory gene signature
in American Journal of Transplantation
Fertitta E
(2021)
Study of disorder in pulsed laser deposited double perovskite oxides by first-principle structure prediction
in npj Computational Materials
Foguet C
(2022)
Genetically personalised organ-specific metabolic models in health and disease
in Nature Communications
Description | The PETA-5 installation is complete, and the performance of the system exceeds the originally planned performance. The facility is now known as CSD3. Excellent use has been made of the facility, with widening participation, especially from new research areas using the University of Cambridge's time allocation. The pandemic has led to a focussing of effort on simulations and analysis relevant to the pandemic. |
Exploitation Route | The findings are only preliminary, being based on the installation rather than the use of the system. However, the experience gained in building such a system to budget, on time and with excellent performance will be of interest to others involved in HPC or data centre installation. |
Sectors | Digital/Communication/Information Technologies (including Software) |
URL | https://www.hpc.cam.ac.uk/CSD3/csd3-platform |
Title | Data supporting 'Numerical Investigation of full helicopter with and without the ground effect' |
Description | In the present work, the aerodynamic performance of the full helicopter PSP in hover flight is investigated using a simplified multiple reference frame (MRF) technique within a high-order Monotone Upstream-centred Scheme for Conservation Laws (MUSCL) cell-centred finite volume method. Predictions were obtained for two ground distances and several collective pitch angles at a tip Mach number of 0.585, covering both out-of-ground-effect (OGE) and in-ground-effect (IGE) cases, and were compared with experimental data in terms of pressure distribution, integrated thrust and torque, and the vortex system. |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
URL | https://cord.cranfield.ac.uk/articles/dataset/Data_supporting_Numerical_Investigation_of_full_helico... |
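The MUSCL approach named in this dataset description reconstructs a linear profile inside each cell from the cell averages and limits the slope so that no spurious oscillations appear near steep gradients. The following is a generic one-dimensional sketch of that reconstruction step with a minmod limiter, for illustration only: the limiter, order of accuracy and MRF treatment used in the study are not specified here, and `minmod` and `muscl_interface_states` are hypothetical names. In a full solver, the left and right face states would then be passed to a Riemann solver or numerical flux.

```python
def minmod(a, b):
    """Minmod limiter: zero if the slopes differ in sign, else the smaller one."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def muscl_interface_states(u):
    """Piecewise-linear MUSCL reconstruction on a uniform 1D grid.

    u: list of cell averages (length n >= 2).
    Returns (u_left, u_right) at the n-1 interior faces i+1/2: u_left is
    extrapolated from cell i, u_right from cell i+1. Boundary cells use a
    zero slope (first order) to keep the sketch simple.
    """
    n = len(u)
    slope = [0.0] * n
    for i in range(1, n - 1):
        # Limited slope from the backward and forward differences.
        slope[i] = minmod(u[i] - u[i - 1], u[i + 1] - u[i])
    u_left = [u[i] + 0.5 * slope[i] for i in range(n - 1)]
    u_right = [u[i + 1] - 0.5 * slope[i + 1] for i in range(n - 1)]
    return u_left, u_right


if __name__ == "__main__":
    print(muscl_interface_states([0.0, 1.0, 2.0, 3.0]))
    print(muscl_interface_states([0.0, 0.0, 1.0, 1.0]))
```

For a linear profile the reconstruction is exact at faces between two interior cells, while at a step the limiter falls back to first order and introduces no new extrema.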
Title | Research data supporting "Computational Investigation of Copper Phosphides as Conversion Anodes for Lithium-Ion Batteries" |
Description | |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://www.repository.cam.ac.uk/handle/1810/308174 |
Title | Research data supporting "High-throughput discovery of high-temperature conventional superconductors" |
Description | Crystal structures of the materials listed in Table 1 of "High-throughput discovery of high-temperature conventional superconductors", generated using ab initio random structure searching (AIRSS). These are the structures, after an initial geometry optimization at the listed pressure, that were found to exhibit high-Tc superconductivity. They are provided in the CASTEP .cell format and can easily be converted to a number of other formats using the C2x software (https://www.c2x.org.uk/). |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
URL | https://www.repository.cam.ac.uk/handle/1810/326388 |
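The structures in this dataset are supplied in CASTEP .cell format, which stores data in `%BLOCK` ... `%ENDBLOCK` sections. Alongside C2x, a small standard-library reader can be sketched as below. It assumes (this is not confirmed by the dataset description) that each file uses a `LATTICE_CART` block for the lattice vectors and a `POSITIONS_FRAC` block for fractional atomic coordinates, and that comments start with `!` or `#`; files using other blocks, such as `LATTICE_ABC` or `POSITIONS_ABS`, would need extra handling. All function names are hypothetical.

```python
def read_cell_blocks(text):
    """Collect the lines of every %BLOCK ... %ENDBLOCK section, keyed by lower-case name."""
    blocks = {}
    current = None
    for raw in text.splitlines():
        # Strip comments (assumed to start with '!' or '#') and whitespace.
        line = raw.split("!")[0].split("#")[0].strip()
        if not line:
            continue
        tokens = line.split()
        keyword = tokens[0].lower()
        if keyword == "%block" and len(tokens) > 1:
            current = tokens[1].lower()
            blocks[current] = []
        elif keyword == "%endblock":
            current = None
        elif current is not None:
            blocks[current].append(line)
    return blocks


def parse_structure(text):
    """Return (lattice, atoms): 3x3 Cartesian lattice vectors and (species, fractional xyz) tuples."""
    blocks = read_cell_blocks(text)
    lattice_lines = blocks["lattice_cart"]
    # An optional first line gives the length unit (e.g. "ang"); skip it if present.
    if len(lattice_lines[0].split()) == 1:
        lattice_lines = lattice_lines[1:]
    lattice = [[float(x) for x in line.split()[:3]] for line in lattice_lines[:3]]
    atoms = []
    for line in blocks["positions_frac"]:
        parts = line.split()
        atoms.append((parts[0], tuple(float(x) for x in parts[1:4])))
    return lattice, atoms


def fractional_to_cartesian(lattice, frac):
    """Cartesian position = sum over i of frac[i] * (lattice vector i)."""
    return tuple(sum(frac[i] * lattice[i][j] for i in range(3)) for j in range(3))
```

A production workflow would more likely rely on C2x, as the dataset description suggests, or on an established structure library; the sketch only shows how the block structure of a .cell file maps onto lattice vectors and atomic positions.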
Title | Research data supporting 'Physics-driven coarse-grained model for biomolecular phase separation with near-quantitative accuracy' |
Description | This file is part of the supporting data for the manuscript 'Physics-driven coarse-grained model for biomolecular phase separation with near-quantitative accuracy', and contains scripts and code for running all-atom and coarse-grained simulations of the proteins described in the manuscript. The included README file outlines the structure of the archive and contains LAMMPS installation instructions for running the remaining code supplied. There are five directories in this archive. Four of these correspond to figures in the results section of the manuscript, and each contains a separate README file detailing its contents. They include GROMACS and LAMMPS scripts with example simulation set-ups for the simulations corresponding to figures 2 [potentials of mean force of all-atom models], 4 [radius of gyration simulations of coarse-grained models], 5 [direct-coexistence simulations used to determine the phase diagrams] and 6 [multiphase equilibria] of the manuscript. The directories contain input scripts and corresponding parameter and simulation configuration files, as well as example simulation output to benchmark against. The final directory, 'all-model-parameters', contains LAMMPS parameter files not only for the Mpipi potential but also for all the other models benchmarked in the paper. These parameter files can be used in place of the Mpipi parameters in the simulations provided in the other directories. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
URL | https://www.repository.cam.ac.uk/handle/1810/329039 |
Title | Research data supporting 'Quantum-mechanical exploration of the phase diagram of water' |
Description | We provide DFT input files, example ice configurations studied, a Mathematica notebook used to collate the results, and numerical results of the free-energy computations. |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://www.repository.cam.ac.uk/handle/1810/315122 |
Title | Scientific OpenStack |
Description | The software was developed as part of the SKA Science Data Processor Platform. It adds new functionality to the OpenStack platform to enable high-performance workflows and monitoring. The software has been contributed back to the main OpenStack repository for general use. |
Type Of Technology | Software |
Year Produced | 2018 |
Open Source License? | Yes |
Impact | This software has wide applicability beyond radio astronomy and the SKA project for which it was developed, and at least one company, StackHPC, is taking it forward in a commercial context |
URL | http://ska-sdp.org |