Future-proof massively-parallel execution of multi-block applications

Lead Research Organisation: University of Bristol
Department Name: Computer Science

Abstract

For many years, increasing the clock frequency of microprocessors has led to steady improvements in performance of computer applications. This gave an almost free performance boost to the speed of applications without having to re-write software for each new generation of processors. However, increasing the performance of processors in this manner led to an unsustainable increase in energy consumption. Thus, to gain higher performance chip developers now rely on multiple cores operating in parallel. The latest CPUs have up to 10 cores, each with a vector unit producing up to 8 single precision floating point results per clock cycle, while the latest graphics processors (GPUs) have up to 2688 much simpler cores operating in groups of 32.

This move into manycore computing has led to considerable hardware innovation, and it is likely that the next 10 years will see further rapid evolution in computer architectures. This poses huge challenges to application developers, who naturally wish to concentrate on their engineering and scientific applications and how best to model them, without having to worry about the details of modern computer architectures. To address this, there is a range of efforts within scientific computing to develop high-level software packages or frameworks so that the application developer can specify what they want to be computed at a high level, and the package then takes care of the implementation details.

Building on prior EPSRC-funded research to develop a framework called OP2 for unstructured grid applications, this proposal aims to develop a future-proof extension called OPS to handle the needs of multi-block structured grid applications. Developers' applications can be written in FORTRAN or C, using a carefully-designed application programming interface (API), and then OPS generates customised code for the implementation on different hardware target platforms.
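
The separation described above, between what is computed and how it is executed, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `par_loop`, its `backend` argument and the `average` kernel are hypothetical names and do not reproduce the OPS API, and the two "backends" are ordinary Python code paths rather than generated code. The sketch also assumes that every iteration writes only its own output element, so iterations can run in any order.

```python
from concurrent.futures import ThreadPoolExecutor


def par_loop(kernel, index_range, *arrays, backend="serial", workers=4):
    """Apply kernel(i, *arrays) to every index in [lo, hi).

    Hypothetical interface for illustration only (not the OPS API).
    The caller states what to compute; the backend decides how to run it.
    """
    lo, hi = index_range
    if backend == "serial":
        for i in range(lo, hi):
            kernel(i, *arrays)
    elif backend == "threads":
        # Split the index range into contiguous chunks, one task per chunk.
        chunk = max(1, (hi - lo + workers - 1) // workers)

        def run_chunk(start):
            for i in range(start, min(start + chunk, hi)):
                kernel(i, *arrays)

        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(run_chunk, range(lo, hi, chunk)))
    else:
        raise ValueError("unknown backend: " + backend)


def average(i, src, dst):
    """User kernel: three-point average, written once for all backends."""
    dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0


src = [float(i) for i in range(10)]
dst_serial = [0.0] * 10
dst_threads = [0.0] * 10
par_loop(average, (1, 9), src, dst_serial, backend="serial")
par_loop(average, (1, 9), src, dst_threads, backend="threads")
```

The application code (the kernel and the loop range) is identical in both calls; only the backend selection changes.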

As well as customising for the different hardware, two other optimisation approaches will be adopted. One is the use of "tiling" to overlap the execution of parallel loops which are usually executed sequentially. This improves both performance and energy efficiency by reusing data within the cache, cutting down on the number of times data is moved between the processor and the main memory. This is becoming increasingly important on modern architectures because the energy cost and time taken for data movement are much greater than for floating point operations.
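
As a rough illustration of the tiling idea, the sketch below takes two consecutive three-point stencil loops and interleaves them tile by tile, so that values produced by the first loop are consumed by the second while they are still recently used. This is a simplified one-dimensional, skewed schedule chosen for clarity; it is an assumption of this sketch and not necessarily the scheme OPS implements. The function names and the `tile` parameter are hypothetical, and plain Python lists show only the loop ordering, not the cache benefit.

```python
def stencil(src, i):
    """Three-point sum used by both loops."""
    return src[i - 1] + src[i] + src[i + 1]


def run_untiled(a):
    """Two loops executed one after the other over the whole array."""
    n = len(a)
    b = [0.0] * n
    c = [0.0] * n
    for i in range(1, n - 1):   # loop 1: b depends on a
        b[i] = stencil(a, i)
    for i in range(2, n - 2):   # loop 2: c depends on b
        c[i] = stencil(b, i)
    return c


def run_tiled(a, tile):
    """Same computation, with the two loops interleaved tile by tile."""
    n = len(a)
    b = [0.0] * n
    c = [0.0] * n
    c_start = 2                 # next index of c still to be computed
    for s in range(1, n - 1, tile):
        e = min(s + tile, n - 1)
        for i in range(s, e):   # loop 1 restricted to this tile
            b[i] = stencil(a, i)
        # Loop 2 lags one point behind, because c[i] needs b[i + 1].
        c_end = max(c_start, e - 1)
        for i in range(c_start, c_end):
            c[i] = stencil(b, i)
        c_start = c_end
    return c


a = [float((7 * i) % 11) for i in range(40)]
reference = run_untiled(a)
tiled_results = [run_tiled(a, tile) for tile in (1, 3, 8, 64)]
```

Each tiled run produces the same values as the untiled version; only the order in which points are visited changes.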

The other optimisation is the use of run-time optimisation for applications which execute for a long time. The backend implementations are parameterised, with parameters controlling aspects such as the number of threads in a thread block, or the size of a "tile" in the tiling optimisation. The optimal values for these parameters are not known a priori, and they can significantly affect performance. By dynamically varying the values, and timing the consequent changes in performance, we can implement heuristics to iteratively improve the parameter values during the execution.
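
A minimal sketch of this kind of run-time tuning is given below. It assumes a very simple heuristic: try each candidate value once, then keep using whichever has the lowest observed time. The class `RuntimeTuner`, the function `run_tuned` and the candidate tile sizes are hypothetical names and values introduced for illustration; the heuristics actually used in the framework may differ. The clock is passed in as a parameter so that the example can be driven by a fake timer.

```python
import time


class RuntimeTuner:
    """Chooses a parameter value from timings observed so far."""

    def __init__(self, candidates):
        self.candidates = list(candidates)
        self.timings = {}   # value -> best elapsed time seen

    def next_value(self):
        # Explore: try every candidate once.
        for value in self.candidates:
            if value not in self.timings:
                return value
        # Exploit: reuse the fastest value observed so far.
        return min(self.timings, key=self.timings.get)

    def report(self, value, elapsed):
        previous = self.timings.get(value)
        if previous is None or elapsed < previous:
            self.timings[value] = elapsed


def run_tuned(step, candidates, iterations, clock=time.perf_counter):
    """Run step(value) repeatedly, tuning value between iterations."""
    tuner = RuntimeTuner(candidates)
    history = []
    for _ in range(iterations):
        value = tuner.next_value()
        start = clock()
        step(value)
        tuner.report(value, clock() - start)
        history.append(value)
    return history


# Fake workload whose cost is lowest at a tile size of 64 (assumed for the demo).
fake_time = {"t": 0.0}


def fake_step(tile_size):
    fake_time["t"] += abs(tile_size - 64) + 1


history = run_tuned(fake_step, [16, 32, 64, 128], 8,
                    clock=lambda: fake_time["t"])
```

In a long-running application, `step` would be one time step or one parallel loop, and the cost of the exploratory iterations is amortised over the rest of the run.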

The new OPS framework will be assessed, both for performance and ease-of-use, by applying it to two important academic CFD codes: ROTOR, developed at Bristol by Prof. Chris Allen, and SBLI, developed at Southampton by Prof. Neil Sandham. As well as being important codes in their own right, these are also representative of the needs of other codes within CCP12 (Computational Engineering), the UK Turbulence Consortium, and the UK Applied Aerodynamics Consortium.

Publications

McIntosh-Smith S (2014) Supercomputing

Reguly I (2018) Loop Tiling in Large-Scale Stencil Codes at Run-Time with OPS in IEEE Transactions on Parallel and Distributed Systems

Reguly I (2019) Improving resilience of scientific software through a domain-specific approach in Journal of Parallel and Distributed Computing

Reguly, I.Z. (2014) The OPS Domain Specific Abstraction for Multi-Block Structured Grid Computations in WOLFHPC: Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing

 
Description In this award we explored whether it is possible to develop new scientific software in such a way that the software would run quickly and efficiently on a diverse range of new supercomputers. This wasn't possible previously, with specialised versions of software being optimised for each type of supercomputer. The current approach is very time consuming, expensive, and error prone. We have now demonstrated that it is possible to develop scientific software which is "performance portable" across a wide variety of very diverse supercomputer architectures, such as GPUs from Nvidia or AMD, or CPUs from Intel or ARM.
Exploitation Route Others could use similar techniques to those developed by us to create their own performance portable software applications. For example, anyone using CFD-like techniques should be able to exploit our findings, delivering much higher fidelity CFD simulations, or much quicker simulations at the same resolution than before. They will also be able to exploit GPUs using our new techniques.
Sectors Aerospace, Defence and Marine,Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Electronics,Energy,Financial Services, and Management Consultancy,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology

 
Description Demonstrated that performance portable applications are possible, which led Intel to partner with us and fund a new parallel computing centre in Bristol. This has also led to recent collaborations with Rolls-Royce and a new EPSRC Prosperity Partnership proposal submission in February 2018, with academic partners including a different group at Oxford, Cambridge, Edinburgh (EPCC) and Warwick.
First Year Of Impact 2015
Sector Aerospace, Defence and Marine,Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Description ASiMoV prosperity partnership 
Organisation Centre for Modelling and Simulation (CFMS)
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution Contributing advanced computer architecture research to the ASiMoV prosperity partnership.
Collaborator Contribution Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, other universities bringing expertise in different parts of the problem.
Impact Project only just begun, so no outputs yet.
Start Year 2018
 
Description ASiMoV prosperity partnership 
Organisation Rolls Royce Group Plc
Country United Kingdom 
Sector Private 
PI Contribution Contributing advanced computer architecture research to the ASiMoV prosperity partnership.
Collaborator Contribution Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, other universities bringing expertise in different parts of the problem.
Impact Project only just begun, so no outputs yet.
Start Year 2018
 
Description ASiMoV prosperity partnership 
Organisation University of Cambridge
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing advanced computer architecture research to the ASiMoV prosperity partnership.
Collaborator Contribution Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, other universities bringing expertise in different parts of the problem.
Impact Project only just begun, so no outputs yet.
Start Year 2018
 
Description ASiMoV prosperity partnership 
Organisation University of Edinburgh
Department Edinburgh Parallel Computing Centre (EPCC)
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing advanced computer architecture research to the ASiMoV prosperity partnership.
Collaborator Contribution Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, other universities bringing expertise in different parts of the problem.
Impact Project only just begun, so no outputs yet.
Start Year 2018
 
Description ASiMoV prosperity partnership 
Organisation University of Oxford
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing advanced computer architecture research to the ASiMoV prosperity partnership.
Collaborator Contribution Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, other universities bringing expertise in different parts of the problem.
Impact Project only just begun, so no outputs yet.
Start Year 2018
 
Description ASiMoV prosperity partnership 
Organisation University of Warwick
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing advanced computer architecture research to the ASiMoV prosperity partnership.
Collaborator Contribution Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, other universities bringing expertise in different parts of the problem.
Impact Project only just begun, so no outputs yet.
Start Year 2018
 
Description ASiMoV prosperity partnership 
Organisation Zenotech
Country United Kingdom 
Sector Private 
PI Contribution Contributing advanced computer architecture research to the ASiMoV prosperity partnership.
Collaborator Contribution Investigating full jet-engine simulation at high enough accuracy to do the full design virtually. Rolls-Royce is the lead, other universities bringing expertise in different parts of the problem.
Impact Project only just begun, so no outputs yet.
Start Year 2018
 
Description Intel Parallel Computing Center (IPCC) 
Organisation Intel Corporation
Country United States 
Sector Private 
PI Contribution Working with Intel, using the techniques developed as part of this award, we have ported our software to Intel's next generation, massively parallel computer architectures, such as the Xeon Phi.
Collaborator Contribution Intel have provided funding, personnel, advice, code porting assistance, and access to pre-release hardware.
Impact Papers, presentations, invitations for research visits, internships etc.
Start Year 2014
 
Description Mini-app consortium 
Organisation University of Oxford
Country United Kingdom 
Sector Academic/University 
PI Contribution Joint publications, workshops, meetings, both in the UK and internationally.
Collaborator Contribution Joint publications, workshops, meetings, both in the UK and internationally.
Impact Joint papers, creation of mini-app consortium, annual UK Many-core Developer workshops.
Start Year 2012
 
Title ROTORSIM 
Description ROTORSIM is a structured-grid, multiblock, compressible finite-volume computational fluid dynamics (CFD) code developed by Prof. Christian Allen, head of the Bristol Aerodynamics research group. This award has contributed a new design for the code that makes it highly performance portable. The new design uses OpenCL so that the code can exploit modern GPUs and Intel Xeon Phi to achieve much higher performance on next-generation supercomputers. 
Type Of Technology Software 
Year Produced 2016 
Impact ROTORSIM is being used as one of the new benchmarks in the FP7 Mont Blanc European Exascale research project, making it one of the first applications to run at large scale on an ARM-based supercomputer. ROTORSIM is one of the few applications to run at extremely large scale on the Titan supercomputer at Oak Ridge National Laboratory in the USA. ROTORSIM also helped form the basis of the first phase of the Intel Parallel Computing Center at the University of Bristol.