Future-proof massively-parallel execution of multi-block applications

Lead Research Organisation: University of Oxford

Department Name: Oxford e-Research Centre

Abstract

For many years, increasing the clock frequency of microprocessors led to steady improvements in the performance of computer applications. Applications gained an almost free performance boost, with no need to rewrite software for each new generation of processors. However, raising processor performance in this manner led to an unsustainable increase in energy consumption. To gain higher performance, chip developers therefore now rely on multiple cores operating in parallel. The latest CPUs have up to 10 cores, each with a vector unit producing up to 8 single-precision floating-point results per clock cycle, while the latest graphics processors (GPUs) have up to 2688 much simpler cores operating in groups of 32.

This move into manycore computing has led to considerable hardware innovation, and it is likely that the next 10 years will see further rapid evolution in computer architectures. This poses huge challenges to application developers, who naturally wish to concentrate on their engineering and scientific applications and how best to model them, without having to worry about the details of modern computer architectures. To address this, there is a range of efforts within scientific computing to develop high-level software packages or frameworks, so that the application developer can specify what is to be computed at a high level and the package then takes care of the implementation details.

Building on prior EPSRC-funded research to develop a framework called OP2 for unstructured grid applications, this proposal aims to develop a future-proof extension called OPS to handle the needs of multi-block structured grid applications. Developers' applications can be written in FORTRAN or C, using a carefully-designed application programming interface (API), and then OPS generates customised code for the implementation on different hardware target platforms.
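
The separation described above, where the application states what to compute and the framework decides how, can be illustrated with a minimal Python sketch. This is not the OPS API and it does no code generation: `par_loop`, `average4`, `run` and the backend names are hypothetical, chosen only to show one user kernel being executed by two interchangeable back-ends.

```python
from concurrent.futures import ThreadPoolExecutor


def par_loop(kernel, shape, out, inp, backend="serial"):
    """Apply `kernel` at every interior point of a 2D grid.

    Hypothetical stand-in for a framework entry point: the caller only
    supplies the per-point kernel, and the backend decides how to iterate.
    """
    ny, nx = shape

    def run_row(j):
        for i in range(1, nx - 1):
            out[j][i] = kernel(inp, j, i)

    rows = range(1, ny - 1)
    if backend == "serial":
        for j in rows:
            run_row(j)
    elif backend == "threads":
        # Rows are independent, so they can be handed to worker threads.
        with ThreadPoolExecutor() as pool:
            list(pool.map(run_row, rows))
    else:
        raise ValueError(f"unknown backend: {backend}")


def average4(u, j, i):
    """User-written kernel: average of the four nearest neighbours."""
    return 0.25 * (u[j - 1][i] + u[j + 1][i] + u[j][i - 1] + u[j][i + 1])


def run(backend):
    """Build a small test grid and run the kernel with the chosen backend."""
    ny, nx = 6, 8
    u = [[float(j * nx + i) for i in range(nx)] for j in range(ny)]
    v = [[0.0] * nx for _ in range(ny)]
    par_loop(average4, (ny, nx), v, u, backend=backend)
    return v
```

The kernel `average4` is unchanged between `run("serial")` and `run("threads")`; only the backend argument differs. In a framework of the kind proposed, that choice would be made by generated code for each target platform rather than by a run-time string.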

As well as customising for the different hardware, two other optimisation approaches will be adopted. One is the use of "tiling" to overlap the execution of parallel loops which are usually executed sequentially. This improves both performance and energy efficiency by reusing data within the cache, cutting down on the number of times data is moved between the processor and the main memory. This is becoming increasingly important on modern architectures because the energy cost and time taken for data movement are much greater than for floating-point operations.
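
The idea of tiling across consecutive loops can be sketched in a few lines of Python. The example below is a simplified illustration, not the scheme used in OPS: it applies two 3-point smoothing loops in sequence, and the tiled variant recomputes a one-point halo of the intermediate array for each tile (overlapped tiling with redundant computation). The function names and the 1D stencil are assumptions made for the example.

```python
def smooth(src, dst, lo, hi):
    """Write the 3-point average of src into dst for indices lo..hi-1."""
    for i in range(lo, hi):
        dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0


def untiled(a):
    """Run the two loops one after the other over the whole array."""
    n = len(a)
    b = a[:]  # end points keep the boundary values of a
    c = a[:]
    smooth(a, b, 1, n - 1)  # loop 1 sweeps all of a
    smooth(b, c, 1, n - 1)  # loop 2 sweeps all of b
    return c


def tiled(a, tile):
    """Run both loops tile by tile so intermediate data is reused at once."""
    n = len(a)
    b = a[:]
    c = a[:]
    for t0 in range(1, n - 1, tile):
        t1 = min(t0 + tile, n - 1)
        # Loop 1 on the tile plus a one-point halo on each side,
        # clipped to the interior; halo points are recomputed per tile.
        smooth(a, b, max(t0 - 1, 1), min(t1 + 1, n - 1))
        # Loop 2 on the tile itself, reading the values just produced.
        smooth(b, c, t0, t1)
    return c
```

Both functions produce identical results, because every recomputed halo value comes from the same formula and inputs. In a compiled implementation the intermediate values for one tile would fit in cache, so the second loop would read them without a round trip to main memory; in this sketch `b` is still a full-length list and the benefit is conceptual only.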

The other optimisation is the use of run-time optimisation for applications which execute for a long time. The backend implementations are parameterised, with parameters controlling aspects such as the number of threads in a thread block, or the size of a "tile" in the tiling optimisation. The optimal values for these parameters are not known a priori, and they can significantly affect performance. By dynamically varying the values, and timing the consequential changes in performance, we can implement heuristics to iteratively improve the parameter values during the execution.
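
One possible heuristic of this kind is a simple hill climb over an ordered list of candidate values, sketched below in Python. The proposal does not specify the heuristic, so this is only an assumed example: `tune`, `time_call`, and the `measure` hook are hypothetical names, and `step` stands for one unit of real application work (for instance, one time step) executed with the given parameter value.

```python
import time


def time_call(step, value):
    """Run one unit of work with the given parameter and return its wall time."""
    start = time.perf_counter()
    step(value)
    return time.perf_counter() - start


def tune(step, candidates, iterations, measure=time_call):
    """Hill-climb over an ordered list of candidate parameter values.

    Each iteration runs `step` once with a neighbouring value. If that run
    was cheaper than the best seen so far, the neighbour becomes the current
    value; otherwise the search direction is reversed.
    """
    idx = len(candidates) // 2
    best_cost = measure(step, candidates[idx])
    direction = 1
    for _ in range(iterations):
        trial = idx + direction
        if not 0 <= trial < len(candidates):
            direction = -direction  # ran off the end of the list
            continue
        cost = measure(step, candidates[trial])
        if cost < best_cost:
            idx, best_cost = trial, cost
        else:
            direction = -direction
    return candidates[idx]
```

Because every measurement is itself a useful step of the computation, the tuning cost is limited to the steps that ran with a sub-optimal value. The `measure` argument exists so that the timing can be replaced, for example by a deterministic cost model when testing; real timings are noisy, and a practical heuristic would need to average or repeat measurements.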

The new OPS framework will be assessed, both for performance and ease of use, by applying it to two important academic CFD codes: ROTOR, developed at Bristol by Prof. Chris Allen, and SBLI, developed at Southampton by Prof. Neil Sandham. As well as being important codes in their own right, these are also representative of the needs of other codes within CCP12 (Computational Engineering), the UK Turbulence Consortium, and the UK Applied Aerodynamics Consortium.

Planned Impact

The Dominic Tildesley report on "A Strategic Vision for UK e-Infrastructure" includes many examples of the importance of computational modelling in a wide range of industries as well as in government, and it emphasises the importance of software, including the quote:

"This continuing growth in price/performance and power-efficiency comes at a cost: the new systems will be based on complex multi-core and accelerator-based architectures which are much more challenging to program than are today's systems, requiring a revolution in software design and development."

The importance and challenge of manycore computing is also reflected in its inclusion as one of EPSRC's five main priorities within the ICT area.

Historically, the UK has played a leading role in computational modelling, relative to its size, particularly in areas such as aeronautical CFD and weather prediction. It is very important that this position is maintained, and one of the key ways of achieving this is to ensure that the computational modellers can spend their time developing better models, not worrying about the details of novel computer architectures.

At the same time, we need to train a new generation of scientific computing experts who do understand thoroughly the details of novel computer architectures and how best to exploit them. If £100M is being spent annually in the UK on HPC hardware (a very conservative estimate), then the cost savings are very substantial if software improvements such as tiling and run-time optimisation can deliver a factor of 2 increase in performance.

Our research will develop open-source software giving support specifically for multi-block structured-grid applications in engineering and science. More generally, it will contribute to the domain of manycore parallel computing, which is vital for the health of the country's capability in computational modelling, which in turn underlies so much of modern engineering and science.

Funded Value:

£280,146

Funded Period:

Dec 2013 - Feb 2017

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/K038494/1

Principal Investigator:

Mike Giles

Research Subject:

Info. & commun. Technol. (50%)

Tools, technologies & methods (50%)

Research Topic:

High Performance Computing (50%)

Parallel Computing (50%)

Organisations

University of Oxford (Lead Research Organisation)

People
Mike Giles (Principal Investigator)
Gihan Mudalige (Researcher Co-Investigator)

Publications


Giles M (2014) GPU Implementation of Finite Difference Solvers

Giles MB (2014) Trends in high-performance computing for engineering calculations. in Philosophical transactions. Series A, Mathematical, physical, and engineering sciences

Jammy S (2016) Block-structured compressible Navier-Stokes solution using the OPS high-level abstraction in International Journal of Computational Fluid Dynamics

Mudalige G (2016) Auto-vectorizing a large-scale production unstructured-mesh CFD application

Mudalige G (2019) Large-scale performance of a DSL-based multi-block structured-mesh application for Direct Numerical Simulation in Journal of Parallel and Distributed Computing

Mudalige G (2015) High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation - 5th International Workshop, PMBS 2014, New Orleans, LA, USA, November 16, 2014. Revised Selected Papers

Reguly I (2015) Vectorizing unstructured mesh computations for many-core architectures in Concurrency and Computation: Practice and Experience

Reguly I (2019) Improving resilience of scientific software through a domain-specific approach in Journal of Parallel and Distributed Computing

Reguly I (2014) The OPS Domain Specific Abstraction for Multi-block Structured Grid Computations

Reguly I (2018) Loop Tiling in Large-Scale Stencil Codes at Run-Time with OPS in IEEE Transactions on Parallel and Distributed Systems

Key Findings

Description	The objective of this work was to demonstrate an approach to the creation of future-proof software through separating the specification of what is to be computed from the details of the implementation which achieves this. Flexible code generation techniques were then used to create a number of different back-end implementations for different computer architectures, such as GPUs or many-core CPUs. AWE funded related work on a series of "mini-apps" which demonstrated there is no significant performance penalty in following our flexible approach rather than hand-crafting separate implementations for different platforms.
Exploitation Route	AWE is now considering whether to adopt this approach in their own software development process. In addition, a UK company specialising in mathematical software is building on these ideas in developing their own software for a particular class of applications. The software itself is available on Github under an open source license: https://github.com/OP-DSL/OPS
Sectors	Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Energy; Security and Diplomacy
URL	http://www.oerc.ox.ac.uk/projects/ops

Impact Summary

Description	AWE is now considering whether to adopt this approach in their own software development process. In addition, a UK company specialising in mathematical software is building on these ideas in developing their own software for a particular class of applications. The software itself is available on Github under an open source license: https://github.com/OP-DSL/OPS
First Year Of Impact	2013
Sector	Security and Diplomacy
Impact Types	Economic

Further Funding

Description	AWE (2014)
Amount	£24,984 (GBP)
Organisation	Atomic Weapons Establishment
Sector	Private
Country	United Kingdom
Start	06/2014
End	11/2014

Description	AWE (2015)
Amount	£24,725 (GBP)
Organisation	Atomic Weapons Establishment
Sector	Private
Country	United Kingdom
Start	07/2015
End	12/2015

Description	Rolls-Royce (2014)
Amount	£29,960 (GBP)
Organisation	Rolls Royce Group Plc
Sector	Private
Country	United Kingdom
Start	10/2014
End	12/2014

Description	Rolls-Royce (2015)
Amount	£36,396 (GBP)
Organisation	Rolls Royce Group Plc
Sector	Private
Country	United Kingdom
Start	01/2015
End	12/2015

Engagement Activities

Description	CUDA Programming on NVIDIA GPUs
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	One week course on CUDA programming on NVIDIA GPUs, available to both academics and non-academics. Lots of the students have since gone on to use CUDA programming in their research.
Year(s) Of Engagement Activity	2008,2009,2010,2011,2012,2013,2014
URL	http://people.maths.ox.ac.uk/gilesm/cuda/
