Future-proof massively-parallel execution of multi-block applications

Lead Research Organisation: University of Oxford
Department Name: Oxford e-Research Centre

Abstract

For many years, increasing the clock frequency of microprocessors led to steady improvements in the performance of computer applications. This gave applications an almost free performance boost, without software having to be rewritten for each new generation of processors. However, increasing the performance of processors in this manner led to an unsustainable increase in energy consumption. Thus, to gain higher performance, chip developers now rely on multiple cores operating in parallel. The latest CPUs have up to 10 cores, each with a vector unit producing up to 8 single precision floating point results per clock cycle, while the latest graphics processors (GPUs) have up to 2688 much simpler cores operating in groups of 32.

This move into manycore computing has led to considerable hardware innovation, and it is likely that the next 10 years will see further rapid evolution in computer architectures. This poses huge challenges for application developers, who naturally wish to concentrate on their engineering and scientific applications and how best to model them, without having to worry about the details of modern computer architectures. To address this, there is a range of efforts within scientific computing to develop high-level software packages or frameworks, so that the application developer can specify what they want to be computed at a high level and the package then takes care of the implementation details.

Building on prior EPSRC-funded research to develop a framework called OP2 for unstructured grid applications, this proposal aims to develop a future-proof extension called OPS to handle the needs of multi-block structured grid applications. Developers' applications can be written in FORTRAN or C, using a carefully-designed application programming interface (API), and then OPS generates customised code for the implementation on different hardware target platforms.

As well as customising for the different hardware, two other optimisation approaches will be adopted. One is the use of ``tiling'' to overlap the execution of parallel loops which are usually executed sequentially. This improves both performance and energy efficiency by reusing data within the cache, cutting down on the number of times data is moved between the processor and the main memory. This is becoming increasingly important on modern architectures because the energy cost and time taken for data movement are much greater than for floating point operations.
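As an illustration of the tiling idea only, the following minimal Python sketch applies two consecutive 1D three-point stencil loops, first one after the other over the whole array and then tile by tile. The function names, the stencil, and the tile size are assumptions made for this example; they are not part of OPS. The sketch uses overlapped tiles that recompute one halo point on each side, which is one simple tiling scheme and not necessarily the one OPS adopts.

```python
def untiled(a):
    """Run two stencil loops back to back, each sweeping the whole array."""
    n = len(a)
    b = [0.0] * n
    c = [0.0] * n
    for i in range(1, n - 1):      # loop 1: reads a, writes b
        b[i] = a[i - 1] + a[i] + a[i + 1]
    for i in range(2, n - 2):      # loop 2: reads b, writes c
        c[i] = b[i - 1] + b[i] + b[i + 1]
    return c


def tiled(a, tile=4):
    """Run both loops on one small tile at a time.

    The intermediate values for a tile live in a short local buffer,
    which is the part that would stay in cache on real hardware.
    """
    n = len(a)
    c = [0.0] * n
    for start in range(2, n - 2, tile):
        end = min(start + tile, n - 2)
        # loop 1 restricted to the tile plus one halo point on each side;
        # b_local[k] holds the value for index (start - 1 + k)
        b_local = [a[i - 1] + a[i] + a[i + 1]
                   for i in range(start - 1, end + 1)]
        # loop 2 on the tile, reading only the local buffer
        for i in range(start, end):
            k = i - start
            c[i] = b_local[k] + b_local[k + 1] + b_local[k + 2]
    return c
```

Both functions produce the same result; the tiled version trades a small amount of redundant computation at tile edges for never storing the full intermediate array.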

The other optimisation is the use of run-time optimisation for applications which execute for a long time. The backend implementations are parameterised, with parameters controlling aspects such as the number of threads in a thread block, or the size of a ``tile'' in the tiling optimisation. The optimal values for these parameters are not known a priori, and they can significantly affect performance. By dynamically varying the values, and timing the consequential changes in performance, we can implement heuristics to iteratively improve the parameter values during the execution.
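One possible heuristic of this kind is a simple hill climb over an ordered list of candidate parameter values, sketched below in Python. Everything here is an assumption for illustration: the names `tune` and `time_once`, the candidate list, and the stopping rule are hypothetical and do not describe the OPS implementation. The `measure` argument stands for whatever cost is observed at run time, for example the wall-clock time of one time step.

```python
import time


def time_once(run_step, value):
    """Hypothetical helper: time one call of run_step(value) in seconds."""
    t0 = time.perf_counter()
    run_step(value)
    return time.perf_counter() - t0


def tune(measure, candidates, steps):
    """Hill-climb over an ordered list of parameter values.

    measure(value) returns the observed cost (lower is better).
    The search starts at the first candidate and moves to a
    neighbouring value whenever that neighbour is cheaper.
    """
    idx = 0
    best_cost = measure(candidates[idx])
    for _ in range(steps):
        improved = False
        for nxt in (idx - 1, idx + 1):
            if 0 <= nxt < len(candidates):
                cost = measure(candidates[nxt])
                if cost < best_cost:
                    idx, best_cost, improved = nxt, cost, True
                    break
        if not improved:
            break  # no cheaper neighbour: keep the current value
    return candidates[idx]
```

In a long-running application, `measure` could be `lambda v: time_once(run_step, v)`, so that each trial is an ordinary time step of the simulation and no work is wasted on tuning alone. Real timings are noisy, so a practical version would likely average several measurements before moving.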

The new OPS framework will be assessed, both for performance and ease-of-use, by applying it to two important academic CFD codes: ROTOR, developed at Bristol by Prof. Chris Allen, and SBLI, developed at Southampton by Prof. Neil Sandham. As well as being important codes in their own right, these are also representative of the needs of other codes within CCP12 (Computational Engineering), the UK Turbulence Consortium, and the UK Applied Aerodynamics Consortium.

Planned Impact

The Dominic Tildesley report on ``A Strategic Vision for UK e-Infrastructure'' includes many examples of the importance of computational modelling in a wide range of industries as well as in government, and it emphasises the importance of software, including the quote:

``This continuing growth in price/performance and power-efficiency comes at a cost: the new systems will be based on complex multi-core and accelerator-based architectures which are much more challenging to program than are today's systems, requiring a revolution in software design and development.''

The importance and challenge of manycore computing is also reflected in its inclusion as one of EPSRC's five main priorities within the ICT area.

Historically, the UK has played a leading role in computational modelling, relative to its size, particularly in areas such as aeronautical CFD and weather prediction. It is very important that this position is maintained, and one of the key ways of achieving this is to ensure that the computational modellers can spend their time developing better models, not worrying about the details of novel computer architectures.

At the same time, we need to train a new generation of scientific computing experts who do understand thoroughly the details of novel computer architectures and how best to exploit them. If £100M is being spent annually in the UK on HPC hardware (a very conservative estimate), then the cost savings are very substantial if software improvements such as tiling and run-time optimisation can deliver a factor 2 increase in performance.

Our research will develop open-source software giving support specifically for multi-block structured-grid applications in engineering and science. More generally, it will contribute to the domain of manycore parallel computing, which is vital for the health of the country's capability in computational modelling, which in turn underlies so much of modern engineering and science.

Publications

Giles MB (2014) Trends in high-performance computing for engineering calculations. in Philosophical transactions. Series A, Mathematical, physical, and engineering sciences

Jammy S (2016) Block-structured compressible Navier-Stokes solution using the OPS high-level abstraction in International Journal of Computational Fluid Dynamics

Reguly I (2018) Loop Tiling in Large-Scale Stencil Codes at Run-Time with OPS in IEEE Transactions on Parallel and Distributed Systems

Reguly I (2019) Improving resilience of scientific software through a domain-specific approach in Journal of Parallel and Distributed Computing

 
Description AWE is already learning from us about the potential for future-proof software through code generation.
Exploitation Route AWE is funding related work to explore the potential for their applications.
Sectors Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Energy; Security and Diplomacy

URL http://www.oerc.ox.ac.uk/projects/ops
 
Description AWE is funding us to perform some additional related work to explore the potential for their applications.
First Year Of Impact 2013
Sector Security and Diplomacy
Impact Types Economic

 
Description AWE (2014)
Amount £24,984 (GBP)
Organisation Atomic Weapons Establishment 
Sector Private
Country United Kingdom
Start 06/2014 
End 11/2014
 
Description AWE (2015)
Amount £24,725 (GBP)
Organisation Atomic Weapons Establishment 
Sector Private
Country United Kingdom
Start 07/2015 
End 12/2015
 
Description Rolls-Royce (2014)
Amount £29,960 (GBP)
Organisation Rolls Royce Group Plc 
Sector Private
Country United Kingdom
Start 10/2014 
End 12/2014
 
Description Rolls-Royce (2015)
Amount £36,396 (GBP)
Organisation Rolls Royce Group Plc 
Sector Private
Country United Kingdom
Start 01/2015 
End 12/2015
 
Description CUDA Programming on NVIDIA GPUs 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact One-week course on CUDA programming on NVIDIA GPUs, available to both academics and non-academics.

Many of the students have since gone on to use CUDA programming in their research.
Year(s) Of Engagement Activity 2008,2009,2010,2011,2012
URL http://people.maths.ox.ac.uk/gilesm/cuda/