Multiscale Modelling of Magnetised Plasma Turbulence

Lead Research Organisation: University of Edinburgh
Department Name: Edinburgh Parallel Computing Centre


Publications

Jackson A (2015) Optimising Performance through Unbalanced Decompositions in IEEE Transactions on Parallel and Distributed Systems

Knight P (2012) CENTORI: A global toroidal electromagnetic two-fluid plasma turbulence code in Computer Physics Communications

 
Description The grant EP/H00212X/1, held at the University of Edinburgh, was part of a broader scientific project on the computational modelling of plasma turbulence in magnetised plasmas. The science was largely carried out under the linked grants EP/H002081/1 and EP/H002189/1. As detailed in the original proposal, the role of the project members from EPCC at the University of Edinburgh was to advise the project on the high performance computing (HPC) related aspects of the broader scientific project. This report covers the HPC optimisation work, in which the EPCC effort funded under grant EP/H00212X/1 was deeply involved.



EPCC software development effort was concentrated on the applications GS2 and CENTORI, which were used for the plasma turbulence simulations carried out under grant EP/H002081/1. GS2 and CENTORI are both parallelised with MPI and highly suitable for exploitation on a modern HPC platform such as the HECToR system. The major application exploiting the HECToR resource to deliver scientific outcomes under EP/H002081/1 was the GS2 code, which solves the gyrokinetic equations self-consistently with Maxwell's equations to model microturbulence in magnetised plasmas. CENTORI is a two-fluid electromagnetic turbulence code that was recently developed to study magnetically confined fusion plasmas on energy confinement time scales.



The most impressive HPC achievement delivered by this project is a performance enhancement exceeding a factor of four for a typical GS2 benchmark running on 4096 processors. This resulted from a number of optimisations which, working together, have dramatically improved the scalability of GS2 and made the code more suitable for the next generation of HPC platforms, including the UK's forthcoming HPC service ARCHER.





GS2 work



In this section we provide more detail on the work carried out on GS2 during the lifetime of the project grants. For further information, consult the references and web links at the end of this summary.



Early in the project, indirect addressing was identified as a bottleneck affecting GS2's performance on contemporary hardware, such as the HECToR system [1]. Indirect addressing is deployed in various parts of the source code, including the performance-critical data redistribution routines. The use of indirect addressing allows GS2 to be extremely flexible with its data layout: the plasma physicist is free to choose, from a number of data layouts, the one that is most suitable for their problem. This project secured additional funding from the HECToR dCSE support programme for the software development effort to tackle this issue.
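
To make the issue concrete, the short sketch below contrasts an index-table gather with a unit-stride copy. It is written in C purely for illustration (GS2 itself is Fortran), and both routine names are invented for this example rather than taken from the GS2 source.

    /* Illustrative sketch only: indirect versus direct addressing in a gather. */
    #include <stddef.h>

    /* Indirect addressing: out[i] = in[idx[i]].  The data-dependent, scattered
       loads defeat hardware prefetching and compiler vectorisation. */
    void gather_indirect(const double *in, const size_t *idx,
                         double *out, size_t n)
    {
        for (size_t i = 0; i < n; ++i)
            out[i] = in[idx[i]];
    }

    /* If the layout guarantees a contiguous source range, the same copy becomes
       a unit-stride, prefetch-friendly streaming loop. */
    void gather_contiguous(const double *in, size_t first,
                           double *out, size_t n)
    {
        for (size_t i = 0; i < n; ++i)
            out[i] = in[first + i];
    }

The flexibility in data layout that GS2 offers comes precisely from index tables of the first kind, which is why the data redistribution routines were a natural optimisation target.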



In collaboration with the developer funded by the HECToR dCSE grant, the project designed, implemented and tested revised data redistribution routines for the GS2 source. These revised routines significantly reduce the use of indirect addressing in the most performance-critical parts of the code. The final redistribution routines were committed to the GS2 source repository and were between 40% and 50% faster than the originals [2], roughly achieving the performance improvement that was anticipated during the initial exploratory work.
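
One common way to reduce such indirect addressing, sketched below in C under the assumption of a run-structured index map (an illustration, not GS2's actual Fortran implementation), is to detect contiguous runs in the index table and move each run with a single block copy instead of element-wise indexed copies.

    /* Illustrative sketch only: block copies over contiguous runs of an index map. */
    #include <stddef.h>
    #include <string.h>

    /* Performs out[i] = in[idx[i]] for i = 0..n-1, grouping consecutive,
       ascending index entries into runs that can be copied with memcpy. */
    void redistribute_blocked(const double *in, const size_t *idx,
                              double *out, size_t n)
    {
        size_t i = 0;
        while (i < n) {
            size_t run = 1;
            while (i + run < n && idx[i + run] == idx[i] + run)
                ++run;                      /* extend the contiguous run */
            memcpy(&out[i], &in[idx[i]], run * sizeof(double));
            i += run;
        }
    }

In a scheme like this the runs would typically be identified once at set-up and reused for every redistribution, so the scanning cost is paid only when the data layout is chosen.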



During the work on the data redistribution routines, further performance-limiting issues in data communications became apparent. The original code aimed for computational load balance between MPI processes, but for some process counts this requirement forced extremely large data transfers that were saturating the communication network. An option was added to GS2 that makes a modest sacrifice in computational load balance in exchange for substantially more efficient communications. On large process counts in particular, performance improvements as high as 15% were achieved [2, 3]. This optimisation approach may be applicable to many other HPC applications, in particular those using spectral methods to solve non-linear problems. A journal publication on this work is in preparation and an advanced draft is publicly available [3].
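
The toy calculation below illustrates the trade-off with made-up sizes; it is not GS2's decomposition logic. An element-balanced split can cut indivisible work blocks across process boundaries, while rounding each local share to whole blocks accepts a small load imbalance in exchange for keeping each block, and hence its data, on a single process.

    /* Illustrative sketch only: balanced versus block-aligned decompositions. */
    #include <stdio.h>

    int main(void)
    {
        const int nelem   = 4096;          /* assumed problem size            */
        const int blk     = 16;            /* assumed indivisible block size  */
        const int nproc   = 24;            /* assumed number of MPI processes */
        const int nblocks = nelem / blk;   /* 256 whole blocks                */

        for (int rank = 0; rank < nproc; ++rank) {
            /* element-balanced: equal element counts, but blocks get split */
            int lo_e = (int)((long)rank       * nelem / nproc);
            int hi_e = (int)((long)(rank + 1) * nelem / nproc);

            /* block-aligned: whole blocks only, element counts vary slightly */
            int lo_b = (int)((long)rank       * nblocks / nproc) * blk;
            int hi_b = (int)((long)(rank + 1) * nblocks / nproc) * blk;

            printf("rank %2d: balanced %4d elements, block-aligned %4d elements\n",
                   rank, hi_e - lo_e, hi_b - lo_b);
        }
        return 0;
    }

With these example numbers the block-aligned split stays within a few per cent of perfect balance, which is the kind of modest sacrifice the GS2 option trades for cheaper communication.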



In the most recent optimisation work carried out under this project, a variety of performance tools were used to target more advanced physics simulations that include GS2's advanced collision operator, developed by Barnes et al [4]. The optimisation work removed several bottlenecks that were severely hampering the scalability of GS2: (i) unnecessary computational work being carried out in expensive initialisation routines was reduced; (ii) sub-communicators were exploited to reduce the need for global communications in the velocity-space integrals associated with the collision operator; and (iii) the velocity-space integrals in all parts of the program were further improved by introducing in-place MPI communication calls that removed the need for buffer packing. These optimisations are particularly valuable at larger processor counts and have achieved a speed-up exceeding a factor of four for a typical GS2 benchmark problem on 4096 cores. Further details on this work are available in slides from a recent group meeting [5].
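
The fragment below sketches ideas (ii) and (iii) with plain MPI in C. It is an illustration only: GS2 is Fortran, its communicator layout is more involved, and the grouping of ranks used here is an invented stand-in for the real decomposition.

    /* Illustrative sketch only: sub-communicators plus an in-place reduction. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* (ii) split the world communicator so that only the ranks sharing a
           velocity-space integral reduce among themselves (the colour used
           here is a made-up grouping, not GS2's real layout). */
        const int ngroups = 4;
        MPI_Comm velocity_comm;
        MPI_Comm_split(MPI_COMM_WORLD, rank % ngroups, rank, &velocity_comm);

        /* local partial contribution to a velocity-space integral (dummy data) */
        double integral[8];
        for (int i = 0; i < 8; ++i)
            integral[i] = 0.001 * rank + i;

        /* (iii) in-place reduction: the result overwrites the local buffer on
           every rank of the sub-communicator, so no separate send buffer has
           to be packed or unpacked. */
        MPI_Allreduce(MPI_IN_PLACE, integral, 8, MPI_DOUBLE, MPI_SUM,
                      velocity_comm);

        if (rank == 0)
            printf("%d ranks total; integral[0] on rank 0 is now %g\n",
                   size, integral[0]);

        MPI_Comm_free(&velocity_comm);
        MPI_Finalize();
        return 0;
    }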





CENTORI work



As noted above, CENTORI was developed only recently. The software development effort provided by EPCC ensured an efficient implementation of its computational engine and the MPI parallelisation of the application [6].





Further information:



[1] Joachim Hein, Xu Guo, "Upgrading the FFTs in GS2", HECToR dCSE project final report, http://www.hector.ac.uk/cse/distributedcse/reports/GS2/GS2.pdf



[2] Adrian Jackson, "Improved Data Distribution Routines for Gyrokinetic Plasma Simulations", HECToR dCSE project final report, http://www.hector.ac.uk/cse/distributedcse/reports/GS202/GS202.pdf



[3] Adrian Jackson, Joachim Hein, Colin M. Roach, "Optimising Performance Through Unbalanced Decompositions", http://arxiv.org/abs/1205.2509



[4] M. Barnes, I.G. Abel, W. Dorland, D.R. Ernst, G.W. Hammett, P. Ricci, B.N. Rogers, A.A. Schekochihin, and T. Tatsuno, Physics of Plasmas 16, 072107 (2009)



[5] D. Dickinson, C.M. Roach, A. Jackson, J. Hein, "Report on recent upgrades to GS2", http://gyrokinetics.sourceforge.net/wikifiles/CMR/GS2DEVMEETING_21May2013/DDICKINSON.pdf



[6] P.J. Knight, A. Thyagaraja, T.D. Edwards, J. Hein, M. Romanelli, K.G. McClements, "CENTORI: A global toroidal electromagnetic two-fluid plasma turbulence code", Computer Physics Communications 183 (2012) 2346-2363
Exploitation Route The goal of simulating plasma turbulence is to suggest routes to optimise the performance of future plasma confinement devices. Understanding and limiting turbulence is crucial for obtaining long confinement times. It is hoped that in the future fusion plasmas can be exploited for energy production in a fusion power plant. The improved computational efficiency of GS2 will result in a reduced consumption of HPC resources. For a given resource level this allows plasma physicists studying microturbulence to use larger and more realistic models in their simulations of magnetically confined fusion plasmas.
Sectors Digital/Communication/Information Technologies (including Software), Energy, Environment

 
Description This project formed part of a bigger project consisting of three linked project proposals, with the aim of investigating turbulence in magnetised fusion plasmas. The focus of the project described here was on the high performance computing (HPC) related aspects of the overall project. As discussed in the key findings, the project was instrumental in gaining a better understanding of the performance obstacles of the key application GS2 on contemporary HPC architectures when deploying thousands of compute cores. This improved understanding of GS2 has been utilised to improve the performance of the data redistribution routines. A modified data decomposition, allowing for a slight increase in load imbalance, substantially reduced the communication costs of the application and improved the overall performance. This approach may be applicable to many other HPC applications that deploy spectral methods. The decomposition techniques, which could have an impact in computational science beyond plasma physics, are described in detail in a recent research publication. The project also contributed to a significant performance boost, of a factor larger than four, for a test simulation utilising several thousand compute cores and including GS2's advanced collision operator. The improvements have been committed to the GS2 source and are currently of benefit to all users of the GS2 code family on HPC platforms in the UK and elsewhere. The user community of the GS2 code family includes scientists interested in magnetically confined fusion plasmas as well as astrophysical plasmas. The improved, faster application allows for a more efficient use of HPC resources and a more realistic simulation of microturbulence in magnetised plasmas. The GS2 code family is open source and free to download; the code improvements are therefore of benefit beyond the original project groups. The GS2 code, improved under this project, was deployed in one of the linked projects to perform first-principles calculations of plasma turbulence. These calculations contributed significantly to the understanding of the basic mechanisms that may be responsible for the formation of localised regions of excellent confinement in a fusion plasma. Predictions from these calculations are currently being tested experimentally. An improved understanding of these localised regions may be exploited to improve the performance of magnetic confinement devices and increase the chances of exploiting fusion plasmas for energy production.
 
Description HECToR Distributed CSE Support
Amount £57,997 (GBP)
Organisation University of Edinburgh 
Department High-End Computing Terascale Resource (HECToR)
Sector Academic/University
Country United Kingdom
Start 03/2011 
End 01/2012