Parallel Scalability of Elliptic Solvers in Weather and Climate Prediction

Lead Research Organisation: University of Bath
Department Name: Mathematical Sciences

Abstract

The UK Met Office is one of the world leaders in weather and climate prediction, and the Met Office's global forecast model is used by many other centres worldwide to drive their individual local area models. However, many short-scale phenomena as well as important long-term dynamics are still difficult to predict accurately due to the limited spatial resolution of global models and the additional errors introduced by local area models. Novel computing architectures with more than 10^5 cores provide a chance to push these boundaries and to keep the UK Met Office at the forefront of developments.

Decades of experience with numerical weather and climate prediction have produced a good understanding of the core dynamics inherent in atmospheric flow and of their stable and accurate numerical approximations. As outlined in the call, the Met Office's Unified Model uses latitude-longitude grids and achieves high efficiency on parallel computers with up to 1000 cores. However, (artificial) grid clustering at the poles renders these grids impractical for large-scale computations, and so one of the core tasks in this NERC Programme is the search for suitable alternative grids. Several separate proposals address this issue. However, the equations governing atmospheric flow form a time-dependent system of differential equations which strongly couple the solution everywhere on the globe (the famous "butterfly effect"). Most current atmospheric dynamics models use semi-implicit time discretisation schemes which provide some global coupling of the equations at each time step. This prevents the system from becoming unstable and as a consequence it allows for larger time steps than fully explicit schemes, which include no global coupling. Since the cost of the forecast is proportional to the number of time steps, a scheme that allows for larger time steps (with satisfactory accuracy) seems preferable. But these benefits come at a price, especially in the context of large-scale problems and on massively parallel architectures. An elliptic system for the pressure has to be solved in each time step, leading to a very large, ill-conditioned algebraic system, the solution of which is difficult to parallelise efficiently. There are two main factors that make the scaling of this elliptic solve to large problem sizes and to large processor numbers difficult: algorithmic scalability and parallel scalability.
Since the solution operator for the elliptic equation couples the pressures globally, only multilevel iterative solvers which use a hierarchy of discretisations on grids of varying resolution allow optimal, linear growth in cost (algorithmic scalability). But in a massively parallel computing environment, where global communication is costly, it is necessary to implement these solvers well, keeping most of the communication local, to ensure that the computational cost continues to scale optimally to 100K or more processors (parallel scalability).
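The notion of algorithmic scalability described above can be illustrated with a minimal sketch. It is not the project's solver: it assumes a 1D Poisson model problem with zero Dirichlet boundaries, and all function names (`apply_A`, `smooth`, `v_cycle`, `solve`) are hypothetical. It applies a matrix-free geometric multigrid V-cycle with weighted Jacobi smoothing, where each smoothing step needs only nearest-neighbour data, and records how many V-cycles are needed to reduce the residual by a fixed factor on three grid sizes.

```python
import math

def apply_A(u, h):
    """Matrix-free 1D operator -u'' with zero Dirichlet boundaries."""
    n = len(u)
    out = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out[i] = (2.0 * u[i] - left - right) / (h * h)
    return out

def residual(u, f, h):
    return [fi - ai for fi, ai in zip(f, apply_A(u, h))]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi: each update uses nearest-neighbour values only."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [ui + omega * (h * h / 2.0) * ri for ui, ri in zip(u, r)]
    return u

def restrict(r):
    """Full-weighting restriction to the next coarser grid."""
    nc = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range(nc)]

def prolong(ec, n):
    """Linear interpolation of a coarse-grid correction to n fine points."""
    e = [0.0] * n
    for j, v in enumerate(ec):
        e[2 * j] += 0.5 * v
        e[2 * j + 1] += v
        e[2 * j + 2] += 0.5 * v
    return e

def v_cycle(u, f, h):
    n = len(u)
    if n == 1:
        return [f[0] * h * h / 2.0]  # exact solve on the coarsest grid
    u = smooth(u, f, h)                               # pre-smoothing
    rc = restrict(residual(u, f, h))                  # coarse-grid residual
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)        # recursive correction
    u = [ui + ei for ui, ei in zip(u, prolong(ec, n))]
    return smooth(u, f, h)                            # post-smoothing

def solve(n, tol=1e-8, max_cycles=50):
    """Return (number of V-cycles, solution) for -u'' = 1 with n = 2^k - 1 points."""
    h = 1.0 / (n + 1)
    f = [1.0] * n
    u = [0.0] * n
    r0 = norm(f)
    for k in range(1, max_cycles + 1):
        u = v_cycle(u, f, h)
        if norm(residual(u, f, h)) <= tol * r0:
            return k, u
    return max_cycles, u

# V-cycles needed on successively finer grids
cycle_counts = {n: solve(n)[0] for n in (63, 255, 1023)}
print(cycle_counts)
```

If the number of V-cycles stays essentially constant as the grid is refined, the total work grows linearly with the number of unknowns, since each V-cycle costs O(n) operations. The parallel scalability question raised above is separate: on the coarse levels there is little work per processor, so communication costs become relatively more important.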

This proposal addresses this problem and will thereby facilitate the best possible decisions on the design of the Met Office's future dynamical core, thus guaranteeing the UK's competitiveness in this key societal/technological challenge. Optimal scalability of semi-implicit schemes has not yet been achieved for atmospheric flow, but the success of the Project Partners, IWR Heidelberg and Lawrence Livermore National Lab, on simpler model elliptic problems shows that it is possible. The PI's experience over the years in obtaining optimal scalability of elliptic solvers on the most current architectures in various application areas, most notably for elliptic problems from atmospheric flow discretised on latitude-longitude grids up to 256 cores, as well as his status as one of the world's leading theoretical analysts of multilevel iterative elliptic solvers and his links to other world-leading groups in this field, mean that he is ideally equipped to achieve this goal.

Planned Impact

The proposed research will have enormous impact on all those fields where semi-implicit schemes are needed to treat the diffusive parts of the system and where solving elliptic systems is the most costly computational component of the analysis.

Who? The focus of the research is on applying existing and on developing new highly efficient parallel techniques for solving elliptic systems that arise when using a semi-implicit approach to global modelling of atmospheric flow, ensuring optimal scalability on modern multicore architectures. A number of very important areas would benefit directly from the outcomes:
1. The meteorological and climatological forecasting community, in particular the UK Met Office and other users of the UM and related tools.
2. The oceanographic community who deal with similar models that require scalable elliptic solvers.
3. The water resources management sector, where weather and climate prediction has always played a major role.
4. The military, where accurate weather predictions are crucial to the safety of the troops and to strategic decisions.
5. Other environmental sectors, in particular the subsurface waste management or the carbon capture and storage sectors.
Large elliptic systems are at the heart of many environmental models. They usually constitute the bottleneck in large scale computations, particularly on massively parallel computers, and thus all the areas above would benefit directly from the proposed research.

How? In computational approaches to model physical systems, semi-implicit treatment of diffusive processes is used to ensure the stability of the numerical scheme. This approach leads to large elliptic systems that often constitute the most expensive part of the simulation, and classical solvers have been observed not to scale well beyond thousands of processors on massively parallel architectures. Future simulation and forecasting tools, such as the Met Office's dynamical core, will have to address this issue in a more satisfactory manner. Similar comments could be made regarding many of the other areas mentioned above. Thus, when our methods are applicable we expect them to reduce computation times significantly and bring many important problems within reach of analysis. The benefit will be in removing the computational bottleneck of parallel elliptic solvers which are at the heart of many parallel forecasting and simulation tools in engineering and environmental applications.

What? The primary beneficiary of this research will be the weather and climate prediction sector. Transfer of the benefits of the research will be achieved by working closely with the Met Office and with other similar institutions such as the ECMWF in Reading. Nils Wedi from the ECMWF will act as one of the advisers of the project. The University of Bath and the PI in particular have an excellent and effective established relationship with the Met Office. The entire project will be carried out in close collaboration with the other project teams on this programme and with the Met Office. The PDRA and the PI will visit the Met Office regularly, thus keeping them constantly informed about progress. Regular progress reports will also be posted on the programme Wiki and given at regular programme meetings. In Phase 2 of the programme the methods developed and tested in this project will be considered for inclusion in the next generation dynamical core at the Met Office and will thus also become available to the weather and climate forecasting community as a whole through the many links and clients the Met Office has worldwide. In addition, we will disseminate our results at appropriate international meetings.

Measures of success. The key measure will be adoption of semi-implicit schemes in the new dynamical core and their implementation in Phase 2 of the programme. But through our links with the project partners we hope to also see a wider use of the methods which we develop.

Publications

 
Description As part of the NERC programme on a Next Generation Dynamical Core for the Met Office, GungHo, we carried out a comprehensive study of the parallel scalability of elliptic solvers, which are central to semi-implicit time stepping methods for atmospheric flow. We found that both Krylov iterative methods and multigrid methods scale almost optimally up to the largest numbers of processors currently available. In terms of absolute performance, a matrix-free, geometric multigrid solver, tailored to the anisotropic nature of atmospheric flow, performed best both in terms of robustness and efficiency. In this first phase of the project we mainly focussed on simplified model problems which were sufficiently complex to genuinely establish whether elliptic solvers were massively scalable, but at the same time simple enough to test a wide and representative range of solvers in the two years of the project. Towards the end of Phase 1 we added more and more of the physical features of the eventual dynamical core.
Exploitation Route We have published all our results in international journals and presented them in several relevant fora. Through some key events, such as the Newton Institute Programme in 2013, the GungHo programme is known world-wide in all major meteorological centres and people are following our progress. As part of this, our results on the massive scalability of elliptic solvers - doubted prior to this project by many in the area - have led to considerable interest from other groups as well. Triggered by this, other teams are now also pursuing similar approaches. More directly, all people involved in the GungHo project are also taking it forward in their own projects, and users of the final software that we are developing will be able to benefit directly from it.
Sectors Environment

 
Description Our findings in Phase 1 of this project have centrally informed decisions about the next generation dynamical core at the Met Office. The solvers we tested, and in particular the geometric multigrid solver that we designed and that performed best, will now form the solver in the code developed in Phase 2 and beyond. The multigrid solver has also been implemented and tested within the current Met Office Dynamical Core, ENDGame, and will potentially also be used operationally in the very near future. A secondment of Eike Mueller is planned to ensure this impact can be achieved. The work has also had a huge impact on developers of other atmospheric flow solvers and even ocean modellers.
Sector Environment
Impact Types Policy & public services

 
Description Programme Grant
Amount £229,800 (GBP)
Funding ID NE/K006754/1 
Organisation Natural Environment Research Council 
Sector Public
Country United Kingdom
Start 10/2013 
End 11/2015
 
Title DUNE implementation of tensor-product multigrid solver 
Description Implementation of a geometric tensor-product multigrid solver in the DUNE C++ library for grid based applications. The code generalises other implementations of the algorithm since it supports more general grids and more realistic pressure equations encountered in atmospheric modelling.
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact The code was used to obtain results in the following paper: Dedner, A., Müller, E. and Scheichl, R., 2016. Efficient multigrid preconditioners for atmospheric flow simulations at high aspect ratio. International Journal for Numerical Methods in Fluids, 80(1), pp.76-102 
URL https://bitbucket.org/em459/tensorproductmultigrid
 
Title Multi-GPU implementation of tensor-product multigrid algorithm 
Description This multi-GPU implementation of a tensor-product multigrid solver was used to solve a simplified pressure correction equation and test the performance of the solver on multi-GPU clusters. 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact Since the Met Office is considering using chip architectures similar to GPUs for their next generation forecast model, the results obtained with this code will have an impact on the ultimate choice of solver algorithm. The code was used to produce results for the following two publications: Müller, E., Guo, X., Scheichl, R. and Shi, S., 2013. "Matrix-free GPU implementation of a preconditioned conjugate gradient solver for anisotropic elliptic PDEs". Computing and Visualization in Science, 16(2), pp.41-58. Müller, E.H., Scheichl, R. and Vainikko, E., 2015. "Petascale solvers for anisotropic PDEs in atmospheric modelling on GPU clusters". Parallel Computing, 50, pp.53-69. 
URL https://bitbucket.org/em459/ellipticsolvergpu
 
Title Multigrid solver in ENDGame 
Description Implementation of a tensor-product multigrid solver in the Met Office ENDGame code base. The multigrid algorithm is used to solve the pressure equation in implicit time stepping methods; this step is one of the computational bottlenecks in the forecast model. The code was tested in operational configurations of the Unified Model.
Type Of Technology Software 
Year Produced 2013 
Impact It is planned to use this solver in operational runs of the Met Office Unified Model in the future. As already demonstrated in several test runs, this can reduce the time spent in the implicit solver by a factor of at least two. Ultimately this has the potential of increasing the Met Office's forecast capabilities since it will allow the production of more accurate forecasts in a shorter time. 
 
Title Tensor product multigrid for atmospheric equations 
Description Fortran 90 implementation of a matrix-free tensor-product multigrid solver for the pressure correction equation on structured grids. For first tests of the feasibility of the multigrid method a simplified model equation was solved. This equation reproduces key characteristics of the full equation encountered in atmospheric models. The code also has the option to run a standard, single-level method which is similar to what is currently used by the Met Office, thus allowing the comparison of the two methods. 
Type Of Technology Software 
Year Produced 2012 
Open Source License? Yes  
Impact This code was used to compare matrix-free geometric implementations to matrix-based AMG solvers. Those tests were used to inform the choice of solver algorithm in the project, and this will have an impact on the choice of solver for the future LFRic model, which is currently being implemented at the Met Office. Similar solvers were implemented in the Met Office ENDGame dynamical core and it is planned for those new multigrid solvers to be used in operational applications in the future. The software was also crucial to obtain results in the following paper: Müller, E.H. and Scheichl, R., 2014. Massively parallel solvers for elliptic partial differential equations in numerical weather and climate prediction. Quarterly Journal of the Royal Meteorological Society, 140(685), pp.2608-2624.
URL https://bitbucket.org/em459/tensorproductmultigrid
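
The software entries above describe matrix-free solvers for strongly anisotropic pressure equations on structured grids. As a hedged illustration only, the following sketch assumes (this record does not state it) that the strong vertical coupling is handled by vertical line relaxation, i.e. one tridiagonal solve per vertical column. The model operator, the anisotropy parameter `beta` and all function names are hypothetical and are not taken from the listed codes.

```python
import math

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system in O(n) operations (Thomas algorithm).

    lower[k] multiplies x[k-1] (lower[0] is unused);
    upper[k] multiplies x[k+1] (upper[-1] is unused).
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for k in range(1, n):
        denom = diag[k] - lower[k] * c[k - 1]
        c[k] = upper[k] / denom
        d[k] = (rhs[k] - lower[k] * d[k - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for k in range(n - 2, -1, -1):
        x[k] = d[k] - c[k] * x[k + 1]
    return x

def apply_A(u, beta):
    """Matrix-free model operator: horizontal coupling of strength 1,
    vertical coupling of strength beta, zero Dirichlet boundaries.
    u[i][k] is the value in column i at vertical level k."""
    nx, nz = len(u), len(u[0])

    def at(i, k):
        return u[i][k] if 0 <= i < nx and 0 <= k < nz else 0.0

    return [[(2.0 + 2.0 * beta) * u[i][k]
             - at(i - 1, k) - at(i + 1, k)
             - beta * (at(i, k - 1) + at(i, k + 1))
             for k in range(nz)] for i in range(nx)]

def line_relax(u, f, beta):
    """One block-Jacobi sweep: solve each vertical column exactly,
    using the previous iterate in the neighbouring columns."""
    nx, nz = len(u), len(u[0])
    lower = [-beta] * nz
    diag = [2.0 + 2.0 * beta] * nz
    upper = [-beta] * nz
    new_u = []
    for i in range(nx):
        rhs = list(f[i])
        for k in range(nz):
            if i > 0:
                rhs[k] += u[i - 1][k]
            if i < nx - 1:
                rhs[k] += u[i + 1][k]
        new_u.append(thomas(lower, diag, upper, rhs))
    return new_u

def residual_norm(u, f, beta):
    Au = apply_A(u, beta)
    return math.sqrt(sum((f[i][k] - Au[i][k]) ** 2
                         for i in range(len(u)) for k in range(len(u[0]))))

# Strongly anisotropic test case: vertical coupling 10^4 times the horizontal one
nx, nz, beta = 16, 8, 1.0e4
f = [[1.0] * nz for _ in range(nx)]
u = [[0.0] * nz for _ in range(nx)]
r0 = residual_norm(u, f, beta)
for _ in range(3):
    u = line_relax(u, f, beta)
reduction = residual_norm(u, f, beta) / r0
print(reduction)
```

In this model setting the column solves remove almost all of the error when the vertical coupling dominates, and each column solve uses only data from the neighbouring columns. The operator is applied without ever assembling a matrix, which is what "matrix-free" refers to in the descriptions above.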