libhpc Stage II: A Long-term Solution for the Usability, Maintainability and Sustainability of HPC Software

Lead Research Organisation: Imperial College London
Department Name: Computing

Abstract

Libhpc aims to provide a framework for building, running and maintaining future-proof, sustainable HPC applications. Lack of reusability and portability has been a major barrier to the adoption of HPC code, leading to the loss of what is often the most creative output of a research project and the output with the greatest potential to benefit the wider research community and the general public. Libhpc gives developers of HPC code a means of capturing and sharing their creativity with the whole HPC community. By providing a means of expressing application structure abstractly, libhpc enables HPC developers to share equivalent implementations for differing architectures via libhpc repositories. The same abstract structure also produces highly re-usable applications, which can easily be adapted at, or even during, runtime to execute on widely different platforms, architectures and resource groups. We believe that libhpc will give the HPC community a "best-practice" reference for navigating the transition to virtual infrastructures, which is now one of the principal requirements of both researchers and infrastructure providers, and could play a critical role in the development of an integrated e-infrastructure policy supporting both HPC and data science. In libhpc 2, we propose to create a production-ready environment from the prototype framework developed in libhpc. The libhpc 2 environment will include package-based installs, admin interfaces, a community component repository and DSL support; it will also support internet-scale networking. Libhpc 2 will also place a strong emphasis on stakeholder engagement, community building and skills development through responsive documentation, workshops and hack-days. We will aim to introduce libhpc 2 to two target domains, fluid dynamics and bioinformatics, through "hands-on" hack-days in which we will componentise domain software and publish it to libhpc repositories. We will also hold a workshop for infrastructure providers to ensure that infrastructure requirements are met and to encourage uptake and deployment on national infrastructures.

Planned Impact

Libhpc 2 will have an impact on the following groups:
Developers
* Spectral/hp Element group, Imperial College London - Led by Professor Spencer Sherwin, the group have developed the Nektar++ spectral/hp element code in collaboration with Dr Mike Kirby at the University of Utah and are partners in the libhpc 2 project. They support a base of both academic and non-academic users and, from our joint work in the first phase of the libhpc project, recognise the significant potential that libhpc offers to their code and its users.

End-users
* National Heart and Lung Institute - The NHLI collaborate closely with Professor Sherwin's group to apply the Nektar++ methods to model vitally important areas of heart physiology, including aortic blood flow and cardiac electrophysiology.
* McLaren Racing develop world-class engineering solutions in a competitive racing environment. Through Professor Sherwin, they are looking for straightforward optimisation of large-scale computations across multiple hardware resources, a capability demonstrated through the libhpc/Nektar++ work undertaken so far.

A further group of identified end-user beneficiaries are from the Bioinformatics domain:
* Dr Sarah Butcher and the Imperial College London Bioinformatics Support Service provide support and computational resources to a wide group of bioinformatics researchers, students and research projects. The ability to provide tools to simplify the specification of complex pipelines while making more effective use of their resources is of great interest.
* Dr David Aanensen at Imperial College London develops pipelines to support use cases in infectious disease epidemiology and can gain from the ability to access faster execution and heterogeneous resources.
* Life Technologies provide Next Generation Sequencing technologies for which cost is decreasing and data volumes are increasing. In addition, the market is expanding from one where the end users are specialists with computing knowledge to include a much wider group of end users who may not have the skills to manage and operate large-scale HPC resources. Libhpc presents an ideal solution to these challenges with the framework's support for simplified job specification and the ability to target remote, heterogeneous resources.

e-Infrastructure Operators
* As well as representing a pathway to a large class of users, Dr Butcher, as manager of the Imperial College London Bioinformatics Support Service, is responsible for maintaining an ecosystem of methods and machines that provides the key infrastructure for the large community of Imperial College life sciences researchers.
* The Edinburgh Parallel Computing Centre is a leading European centre of expertise in advanced research, technology transfer and the provision of supercomputer services to academia and business.
* EPSRC and others have charged the Software Sustainability Institute with ensuring that software underpinning the UK's scientific research is developed in the most productive and sustainable manner. The SSI could clearly play a leading role in the development and maintenance of the UK research software repository or method cloud.
 
Description The libhpc framework model has been designed to support a range of individuals involved in running computationally intensive scientific software. It aims to help scientists, researchers and domain developers to specify the computing tasks that they want to achieve more easily and efficiently, and to make use of a variety of different types of computing platform to undertake these tasks. It further aims to support computer scientists and operators of computing infrastructure in making platforms more easily accessible to the user community. With the emergence of new models of computing, such as Infrastructure-as-a-Service clouds and remotely accessible cluster infrastructure such as local and national HPC services, users have more opportunity than ever to access High Performance Computing (HPC) capabilities, but doing so is often challenging and time-consuming and requires extensive communication with a range of entities. Libhpc Stage II has made aspects of the framework available as a series of open source services that address these challenges.

Templates and Profiles for Scientific Software (TemPSS) (https://github.com/london-escience/tempss) is a service that allows application developers to provide their users with templates - visual, interactive trees that represent the decision space of all the different configuration parameters for a complex scientific code. Templates are displayed via a web-based interface and end users can set up configurations for their computing tasks by populating a template tree with their desired values. The web interface provides in-line documentation and validation of the parameters a user provides. Partially or fully populated templates can be saved as profiles that can be shared between users and developed collaboratively by a range of experts in different aspects of the computational task(s) to be undertaken.
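As a purely illustrative sketch of the concept (the parameter names below are invented and this is not TemPSS's actual template format), a template can be pictured as a tree of named parameters with empty slots, and a profile as the same tree with some or all of the slots filled in:

    # Hypothetical example only: invented parameter names, simplified nesting.
    template = {
        "Solver": {
            "TimeIntegration": {"Scheme": None, "TimeStep": None},
            "Discretisation": {"PolynomialOrder": None},
        },
        "Output": {"Fields": None, "CheckpointFrequency": None},
    }

    # A profile populates some or all of the template's parameters; partially
    # populated profiles can be saved, shared and completed by other users.
    profile = {
        "Solver": {
            "TimeIntegration": {"Scheme": "IMEXOrder2", "TimeStep": 0.001},
            "Discretisation": {"PolynomialOrder": 7},
        },
        "Output": {"Fields": "u,v,p", "CheckpointFrequency": 100},
    }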

libhpc-cf - the libhpc coordination forms library (https://github.com/london-escience/libhpc-cf) - is a Python software library for using functional-style operators (coordination forms) and software components to define computational tasks. Using libhpc-cf provides greater flexibility than traditional workflow environments with scope to specify more complex orchestration processes. Significantly, the component and coordination form structures used by the library can also have multiple implementations suiting different underlying computing platforms or different usage scenarios.
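The fragment below is a minimal, self-contained sketch of the coordination-form idea using invented helper names; the operators and structures provided by libhpc-cf itself differ, so this should be read as an illustration of the style rather than the library's API.

    # Illustrative only: "pipe" and "par" are invented coordination forms, not
    # part of the libhpc-cf API. Components here are plain Python callables.
    from concurrent.futures import ThreadPoolExecutor

    def pipe(*components):
        """Sequential form: each component's output feeds the next component."""
        def run(data):
            for component in components:
                data = component(data)
            return data
        return run

    def par(*components):
        """Parallel form: independent components process the same input concurrently."""
        def run(data):
            with ThreadPoolExecutor() as pool:
                futures = [pool.submit(component, data) for component in components]
                return [f.result() for f in futures]
        return run

    def normalise(xs):
        return [x / max(xs) for x in xs]

    workflow = pipe(normalise, par(sum, max))
    print(workflow([3, 6, 9]))  # -> [2.0, 1.0]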

The libhpc deployer (https://github.com/london-escience/libhpc-deployer) is a Python software library that provides a straightforward, metadata-focused interface for running jobs using batch-style scientific High Performance Computing applications on different types of computing platforms. At present, PBS-based clusters, OpenStack private cloud computing platforms and the Amazon EC2 public cloud are supported as target platforms with support for further platforms planned in the future.
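The following sketch is not the deployer's real interface and its names are invented; it simply illustrates the metadata-focused style described above, in which the job description stays platform-neutral and the target platform is a separate, swappable piece of metadata:

    # Hypothetical sketch only; structure and names are invented, not the actual
    # libhpc-deployer API. The job description is platform neutral and the choice
    # of target platform is held in separate metadata.
    job = {
        "application": "incompressible-flow-solver",  # invented application id
        "inputs": ["channel_flow.xml"],
        "resources": {"nodes": 2, "cores_per_node": 16, "walltime": "02:00:00"},
    }

    platforms = ["pbs_cluster", "openstack_private_cloud", "amazon_ec2"]

    def submit(job, platform):
        # In a real deployment this is where platform-specific provisioning,
        # staging and job submission would take place.
        print(f"submitting {job['application']} to {platform}")

    for platform in platforms:
        submit(job, platform)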

A series of end-to-end, domain-specific, web-based applications have also been developed with the project partners who provided use cases. In several instances, the core software processes in the above tools have been built as part of, or alongside, these use-case applications. Nekkloud is a web-based environment for running jobs via the Nektar++ spectral/hp element framework. It offers a straightforward user interface for specifying and running Nektar++ jobs on cluster and public or private cloud platforms. MD-PYpe provides a similar tool for the GROMACS molecular dynamics software, allowing users to build pipelines of GROMACS processes through a visual interface. MA-PYpe allows a Sanger Institute microbial sequence assembly pipeline to be run, offering end users in a variety of different environments a much more straightforward means of accessing this pipeline than is currently possible. MA-PYpe supports running sequencing tasks on cloud resources or local HPC clusters, and a further tool, BioPYpe, supports a simplified approach to designing and running bioinformatics pipelines.
Exploitation Route The outputs of libhpc are applicable to a wide range of potential users. The Nekkloud and TemPSS tools are already being taken forward through an EPSRC Impact Acceleration Account award to make them available to users of the Nektar++ software. The other web-based tools target specific applications and user groups, and we see particular potential for the MA-PYpe tool, which makes a Sanger Institute genome assembly pipeline available to a much wider range of potential users. Some of the Nektar++ use cases demonstrated through Nekkloud also point to other sectors where the tools could be applied to provide a useful system for running complex scientific processes on different types of computing infrastructure.

This work has been taken forward through a project, "Simplifying High Performance Computing Access for the Nektar++ Framework", funded under Imperial College London's EPSRC Impact Acceleration Account. Aspects of the libhpc II work relating to the handling of constraints between configuration parameters in HPC codes are also being investigated further in a short project funded under the Platform for Research in Simulation Methods EPSRC platform grant at Imperial College London.
Sectors Aerospace, Defence and Marine; Agriculture, Food and Drink; Chemicals; Construction; Electronics; Energy; Environment; Financial Services, and Management Consultancy; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology; Transport

URL http://www.imperial.ac.uk/london-e-science/projects/libhpc/
 
Description EPSRC Impact Acceleration Account - Imperial College London
Amount £78,409 (GBP)
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2016 
End 12/2016
 
Description EPSRC RSE Fellowship
Amount £639,259 (GBP)
Funding ID EP/R025460/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 02/2018 
End 01/2023
 
Title Libhpc Deployer Library 
Description The libhpc deployer is a Python software library for running batch-style HPC applications on different underlying computing platforms. At present the library supports running jobs on PBS-based HPC clusters, OpenStack private clouds and the Amazon EC2 public cloud service. The library uses YAML metadata for describing target platforms, applications and jobs. The library has a plugin-style design that allows support for additional target platforms to be added. 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact The declarative style of the YAML metadata used to configure the library provides a simplified means for developers and end-user scientists and researchers to specify their jobs and switch much more easily between cluster and cloud platforms. For example, where users wish to undertake some jobs on a local cluster platform and take advantage of cloud infrastructure for other aspects of their workload, the libhpc deployer provides support for straightforwardly selecting a different target platform on a per-job basis. 
URL https://github.com/london-escience/libhpc-deployer
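As an illustration of the plugin-style, YAML-driven design described in the entry above (the field names and class names below are invented, not those used by the library), platform metadata can carry a type identifier that a registry maps to the code responsible for deploying on that kind of platform:

    # Hypothetical sketch only; the real libhpc-deployer metadata schema and
    # plugin classes differ. Requires PyYAML for yaml.safe_load.
    import yaml

    PLATFORM_PLUGINS = {}

    def plugin(platform_type):
        """Register a deployer class for a given platform type."""
        def register(cls):
            PLATFORM_PLUGINS[platform_type] = cls
            return cls
        return register

    @plugin("pbs")
    class PbsDeployer:
        def deploy(self, job):
            print("qsub-style submission of", job["name"])

    @plugin("openstack")
    class OpenStackDeployer:
        def deploy(self, job):
            print("boot instances, then run", job["name"])

    platform = yaml.safe_load("""
    name: departmental-cluster
    type: pbs
    queue: standard
    """)

    PLATFORM_PLUGINS[platform["type"]]().deploy({"name": "example-job"})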
 
Title TemPSS - Templates and Profiles for Scientific Software 
Description TemPSS (Templates and Profiles for Scientific Software) is a software library for representing the complex decision space of configuration parameters that are present in many High Performance Computing applications. It helps with the task of setting up job configurations for running jobs using an HPC application. The library provides a means of presenting a visual, interactive tree setting out these parameters in a manner that makes it easier for end users, method developers, computer scientists and computing infrastructure operators to collaborate on efficiently configuring HPC codes to run on different computing platforms. The TemPSS software includes a reference web-based interface demonstrating the use of the system but the tool is also designed to be integrated into other HPC tools or applications. 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact TemPSS makes use of the concepts of software parameter templates and application profiles. Templates represent the parameter decision space of an application while profiles provide instantiations of a template with parameters specific to a particular job or task to be accomplished with the target application. Significantly, TemPSS allows the creation of partial profiles - incomplete sets of parameters - and supports collaboration between different entities who can each partially instantiate a profile with the parameters that they understand how to specify optimally. An end user can then be presented with an almost complete profile that they can finalise to run the target application. In addition to being available as a generic open source tool (https://github.com/london-escience/tempss) that can be configured to support a wide range of scientific software, a series of TemPSS templates have been built to support different solvers in the Nektar++ spectral/hp element framework. This work has been undertaken as part of the libhpc Stage II project in collaboration with the Nektar++ team, who are partners in the project. A deployment of the tool configured with these Nektar++ templates has been released to the Nektar++ user community and is available on the Nektar++ website (http://www.nektar.info/tempss).
URL https://github.com/london-escience/tempss
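One hypothetical way to picture how partial profiles from different contributors combine into a complete configuration (this is an illustration of the concept, not the mechanism TemPSS itself implements) is a recursive overlay followed by a completeness check:

    # Illustrative only: invented parameter names and a simplified dict-based model.
    def merge(base, overlay):
        """Recursively overlay one partial profile on another."""
        merged = dict(base)
        for key, value in overlay.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge(merged[key], value)
            else:
                merged[key] = value
        return merged

    def missing(profile, prefix=""):
        """List parameters that are still unset (None) in a profile."""
        gaps = []
        for key, value in profile.items():
            path = prefix + key
            if isinstance(value, dict):
                gaps += missing(value, path + "/")
            elif value is None:
                gaps.append(path)
        return gaps

    # A numerics expert and a platform operator each fill in the parts they know.
    from_numerics_expert = {"TimeStep": 0.001, "PolynomialOrder": 7, "Queue": None}
    from_platform_operator = {"Queue": "standard"}

    final = merge(from_numerics_expert, from_platform_operator)
    print(missing(final))  # -> [] : the profile is now complete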
 
Title libhpc-cf: Libhpc Coordination Forms Library 
Description The libhpc-cf coordination forms library is a Python software library that allows creation of software component metadata and offers a set of "coordination forms" - functional style operators for coordinating the flow of control and data between the software components represented by component metadata. In practice, this means that developers can take existing software tools or libraries and wrap them as software components using the features provided by the libhpc-cf library. These components can then be linked to form more advanced processes by applying coordination forms to them. Coordination forms allow the specification of richer orchestration than traditional workflow languages and the library is extensible to allow the addition of further coordination forms. Coordination forms can vary from basic structures, such as specifying that a set of dependent components are processed sequentially or that a set of independent components may be processed in parallel, to advanced domain specific languages. A further key benefit of both components and coordination forms specified within the libhpc-cf library is that both can have multiple implementations allowing different approaches to be used to undertake a given task depending on the type of target computing platform to be used or the problem being addressed. 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact The libhpc-cf library provides a modern Python implementation of a general approach pioneered by Darlington, Guo et al. in their 1995 paper "Functional Skeletons for Parallel Coordination". The library demonstrates the expressive power of coordination forms in comparison with existing workflow languages and systems. The ability to handle multiple implementations of software components, and of the orchestration processes used to control them, is of particular use in modern heterogeneous computing environments.
URL https://github.com/london-escience/libhpc-cf
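As a small, invented sketch of the "multiple implementations" idea described above (the class and method names are hypothetical, not the actual libhpc-cf structures), a single logical component can hold several concrete implementations and select one according to the kind of platform a task will run on:

    # Hypothetical sketch only; not the real libhpc-cf component metadata.
    class Component:
        def __init__(self, name):
            self.name = name
            self.implementations = {}

        def add_implementation(self, platform_kind, func):
            self.implementations[platform_kind] = func

        def run(self, platform_kind, data):
            return self.implementations[platform_kind](data)

    solve = Component("solve")
    solve.add_implementation("cluster", lambda d: f"MPI-based solve of {d}")
    solve.add_implementation("cloud", lambda d: f"elastically scaled solve of {d}")

    print(solve.run("cluster", "mesh-A"))
    print(solve.run("cloud", "mesh-A"))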