Sulis: An EPSRC platform for ensemble computing delivered by HPC Midlands+

Lead Research Organisation: University of Warwick
Department Name: Physics

Abstract

Computer simulation and modelling is increasingly seen as the third pillar of modern science, alongside theory and experiment. Increasingly powerful research computing facilities are required for this activity. Traditionally, the case for these facilities has been made through a scientific need to model larger physical systems, or simulate with increased fidelity. Such calculations benefit from larger and more powerful computers by exploiting ever-larger numbers of computational processing units (cores) within a single calculation.

Sulis will support alternative and complementary ways of exploiting parallelism, specifically high throughput computing. Here the focus is on calculations of modest size, i.e. comparable to those which could be executed on a typical high-end workstation PC in a few days, but replicated thousands of times, each running with different inputs or model parameters, to solve a single problem. Working through this "ensemble" of calculations could easily take decades on a single multi-core PC, or many months with university-level facilities. Sulis will allow researchers to complete workflows such as this in less than a week, and hence to apply their expertise to a broader range of problems and to react to the availability of new input data.

There are many computational tasks which fit into this "ensemble computing" model. One pertinent example is uncertainty quantification (UQ). Rather than simulate a single and likely imperfect model, UQ approaches generate ensembles of possible models and simulate them all. This allows predictions to be made statistically. The most likely outcome of the simulated process can be inferred from the ensemble of outputs, along with a confidence level based on the variability over the outputs. The latter is essential if using simulation as a design or decision-making tool. A similar concept may be familiar from weather forecasting - models do not make absolute predictions but instead predict a probability of rain based on the fraction of simulations in which this occurs. This approach is applicable to a range of problems in the physical sciences, such as predicting material properties, the yield of chemical processes, the motion of bacteria and fusion plasma stability.
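As an illustration only, the UQ idea described above can be sketched in a few lines of Python. Everything here is an assumption made for the example rather than part of the Sulis proposal: the toy growth model, the normally distributed input, the threshold and the function names are all hypothetical. The sketch runs one small calculation per ensemble member and then summarises the outputs as a mean, a spread, and the fraction of members exceeding a threshold (the analogue of a forecast "probability of rain").

```python
import random
import statistics


def toy_model(rate, steps=10):
    """Hypothetical model: compound growth of a unit quantity at a fixed rate."""
    value = 1.0
    for _ in range(steps):
        value *= 1.0 + rate
    return value


def run_ensemble(n_members, seed=0):
    """Run the model once per ensemble member, each with a perturbed input."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    # Assumed input uncertainty: rate drawn from Normal(mean=0.05, sd=0.01).
    rates = [rng.gauss(0.05, 0.01) for _ in range(n_members)]
    return [toy_model(rate) for rate in rates]


def summarise(outputs, threshold):
    """Return the ensemble mean, its spread, and the fraction above threshold."""
    mean = statistics.mean(outputs)
    spread = statistics.stdev(outputs)
    fraction = sum(out > threshold for out in outputs) / len(outputs)
    return mean, spread, fraction


outputs = run_ensemble(1000)
mean, spread, fraction = summarise(outputs, threshold=1.7)
print(f"mean={mean:.3f} spread={spread:.3f} P(output > 1.7)={fraction:.2f}")
```

In a real workflow each call to the model would be an independent simulation lasting hours or days, which is what makes the ensemble a natural fit for a high throughput service.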

Other ensemble computing workflows include optimisation problems. Here each of the simulations independently searches a subset of the inputs/parameters for a model, reducing the time taken to locate viable solutions. This is essential, for instance, in studying disordered materials, which are far closer to the real world than the ideal perfect crystals assumed when seeking only a single solution. Ensemble computing is also used to generate, sample or process large datasets, often for subsequent use as inputs to train modern machine learning algorithms. For this reason Sulis will include a multi-petabyte data storage system, exploiting modern solid-state storage technologies to reduce bottlenecks arising from reading and writing of data. It will also include a large number of graphics processing units (GPUs) - accelerator devices themselves now ubiquitous for machine-learning applications.
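The partitioned search described above might look something like the following minimal sketch. The objective function, the parameter range and the number of partitions are hypothetical choices for illustration. Each call to `search_subrange` stands in for one independent ensemble member; here they run one after another, whereas on a real system each would be submitted as a separate job.

```python
def objective(x):
    """Hypothetical cost function with its minimum at x = 3."""
    return (x - 3.0) ** 2


def partition(lo, hi, n_parts):
    """Split the interval [lo, hi] into n_parts equal sub-intervals."""
    width = (hi - lo) / n_parts
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n_parts)]


def search_subrange(lo, hi, n_points=101):
    """One ensemble member: grid search restricted to its own slice [lo, hi]."""
    step = (hi - lo) / (n_points - 1)
    candidates = [lo + i * step for i in range(n_points)]
    best = min(candidates, key=objective)
    return best, objective(best)


# Each sub-range is searched independently of the others.
subranges = partition(-10.0, 10.0, 8)
results = [search_subrange(lo, hi) for lo, hi in subranges]

# Combine the per-member results by keeping the lowest cost found.
best_x, best_cost = min(results, key=lambda result: result[1])
print(f"best x = {best_x:.3f}, cost = {best_cost:.3e}")
```

Because the members share no state, the wall-clock time of the search is set by the slowest single member rather than by the total amount of work.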

A focus on ensemble computing raises challenges for researchers and software engineers. With thousands of simulations, the probability that at least one will fail is substantial. Software must be resilient to this failure. Similarly, managing the input and output of so many calculations can overload traditional data storage subsystems, requiring users to work with database technology rarely encountered by researchers outside of computer science departments. Hence a key feature of the Sulis service will be Research Software Engineering (RSE) support to assist and train users in tackling these problems, future-proofing the competitiveness of UK researchers against the challenges of computing at ever larger scales.
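To make the failure argument concrete: if each run fails independently with probability p, the chance that at least one of N runs fails is 1 - (1 - p)^N. The sketch below evaluates this for assumed values (p = 0.001, N = 5000, chosen purely for illustration) and shows one simple way a driver script could tolerate transient failures by retrying. The `run_with_retries` helper and the deliberately flaky task are hypothetical, not a description of any Sulis tooling.

```python
def prob_any_failure(p_fail, n_runs):
    """Probability that at least one of n_runs independent runs fails."""
    return 1.0 - (1.0 - p_fail) ** n_runs


def run_with_retries(task, inputs, max_attempts=3):
    """Run task on every input, retrying failures up to max_attempts times.

    Returns (results, failed): results maps input index to output, and
    failed lists the indices that never succeeded.
    """
    results, failed = {}, []
    for index, value in enumerate(inputs):
        for _ in range(max_attempts):
            try:
                results[index] = task(value)
                break  # success, move on to the next input
            except Exception:
                continue  # assumed transient, so try again
        else:
            failed.append(index)  # every attempt raised
    return results, failed


attempts_seen = {}


def flaky_square(x):
    """Hypothetical task that fails the first time it sees a multiple of 3."""
    attempts_seen[x] = attempts_seen.get(x, 0) + 1
    if x % 3 == 0 and attempts_seen[x] == 1:
        raise RuntimeError("simulated transient failure")
    return x * x


print(f"P(at least one failure) = {prob_any_failure(0.001, 5000):.4f}")
results, failed = run_with_retries(flaky_square, range(10))
print(f"{len(results)} succeeded, {len(failed)} failed permanently")
```

Even with a one-in-a-thousand failure rate per run, an ensemble of this size is more likely than not to contain a failure, which is why the driver records permanently failed members instead of aborting the whole workflow.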

Planned Impact

The Sulis service will generate and enhance impact via the step change in ensemble computing capability that it will provide to its users, and the cohort of researchers who will receive training and experience in the associated scientific methodology and specialist research software engineering techniques. It will advance the objectives of EPSRC's E-infrastructure strategy by servicing a current unmet need for a substantial high-throughput service. It will impact future computational methodology and software development projects by providing a dedicated platform for ensemble computing as a microcosm for future exascale resources. It will ensure the Midlands remains active at the forefront of UK academic HPC activity, strengthening the region through retention and expansion of key knowledge and skilled personnel, new participation in Tier-2 from previously unrepresented HEIs (e.g. Coventry University - a full partner in the present proposal) and engagement with newer members of the expanded Midlands Innovation Group.

Beyond the academic community, impacts to industry and economy will follow from the research enabled by the service. Our proposal is informed by user requirements linked to the EPSRC research portfolio and strategy, each with clear pathways to impact. These include research into a future hydrogen-fuelled economy where disordered materials will act as energy storage media, reduction of risk in development of new nuclear power generation capacity, development of nanoscale devices for waste heat recovery, CO2 processing for climate change mitigation, solid-state heat pumping for cooling without carbon emissions, and low energy routes to materials synthesis via crystallisation control. Such projects will have a clear impact toward meeting the ambitious Net-zero 2050 targets for carbon emissions.

Sulis will also support research with economic impact via industrial products and processes. Particular projects include development of new commercially relevant chemical processes using reaction discovery algorithms, computational discovery of new medicines (an existing Prosperity Partnership), and creation of databases for selection of non-crystalline metal-ion battery materials. The service will additionally accelerate impact from existing government investments in data science, including projects based at the Alan Turing Institute and projects within the Midlands region which will, for example, enhance academic knowledge distillation via neural network based integration of scientific literature.
