Baskerville: a national accelerated compute resource

Lead Research Organisation: University of Birmingham
Department Name: School of Computer Science

Abstract

Modern science demands high-performance computing platforms that support a diverse range of activities, from quantum mechanical simulations of high-performance electronic materials and large-scale molecular dynamics simulations to data-driven, machine-learning-based analysis of high-resolution, high-content scientific images. We propose Baskerville (named for John Baskerville, the Enlightenment-era Birmingham industrialist): a science-based, service-oriented, flexible, and fully featured mid-level national facility for high-performance computing that supports a wide range of applications and interaction modes. It uses a new technological platform featuring the new generation of NVIDIA A100 GPUs to provide a state-of-the-art accelerated facility that will enable new types of research that are impractical or impossible on the existing national infrastructure. We will be amongst the first to receive the next-generation multi-GPU systems with NVLink interconnect, through a commitment from our technology partners to supply us on their early shipment programme.

Our consortium of partners, led by the University of Birmingham, brings together three major research facilities: the Rosalind Franklin Institute, the Alan Turing Institute, and Diamond Light Source. Collectively the partners are involved in EPSRC activity worth more than £550m. Birmingham hosts several national centres and facilities including the National Centre for Nuclear Robotics, the National Buried Infrastructure Facility, and the UK National Quantum Technology Hubs for Sensors and Metrology, and Sensing and Timing. The Franklin and its partners are developing next-generation technologies for studying life, building a new generation of scientific instruments. The Turing is the UK national centre for Data Science and Artificial Intelligence, coordinating and catalysing research across the country. Diamond Light Source is the UK's national synchrotron light source and an essential tool in a huge range of scientific applications, from the development of the next generation of advanced materials for aerospace to studies of the structure of proteins. This new system will therefore benefit a broad range of EPSRC researchers working with these facilities and beyond, with the system available to all EPSRC-funded researchers.

The new facility will be hosted in Birmingham's purpose-built, water-cooled datacentre, which enables the entire system to be kept at optimal temperatures without needing air conditioning, dramatically reducing its running costs and energy usage. High-speed links to the Harwell campus, where Diamond and the Franklin are based, will enable rapid transfer of large datasets, and we will put in place automated pipelines for data transfer and processing that allow researchers to take full advantage of the technology.

Planned Impact

This proposal aims to put in place and exploit a new architecture (IBM POWER9) for Tier-2 High Performance Computing that is especially suitable for large scale data analytics, machine learning, and simulation. This investment will increase the research capacity and capability of the UK in an area of rapidly increasing need, and will be of particular benefit to machine learning/artificial intelligence researchers, experimental scientists collecting and analysing large datasets, industrial researchers performing complex simulation, and the staff who will be trained to support the system.

Businesses and the economy will benefit across a wide range of sectors. Beneficiaries include:

1. Industrial users of the advanced facilities at the Rosalind Franklin Institute and Diamond Light Source. These include pharmaceutical companies, who use the state-of-the-art technologies at Diamond and the Franklin, such as cryo-electron tomography and correlative light-electron microscopy, to image complex biological samples as part of their drug development programmes; and advanced manufacturing industries such as aerospace, who perform tomographic imaging experiments on advanced materials to understand how they behave at a microstructural level when under stress.

2. Organisations that collaborate with the Alan Turing Institute on problems in data science and artificial intelligence, from sectors as diverse as finance, technology, healthcare, manufacturing, government, transport, energy, and agriculture.

3. Organisations that collaborate with the University of Birmingham on EPSRC projects in sectors that include quantum technology, nuclear robotics, civil engineering, advanced manufacturing, and cyber security.

4. External organisations collaborating on EPSRC projects who access the system through the RAP calls.

They will benefit from the facility by being able to access it at no cost to accelerate their research and development activities. Organisations working with the facility's partners will be able to gain access through the partner resource allocation, whilst industrial collaborators on other EPSRC projects will be able to gain access through the RAP call.

Society and the economy will also benefit from the training and development of skilled people in areas where there is a national shortage. The research software engineers and the systems engineer employed to deliver our service will benefit from working alongside, learning from, and being mentored by experienced and skilled people. They will gain further benefit from the networking opportunities available through the Alan Turing Institute's network of research software engineers, which will allow them to engage with the wider UK community and to develop their skills and careers.

The facility will also generate impact through collaboration, by enabling the consortium partners to work together in workshops and hackathons that address challenges posed by one of the consortium partners or system users. These events will take advantage of our unique computational infrastructure to solve important problems and to transfer knowledge and expertise between the consortium partners and the wider user community.
