Large memory HPC infrastructure to underpin world-class biological research
Lead Research Organisation: University of Edinburgh
Department Name: The Roslin Institute
Abstract
Modern biological research uses computers at nearly every stage. As the instruments we use to generate data improve, the amount of data to be analysed has grown enormously, and we need ever larger and more powerful computers to make sense of the data available to us.
For some projects, the increase in the data is simply a question of size, and the problem can be addressed by breaking the data up into blocks and analysing each block on a separate processor. However, for other projects, the analyses that we need to perform require comparisons across and between all of the data points, meaning that we cannot split the problem up quite so easily. Whilst there are clever (but complicated) ways of moving data around between different processors to try to address these problems, they do not scale well, and this can mean that some analyses are simply not possible using "standard" high-performance computing equipment. These projects need access to specialised computer systems that allow large numbers of processors to share the same large block of memory at the same time. Our aim is to provide access to one of these specialised computer systems.
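To make the distinction concrete, the minimal Python sketch below (purely illustrative; the function names are hypothetical and do not refer to any proposed pipeline) contrasts a blockwise analysis, which splits cleanly across separate processors, with an all-against-all comparison, which needs the whole data set resident in a single memory space.

# Illustrative sketch only: contrasts "split into blocks" with "all-vs-all" analyses.
import itertools
import numpy as np
from multiprocessing import Pool

def per_block_statistic(block):
    # Each block can be analysed independently, so blocks can be farmed
    # out to separate processors (or separate cluster nodes).
    return block.mean()

def all_pairs_similarity(data):
    # Every data point is compared with every other one, so the whole
    # data set must be visible to every worker - the pattern that
    # favours a large shared-memory machine.
    n = len(data)
    sim = np.zeros((n, n))
    for i, j in itertools.combinations(range(n), 2):
        sim[i, j] = sim[j, i] = -abs(data[i] - data[j])
    return sim

if __name__ == "__main__":
    data = np.random.default_rng(0).normal(size=10_000)
    blocks = np.array_split(data, 8)
    with Pool(8) as pool:                       # trivially parallel: one block per worker
        block_means = pool.map(per_block_statistic, blocks)
    similarities = all_pairs_similarity(data[:1_000])   # needs the data in one memory space

It is the second pattern that motivates the shared-memory system described here.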
Having access to such a system will allow BBSRC-funded researchers at Edinburgh to (a) analyse existing data sets more efficiently and to a greater depth, (b) design experiments and analyse data sets that would not otherwise be practical or even possible, and (c) gain hands-on experience of using a flexibly configured super-computer system, allowing them to understand and test which parts of their analyses might benefit from running on this or an alternatively configured super-computer system, e.g. one with a greater number of processors per unit of available memory.
Technical Summary
We request funds for a high-performance shared-memory system to support work on genetics, genomics, epidemiology, evolutionary biology, antimicrobial resistance and many other aspects of BBSRC-funded high-priority science. Most modern research organisations have access to high-performance computing systems with TB-scale memory and 20-30 cores per individual node. Fundamentally, however, these are individual nodes joined together in close physical proximity rather than operating as a coherent whole.
At the very top end of the computing spectrum, UK national resources such as ARCHER provide access to super-computer architectures where the memory across the entire system is addressable by all nodes and processes. However, the "memory scalability" of such systems is limited because across-node memory accesses are constrained by coherency issues. Biological large-compute problems tend to require true shared memory and thus do not fit well on ARCHER-style systems, which are better suited to "particle"-based problem domains.
An alternative super-computer configuration is to have a huge memory pool shared across a more limited set of nodes. This provides genuine shared-memory access with a number of cores that makes sense in the context of our biological data shapes. The system we propose here sits in this conceptual space, offering flexible memory access shared across the entire system, which permits the machine to operate in both "task farming" and "parallel processing" modes simultaneously, as sketched below.
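As a minimal, hedged sketch of what running both modes at once could look like (the workload and names below are assumptions for illustration, not the actual analyses), a single shared-memory host can farm out many small independent tasks to separate processes while other workers scan one large in-memory array that exists only once:

# Hypothetical illustration of "task farming" and "parallel processing" on one shared-memory host.
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import numpy as np

def independent_task(seed):
    # "Task farming" mode: each task is self-contained and needs little memory.
    rng = np.random.default_rng(seed)
    return rng.normal(size=1_000).std()

def column_summary(args):
    # "Parallel processing" mode: every thread reads the same large array,
    # which only has to exist once because the memory is shared.
    big_array, col = args
    return big_array[:, col].mean()

if __name__ == "__main__":
    big_array = np.random.default_rng(1).normal(size=(100_000, 64))
    with ProcessPoolExecutor() as farm:
        farmed_results = list(farm.map(independent_task, range(32)))
    with ThreadPoolExecutor() as shared:
        summaries = list(shared.map(column_summary, ((big_array, c) for c in range(64))))

On a conventional cluster the second workload would require the large array to be copied to, or partitioned across, every node; on a shared-memory system it is simply held once and read by all workers.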
The new system would consist of an HPE Superdome Flex shared-memory system incorporating 32 sockets/704 cores (Intel Xeon 6152, 2.1GHz, 22 cores), 12TB of RAM, a RAID disk array of 480TB (raw) or 240TB (raw) capacity, and sixteen 1.6TB NVMe flash cards. This configuration gives us maximum flexibility across the range of anticipated use cases and a solid I/O infrastructure to ensure that data bottlenecks are minimised.
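A quick back-of-envelope check, using only the figures quoted above (a rough calculation, not a vendor specification), shows what the configuration means per core:

# Rough arithmetic based solely on the configuration quoted in this summary.
sockets, cores_per_socket = 32, 22
total_cores = sockets * cores_per_socket                   # 704 cores
total_ram_gb = 12 * 1024                                   # 12 TB expressed in GB
print(total_cores, round(total_ram_gb / total_cores, 1))   # 704 cores, roughly 17.5 GB per core

Crucially, because the memory is globally addressable, a single job is not restricted to this per-core average and can use up to the full 12TB, which is what distinguishes the system from a conventional cluster node.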
Planned Impact
The immediate impact of this proposal will be delivered to the academic community through the publication of papers in learned journals and through presentations at conferences, workshops and seminars. The availability of the computing resource will permit more numerous and more detailed analyses than would otherwise be the case. Outputs from this project will also be delivered in the form of targeted, relevant training materials and courses, which will give academic users the skills needed to take full advantage of the high-end computing resources made available here and elsewhere.
The requested resource will make a significant difference to UK science infrastructure and research capacity and will enhance the reputation of UK scientists and their work. Many of the applicants are frequently invited to present courses and talks on the international stage, and every opportunity will be taken to highlight the advanced capabilities available to BBSRC scientists. Access to the compute resource will lead to greater collaboration with international colleagues and an enhanced reputation for UK science overall.
The greater depth of analysis that the computing resource permits will allow more comprehensive studies to be performed at better resolution. That greater resolution should result in better understanding and, subsequently, improved analytical or diagnostic tools. These will benefit industrial partners and, in turn, processors, retailers and users/consumers through the delivery of higher quality products that cost less and are more environmentally friendly, healthier and better suited to the individual requirements of stakeholders in the supply chain.
Greater understanding of biological processes will also lead to improved policy-making and better-informed government, particularly in the field of animal and human health. The proposed resource is ideally suited to the analysis of sequence data sets from large numbers of individuals and to epidemiological studies, both of which are critical for those policy considerations.
All members of society involved in or dependent upon the competitiveness and sustainability of agriculture will benefit from the downstream practical applications of the impacts outlined above. The application of the outcomes by breeding organisations will lead to faster and more sustainable genetic progress, resulting in healthier food and food production that is more resource-efficient and affordable. Increased efficiency in agriculture has direct societal benefits in greater food security with less environmental impact. The knowledge will also feed into educational programmes, so there will be a positive impact on the wider general population too.