UK Crop Diversity Bioinformatics Resource

Lead Research Organisation: National Institute of Agricultural Botany
Department Name: Centre for Research

Abstract

We are requesting a High-Performance Computing (HPC) cluster dedicated to supporting and expanding the work of a consortium of six leading UK research organisations that focus on characterising and utilising novel genetic diversity, both to improve current crops and to breed novel varieties. The platform will be open to researchers at the participating institutions and their extensive network of collaborators. We will also develop and deploy genomics workflows relevant to the partners' shared interests in crop diversity informatics.

Developments in sequencing and genotyping technologies, together with advances in environmental monitoring and characterisation, are rapidly changing the opportunities available to evaluate and utilise genetic diversity in crop plants and their wild relatives to adapt to the changing landscape of agriculture. This is driven by the increasing impact of climate change as well as by evolving agricultural practices and commodity demand. UK Research Organisations host a wide range of key expertise and information resources that could play a critical role in ensuring that these opportunities are fully realised. A major factor in developing the research infrastructure to support and nurture such activities is access to computational resources tailored to the key data resources and analytical tools that play a central role in this work. A consortium of UK Research Organisations, comprising NIAB, Royal Botanic Gardens, Kew (RBGKew), the Natural History Museum (NHM), the James Hutton Institute (JHI), Royal Botanic Gardens, Edinburgh (RBGE) and Scotland's Rural College (SRUC), has identified a need for computing resources beyond those its members already host individually, centred on BBSRC-funded work around crop diversity. A high-performance computing (HPC) resource focused on the common needs of our organisations, and tailored to the software and data resources relevant to this mission, will deliver efficiencies in the use of resources, both in computing hardware and in systems administration. In addition, it will act as a focus for sharing data, developing new methods of analysis and delivering training, and as a platform for developing new collaborative programmes of innovative science. Within the partner organisations alone, the platform will support the work of more than 400 scientists, including a large proportion of early career researchers and PhD students.
We propose to host the facility at JHI's data centre.

Technical Summary

We will purchase a Dell High-Performance Computing Cluster. Hosted at the James Hutton Institute, with remote access available to the other partners, this shared resource will significantly enhance the computational capabilities of six leading UK plant-science research organisations that are currently constrained by shortfalls in access to high-performance computing infrastructure.

The cluster provides:
- 2,064 compute cores, with 11 TB (combined total) memory. 40 dual-CPU nodes will run 24-thread Intel Xeon Silver 4116 processors at 2.1 GHz, each node with 12x 16 GB RDIMMs (192 GB) for optimal memory bandwidth (6 channels per processor). 1.2 TB of SSD-based storage is available for local caching of data. Two additional nodes are dedicated to high-memory (3 TB) and GPU processing respectively.

- A 1.3 PB (pre-RAID) parallel file system, to be used for both primary and working/scratch data. Running as a four-node BeeGFS-on-ZFS array (192 TB per node) with additional attached storage per node (144 TB), performance is expected to exceed 5 GB/s for sequential data transfers. File and directory metadata lookup performance will be optimised by dedicating a further SSD-based node exclusively to this task.

- 840 TB of disk-to-disk backup capacity, to be located in a separate building. ZFS snapshot technology will provide access to daily, weekly, or monthly backups of primary data.

- High-speed 25 GbE networking for all compute and storage nodes, along with a separate 1 GbE network for management and monitoring. Appropriate switching gear and cabling are included. The platform is completed by the necessary housing racks and UPS power units.
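The headline figures in the specification list can be cross-checked with a little arithmetic. The Python sketch below is a back-of-envelope check, not part of the proposal: it assumes one DIMM per memory channel (6 channels on each of 2 processors), so the quoted 192 GB per standard node implies 16 GB modules, and it treats the per-node storage figures as raw (pre-RAID) capacity. All constant and function names are illustrative.

```python
"""Back-of-envelope check of the cluster figures (illustrative only)."""

# Standard compute nodes, as described in the specification list.
STANDARD_NODES = 40
SOCKETS_PER_NODE = 2
THREADS_PER_CPU = 24          # Xeon Silver 4116: 12 cores, 24 threads
CHANNELS_PER_CPU = 6          # assumption: one DIMM per memory channel
NODE_MEMORY_GB = 192          # quoted memory per standard node

# BeeGFS-on-ZFS storage nodes; capacities are raw (pre-RAID), in TB.
STORAGE_NODES = 4
INTERNAL_TB_PER_NODE = 192
ATTACHED_TB_PER_NODE = 144

SEQUENTIAL_GB_PER_S = 5       # expected sequential throughput (lower bound)


def standard_node_threads() -> int:
    """Hardware threads across all standard compute nodes."""
    return STANDARD_NODES * SOCKETS_PER_NODE * THREADS_PER_CPU


def dimm_size_gb() -> int:
    """DIMM size implied by filling every channel on both sockets."""
    dimms = SOCKETS_PER_NODE * CHANNELS_PER_CPU   # 12 DIMMs per node
    return NODE_MEMORY_GB // dimms


def standard_memory_tb() -> float:
    """Total memory on standard nodes, in decimal TB."""
    return STANDARD_NODES * NODE_MEMORY_GB / 1000


def raw_storage_tb() -> int:
    """Raw parallel file system capacity before RAID overhead."""
    return STORAGE_NODES * (INTERNAL_TB_PER_NODE + ATTACHED_TB_PER_NODE)


def seconds_to_stream(terabytes: float) -> float:
    """Time to read `terabytes` sequentially at the expected rate."""
    return terabytes * 1000 / SEQUENTIAL_GB_PER_S


if __name__ == "__main__":
    print("standard-node threads:", standard_node_threads())
    print("implied DIMM size (GB):", dimm_size_gb())
    print("standard-node memory (TB):", standard_memory_tb())
    print("raw storage (TB):", raw_storage_tb())
    print("seconds to stream 1 TB:", seconds_to_stream(1))
```

On these assumptions the 40 standard nodes supply 1,920 hardware threads and 7.68 TB of memory; the remaining 144 of the quoted 2,064 cores, and the rest of the 11 TB, presumably come from the high-memory and GPU nodes, whose exact configuration is not given here. The four storage nodes give 4 × (192 + 144) = 1,344 TB, consistent with the quoted 1.3 PB, and streaming 1 TB at 5 GB/s would take about 200 seconds.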

The system will run almost exclusively using open-source Linux solutions, including: CentOS (operating system); BeeGFS (file system); SGE or SLURM (job scheduling); bioconda (informatics software), alongside Singularity (containers for more complex deployments); Ansible (configuration management); and Ganglia/Nagios (monitoring).
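To illustrate how the scheduler and software layers named above might fit together, the Python sketch below renders a batch script for a single job. It assumes Slurm is the scheduler chosen (the text leaves SGE as an alternative); the `short` partition, the image path `/shared/images/bwa.sif` and the BWA command line are hypothetical, and `JobRequest` and `render_sbatch` are illustrative names, not part of any existing tool.

```python
"""Render a Slurm batch script for a containerised or conda-installed tool.

Illustrative sketch: Slurm is assumed as the scheduler, and the partition
name, image path and command are hypothetical.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRequest:
    name: str
    command: str                      # shell command to run on the node
    cpus: int = 4
    mem_gb: int = 16
    hours: int = 12
    partition: str = "short"          # hypothetical partition name
    container: Optional[str] = None   # hypothetical Singularity image path


def render_sbatch(job: JobRequest) -> str:
    """Return the text of a Slurm batch script for one job."""
    if min(job.cpus, job.mem_gb, job.hours) < 1:
        raise ValueError("cpus, mem_gb and hours must be positive")
    command = job.command
    if job.container:
        # Run the tool inside a Singularity container rather than
        # relying on a conda environment already on the PATH.
        command = f"singularity exec {job.container} {command}"
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job.name}",
        f"#SBATCH --partition={job.partition}",
        f"#SBATCH --cpus-per-task={job.cpus}",
        f"#SBATCH --mem={job.mem_gb}G",
        f"#SBATCH --time={job.hours:02d}:00:00",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    job = JobRequest(
        name="align",
        command="bwa mem -t 24 ref.fa reads.fq > aln.sam",
        cpus=24,
        mem_gb=64,
        container="/shared/images/bwa.sif",
    )
    print(render_sbatch(job), end="")
```

The resulting script would be submitted with `sbatch`. Requesting 24 CPUs matches the thread count of one Xeon Silver 4116; routing the tool through `singularity exec` reflects the split the stack describes between Bioconda packages and containers for more complex deployments.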

Planned Impact

The proposed capital investment will have a high impact, delivering a number of benefits to the plant diversity and agricultural research sector nationally and internationally.

Direct beneficiaries:

Partner organisations
The consortium of six partners brings together a leading group of scientists who have established the UK as a global leader in genetics and genomics for plant diversity research. The combined research platform hosts more than 400 scientists, including postdocs and a large number of PhD students.

Commercial private sector
The UK plant genetics and breeding sector will benefit enormously from this investment. Collaborative R&D arising from this work will allow industry to characterise plant material more rapidly at the phenotypic and genetic level and to develop molecular markers (benefit within 5 years).

Crop growing sectors in the UK
UK industry will benefit from access to resources and R&D protocols that it might otherwise be unable to implement. In the longer term, UK partners are expected to make significant use of this resource and of the knowledge generated from pre-competitive work. This may lead to further competitive work funded by other research bodies (e.g. Innovate UK and AHDB). Advancing genomic resources in agricultural and horticultural crops and their pathogens is a key aim of the AHDB.
(Benefit within 5-10 years).

Indirect beneficiaries:
The application of genomics technologies will increase and varietal development will accelerate, bringing greater benefits to downstream growers, packers and producers. (Benefit within 10-12 years).

Government, public and policy
The public will benefit not only from the improved position of UK agribusiness (and breeders' access to novel technologies), but also from long-term improvements in supply chain resilience through improved cultivar development. In the longer term, the public also benefits through increased food security and sustainability as a result of scientific improvements in horticultural and arable crops. This feeds into many UK Government and EU policy agendas, including health (improving produce quality), pesticides (reducing residues through improved resistance), water (ability to grow nearer water courses), climate (growing crops perennially will improve carbon sequestration) and environment (reduced carbon and pesticides).
(Benefit within 5-10 years).

Publications

Bohra A (2022) Reap the crop wild relatives for breeding future crops. in Trends in biotechnology

Carballo J (2021) Eragrostis curvula, a Model Species for Diplosporous Apomixis. in Plants (Basel, Switzerland)

Walker BE (2023) Evidence-based guidelines for automated conservation assessments of plant species. in Conservation biology : the journal of the Society for Conservation Biology

 
Description This award funds the purchase of a High-Performance Computing (HPC) cluster dedicated to supporting and expanding the work of a consortium of six leading UK research organisations that focus on characterising and utilising novel genetic diversity, both to improve current crops and to breed novel varieties.
Exploitation Route The platform will be open to researchers at the participating institutions and their extensive network of collaborators. We will also develop and deploy common genomics workflows relevant to the common interests of the partners in crop diversity informatics.
Sectors Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software)

 
Description The deployment of this HPC platform has supported the work that NIAB is taking forward for the Cambridge Crop Centre.
First Year Of Impact 2019
Sector Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Description Kick-off event of the Omicas project: keynote speaker
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Keynote speaker at the Omicas project kick-off workshop in Cali, Colombia.
Year(s) Of Engagement Activity 2019
URL https://www.omicas.co/noticias/recomendaciones-y-agenda-del-taller-anual-omicas