UK Crop Diversity Bioinformatics Resource

Lead Research Organisation: National Institute of Agricultural Botany
Department Name: Centre for Research

Abstract

We are requesting a High-Performance Computing (HPC) cluster that will be dedicated to supporting and expanding the work of a consortium of six leading UK research organisations focused on the characterisation and utilisation of novel genetic diversity, both for the improvement of current crops and for the breeding of novel varieties. The platform will be open to researchers at the participating institutions and their extensive network of collaborators. We will also develop and deploy shared genomics workflows that address the partners' common interests in crop diversity informatics.

Developments in sequencing and genotyping technologies, together with advances in environmental monitoring and characterisation, are rapidly changing the opportunities available to evaluate and utilise genetic diversity in crop plants and their wild relatives, and so to adapt to the changing landscape of agriculture. This change is driven by the increasing impact of climate change, as well as by evolving agricultural practices and commodity demand. UK Research Organisations host a wide range of key expertise and information resources that could play a critical role in ensuring that these opportunities are fully realised. A major factor in developing the research infrastructure to support and nurture such activities is access to computational resources tailored to the key data resources and analytical tools that play a central role in this work. A consortium of UK Research Organisations comprising NIAB, Royal Botanic Gardens, Kew (RBGKew), the Natural History Museum (NHM), the James Hutton Institute (JHI), Royal Botanic Garden Edinburgh (RBGE) and Scotland's Rural College (SRUC) has identified a need for computing resources beyond those each partner already hosts individually, centred on BBSRC-funded work on crop diversity. A high-performance computing (HPC) resource focused on the common needs of our organisations, and tailored to the software and data resources relevant to this mission, will deliver efficiencies in the use of both computing hardware and systems administration. In addition, it will act as a focus for sharing data, developing new methods of analysis and delivering training, and as a platform for developing new collaborative programmes of innovative science. Within the partner organisations alone, the platform will support the work of more than 400 scientists, including a large proportion of early career researchers and PhD students.
We propose to host the facility at JHI's data centre.

Technical Summary

We will purchase a Dell high-performance computing cluster. Hosted at the James Hutton Institute, with remote access available to the other partners, this shared resource will significantly enhance the computational capabilities of six leading UK research organisations focused on plant science, which are currently constrained by limited access to high-performance infrastructure.

The cluster provides:
- 2,064 compute cores, with 11 TB of memory in total. 40 dual-CPU nodes will run 24-thread Intel Xeon Silver 4116 processors at 2.1 GHz, each node fitted with 12 x 16 GB RDIMMs (192 GB), populating all six memory channels per processor for optimal memory bandwidth. 1.2 TB of SSD-based storage is available for local caching of data. Two additional nodes are dedicated to high-memory (3 TB) and GPU processing, respectively.

- A 1.3 PB (pre-RAID) parallel file system, to be used for both primary and working/scratch data. It will run as a four-node BeeGFS-on-ZFS array (192 TB per node, plus 144 TB of additional attached storage per node), with sequential transfer performance expected to exceed 5 GB/s. File and directory metadata lookups will be accelerated by dedicating a further SSD-based node exclusively to this task.

- 840 TB of disk-to-disk backup capacity, to be located in a separate building. ZFS snapshot technology will provide access to daily, weekly, or monthly backups of primary data.

- High-speed 25 GbE networking for all compute and storage nodes, along with a separate 1 GbE network for management and monitoring. Appropriate switches and cabling are included. The platform is completed by the necessary racks and UPS units.
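The headline figures in the list above can be cross-checked with some simple arithmetic. The sketch below is illustrative only: it assumes 16 GB DIMMs (the only size consistent with 192 GB spread over 12 slots), counts the 24 hardware threads per processor as the unit behind the core total, uses binary units for RAM and decimal units for disk, and leaves out the high-memory and GPU nodes wherever the specification does not itemise them.

```python
"""Back-of-envelope check of the cluster figures in the specification.

Assumption: 16 GB DIMMs, inferred from 192 GB spread over 12 DIMM slots.
"""

# Standard compute nodes
STANDARD_NODES = 40
CPUS_PER_NODE = 2
THREADS_PER_CPU = 24          # Xeon Silver 4116: 12 cores, 24 threads
DIMMS_PER_NODE = 12           # 6 memory channels per processor x 2 processors
DIMM_GB = 16                  # assumption (192 GB / 12 DIMMs)
QUOTED_TOTAL_THREADS = 2064   # headline "compute cores" figure

# BeeGFS-on-ZFS storage nodes (raw, pre-RAID, decimal units)
STORAGE_NODES = 4
INTERNAL_TB = 192
ATTACHED_TB = 144
SEQ_GB_PER_S = 5              # quoted lower bound for sequential transfers


def capacity_summary() -> dict[str, float]:
    threads = STANDARD_NODES * CPUS_PER_NODE * THREADS_PER_CPU
    node_mem_gb = DIMMS_PER_NODE * DIMM_GB
    standard_mem_tb = STANDARD_NODES * node_mem_gb / 1024  # binary units for RAM
    raw_storage_tb = STORAGE_NODES * (INTERNAL_TB + ATTACHED_TB)
    return {
        "standard_threads": threads,
        # Remainder of the headline figure, not itemised per node.
        "threads_in_other_nodes": QUOTED_TOTAL_THREADS - threads,
        "node_mem_gb": node_mem_gb,
        "standard_mem_tb": standard_mem_tb,
        "raw_storage_tb": raw_storage_tb,
        # Seconds to stream 1 TB (1000 GB) at the quoted throughput.
        "seconds_per_tb": 1000 / SEQ_GB_PER_S,
    }


if __name__ == "__main__":
    for key, value in capacity_summary().items():
        print(f"{key:>24}: {value}")
```

The result is 1,920 threads across the 40 standard nodes, so 144 of the quoted 2,064 presumably sit in the two additional nodes. The standard nodes hold 7.5 TB of RAM, which together with the 3 TB high-memory node accounts for about 10.5 TB of the quoted 11 TB. Raw storage comes to 1,344 TB, matching the quoted 1.3 PB. At 5 GB/s, streaming 1 TB sequentially takes about 200 seconds.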

The system will run almost exclusively on open-source Linux solutions, including: CentOS (operating system); BeeGFS (file system); SGE or SLURM (job scheduling); Bioconda (bioinformatics software), with Singularity containers for more complex deployments; Ansible (configuration management); and Ganglia/Nagios (monitoring).
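To illustrate how a user might run a Bioconda-packaged tool under the scheduler, the sketch below renders a SLURM batch script and routes jobs that need more memory than a standard node (48 threads, 192 GB) to a high-memory partition. It assumes SLURM rather than SGE is chosen. The partition names `standard` and `highmem`, the conda environment and the command are hypothetical placeholders that do not appear in the specification. The `#SBATCH` options used are standard sbatch directives.

```python
"""Render a SLURM batch script for a single-node job (illustrative sketch)."""

from dataclasses import dataclass

# Standard compute node limits taken from the hardware specification.
STANDARD_THREADS = 48    # 2 x Xeon Silver 4116, 24 threads each
STANDARD_MEM_GB = 192
HIGHMEM_MEM_GB = 3072    # the single 3 TB high-memory node (binary units)


@dataclass
class JobRequest:
    name: str
    command: str      # shell command to run inside the job
    threads: int
    mem_gb: int
    hours: int
    conda_env: str    # hypothetical Bioconda environment name


def choose_partition(job: JobRequest) -> str:
    """Return a hypothetical partition name able to hold the job on one node."""
    if job.threads > STANDARD_THREADS:
        raise ValueError(f"{job.name}: more threads than one standard node offers")
    if job.mem_gb <= STANDARD_MEM_GB:
        return "standard"
    # Assumes the high-memory node has at least as many threads as a
    # standard node; its thread count is not stated in the specification.
    if job.mem_gb <= HIGHMEM_MEM_GB:
        return "highmem"
    raise ValueError(f"{job.name}: {job.mem_gb} GB does not fit on any single node")


def render_sbatch(job: JobRequest) -> str:
    """Build the text of a batch script from standard #SBATCH directives."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job.name}",
        f"#SBATCH --partition={choose_partition(job)}",
        "#SBATCH --nodes=1",
        f"#SBATCH --cpus-per-task={job.threads}",
        f"#SBATCH --mem={job.mem_gb}G",
        f"#SBATCH --time={job.hours}:00:00",
        # Make 'conda activate' available in a non-interactive batch shell.
        'eval "$(conda shell.bash hook)"',
        f"conda activate {job.conda_env}",
        job.command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    job = JobRequest(
        name="align_sample1",
        # Hypothetical inputs; SLURM sets SLURM_CPUS_PER_TASK from --cpus-per-task.
        command="bwa mem -t $SLURM_CPUS_PER_TASK ref.fa r1.fq r2.fq > sample1.sam",
        threads=24,
        mem_gb=64,
        hours=12,
        conda_env="mapping",
    )
    print(render_sbatch(job))
```

The script would then be submitted with `sbatch`. Keeping the node limits in one place makes the routing rule easy to adjust once the real partition layout is configured.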

Planned Impact

The proposed capital investment is expected to have a high impact, delivering a number of benefits to the plant diversity and agricultural research sectors nationally and internationally.

Direct beneficiaries:

Partner organisations
The consortium of six partners brings together a leading group of scientists who have established the UK as a global leader in genetics and genomics for plant diversity research. The combined research platform hosts more than 400 scientists, including postdocs and a large number of PhD students.

Commercial private sector
The UK plant genetics and breeding sector will benefit enormously from this investment. Collaborative R&D arising from this work will allow industry to characterise plant material more rapidly at the phenotypic and genetic levels and to develop molecular markers (benefit within 5 years).

Crop growing sectors in the UK
UK industry will benefit because it will be able to access resources and R&D protocols that it might otherwise be unable to implement. In the longer term, UK partners are expected to make significant use of this resource and of the knowledge generated from pre-competitive work. This may lead to further competitive work funded by other bodies (e.g. Innovate UK and AHDB). Advancing genomic resources in agricultural and horticultural crops and their pathogens is a key aim of the AHDB.
(Benefit within 5-10 years).

Indirect beneficiaries:
Uptake of genomics technologies and the pace of varietal development will increase, leading to greater benefits for downstream growers, packers and producers. (Benefit within 10-12 years).

Government, public and policy
The public will benefit not only from the improved position of UK agribusiness (and breeders' access to novel technologies) but also from long-term improvements in supply chain resilience resulting from better cultivar development. In the longer term, the public also benefits through increased food security and sustainability as a result of scientific improvements in horticultural and arable crops. This feeds into many UK Government and EU policy agendas, including health (improving produce quality), pesticides (reducing residues through improved resistance), water (ability to grow nearer water courses), climate (growing crops perennially will improve carbon sequestration) and environment (reduced carbon and pesticides).
(Benefit within 5-10 years).

Publications


Walker BE (2023) Evidence-based guidelines for automated conservation assessments of plant species. In: Conservation Biology.

 
Description This award funds the purchase of a High-Performance Computing (HPC) cluster that will be dedicated to supporting and expanding the work of a consortium of six leading UK research organisations focused on the characterisation and utilisation of novel genetic diversity, both for the improvement of current crops and for the breeding of novel varieties.
Exploitation Route The platform will be open to researchers at the participating institutions and their extensive network of collaborators. We will also develop and deploy shared genomics workflows that address the partners' common interests in crop diversity informatics.
Sectors Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software)

 
Description The deployment of this HPC platform has supported the work that NIAB is taking forward for the Cambridge Crop Centre.
First Year Of Impact 2019
Sector Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Title Systematics of the avian family Alaudidae using multilocus and genomic data 
Description SNP datasets (gzipped VCF files) and tree output (newick-formatted trees in .trees files) associated with the publication "Systematics of the avian family Alaudidae using multilocus and genomic data" by Alström et al. (2023). In the VCF files, samples have intermediary sequence names. Below is a key (sorted per name in VCF file) to taxon and ID. Further information can be found in Dataset SM1 in the original publication.

VCF_name                            Taxon                                        ID
A_hamertoni_hamertoni_1982_3_71     Alaemon hamertoni hamertoni                  NHMUK 1982.3.71
A_hamertoni_tertia_1982_3_12        Alaemon hamertoni tertia                     NHMUK 1982.3.12
A_phoenicura_1949_25_4559           Ammomanes phoenicura                         NHMUK 1949.25.4559
Ala_alaudipes                       Alaemon alaudipes alaudipes                  DZUG U5653
Alauda_arvensis_KZ                  Alauda arvensis dulcivox                     DZUG U0587
Alaudala_seebohmi                   Alaudala cheleensis seebohmi                 DZUG U5316
AmmocincU4892_S121                  Ammomanes cinctura                           DZUG U4892
AmmodesU5745_S28                    Ammomanes deserti                            DZUG U5745
AmmodesU5748_S106                   Galerida cristata magna                      DZUG U5748
C_duponti_margaritae_1952_51_107    Chersophilus duponti margaritae              NHMUK 1952.51.107
Cal_brachydactyla                   Calandrella brachydactyla rubiginosa         DZUG U5655
CalbarlU5662_S32                    Calendulauda erythroclamys patae             DZUG U5662
CalsabotaU2344_S33                  Calendulauda sabota suffusca                 DZUG U2344
CheralboU5652_S52                   Chersomanes albofasciata                     DZUG U5652
E_leucotis_BMNH_1952.25.36          Eremopterix leucotis                         NHMUK 1952.25.36
ErealpeU4610_S144                   Eremophila alpestris brandti                 DZUG U4610
Eremalauda_dunni                    Eremalauda eremodites                        DZUG U4578
Eremopterix_hova_FMNH449163         Eremopterix hova                             FMNH 449163
G_deva_1949_whi_1_8153              Galerida deva                                NHMUK 1949.whi.1.8153
G_malabarica_1949_whi_1_7738        Galerida malabarica                          NHMUK 1949.whi.1.7738
HetarchU2811_S34                    Heteromirafra archeri ("sidamoensis")        DZUG U2811
Heteromirafra_ruddi                 Heteromirafra ruddi                          DZUG U5661
Lullula_arborea_U544                Lullula arborea                              UWBM 64680
M_a_africana_BMNH_1927.5.26.5       Corypha [Mirafra] africana africana          NHMUK 1927.5.26.5
M_africana_athi_1951_13_2495        Corypha [Mirafra] africana athi              NHMUK 1951.13.2495
M_africana_athi_1968_48_2           Corypha [Mirafra] africana athi              NHMUK 1968.48.2
M_albicauda_1916_12_1_840           Mirafra albicauda                            NHMUK 1916.12.1.840
M_ashi_1982_3_4                     Corypha [Mirafra] ashi                       NHMUK 1982.3.4
M_collaris_BMNH_1923.8.7.2633       Amirafra [Mirafra] collaris                  NHMUK 1923.8.7.2633
M_collaris_BMNH_1923.8.7.2634       Amirafra [Mirafra] collaris                  NHMUK 1923.8.7.2634
M_cordofanica_1932_8_6_261          Mirafra cordofanica                          NHMUK 1932.8.6.261
M_cordofanica_1932_8_6_262          Mirafra cordofanica                          NHMUK 1932.8.6.262
M_gilletti_1982_3_10                Calendulauda [Mirafra] gilletti arorihensis  NHMUK 1982.3.10
M_gilletti_1982_3_9                 Calendulauda [Mirafra] gilletti degodiensis  NHMUK 1982.3.9
M_hypermetra_BMNH_1912.12.23.317    Corypha [Mirafra] hypermetra hypermetra      NHMUK 1912.12.23.317
M_rufa_lynesi_1922_12_8_1553        Calendulauda [Mirafra] rufa rufa             NHMUK 1922.12.8.1553
M_rufa_nigriticola_1932_8_6_253     Calendulauda [Mirafra] rufa nigriticola      NHMUK 1932.8.6.253
M_somalica_somalica_1919_10_6_27    Corypha [Mirafra] somalica somalica          NHMUK 1919.10.6.27
MelcalaU0583_S124                   Melanocorypha calandra psammochroa           DZUG U0583
MelyeltU0580_S35                    Melanocorypha yeltoniensis                   DZUG U0580
Mirafra_alb_ZMB_2000.9488           Mirafra albicauda                            ZMB 2000.9488
Mirafra_albi_ZMB_49.225             Mirafra albicauda                            ZMB 49.225 (holotype)
Mirafra_degodiensis                 Calendulauda [Mirafra] gilletti degodiensis  DZUG U2808
Mirafra_rufocinnamomea_U5657        Amirafra [Mirafra] rufocinnamomea fischeri   FMNH 484672 (= DZUG U5657)
MirchenU5660_S37                    Mirafra cheniana                             DZUG U5660
MirchenU5759_S36                    Mirafra cheniana                             DZUG U5759
MirfascU5758_S60                    Corypha [Mirafra] fasciolata                 DZUG U5758
Panurus                             Panurus biarmicus                            1ET92164|SAMN13107499
Ramphocoris                         Ramphocoris clotbey                          CEFE Rhcl1 (= DZUG U5651)
Spizocorys_fringillaris_U5659       Spizocorys fringillaris                      DZUG U5659
Spizocorys_obbiensis_1908_5_28_104  Spizocorys obbiensis                         NHMUK 1908.5.28.104

For convenience, the same data sorted per taxon:

VCF_name                            Taxon                                        ID
Ala_alaudipes                       Alaemon alaudipes alaudipes                  DZUG U5653
A_hamertoni_hamertoni_1982_3_71     Alaemon hamertoni hamertoni                  NHMUK 1982.3.71
A_hamertoni_tertia_1982_3_12        Alaemon hamertoni tertia                     NHMUK 1982.3.12
Alauda_arvensis_KZ                  Alauda arvensis dulcivox                     DZUG U0587
Alaudala_seebohmi                   Alaudala cheleensis seebohmi                 DZUG U5316
M_collaris_BMNH_1923.8.7.2633       Amirafra [Mirafra] collaris                  NHMUK 1923.8.7.2633
M_collaris_BMNH_1923.8.7.2634       Amirafra [Mirafra] collaris                  NHMUK 1923.8.7.2634
Mirafra_rufocinnamomea_U5657        Amirafra [Mirafra] rufocinnamomea fischeri   FMNH 484672 (= DZUG U5657)
AmmocincU4892_S121                  Ammomanes cinctura                           DZUG U4892
AmmodesU5745_S28                    Ammomanes deserti                            DZUG U5745
A_phoenicura_1949_25_4559           Ammomanes phoenicura                         NHMUK 1949.25.4559
Cal_brachydactyla                   Calandrella brachydactyla rubiginosa         DZUG U5655
M_gilletti_1982_3_10                Calendulauda [Mirafra] gilletti arorihensis  NHMUK 1982.3.10
M_gilletti_1982_3_9                 Calendulauda [Mirafra] gilletti degodiensis  NHMUK 1982.3.9
Mirafra_degodiensis                 Calendulauda [Mirafra] gilletti degodiensis  DZUG U2808
M_rufa_nigriticola_1932_8_6_253     Calendulauda [Mirafra] rufa nigriticola      NHMUK 1932.8.6.253
M_rufa_lynesi_1922_12_8_1553        Calendulauda [Mirafra] rufa rufa             NHMUK 1922.12.8.1553
CalbarlU5662_S32                    Calendulauda erythroclamys patae             DZUG U5662
CalsabotaU2344_S33                  Calendulauda sabota suffusca                 DZUG U2344
CheralboU5652_S52                   Chersomanes albofasciata                     DZUG U5652
C_duponti_margaritae_1952_51_107    Chersophilus duponti margaritae              NHMUK 1952.51.107
M_a_africana_BMNH_1927.5.26.5       Corypha [Mirafra] africana africana          NHMUK 1927.5.26.5
M_africana_athi_1951_13_2495        Corypha [Mirafra] africana athi              NHMUK 1951.13.2495
M_africana_athi_1968_48_2           Corypha [Mirafra] africana athi              NHMUK 1968.48.2
M_ashi_1982_3_4                     Corypha [Mirafra] ashi                       NHMUK 1982.3.4
MirfascU5758_S60                    Corypha [Mirafra] fasciolata                 DZUG U5758
M_hypermetra_BMNH_1912.12.23.317    Corypha [Mirafra] hypermetra hypermetra      NHMUK 1912.12.23.317
M_somalica_somalica_1919_10_6_27    Corypha [Mirafra] somalica somalica          NHMUK 1919.10.6.27
Eremalauda_dunni                    Eremalauda eremodites                        DZUG U4578
ErealpeU4610_S144                   Eremophila alpestris brandti                 DZUG U4610
Eremopterix_hova_FMNH449163         Eremopterix hova                             FMNH 449163
E_leucotis_BMNH_1952.25.36          Eremopterix leucotis                         NHMUK 1952.25.36
AmmodesU5748_S106                   Galerida cristata magna                      DZUG U5748
G_deva_1949_whi_1_8153              Galerida deva                                NHMUK 1949.whi.1.8153
G_malabarica_1949_whi_1_7738        Galerida malabarica                          NHMUK 1949.whi.1.7738
HetarchU2811_S34                    Heteromirafra archeri ("sidamoensis")        DZUG U2811
Heteromirafra_ruddi                 Heteromirafra ruddi                          DZUG U5661
Lullula_arborea_U544                Lullula arborea                              UWBM 64680
MelcalaU0583_S124                   Melanocorypha calandra psammochroa           DZUG U0583
MelyeltU0580_S35                    Melanocorypha yeltoniensis                   DZUG U0580
M_albicauda_1916_12_1_840           Mirafra albicauda                            NHMUK 1916.12.1.840
Mirafra_alb_ZMB_2000.9488           Mirafra albicauda                            ZMB 2000.9488
Mirafra_albi_ZMB_49.225             Mirafra albicauda                            ZMB 49.225 (holotype)
MirchenU5660_S37                    Mirafra cheniana                             DZUG U5660
MirchenU5759_S36                    Mirafra cheniana                             DZUG U5759
M_cordofanica_1932_8_6_261          Mirafra cordofanica                          NHMUK 1932.8.6.261
M_cordofanica_1932_8_6_262          Mirafra cordofanica                          NHMUK 1932.8.6.262
Panurus                             Panurus biarmicus                            1ET92164|SAMN13107499
Ramphocoris                         Ramphocoris clotbey                          CEFE Rhcl1 (= DZUG U5651)
Spizocorys_fringillaris_U5659       Spizocorys fringillaris                      DZUG U5659
Spizocorys_obbiensis_1908_5_28_104  Spizocorys obbiensis                         NHMUK 1908.5.28.104

Questions can be directed to the corresponding authors.
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
URL https://zenodo.org/record/7643431
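Because the VCF files use intermediary sample names, downstream analyses will usually need to relabel samples using the key above. The following is a minimal sketch that uses only the Python standard library. It rewrites the `#CHROM` header line of a gzipped VCF and appends the specimen ID to the taxon so that labels stay unique, since several taxa are represented by more than one specimen. The three-entry mapping is an excerpt from the key. The label format, file names and toy input are illustrative assumptions and are not part of the dataset.

```python
"""Relabel samples in a gzipped VCF using the VCF_name -> taxon/ID key (sketch)."""

import gzip
import os
import re
import tempfile

# Excerpt from the published key: VCF_name -> (taxon, specimen ID).
KEY = {
    "A_hamertoni_hamertoni_1982_3_71": ("Alaemon hamertoni hamertoni", "NHMUK 1982.3.71"),
    "MirchenU5660_S37": ("Mirafra cheniana", "DZUG U5660"),
    "MirchenU5759_S36": ("Mirafra cheniana", "DZUG U5759"),
}


def make_label(taxon: str, specimen_id: str) -> str:
    """Join taxon and specimen ID with underscores (label format is an assumption)."""
    return re.sub(r"\s+", "_", f"{taxon} {specimen_id}")


def relabel_vcf(src: str, dst: str, key: dict[str, tuple[str, str]]) -> None:
    """Copy src to dst, renaming sample columns that appear in key."""
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in fin:
            if line.startswith("#CHROM"):
                fields = line.rstrip("\n").split("\t")
                # Columns 0-8 are fixed (#CHROM..FORMAT); sample names follow.
                fields[9:] = [
                    make_label(*key[name]) if name in key else name
                    for name in fields[9:]
                ]
                line = "\t".join(fields) + "\n"
            fout.write(line)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "toy.vcf.gz")  # hypothetical toy input
    dst = os.path.join(workdir, "toy.relabelled.vcf.gz")
    with gzip.open(src, "wt") as f:
        f.write("##fileformat=VCFv4.2\n")
        f.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t"
                "MirchenU5660_S37\tMirchenU5759_S36\n")
        f.write("1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\n")
    relabel_vcf(src, dst, KEY)
    with gzip.open(dst, "rt") as f:
        print(f.read(), end="")
```

The output uses ordinary gzip compression rather than BGZF. It would therefore need recompressing with bgzip before it could be indexed with tabix or bcftools index.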
 
Description Kick-off event of the Omicas project: keynote speaker
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Keynote speaker at the Omicas project kick-off workshop in Cali, Colombia.
Year(s) Of Engagement Activity 2019
URL https://www.omicas.co/noticias/recomendaciones-y-agenda-del-taller-anual-omicas