IGC-Eddie3 high performance storage arrays

Lead Research Organisation: University of Edinburgh
Department Name: MRC Human Genetics Unit

Abstract

We combine the latest computational and experimental technologies to investigate how our genomes work to control the function of molecules, cells and tissues in people and populations. For more than half a century our research has been dedicated to understanding human genetic disease. Today we continue to apply our clinical and scientific expertise, harnessing the power of complex data, to improve health and the lives of patients and their families. The IE3 high-performance computing cluster provides the power to analyse the rich and complex data that underpin most of our research and discoveries, for example in the mechanisms that drive cancer progression. The storage arrays requested here are needed so that data can be moved to the processors, and results returned, quickly enough to make efficient use of the available computing power and to analyse the large quantities of data these studies require, for example hundreds or thousands of whole genome sequences.

Technical Summary

The IGC-Eddie3 (IE3) cluster underpins all the research at the MRC Human Genetics Unit where we combine the latest computational and experimental technologies to investigate how our genomes work to control the function of molecules, cells and tissues in people and populations.

All seventeen core HGU groups and most affiliate groups make use of IE3, encompassing >200 individual researchers at all career stages. For at least nine of the research groups (Ponting, Semple, Taylor, Marsh, Ewing, Khamseh, Colhoun, Vallejos, QTL) IE3 can be considered the principal research tool. It is a dedicated resource, optimised for and used exclusively by the MRC Human Genetics Unit and associated Institute of Genetics and Cancer (IGC) research groups.

IE3 is a modular system comprising compute units and high-performance storage arrays. The high-performance storage arrays provide a temporary data storage area with sufficient speed and capacity to supply data to, and receive results from, the compute cluster of >2,500 CPU cores. This is essential for genomics research, which depends heavily on large-scale data movement, e.g. parallel analysis of thousands of whole genome sequences or RNA-sequencing from thousands of single cells. Efficient large-scale data movement is also essential for high content image analysis, central to the 4D cellular medicine initiative at the HGU.

The current high-performance storage arrays are end-of-life and operating on an extended warranty. Individual disk failures are now becoming common, and simultaneous failures risk data loss and protracted system downtime while the storage infrastructure is repaired. The proposed replacements are updated models of the existing end-of-life hardware, which has an excellent track record of delivering reliable, sustained high-performance storage for the continuously running cluster over the last 5.5 years.
 
Description Detection of DNA embedded ribonucleotides in the mitochondrial genome 
Organisation Biodonostia Health Research Institute
Country Spain 
Sector Hospitals 
PI Contribution Developed computational tools, co-developed original emRibo-seq methodology and performed analysis on generated data.
Collaborator Contribution Generation of genetic model mouse, preparation of tissues and high purity mitochondrial DNA from cells and tissues. Experimental perturbation of cultured cells.
Impact Publication: Moss et al, Nucleic Acids Research 2017 doi:10.1093/nar/gkx1009
Start Year 2015
 
Description Detection of DNA embedded ribonucleotides in the mitochondrial genome 
Organisation Francis Crick Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution Developed computational tools, co-developed original emRibo-seq methodology and performed analysis on generated data.
Collaborator Contribution Generation of genetic model mouse, preparation of tissues and high purity mitochondrial DNA from cells and tissues. Experimental perturbation of cultured cells.
Impact Publication: Moss et al, Nucleic Acids Research 2017 doi:10.1093/nar/gkx1009
Start Year 2015
 
Description Detection of DNA embedded ribonucleotides in the mitochondrial genome 
Organisation University College London
Country United Kingdom 
Sector Academic/University 
PI Contribution Developed computational tools, co-developed original emRibo-seq methodology and performed analysis on generated data.
Collaborator Contribution Generation of genetic model mouse, preparation of tissues and high purity mitochondrial DNA from cells and tissues. Experimental perturbation of cultured cells.
Impact Publication: Moss et al, Nucleic Acids Research 2017 doi:10.1093/nar/gkx1009
Start Year 2015
 
Description Duncan Odom 
Organisation German Cancer Research Center
Country Germany 
Sector Academic/University 
PI Contribution Co-lead of the collaboration with Duncan Odom, DKFZ. Experimental design. Genomic data analysis following burst mutagenesis.
Collaborator Contribution Experimental design. Primary data generation.
Impact Outputs are currently in review.
Start Year 2022
 
Description FANTOM6 Consortium 
Organisation RIKEN
Department Institute of Physical and Chemical Research (RIKEN)
Country Japan 
Sector Public 
PI Contribution Planning of a large-scale systematic study of lncRNAs and their effect on gene regulation. Planning and initiating analysis of the resulting data.
Collaborator Contribution Planning, coordination and primary data generation.
Impact Project is ongoing - no impact yet.
Start Year 2015
 
Description Liver Cancer Evolution Consortium 
Organisation Cancer Research UK Cambridge Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution Computational analysis of tumor whole genome and transcriptome sequence data to profile mutation patterns.
Collaborator Contribution Generation, histological profiling and whole genome and transcriptome sequencing of carcinogen induced tumors in rodents.
Impact No published outcomes yet, less than 1 year into project and data generation still under way. Scientific discovery & insight. Manuscript writing, project coordination.
Start Year 2017
 
Description Liver Cancer Evolution Consortium 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution Computational analysis of tumor whole genome and transcriptome sequence data to profile mutation patterns.
Collaborator Contribution Generation, histological profiling and whole genome and transcriptome sequencing of carcinogen induced tumors in rodents.
Impact No published outcomes yet, less than 1 year into project and data generation still under way. Scientific discovery & insight. Manuscript writing, project coordination.
Start Year 2017
 
Description Liver Cancer Evolution Consortium 
Organisation German Cancer Research Center
Country Germany 
Sector Academic/University 
PI Contribution Computational analysis of tumor whole genome and transcriptome sequence data to profile mutation patterns.
Collaborator Contribution Generation, histological profiling and whole genome and transcriptome sequencing of carcinogen induced tumors in rodents.
Impact No published outcomes yet, less than 1 year into project and data generation still under way. Scientific discovery & insight. Manuscript writing, project coordination.
Start Year 2017
 
Description Liver Cancer Evolution Consortium 
Organisation Institute for Research in Biomedicine (IRB)
Country Spain 
Sector Academic/University 
PI Contribution Computational analysis of tumor whole genome and transcriptome sequence data to profile mutation patterns.
Collaborator Contribution Generation, histological profiling and whole genome and transcriptome sequencing of carcinogen induced tumors in rodents.
Impact No published outcomes yet, less than 1 year into project and data generation still under way. Scientific discovery & insight. Manuscript writing, project coordination.
Start Year 2017
 
Description Sarah Aitken 
Organisation Medical Research Council (MRC)
Department MRC Toxicology Unit
Country United Kingdom 
Sector Academic/University 
PI Contribution Joint supervision of ECAT PhD student John Connelly. Machine learning based analysis of cancer histology whole slide images.
Collaborator Contribution Joint supervision of ECAT PhD student John Connelly. Qualified pathologist evaluation of cancer histology whole slide images.
Impact Multidisciplinary: Computer machine learning, mathematical modelling, histopathology, oncology, genomics.
Start Year 2021
 
Description Ed-DaSH
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Outreach training using the Software Carpentry format. Multiple group members working as tutors and course organisers.
Year(s) Of Engagement Activity 2022,2023
URL https://edcarp.github.io/Ed-DaSH/