IGC-Eddie3 high performance storage arrays
Lead Research Organisation:
University of Edinburgh
Department Name: MRC Human Genetics Unit
Abstract
We combine the latest computational and experimental technologies to investigate how our genomes work to control the function of molecules, cells and tissues in people and populations. For more than half a century our research has been dedicated to understanding human genetic disease. Today we continue to apply our clinical and scientific expertise, harnessing the power of complex data, to improve health and the lives of patients and their families. The IE3 high-performance computing cluster provides the power to analyse the rich and complex data that underpin most of our research and discoveries, for example in the mechanisms that drive cancer progression. The storage arrays requested here are needed so that data can be moved to the processors, and results received back, quickly enough to make efficient use of the available computing power. They also allow us to analyse the large quantities of data these studies require, for example hundreds or thousands of whole genome sequences.
Technical Summary
The IGC-Eddie3 (IE3) cluster underpins all the research at the MRC Human Genetics Unit where we combine the latest computational and experimental technologies to investigate how our genomes work to control the function of molecules, cells and tissues in people and populations.
All seventeen core HGU groups and most affiliate groups make use of IE3, encompassing >200 individual researchers at all career stages. For at least nine of the research groups (Ponting, Semple, Taylor, Marsh, Ewing, Khamseh, Colhoun, Vallejos, QTL) IE3 can be considered the principal research tool. It is a dedicated resource, optimised for and used exclusively by the MRC Human Genetics Unit and associated Institute of Genetics and Cancer (IGC) research groups.
IE3 is a modular system comprising compute units and high-performance storage arrays. The high-performance storage arrays provide a temporary data storage area with sufficient speed and capacity to supply data to, and receive results from, the >2,500 CPU core compute cluster. This is essential for genomics research, which depends heavily on large-scale data movement, e.g. parallel analysis of thousands of whole genome sequences or RNA-sequencing from thousands of single cells. Efficient large-scale data movement is also essential for high content image analysis, central to the 4D cellular medicine initiative at the HGU.
The current high-performance storage arrays are end-of-life and operating on an extended warranty. Individual disk failures are now becoming common, and synchronous failures risk the loss of data and protracted system downtime if the storage infrastructure needs to be repaired. The proposed replacements are model updates of the existing end-of-life hardware components, which have had an excellent track record in delivering reliable, sustained high-performance storage for the continuously running cluster over the last 5.5 years.
Organisations
- University of Edinburgh (Lead Research Organisation)
- Medical Research Council (MRC) (Collaboration)
- Francis Crick Institute (Collaboration)
- Cancer Research UK Cambridge Institute (Collaboration)
- BioDonostia Health Research Institute (Collaboration)
- EMBL European Bioinformatics Institute (EMBL - EBI) (Collaboration)
- University College London (Collaboration)
- RIKEN (Collaboration)
- German Cancer Research Center (Collaboration)
- Institute for Research in Biomedicine (IRB) (Collaboration)
Publications

Anderson C
(2022)
Strand-resolved mutagenicity of DNA damage and repair

Reijns MAM
(2022)
Signatures of TOP1 transcription-associated mutagenesis in cancer and germline.
in Nature

Young RS
(2022)
The contribution of evolutionarily volatile promoters to molecular phenotypes and human trait variation.
in Genome biology
Description | Detection of DNA embedded ribonucleotides in the mitochondrial genome |
Organisation | Biodonostia Health Research Institute |
Country | Spain |
Sector | Hospitals |
PI Contribution | Developed computational tools, co-developed original emRibo-seq methodology and performed analysis on generated data. |
Collaborator Contribution | Generation of genetic model mouse, preparation of tissues and high purity mitochondrial DNA from cells and tissues. Experimental perturbation of cultured cells. |
Impact | Publication: Moss et al, Nucleic Acids Research 2017 doi:10.1093/nar/gkx1009 |
Start Year | 2015 |
Description | Detection of DNA embedded ribonucleotides in the mitochondrial genome |
Organisation | Francis Crick Institute |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Developed computational tools, co-developed original emRibo-seq methodology and performed analysis on generated data. |
Collaborator Contribution | Generation of genetic model mouse, preparation of tissues and high purity mitochondrial DNA from cells and tissues. Experimental perturbation of cultured cells. |
Impact | Publication: Moss et al, Nucleic Acids Research 2017 doi:10.1093/nar/gkx1009 |
Start Year | 2015 |
Description | Detection of DNA embedded ribonucleotides in the mitochondrial genome |
Organisation | University College London |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Developed computational tools, co-developed original emRibo-seq methodology and performed analysis on generated data. |
Collaborator Contribution | Generation of genetic model mouse, preparation of tissues and high purity mitochondrial DNA from cells and tissues. Experimental perturbation of cultured cells. |
Impact | Publication: Moss et al, Nucleic Acids Research 2017 doi:10.1093/nar/gkx1009 |
Start Year | 2015 |
Description | Duncan Odom |
Organisation | German Cancer Research Center |
Country | Germany |
Sector | Academic/University |
PI Contribution | Co-lead of the collaboration with Duncan Odom, DKFZ. Experimental design. Genomic data analysis following burst mutagenesis. |
Collaborator Contribution | Experimental design. Primary data generation. |
Impact | Outputs are currently in review. |
Start Year | 2022 |
Description | FANTOM6 Consortium |
Organisation | RIKEN |
Department | Institute of Physical and Chemical Research (RIKEN) |
Country | Japan |
Sector | Public |
PI Contribution | Planning of large scale systematic study on lncRNA and their effect on gene regulation. Planning and initiating analysis of the resulting data. |
Collaborator Contribution | Planning, coordination and primary data generation. |
Impact | Project is ongoing - no impact yet. |
Start Year | 2015 |
Description | Liver Cancer Evolution Consortium |
Organisation | Cancer Research UK Cambridge Institute |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Computational analysis of tumor whole genome and transcriptome sequence data to profile mutation patterns. |
Collaborator Contribution | Generation, histological profiling and whole genome and transcriptome sequencing of carcinogen induced tumors in rodents. |
Impact | No published outcomes yet; less than 1 year into the project and data generation is still under way. Scientific discovery & insight. Manuscript writing, project coordination. |
Start Year | 2017 |
Description | Liver Cancer Evolution Consortium |
Organisation | EMBL European Bioinformatics Institute (EMBL - EBI) |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Computational analysis of tumor whole genome and transcriptome sequence data to profile mutation patterns. |
Collaborator Contribution | Generation, histological profiling and whole genome and transcriptome sequencing of carcinogen induced tumors in rodents. |
Impact | No published outcomes yet; less than 1 year into the project and data generation is still under way. Scientific discovery & insight. Manuscript writing, project coordination. |
Start Year | 2017 |
Description | Liver Cancer Evolution Consortium |
Organisation | German Cancer Research Center |
Country | Germany |
Sector | Academic/University |
PI Contribution | Computational analysis of tumor whole genome and transcriptome sequence data to profile mutation patterns. |
Collaborator Contribution | Generation, histological profiling and whole genome and transcriptome sequencing of carcinogen induced tumors in rodents. |
Impact | No published outcomes yet; less than 1 year into the project and data generation is still under way. Scientific discovery & insight. Manuscript writing, project coordination. |
Start Year | 2017 |
Description | Liver Cancer Evolution Consortium |
Organisation | Institute for Research in Biomedicine (IRB) |
Country | Spain |
Sector | Academic/University |
PI Contribution | Computational analysis of tumor whole genome and transcriptome sequence data to profile mutation patterns. |
Collaborator Contribution | Generation, histological profiling and whole genome and transcriptome sequencing of carcinogen induced tumors in rodents. |
Impact | No published outcomes yet; less than 1 year into the project and data generation is still under way. Scientific discovery & insight. Manuscript writing, project coordination. |
Start Year | 2017 |
Description | Sarah Aitken |
Organisation | Medical Research Council (MRC) |
Department | MRC Toxicology Unit |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Joint supervision of ECAT PhD student John Connelly. Machine learning based analysis of cancer histology whole slide images. |
Collaborator Contribution | Joint supervision of ECAT PhD student John Connelly. Qualified pathologist evaluation of cancer histology whole slide images. |
Impact | Multidisciplinary: Computer machine learning, mathematical modelling, histopathology, oncology, genomics. |
Start Year | 2021 |
Description | EdDASH |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | Outreach training using the Software Carpentry format. Multiple group members working as tutors and course organisers. |
Year(s) Of Engagement Activity | 2022,2023 |
URL | https://edcarp.github.io/Ed-DaSH/ |