Development of a high-performance data storage and sharing platform for the NERC/NBAF genomics community.

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Biological Sciences

Abstract

We propose hardware and software assets to implement a robust and secure infrastructure to store, archive, and share next-generation, high-volume genomics data at collaborating nodes of the NERC Biomolecular Analysis Facilities in Liverpool and Edinburgh. The new e-infrastructure, built on existing core systems, will support a multi-tier storage architecture for: efficient data access for local processing; secure on-site storage; high-speed file transfer to the end user, to public repositories (e.g. European Nucleotide Archive) and to cloud computing resources (e.g. Cloud Bio-Linux); and fast data upload to NBAF servers. The assets will consist of network and server-attached storage, and industry-leading Aspera file transfer technology to deliver speeds many times faster than FTP or HTTP.

Planned Impact

Environmental genomics is currently driving some of the most exciting environmental science across the NERC remit, and investment in NBAF has allowed UK science to lead the world in this field. This is evidenced by the fact that environmental genomics is now included in 3 of 5 NERC panel portfolios and that NERC projects supported from our facilities have been published in Nature, Science and PNAS. The excitement of the sequencing technology is that it allows questions to be answered now that would have been impossible to address even 5 years ago. This sequencing technology can now be applied to any environmentally important organism, not just the limited number of biomedical models. Projects within the last 12 months through our facilities have included bacteria, grasses, butterflies and winkles in terrestrial, freshwater and marine environments. These address NERC priorities including Biodiversity, Sustainable use of Natural Resources, LWEC, and Environment, Pollution and Human Health. Environmental genomics is thus particularly well suited to understanding the biodiversity of organisms within the natural environment and how these organisms respond to environmental change or pollution. The proposed assets will help to underpin the nascent Environmental 'Omics Synthesis (EOS) centre and fellowships, and the developing analytical challenges of environmental genomics integrated across multiple big data resources, including those from climatic and environmental monitoring. A high-performance data storage and sharing platform will also benefit other NBAF nodes and NERC centres. For example, our ability to store and rapidly serve data directly to any high-performance computing environment will greatly complement a separate bid from Prof Field and collaborators to support an 'EOS cloud' for bioinformatics analysis.
 
Description: This award supported the development of an informatics platform for storing and sharing high-volume DNA sequencing data for the NERC research community. The infrastructure supports a multi-tier storage architecture for efficient local data processing, secure on-site storage, high-speed file transfer to public repositories and cloud computing resources, and fast data upload to servers at the NERC Biomolecular Analysis Facility.
Exploitation Route: The award did not generate findings per se but underpins a community resource used by NERC researchers for storing and sharing data/findings.
Sectors: Agriculture, Food and Drink; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology