Development of a high-performance data storage and sharing platform for the NERC/NBAF genomics community.
Lead Research Organisation:
University of Edinburgh
Department Name: School of Biological Sciences
Abstract
We propose hardware and software assets to implement a robust and secure infrastructure to store, archive, and share next-generation, high-volume genomics data at the collaborating nodes of the NERC Biomolecular Analysis Facilities in Liverpool and Edinburgh. The new e-infrastructure, built on existing core systems, will support a multi-tier storage architecture for: efficient data access for local processing; secure on-site storage; high-speed file transfer to the end user, to public repositories (e.g. European Nucleotide Archive) and to cloud computing resources (e.g. Cloud Bio-Linux); and fast data upload to NBAF servers. The assets will consist of network and server-attached storage, and industry-leading Aspera file transfer technology to deliver speeds many times faster than FTP or HTTP.
Planned Impact
Environmental genomics is currently driving some of the most exciting environmental science across the NERC remit, and investment in NBAF has allowed UK science to lead the world in this field. This is evidenced by the inclusion of environmental genomics in 3 of 5 NERC panel portfolios and by the publication of NERC projects supported by our facilities in Nature, Science and PNAS. Sequencing technology now allows questions to be answered that would have been impossible even 5 years ago, and it can be applied to any environmentally important organism, not just the limited number of biomedical models. Projects passing through our facilities within the last 12 months have included bacteria, grasses, butterflies and winkles from terrestrial, freshwater and marine environments. These address NERC priorities including Biodiversity, Sustainable Use of Natural Resources, LWEC, and Environment, Pollution and Human Health. Environmental genomics is thus particularly well suited to understanding the biodiversity of organisms within the natural environment and how these organisms respond to environmental change or pollution. The proposed assets will help to underpin the nascent Environmental 'Omics Synthesis (EOS) centre and fellowships, and to meet the developing analytical challenges of integrating environmental genomics across multiple big data resources, including those from climatic and environmental monitoring. A high-performance data storage and sharing platform will also benefit other NBAF nodes and NERC centres. For example, our ability to store data and serve it rapidly to any high-performance computing environment will greatly complement a separate bid from Prof Field and collaborators to support an 'EOS cloud' for bioinformatics analysis.
Organisations
People
Karim Gharbi (Principal Investigator)
Publications
Clucas GV
(2016)
Dispersal in the sub-Antarctic: king penguins show remarkably little population genetic differentiation across their range.
in BMC evolutionary biology
Duvaux L
(2015)
Dynamics of copy number variation in host races of the pea aphid.
in Molecular biology and evolution
Eyres I
(2016)
Differential gene expression according to race and host plant in the pea aphid.
in Molecular ecology
Eyres I
(2017)
Targeted re-sequencing confirms the importance of chemosensory genes in aphid host race differentiation.
in Molecular ecology
Ford AG
(2015)
High levels of interspecific gene flow in an endemic cichlid fish adaptive radiation from an extreme lake environment.
in Molecular ecology
Gutierrez AP
(2017)
Development of a Medium Density Combined-Species SNP Array for Pacific and European Oysters (Crassostrea gigas and Ostrea edulis).
in G3 (Bethesda, Md.)
Jordan CY
(2018)
Maintaining their genetic distance: Little evidence for introgression between widely hybridizing species of Geum with contrasting mating systems.
in Molecular ecology
Leblois R
(2017)
Deciphering the demographic history of allochronic differentiation in the pine processionary moth Thaumetopoea pityocampa.
in Molecular ecology
Pascoal S
(2014)
Rapid convergent evolution in wild crickets.
in Current biology : CB
Qiu S
(2016)
RAD mapping reveals an evolving, polymorphic and fuzzy boundary of a plant pseudoautosomal region.
in Molecular ecology
Yarra T
(2016)
Characterization of the mantle transcriptome in bivalves: Pecten maximus, Mytilus edulis and Crassostrea gigas.
in Marine genomics
Younger JL
(2017)
The challenges of detecting subtle population structure and its importance for the conservation of emperor penguins.
in Molecular ecology
Description: This award supported the development of an informatics platform for storing and sharing high-volume DNA sequencing data for the NERC research community. The infrastructure supports a multi-tier storage architecture for efficient local data processing, secure on-site storage, high-speed file transfer to public repositories and cloud computing resources, and fast data upload to servers at the NERC Biomolecular Analysis Facility.
Exploitation Route: The award did not generate findings per se, but it underpins a community resource that NERC researchers use to store and share data and findings.
Sectors: Agriculture, Food and Drink; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology