EBI Metagenomics - enabling the reconstruction of microbial populations

Lead Research Organisation: Newcastle University
Department Name: School of Engineering

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

Metagenomics is a widely used approach to investigate the composition and function of microbial communities. With the development of modern sequencing platforms, the bottleneck is rarely data generation but rather its analysis. Even when researchers have access to large-scale computing facilities, two metagenomics datasets are rarely analysed in the same way, and the workflows used to produce results are virtually impossible to reconstruct. The EBI metagenomics (EMG) resource addresses these problems by providing a freely available service for the analysis and archiving (via the European Nucleotide Archive (ENA)) of metagenomics data. It also provides a platform for the discovery of analysed metagenomics datasets. Because these are uniformly analysed, it enables comparability and meta-analysis across projects and biomes. Unlike any other public analysis service, EMG has an archiving remit: the capture of rich contextual metadata associated with the sequencing data ensures maximal data longevity and reuse. Beyond this, EMG is also a data generator, in terms of functional and taxonomic annotations, and has already analysed a world-leading 100,000 publicly available datasets.

To date, EMG has focused entirely on annotating raw reads. While this provides a comprehensive analysis of all sampled micro-organisms, the disconnected and fragmentary nature of the data has limitations, e.g., the lack of full-length peptides. To overcome this, we will expand the service to include assembly of metagenomics data. We will build reproducible workflows (deployable within multiple cloud environments) and develop tools to reveal near-complete genomes for the more abundant organisms found within a sample, or for those that occur commonly across samples. ENA will be extended to allow more comprehensive capture of these assembly data. We will extend EMG to include a catalogue of metagenome-assembled genomes, offering insights into tens of thousands of novel microbial genomes.

Planned Impact

Metagenomics is a rapidly expanding field, and the depth and breadth of data are constantly increasing. At the same time, experimental approaches for investigating different microbiomes are constantly improving, providing deeper insights into the microbes occupying particular environments. Metagenomics is widely used in research projects associated with BBSRC strategic priorities (agriculture and food security, industrial biotechnology, and bioscience for health), and the field represents the epitome of data-driven biology. This proposal will contribute to the continued support and development of the world-leading EBI metagenomics (EMG) resource. Moreover, its expansion to offer assembly (and genomic reconstruction) as a public service will make EMG unique in the world of metagenomics analysis provision. In addition, assembly workflows will be applied at an unprecedented level of scale, scope and precision, allowing even deeper insights into the microbial world. This will enable the scientific community to make the leap from correlative observations to mechanistic hypothesis generation. Such deep knowledge will be of particular importance for cross-cutting themes, such as understanding antimicrobial resistance, discovery of new secondary metabolites (e.g., antimicrobial agents), host-microbe interactions (plant/animal) and microbial ecology.

The scientific community benefits from EMG in many ways. Primarily, it provides freely available services for the analysis and archiving (via the ENA) of microbiome sequence data, helping to democratise the research field by overcoming limitations in compute resources and informatics expertise. It also provides a platform for the discovery of analysed metagenomics data, having already amassed over 100,000 datasets (representing nearly a petabyte of processed data). These are uniformly analysed, enabling comparability and meta-analysis across projects and biomes. Archiving sequence data with rich experimental metadata also encourages data reuse. Beyond this, EMG outputs will have applications in a wide range of academic and industrial fields, including enzyme discovery, environmental science, diagnostics and animal/human health, as assembly begins to provide a more complete picture of microbial communities.

The results of the project will be of exceptional value to the commercial sector, and the benefits will eventually feed through to the public in the form of new antibiotics for humans and livestock, higher agricultural yields through a better understanding of socio-ecological interplay (e.g., food chain microbes), and expanded discovery of novel enzymes, either capable of operating under extreme conditions, such as psychrophilic enzymes for detergents, or with novel catalytic functionality (e.g., anaerobic digestion pathways in biofuel production). Industrial partnerships have demonstrated that EMG data outputs increase translation rates within this sector, and continued support for the resource will enhance this.

There are also many technical developments within this project that will have far-reaching impacts and can be applied to other analytical disciplines. For example, the use of workflows and the containerisation of software for cloud compute infrastructures will enable a new level of reproducibility and sharing.

We will ensure impact for academic and industrial audiences alike through the publication of software, workflows, compute containers and peer-reviewed articles. To address skills shortages in metagenomics informatics, we will also deliver training and webinars.

Metagenomics is pivotal to the notion of One Health: the collaborative effort of multiple disciplines working at national and international levels to attain optimal health for people, animals and the environment. This proposal (and EMG) encapsulates this philosophy, serving the major UK and international communities, and will deliver a cost-effective resource that will become the world's leading microbiome data service.

Publications

Allen B (2023) Diversity and metabolic energy in bacteria. FEMS Microbiology Letters.

 
Title: Statistical Tools in Support of the EBI online facility
Description: We have developed simple tools to improve the analysis of metagenomic data submitted to the EBI portal.
Type of material: Improvements to research infrastructure
Year produced: 2020
Provided to others? Yes
Impact: Too early to say