EBI Metagenomics - enabling the reconstruction of microbial populations

Lead Research Organisation: University of Warwick

Department Name: Warwick Medical School

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

Metagenomics is a widely used approach to investigate the composition and function of microbial communities. With the development of modern sequencing platforms, the bottleneck is rarely data generation but rather data analysis. Even when researchers have access to large-scale computing facilities, two metagenomics datasets are rarely analysed in the same way and the workflows used to produce results are virtually impossible to reconstruct. The EBI metagenomics (EMG) resource addresses these problems by providing a freely available service for the analysis and archiving (via the European Nucleotide Archive, ENA) of metagenomics data. It also provides a platform for the discovery of analysed metagenomics datasets. As these are uniformly analysed, it enables comparability and meta-analysis across projects and biomes. Unlike any other public analysis service, EMG has an archiving remit. The capture of rich, contextual metadata associated with the sequencing data ensures maximal data longevity and reuse. Over and above this, EMG is also a data generator, in terms of functional and taxonomic annotations, and has already analysed a world-leading 100,000 publicly available datasets.

To date, EMG has focused entirely on annotating raw reads. While this provides a comprehensive analysis of all sampled micro-organisms, the disconnected and fragmentary nature of the data has some limitations, e.g. lack of full-length peptides. To overcome this, we will expand the service to include assembly of metagenomics data. We will build reproducible workflows (deployable within multiple cloud environments) and develop tools to reveal near-complete genome maps for the more abundant organisms found within a sample, or that occur commonly across samples. ENA will be extended to allow more comprehensive capture of this assembly data. We will extend EMG to include a catalogue of metagenome-assembled genomes, offering insights into tens of thousands of novel microbial genomes.

Planned Impact

Metagenomics is a rapidly expanding field and the depth and breadth of data are constantly increasing. At the same time, experimental approaches for investigating different microbiomes are constantly improving, providing deeper insights into microbes occupying particular environments. The use of metagenomics is widespread in research projects associated with BBSRC strategic priorities - agriculture and food security, industrial biotechnology and bioscience for health - and the field represents the epitome of data-driven biology. This proposal will contribute to the continued support and development of the world-leading EBI metagenomics (EMG) resource. Its expansion to offer assembly (and genomic reconstruction) as a public service will make EMG unique in the world of metagenomics analysis provision. Moreover, the application of assembly workflows will be taken to an unprecedented level of scale, scope and precision, allowing even deeper insights into the microbial world. This will enable the scientific community to make the leap from correlative observations to mechanistic hypothesis generation. Such deep knowledge will be of particular importance for cross-cutting themes, such as understanding antimicrobial resistance, discovery of new secondary metabolites (e.g. antimicrobial agents), host-microbe interactions (plant/animal) and microbial ecology.

The scientific community benefits from EMG in many ways. Primarily it provides freely available services for analysis and archiving (via the ENA) of microbiome sequence data, helping democratise the research field by overcoming limitations of compute and informatics expertise. It also provides a platform for discovery of analysed metagenomics data, already amassing over 100,000 datasets (representing nearly a petabyte of processed data). These are uniformly analysed, enabling comparability and meta-analysis across projects and biomes. Archiving of sequence data with rich experimental metadata also encourages data re-use. Beyond this, EMG outputs will have applications in a wide range of academic and industrial fields, including enzyme discovery, environmental science, diagnostics and animal/human health, as assembly begins to provide a more complete picture of microbial communities.

The results of the project will be of exceptional value to the commercial sector, and the benefits will eventually feed through to the public, in the form of new antibiotics for humans and livestock, higher agricultural yields from the understanding of socio-ecological interplay (e.g., food chain microbes) and expanded discovery of novel enzymes capable of operating at extremes, such as psychrophilic enzymes for detergents, or with novel catalytic functionality (e.g., anaerobic digestion pathways in biofuel production). Industrial partnering has demonstrated that EMG data outputs have increased translation rates within this sector, and continued support for the resource will enhance this.

There are also many technical developments within this project that will have far reaching impacts and can be applied to other analytical disciplines. For example, the use of workflows and containerisation of software for Cloud compute infrastructures will enable a new level of reproducibility and sharing.

We will ensure impact to all academic and industrial audiences by the publication of software, workflows, compute containers and peer reviewed articles. To address the skills shortages in the field of metagenomics informatics, we will also deliver training and webinars.

Metagenomics is pivotal to the notion of One Health - the collaborative effort of multiple disciplines working at national and international levels to attain optimal health for people, animals and the environment. This proposal (and EMG) encapsulates this philosophy, serving the major UK and international communities, and will deliver a cost-effective resource that will become the world's leading microbiome data service.

Funded Value:

£192,609

Funded Period:

Sep 18 - Sep 20

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/R015171/1

Principal Investigator:

Christopher Quince

Research Subject:

Bioengineering (18%)

Microbial sciences (9%)

Omic sciences & technologies (27%)

Tools, technologies & methods (45%)

Research Topic:

Biochemical engineering (18%)

Bioinformatics (27%)

Environmental Informatics (18%)

Genomics (27%)

Responses to environment (9%)

Organisations

University of Warwick (Lead Research Organisation)

People

Christopher Quince (Principal Investigator)

Publications

Delmont TO (2018) Nitrogen-fixing populations of Planctomycetes and Proteobacteria are abundant in surface ocean metagenomes. in Nature microbiology

Lee K (2020) Mobile resistome of human gut and pathogen drives anthropogenic bloom of antibiotic resistance. in Microbiome

Pasolli E (2019) Extensive Unexplored Human Microbiome Diversity Revealed by Over 150,000 Genomes from Metagenomes Spanning Age, Geography, and Lifestyle. in Cell

Quince C (2021) STRONG: metagenomics strain resolution on assembly graphs. in Genome biology

Sheridan PO (2020) Gene duplication drives genome expansion in a major lineage of Thaumarchaeota. in Nature communications

Yang C (2019) Recent mixing of Vibrio parahaemolyticus populations. in The ISME journal

Key Findings

Description	We have generated over 2,000 genomes from metagenome samples taken from anaerobic digester reactors. This has expanded our understanding of the diversity of organisms associated with this green biotechnology, potentially leading to more efficient biogas production in the future.
Exploitation Route	The organisms that we have found could form the basis of experimental investigations, e.g. isolation and culturing for their biotech applications.
Sectors	Environment; Manufacturing, including Industrial Biotechnology

Impact Summary

Description	We have started feeding back the results of the anaerobic digester community analysis to the operators. On 24 February 2021 we held a virtual meeting with five industrial AD operators involved in the study, where we explained our results on community structure and discussed their relevance to reactor operation.
First Year Of Impact	2020
Sector	Manufacturing, including Industrial Biotechnology
Impact Types	Economic

Software and Technical Products

Title	Metahood
Description	Metahood is a Snakemake-based metagenomics pipeline. What the pipeline does: sample quality check and trimming; assemblies and co-assemblies; binning (Concoct/Metabat2); de novo tree construction for MAGs; DIAMOND annotation and profiles; output of annotated ORF graphs (derived from the assembly graph); TO_FIX: strain resolution (Desman).
Type Of Technology	Software
Year Produced	2020
Impact	This pipeline has been used to generate large collections of genomes from anaerobic digesters.
URL	https://github.com/Sebastien-Raguideau/Metahood
