EBI Metagenomics - enabling the reconstruction of microbial populations

Lead Research Organisation: Earlham Institute
Department Name: Research Faculty



Planned Impact

Metagenomics is a rapidly expanding field and the depth and breadth of data are constantly increasing. At the same time, experimental approaches for investigating different microbiomes are constantly improving, providing deeper insights into microbes occupying particular environments. The use of metagenomics is widespread in research projects associated with BBSRC strategic priorities - agriculture and food security, industrial biotechnology and bioscience for health - and the field represents the epitome of data-driven biology. This proposal will contribute to the continued support and development of the world-leading EBI metagenomics (EMG) resource. Its expansion to offer assembly (and genomic reconstruction) as a public service will make EMG unique in the world of metagenomics analysis provision. In addition, the application of assembly workflows will be taken to an unprecedented level of scale, scope and precision, allowing even deeper insights into the microbial world. This will enable the scientific community to make the leap from correlative observations to mechanistic hypothesis generation. Such deep knowledge will be of particular importance for cross-cutting themes, such as understanding antimicrobial resistance, discovery of new secondary metabolites (e.g., antimicrobial agents), host-microbe interactions (plant/animal) and microbial ecology.

The scientific community benefits from EMG in many ways. Primarily, it provides freely available services for analysis and archiving (via the ENA) of microbiome sequence data, helping democratise the research field by overcoming limitations of compute and informatics expertise. It also provides a platform for discovery of analysed metagenomics data, which already holds over 100,000 datasets (representing nearly a petabyte of processed data). These are uniformly analysed, enabling comparability and meta-analysis across projects and biomes. Archiving of sequence data with rich experimental metadata also encourages data re-use. Beyond this, EMG outputs will have applications in a wide range of academic and industrial fields, including enzyme discovery, environmental science, diagnostics and animal/human health, as assembly begins to provide a more complete picture of microbial communities.

The results of the project will be of exceptional value to the commercial sector, and the benefits will eventually feed through to the public, in the form of new antibiotics for humans and livestock, higher agricultural yields from the understanding of socio-ecological interplay (e.g., food chain microbes) and expanded discovery of novel enzymes capable of operating at extremes, such as psychrophilic enzymes for detergents, or with novel catalytic functionality (e.g., anaerobic digestion pathways in biofuel production). Industrial partnering has demonstrated that EMG data outputs have increased translation rates within this sector, and continued support for the resource will enhance this.

There are also many technical developments within this project that will have far-reaching impacts and can be applied to other analytical disciplines. For example, the use of workflows and containerisation of software for Cloud compute infrastructures will enable a new level of reproducibility and sharing.

We will ensure impact to all academic and industrial audiences by the publication of software, workflows, compute containers and peer reviewed articles. To address the skills shortages in the field of metagenomics informatics, we will also deliver training and webinars.

Metagenomics is pivotal to the notion of One Health - the collaborative effort of multiple disciplines working at national and international levels to attain optimal health for people, animals and the environment. This proposal (and EMG) encapsulates this philosophy, serving the major UK and international communities, and will deliver a cost-effective resource that will become the world's leading microbiome data service.


Description We developed a pipeline during this project to extract genomes of microbes from time series of DNA extracted directly from communities. This enables us to understand what organisms are in a community and what they are doing. We have applied this pipeline in many areas of research relevant to health and biotechnology, including studies of dietary treatments for Crohn's disease and industrial biotechnology.
Exploitation Route Other researchers can use our pipeline in their projects wherever microbial communities are studied, and the scientific conclusions from our study will be relevant in both medicine and engineering.
Sectors Agriculture, Food and Drink; Energy; Healthcare

Description We have generated a database of over 2,000 genomes from UK industrial anaerobic digestion reactors. This has enabled us to profile the community dynamics of these reactors over time and relate them to operating conditions. The results are relevant to the industrial operators and were fed back to them during a workshop in 2021.
First Year Of Impact 2020
Sector Energy
Impact Types Economic

Title Metahood 
Description Metahood is a Snakemake-based metagenomics pipeline. What the pipeline does: sample quality check/trimming; assemblies / co-assemblies; binning (Concoct/Metabat2); de novo tree construction for MAGs; Diamond annotation and profiles; output of annotated ORF graphs (derived from assembly graph), TO_FIX; strain resolution (Desman).
Type Of Technology Software 
Year Produced 2020 
Impact This pipeline has been used to generate a large collection of genomes from anaerobic digesters.
URL https://github.com/Sebastien-Raguideau/Metahood