Enriching MGnify Genomes to capture the full spectrum of the microbiota and bolster taxonomic classifications

Lead Research Organisation: University of Edinburgh

Department Name: The Roslin Institute

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

Three major new areas of activity are proposed to enrich MGnify and meet the evolving demands of microbiome research: (i) improve the MGnify bacterial genomes and enable their incorporation into the Genome Taxonomy Database (GTDB); (ii) develop pipelines to facilitate the recovery of eukaryotic genomes; (iii) identify and annotate viruses found in MGnify assemblies to enrich MGnify genomes. This proposal also describes significant updates to the MGnify analysis pipelines and the infrastructure underpinning the resource. To achieve this, we will undertake the following key developments:
1. Incorporate the latest biological information by updating the reference databases used in the MGnify analysis pipelines and the associated FAIR workflow descriptions.
2. Develop and apply an improved profile HMM library for the detection of CAZymes, using metagenomic sequences to improve its sensitivity. The profiles will be integrated into an annotation system that will also help to detect polysaccharide utilisation loci.
3. Extend client-side validation tools and interfaces to enable easier submission of metagenomics datasets, including MAGs, and enrich internal access and control mechanisms between ENA and MGnify.
4. Assemble a pipeline that extends beyond the standard single-copy marker genes to facilitate the systematic detection of contaminating contigs within MAGs, to produce a refined set of prokaryotic MAGs.
5. Co-develop a cloud-based framework to generate the non-redundant set of MGnify MAGs and the GTDB taxonomy, and extend GTDB to incorporate MAGs, thus accurately reflecting the taxonomic diversity of prokaryotes.
6. Initiate a collection of eukaryotic MAGs by developing a novel binning and refinement workflow.
7. Systematically detect and cluster viral sequences, enriching them with taxonomy, functional annotations and environmental metadata to produce a viral catalogue. Use computational methods to link phages to bacterial hosts, thereby connecting the viral and prokaryotic catalogues.
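
As a purely illustrative sketch of the idea behind item 4 (finding contaminating contigs using signals other than single-copy marker genes), the example below flags contigs whose GC content departs sharply from the rest of a bin. It is not the project's pipeline, which this record does not describe. The function names, the toy sequences and the 0.10 deviation threshold are all assumptions made for the example, and a realistic workflow would likely combine several signals, such as coverage and taxonomic assignment.

```python
from statistics import median


def gc_content(seq: str) -> float:
    """Return the G+C fraction among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def flag_gc_outliers(contigs: dict[str, str], max_dev: float = 0.10) -> list[str]:
    """Return names of contigs whose GC fraction differs from the
    bin median by more than max_dev (hypothetical threshold)."""
    if not contigs:
        return []
    gc = {name: gc_content(seq) for name, seq in contigs.items()}
    centre = median(gc.values())
    return sorted(name for name, value in gc.items() if abs(value - centre) > max_dev)


# Toy bin: two contigs at 50% GC and one at 100% GC (made-up sequences).
toy_bin = {
    "contig_1": "ATGCATGCATGC",
    "contig_2": "ATGGCATCGATC",
    "contig_3": "GGGCCCGGGCCC",
}
flagged = flag_gc_outliers(toy_bin)
print(flagged)
```

The median is used as the reference point so that a single aberrant contig does not shift the baseline it is compared against.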

Funded Value:

£219,202

Funded Period:

Dec 21 - Dec 24

Funder:

BBSRC

Project Status:

Active

Project Category:

Research Grant

Project Reference:

BB/V018450/1

Principal Investigator:

Laura Glendinning

Michael Watson

Research Subject:

Microbial sciences (64%)

Omic sciences & technologies (32%)

Research Topic:

Environmental Genomics (32%)

Environmental Microbiology (32%)

Microbiology (32%)

Organisations

University of Edinburgh (Lead Research Organisation)

People	ORCID iD
Laura Glendinning (Principal Investigator)	http://orcid.org/0000-0003-4789-6644
Michael Watson (Principal Investigator)

Publications


Mattock J (2023) KOunt: a reproducible KEGG orthologue abundance workflow. in Bioinformatics (Oxford, England)

Mattock J (2023) A comparison of single-coverage and multi-coverage metagenomic binning reveals extensive hidden contamination. in Nature Methods

Research Databases and Models


Title	watson_and_mattock_v1.tar.gz
Description	Data in support of "A comparison of single-coverage and multi-coverage metagenomic binning reveals extensive hidden contamination", Mick Watson and Jennifer Mattock, 2022
Type Of Material	Database/Collection of data
Year Produced	2022
Provided To Others?	Yes
URL	https://figshare.com/articles/dataset/watson_and_mattock_v1_tar_gz/19733509
