Enriching MGnify Genomes to capture the full spectrum of the microbiota and bolster taxonomic classifications

Lead Research Organisation: University of Edinburgh
Department Name: The Roslin Institute

Technical Summary

Three major new areas of activity are proposed to enrich MGnify and meet the evolving demands of microbiome research: (i) improve the MGnify bacterial genomes and enable their incorporation into the Genome Taxonomy Database (GTDB); (ii) develop pipelines to facilitate the recovery of eukaryotic genomes; (iii) identify and annotate viruses found in MGnify assemblies to enrich MGnify Genomes. This proposal also describes significant updates to the MGnify analysis pipelines and the infrastructure underpinning the resource. To achieve this, we will undertake the following key developments:
1. Incorporate the latest biological information by updating the reference databases used in the MGnify analysis pipelines and the associated FAIR workflow descriptions.
2. Develop and apply an improved profile HMM library for the detection of CAZymes, using metagenomic sequences to improve its sensitivity. The library will be integrated into an annotation system that will also help to detect polysaccharide utilisation loci.
3. Extend client-side validation tools and interfaces to enable easier submission of metagenomics datasets, including MAGs, and enrich internal access and control mechanisms between ENA and MGnify.
4. Assemble a pipeline that extends beyond the standard single-copy marker genes to facilitate the systematic detection of contaminating contigs within MAGs, producing a refined set of prokaryotic MAGs.
5. Co-develop a cloud-based framework to generate the non-redundant set of MGnify MAGs and the GTDB taxonomy, and extend GTDB to incorporate MAGs, thus accurately reflecting the taxonomic diversity of prokaryotes.
6. Initiate a collection of eukaryotic MAGs by developing a novel binning and refinement workflow.
7. Systematically detect and cluster viral sequences, enriching them with taxonomy, functional annotations and environmental metadata to produce a viral catalogue. Use computational methods to link phages to bacterial hosts, thereby connecting catalogues.
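
As an illustration of the idea behind item 4, the sketch below flags contigs in a bin whose GC content deviates strongly from the rest of the bin. This is only a minimal example of a signal that does not depend on single-copy marker genes; it is not the pipeline proposed here. The function names, the 0.10 deviation threshold, and the toy sequences are all assumptions made for illustration.

```python
from statistics import median


def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T); 0.0 if none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def flag_outlier_contigs(contigs: dict[str, str], max_dev: float = 0.10) -> list[str]:
    """Return names of contigs whose GC content differs from the bin median
    by more than max_dev (hypothetical threshold, chosen for illustration)."""
    gc = {name: gc_content(seq) for name, seq in contigs.items()}
    centre = median(gc.values())
    return sorted(name for name, value in gc.items() if abs(value - centre) > max_dev)


# Toy bin: two contigs at 50% GC and one at 100% GC (invented sequences).
example_bin = {
    "contig_1": "ATGCGCATTAGC" * 10,
    "contig_2": "ATGCCGTAAGCT" * 10,
    "contig_3": "GCGCGGCCGCGC" * 10,
}
flagged = flag_outlier_contigs(example_bin)
print(flagged)
```

In practice such a check would probably be combined with other evidence, for example coverage across samples or taxonomic assignment of individual contigs, since GC content alone cannot separate organisms with similar base composition.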

Publications

Title: watson_and_mattock_v1.tar.gz
Description: Data in support of "A comparison of single-coverage and multi-coverage metagenomic binning reveals extensive hidden contamination", Mick Watson and Jennifer Mattock, 2022
Type Of Material: Database/Collection of data
Year Produced: 2022
Provided To Others?: Yes
URL: https://figshare.com/articles/dataset/watson_and_mattock_v1_tar_gz/19733509