Enriching MGnify Genomes to capture the full spectrum of the microbiota and bolster taxonomic classifications
Lead Research Organisation:
University of Edinburgh
Department Name: The Roslin Institute
Technical Summary
Three major new areas of activity are proposed to enrich MGnify and meet the evolving demands of microbiome research: (i) improve the MGnify bacterial genomes and enable their incorporation into the Genome Taxonomy Database (GTDB); (ii) develop pipelines to facilitate the recovery of eukaryotic genomes; (iii) identify and annotate viruses found in MGnify assemblies to enrich MGnify genomes. This proposal also describes significant updates to the MGnify analysis pipelines and the infrastructure underpinning the resource. To achieve this, we will undertake the following key developments:
1. Incorporate the latest biological information by updating the reference databases used in the MGnify analysis pipelines and the associated FAIR workflow descriptions.
2. Develop and apply an improved library of profile HMMs for detecting carbohydrate-active enzymes (CAZymes), using metagenomic sequences to increase the sensitivity of the profiles. The library will be integrated into an annotation system that will also help to detect polysaccharide utilisation loci.
3. Extend client-side validation tools and interfaces to enable easier submission of metagenomics datasets, including MAGs, and enrich internal access and control mechanisms between ENA and MGnify.
4. Assemble a pipeline that extends beyond the standard single-copy marker genes to facilitate the systematic detection of contaminating contigs within MAGs, to produce a refined set of prokaryotic MAGs.
5. Co-develop a cloud-based framework to generate the non-redundant set of MGnify MAGs and the GTDB taxonomy, and extend GTDB to incorporate MAGs, thus accurately reflecting the taxonomic diversity of prokaryotes.
6. Initiate a collection of eukaryotic MAGs by developing a novel binning and refinement workflow.
7. Systematically detect and cluster viral sequences, enriching them with taxonomy, functional annotations and environmental metadata to produce a viral catalogue. Use computational methods to link phages to bacterial hosts, thereby connecting catalogues.
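As an illustration of what generating a non-redundant set of MAGs (development 5) can involve, the following is a minimal sketch of greedy dereplication. It is not the MGnify or GTDB implementation. The function name `dereplicate`, the toy genome names, the quality scores and the pairwise average nucleotide identity (ANI) values are all hypothetical, and the 95% ANI cut-off is an assumed default chosen only for the example.

```python
from typing import Dict, FrozenSet, List


def dereplicate(
    quality: Dict[str, float],
    ani: Dict[FrozenSet[str], float],
    threshold: float = 95.0,  # assumed cut-off, not taken from the proposal
) -> List[str]:
    """Greedily pick representative genomes.

    Genomes are visited from highest to lowest quality score. A genome is
    kept only if its ANI to every representative chosen so far is below
    the threshold; otherwise it is treated as redundant.
    """
    representatives: List[str] = []
    # Sort by descending quality; break ties by name so the result is stable.
    for genome in sorted(quality, key=lambda g: (-quality[g], g)):
        is_redundant = any(
            # Missing pairs are treated as 0.0, i.e. "not similar".
            ani.get(frozenset((genome, rep)), 0.0) >= threshold
            for rep in representatives
        )
        if not is_redundant:
            representatives.append(genome)
    return representatives


# Hypothetical toy input: three MAGs with made-up quality and ANI values.
quality = {"MAG_A": 92.0, "MAG_B": 85.0, "MAG_C": 78.0}
ani = {
    frozenset(("MAG_A", "MAG_B")): 98.7,
    frozenset(("MAG_A", "MAG_C")): 81.2,
    frozenset(("MAG_B", "MAG_C")): 80.5,
}

print(dereplicate(quality, ani))  # ['MAG_A', 'MAG_C']
```

In this toy input, MAG_B is dropped because it is above the threshold relative to the higher-quality MAG_A, while MAG_C is retained as a separate representative. A production workflow would compute ANI with a dedicated tool and would need a strategy that scales beyond all-against-all comparison.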
Publications
Mattock J (2023) KOunt: a reproducible KEGG orthologue abundance workflow. Bioinformatics (Oxford, England)
Mattock J (2023) A comparison of single-coverage and multi-coverage metagenomic binning reveals extensive hidden contamination. Nature Methods
Title | watson_and_mattock_v1.tar.gz |
Description | Data in support of "A comparison of single-coverage and multi-coverage metagenomic binning reveals extensive hidden contamination", Mick Watson and Jennifer Mattock, 2022 |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
URL | https://figshare.com/articles/dataset/watson_and_mattock_v1_tar_gz/19733509 |