Resolving fine scale variation from microbial metagenomes
Lead Research Organisation:
University of Warwick
Department Name: Warwick Medical School
Abstract
The direct sequencing of DNA from the environment, metagenomics, has transformed microbial ecology by providing access to the genomic content of hitherto unculturable organisms. Statistical challenges still remain in interpreting these data, however. The most important is probably the high-resolution assembly of microbial strains directly from short-read metagenome sequences. This will form the terminal aim of this studentship. We will build towards this goal by first creating suitable in silico data sets on which to test approaches, and then tackling the simpler problem of resolving variants that are present in amplicon data. Finally, we will integrate existing approaches that use co-occurrence across samples to resolve variants on contigs directly into metagenome assemblers. To achieve these aims we will exploit a variety of statistical methods, including Bayesian non-parametrics and variational approximations.
People | ORCID iD |
Christopher Quince (Primary Supervisor) | |
Leonidas Souliotis (Student) | |
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/N509401/1 | | | 30/09/2015 | 25/02/2022 | |
1790490 | Studentship | EP/N509401/1 | 18/09/2016 | 16/09/2020 | Leonidas Souliotis |
Description | We developed a method to distinguish different genomes in a complex metagenomic sample at the species level. This method seems to outperform the gold standard when fewer than 100 genomes are present, and remains competitive in larger communities. |
Exploitation Route | More accurate metagenomic contig binners can serve as a stepping stone to deeper analyses (e.g. strain detection) or to the detection of new species, as they do not depend on reference-based methods. |
Sectors | Chemicals; Healthcare; Pharmaceuticals and Medical Biotechnology |
Description | As this research is co-funded by Unilever, we work with confidential data provided by Unilever, and Unilever can use the results for its own purposes. |
First Year Of Impact | 2017 |
Sector | Healthcare |
Title | Dirichlet process mixtures for binning |
Description | CONCOCT is a tool for automated binning of metagenomic contigs that uses Gaussian mixture models to model the different members of the microbial community. Its model is fitted by a method called automatic relevance determination. We propose a different model, the Dirichlet process Gaussian mixture model, which treats the number of communities differently from CONCOCT. We fit this model using both a Gibbs sampler and standard variational inference. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2017 |
Provided To Others? | Yes |
Impact | When testing our proposed method against CONCOCT (the gold standard among the many available metagenomic contig binners), we see that it performs better than CONCOCT in communities containing fewer than 100 genomes. In larger communities, our proposed method is still outperformed by CONCOCT. |
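
To illustrate how a Dirichlet process Gaussian mixture can leave the number of clusters open rather than fixing it in advance, the following is a minimal sketch of a collapsed Gibbs sampler on one-dimensional toy data. It is not the project's implementation and not CONCOCT. It assumes a known within-cluster variance (`sigma2`) and a normal prior on each cluster mean (`mu0`, `tau2`). The function name `dp_gmm_gibbs`, its parameters, and the toy data are all hypothetical choices made for illustration; real contig binning would work on multivariate coverage and composition features.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def dp_gmm_gibbs(data, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=100.0,
                 sweeps=20, seed=0):
    """Collapsed Gibbs sampler for a 1-D Dirichlet process Gaussian mixture.

    Assumes (for illustration only) a known within-cluster variance sigma2
    and a N(mu0, tau2) prior on each cluster mean. Returns one cluster
    label per data point; the number of clusters is not fixed in advance.
    """
    rng = random.Random(seed)
    labels = [0] * len(data)      # start with every point in one cluster
    counts = {0: len(data)}       # cluster id -> number of points
    sums = {0: sum(data)}         # cluster id -> sum of its points
    next_id = 1

    for _ in range(sweeps):
        for i, x in enumerate(data):
            # Remove point i from its current cluster.
            k = labels[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]

            # Existing clusters: size times posterior predictive density.
            ids, weights = [], []
            for c, n in counts.items():
                prec = 1.0 / tau2 + n / sigma2
                post_mean = (mu0 / tau2 + sums[c] / sigma2) / prec
                ids.append(c)
                weights.append(n * normal_pdf(x, post_mean, 1.0 / prec + sigma2))

            # New cluster: concentration alpha times prior predictive density.
            ids.append(next_id)
            weights.append(alpha * normal_pdf(x, mu0, tau2 + sigma2))

            # Sample the new assignment in proportion to the weights.
            k = rng.choices(ids, weights=weights)[0]
            if k == next_id:
                counts[k], sums[k] = 0, 0.0
                next_id += 1
            labels[i] = k
            counts[k] += 1
            sums[k] += x
    return labels


if __name__ == "__main__":
    toy_rng = random.Random(1)
    # Two well-separated groups of toy points (hypothetical data).
    toy = ([toy_rng.gauss(-10, 1) for _ in range(20)]
           + [toy_rng.gauss(10, 1) for _ in range(20)])
    found = dp_gmm_gibbs(toy)
    print("clusters found:", len(set(found)))
```

The concentration parameter `alpha` controls how readily a new cluster is opened, so the sampler infers the number of clusters from the data instead of taking it as an input. A variational fit of the same model would replace the sampling step with deterministic updates.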