Resolving fine scale variation from microbial metagenomes

Lead Research Organisation: University of Warwick
Department Name: Warwick Medical School


The direct sequencing of DNA from the environment, metagenomics, has transformed microbial ecology by providing access to the genomic content of hitherto unculturable organisms. Statistical challenges remain in interpreting these data, however. The most important is probably the high-resolution assembly of microbial strains directly from short-read metagenome sequences, which will form the terminal aim of this studentship. We will build towards this goal by first creating suitable in silico data sets to test approaches, then tackling the simpler problem of resolving variants that are present in amplicon data. Finally, we will integrate existing approaches that use co-occurrence across samples to resolve variants on contigs directly into metagenome assemblers. To achieve these aims we will exploit a variety of statistical methods, including Bayesian non-parametrics and variational approximations.


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/N509401/1                                   01/10/2015  30/09/2020
1790490            Studentship   EP/N509401/1  19/09/2016  18/09/2020  Leonidas Souliotis
Description We developed a method to distinguish different genomes in a complex metagenomic sample at the species level. This method seems to outperform the gold standard when the number of genomes present is below 100, and it remains competitive in larger communities.
Exploitation Route More accurate metagenomic contig binners can serve as a stepping stone to deeper analyses (e.g. strain detection) or to the detection of new species, as they are not restricted by reference-based methods.
Sectors Chemicals, Healthcare, Pharmaceuticals and Medical Biotechnology

Description As this research is co-funded by Unilever, we work with confidential data supplied by Unilever, and Unilever can use the results for its own purposes.
First Year Of Impact 2017
Sector Healthcare
Title Dirichlet process mixtures for binning 
Description CONCOCT is a tool for automated binning of metagenomic contigs that uses Gaussian mixture models to model the different microbial communities. It fits the model using a method called automatic relevance determination. We propose a different fitting method, the Dirichlet process Gaussian mixture model, which handles the number of communities differently from CONCOCT. We apply these models using both a Gibbs sampler and standard variational inference.
Type Of Material Improvements to research infrastructure 
Year Produced 2017 
Provided To Others? Yes  
Impact When testing our proposed method against CONCOCT (the gold standard among the many available metagenomic contig binners), we see that it performs better than CONCOCT when the number of communities is less than 100. In larger communities, our proposed method is still outperformed by CONCOCT.
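
As an illustration of how a Dirichlet process Gaussian mixture lets the number of clusters be inferred rather than fixed in advance, the following is a minimal sketch of a collapsed Gibbs sampler. It is not the project's implementation: it assumes one-dimensional data, a known observation variance, and a Normal prior on each cluster mean, whereas a contig binner would work on multivariate coverage and composition features. The function names (log_predictive, dp_gibbs) and all hyperparameter values are hypothetical choices made for this example.

```python
import math
import random


def log_predictive(x, n, total, sigma2, mu0, tau2):
    """Log predictive density of x for a cluster with n members summing to total.

    Assumes a Normal likelihood with known variance sigma2 and a
    Normal(mu0, tau2) prior on the cluster mean (illustrative assumptions).
    With n == 0 this is the prior predictive used for a brand-new cluster.
    """
    precision = 1.0 / tau2 + n / sigma2
    mean = (mu0 / tau2 + total / sigma2) / precision
    var = 1.0 / precision + sigma2
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def dp_gibbs(data, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=100.0, sweeps=20, seed=0):
    """Collapsed Gibbs sampling of cluster labels under a Dirichlet process prior."""
    rng = random.Random(seed)
    labels = [0] * len(data)      # start with every point in a single cluster
    counts = {0: len(data)}       # cluster label -> number of members
    sums = {0: sum(data)}         # cluster label -> sum of member values
    next_label = 1
    for _ in range(sweeps):
        for i, x in enumerate(data):
            # Remove point i from its current cluster.
            k = labels[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]
            # Existing clusters are weighted by size times predictive density.
            candidates = list(counts)
            logw = [
                math.log(counts[c])
                + log_predictive(x, counts[c], sums[c], sigma2, mu0, tau2)
                for c in candidates
            ]
            # A new cluster is weighted by alpha times the prior predictive.
            candidates.append(next_label)
            logw.append(math.log(alpha) + log_predictive(x, 0, 0.0, sigma2, mu0, tau2))
            # Normalise in log space to avoid underflow, then sample a label.
            top = max(logw)
            weights = [math.exp(w - top) for w in logw]
            choice = rng.choices(candidates, weights=weights)[0]
            if choice == next_label:
                counts[choice] = 0
                sums[choice] = 0.0
                next_label += 1
            labels[i] = choice
            counts[choice] += 1
            sums[choice] += x
    return labels


# Two well-separated groups of points; the number of clusters is not supplied.
data = [-10.0 + 0.2 * j for j in range(-5, 6)] + [10.0 + 0.2 * j for j in range(-5, 6)]
labels = dp_gibbs(data)
print(len(set(labels)), "clusters found")
```

The concentration parameter alpha controls how readily new clusters are opened, which is the sense in which this approach treats the number of communities differently from a model with a fixed upper bound on components.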