Strain resolved metagenomics for medical microbiology

Lead Research Organisation: University of Warwick
Department Name: Warwick Medical School


A great diversity of microbes live on or in the human body. The human gut in particular contains hundreds or thousands of species, and the exact composition of species varies from one person to another. We now realise that this microbiome plays a key role in many aspects of human health, including providing nutrients and protection against disease. It has even been implicated in mental health. To study the microbiome in its entirety we sequence the DNA of all members of the community simultaneously; this is the metagenome. The challenge is then to reconstruct, or assemble, the individual genomes of the community members, which is particularly difficult when the fragments of DNA obtained, the sequence reads, are short and similar strains of the same species co-occur.

This proposed project will develop a new software tool for resolving genomes from metagenomes. It will better exploit two sources of information that have previously been under-utilised. The first is time series information, where multiple samples are available from the same community and the members of the community fluctuate in abundance. This signal can be used to link small scale differences between very similar strains in the community and resolve genomes to high resolution. The second is long reads, which are being generated by new sequencing technologies such as Nanopore. These are noisy but they can be used to determine large scale differences in genome structure unambiguously.

The resulting (bioinformatics) tools will enable the research community to understand microbiome composition at high resolution. This could have major implications for our understanding of the microbiome in human health. For example, we will use it to resolve changes in the gut microbiota of children with Crohn's disease when they undergo treatment with a therapeutic diet. Crohn's disease is an inflammatory condition of the gut, where the microbiome exhibits a disturbed or dysbiotic state. If we can resolve exactly which strains increase or decrease in abundance in children who get better when they are treated, then we should be able to understand how the diet works and develop better diets that are personalised to the very different microbiotas of the individual children.

This is just one example. These methods could result in a fundamental improvement in our understanding of the microbiome in other inflammatory bowel diseases, e.g. ulcerative colitis, and their treatment with methods such as fecal microbiome transplants. Furthermore, microbial communities (MCs) are important in multiple other areas in addition to human health, including agriculture and biotechnology. These methods will be a major step forward in our ability to study these communities.

Technical Summary

The aim of this proposal is to develop methods to extract strain genomes directly from metagenome data. This has the potential to revolutionise the in situ study of microbial communities by allowing us to determine the medically relevant microbial units, their abundances and their genomes. This will be a substantial leap forward from 16S rRNA gene taxonomic profiling or read based metagenomics that treats the community as a 'bag of genes'. Both ignore the genome context of predicted genes, crucial information for inferring pathogenicity or the functional role of commensal organisms in the human microbiota.

Assembly of metagenome data is challenging because species can be present in multiple strains. This leads to repeat regions between strains, and complex assembly graphs that give fragmentary assemblies. It is possible to bin these fragments together as a post assembly step using information on abundance across samples, but it would be more powerful to integrate this information directly into the assembly process. We will develop Bayesian algorithms based on graphical models to find the optimal paths through the graph, and the number of such paths. Each path will then define a strain haplotype at high resolution. Long repeats may still cause issues, and to address this we will devise the graph algorithms such that long read technologies can also be used. This will allow simultaneous resolution of strain haplotypes on shared regions and their long-range accessory genome structure.
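The idea of using abundance across samples to choose paths through an assembly graph can be illustrated with a minimal sketch. It is not the Bayesian graphical model proposed here: it swaps in exhaustive path enumeration and non-negative least squares, and it fixes the number of strains in advance rather than inferring it. The toy graph, the coverage values, and the function names (enumerate_paths, fit_abundances, resolve_strains) are all assumptions made up for illustration.

```python
from itertools import combinations

# Toy assembly graph (hypothetical): unitig -> successor unitigs.
# B1/B2 and D1/D2 are alternative variants between shared unitigs.
GRAPH = {
    "A": ["B1", "B2"],
    "B1": ["C"], "B2": ["C"],
    "C": ["D1", "D2"],
    "D1": ["E"], "D2": ["E"],
    "E": [],
}

# Made-up per-unitig coverage in three samples (e.g. three time points).
COVERAGE = {
    "A":  [30.0, 30.0, 30.0],
    "B1": [20.0, 10.0, 5.0],
    "B2": [10.0, 20.0, 25.0],
    "C":  [30.0, 30.0, 30.0],
    "D1": [10.0, 20.0, 25.0],
    "D2": [20.0, 10.0, 5.0],
    "E":  [30.0, 30.0, 30.0],
}


def enumerate_paths(graph, source, sink):
    """Return every source-to-sink path in a DAG by depth-first search."""
    if source == sink:
        return [[sink]]
    return [[source] + rest
            for nxt in graph[source]
            for rest in enumerate_paths(graph, nxt, sink)]


def fit_abundances(paths, unitigs, observed, sweeps=200):
    """Non-negative least squares by coordinate descent for one sample.

    Predicted coverage of a unitig is the summed abundance of the paths
    that traverse it. Returns (abundances, squared_error).
    """
    member = [[1.0 if u in p else 0.0 for u in unitigs] for p in paths]
    abund = [0.0] * len(paths)

    def predict(i, skip=None):
        return sum(abund[k] * member[k][i]
                   for k in range(len(paths)) if k != skip)

    for _ in range(sweeps):
        for j, row in enumerate(member):
            # Residual with path j's own contribution left out.
            partial = [observed[i] - predict(i, skip=j)
                       for i in range(len(unitigs))]
            num = sum(r * m for r, m in zip(partial, row))
            abund[j] = max(0.0, num / sum(row))  # clamp to keep abundance >= 0
    error = sum((observed[i] - predict(i)) ** 2 for i in range(len(unitigs)))
    return abund, error


def resolve_strains(graph, coverage, source, sink, n_strains):
    """Pick the n_strains paths whose abundances best explain all samples."""
    unitigs = sorted(coverage)
    n_samples = len(coverage[unitigs[0]])
    best = None
    for combo in combinations(enumerate_paths(graph, source, sink), n_strains):
        total, per_sample = 0.0, []
        for s in range(n_samples):
            observed = [coverage[u][s] for u in unitigs]
            abund, err = fit_abundances(combo, unitigs, observed)
            total += err
            per_sample.append(abund)
        if best is None or total < best[0]:
            best = (total, combo, per_sample)
    return best


error, strains, abundances = resolve_strains(GRAPH, COVERAGE, "A", "E", n_strains=2)
for i, path in enumerate(strains):
    print("-".join(path), [round(sample[i], 2) for sample in abundances])
```

In this toy data the coverage of B1 tracks D2 across the three samples and B2 tracks D1, so the best-fitting pair of paths places each co-varying pair of variants on the same haplotype. Exhaustive enumeration is only feasible for very small graphs; it stands in here for the inference over paths described above.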

We will apply these methods to a longitudinal study of changes in the microbiota of patients with pediatric Crohn's disease (CD) during treatment with exclusive enteral nutrition (EEN). We will resolve changes in abundance of microbes at the strain level for children that do or do not enter remission. We will use CD as a test bed for these methods, but this paradigm will be universally applicable to microbial communities from diverse environments ranging from the human gut through to soil.

Planned Impact

By providing tools that improve the resolution of microbial genomes from communities, the research will have a diverse range of impacts beyond academia, wherever microbial communities are important. To maximize the industrial, regulatory, and scientific impact of our proposed methodology, we have identified the following major areas where the research will be relevant:

Human health and disease
Understanding the role of the microbiome in health and disease could have a major impact, ranging from benefits for people suffering from specific disorders where the microbiome plays a role to more general improvements in health. The first step in understanding the microbiome is to fully resolve the organisms that are there, and our approach could provide a method to do this in a cost effective and convenient manner. An obvious example of impact in human health would be the pediatric Crohn's disease example that we include in the proposal. Identifying the key strains involved in improvement in symptoms during treatment with EEN should enable us to optimise treatment. For instance, we could tailor treatments depending on the initial microbial community of the individual, adding probiotics if they lack some of the key strains associated with reduced inflammation. The result could be more effective individualised treatment with major benefits for the patient. There are numerous other examples where our methods could enable improved treatments based on microbiome modulation. Fecal microbiome transplants (FMT) are a well-established method for treating Clostridium difficile infections and are being trialled for other diseases such as ulcerative colitis. However, there are drawbacks to transferring an entire microbiota complete with potential pathogens and unknown metabolites. Our method could allow the critical strains in the response to FMT to be identified. These could then be targeted for isolation and used to create controlled mixtures of known organisms with the same effectiveness as the original FMT. The potential health benefits to patients of a more controlled alternative to FMT are enormous, as are the potential opportunities for biotech firms in commercialising them.

Agricultural Impact
The microbiome of domestic animals has a major impact on their health and growth. The rumen microbiome of cattle and sheep, in particular, converts indigestible cellulose into compounds that can be used by the animal. It is also responsible for methane emissions. The rumen microbiome is far less studied than the human microbiome, and a method that can resolve the genomes of the strains present de novo may have even more impact in this field. It could enable the rumen microbiome to be engineered for improved growth, with obvious benefits to the livestock industry, and potentially to reduce methane emissions, which could benefit the entirety of mankind as methane from livestock is a major contributor to global warming.

Industrial Impact
A better understanding of microbial communities (MCs) is needed to achieve improved outputs from biotechnological applications of MCs such as industrial anaerobic digestion (AD) plants. In particular, we need to identify possible correlations between MC structure/function and the performance/stability of AD reactors. Current approaches to this problem, however, are limited by the lack of high resolution data on community structure. The methods developed here will address this limitation, and the resulting industrial impact could include improved reactor efficiencies.

Title: STRONG - Strain Resolution ON Graphs
Description: STRONG resolves strains on assembly graphs by resolving variants on core COGs using co-occurrence across multiple samples.
Type Of Technology: Software
Year Produced: 2020
Open Source License?: Yes
Impact: Enabled strain resolution in microbiomes.