Understanding microbial communities through in situ environmental 'omic data synthesis

Lead Research Organisation: University of Glasgow
Department Name: College of Science and Engineering

Abstract

The purpose of this research is to integrate different sources of 'omics data in environmental science for microbial community analysis. Computational comparative analysis of DNA sequences can provide information about genome structure, gene function, metabolic and regulatory pathways, and how microbial genomes evolve. However, to fully delineate microbial activity and its response to environmental factors, it is necessary to include all levels of gene products (mRNA, proteins, and metabolites) as well as their interactions. I propose to use large-scale whole-genome metagenomic sequencing to assess the taxonomic and functional diversity of microbial communities. The data generated by metagenomic experiments are both enormous and inherently noisy, containing fragmented DNA sequences that may represent thousands of microbial species. After pre-filtering steps, including the removal of redundant and low-quality sequences, the short DNA sequences are assembled into longer contigs of overlapping reads, and these contigs may then be scaffolded into full genomes in a bottom-up approach. Having obtained the assembled contigs, the next step is to use publicly available databases to annotate the coding regions in these contigs. This will tell us WHAT functionality is available. To provide information on WHO is there, the metagenomic sequences are binned, i.e., each sequence is associated with an organism. This can be done either by searching for phylogenetic markers or by looking for similar sequences in existing public databases. The end result is a community profile of the different samples in terms of organismal abundances within each sample. Whilst metagenomic analysis gives a profile of the microbial community at a specific place or time, and of its functional potential, it does not reveal which genes are actually being transcribed.
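As a rough illustration of two of the steps described above (pre-filtering and the final community profile), the following minimal Python sketch may help. It is not the pipeline proposed here: the function names, the mean-quality threshold of 20, and the toy reads and organism labels are all assumptions made for the example, and assembly, annotation and binning themselves are not shown.

```python
from collections import Counter


def prefilter(reads, min_mean_quality=20.0):
    """Drop low-quality reads and exact-duplicate sequences.

    `reads` is a list of (sequence, phred_scores) pairs. The threshold is an
    illustrative assumption, not a value taken from the proposal.
    """
    seen = set()
    kept = []
    for seq, quals in reads:
        # Discard reads with no quality values or a low mean Phred score.
        if not quals or sum(quals) / len(quals) < min_mean_quality:
            continue
        # Discard redundant (exactly identical) sequences.
        if seq in seen:
            continue
        seen.add(seq)
        kept.append((seq, quals))
    return kept


def community_profile(bin_assignments):
    """Convert per-sample organism assignments into relative abundances."""
    profile = {}
    for sample, organisms in bin_assignments.items():
        counts = Counter(organisms)
        total = sum(counts.values())
        profile[sample] = {org: n / total for org, n in counts.items()}
    return profile


# Toy data (hypothetical): two identical reads and one low-quality read.
reads = [
    ("ACGT", [30, 30, 30, 30]),
    ("ACGT", [32, 31, 30, 33]),
    ("GGCA", [5, 8, 10, 7]),
]
filtered = prefilter(reads)

# Hypothetical binning result: one organism label per binned sequence.
profile = community_profile(
    {"site_A": ["Methanosaeta", "Methanosaeta", "Geobacter", "Methanosaeta"]}
)
```

In this toy case only one read survives filtering, and the profile for site_A assigns three quarters of the binned sequences to one organism and one quarter to the other.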
I thus propose to integrate sequencing-based metatranscriptomics, in which total RNA (a proxy for gene activity) is extracted from the microbial community, converted to cDNA and sequenced without the need for cloning. This will provide information on the regulation and expression profiles of complex communities by enabling quantitative measurement of the dynamic expression of RNA molecules and of their variation between different states, reflecting the genes that are being actively expressed at any given time. However, the story is still far from complete, as we do not have direct evidence of the metabolism within a cell. To give a more complete picture of living organisms, I will integrate metabolomics, which will provide unique chemical fingerprints that are a function of specific cellular activity. In particular, the focus will be on identifying habitat-specific endogenous and exogenous metabolites across distinct geochemical conditions. These metabolites will be detected using two-dimensional gas chromatography coupled with mass spectrometry. They will be related to the expression levels from transcriptomes using information on metabolic pathways readily available from annotating metagenomic sequences. In this way we will integrate all three sources of information, mapping the metatranscriptome onto the assembled, annotated metagenomes and reconciling the reconstructed metabolic pathways with observations on metabolite concentrations and fluxes. From this we will be able to predict the metabolic function of the entire community, not simply who is there.
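One simple way to picture the integration described above is to sum transcript levels by annotated pathway and then compare the pathway totals with a measured metabolite across samples. The sketch below is an illustration only, not the method proposed here: the gene-to-pathway table, the expression values, the methane concentrations, and the choice of Pearson correlation are all assumptions made for the example.

```python
from math import sqrt


def pathway_expression(gene_expression, gene_to_pathway):
    """Sum per-gene expression into per-pathway totals for one sample."""
    totals = {}
    for gene, level in gene_expression.items():
        pathway = gene_to_pathway.get(gene)
        if pathway is not None:  # genes without an annotation are skipped
            totals[pathway] = totals.get(pathway, 0.0) + level
    return totals


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical annotation derived from the metagenome.
gene_to_pathway = {
    "mcrA": "methanogenesis",
    "mcrB": "methanogenesis",
    "dsrA": "sulfate_reduction",
}

# Hypothetical transcript levels for three samples.
samples = [
    {"mcrA": 10, "mcrB": 5, "dsrA": 2},
    {"mcrA": 20, "mcrB": 10, "dsrA": 1},
    {"mcrA": 30, "mcrB": 15, "dsrA": 3},
]

# Hypothetical methane concentrations measured in the same three samples.
methane = [1.0, 2.0, 3.0]

methanogenesis = [
    pathway_expression(s, gene_to_pathway)["methanogenesis"] for s in samples
]
r = pearson(methanogenesis, methane)
```

A real analysis would also need normalisation, many more samples, and a statistical model suited to compositional data; the point here is only the join between annotation, expression and metabolite measurements.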

Planned Impact

The removal of complex organic contaminants from soils will be one of the major environmental challenges facing the United Kingdom over the coming decades, and recommendations based on this proposal will be of use to stakeholders, especially remediation consultants, industry regulators (i.e. SEPA) and local councils. Brownfield development is an important part of the societal shift towards sustainability. Many contaminated brownfield sites sit unused for decades because the cost of cleaning them is more than the land would be worth after redevelopment. This research will improve our ability to achieve sustainable reclamation of environmental capital and will allow adaptive reuse.
The Earth Microbiome Project has generated an enormous collection of data with the intention of producing a global Gene Atlas describing protein space, building environmental metabolic models, and characterizing a global environmental parameter space for microbial communities. This global environmental sample database is an ambitious, community-driven initiative. The tools developed in this fellowship will exploit this vast amount of information to provide useful insights into the Earth's microbiome and to catalogue all the microbes that live on Earth. This will be of great benefit to environmental researchers and to mankind as a whole, since these microbes perform vital functions.
Methanogenesis is a key process in the carbon cycle, and methane is a more potent greenhouse gas than carbon dioxide. Understanding its metabolism at a community level is therefore of fundamental importance if we are to incorporate microbial processes into models of climate change. Methane is an important greenhouse gas, yet its production could play a part in the transition to a low-carbon economy. Water treatment is the fourth most energy-intensive sector in the UK and consumes approximately 1% of the UK's electricity. Reducing the energy required to treat wastewater would therefore have major benefits, reducing both costs and carbon dioxide emissions. Anaerobic digestion (AD) reactors have the potential to provide these benefits. They do not require the energetically costly aeration of aerobic methods and, through the action of methanogens, produce biogas. Better understanding of methanogenesis could lead to more efficient AD reactors.

Publications

Yin J (2022) A droplet-based microfluidic approach to isolating functional bacteria from gut microbiota. in Frontiers in cellular and infection microbiology

 
Description I established my Environmental'Omics lab in the School of Engineering, University of Glasgow, in November 2014. It specialises in developing novel pipelines for analysing genomic data in an environmental context. My lab is centred around my fellowship and focuses on microbial ecology at both mesoscopic and macroscopic scales by integrating 'omics data (metagenomics, metatranscriptomics, metabolomics, and metaproteomics) for microbial community analysis.

Software (http://userweb.eng.gla.ac.uk/umer.ijaz/#bioinformatics): Under this grant I am developing software tools and methodologies to integrate different sources of 'omics data, namely metagenomics, metatranscriptomics, metabolomics, and metaproteomics. Here is the list of major software tools I have contributed to during my fellowship:

RvLab (R virtual Laboratory for ecological community analysis)
Software: https://portal.lifewatchgreece.eu/
Reference: A. Oulas et al. Biodiversity Data Journal, 4, e8357, 2016.(doi:10.3897/BDJ.4.e8357)

CONCOCT (A software for binning metagenomic contigs with coverage and composition)
Software: https://github.com/BinPro/CONCOCT
Reference: J. Alneberg et al. Nature Methods, 11(11):1144-1146, 2014. (doi:10.1038/NMETH.3103) (PMID:25218180)

TAXAassign (A bash based pipeline for generating taxonomic profiles using NCBI's Taxonomy)
Software: http://www.github.com/umerijaz/taxaassign
Reference: J. Alneberg et al. Nature Methods, 11(11):1144-1146, 2014. (doi:10.1038/NMETH.3103) (PMID:25218180)

NMGS (A software for fitting the Unified Neutral Theory of Biodiversity with a Hierarchical Dirichlet Process)
Software: https://github.com/microbiome/NMGS
Reference: K. Harris et al. Proceedings of the IEEE, 105(3):516-529, 2017 (doi:10.1109/JPROC.2015.2428213)

seqenv (A pipeline capable of annotating genetic sequences with Environmental Ontology)
Software: https://bitbucket.org/seqenv/seqenv/src
Reference: L. Sinclair & U. Z. Ijaz et al. PeerJ, e2690, 2016. (doi: 10.7717/peerj.2690)

microbiomeSeq (An R package for microbial community analysis in an environmental context)
Software: https://github.com/umerijaz/microbiomeSeq
Tutorial/Demo: http://userweb.eng.gla.ac.uk/umer.ijaz/projects/microbiomeSeq_Tutorial.html

SeqEnv-Ext (A taxa-centric extension to the seqenv pipeline consisting of two parts, each providing environmental annotations in a different context: the first part provides taxon abundance on a per-term basis, while the second lists environmental term abundance on a per-taxon basis. It is a separately developed program that requires the original seqenv pipeline, and it enables two different ways of viewing environmental annotations, which significantly augments the analysis capability of the pipeline.)
Software: http://hie-pub.westernsydney.edu.au/0610b020-39fb-11e7-b55d-525400daae48/
Reference: A. Z. Ijaz, T. Jeffries, U. Z. Ijaz et al. PeerJ, 5:e3827, 2017. (doi:10.7717/peerj.3827)

pyTag (A tool for identification and analyses of ontological terms in application area specific literature surveys)
Software: https://github.com/KociOrges/pytag

NanoAmpli-Seq (A workflow for amplicon sequencing from mixed microbial communities on the nanopore sequencing platform)
Code: https://github.com/umerijaz/nanopore
Reference: S. T. Calus, U. Z. Ijaz, and A. Pinto. bioRxiv 244517, 2018 (doi: 10.1101/244517)


Orion Cluster: Without any institutional or dedicated technical support, I have single-handedly built and managed an HPC facility in Engineering called Orion Cluster (http://userweb.eng.gla.ac.uk/umer.ijaz/#orion). I bought the first server in 2012 through the Unilever grant, and since then I have persistently pursued my collaborators for in-kind contributions, as well as allocating a small equipment budget on every grant I apply for. Five years later, I have spent ~£114K on 13 servers, with more equipment to be purchased in two months' time through a recently allocated £22K (on the SAIC grant). Orion Cluster stands at an operational capacity of 368 cores and ~450TB of disk space, and will serve >70 PGR/T students and staff (60 are existing, regular users, which is why my supervision workload is increasing). This facility now sits at the heart of all the major research groups I am involved with and is the envy of many others. One of the reasons why I have managed to attract funding and collaborators is the development of bespoke workflows (originating from my research) that I regularly update and share on my website (http://userweb.eng.gla.ac.uk/umer.ijaz/#bioinformatics; http://www.tinyurl.com/JCBioinformatics; and http://www.tinyurl.com/JCBioinformatics2), which also provides a single place for >400 bioinformatics tools. My cluster and bioinformatics tutorials are of strategic importance.

Expansion to other technologies/hardware and award generation (http://userweb.eng.gla.ac.uk/umer.ijaz/#research_Grants): The tools and software methodologies developed, and the research conducted, under my NERC fellowship were instrumental in securing further funding from numerous research councils. This includes a recent expansion to population genomics and epidemiology (Scottish Infection Research Network/Chief Scientist Office project entitled "Molecular epidemiology of Clostridium difficile in Scotland: developing novel, clinically applicable research methods to combine genomic analysis with health informatics"). Over the past year, I have been trying to put my engineering experience to good use by expanding my research to include: Raman spectroscopy enabled microfluidics (NERC NE/P003826/1 grant entitled "Stable Isotope Probing with Resonance Raman Cell Sorting to profile influence of ocean acidification on microbial carbon fixation"); a hardware system integrating liquid handling, incubation and sensing with an embodied genetic algorithm, which directs evolutionary optimisation of microbial growth (with Professor William T Sloan, University of Glasgow; EPSRC Global Challenges Research Fund EP/P029329/1); and the development of an artificial intestinal Salmon gut system through bioreactors (BBSRC BB/P001203/1 grant entitled "A microbial basis for Atlantic Salmon energetics").

Supervision (http://userweb.eng.gla.ac.uk/umer.ijaz/#supervisions): I have been directly involved with the supervision of 13 PhD students and 2 PDRAs (with more to be recruited). Two PGR students (Caitlin Jukes, and Asha Rani) have recently defended their viva successfully. All of my supervisions involve utilisation of tools developed under my NERC grant.

Repute: I have gained considerable repute at both national and international levels. I am collaborating widely with academics located in Manchester, Warwick, Dundee, Aberdeen, Liverpool, Norwich, Reading, London, Belgium, Finland, Greece, Norway, Ireland, Austria, Thailand, the Czech Republic, Australia, Germany, France, and the Netherlands. As a consequence I have been invited to visit/speak at numerous institutes including: Faculty of Science, České Budějovice; Hellenic Centre for Marine Research, Greece; Centre for Microbial Ecology and Technology, Ghent, Belgium; Edinburgh Amplicon Sequencing Group; Earlham Institute (formerly TGAC); Unilever R&D laboratories (Colworth/Port Sunlight); London School of Hygiene and Tropical Medicine; and Health Informatics Centre Dundee. My research leadership potential was recognized by NERC, who funded me to attend a £23,100 advanced leadership course in Cambridge.
Exploitation Route Please see the section on "What have you discovered or developed through the research funded on this grant"
Sectors Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software), Education, Healthcare

URL http://userweb.eng.gla.ac.uk/umer.ijaz
 
Description The analytical tools led to the exploration, and in turn the development, of the CD-TREAT diet for the treatment of Crohn's Disease (https://www.medpagetoday.com/gastroenterology/inflammatoryboweldisease/76931). This work was published in Gastroenterology in 2019.
Sector Agriculture, Food and Drink, Education, Environment, Healthcare, Pharmaceuticals and Medical Biotechnology
Impact Types Societal, Economic

 
Title Supporting data for "NanoAmpli-Seq: A workflow for amplicon sequencing for mixed microbial communities on the nanopore sequencing platform." 
Description Amplicon sequencing on Illumina sequencing platforms leverages their deep sequencing and multiplexing capacity, but is limited in genetic resolution due to short read lengths. While Oxford Nanopore or Pacific Biosciences platforms overcome this limitation, their application has been limited due to higher error rates or smaller data output. In this study, we introduce an amplicon sequencing workflow, i.e., NanoAmpli-Seq, that builds on the Intramolecular-ligated Nanopore Consensus Sequencing (INC-Seq) approach, and demonstrate its application for full-length 16S rRNA gene sequencing. NanoAmpli-Seq includes vital improvements to the aforementioned protocol that reduce sample-processing time while significantly improving sequence accuracy. The developed protocol includes the chopSeq software for fragmentation and read orientation correction of INC-Seq consensus reads, while the nanoClust algorithm was designed for read partitioning-based de novo clustering and within-cluster consensus calling to obtain full-length 16S rRNA gene sequences. The data files and protocols provided here represent the intermediate files produced during data processing and the associated detailed workflow.
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes
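
The idea of within-cluster consensus calling mentioned in the description above can be illustrated with a simple per-column majority vote. This is a minimal sketch only and is not the nanoClust algorithm: it assumes the reads in a cluster have already been aligned to equal length with "-" marking gaps, and the function name and toy reads are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Return the per-column majority base for pre-aligned reads.

    Assumes all reads have the same length and use "-" for gaps. Columns
    whose majority symbol is a gap are dropped from the consensus. Ties go
    to the symbol seen first in the column.
    """
    if not aligned_reads:
        raise ValueError("at least one read is required")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)


# Toy cluster (hypothetical): one substitution error and one deletion.
cluster = ["ACGTAC", "ACGTAC", "ACCTAC", "AC-TAC"]
consensus = majority_consensus(cluster)
```

With enough reads per cluster, random errors at a given position tend to be outvoted, which is the intuition behind using consensus calling to improve sequence accuracy.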