Understanding microbial communities through in situ environmental 'omic data synthesis

Lead Research Organisation: University of Glasgow
Department Name: College of Science and Engineering

Abstract

The purpose of this research is to integrate different sources of 'omics data in environmental science for microbial community analysis. Computational comparative analysis of DNA sequences may provide information about genome structure, gene function, metabolic and regulatory pathways, and how microbial genomes evolve. However, to fully delineate microbial activity and its response to environmental factors, it is necessary to include all levels of gene products (mRNA, proteins and metabolites) as well as their interactions. I propose to use large-scale whole-genome metagenomic sequencing to assess the taxonomic and functional diversity of microbial communities. The data generated by metagenomic experiments are both enormous and inherently noisy, containing fragmented DNA sequences that may represent thousands of microbial species. After pre-filtering steps, including the removal of redundant and low-quality sequences, the short DNA sequences are assembled into longer contigs of overlapping reads, and these contigs may then be scaffolded into full genomes in a bottom-up approach. Having obtained the assembled contigs, the obvious next step is to use publicly available databases to annotate the coding regions in these contigs. This will tell us WHAT functionality is available. To provide information on WHO is there, the metagenomic sequences are binned, i.e., each sequence is associated with an organism. This can be done either by searching for phylogenetic markers or by looking for similar sequences in existing public databases. The end result is the community profile of the different samples in terms of organismal abundances within each sample. Whilst metagenomic analysis gives a profile of the microbial community at a specific place or time, and of its functional potential, it does not reveal which genes are actually being transcribed.
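As a minimal sketch of the profiling step described above, and assuming each sequence has already been binned to an organism, the community profile of each sample can be expressed as relative organismal abundances. The function name, the input layout and the example data are hypothetical and serve only as an illustration; they are not part of the proposal.

```python
from collections import Counter, defaultdict


def community_profile(assignments):
    """Return {sample: {organism: relative abundance}}.

    `assignments` is an iterable of (sample, organism) pairs, one pair per
    binned sequence. This input layout is an assumption for illustration.
    """
    # Count how many binned sequences each organism has in each sample.
    counts = defaultdict(Counter)
    for sample, organism in assignments:
        counts[sample][organism] += 1

    # Normalise the counts so that abundances within a sample sum to 1.
    profile = {}
    for sample, organism_counts in counts.items():
        total = sum(organism_counts.values())
        profile[sample] = {
            organism: n / total for organism, n in organism_counts.items()
        }
    return profile


# Hypothetical binned sequences from two samples (illustrative data only).
example_assignments = [
    ("site_A", "Methanosaeta"),
    ("site_A", "Methanosaeta"),
    ("site_A", "Methanosaeta"),
    ("site_A", "Nitrospira"),
    ("site_B", "Nitrospira"),
    ("site_B", "Nitrospira"),
]

example_profile = community_profile(example_assignments)
```

Raw sequence counts are used here for simplicity; in practice, abundance estimates would usually also account for factors such as contig length and sequencing depth.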
I thus propose to integrate sequencing-based metatranscriptomics, in which total RNA (a proxy for gene activity) is extracted from the microbial community, converted to cDNA and sequenced without the need for cloning. This will provide information on the regulation and expression profiles of complex communities by enabling quantitative measurement of the dynamic expression of RNA molecules and of their variation between different states, reflecting the genes that are being actively expressed at any given time. However, the story is still far from complete, as we do not have direct evidence of the metabolism within a cell. To give a more complete picture of living organisms, I will integrate metabolomics, which will provide unique chemical fingerprints that are a function of specific cellular activity. In particular, the focus will be on identifying habitat-specific endogenous and exogenous metabolites under distinct geochemical conditions. These metabolites will be detected using two-dimensional gas chromatography coupled with mass spectrometry. They will be related to the expression levels from the transcriptomes using information on metabolic pathways readily available from annotating the metagenomic sequences. In this way we will integrate all three sources of information, mapping the metatranscriptome onto the assembled, annotated metagenomes and reconciling the reconstructed metabolic pathways with observations on metabolite concentrations and fluxes. From this we will be able to predict the metabolic function of the entire community, not simply who is there.
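One simple way to picture the last integration step is to sum the expression of the genes annotated to a pathway in each sample and to check whether this tracks the measured concentration of a related metabolite. The sketch below uses a Spearman rank correlation for that purpose. This is an assumed, illustrative choice rather than the method of the proposal, and all function names, gene labels and numbers are hypothetical.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks.

    Assumes neither input is constant across samples.
    """
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


def pathway_expression(gene_expression, pathway_genes):
    """Sum per-sample expression over the genes annotated to one pathway."""
    n_samples = len(next(iter(gene_expression.values())))
    return [
        sum(gene_expression[g][s] for g in pathway_genes if g in gene_expression)
        for s in range(n_samples)
    ]


# Hypothetical expression levels for four samples (illustrative data only).
expression = {
    "mcrA": [10, 20, 35, 50],
    "mcrB": [5, 12, 20, 30],
    "unrelated_gene": [7, 3, 9, 1],
}
# Hypothetical metabolite concentrations in the same four samples.
methane = [0.8, 1.5, 2.9, 4.1]

pathway_levels = pathway_expression(expression, ["mcrA", "mcrB"])
rho = spearman(pathway_levels, methane)
```

In this toy data the pathway expression and the metabolite concentration increase together across samples, so `rho` comes out close to 1. Real data would require normalisation and correction for multiple testing before any such association is interpreted.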

Planned Impact

The removal of complex organic contaminants from soils will be one of the major environmental challenges facing the United Kingdom over the coming decades, and recommendations based on this proposal will be of use to stakeholders, especially remediation consultants, industry regulators such as SEPA, and local councils. Brownfield development is an important part of the societal shift towards sustainability. Many contaminated brownfield sites sit unused for decades because the cost of cleaning them is more than the land would be worth after redevelopment. This research will improve our ability to achieve sustainable reclamation of environmental capital and will allow adaptive reuse.
The Earth Microbiome Project has generated an enormous collection of data with the intention of producing a global Gene Atlas describing protein space, building environmental metabolic models, and characterizing a global environmental parameter space for microbial communities. This global environmental sample database is an ambitious, community-driven initiative. The tools developed in this fellowship will exploit this vast amount of information to provide useful insights into the Earth's microbiome and to catalogue all the microbes that live on Earth. This will be of great benefit to environmental researchers and to mankind as a whole, since these microbes perform vital functions.
Methanogenesis is a key process in the carbon cycle, and methane is a more potent greenhouse gas than carbon dioxide. Understanding methane metabolism at a community level is therefore of fundamental importance if we are to incorporate microbial processes into models of climate change. Methane is an important greenhouse gas, yet its production could play a part in the transition to a low-carbon economy. Water treatment is the fourth most energy-intensive sector in the UK and consumes approximately 1% of the UK's electricity. Reducing the energy required to treat wastewater would therefore have major benefits, both by reducing costs and by reducing carbon dioxide emissions. Anaerobic digestion (AD) reactors have the potential to provide these benefits. They do not require the energetically costly aeration used by aerobic methods and, through the action of methanogens, they produce biogas. A better understanding of methanogenesis could lead to more efficient AD reactors.

Publications

Alneberg J (2014) Binning metagenomic contigs by coverage and composition. in Nature methods

Bautista-De Los Santos Q (2016) Emerging investigators series: microbial communities in full-scale drinking water distribution systems - a meta-analysis in Environmental Science: Water Research & Technology

 
Description I established my Environmental 'Omics lab in the School of Engineering, University of Glasgow, in November 2014. It specialises in developing novel pipelines for analysing genomic data in an environmental context. My lab is centred around my fellowship and focuses on microbial ecology at both mesoscopic and macroscopic scales by integrating 'omics data (metagenomics, metatranscriptomics, metabolomics, and metaproteomics) for microbial community analysis.

Software (http://userweb.eng.gla.ac.uk/umer.ijaz/#bioinformatics): Under this grant I am developing software tools and methodologies to integrate different sources of 'omics data, namely metagenomics, metatranscriptomics, metabolomics, and metaproteomics. Here is the list of the major software I have contributed to during my fellowship:

RvLab (R virtual Laboratory for ecological community analysis)
Software: https://portal.lifewatchgreece.eu/
Reference: A. Oulas et al. Biodiversity Data Journal, 4, e8357, 2016. (doi:10.3897/BDJ.4.e8357)

CONCOCT (A software for binning metagenomic contigs with coverage and composition)
Software: https://github.com/BinPro/CONCOCT
Reference: J. Alneberg et al. Nature Methods, 11(11):1144-1146, 2014. (doi:10.1038/NMETH.3103) (PMID:25218180)

TAXAassign (A bash based pipeline for generating taxonomic profiles using NCBI's Taxonomy)
Software: http://www.github.com/umerijaz/taxaassign
Reference: J. Alneberg et al. Nature Methods, 11(11):1144-1146, 2014. (doi:10.1038/NMETH.3103) (PMID:25218180)

NMGS (A software for fitting the Unified Neutral Theory of Biodiversity with Hierarchical Dirichlet Process)
Software: https://github.com/microbiome/NMGS
Reference: K. Harris et al. Proceedings of the IEEE, 105(3):516-529, 2017 (doi:10.1109/JPROC.2015.2428213)

seqenv (A pipeline capable of annotating genetic sequences with Environmental Ontology)
Software: https://bitbucket.org/seqenv/seqenv/src
Reference: L. Sinclair & U. Z. Ijaz et al. PeerJ, e2690, 2016. (doi: 10.7717/peerj.2690)

microbiomeSeq (An R package for microbial community analysis in an environmental context)
Software: https://github.com/umerijaz/microbiomeSeq
Tutorial/Demo: http://userweb.eng.gla.ac.uk/umer.ijaz/projects/microbiomeSeq_Tutorial.html

SeqEnv-Ext (A taxa-centric extension to the seqenv pipeline. It consists of two parts, each providing environmental annotations in a different context: the first part gives taxon abundance on a per-term basis, while the second part lists environmental term abundance on a per-taxon basis. It is a separately developed program that requires the original seqenv pipeline, and it enables two different ways of viewing environmental annotations, which significantly augments the analysis capability of the pipeline.)
Software: http://hie-pub.westernsydney.edu.au/0610b020-39fb-11e7-b55d-525400daae48/
Reference: A. Z. Ijaz, T. Jeffries, U. Z. Ijaz et al. PeerJ, 5:e3827, 2017. (doi:10.7717/peerj.3827)

pyTag (A tool for identification and analyses of ontological terms in application area specific literature surveys)
Software: https://github.com/KociOrges/pytag

NanoAmpli-Seq (A workflow for amplicon sequencing from mixed microbial communities on the nanopore sequencing platform)
Code: https://github.com/umerijaz/nanopore
Reference: S. T. Calus, U. Z. Ijaz, and A. Pinto. bioRxiv 244517, 2018. (doi:10.1101/244517)


Orion Cluster: Without any institutional or dedicated technical support, I have single-handedly built and managed an HPC facility in Engineering called Orion Cluster (http://userweb.eng.gla.ac.uk/umer.ijaz/#orion). I bought the first server in 2012 through the Unilever grant, and since then I have religiously pursued my collaborators for in-kind contributions, as well as allocating a small equipment budget on every grant I apply for. Five years later, I have spent ~£114K on 13 servers, with more equipment to be purchased in two months' time through a recently allocated £22K (on SAIC) grant. Orion Cluster stands at an operational capacity of 368 cores and ~450TB of disk space, and will serve >70 PGR/T students and staff (60 are existing regular users, hence my increasing supervision workload). This facility now sits at the heart of all the major research groups I am involved with and is the envy of many others. One of the reasons why I have managed to attract funding and collaborators is the development of bespoke workflows (originating from my research) that I regularly update and share on my website (http://userweb.eng.gla.ac.uk/umer.ijaz/#bioinformatics; http://www.tinyurl.com/JCBioinformatics; and http://www.tinyurl.com/JCBioinformatics2), as well as providing a single place for >400 bioinformatics tools. My cluster and bioinformatics tutorials are of strategic importance.

Expansion to other technologies/hardware and award generation (http://userweb.eng.gla.ac.uk/umer.ijaz/#research_Grants): The tools and software methodologies developed, and the research conducted, under my NERC fellowship were instrumental in obtaining further funding from numerous research councils. This includes a recent expansion to population genomics and epidemiology (Scottish Infection Research Network/Chief Scientist Office project entitled "Molecular epidemiology of Clostridium difficile in Scotland: developing novel, clinically applicable research methods to combine genomic analysis with health informatics"). For the past year, I have been trying to put my engineering experience to good use by expanding my research to include: Raman spectroscopy enabled microfluidics (NERC NE/P003826/1 grant entitled "Stable Isotope Probing with Resonance Raman Cell Sorting to profile influence of ocean acidification on microbial carbon fixation"); a hardware system integrating liquid handling, incubation and sensing with an embodied genetic algorithm, which directs evolutionary optimisation of microbial growth (with Professor William T Sloan, University of Glasgow; EPSRC Global Challenges Research Fund EP/P029329/1); and the development of an artificial salmon gut system using bioreactors (BBSRC BB/P001203/1 grant entitled "A microbial basis for Atlantic Salmon energetics").

Supervision (http://userweb.eng.gla.ac.uk/umer.ijaz/#supervisions): I have been directly involved in the supervision of 13 PhD students and 2 PDRAs (with more to be recruited). Two PGR students (Caitlin Jukes and Asha Rani) have recently passed their vivas. All of my supervisions involve the use of tools developed under my NERC grant.

Repute: I have gained considerable repute at both national and international levels. I am collaborating widely with academics located in Manchester, Warwick, Dundee, Aberdeen, Liverpool, Norwich, Reading, London, Belgium, Finland, Greece, Norway, Ireland, Austria, Thailand, the Czech Republic, Australia, Germany, France, and the Netherlands. As a consequence I have been invited to visit or speak at numerous institutes, including: Faculty of Science, České Budějovice; Hellenic Centre for Marine Research, Greece; Centre for Microbial Ecology and Technology, Ghent, Belgium; Edinburgh Amplicon Sequencing Group; Earlham Institute (formerly TGAC); Unilever R&D laboratories (Colworth/Port Sunlight); London School of Hygiene and Tropical Medicine; and Health Informatics Centre Dundee. My research leadership potential was recognized by NERC, who funded me to attend a £23,100 advanced leadership course in Cambridge.
Exploitation Route Please see the section on "What have you discovered or developed through the research funded on this grant"
Sectors Agriculture, Food and Drink,Digital/Communication/Information Technologies (including Software),Education,Healthcare

URL http://userweb.eng.gla.ac.uk/umer.ijaz
 
Description Our Crohn's disease treatment diet (CD-TREAT) has received a lot of attention recently. As a result, our abstract won the Best Investigator-Initiated Study (IIS) award (the top 2 of 178 abstracts were selected as awardees) at a major IBD conference. Our recent work (A. J. Pinto, D. N. Marcus, U. Z. Ijaz et al. Metagenomic evidence for the presence of Comammox Nitrospira-like bacteria in drinking water system. mSphere, 1(1): e00054-15, 2015) has led us to the co-discovery of novel Comammox Nitrospira-like bacteria, which uniquely harbor a full suite of ammonia oxidation genes. This upends the 100-year-old dogma that nitrification, the oxidation of ammonia to nitrite and ultimately nitrate, is divided between two sets of microbes (Press Release: https://www.epsrc.ac.uk/newsevents/news/watertreatment/). This featured strongly in a Science perspective on Comammox (A. E. Santoro. The do-it-all nitrifier. Science, 351(6271):342-343, 2016). Using the seqenv pipeline, we have been able to relate changes in ammonia-oxidizing archaeal diversity with pH to the variety of environments available to these organisms. We have also applied it to a time series of Black Sea plankton paleomes, obtained from 18S rRNA sediment sequencing, extending twelve thousand years into the past. This has revealed that, prior to the influx of the Mediterranean, the Black Sea was a freshwater rather than a brackish environment. To the author's knowledge, this is the only pipeline to date that can text-mine metadata from the web for given sequences. My paper in the American Journal of Gastroenterology has shown that exclusive enteral nutrition (EEN), the primary treatment of active paediatric Crohn's disease, induced extensive changes in the bacterial microbiome, several of which were associated with disease improvement and a reduction in calprotectin (a biomarker for gut inflammation). Another paper (BN Parsons*, UZ Ijaz* et al. Comparison of the human gastric microbiota in hypochlorhydric states arising as a result of Helicobacter pylori-induced atrophic gastritis, autoimmune atrophic gastritis and proton pump inhibitor use. PLoS Pathogens, 13(11): e1006653, 2017) has shown that shifting communities in the stomach may influence cancer risk (Press Release: https://www.eurekalert.org/pub_releases/2017-11/p-sbc102417.php). Since the start of my fellowship, I have been the course leader of a 10-credit Metagenomics Module (BIOL 5172) for the Bioinformatics, Polyomics and Systems Biology MSc. I have developed online tutorials hosted on my website (http://userweb.eng.gla.ac.uk/umer.ijaz#bioinformatics) that are open access and popular with PGR students both within and outside the UK.
Sector Agriculture, Food and Drink,Education,Environment,Healthcare,Pharmaceuticals and Medical Biotechnology
Impact Types Societal,Economic