Open source pipelines for integrated metabolomics analysis by NMR and mass spectrometry

Lead Research Organisation: University of Birmingham
Department Name: Sch of Biosciences

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

Metabolomics comprises an important suite of techniques in modern Life Sciences research. Typically performed by NMR spectroscopy or mass spectrometry (MS), it is applied in a range of fields for biomarker discovery and for understanding metabolic networks in complex and dynamic systems. One of the biggest challenges preventing more widespread adoption of these powerful techniques is that data analysis is difficult, especially when data sets are collected in high-throughput modes. Each technique presents its own challenges. An end-to-end analysis requires pipelines of (often poorly connected) tools, plus a significant amount of manual analysis for steps where robust software is lacking. For individual steps within a workflow, commercial or free software exists at different stages of maturity; however, few solutions offer automated analysis from data collection through to statistical analysis. In the genomics and proteomics domains, the Galaxy framework has become a popular mechanism for building pipelines of modular tools (originally command-line tools) through a web interface. Galaxy can be easily configured to run on single servers, compute clusters or cloud-based infrastructure. In this project, our groups at the Universities of Liverpool and Birmingham, both of which have a track record in Galaxy development, will collaborate to build a set of metabolomics tools in Galaxy, enabling the construction of analysis pipelines for both NMR and MS. Crucially, the pipelines will deliver data sets to a shared statistical analysis toolkit, enabling integrated analysis of data sets derived from both techniques. We will also contribute to the development of international data standards for metabolomics, and our new pipelines will facilitate the deposition of experimental metabolomics data into the MetaboLights database at the EBI.

Planned Impact

Impact on health and society: The overall purpose of the project is to make data analysis for metabolomics more straightforward. Metabolomics is increasingly used in human, animal and plant research; as such, there is potential for longer-term (indirect) impacts, for example by facilitating biomarker discovery and the understanding of molecular mechanisms in fields including ageing, human and environmental health, food safety, industrial biotechnology, bioenergy and synthetic biology.

Economic impact: Facilitating public data deposition has the potential for long-term (indirect) economic impact, since it allows data sets (often collected at great expense) to be repurposed or reanalysed, fostering new research areas or, in some cases, reducing the need to collect new data.

Staff development: The postdocs involved will have the opportunity to work as part of an international network (for example, with the EBI, COSMOS, MSI and PSI) on a cutting-edge software project. The PIs will benefit from the exchange of skills and expertise between partners (the team has strong expertise in software engineering, MS, NMR, data analysis and statistics).
 
Description Following an earlier NERC grant to develop Galaxy workflows, in this BBSRC grant we have continued to implement our existing metabolomics software tools as Galaxy workflows. This includes the signal processing and analysis of both direct infusion mass spectrometry and liquid chromatography-mass spectrometry based metabolomics. As part of this effort, we have also conducted and published an international survey on the use of workflows.
Exploitation Route We anticipate widespread uptake of our Galaxy workflows for metabolomics research.
Sectors Agriculture, Food and Drink, Environment, Healthcare

 
Description The overall purpose of the project was to make data processing, analysis and dissemination for mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy metabolomics more accessible, reproducible and transparent. Galaxy has become a popular web-based platform for building computational workflows of modular tools [ref]. We have developed a complete set of Galaxy-based tools and training material covering the wide range of computational steps needed to get from the raw data delivered by the instrument to a processed dataset ready for biological interpretation. Additionally, we have contributed extensively to the development of international data standards for metabolomics and to their integration into (Galaxy-based) web workflows [ref]. Finally, the tools and training material have been disseminated through a number of training courses and programmes (see Impact). The Galaxy-based tools, workflows and training material developed make it much easier for scientists to process and analyse their MS and NMR datasets and subsequently deposit them in public repositories [refs] such as MetaboLights. These tools and training materials have been used to train several hundred scientists (e.g. through the Birmingham Metabolomics Training Centre, FutureLearn and other external training courses). As a result, they have enabled researchers who lack skills and knowledge in metabolomics to integrate metabolomics technology into their own areas of science (e.g. human and environmental health, industrial biotechnology, food safety, bioenergy and synthetic biology). The tools and workflows developed to assist in depositing metabolomics datasets in public repositories have the potential for longer-term scientific and economic impact, such as facilitating biomarker discovery, enabling reuse of data, or reducing unnecessary data collection [ref].
The activities and dissemination of the outputs of this project have supported the development of the Galaxy platform and its associated scientific communities. The project and its outputs have indirectly supported the growth of the Galaxy community for metabolomics, which has led to the establishment of a number of Galaxy-based initiatives (e.g. PhenoMeNal, Workflow4Metabolomics, ELIXIR's Galaxy community) that use and actively develop Galaxy to make computational tasks in metabolomics more accessible, reproducible and transparent.
1: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5192046/
2: https://academic.oup.com/bioinformatics/article/33/16/2598/3204983
3: https://www.nature.com/articles/nprot.2016.156
First Year Of Impact 2017
Sector Agriculture, Food and Drink, Chemicals, Environment, Healthcare, Pharmaceuticals and Medical Biotechnology
Impact Types Societal

 
Title Galaxy-M metabolomics workflows 
Description Metabolomics data processing and analysis workflows embedded into Galaxy 
Type Of Material Data handling & control 
Year Produced 2014 
Provided To Others? Yes  
Impact International networking; other labs inviting us to join research grant applications
 
Description Research collaboration with GigaScience
Organisation GigaScience
Country United Kingdom 
Sector Private 
PI Contribution Provide domain expertise in metabolomics
Collaborator Contribution Provide technical expertise in tools such as Galaxy; facilitate a link to IT activities in China
Impact See publications
Start Year 2013
 
Description First ever Massive Open Online Course (MOOC) on metabolomics titled 'Metabolomics: Understanding Metabolism in the 21st Century' 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact We developed and ran the first ever Massive Open Online Course (MOOC) on metabolomics, titled as above. The course ran for 4 weeks with more than 2,000 active learners.
Year(s) Of Engagement Activity 2015
URL https://www.futurelearn.com/courses/metabolomics