Open source pipelines for integrated metabolomics analysis by NMR and mass spectrometry
Lead Research Organisation:
University of Birmingham
Department Name: Sch of Biosciences
Abstract
Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
Technical Summary
Metabolomics comprises an important suite of techniques in modern Life Sciences research, typically performed by NMR spectroscopy or mass spectrometry (MS) and applied in a range of fields for biomarker discovery, as well as for understanding metabolic networks in complex and dynamic systems. One of the biggest challenges preventing more widespread adoption of these powerful techniques is that data analysis is difficult, especially when data sets are collected in high-throughput modes. Each technique presents its own challenges, requiring pipelines of (often poorly connected) tools for an end-to-end analysis, and a significant amount of manual analysis for steps where robust software is lacking. For individual steps within a workflow, commercial or free software exists at different stages of maturity; however, few solutions offer automated analysis from data collection through to statistical analysis. In the genomics and proteomics domains, the Galaxy framework has become a popular mechanism for building pipelines of modular tools (originally command-line tools) through a web interface. Galaxy can be easily configured to run on single servers, compute clusters or cloud-based solutions. In this project our groups at the Universities of Liverpool and Birmingham, both of which have a track record in Galaxy development, will collaborate to build a set of metabolomics tools in Galaxy, enabling the construction of analysis pipelines for both NMR and MS analyses. Crucially, the pipelines will deliver data sets to a shared statistical analysis toolkit, enabling integrated analysis of data sets derived from both techniques. We will also contribute to the development of international data standards for metabolomics, and our new pipelines will facilitate the deposition of experimental metabolomics data into the MetaboLights database at the EBI.
Planned Impact
Impact on health and society: The overall purpose of the project is to make data analysis for metabolomics more straightforward. Metabolomics is a technique increasingly used in human, animal and plant research, and as such, there is the potential for longer term (indirect) impacts, for example through facilitating biomarker discovery and the understanding of molecular mechanisms in fields including ageing, human and environmental health, food safety, industrial biotechnology, bioenergy and synthetic biology.
Economic impact: The facilitation of public data deposition has the potential for long-term (indirect) economic impact, since it provides the opportunity for data sets (often collected at great expense) to be re-purposed or re-analysed, fostering new research areas or in some cases reducing the requirement to collect new data.
Staff development: The postdocs involved will have the opportunity to work as part of an international network (for example working with the EBI, COSMOS, MSI and PSI) in a cutting edge software project. The PIs will benefit through exchange of skills and expertise between partners (the team has strong expertise in software engineering, MS, NMR, data analysis and statistics).
Publications
Larralde M (2017) mzML2ISA & nmrML2ISA: generating enriched ISA-Tab metadata files from metabolomics XML data. In Bioinformatics (Oxford, England)
Schober D (2018) nmrML: A Community Supported Open Data Standard for the Description, Storage, and Exchange of NMR Data. In Analytical chemistry
Southam AD (2016) A complete workflow for high-resolution spectral-stitching nanoelectrospray direct-infusion mass-spectrometry-based metabolomics and lipidomics. In Nature protocols
Weber RJM (2017) Computational tools and workflows in metabolomics: An international survey highlights the opportunity for harmonisation through Galaxy. In Metabolomics: Official journal of the Metabolomic Society
Description | Following an earlier NERC grant to develop Galaxy workflows, in this BBSRC grant we have continued the implementation of our existing metabolomics software tools into Galaxy workflows. This includes the signal processing and analysis of both direct infusion mass spectrometry and liquid chromatography mass spectrometry based metabolomics. As part of this effort we have also conducted an international survey (which we published) on the use of workflows. |
Exploitation Route | We anticipate widespread uptake of our Galaxy workflows for metabolomics research. |
Sectors | Agriculture, Food and Drink; Environment; Healthcare |
Description | The overall purpose of the project was to make data processing, analysis and dissemination for mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy metabolomics more accessible, reproducible, and transparent. Galaxy has become a popular web-based platform for building computational workflows of modular tools [ref]. We have developed a complete set of Galaxy-based tools and training material that covers the wide range of computational steps needed to get from the raw data, as delivered by the instrument, to a processed dataset ready for biological interpretation. Additionally, we have contributed extensively to the development of international data standards for metabolomics and their integration into (Galaxy-based) web-established workflows [ref]. Finally, the tools and training material have been disseminated through a number of training courses and programmes (see Impact). The Galaxy-based tools and workflows that have been developed, including the training material, make it much easier for scientists to process and analyse their MS and NMR datasets and subsequently deposit them in public repositories [refs], such as MetaboLights. The tools and training material have been used to train several hundred scientists (through the Birmingham Metabolomics Training Centre, FutureLearn and other external training courses). As a result, researchers who lack skills and knowledge in metabolomics have been helped to integrate metabolomics technology into their area of science (e.g. human and environmental health, industrial biotechnology, food safety, bioenergy and synthetic biology). The tools and workflows developed to assist in depositing metabolomics datasets in public repositories have the potential for longer-term scientific and economic impact, such as facilitating biomarker discovery, enabling reuse of data, or reducing the amount of unnecessary data collection [ref].
The activities and dissemination of the outputs of this project have supported the development of the Galaxy platform and the associated science communities. The project and its outputs have indirectly supported the growth of the Galaxy community for metabolomics, which has resulted in the establishment of a number of Galaxy-based initiatives (e.g. PhenoMeNal, Workflow4Metabolomics, ELIXIR's Galaxy community) that use and actively develop Galaxy to make computational tasks within metabolomics more accessible, reproducible, and transparent. 1: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5192046/ 2: https://academic.oup.com/bioinformatics/article/33/16/2598/3204983 3: https://www.nature.com/articles/nprot.2016.156 |
First Year Of Impact | 2017 |
Sector | Agriculture, Food and Drink; Chemicals; Environment; Healthcare; Pharmaceuticals and Medical Biotechnology |
Impact Types | Societal |
Title | Galaxy-M metabolomics workflows |
Description | Metabolomics data processing and analysis workflows embedded into Galaxy |
Type Of Material | Data handling & control |
Year Produced | 2014 |
Provided To Others? | Yes |
Impact | International networking; other labs wanting us to join research grant applications |
Description | Research collaboration with GigaScience |
Organisation | GigaScience |
Country | United Kingdom |
Sector | Private |
PI Contribution | Provide domain expertise in metabolomics |
Collaborator Contribution | Provide technical expertise in tools such as Galaxy; facilitate a link to IT activities in China |
Impact | See publications |
Start Year | 2013 |
Description | First ever Massive Open Online Course (MOOC) on metabolomics titled 'Metabolomics: Understanding Metabolism in the 21st Century' |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | We developed and ran the first ever Massive Open Online Course (MOOC) on metabolomics, title as above. The course ran for 4 weeks with >2000 active learners. |
Year(s) Of Engagement Activity | 2015 |
URL | https://www.futurelearn.com/courses/metabolomics |