Sharing of metabolomics data and their analyses as Galaxy workflows through a UK-China collaboration

Lead Research Organisation: European Bioinformatics Institute
Department Name: Chemoinformatics and Metabolism

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Publications

Ashrafian H (2021) Metabolomics: The Stethoscope for the Twenty-First Century. in Medical principles and practice : international journal of the Kuwait University, Health Science Centre

Deutsch EW (2017) Proteomics Standards Initiative: Fifteen Years of Progress and Future Work. in Journal of proteome research

Edmunds SC (2017) Looking back: forward looking. in GigaScience

Emami Khoonsari P (2019) Interoperable and scalable data analysis with microservices: applications in metabolomics. in Bioinformatics (Oxford, England)

Haug K (2017) Global open data management in metabolomics. in Current opinion in chemical biology

Salek RM (2015) COordination of Standards in MetabOlomicS (COSMOS): facilitating integrated metabolomics data access. in Metabolomics : Official journal of the Metabolomic Society

Spicer R (2017) Navigating freely-available software tools for metabolomics analysis. in Metabolomics : Official journal of the Metabolomic Society

Van Rijswijk M (2017) The future of metabolomics in ELIXIR. in F1000Research

 
Description One of the objectives of this award is a workshop for trainers, hosted by GigaScience, to develop teaching materials for enabling data reproducibility in metabolomics. We held the workshop in late 2017, with associated public engagement activities, and a full report will be added to the cuddel.net site in due course.

This is a partnership of the European Bioinformatics Institute (EMBL-EBI), the Universities of Birmingham, Manchester and Oxford, The Sainsbury Laboratory and TGAC with BGI and its open-access journal, GigaScience, to provide training workshops supporting the sharing of data and their analyses in metabolomics.
Exploitation Route The main objective of the consortium is to host training workshops to support scientists in the UK and China in managing and sharing their metabolomics data and analysis workflows. This partnership builds on the recently completed BBSRC award (BB/J020265/1) to the University of Oxford and BGI/GigaScience that kicked off the work around data sharing in metabolomics and omics, delivering two ISA-related events: the first in Hong Kong, 2014 (ISA hackathon - Bring Your Own Data Party) and the second in Oxford, 2015 (Hack-the-Spec - ISA as a FAIR research object). The outcomes will be updated in the next reporting period.
Sectors Agriculture, Food and Drink,Chemicals,Digital/Communication/Information Technologies (including Software),Education,Environment,Healthcare

URL http://cuddel.net/
 
Description A public presentation was given by Susanna-Assunta Sansone at the Innocentre in Kowloon and was attended by representatives from Hong Kong universities, Hong Kong Science Park companies and the Taylor and Francis publishing group. In her presentation, Susanna showed how the global science community is working on the FAIR data initiative to make data Findable, Accessible, Interoperable and Re-usable. With EU open science programs, the GO FAIR initiative and the NIH Big Data to Knowledge program in place, what does Hong Kong need to do to keep up with these global policy movements? This talk was very pertinent to Hong Kong since its research universities produce data that are not easily accessible. Another activity was a tutorial on the Common Workflow Language (CWL): a number of employees from GigaScience's parent company, BGI, travelled from Shenzhen to Hong Kong to attend the CUDDEL workshop. In addition to learning about the PhenoMeNal project from Reza Salek (EBI), they and the other workshop attendees on the CUDDEL grant received a hands-on tutorial from Michael Crusoe of the CWL project on how to describe workflow analyses in CWL. There was a further CUDDEL workshop where the longitudinal study data set was used to improve the tools and workflows provided to the scientific community, enabling them to handle longitudinal data in the future.
First Year Of Impact 2018
Sector Education,Pharmaceuticals and Medical Biotechnology
Impact Types Societal,Economic,Policy & public services

 
Title Continued improvements to the ISA toolkit and the new Datascriptor component 
Description Started in 2003 and first released in 2007, the ISA tools (http://isa-tools.org) have been developed over time by the Oxford team and collaborators, or contributed directly by partners, via the ISA Commons collaborative community (https://www.isacommons.org). Key work over the last year is the development of a new component, the Datascriptor (https://datascriptor.org), as part of the Wellcome Trust award (2018-2021), a collaborative project with the University of Cambridge's InterMine team. Leveraging our experience and links with the communities, we are designing an open-source web-based tool - part of an ecosystem of existing annotation and authoring systems - to help researchers use community standards to describe their (meta)data at the source, and capitalize on their effort to accelerate the creation of a data article. In addition, major advances have been made to the ISA API, working also with the ELIXIR Plant and Metabolomics communities.
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact Community use and impact are tracked via the ISA Commons, which currently has over 40 international groups, projects, and organizations that use and contribute to the development of components of the ISA metadata tracking framework. On this basis, we estimate that the ISA user base ranges from hundreds to thousands of researchers from increasingly diverse domains (including -omics, cell-based research, biomedical nanotechnology, plant phenotyping, toxicology, biodiversity, metagenomics, stem cell research, system biology, neuroscience, microbial science and immunology), and extends beyond researchers, curators, other resource developers and service providers to include journals. For example, ISA is used by Oxford University Press's GigaScience and underpins Springer Nature's data journal Scientific Data, supporting intelligent data sharing and credit; ISA is used to describe the experiment and to provide browse and search functionality for Scientific Data's content (http://scientificdata.isa-explorer.org). The ISA framework is currently embedded in a number of UK, EC, NIH and pharma-funded infrastructure and research projects; here are exemplars from the ELIXIR UK Node and other Nodes:
(i) EMBL-EBI MetaboLights' new web-based submission system relies on the ISA-JSON format to build web components and on the ISA-API to validate and convert experiments represented as ISA objects.
(ii) The BBSRC-funded COPO infrastructure relies on the ISA API, the ISA-JSON serialization and the ISA configurations to support plant-based molecular profiling experiments; it also uses the ISAconverter to deposit to the ENA database.
(iii) ELIXIR-UK Node partners, the University of Birmingham and Imperial College London, use the ISA Galaxy Tools, ISA-API and ISA validator - as part of their work in the UK Phenome Centre - to collect data prospectively and also to organise public deposition to repositories.
(iv) The ELIXIR Plant Community's MIAPPE standards and BrAPI rely on the availability of ISA parsers and validation tools in the context of data validation programs.
URL https://datascriptor.org
 
Title Longitudinal metabolomics data set 
Description This project involves reproducing the analysis of a metabolomics dataset using R and making its science as open as possible. The dataset was generated using UPLC-MS from samples of plasma taken from healthy volunteers undergoing a longitudinal study to determine the human metabolic profiles associated with food intake and exercise. Originally, the computational analysis of the mass spectrometry data was done using a combination of Matlab and Taverna workflows. Since Matlab requires a paid licence and it is not known whether the workflows are executable, the decision was made to reproduce the complete analysis in R. The data set has now been uploaded to MetaboLights (MTBLS82). The dataset will be publicly released on publication of a Data Note in GigaScience.
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact Until the dataset is published externally through MetaboLights and GitHub, it is not possible to see how it will be used. However, among the grant partners, it has enabled us to develop tools and workflows which are already used externally for the benefit of the scientific community.
 
Title Supporting data for "ISA API: An open platform for interoperable life science experimental metadata" 
Description The Investigation/Study/Assay (ISA) Metadata Framework is an established and widely used set of open-source community specifications and software tools for enabling discovery, exchange and publication of metadata from experiments in the life sciences. The original ISA software suite provided a set of user-facing Java tools for creating and manipulating the information structured in ISA-Tab - a now widely used tabular format. To make the ISA framework more accessible to machines and enable programmatic manipulation of experiment metadata, a JSON serialization ISA-JSON was developed. In this work, we present the ISA API, a Python library for the creation, editing, parsing, and validating of ISA-Tab and ISA-JSON formats by using a common data model engineered as Python object classes. We describe the ISA API feature set, early adopters and its growing user community. The ISA API provides users with rich programmatic metadata handling functionality to support automation, a common interface and an interoperable medium between the two ISA formats, as well as with other life science data formats required for depositing data in public databases. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
URL http://gigadb.org/dataset/100907
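
The entry above describes a common data model, engineered as Python object classes, that serves as an interoperable medium between a tabular and a JSON serialization. The following is a minimal sketch of that idea using only the Python standard library. It is not the ISA API: the class fields, the function names (`to_json`, `to_table`, `build_example`) and the output layouts are hypothetical and far simpler than the real ISA-Tab and ISA-JSON specifications.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Assay:
    # Hypothetical, reduced field set; the real ISA model defines many more.
    filename: str
    measurement_type: str
    technology_type: str


@dataclass
class Study:
    identifier: str
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    identifier: str
    title: str
    studies: List[Study] = field(default_factory=list)


def to_json(investigation: Investigation) -> str:
    """Serialise the nested object model to a JSON string."""
    return json.dumps(asdict(investigation), indent=2, sort_keys=True)


def to_table(investigation: Investigation) -> str:
    """Flatten the same model to tab-delimited text, one row per assay."""
    header = ["study_identifier", "assay_filename",
              "measurement_type", "technology_type"]
    rows = ["\t".join(header)]
    for study in investigation.studies:
        for assay in study.assays:
            rows.append("\t".join([study.identifier, assay.filename,
                                   assay.measurement_type,
                                   assay.technology_type]))
    return "\n".join(rows)


def build_example() -> Investigation:
    """Build a small illustrative investigation (all values are made up)."""
    assay = Assay("a_example_ms.txt", "metabolite profiling",
                  "mass spectrometry")
    study = Study("S1", "Example longitudinal plasma study", [assay])
    return Investigation("I1", "Example investigation", [study])
```

Calling `to_json(build_example())` and `to_table(build_example())` produces two serializations of the same objects, which is the design point: tools operate on one in-memory model and pick the output format at the end.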
 
Description ELIXIR Interoperability Platform and ISA 
Organisation ELIXIR
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution ISA is part of the ELIXIR Recommended Interoperability Resources (RIRs) to facilitate interoperability and reusability of life science data and support the principles of FAIR data management.
Collaborator Contribution The ELIXIR Recommended Interoperability Resources have been selected by an external panel of reviewers, based on the selection criteria published in the Call for RIR application, which measure how they facilitate scientific research and how they improve the FAIRness of life science data.
Impact ISA is and will continue to be used by and further developed with ELIXIR communities, especially with Plant and Metabolomics use cases.
Start Year 2018
 
Description ELIXIR Metabolomics Community 
Organisation ELIXIR
Department ELIXIR UK
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution My team has contributed ISA-related work to the ELIXIR Metabolomics use case, activities and reports.
Collaborator Contribution We have gained more visibility for the ISA work and now ISA-Tab is a formal format used by the Galaxy analysis toolkit for metabolomics applications.
Impact The ISA framework is the basis for the metadata standards used by the ELIXIR Metabolomics Community, and the tools are embedded in the EBI MetaboLights database, as well as in other international metabolomics resources.
Start Year 2017
 
Title Datascriptor 
Description From structured dataset to data article. Leveraging our experience and links with the communities, we are now designing an open-source web-based tool - part of an ecosystem of existing annotation and authoring systems - to help researchers to use community standards to describe their (meta)data at the source, and capitalize on their effort to accelerate the creation of a data article. The user will be guided to provide (semi)structured descriptions of the experimental design, and of the post-processed data, to generate, respectively, the Methods and a set of statements to populate the Results section of a manuscript. Datascriptor will work: (i) as a stand-alone tool - for anyone to use - implementing generic metadata models, such as W3C Data Catalog vocabulary; and (ii) as a component of the ISA Tools - for its user communities - implementing the ISA metadata model. To output short sentences from the (semi)structured input, we will evaluate a mixed data-to-text approach using template-based and neural-based (i.e. machine learning) methods. To further enrich the content of the manuscript, Datascriptor will connect to existing authoring systems, including Substance, Texture, Stenci.la and Manuscripts, and export the result in JATS format. Our plans also include an export as a DAR file and in LaTeX format. 
Type Of Technology Webtool/Application 
Year Produced 2019 
Open Source License? Yes  
Impact Work has just started, but to ensure continued impact in the stakeholder community, the Datascriptor User Advisory Board includes a core group of existing collaborators: Thomas Lemberger (EMBO Press), Scott Edmunds (GigaScience), Holly Murray (F1000) and Varsha Khodiyar (Springer Nature).
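
The Datascriptor entry above mentions template-based methods for turning a (semi)structured description of the experimental design into short sentences for a Methods section. The following is a minimal sketch of the template-based half of that approach, using the standard library's `string.Template`. It is not Datascriptor code: the template wording, the field names and the example values (including the subject and time-point counts) are hypothetical and chosen only for illustration.

```python
from string import Template

# Hypothetical sentence template; field names are illustrative assumptions.
METHODS_TEMPLATE = Template(
    "Samples of $material were collected from $n_subjects $subject_type "
    "at $n_timepoints time points and analysed by $technology."
)


def describe_design(design: dict) -> str:
    """Fill the Methods template from a structured design description.

    Raises KeyError if a field required by the template is missing, so an
    incomplete description is caught instead of yielding a broken sentence.
    """
    return METHODS_TEMPLATE.substitute(design)


def example_design() -> dict:
    """Return a made-up structured description of a longitudinal study."""
    return {
        "material": "plasma",
        "n_subjects": 10,          # illustrative value, not from any dataset
        "subject_type": "healthy volunteers",
        "n_timepoints": 4,         # illustrative value, not from any dataset
        "technology": "UPLC-MS",
    }
```

`describe_design(example_design())` returns a single sentence built from the structured fields. Using `substitute` rather than `safe_substitute` is a deliberate choice here: a missing field fails loudly, which suits generated manuscript text.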
 
Title ISA-API (ISA-tools) GitHub repository: extension of the ISAcreate mode of the ISA-API to support longitudinal, repeated-exposure treatments
Description Extension of the ISAcreate mode of the ISA-API (ISA-tools GitHub repository, https://github.com/ISA-tools/isa-api) to support longitudinal, repeated-exposure treatments. Testing against the GSK use case; testing against more complex use cases (nutrition studies, clinical trials).
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact This will allow the collation of longitudinal studies in metabolomics in a more usable manner. 
URL https://github.com/ISA-tools/isa-api
 
Title mzml2isa (ISA-tools) 
Description Functionalities in the ISA-API and mzml2isa packages were extensively revised during the workshop. Revision of certain functionalities for both packages is still ongoing. A number of additional functionalities were written and implemented during the workshop to allow for a more extensive workflow to create ISA-TAB files (e.g. mzML files and user-based input prior to data collection). 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact Improving the ability of the scientific community to manipulate longitudinal studies 
 
Description Biohackathon; ELIXIR, Paris 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The team participated in several tracks, working especially on ISA for the plant and metabolomics communities, on its use in Galaxy, and on the Bioschemas work. The work carried out continues to embed ISA and FAIRsharing into ELIXIR-driven infrastructure and activities.
Year(s) Of Engagement Activity 2018
URL https://www.elixir-europe.org/events/biohackathon-2018-paris
 
Description CUDDEL Hong Kong 2017 workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Study participants or study members
Results and Impact The main aim of the CUDDEL workshop was to investigate how to implement a metabolomics data analysis to be as reproducible as possible using publicly available online tools. Work towards this aim was started on a case study using an unpublished liquid chromatography-mass spectrometry (LC-MS) dataset.
Year(s) Of Engagement Activity 2017
 
Description CUDDEL closing workshop/hackathon, EBI 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Closing workshop of the CUDDEL grant, following up on issues outstanding from the 2017 Hong Kong workshop, with discussion to explore the feasibility of a follow-up BBSRC Partnering application in the future.
Year(s) Of Engagement Activity 2018
URL https://github.com/ISA-tools/cuddel-mzml2isa-enhance
 
Description Datascriptor hackathon - eLife Innovation Sprint 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Hackathon on the Datascriptor prototype, part of the ISA toolkit. Datascriptor aims to take the pain out of beginning to write papers, making it easy to automatically generate the parts of a paper that can be readily scaffolded, and incentivising reproducible papers by ensuring the scaffolds include well-structured data and metadata. During the online event the prototype was fleshed out through user testing with hands-on use cases.
Year(s) Of Engagement Activity 2020
URL https://sprint.elifesciences.org/data-paper-skeleton-tools-for-life-sciences/
 
Description EMBO Practical Course on Metabolomics Bioinformatics for Life Scientists 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact This course provided an overview of key issues that affect metabolomics studies, the handling of datasets, and procedures for the analysis of metabolomics data using bioinformatics tools. It was delivered using a mixture of lectures, computer-based practical sessions and interactive discussions. The course provided a platform for discussion of the key questions and challenges in the field of metabolomics, from study design to metabolite identification. During this course the delegates learned about:
- Metabolomics study design, workflows and sources of experimental error, and the difference between targeted and untargeted approaches
- Metabolomics data processing tools: hands-on use of open-source R-based programs, XCMS, MetFrag, MetFusion, rNMR, BATMAN
- Metabolomics data analysis: using R Bioconductor, understanding the use of univariate and multivariate data analysis, data fusion concepts, data clustering and regression methods
- Metabolomics downstream analyses: KEGG, BioCyc, and MetExplore for metabolic pathway and network analysis with visualisation of differential expression, understanding metabolomics flux analysis
- Metabolomics standards and databases: data dissemination and deposition in the EMBL-EBI MetaboLights repository; PhenoMeNal, workflows4metabolomics
- Metabolomics Flux and Stable Isotope Resolved Metabolomics (SIRM)
Year(s) Of Engagement Activity 2018
URL https://www.ebi.ac.uk/training/events/2018/embo-practical-course-metabolomics-bioinformatics-life-sc...
 
Description Metabolomics 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact This is the largest yearly International Metabolomics conference, the 13th International Conference of the Metabolomics Society
Year(s) Of Engagement Activity 2017
URL http://metabolomics2017.org
 
Description Metabolomics Data Sharing Hackathon - Hong Kong 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The hackathon, which focussed on the computational pipelines and tools for processing and analysing metabolomics data, was held at the BGI GigaScience office in Hong Kong, with the EBI, the Universities of Birmingham, Manchester and Oxford, the Sainsbury Laboratory and the Genome Analysis Centre (now Earlham), and guests from Australia. This hackathon builds on the recently completed BBSRC award (BB/J020265/1) to the University of Oxford and BGI/GigaScience that kicked off the work around data sharing in metabolomics and omics, delivering two ISA-related events: the first in Hong Kong, 2014, and the second in Oxford, 2015.
Year(s) Of Engagement Activity 2017
URL http://gigasciencejournal.com/blog/cuddeling-up-to-metabolomics-in-hong-kong
 
Description Metabolomics Workflows 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact This course will provide an introduction to metabolomics data analysis using publicly available software and tools. Participants will become familiar with the current state of data sharing and data standards in metabolomics, particularly through using the EMBL-EBI's MetaboLights repository. In addition, participants will have a hands-on session using the PhenoMeNal Compute infrastructure. There will be a large practical component, where participants will learn through hands-on tutorials on data submission, and on using a workflow-based approach and compute infrastructure for data analysis, under the guidance of the lecturers and teaching assistants.
Syllabus, tools and resources:
- OpenMS workflows
- Knime workflows
- PhenoMeNal
- MetaboLights
Outcomes:
After this course you should be able to:
- Use open source software and web-based tools to construct metabolomics analysis workflows
- Access metabolomics databases
- Discuss the issues of data sharing and data standards in metabolomics
Year(s) Of Engagement Activity 2017
URL https://www.ebi.ac.uk/training/events/2017/metabolomics-workflows
 
Description Open Data Hong Kong Debate- presentation 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Organised by Knowledge Dialogues and Open Data Hong Kong, the presentation was given by Prof. Sansone to representatives from Hong Kong universities, Hong Kong Science Park companies and the Taylor and Francis publishing group. The presentation showed how the global science community is working on the FAIR data initiative to make data Findable, Accessible, Interoperable and Re-usable. This talk was very pertinent to Hong Kong since its research universities produce data that are not easily accessible.
Year(s) Of Engagement Activity 2017
URL https://www.oerc.ox.ac.uk/news/professor-susanna-assunta-sansone-explores-open-data-hong-kong
 
Description Poster presentation: ISAcreate and Galaxy; Galaxy conference, Portland 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The ISA-Tab format is now used by Galaxy tools; the discussion helped to ensure that uptake continues.
Year(s) Of Engagement Activity 2018
URL https://gccbosc2018.sched.com/event/FEWs/g26-isacreate-a-galaxy-tool-for-prospective-data-management...
 
Description UK-China research partnership (Cuddel) workshop 2016, EBI, November 23-25 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact In total, 15 participants from GigaScience (China), EMBL-EBI, the University of Birmingham and the University of Oxford attended this three-day workshop, held at EMBL-EBI on 23-25 November 2016.
The meeting combined discussion and hands-on coding related to metabolomics data sharing and standards. Details:
- Updates on the ISA Model specification 2.0, and on ML activities and PhenoMeNal
- Main objectives: development of the mzml2isa tool; integration with ML, Galaxy and PhenoMeNal; working on examples generated for submission
- Integration with the ISA API and application of ISA 2.0 to metabolomics data sets: the GSK longitudinal dataset was examined and used, as was the BGI lipidomics dataset from the GigaScience paper in review, "Lipidomic profiling reveals progressive changes of plasma lipids from normal to type 2 diabetes"
- Generation and visualisation of ISA files in GigaDB
- Other activities: identification of missing configuration files; updates on capturing imaging datasets; link and collaboration with MetaSpace; capturing SIRM data; capturing chromatography datasets
Year(s) Of Engagement Activity 2016