Implementing Metabolomics Analyses into Galaxy Workflows: Towards Genome-Metabolome Large-Scale Data Fusion

Lead Research Organisation: University of Birmingham
Department Name: Sch of Biosciences

Abstract

"Genomic and post-genomic studies are transforming our mechanistic understanding of organism-environment interactions." While this statement is certainly true, it masks many of the major challenges that have had to be overcome during the last decade. Today, genomics approaches are widely used by researchers from across the breadth of NERC science, utilising established (and ever cheaper) technologies and analysis pipelines, and delivering high impact publications. The same cannot yet be said for metabolomics, which is a considerably less mature approach, both analytically and computationally. The analytical challenges in metabolomics have restricted its use to experts of analytical chemistry, while the computational challenges have restricted the knowledge that can be mined from these rich datasets. Here we address the latter point, drawing from the wisdom and experience of genomics researchers.

One of the reasons for the success of environmental genomics is that biologists, without an in-depth knowledge of biostatistics and programming, have been able to construct and execute Next Gen Sequencing (NGS) data analyses using standardised workflows. Galaxy (http://galaxyproject.org/) - headlined as "Online bioinformatics analysis for everyone" - has emerged as the leading open-source workflow platform for NGS data analysis, with many standard processing tools accessible from its Web-based user interface. This workflow software is also being applied successfully to proteomics and chemo-informatics. Researchers at BGI (Beijing Genomics Institute) in China, our Project Partner on this application, have considerable expertise in Galaxy, since this web-based data analysis and workflow system forms the basis of its data analysis platform. They also have close links with the Galaxy development team.

We propose to 'hop' Dr Davidson from Professor Viant's environmental metabolomics laboratory and NBAF-B at the University of Birmingham into a computational laboratory at BGI-Hong Kong. Here he will gain specialist expertise in Galaxy workflows and implement our existing metabolomics pipelines into Galaxy. This is an extremely important step towards making metabolomics analysis pipelines more effective (by integrating powerful algorithms from the ever-growing toolbox of metabolomics analysis methods), more standardised (enabling greater cross-comparison of results from different studies), and considerably more accessible to biologists. Our aim is for both data and analysis tools to be accessible from a software platform that provides a single, user-friendly interface for developing computational pipelines in a form that can be shared and reused by the environmental community. Ultimately this will facilitate the integration of genomic and metabolomic datasets, enabling novel studies of the mechanisms underpinning stress responses of organisms within our environment. Here we will focus on the analysis of multi-omics datasets of Daphnia spp., to further investigate the molecular responses to environmental toxicants.

Our international team of investigators provides a unique combination of expertise spanning environmental metabolomics (Viant, Davidson), environmental genomics (Colbourne, Zhou) and computational workflows (Li), and its members are all strongly tied by a common interest and track record in the handling, analysis and interpretation of large-scale 'omics datasets. While Colbourne, Davidson and Viant are based in the School of Biosciences, University of Birmingham, and Li and Zhou reside at BGI in China, all investigators are part of the newly launched Joint BGI-Birmingham Environment and Health Centre at Birmingham that will provide a world-class academic, research and training environment for the integration of state-of-the-art sequencing, metabolomic and bioinformatics technologies.

Planned Impact

We anticipate that many different groups will benefit directly from the developed workflows, including:
(i) academic researchers who are currently using or seeking to use metabolomics technologies in their research programmes (see separate section on JeS);

(ii) private sector scientists conducting applied research who use or plan to use metabolomics technologies for environmental analyses, e.g. Syngenta. The increased effectiveness of metabolomics research within the industrial sector could ultimately increase its economic competitiveness;

(iii) private sector scientists developing and marketing hardware and software solutions for metabolomics, e.g. Thermo Fisher Scientific, with whom we collaborate extensively. There is significant potential for increased marketing of software products from commercial providers if these are compatible with a standardised workflow system such as Galaxy. We will explore the compatibility of various Thermo Fisher Scientific software products and feed this information back to Thermo scientists during our scheduled project meetings (as part of other BBSRC and NERC projects in collaboration with this company);

(iv) UK national 'omics' facilities that conduct a wide range of analyses for the public and private sector, e.g. BBSRC metabolomics facility at Rothamsted, MRC Lipidomics facility at Cambridge, and the NERC environmental metabolomics facility at Birmingham. Increasing the ease of metabolomics data analysis will be far reaching, i.e. not limited to only the environmental sector. The University of Birmingham is the Sector lead for metabolomics within the UK's bid for an ELIXIR Training Node (the emerging pan-European research infrastructure for biological information) and this is anticipated to offer opportunities to disseminate the Galaxy workflows, benefiting scientists across Europe.

(v) through our BGI Project Partners, we will have access to the Galaxy development team and will highlight to them any specific needs of the metabolomics as well as environmental science communities.

(vi) the wider metabolomics research community will benefit from a better understanding of the knowledge gaps in standardising the tools, data formats and workflows in data analysis pipelines. As a co-Chair of the international Metabolomics Standards Initiative and President of the international Metabolomics Society, Viant sits on several relevant committees and can readily share this new knowledge with the community.
 
Description We successfully discipline-hopped Dr Robert Davidson from the School of Biosciences, University of Birmingham, to a computational laboratory at GigaScience (part of BGI) to receive specialist training in Galaxy workflows, in particular their utilisation for genomics pipelines. Dr Davidson learnt all that was required during his visit of a few months, and then returned to Birmingham where he set about implementing our existing metabolomics software tools into Galaxy workflows. This included the signal processing and analysis of both direct infusion mass spectrometry and liquid chromatography-mass spectrometry based metabolomics. The work proceeded so well that we expanded on our initial objectives, and additionally implemented the tools that we have developed for metabolite identification into Galaxy. We established a local Galaxy instance for use internally at Birmingham. The manuscript (for the open access journal GigaScience) has just been published and hence we are now widely disseminating the Galaxy workflows to benefit the wider community.
Exploitation Route This will allow others to readily use our computational metabolomics pipelines
Sectors Agriculture, Food and Drink; Environment; Healthcare

 
Description Both objectives for this NERC Discipline Hop have been met and in addition a manuscript has been published. Through conferences, the publication and other networking we have publicised that we have built some of the first Galaxy pipelines for metabolomics (including the first pipelines in the UK, and the first pipelines for direct infusion mass spectrometry metabolomics internationally). This has resulted in several groups wishing to collaborate with us: 1. Building links to the proteomics community via the University of Liverpool, 2. Building links to the French Galaxy initiative, with whom we have agreed to exchange and continue to co-develop Galaxy pipelines, and 3. Continuing to develop the research collaboration with GigaScience, including the application for travel funds to aid this wider collaboration. Furthermore, the new Galaxy pipelines will be available to users of the NERC Biomolecular Analysis Facility's metabolomics node.
First Year Of Impact 2015
 
Description Influencing the development of much needed training in OMICS technologies (in particular metabolomics)
Geographic Reach Multiple continents/international 
Policy Influence Type Influenced training of practitioners or researchers
 
Title Galaxy-M metabolomics workflows 
Description Metabolomics data processing and analysis workflows embedded into Galaxy 
Type Of Material Data handling & control 
Year Produced 2014 
Provided To Others? Yes  
Impact International networking; other labs wanting us to join research grant applications 
 
Description Research collaboration with GigaScience
Organisation GigaScience
Country United Kingdom 
Sector Private 
PI Contribution Provide domain expertise in metabolomics
Collaborator Contribution Provide technical expertise in tools such as Galaxy; facilitate a link to IT activities in China
Impact See publications
Start Year 2013
 
Description First ever Massive Open Online Course (MOOC) on metabolomics titled 'Metabolomics: Understanding Metabolism in the 21st Century' 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact We developed and ran the first ever Massive Open Online Course (MOOC) on metabolomics, title as above. The course ran for 4 weeks with >2000 active learners.
Year(s) Of Engagement Activity 2015
URL https://www.futurelearn.com/courses/metabolomics
 
Description How to improve our engagement with industry (MetaboNews) 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Article - 'The importance of industry partnerships in metabolomics research: from vendors to technology users' by Warwick Dunn, Ute Roessner and Mark Viant, MetaboNews Issue 29, January 2014 http://www.metabonews.ca/Jan2014/MetaboNews_Jan2014.htm

I was contacted by companies who were enthused by what we had written. An Industry Engagement Task Group was formed within the international Metabolomics Society.
Year(s) Of Engagement Activity 2014