Closing the gaps in metabolomics - Identifying unknown metabolites and mapping onto biochemical pathways

Lead Research Organisation: European Bioinformatics Institute
Department Name: Chemoinformatics and Metabolism

Abstract

Metabolic profiling provides one of the most complete sources of information for pinpointing physiological conditions at a moment in time. It therefore provides a vehicle for understanding the cellular processes involved in both normal functioning and dysfunctions caused by systemic diseases, mapping the interactions of the organism with its environment as mediated by genetic factors. Identification of the small-molecule metabolites in the measured samples is essential to facilitate downstream research into underlying mechanisms.

Current analysis techniques rely on the identification of known metabolites that are already mapped to reference metabolic networks and pathways, yet such reference knowledge about the metabolome is far from complete. One of the major current challenges for the metabolomics/metabonomics community is the extensive and increasing number of unknown metabolites detected as instrumental sensitivity, dynamic range, acquisition speed and mass accuracy increase exponentially. Knowledge about the structural identity of metabolites is essential to understand their interaction with their biological targets - an understanding that can then aid the development of methods to interact with biological processes in, for example, chemical biology. Identification of unknowns is a major unsolved problem in metabolomics, requiring time-consuming further experimental work. We propose to develop a tool that addresses this gap by predicting unknown metabolites and projecting metabolic sample data containing knowns and "predicted unknowns" onto biochemical network and pathway knowledge bases. We will build on state-of-the-art methods for structure elucidation and develop methods for bioinformatics reasoning about the likelihood that predicted structures fit into the biological context of the metabolic sample. This tool will be made available within the context of the BBSRC-funded cross-species, cross-platform MetaboLights database and repository for metabolomics experimental data, and will additionally be provided as open source, thus benefiting the broadest number of academic and industrial researchers. The work will be based on open data and will be published as open source software and open access publications, and will thus be accessible to all interested parties.

Technical Summary

For many organisms of interest to biological research, only a fraction of the metabolome is known. More than half of all signals routinely detected in an MS/MS metabolomics experiment are unknowns, and many metabolites still cannot be identified; this remains a significant unsolved problem. We propose to develop a tool that builds on existing work in our group on I) cheminformatics-based structure elucidation of unknowns in organic chemistry, II) the use of existing reference databases of spectroscopically characterised compounds to assist with metabolite identification and III) bioinformatics reasoning to assess the likelihood that a suggested structure fits into the biochemical repertoire of an organism's metabolism and a specific pathway, once the general chemical group has been identified.

For point I) we will extend our existing infrastructure for the stochastic screening of chemical spaces for compounds with a given set of computable properties to the problem of structure elucidation in metabolomics. The very large spaces to be screened when little spectroscopic information is available will be constrained through a yet-to-be-developed notion of biosynthetic accessibility within the organism, tissue or cell type under investigation. These reduced candidate spaces will then be mapped to metabolites appearing in known pathways through the development of biologically relevant spectroscopic, chemical and semantic similarity measures based on the data mentioned under II) above.
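
As a rough illustration of the similarity-based mapping step, the sketch below ranks candidate structures by their best Tanimoto similarity to known pathway metabolites. It is only a sketch under stated assumptions: the feature sets, the metabolite and candidate names, and the use of plain Tanimoto similarity are all hypothetical stand-ins, not the spectroscopic, chemical and semantic measures the project proposes to develop.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_candidates(candidates, pathway_metabolites):
    """Score each candidate by its best similarity to any known metabolite."""
    ranked = []
    for name, features in candidates.items():
        best_name, best_score = max(
            ((known, tanimoto(features, known_features))
             for known, known_features in pathway_metabolites.items()),
            key=lambda pair: pair[1],
        )
        ranked.append((name, best_name, best_score))
    # Most pathway-like candidates first.
    ranked.sort(key=lambda row: row[2], reverse=True)
    return ranked


# Hypothetical substructure features; real descriptors would come from
# a cheminformatics toolkit and reference spectra.
PATHWAY_METABOLITES = {
    "known_A": {"carboxyl", "hydroxyl", "amine"},
    "known_B": {"phosphate", "hydroxyl", "ring"},
}
CANDIDATES = {
    "candidate_1": {"carboxyl", "hydroxyl", "amine", "methyl"},
    "candidate_2": {"halogen", "nitro"},
}

RANKED = rank_candidates(CANDIDATES, PATHWAY_METABOLITES)
```

Here `candidate_1` shares three of four features with `known_A` and ranks first, while `candidate_2` shares nothing with either known metabolite and would be a natural candidate for pruning.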

This tool will facilitate the integration of metabolomics fully into the rest of the bioinformatics analysis pipeline, supporting the identification of hypotheses for underlying disease mechanisms of action and pinpointing the mechanisms for individual differences in cellular phenotypes.

Planned Impact

A tool for the prediction of unknown metabolites in the context of metabolomics experiments and placing sample data in the context of biochemical pathways will benefit a number of significant communities performing biological research and development. Metabolomics is of major importance for our understanding of how biological systems behave under various conditions and for developing personalized medicine and nutraceuticals because metabolites, as end products of cellular regulatory processes, provide insights into the actual functioning of biological systems as a combined response of their genetic and environmental factors as well as diseases. Metabolic analysis is also less invasive than other techniques, as it can often be performed on saliva or small blood samples rather than whole tissues. The small molecule (metabolite) content of such body fluid samples can provide indicators of the presence or absence of disease and of the functioning of biological processes at the cellular level, early warning signs of adverse reactions, and more. Our proposal aligns with a number of strategic research priorities of the BBSRC. In systems approaches to biological research, metabolomics allows us to study how the metabolic system reacts to changes in the environment, to stress, to disease and other boundary conditions with high time resolution. Identification of metabolites is essential to understand their interaction with their biological targets - an understanding that can then aid the development of methods to interact with biological processes in, for example, chemical biology approaches. For ageing research, metabolomics is used to study and characterize states and dynamics of the ageing organism with no (urine) or low (blood) invasiveness, or through tissue analysis.
In both bioenergy research and crop science, metabolite identification allows us to study how plants or microbes used for energy harvesting react to environmental changes (robustness) or how their energy metabolism reacts to genetic manipulation or other perturbations (flexibility). The complexity of metabolism in the plant kingdom makes this a particularly challenging area. Small molecule metabolism is currently also becoming a major emphasis for UK industry, including the drug safety assessment process in the pharmaceutical industry, pesticide toxicology in agrochemicals, biomarker discovery for medical diagnostics and plant fitness for crop development. Metabolites are used a) as diagnostic biomarkers and b) for classifying patients by their phenotype. In the public sector, the flourishing fields of translational medicine and chemical biology will benefit from information about which metabolites in which pathways in human, animal or plant metabolism are affected and in which way.

The scientific communities will be informed about the developments associated with this proposal through a) presentations (talks and posters), b) workshops given by project members at scientific meetings and c) publications in peer reviewed journals and the member journals of the learned societies representing the communities mentioned above. As pointed out in the statement of data sharing, the software created in the course of this project will be fully open source and all the research will be conducted based on open data. All the results as well as the workflows and software themselves are therefore fully accessible and re-usable by the scientific community and secondary beneficiaries such as physicians or metabolic engineers. The European Bioinformatics Institute (EMBL-EBI) possesses a dedicated Outreach and Training department run by Dr Cath Brooksbanks and a team of eight co-workers. This department will coordinate a wide range of activities aimed at raising awareness about the proposed tool and associated activities of the EBI among potential users, our peers, our funders and the general public.

Publications

 
Description Metabolomics enables the study of how the metabolic system reacts to the environment. Metabolomics/metabonomics is a diverse field in terms of the technologies used for metabolic detection, as well as in terms of bio-sample diversity and fields of application such as environmental or nutritional. Yet much of the metabolome remains unknown. For many organisms of interest to biological research, only a fraction of the metabolome is known.
Towards addressing this gap we have been gathering reference metabolomes into our public MetaboLights repository. To date we have imported 11,168 metabolites, 8245 reference NMR and MS spectra and 1177 biological pathways from open databases into the repository. Besides data import, in order to support knowledge-based unknown metabolite prediction we have developed open source solutions for NMR and MS data processing, structure identification, unknown structure prediction and subsequent pathway mapping. We have incorporated methods to narrow down the list of possible unknown structures based on biological likelihood into our structure prediction tool and successfully demonstrated their application. Efforts are ongoing to consolidate the above tools and present them openly in the context of the MetaboLights database to facilitate further downstream analysis of metabolomics studies.
Exploitation Route The developed tools will be consolidated and made available within the online web access interface of the MetaboLights database, freely available on the web with no restrictions or obstacles to use by either industry or the general public. The open source nature of the tools enables other researchers to modify and enhance their functionality. Prediction of unknown metabolites in the context of metabolomics experiments and placing sample data in the context of biochemical pathways will benefit a number of significant communities performing biological research and development.
Sectors Agriculture, Food and Drink, Education, Environment, Healthcare

URL http://www.ebi.ac.uk/metabolights/reference
 
Description The tools we have developed for predicting unknown metabolites in the context of metabolomics experiments and placing sample data in the context of biochemical pathways benefit a number of significant communities performing biological research and development. The open-source nature of the reported tools enables others to freely access and modify them for their needs. Further, we have an ongoing effort to consolidate all our reported tools and make them freely available within the online web access interface of the MetaboLights database. The information derived using these tools about which metabolites in which pathways in human, animal or plant metabolism are affected will benefit the flourishing fields of translational medicine, ageing research and chemical biology.
First Year Of Impact 2014
Sector Agriculture, Food and Drink, Education, Environment, Healthcare
 
Description MetaboLights and MetaboAnalyst Collaboration 
Organisation University of Alberta
Country Canada 
Sector Academic/University 
PI Contribution MetaboAnalyst is a web application for comprehensive metabolomic data analysis. It is developed and hosted by Dr. Wishart's lab at the University of Alberta. The analysis tool is very popular and would benefit from powerful compute resources at the EBI to meet its user load. We are in the process of migrating the current R backend of the tool to an EBI in-house R backend to improve overall resource usage. Further, MetaboLights users will benefit from having an integrated analysis tool for easy and comprehensive analysis of a metabolomic dataset.
Collaborator Contribution Dr. Wishart's lab has provided the MetaboLights team with the code base for MetaboAnalyst and has a dedicated resource for combined development of MetaboAnalyst.
Impact - Shared code repository of MetaboAnalyst
Start Year 2014
 
Title MassCascade: Visual Programming for LC-MS Data Processing in Metabolomics 
Description We have developed MassCascade, an open-source library for processing LC-MSn metabolomics data, and its plug-in MassCascade-KNIME. The Java library can be used stand-alone or in combination with the plug-in. It comprises basic algorithms for frequent tasks in LC-MSn data processing. Through the plug-in, users can build complex workflows that go beyond the actual MassCascade functionality by combining it with other KNIME nodes for chem- or bioinformatics or with generic data analysis and visualisation tools.
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact No actual Impacts realised to date 
URL https://bitbucket.org/sbeisken/masscascadeknime/
 
Title NMR-FID tool 
Description Open source Java library to read NMR Free Induction Decay (FID) data and process it into frequency-domain spectrum information.
Type Of Technology Software 
Year Produced 2013 
Open Source License? Yes  
Impact No actual Impacts realised to date 
URL https://github.com/LuisFF/nmr-fid-tool
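
For readers unfamiliar with the FID-to-spectrum step, the following minimal sketch shows the underlying idea: a decaying time-domain signal is Fourier transformed, and the resonance appears as a peak in the frequency domain. It does not use or reproduce the NMR-FID tool's API; the function names and the synthetic single-resonance signal are assumptions made for illustration, and a practical pipeline would use a fast Fourier transform together with steps such as apodization and phase correction.

```python
import cmath
import math


def synthetic_fid(n_points, bin_index, decay_points):
    """Hypothetical FID: one complex oscillation with exponential decay."""
    return [
        math.exp(-n / decay_points)
        * cmath.exp(2j * math.pi * bin_index * n / n_points)
        for n in range(n_points)
    ]


def dft(signal):
    """Naive O(N^2) discrete Fourier transform (for clarity, not speed)."""
    n_points = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_points)
            for n, x in enumerate(signal))
        for k in range(n_points)
    ]


def peak_bin(spectrum):
    """Index of the frequency bin with the largest magnitude."""
    magnitudes = [abs(value) for value in spectrum]
    return magnitudes.index(max(magnitudes))


FID = synthetic_fid(n_points=64, bin_index=5, decay_points=20.0)
SPECTRUM = dft(FID)
PEAK = peak_bin(SPECTRUM)
```

The decay broadens the line around the resonance, but the maximum stays in the bin where the oscillation was placed.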
 
Title New Plug-in for SENECA 
Description A new evolutionary algorithm for structure elucidation is implemented in the open-source SENECA package for CASE and is available as a GUI client or as a stand-alone command-line executable. New fitness evaluators based on 13C NMR spectrum-to-structure associations in the NMRShiftDB database and an NP-likeness score have been integrated in the scoring function of the evolutionary algorithm scheme.
Type Of Technology Software 
Year Produced 2013 
Open Source License? Yes  
Impact No actual Impacts realised to date 
URL http://sourceforge.net/projects/seneca
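
To make the idea of an evolutionary search with a composite scoring function concrete, here is a toy sketch. It is not SENECA's algorithm or API: candidates are plain bit lists rather than molecular graphs, and `spectrum_score`, `likeness_score`, the weights and `TARGET` are all hypothetical stand-ins for spectral agreement and a likeness prior.

```python
import random

# Hypothetical encoding of the "observed" data the search should match.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]


def spectrum_score(candidate):
    """Fraction of positions matching TARGET (stand-in for spectral fit)."""
    return sum(c == t for c, t in zip(candidate, TARGET)) / len(TARGET)


def likeness_score(candidate):
    """Stand-in for a likeness prior: mildly favours fewer set bits."""
    return 1.0 - sum(candidate) / len(candidate)


def fitness(candidate, w_spectrum=0.8, w_likeness=0.2):
    """Weighted sum of the two evaluators."""
    return (w_spectrum * spectrum_score(candidate)
            + w_likeness * likeness_score(candidate))


def mutate(candidate, rng):
    """Flip one randomly chosen position."""
    child = list(candidate)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child


def evolve(generations=50, population_size=20, seed=0):
    """Elitist loop: keep the best half, refill with mutated survivors."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in TARGET]
                  for _ in range(population_size)]
    initial_best = max(fitness(c) for c in population)
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(population_size - len(survivors))]
        population = survivors + children
    best = max(population, key=fitness)
    return initial_best, best, fitness(best)


INITIAL_BEST, BEST, BEST_SCORE = evolve()
```

Because the survivors are carried over unchanged, the best score can never get worse from one generation to the next; the weights control how strongly the prior can pull the search away from a perfect match to the data.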
 
Title Pathway Layout Pipeline 
Description A Java-based pipeline that generates pathway layouts from a reactions/pathway data source.
Type Of Technology Software 
Year Produced 2013 
Open Source License? Yes  
Impact No actual Impacts realised to date 
URL https://github.com/pcm32/LayoutPipeline
 
Description Closing the gaps in metabolomics - Identifying unknown metabolites and mapping onto biochemical pathways 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact Knowledge sharing about a new open method in Computer Assisted Structure Elucidation.

Outreach on the basics of Computer Assisted Structure Elucidation and Metabolomics to a wider audience of young Indian graduates.
Year(s) Of Engagement Activity 2014
URL http://www.icbb.in/Oral_Presentation_Schedule_icbb2014.pdf
 
Description MetaboGaps poster at the ICBB 2014 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Poster session at the ICBB for MetaboGaps
Year(s) Of Engagement Activity 2014