Standardised metabolite annotation workflows for enhanced biological interpretation in metabolomic data repositories

Lead Research Organisation: University of Birmingham
Department Name: Sch of Biosciences

Abstract

Metabolites are small biochemicals that play many important roles in biological systems, including metabolism. They are studied in many different biological systems, including microbes, plants and humans, to benefit the human population through increased crop yields, the manufacture of drugs, and an understanding of how humans age and how that process can be modified to improve our health. The study of metabolites in biological systems is called metabolomics, which aims to study thousands of metabolites and investigate the biological processes they are involved in. At the start of a metabolomics study it is not known which metabolites will be detected. Instead, the raw analytical data are translated into more biologically meaningful data during the study by identifying the chemical structure of each metabolite, for example to establish that a metabolite is glucose or a lipid. This conversion of data to an identified metabolite is required to derive biological conclusions: without metabolite identification, no biological information can be reported. Many metabolomic studies are made available to the whole scientific community in data repositories. One large data repository, MetaboLights, is located in the UK, and the other, the Metabolomics Workbench, is located in the USA. Together these two repositories contain information from nearly 2000 metabolomic studies performed on microbes, plants and mammals, including humans. Across all studies, up to 89% of detected metabolites are not identified and have no chemical structure assigned to them, so a significant amount of the deposited information cannot be used to derive biological knowledge. This project will construct a computational workflow to assign chemical structures to the majority of metabolites in datasets already present in both repositories, and the workflow will also be applied to all future datasets deposited in them.
On completion, the biological information available in the two data repositories will be greatly expanded, allowing further biological knowledge to be derived.

Technical Summary

Metabolomics is a commonly applied scientific tool used to study qualitative and quantitative changes in the metabolite composition of biological samples in relation to agriculture, biotechnology, human health and ageing. The chemical or structural identification of metabolites is a crucial step in untargeted metabolomic studies but is currently limited by our knowledge of the parts lists of metabolomes, by our understanding of how metabolites are chemically identified and by the availability of authentic chemical standards from which to derive data for metabolite identification. Although our understanding of metabolome composition and metabolite detection is increasing, metabolite identification remains the rate-limiting step in untargeted metabolomic studies. This is evident in the data held in metabolomic data repositories, the two largest of which are MetaboLights and the Metabolomics Workbench. Of the features detected in liquid chromatography-mass spectrometry (LC-MS) datasets deposited in MetaboLights, 89% were structurally unidentified. Large volumes of biological data are therefore accessible to scientists globally but cannot be translated into biological conclusions. The proposed research will develop an integrated computational workflow to annotate metabolites in untargeted LC-MS and nuclear magnetic resonance (NMR) datasets and will apply it to all untargeted LC-MS and NMR metabolomics datasets submitted to MetaboLights and the Metabolomics Workbench (approximately 2000 datasets). The research team will also disseminate the open access computational tool and will develop and release open access training courses on operating the workflow.

Planned Impact

Academic and industrial research groups, commercial companies and the research staff employed on the proposed research will see a number of direct and indirect benefits. Many national and international academic groups and businesses will benefit from publicly accessible datasets containing significantly increased numbers of identified metabolites. The beneficiaries include:
1. Academic researchers performing non-targeted metabolomics using LC-MS and NMR. The resource developed will benefit research into microbes, plants and animals in areas including synthetic biology, crop production and human ageing in two ways: (i) an open access computational resource available to all researchers globally and (ii) access to approximately 1900 currently deposited metabolomic datasets with enriched numbers of annotated metabolites.
2. Industry scientists performing metabolism research, who can benefit in the same ways as academic researchers, for example in applications underlying the production of pharmaceuticals and chemicals and in improved crop production.
3. Government agencies in the UK performing metabolism research, who can benefit in the same ways as academic researchers. For example, the Department for Environment, Food and Rural Affairs, through the FERA facility, applies non-targeted metabolomics to food safety and food authenticity testing and to crop protection.
4. Commercial instrument suppliers, specifically those supplying mass spectrometers and nuclear magnetic resonance spectrometers, as the resource will be applicable to a range of analytical platforms from different suppliers.
5. Post-doctoral research associates employed during the research, who will benefit through training in different scientific disciplines and through personal and organisational development.

We will apply multiple approaches to engage with the metabolomics community and disseminate the resource including:
Strategy 1. Feedback during design process. To ensure that feedback from the metabolomics community is included in the design process, we will hold two events at which the community can provide feedback. The first will be an online interactive webinar forum, scheduled for months 9 and 14 and run over Zoom, where the work will be presented and feedback collected and acted upon. WD has significant experience of operating such events for online training courses at UB. The second will be a face-to-face workshop at the international Metabolomics Society annual conference in June 2021, where the work will be presented and feedback collected and acted upon.
Strategy 2. Dissemination and training. These will be an essential output of the proposed research, raising awareness of the new software workflow and its applications and training users in how to operate the new software package. The workflow will be hosted at EMBL-EBI in MetaboLights Labs and will have its own website with an SOP, example data and training videos/online tutorials. The work will be disseminated through publications describing new and innovative components of the workflow and through an application-based publication demonstrating the workflow's operability. It will also be disseminated through workshop presentations and oral/poster presentations at national (e.g. MetaboMeeting) and international (e.g. Metabolomics Society annual meeting) conferences. Two different routes of training will be operated. The first will be inclusion in face-to-face training courses operating at the Birmingham Metabolomics Training Centre (WD is Director), the Imperial International Phenome Training Centre (TE is co-director) and EMBL-EBI (operated through CO). Examples of courses where the workflow will be demonstrated include three face-to-face courses (one each at UB, ICL and EMBL-EBI) and one online SPOC course (UB).

 
Description: 2020BBSRC-NSF/BIO: Linking Mass Spectrometry Computational Ecosystems to Enhance Biological Insights of Publicly-Available Data
Amount: £638,975 (GBP)
Funding ID: BB/W002345/1
Organisation: Biotechnology and Biological Sciences Research Council (BBSRC)
Sector: Public
Country: United Kingdom
Start: 01/2022
End: 12/2024
 
Description: Collaboration with Dr Tim Ebbels (Imperial College London)
Organisation: Imperial College London
Country: United Kingdom
Sector: Academic/University
PI Contribution: My research team leads the project grant and its management and provides scientific input to the work being performed by the Ebbels research team.
Collaborator Contribution: Dr Tim Ebbels is a co-investigator on this grant, and the collaborative research has further driven tool development for LC-MS data processing and metabolite annotation.
Impact: None currently
Start Year: 2020