Next generation tools for the annotation of metabolites in global LC-MS metabolomic studies

Lead Research Organisation: University of Birmingham
Department Name: Sch of Biosciences


The metabolism of a living organism reacts rapidly and sensitively to environmental change, disease conditions or simply the organism's age. Capturing how metabolism and metabolites change provides an exquisite insight into the health status of an individual. The discipline of metabolomics seeks to describe the entire population of metabolites in a cell or tissue. Its key challenge is to identify sometimes thousands of different molecules simultaneously. With its unmatched precision and sensitivity, mass spectrometry has become the tool of choice in this context. However, this technique requires ionized metabolites, so that they can be accelerated and analysed in an electromagnetic field. While ionization techniques are well established, the diversity of charged molecular species generated in this process is poorly understood. As a result, many metabolites are not identified and only a fraction of the data a mass spectrometric experiment provides really informs the biological conclusions. For the first time we have enough assets in our toolbox to assemble and optimise them into a new workflow. This is the TOOL we will construct, validate and apply to generate a new computational RESOURCE for the metabolomics community: publicly accessible software for performing metabolite annotation and calculating the statistical probability that each identification is correct. We will make this available via the BBSRC-funded MetaboLights database, to be supported long-term at the European Bioinformatics Institute in the UK. The new resource will be widely used, both nationally and internationally, by academic, government and industry scientists. All data will be free to access, training videos will be included and the resource will be widely publicised.
This cost-effective proposal will develop a new TOOL and a new RESOURCE and, by embedding them at the European Bioinformatics Institute, will transform the metabolomics community's ability to convert data into new knowledge, allowing metabolomics to deliver on its promises to achieve impact.

Technical Summary

The chemical identification of metabolites is a crucial step in untargeted metabolomic studies but is currently limited by our understanding of the complex processes operating in electrospray ion sources when coupled with liquid chromatography-mass spectrometry. Metabolite identification remains the rate-limiting step in metabolomics studies. Research by the investigators has shown that a large number of different adduct, isotopic, multiply charged and fragment peaks can be observed when analyzing crude biological extracts by liquid chromatography-electrospray mass spectrometry. Here we propose to further develop, validate and make publicly available an integrated computational workflow (PUTMEDID) for grouping peaks and annotating each metabolite with a molecular formula and metabolite name(s), together with a statistical score of confidence. We will (i) for the first time fully characterize the large diversity of adduct, isotopic, multiply charged and fragment peaks detected when analyzing crude biological extracts by liquid chromatography-electrospray mass spectrometry; (ii) further develop computational tools to identify all adduct, isotopic, multiply charged and fragment peaks in a sample-, instrument- and analytical method-specific manner; (iii) enhance a currently available computational resource (PUTMEDID) to increase the number of true positive annotations. To allow the greatest impact, the bioinformatics resource and associated code will be made available to researchers globally through incorporation in the well-maintained and stable computational infrastructure at the European Bioinformatics Institute in the UK. In summary we will DEVELOP A COMPUTATIONAL TOOL for enhanced metabolite annotation, APPLY the tool to construct an OPEN ACCESS RESOURCE for all researchers and DISSEMINATE AND TRAIN the scientific community. This innovative new approach will significantly enhance our capabilities to annotate all metabolites detected in metabolomics studies.
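
To illustrate why a single metabolite can appear as many peaks, the sketch below computes the m/z values expected for a few common electrospray ion forms of one neutral molecule. It is a minimal illustration and not code from PUTMEDID: the adduct table is a small, hand-picked subset, and the names `ADDUCTS`, `adduct_mz` and `neutral_mass_from_mz` are hypothetical.

```python
# Mass shifts (Da) for a few common electrospray ion forms, already corrected
# for the electron mass. Illustrative subset only, not a complete reference list.
PROTON = 1.007276

ADDUCTS = {
    # name: (number of molecules M in the ion, total mass shift in Da, absolute charge)
    "[M+H]+":   (1, PROTON, 1),
    "[M+Na]+":  (1, 22.989218, 1),
    "[M+K]+":   (1, 38.963158, 1),
    "[M+NH4]+": (1, 18.033823, 1),
    "[M+2H]2+": (1, 2 * PROTON, 2),
    "[2M+H]+":  (2, PROTON, 1),
    "[M-H]-":   (1, -PROTON, 1),
}


def adduct_mz(neutral_mass, adduct):
    """Return the m/z expected for a neutral monoisotopic mass under one ion form."""
    n_mol, shift, charge = ADDUCTS[adduct]
    return (n_mol * neutral_mass + shift) / charge


def neutral_mass_from_mz(mz, adduct):
    """Invert adduct_mz: recover the neutral mass implied by an observed m/z."""
    n_mol, shift, charge = ADDUCTS[adduct]
    return (mz * charge - shift) / n_mol


GLUCOSE = 180.063388  # monoisotopic mass of C6H12O6 in Da

if __name__ == "__main__":
    for name in ADDUCTS:
        print(f"{name:10s} {adduct_mz(GLUCOSE, name):.6f}")
```

Each entry in the table maps one neutral mass to a different m/z, so an unannotated peak list can contain several peaks per metabolite. Recognising these relationships is what allows the peaks to be collapsed back to a single neutral mass.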

Planned Impact

There are many national and international groups who will benefit from the publicly accessible metabolite annotation bioinformatics resource to be constructed. These include
(i) Academic researchers performing non-targeted metabolomics using LC-MS. The resource developed will benefit research into microbes, plants and animals in areas including synthetic biology, crop production and human ageing.
(ii) Industry scientists performing non-targeted metabolomics research with LC-MS. The resource developed will provide greater understanding of the metabolism underlying the production of pharmaceuticals and chemicals and in improved crop production.
(iii) Government agencies in the UK performing non-targeted metabolomics research with LC-MS platforms. For example, the Department for Environment, Food and Rural Affairs, through the FERA facility, applies non-targeted metabolomics to food safety and food authenticity testing and to crop protection.
(iv) Commercial instrument suppliers, specifically those supplying mass spectrometers as the resource will be applicable to a range of different mass spectrometers from different commercial instrument suppliers.
(v) A post-doctoral research associate employed during the research, through training in different scientific disciplines and through personal development.

There will be a number of direct or indirect benefits observed by academic and industrial research groups, commercial industrial companies, and the research staff employed for the proposed research. The first direct benefit will be the ability of academic and industrial research groups to perform higher quality biological research through increased abilities to annotate a larger number of metabolites in non-targeted metabolomics studies and therefore provide higher quality data for biological interpretation at the systems level. This will readily be achieved via the metabolite annotation bioinformatics resource. The ability of researchers to apply systems-level approaches to understand the interactions of metabolites with other metabolites and biochemicals and to be able to integrate large biochemical databases from holistic data acquisition at different functional levels is a growing requirement in biological research. The second benefit will be to provide a commercial impact through higher quality biological research performed in industry and which increases the efficiency of production of chemicals, pharmaceuticals and crops. The third benefit will be to provide a further commercial impact, specifically to mass spectrometer suppliers through their improved ability to annotate metabolites in biological studies and the impact this will have on mass spectrometer sales versus, for example, the relative decline in the use of NMR spectroscopy over the last few years. The public will indirectly benefit from the developed resource through higher quality biological research and the impact on improvements in crop and drug production and in our understanding of healthy ageing. Finally, the post-doctoral researcher and investigators performing the proposed research will benefit from training in new concepts in a multi-disciplinary environment.

Mechanisms for dissemination to scientists and to the general public are described in our Pathways to Impact document.
Description The software has been developed to allow scientists to identify metabolites in biological samples including blood and urine. The data analysed by the software are collected using a technique called electrospray mass spectrometry. The electrospray component of the mass spectrometer acts like a chemical reactor in which many different chemical forms of a metabolite can be produced. Through research in this grant we have discovered that the number of different types of chemical forms is much larger than we thought. By using these new data we have increased the accuracy of the software for metabolite identification.
Exploitation Route The findings related to the larger number of metabolite chemical forms will be published in the next 12 months and all results will be made available to the global metabolomics community for inclusion in other software packages.
Sectors Agriculture, Food and Drink; Environment; Healthcare; Pharmaceuticals and Medical Biotechnology

Description Standardised metabolite annotation workflows for enhanced biological interpretation in metabolomic data repositories
Amount £770,000 (GBP)
Funding ID BB/T007974/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 04/2020 
End 03/2023
Title BEAMS: Birmingham mEtabolite Annotation for Mass Spectrometry 
Description This computational research tool has three functions: (1) grouping of metabolite features in untargeted LC-MS datasets into metabolite-focussed groups; (2) calculation of the molecular formula of the neutral metabolite for each group; and (3) matching of the molecular formula to metabolites available in multiple open access databases. The tool is currently being tested and will be made available in Spring 2019.
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? No  
Impact Currently no notable impacts 
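
The first function described for the tool above, grouping features that belong to the same metabolite, can be sketched as follows. This is an illustrative assumption about how such grouping might work, not the BEAMS implementation: features that elute together and whose m/z values differ by a known adduct or isotope spacing are linked, and linked features are merged with a simple union-find. The difference table, tolerances and function name are all hypothetical.

```python
from itertools import combinations

# Mass differences (Da) between ion forms of the same metabolite.
# Illustrative subset; a real reference file would list many more.
KNOWN_DIFFS = {
    "[M+H]+ / [M+Na]+": 21.981942,
    "[M+H]+ / [M+K]+": 37.955882,
    "[M+H]+ / [M+NH4]+": 17.026547,
    "13C isotope": 1.003355,
}


def group_features(features, rt_tol=2.0, ppm=5.0):
    """Group (mz, rt) features that co-elute and differ by a known mass difference.

    Returns (groups, links): groups is a list of lists of feature indices,
    links records which pairs were joined and by which relationship.
    """
    parent = list(range(len(features)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    links = []
    for i, j in combinations(range(len(features)), 2):
        (mz_i, rt_i), (mz_j, rt_j) = features[i], features[j]
        if abs(rt_i - rt_j) > rt_tol:
            continue  # not co-eluting, so unlikely to be the same metabolite
        delta = abs(mz_i - mz_j)
        tol = max(mz_i, mz_j) * ppm * 1e-6  # absolute tolerance from ppm
        for label, diff in KNOWN_DIFFS.items():
            if abs(delta - diff) <= tol:
                parent[find(i)] = find(j)
                links.append((i, j, label))

    groups = {}
    for i in range(len(features)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values()), links


if __name__ == "__main__":
    # Hypothetical peak list: (m/z, retention time in seconds).
    peaks = [(181.070664, 120.0), (203.052606, 120.5),
             (182.074019, 120.2), (300.100000, 400.0)]
    print(group_features(peaks))
```

In this hypothetical peak list the first three features form one group (a protonated ion, its sodium adduct and its 13C isotope peak), while the fourth, which elutes much later, remains on its own. Molecular formula calculation and database matching would then operate on each group instead of on individual peaks.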
Description Collaboration with Thermo Fisher Scientific to develop new metabolomic assays and software for metabolic phenotyping 
Organisation Thermo Fisher Scientific
Country United States 
Sector Private 
PI Contribution 1. Development and validation of UPLC-MS assays for untargeted metabolomics. 2. Sharing of developed assays for distribution by Thermo Fisher Scientific. 3. Development of new metabolite annotation software. 4. Development of optimal approaches for the collection of MS/MS data in metabolic phenotyping.
Collaborator Contribution 1. Significant reduction in purchase costs of instruments (45% discount). 2. Early beta testing of new software and scientific instruments. 3. Priority engineer visits for scientific instruments.
Impact 1. Loan of a UPLC-MS instrument to the University of Birmingham to be applied for training courses and for assay development work. 2. This collaboration is multi-disciplinary and includes bioinformatics, analytical chemistry and clinical research.
Start Year 2013
Description EMBL-EBI hosting of software developed 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution My research team is developing software for metabolite annotation of liquid chromatography-mass spectrometry datasets acquired using metabolomic approaches, and will test and validate it. Additional functions beyond those defined in the grant proposal are also being developed, including new approaches to reporting confidence in the accuracy of metabolite matches.
Collaborator Contribution The EMBL-EBI team will host the software as a web-based tool accessible to the metabolomics community as part of the MetaboLights data repository.
Impact The collaboration is multi-disciplinary (analytical chemistry, metabolomics, bioinformatics)
Start Year 2016
Title BEAMS 
Description The BEAMS (Birmingham mEtabolite Annotation for Mass Spectrometry) package includes several computational modules that are applied, in a single automated process, to putatively annotate metabolites detected in untargeted ultra (high) performance liquid chromatography-mass spectrometry or untargeted direct infusion mass spectrometry metabolomic assays. The package is highly flexible to suit the diversity of sample types studied and mass spectrometers applied in untargeted metabolomics studies. The user can use the standard reference files included in the package or can develop their own reference files. The package can be used by computational experts (in Python and Galaxy) and by laboratory researchers who are not experts in bioinformatics (using a GUI).
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact No impacts currently 
Description Presentation at the first Cambridge Metabolomics Forum
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact The Cambridge Metabolomics Forum held a two-day event to promote metabolomics to the research community in the UK and Europe, including good practices and impact science case studies. I presented a 15-minute talk on metabolite identification in metabolomics, including tools and resources (associated with this grant). A 30-minute open-room discussion followed, which covered best practices, difficulties in metabolite identification and reporting standards.
Year(s) Of Engagement Activity 2017
Description Scientific conference - Gordon Research Conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Oral presentation at scientific conference
Year(s) Of Engagement Activity 2019
Description Scientific conference presentation - Metabolomics 2018 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Oral scientific presentation at the Metabolomics 2018 conference
Year(s) Of Engagement Activity 2018
Description Training courses including metabolite identification 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact The Birmingham Metabolomics Training Centre operates a number of face-to-face, hands-on courses available to scientific practitioners from across the world. We operate three courses which include training in metabolite identification and which apply the software developed. One of these courses is focussed solely on metabolite identification and runs over 2 days.
Year(s) Of Engagement Activity 2016, 2017