2020 BBSRC-NSF/BIO: Linking Mass Spectrometry Computational Ecosystems to Enhance Biological Insights of Publicly-Available Data

Lead Research Organisation: University of Liverpool
Department Name: Biochemistry & Systems Biology


All biological systems are composed of chemicals that help them grow, respond to the environment and communicate. The genome is composed of DNA, which carries the genetic information required for reproduction, growth and development, and its composition remains relatively static throughout a lifetime. The metabolome is composed of metabolites derived from food (e.g. sugars) and the environment (e.g. prescribed drugs), and its composition is dynamic: both which metabolites are present and their concentrations change over time. The scientific research community studies metabolites to understand how we metabolise food and drugs, how we respond dynamically to the environment, and which metabolites are important for a biological system to function. Many of these investigations are discovery studies that apply a scientific technique called metabolomics, which can detect hundreds or low thousands of different metabolites in a biological sample. These studies discover new metabolites, which then have to be analysed to determine their chemical structure, a step that is essential for biological interpretation.
Data from many metabolomic studies performed across the world are being released to the scientific community so that they can be re-used and re-analysed to derive new biological information. The data from these studies are stored in data repositories; two examples are MetaboLights in the UK and GNPS in the USA. However, many of the metabolites detected and reported in these data repositories do not have a chemical identity assigned to them. There is therefore an essential requirement to assign chemical structures to as many metabolites as possible so that the data can be re-used and biological information derived from them. The planned project will further develop computational approaches and apply them to all data deposited in MetaboLights and GNPS. This will allow more metabolites to be assigned a chemical identity and will increase confidence that the correct structure has been assigned. When the project is completed, the volume and quality of biological information available in the deposited datasets will be much greater, allowing new research questions to be asked and answered without the need to collect new data.

Technical Summary

The chemicals present in a biological system play many important roles, including in reproduction, growth and survival. DNA contained in the genome provides a recipe for reproduction and is relatively static across an organism's lifetime. In comparison, metabolites contained in the metabolome are very dynamic in their presence and concentration, changing in response to perturbations from within the organism or to external stimuli. Many studies that investigate the dynamics of metabolites apply a discovery-based approach called metabolic phenotyping (or metabolomics), in which a chemical assay is applied to collect as great a volume of biological information as possible. However, the chemical structure or biological identity of many metabolites is not known prior to data collection and has to be derived from the data collected. This process of metabolite annotation is a significant hurdle because we do not yet have reference metabolomes, and many metabolites are unavailable as chemical standards from which the libraries used for metabolite identification can be built. Much of the data deposited in metabolomics data repositories has no assigned chemical structure or metabolite name, and therefore biological knowledge cannot be derived from these data. In the proposed research we will develop new open access computational tools and integrate these with existing open access tools to significantly increase the number of metabolites identified in two data repositories, MetaboLights in the UK and GNPS in the USA. We will also establish common data standards for capturing and curating metabolomics data and for data exchange between the repositories. These approaches will increase the confidence in the annotations provided and greatly enhance the reusability of these data by the global scientific community.