Re-engineering ChEBI for a sustainable future

Lead Research Organisation: EMBL - European Bioinformatics Institute
Department Name: MSCB Macromolec, structural and chem bio

Abstract

Endogenous small molecules play important roles in regulating complex biological processes and, thus, life itself. Small molecules also serve as powerful tools, with wide-ranging applications in medicine (i.e. as drugs), the biological sciences and biotechnology. An ever-increasing number of novel compounds with a wide range of interesting and potentially useful properties are being identified from sources such as plants, fungi and microorganisms.

Small molecules are thus clearly of critical interest to the scientific community. However, many biologists lack the detailed expertise and knowledge to fully understand and appreciate the many complex and subtle aspects of small molecules, in particular the many nuances associated with the accurate representation of chemical structures. A further challenge is that the same small molecule is often referred to by multiple names and synonyms in the scientific literature and in databases. To take one very simple example, the non-steroidal anti-inflammatory drug aspirin is also referred to as acetylsalicylic acid, 2-(acetyloxy)benzoic acid and o-acetylsalicylic acid, among many other synonyms. This complexity and ambiguity are a significant obstacle and can lead to wasted effort, inaccurate results and misleading conclusions.

The Chemical Entities of Biological Interest (ChEBI) database is a reliable and trusted resource that provides "definitive" information about small molecules, thereby addressing many of these challenges. ChEBI provides the community with biological, chemical and semantic information about small chemical compounds of biological relevance. ChEBI also assigns each distinct molecular structure a stable, unchanging identifier, which multiple other resources use to identify that specific compound definitively, much as a grid reference unambiguously identifies a specific location on the earth's surface. In addition, ChEBI incorporates standard naming systems from global bodies such as the International Union of Pure and Applied Chemistry (IUPAC) and the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB). All of the data in ChEBI are freely available and can be downloaded without restriction. For these reasons, ChEBI is very widely used as a small-molecule reference database by a number of leading biomedical databases. ChEBI is also used by a very large number of people who access its information via the public website.

The aim of our proposal is to ensure the continued availability and growth of this critical resource for the bioscience community. ChEBI was originally developed in 2004, and its underlying computer code is now out of date and increasingly difficult to support and maintain. Indeed, there is a growing risk that it will soon become incompatible with current computer systems. We therefore propose to completely overhaul and modernise the ChEBI infrastructure, code base and associated software tools. We will develop a new, user-friendly website that will enable users to search, retrieve and download data, and advanced users will benefit from improved programmatic access to the data. We will also develop a new annotation, curation and submission tool that will improve the overall efficiency of our expert ChEBI curators, for example by automating a number of currently time-consuming manual processes. This will reduce the time and effort required to create new entries. The tool will also benefit users who submit entries to ChEBI by significantly streamlining the submission process. Our project will enable ChEBI to benefit from recent advances in software development techniques and will deliver a new infrastructure platform that is essential for ChEBI to continue fulfilling its critical role in the global bioinformatics community.

Technical Summary

ChEBI is a database and ontology containing information about chemical entities of biological interest. It is widely used as a 'small molecule' reference database by a number of leading global resources such as the Gene Ontology, UniProt and Rhea, providing identifiers, structures and annotations that enable chemical entities to be unambiguously identified within biological databases, ontologies, models and the literature. ChEBI is also widely used through its public website and API as a rich source of information about small molecules. ChEBI is curated by human experts and provides a reliable, non-redundant collection of chemical entities and related data such as detailed structure, synonyms, chemical formula, charge, molecular mass and links to external databases. Furthermore, ChEBI contains an extensive ontology that defines the relationships between chemical entities on the basis of their shared structural features together with their biological properties and roles.

Since its creation in 2004, ChEBI's software infrastructure has not undergone any major enhancements and is now significantly outdated, resulting in a large and growing maintenance burden. The overall goal of our project is to completely overhaul and modernise ChEBI's software infrastructure so that ChEBI can continue to provide its critical service to the bioscience community. The work will be divided into four work packages covering (1) the core database and web services, (2) more powerful and scalable search capabilities using Elasticsearch and RDKit, (3) a new web interface and ontology visualisation tool and (4) a new suite of curator tools that will improve efficiency and enable a wider pool of curators to contribute to ChEBI. Documentation and training will be developed to enable users to benefit from these developments, which will have an impact not only on ChEBI itself but also on a multitude of other global bioinformatics resources.

Publications

Andrés-Hernández L (2022) Establishing a Common Nutritional Vocabulary - From Food Production to Diet. in Frontiers in Nutrition

Witting M (2024) Challenges and perspectives for naming lipids in the context of lipidomics. in Metabolomics: Official Journal of the Metabolomic Society

 
Title ChEBI 
Description Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on 'small' chemical compounds, which are either products of nature or synthetic products used to intervene in the processes of living organisms. ChEBI incorporates an ontological classification, whereby the relationships between molecular entities or classes of entities and their parents and/or children are specified. 
Type Of Material Database/Collection of data 
Provided To Others? Yes  
Impact ChEBI is a key component of multiple global biodata resources, which draw upon various aspects of the database including chemical structures, the ontology, molecule names and stable molecule identifiers. 
URL https://www.ebi.ac.uk/chebi/
 
Description 16th Annual International Biocuration Conference in Padua, Italy 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Participated in a pre-conference workshop (Functional impact of glycans and their curation) with the glycan community. This involved discussions about how each resource curates glycans and how we can learn from each other. We also presented a poster about ChEBI at the main conference. A conference paper was subsequently written and submitted to the Journal of Glycobiology (pending publication).
Year(s) Of Engagement Activity 2023
URL https://wiki.glygen.org/Glycan_Function_Workshop_2023
 
Description 19th Annual conference of the Metabolomics Society (Metabolomics 2023 in Niagara Falls, Canada) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presented a poster about ChEBI and took part in the MetaboLights workshop (panel discussion).
Year(s) Of Engagement Activity 2023
URL https://www.metabolomics2023.org/
 
Description 2nd Ontologies4Chem Workshop 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited by NFDI4Chem in 2023 to give a talk about the ChEBI database. The audience was a combination of university researchers, chemists, ontologists and others.
Year(s) Of Engagement Activity 2023
URL https://www.nfdi4chem.de/event/2nd-ontologies4chem-workshop/
 
Description IUPAC WorldFAIR Chemistry Webinar (Online) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presented a flash talk about ChEBI to the general chemistry community and participated in a panel discussion. The audience was a combination of researchers, publishers, members of the data community and others.
Year(s) Of Engagement Activity 2023
URL https://zenodo.org/records/7683072
 
Description Ontologies4Chem workshop (Online) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited by NFDI4Chem to give a talk about the ChEBI database. The audience was a combination of university researchers, chemists, ontologists and others.
Year(s) Of Engagement Activity 2022
URL https://www.nfdi4chem.de/event/ontologies4chem-workshop/
 
Description Studying metabolites and small molecules with MetaboLights and ChEBI (EMBL-EBI online webinar) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presented a talk about ChEBI. The webinar was part of the "molecular building blocks of life" series, which was open to anyone interested in studying small molecules.
Year(s) Of Engagement Activity 2023
URL https://www.ebi.ac.uk/training/events/studying-metabolites-and-small-molecules-metabolights-and-cheb...