Re-engineering ChEBI for a sustainable future

Lead Research Organisation: European Bioinformatics Institute
Department Name: MSCB Macromolec, structural and chem bio

Abstract

Endogenous small molecules play important roles in regulating complex biological processes and, thus, life itself. Small molecules also serve as powerful tools, with wide-ranging applications in medicine (i.e. as drugs), the biological sciences and biotechnology. An ever-increasing number of novel compounds with a wide range of interesting and potentially useful properties are being identified from sources such as plants, fungi and microorganisms.

Small molecules are thus clearly of critical interest to the scientific community. However, many biologists lack the detailed expertise and knowledge to fully understand and appreciate the many complex and subtle aspects of small molecules, and in particular the many nuances associated with the accurate representation of chemical structures. A further challenge is that the same small molecule will often be referenced by multiple names and synonyms in the scientific literature and in databases. To take one very simple example, the non-steroidal anti-inflammatory drug aspirin is also referred to as acetylsalicylic acid, 2-(acetyloxy)benzoic acid and o-acetylsalicylic acid among many other synonyms. This complexity and ambiguity is a significant obstacle and can lead to wasted effort, inaccurate results and misleading conclusions.

The Chemical Entities of Biological Interest (ChEBI) database acts as a reliable and trusted resource that provides "definitive" information about small molecules, thereby delivering a solution to many of these challenges. ChEBI provides biological, chemical and semantic information for small chemical compounds relevant in biology to the community. ChEBI also creates for each distinct molecular structure a stable and unchanging identifier, which is used by multiple other resources to definitively identify that specific compound, much as a grid reference unambiguously identifies a specific location on the earth's surface. In addition, ChEBI incorporates standard naming systems from global bodies such as the International Union of Pure and Applied Chemistry (IUPAC) and the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB). All of the information and data in ChEBI is freely available and downloadable without restriction. For these reasons, ChEBI is very widely used as a small molecule reference database by a number of leading biomedical databases. ChEBI is also used by a very large number of users who access its information via the public web site.
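
The role of a stable identifier in resolving synonyms can be illustrated with a minimal Python sketch. It is not ChEBI's implementation or API: the lookup table, the `normalise` and `resolve` helpers, and the whitespace and case handling are all assumptions made for illustration. The identifier CHEBI:15365 is believed to be the ChEBI entry for acetylsalicylic acid and should be checked against the live database before being relied upon.

```python
from typing import Optional

# Illustrative synonym table (an assumption, not ChEBI data): the names quoted
# in the abstract, all mapped to a single stable identifier.
SYNONYMS = {
    "aspirin": "CHEBI:15365",
    "acetylsalicylic acid": "CHEBI:15365",
    "2-(acetyloxy)benzoic acid": "CHEBI:15365",
    "o-acetylsalicylic acid": "CHEBI:15365",
}


def normalise(name: str) -> str:
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.strip().lower().split())


def resolve(name: str) -> Optional[str]:
    """Return the identifier for a known synonym, or None if it is unknown."""
    return SYNONYMS.get(normalise(name))


if __name__ == "__main__":
    for query in ("Aspirin", "o-Acetylsalicylic acid", "paracetamol"):
        print(query, "->", resolve(query))
```

Whichever name a resource starts from, it ends up storing the same identifier, so two databases that refer to the compound by different names can still be joined on that key.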

The aim of our proposal is to ensure the continued availability and growth of this critical resource for the bioscience community. ChEBI was originally developed in 2004 and as a consequence its underlying computer code is now out of date and increasingly difficult to support and maintain. Indeed, there is a growing risk that it will in the near future become incompatible with current computer systems. We therefore propose to completely overhaul and modernise the ChEBI infrastructure, code base and associated software tools. A new user-friendly website will be developed which will enable users to search, retrieve and download data. Advanced users will benefit from improved programmatic access to the data. We will develop a new annotation, curation and submission tool that will improve the overall efficiency of our expert ChEBI curators, for example by automating a number of currently time-consuming manual processes. This will reduce the time and effort required to create new entries. This tool will also benefit users who submit entries to ChEBI by significantly streamlining the submissions process. Our project will enable ChEBI to benefit from recent advances in software development techniques and will deliver a new infrastructure platform, which is essential if ChEBI is to continue to fulfil its critical role in the global bioinformatics community.

Technical Summary

ChEBI is a database and ontology containing information about chemical entities of biological interest. It is widely used as a 'small molecule' reference database by a number of leading global resources such as Gene Ontology, UniProt and Rhea, providing identifiers, structures and annotations to enable chemical entities to be unambiguously identified within biological databases, ontologies, models and the literature. ChEBI is also widely used through its public website and API as a rich source of information about small molecules. ChEBI is curated by human experts, and provides a reliable, non-redundant collection of chemical entities and related data such as detailed structure, synonyms, chemical formula, charge, molecular mass and links to external databases. Furthermore, ChEBI also contains an extensive ontology which enables the relationships between chemical entities to be defined on the basis of their shared chemical structure features together with their biological properties and roles.
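
How an ontology lets structural classes and biological roles be derived from a handful of stated relationships can be sketched in Python as follows. This is an illustrative assumption rather than ChEBI's data model or code: the triples are a hypothetical fragment, the class and role labels are chosen for readability and are not quoted from ChEBI, and the `targets`, `ancestors` and `roles` helpers are invented for this example. Only the relation names `is_a` and `has_role` follow common ontology usage.

```python
from collections import deque

# Hypothetical ontology fragment as (subject, relation, object) triples.
# The labels are illustrative and not taken from ChEBI.
TRIPLES = [
    ("aspirin", "is_a", "benzoic acids"),
    ("benzoic acids", "is_a", "aromatic carboxylic acid"),
    ("aspirin", "has_role", "non-steroidal anti-inflammatory drug"),
    ("non-steroidal anti-inflammatory drug", "is_a", "drug"),
]


def targets(subject, relation):
    """Objects directly linked to `subject` by `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def ancestors(term):
    """All classes reachable from `term` by following is_a links."""
    seen = set()
    queue = deque(targets(term, "is_a"))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(targets(current, "is_a"))
    return seen


def roles(term):
    """Roles stated for `term` or its classes, plus the roles' own ancestors."""
    found = set()
    for entity in {term} | ancestors(term):
        for role in targets(entity, "has_role"):
            found.add(role)
            found |= ancestors(role)
    return found


if __name__ == "__main__":
    print(sorted(ancestors("aspirin")))
    print(sorted(roles("aspirin")))
```

In this sketch a query for everything with the role "drug" would return aspirin even though that link is never stated directly, which is the kind of inference a curated classification makes possible.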

Since its creation in 2004, ChEBI's software infrastructure has not undergone any major enhancements and is now significantly outdated, resulting in a large and growing maintenance burden. The overall goal of our project is to completely overhaul and modernise ChEBI's software infrastructure to enable ChEBI to continue to provide its critical service to the bioscience community. The work will be divided into four distinct work packages covering (1) the core database and web services, (2) more powerful and scalable searching capabilities using Elasticsearch and RDKit, (3) a new web interface and ontology visualisation tool and (4) a new suite of curator tools that will improve efficiency and enable a wider pool of curators to contribute to ChEBI. Documentation and training will be developed to enable users to benefit from these developments, which will have an impact not only on ChEBI itself but also on a multitude of other global bioinformatics resources.

Publications

 
Title ChEBI 
Description Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on 'small' chemical compounds, which are either products of nature or synthetic products used to intervene in the processes of living organisms. ChEBI incorporates an ontological classification, whereby the relationships between molecular entities or classes of entities and their parents and/or children are specified. 
Type Of Material Database/Collection of data 
Provided To Others? Yes  
Impact ChEBI is a key component of multiple global biodata resources, which draw upon various aspects of the database including chemical structures, the ontology, molecule names and stable molecule identifiers. 
URL https://www.ebi.ac.uk/chebi/
 
Description EBI communications team collaboration with ChEBI 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution We asked the EBI's communications team to redesign the ChEBI website homepage and shared our design requirements and required content.
Collaborator Contribution New logos, a hero image and SVG icons were designed by the EBI communications team for the new ChEBI website.
Impact The new ChEBI website has now been rebranded.
Start Year 2023
 
Description OBO community collaboration 
Organisation Northeastern University - Boston
Country United States 
Sector Academic/University 
PI Contribution Integrated RO and ChemROF into the new ChEBI ontology.
Collaborator Contribution Gave valuable suggestions and helped improve the ChEBI ontology (created pull requests on GitLab).
Impact The relationships in the ChEBI ontology are now aligned with RO, and the annotations with ChemROF.
Start Year 2023
 
Description Visual framework collaboration 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution We were one of the early users of this framework.
Collaborator Contribution EBI developed an open-source toolkit for life science websites and is responsible for its maintenance.
Impact The framework was used to develop ChEBI's new public interface.
Start Year 2023
 
Description 16th Annual International Biocuration Conference in Padua, Italy 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Participated in a pre-conference workshop (Functional impact of glycans and their curation) with the glycan community. This involved discussions of how each resource curates glycans and how we can learn from each other. Also presented a poster about ChEBI at the main conference. A conference paper was subsequently written and submitted to the Journal of Glycobiology (pending publication).
Year(s) Of Engagement Activity 2023
URL https://wiki.glygen.org/Glycan_Function_Workshop_2023
 
Description 19th Annual conference of the Metabolomics Society (Metabolomics 2023 in Niagara Falls, Canada) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presented a poster about ChEBI and took part in the MetaboLights workshop (panel discussion).
Year(s) Of Engagement Activity 2023
URL https://www.metabolomics2023.org/
 
Description 2nd Ontologies4Chem Workshop 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited by NFDI4Chem in 2023 to give a talk about the ChEBI database. The audience was a combination of university researchers, chemists, ontologists etc.
Year(s) Of Engagement Activity 2023
URL https://www.nfdi4chem.de/event/2nd-ontologies4chem-workshop/
 
Description 3rd Ontologies4Chem workshop (Online) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presented an update on efforts to migrate ChEBI to a more robust infrastructure.
Year(s) Of Engagement Activity 2024
URL https://www.nfdi4chem.de/3rd-ontologies4chem-workshop-2024/
 
Description ChEBI Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Organised a 1-day workshop at EMBL-EBI (hybrid event, with 24 in-person and 12 remote participants). The primary aim of the ChEBI workshop was to bring together stakeholders from major bioinformatics resources that rely on ChEBI, provide updates on ChEBI's redevelopment, receive feedback, and solicit further input.
Year(s) Of Engagement Activity 2024
URL https://drive.google.com/drive/folders/1XMlzZFXXm7styUhzOBgpWchiwLutQj4f
 
Description ChEMBL & SureChEMBL anniversary symposium (EMBL-EBI) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presented a poster highlighting ChEBI's redevelopment
Year(s) Of Engagement Activity 2024
URL https://www.eventsforce.net/embl/frontend/reg/thome.csp?pageID=96136&eventID=151&traceRedir=2
 
Description Chemistry Day (EMBL-EBI) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Presented a talk about ChEBI and future plans at the Chemistry Day.
Year(s) Of Engagement Activity 2023
URL https://docs.google.com/document/d/1xbY9Qj8rndiIdfkKIb0Mk1FByGZILX8rur-bo_yTCVc/edit?tab=t.0#heading...
 
Description GCBR of the week social media campaign 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact ChEBI was featured in the GCBR of the week social media campaign, giving the general public and the scientific community the chance to learn more about ChEBI. ChEBI's profile and animation were published on LinkedIn and X.
Year(s) Of Engagement Activity 2024
URL https://www.linkedin.com/pulse/chebi-global-biodata-coalition-jv3ae
 
Description IUPAC WorldFAIR Chemistry Webinar (Online) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presented a flash talk about ChEBI to the general chemistry community and participated in a panel discussion. The audience was a combination of researchers, publishers, data community, etc.
Year(s) Of Engagement Activity 2023
URL https://zenodo.org/records/7683072
 
Description MetFAIR working group (Online) 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presented a talk about ChEBI at the MetFAIR - Reproducible Reporting and Metabolite Annotation Task Group. The task group plans to write a white paper on metabolite annotation.
Year(s) Of Engagement Activity 2025
URL https://metabolomicssociety.org/board-committees/scientific-task-groups/
 
Description OBO Foundry Newsletter Issue 7 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Wrote a summary of the outcomes from the ChEBI workshop to inform/update the OBO (Open Biological and Biomedical Ontology) community.
Year(s) Of Engagement Activity 2025
URL http://obofoundry.org/newsletter/2025/01/23/7th-issue-newsletter.html
 
Description Ontologies4Chem workshop (Online) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited by NFDI4Chem to give a talk about the ChEBI database. The audience was a combination of university researchers, chemists, ontologists etc.
Year(s) Of Engagement Activity 2022
URL https://www.nfdi4chem.de/event/ontologies4chem-workshop/
 
Description Studying metabolites and small molecules with MetaboLights and ChEBI (EMBL-EBI online webinar) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presented a talk about ChEBI. The webinar was part of the molecular building blocks of life series open to anyone who was interested in studying small molecules.
Year(s) Of Engagement Activity 2023
URL https://www.ebi.ac.uk/training/events/studying-metabolites-and-small-molecules-metabolights-and-cheb...
 
Description Transformation workflow using dbt for ChEBI Database (EMBL's Scientific Workflow Club) 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presented a talk about the new database management strategy for the new ChEBI and the results we are obtaining with it.
Year(s) Of Engagement Activity 2024
URL https://oc.embl.de/index.php/s/LcMIsohL6XOgApR#/files_mediaviewer/2024-07-04-Transformation-workflow...